June / October 2014 - Record digitization for beginners

Sascha Foerster

This is an English translation of “Aktendigitalisierung für Anfänger. Oder: Die kurze Geschichte einer rasanten technischen Entwicklung” (Record digitization for beginners. Or: The short history of a rapid technical development).

http://zakunibonn.hypotheses.org/1119

How long will the digitization of the “German postwar children” studies take?

Because technology is developing so quickly, the answer to this question keeps changing. On June 5, together with a student assistant, I tried to answer it on the basis of the technology available today. Along the way, I also review the options that have been available so far.

Just a few days ago, I discovered that the University Library of Bonn owns a very nice, new book scanner which allows everyone to scan to a USB stick free of charge. Only a few years ago, in the same library, a digital scan cost as much as a paper copy! Fortunately, those times are over. So we scanned various originals, measured how much time we needed, and optimized our workflow and settings.

The scanner in year one (2009)

But let us take a step back to 2009, when I first encountered the research group “German postwar children – revisited.” At that time I was a student assistant. In my office there was a scanner connected to the work computer via a parallel port: a Kodak i80. This flatbed scanner had cost a four-figure sum at the time. It was not state-of-the-art and was terribly slow. Each scan took at least one minute, from inserting the sheet to the finished scan appearing on the computer, and usually it took much longer. Digitizing an entire archive this way would take a lifetime.

Smartphone Scanner (2012)

By 2012, smartphones with good cameras were readily available, and it no longer made sense to wait more than a minute for a scan when I could have a digital image of the file immediately. That year I discovered a crowdfunding project for a box that made it possible to scan with an iPhone without holding the camera shakily in one hand, and that partly provided built-in lighting. For testing purposes, I built myself a small smartphone scanning rig. Today such boxes can be bought ready to use for less money.

The problem with smartphone scanning is the lack of post-processing. Images are often distorted, colors and lighting are not rendered correctly, or the images look blurry on large monitors. Newer apps use algorithms to crop the image into a clean scan and to optimize color and lighting, so this is no longer much of a problem.

In my daily routine, I no longer use a scanner; I just use the Scanbot app from the Bonn developers at “doo”. How does it work? The edges of the paper are detected automatically, and the picture is taken automatically once the image is in focus and the document has been recognized. Afterwards, the image ends up in my Dropbox. So all I have to do is position my camera and then check the result on my computer.


[Screenshots: automatic document recognition; post-processing and upload]

Of course, the quality of these images varies widely and, despite all the algorithms, depends heavily on the lighting conditions. Sometimes the image is distorted because the camera was not held exactly level over the sheet of paper. Nevertheless, with a smartphone camera and an optimized app, I was able to make an acceptable scan every 30 seconds in my tests.

Scanning with the Zeutschel zeta (2014)

A year ago, I first saw the scanner from the company Zeutschel on Twitter. I was surprised that a book scanner company was active on social media. To be fair, the Zeutschel zeta has been on the market since 2011. But it was only one week before writing this post that I came across one in a library and wanted to test it. Unfortunately, a scanner like this cannot be built yourself and is also beyond a student’s budget. So it is all the better that the ULB Bonn has bought one.

The design of the zeta is reminiscent of Apple, which is quite nice for a book scanner, since these are usually designed more for function. The only weak point, but a really annoying one, is the touchscreen, which unfortunately is not from Apple. We realized that the system software is Windows 7 as soon as the system crashed. I don’t understand this choice. Anyone used to working with multi-touch gestures and efficient on-screen keyboards will be disappointed. Letters have to be pressed several times, the keyboard has no umlauts, and adjusting the scan area only works after several attempts, despite the screen’s multi-touch capability. This is really disappointing for such a high-end device.


However, when you work with the zeta, you notice that algorithms running in the background quietly make working with the book scanner easier. Fingers holding the sheet in place are automatically removed from the scan. It also seems to me that the scanner predicts where the next book page is likely to be, in preparation for the next scan.

The scans can be saved in .jpg, .tiff, or .pdf format. They have a resolution of 300 ppi (with certain settings, up to 600 ppi). When saving as a .pdf file, you can combine several scans in one file (multi-page). The layout can also be set (left side only, right side only, automatic splitting, or no splitting).
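To get a feel for what 300 or 600 ppi means in practice, the short Python sketch below converts a page size into pixel dimensions. It assumes an A4 sheet (210 × 297 mm); the actual file sheets may well have a different format, and the function name `pixel_dimensions` is purely illustrative.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, ppi: int) -> tuple[int, int]:
    """Return (width_px, height_px) of a page scanned at the given resolution."""
    return (round(width_mm / MM_PER_INCH * ppi),
            round(height_mm / MM_PER_INCH * ppi))


# Assumption: the file sheets are A4 (210 x 297 mm).
for ppi in (300, 600):
    w, h = pixel_dimensions(210, 297, ppi)
    print(f"A4 at {ppi} ppi: {w} x {h} px ({w * h / 1e6:.1f} megapixels)")
```

Doubling the resolution quadruples the number of pixels, which is worth keeping in mind when choosing between 300 and 600 ppi for several hundred thousand sheets.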

The scanning process itself is very convenient and lightning fast. You hold the sheet with your fingers and, as soon as the green LED lights up, touch the scan button. The scan buttons are easy to reach with your fingers. Then you check the image. As soon as the next scan is started, the previous image is saved to the USB stick. Only after the last scan do you have to be careful not to forget to save before removing the USB stick.

After we had tried all the settings and developed an optimal workflow (one person sorted the file sheets, placed them on the scanner, and held them in place; the other scanned, checked, and released each scan), we measured the time needed to scan an example file. In one hour we scanned 184 file sheets. Our concentration declined a little, but with practice the whole procedure sped up.

Scan samples

[Sample scans: Hollerith key for the teachers’ report; 1953 manual for conducting and evaluating the psychological analysis; file for the constitution data; file for the psychological analysis results; drawing test]

Result

To estimate how long the complete digitization of all files will take, we need to know how many file sheets there are. On average, one record contains approx. 100 file sheets. In total we found 4,095 records, which means that about 409,500 file sheets (4,095 × 100) have to be scanned.
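Since the figure of 100 sheets per record is only an average, it can help to see how sensitive the total is to that assumption. The Python sketch below repeats the multiplication for the stated average and for two hypothetical averages (90 and 110 sheets per record) that are not taken from the files.

```python
RECORDS = 4095  # records found in the project archive


def total_sheets(records: int, avg_sheets_per_record: int) -> int:
    """Estimated number of file sheets to scan."""
    return records * avg_sheets_per_record


# 100 is the approximate average from the text; 90 and 110 are hypothetical
# values used only to show how much the total shifts with the average.
for avg in (90, 100, 110):
    print(f"{avg} sheets per record -> {total_sheets(RECORDS, avg):,} sheets")
```

Every ten sheets more or fewer per record changes the total by roughly 41,000 sheets.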

[Graph: Estimated time for digitizing the project files of “German Postwar Children”: scanner (60 scans per hour), smartphone (120 scans per hour), book scanner (184 scans per hour)]

The graph above shows the estimated working hours for digitizing 409,500 file sheets with a classic scanner, a smartphone, and a book scanner.

By my count, digitizing the “German postwar children” studies with current technology will take 4,451 working hours. This is based on 184 file sheets scanned per hour by two people: 409,500 sheets ÷ 184 sheets per hour ≈ 2,225.5 hours, times two people ≈ 4,451 working hours.

These numbers are only rough estimates and could soon be improved through practice, a better workflow, and better technology. They do not include the time for keywording and enrichment with metadata, which may double the effort.
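The estimate can be reproduced, and adjusted as workflows improve, with a few lines of Python. This is a sketch under stated assumptions: the rates come from the text (at least one minute per scan with the old scanner, one acceptable smartphone scan every 30 seconds, 184 sheets per hour on the zeta), the zeta uses two people as in our test, and the text does not say how many people the scanner and smartphone need, so one person is assumed for both. The `metadata_factor` parameter is a hypothetical knob for the possible doubling mentioned above.

```python
from dataclasses import dataclass

TOTAL_SHEETS = 409_500  # 4,095 records x approx. 100 sheets


@dataclass
class Method:
    name: str
    sheets_per_hour: int
    operators: int  # people working at the same time (assumed 1 for scanner/smartphone)


METHODS = [
    Method("Scanner (Kodak i80)", 60, 1),    # at least one minute per scan
    Method("Smartphone (Scanbot)", 120, 1),  # one acceptable scan every 30 seconds
    Method("Book scanner (zeta)", 184, 2),   # measured in our one-hour test, two people
]


def estimate(method: Method, sheets: int = TOTAL_SHEETS,
             metadata_factor: float = 1.0) -> tuple[float, float]:
    """Return (scanning hours, person-hours) for scanning `sheets` sheets.

    metadata_factor scales the person-hours, e.g. 2.0 if keywording and
    metadata enrichment double the effort.
    """
    clock_hours = sheets / method.sheets_per_hour
    person_hours = clock_hours * method.operators * metadata_factor
    return clock_hours, person_hours


for m in METHODS:
    clock, person = estimate(m)
    _, with_metadata = estimate(m, metadata_factor=2.0)
    print(f"{m.name:<22} {clock:7,.0f} h scanning  {person:7,.0f} person-hours  "
          f"{with_metadata:7,.0f} incl. metadata")
```

Keeping `operators` separate from the scan rate makes the difference between elapsed time and person-hours explicit. Under the one-person assumption, the smartphone actually comes out with fewer person-hours than the two-person zeta workflow, so that assumption, along with the varying smartphone image quality, matters when comparing the methods.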

We would be pleased to receive suggestions, experiences, pointers to other articles, best practices, and comments from which we can learn more about digitization. One guide I plan to look at in more detail is the “Checklist Digitization”, which you can find at http://dx.doi.org/10.12752/2.0.001.1

Disclaimer: There is no conflict of interest, as this article was written out of personal interest.

*****************************

Update, September 3, 2014

Zeutschel contacted me after this post was published and answered some of my questions. Here is a short summary of their email:

The user interface of the zeta

Since May 2014, thanks to a new generation of touchscreens, the zeta is on par with current tablets and iPads in terms of handling. The current zeta offers a better, smoother touch response.

To zoom in, zoom out, or move the scan result, you can now use two-finger zoom on the zeta’s touchscreen, just as you are used to on your mobile phone. Existing zeta units can be upgraded with this function.

The answers to your question about “Functions in document processing” will be included in the revised context help for these pages.

Explanations of document processing can also be found in the manual, but we assume you will not have the manual at hand for every scan. So here is some help for the time being:

DocumentProcessing Auto	The type of original is identified automatically.
DocumentProcessing Buch (book)	A book is expected as the original; book curvature correction is applied, the page is cropped, and, if requested, the pages are split.
DocumentProcessing Magazin (magazine)	A magazine or newspaper is expected as the original; book curvature correction is applied and, if requested, the pages are split.
DocumentProcessing AUS (off)	No specific image processing is applied; the document is shown exactly as it lies on the scanner.

License: CC-BY 4.0
Unless otherwise noted, the blog and its contents are licensed under the Creative Commons Attribution 4.0 License (Namensnennung 4.0).
ISSN 2198-4603
