Statistical approaches used in automatic merge of scanned images.
Boiangiu, Costin Anton; Bucur, Ion; Spataru, Andrei Cristian et al.
1. INTRODUCTION
As digital libraries worldwide tend to expand significantly in
size, and at a high rate, the need for document digitization and content
extraction has become ever more pressing in recent years. The emphasis is
placed on the speed and quality of the image digitization process;
however, satisfactory end results cannot be obtained without a thorough
pre-processing phase of the input (Baird, 2003), as in the case of large
newspaper page scans that do not completely fit the scanner, or
magazines whose headline or paragraph composition is spread logically
over two consecutive pages. When working with such documents, a number
of logically connected page elements will appear on different scanned
images, making content extraction and conversion considerably more
difficult. Hence, a method must be found to connect these images.
2. INPUT SCENARIOS
While performing image analysis on a wide array of scanned images,
two main scenarios were identified. The first can be described as a page
scanned in two parts that share common elements, as exemplified in
Fig. 1 and Fig. 2.
As can be observed, the scanned documents in Fig. 1 and Fig. 2
contain common elements; in the following, this scenario will be
referred to as "overlapping" input images. Overlapping images can be
encountered in pages with either a Top-Bottom or a Left-Right
displacement.
The second situation is encountered when the two parts of the
document have no common elements. This situation will be referred to in
the following as "non-overlapping" input images, and can be seen
in Fig. 3.
Since the most common scenario of non-overlapping input images
occurs in magazine articles, such pages are generally separated left to
right. Also, the scanned images in Fig. 3 are skewed, an issue that will
have to be addressed in order to accomplish the image merging task.
[FIGURE 1 OMITTED]
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
3. IMAGE MERGING SOLUTIONS
The image merging algorithm is composed of several steps:
first, the input images undergo a pre-processing stage, comprising
a transformation from a given colour space into Black & White and
an extraction of connected foreground (black) pixels, referred to in
the following as "Entities".
After this step, a specific routine is run, according to the
input scenario (overlapping or non-overlapping). The steps are presented
in detail in the following sub-sections.
3.1 Pre-processing stage:
The actual Black & White conversion algorithms are beyond the
scope of this paper, so this subject will not be given further
attention. However, a simple threshold method can easily be implemented
in order to obtain a binary image. The same holds for Entity
extraction, where a run-length algorithm can accomplish the task. For
further reading, algorithms for Black & White image conversion can
be found in (Di Zenzo et al., 1996), while run-length connected-pixel
extraction algorithms are available in (Bovik, 2000).
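As an illustration, a minimal sketch of this stage is given below. It is not the implementation used in the paper: the global threshold of 128 is an arbitrary choice, and scipy's connected-component labelling stands in for the run-length algorithm of (Bovik, 2000).

```python
import numpy as np
from scipy import ndimage

def binarize(gray, threshold=128):
    """Global threshold: dark pixels become foreground (1), the rest 0.
    The threshold value is illustrative; real scans may need an adaptive method."""
    return (gray < threshold).astype(np.uint8)

def extract_entities(binary):
    """Label 8-connected foreground components ("Entities") and return
    their bounding boxes as (row_slice, col_slice) pairs."""
    labels, count = ndimage.label(binary, structure=np.ones((3, 3)))
    return ndimage.find_objects(labels)
```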
3.2 Overlapping input images:
In the case of overlapping input images, a "comparison
band" is chosen on each image, covering half of the scanned
document as a safety margin. For example, in the Top-Bottom case, the
lower half of the Top part and the upper half of the Bottom part are
chosen. Inside these bands, random entities are selected, and a
geometrical comparison is made between entities in the upper and the
lower comparison bands until a positive match is found. When a
sufficient number of entity matches is achieved, the algorithm considers
segments between the matching entities in both bands and searches for
the best rotation-translation function that can superimpose
corresponding segments from the two bands. The resulting function
represents the transformation that has to be applied in order to
correctly merge the overlapping images, and has the output parameters:
x-axis translation (t_x), y-axis translation (t_y) and rotation angle
(alpha).
The criteria by which the entities are compared are geometrical:
area of the bounding rectangle, bounding rectangle fill ratio, and
number of black pixels.
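A minimal sketch of such a comparison follows; the 15% relative tolerance and the exact shape of the feature tuple are illustrative assumptions, not values taken from the algorithm above.

```python
def entity_features(patch):
    """The three geometric criteria: bounding-rectangle area, fill ratio
    and black-pixel count. `patch` is the binary crop of one Entity."""
    h, w = patch.shape
    black = int(patch.sum())
    return (h * w, black / (h * w), black)

def entities_match(a, b, tol=0.15):
    """Two Entities are a candidate match when every criterion agrees
    within a relative tolerance (tol is an illustrative choice)."""
    return all(abs(x - y) <= tol * max(x, y)
               for x, y in zip(entity_features(a), entity_features(b)))
```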
The parameters returned by the rotation-translation search described
above are accumulated in three histograms (t_x, t_y, alpha). These
histograms are then filtered using a triangle window filter of size 7,
in which the peak retains 100% of the initial value, decreasing by 25%
for each neighbouring bin. The algorithm then tries to match the two
images according to all plausible height differences between peaks in
the histograms, choosing the transformation that yields the smallest
difference in peak height. Once these measurements have been made on all
three histograms, the rotation-translation function is computed.
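The triangle window described above can be written down directly; the sketch below smooths a vote histogram and reads off its dominant peak. Only the window shape (100% at the peak, dropping 25% per neighbour) comes from the text; the rest is illustrative.

```python
import numpy as np

# Size-7 triangle window: 100% at the peak, falling 25% per neighbouring bin.
TRIANGLE = np.array([0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25])

def smooth(hist):
    """Filter a t_x, t_y or alpha vote histogram with the triangle window."""
    return np.convolve(hist, TRIANGLE, mode="same")

def dominant_peak(hist):
    """Bin index of the strongest peak after smoothing; the three peak
    positions together give the rotation-translation parameters."""
    return int(np.argmax(smooth(hist)))
```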
[FIGURE 4 OMITTED]
3.3 Non-overlapping input images:
The first step of the algorithm for merging non-overlapping images
is the detection of text lines (Zheng et al., 2003) from an input array
of Entities; the elements of the two sections are then compared at the
level of text lines.
As the input images may be skewed when scanned, a deskewing
algorithm (Yuan & Tan, 2003) has to be applied independently to each
input image, so as to have the text lines horizontal in the output. This
will lead to a better "fit" of the two page portions after
applying the third and final step.
The third step obtains a "Text Characteristic"
measurement for each detected text line; based on a comparison of these
measurements, the algorithm finds the translation function (on the y
axis only) that best fits the lines of the two input images.
The "Text Characteristic" measurement is composed of a
number of routines that detect parameters related to the geometry of the
text. The main stages of the "Text Characteristics" extraction
are:
* Filtering of entities: in order to take into consideration only
relevant entities (excluding noise, punctuation marks, etc.), the input
entity array is filtered. These filters join broken parts of a letter
and eliminate separator lines and punctuation marks.
* Font Size measurement: the heights of the bounding boxes of the
entities are placed in a histogram, upon which a triangle filter is
applied. The histogram will present three distinct peaks: the first
(lowest heights) representing noise or punctuation marks that were not
eliminated by the filters, the second the height of lowercase letters
(the x-height), and the third the height of capital letters and
ascenders. The last two peaks are considered, together representing the
font size of the text.
* Boldness measurement: the boldness, or pen width, of the text is
measured by finding, for each pixel inside a letter, the longest
vertical and horizontal (black) segments it belongs to. The smaller of
these two values is placed in a histogram, and the highest peak of this
histogram represents the boldness (a sketch of this measurement is given
after this list).
* Italics measurement: in order to check whether a letter is italic,
the width of the letter's bounding rectangle is compared to the width
of the rotated letter's bounding rectangle. The rotation is performed
by -16 and +16 degrees (16 degrees being a common value for the slant
angle of italic characters). The decision is made by applying the rule:
if the width of the bounding rectangle increases after rotating by +16
degrees but decreases when rotating by -16 degrees, the letter is
italic.
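As referenced in the boldness item above, the following is a minimal sketch of the pen-width measurement; the helper names and the use of numpy are assumptions, but the min-of-runs histogram logic follows the description.

```python
import numpy as np

def run_lengths(line):
    """For each pixel, the length of the black run it belongs to (0 if white)."""
    out = np.zeros(len(line), dtype=int)
    start = None
    for i, v in enumerate(np.append(line, 0)):  # sentinel closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            out[start:i] = i - start
            start = None
    return out

def pen_width(patch):
    """Boldness estimate: the histogram peak of min(horizontal, vertical)
    run length taken over all black pixels of a letter."""
    horiz = np.apply_along_axis(run_lengths, 1, patch)
    vert = np.apply_along_axis(run_lengths, 0, patch)
    mins = np.minimum(horiz, vert)[patch.astype(bool)]
    return int(np.bincount(mins).argmax()) if mins.size else 0
```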
By applying all the above measurements, "Text
Characteristic" values are assigned to each text line in both
images. The images are then compared line by line: costs are computed
for the differences in characteristics between a line in the first image
and a line in the second, and a match is declared between the lines with
the lowest cost. Once lines are matched from the characteristics point
of view, the algorithm finds the y-translation value necessary to match
these lines geometrically.
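A simplified sketch of this matching step is given below. Reducing the characteristics to a plain numeric tuple, the equal cost weights and the selection of a single best pair are all simplifying assumptions; the algorithm above matches lines pairwise by lowest cost.

```python
def line_cost(a, b, weights=(1.0, 1.0, 1.0)):
    """Cost of matching two text lines from their characteristic tuples,
    e.g. (font_size, boldness, italic_ratio); the weights are illustrative."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def y_translation(left_lines, right_lines):
    """Pick the line pair with the lowest characteristic cost and return
    the y-offset that aligns them; each item is (y_position, characteristics)."""
    best = min(((line_cost(lc, rc), ry - ly)
                for ly, lc in left_lines
                for ry, rc in right_lines),
               default=(0.0, 0))
    return best[1]
```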
4. CONCLUSIONS
The approach presented in this paper is based on geometrical
statistics and, due to its nature, provides high-confidence results. In
the case of overlapping input images, geometrical measurements are
obtained and used to search for corresponding elements in the comparison
bands of the two images. When the input images are non-overlapping, the
algorithm searches for matching characteristics of the text, relying on
higher-order geometrical computations. By using the methods presented in
this paper, the task of image merging can be accomplished in most
scenarios, as tests have shown. In both cases, a single output image is
obtained, serving as a better starting point for content extraction
procedures. As a consequence, the digital content resulting from these
images is closer to the physical documents, thus reducing or eliminating
altogether the need for manual corrections.
5. REFERENCES
Baird, H.S. (2003). Digital Libraries and Document Image Analysis,
Proceedings of the Seventh International Conference on Document Analysis
and Recognition, Vol. 1, pp. 2-15, ISBN 0-7695-1960-1, Edinburgh,
Scotland, August 2003
Bovik, A. (2000). Handbook of Image and Video Processing, Academic
Press, ISBN 0-12-119792-1
Di Zenzo, S.; Cinque, L. & Levialdi, S. (1996). Run-Based
Algorithms for Binary Image Analysis and Processing, IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 18, No. 1, January
1996, pp. 83-89, ISSN 0162-8828
Yuan, B. & Tan, C.L. (2003). Skewscope: The Textual Document
Skew Detector, Proceedings of the Seventh International Conference on
Document Analysis and Recognition, Vol. 1, pp. 49-53, ISBN
0-7695-1960-1, Edinburgh, Scotland, August 2003
Zheng, Y.; Li, H. & Doermann, D. (2003). A Model-based Line
Detection Algorithm in Documents, Proceedings of the Seventh
International Conference on Document Analysis and Recognition, Vol. 1,
pp. 44-48, ISBN 0-7695-1960-1, Edinburgh, Scotland, August 2003