Statistical approaches used in automatic merge of scanned images.
Boiangiu, Costin Anton; Bucur, Ion; Spataru, Andrei Cristian et al.
1. INTRODUCTION
As digital libraries worldwide tend to expand significantly in
size, and at a high rate, the need for document digitization and content
extraction has become ever more pressing in recent years. The emphasis is
placed on the speed and quality of the image digitization process;
however, satisfactory end results cannot be obtained without a thorough
pre-processing phase of the input (Baird, 2003), as in the case of large
newspaper page scans that do not completely fit the scanner, or
magazines whose headline or paragraph composition is spread logically
over two consecutive pages. When working with such documents, a number
of logically connected page elements will appear on different scanned
images, making content extraction and conversion considerably more
difficult. Hence, a method must be found to connect these images.
2. INPUT SCENARIOS
While performing image analysis on a wide array of scanned images,
two main scenarios were identified. The first can be described as a page
scanned in two parts that share common elements, as exemplified in
Fig. 1 and Fig. 2.
As can be observed, the scanned documents in Fig. 1 and Fig. 2
contain common elements; in the following, this scenario will be
referred to as "overlapping" input images. Overlapping images can be
encountered in pages with either a Top-Bottom or a Left-Right
displacement.
The second situation is encountered when the two parts of the
document have no common elements. This situation will be referred to in
the following as "non-overlapping" input images, and can be seen
in Fig. 3.
Since the most common scenario of non-overlapping input images
occurs in magazine articles, such pages are generally separated left to
right. Also, the scanned images in Fig. 3 are skewed, an issue that will
have to be addressed in order to accomplish the image merging task.
[FIGURE 1 OMITTED]
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
3. IMAGE MERGING SOLUTIONS
The image merging algorithm is composed of several steps:
first, the input images undergo a pre-processing stage, comprising
a transformation from a given colour space into Black & White and
an extraction of connected foreground (black) pixels, referred to in
the following as "Entities".
After this step, a specific routine is run, according to the
input scenario (overlapping or non-overlapping). The steps are presented
in detail in the following sub-sections.
3.1 Pre-processing stage:
The actual Black & White conversion algorithms are beyond the
scope of this paper, so this subject will not be given further
attention. However, a simple threshold method can easily be implemented
in order to obtain a binary image. The same holds for Entity
extraction, where a run-length algorithm can accomplish the task. For
further reading, algorithms for Black & White image conversion can
be found in (Di Zenzo et al., 1996), while run-length connected-pixel
extraction algorithms are available in (Bovik, 2000).
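As an illustration, a minimal sketch of this stage is given below. It is not the implementation used in the paper: the global threshold of 128 is an arbitrary choice, and scipy's connected-component labelling stands in for the run-length algorithm of (Bovik, 2000).

```python
import numpy as np
from scipy import ndimage

def binarize(gray, threshold=128):
    """Global threshold: dark pixels become foreground (1), the rest 0.
    The threshold value is illustrative; real scans may need an adaptive method."""
    return (gray < threshold).astype(np.uint8)

def extract_entities(binary):
    """Label 8-connected foreground components ("Entities") and return
    their bounding boxes as (row_slice, col_slice) pairs."""
    labels, count = ndimage.label(binary, structure=np.ones((3, 3)))
    return ndimage.find_objects(labels)
```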
3.2 Overlapping input images:
In the case of overlapping input images, a "comparison
band" is chosen on each image, covering half of the scanned
document as a safety margin. For example, in the Top-Bottom case, the
lower half of the Top part and the upper half of the Bottom part are
chosen. Inside these bands, random entities are selected, and a
geometrical comparison is made between entities in the upper and the
lower comparison bands until a positive match is found. When a
sufficient number of entity matches is achieved, the algorithm considers
segments between the matching entities in both bands and searches for
the best rotation-translation function that can superimpose
corresponding segments from the two bands. The resulting function
represents the transformation that has to be applied in order to
correctly merge the overlapping images, and has the output parameters:
x-axis translation (t_x), y-axis translation (t_y) and rotation angle
(alpha).
The criteria by which the entities are compared are geometrical:
area of the bounding rectangle, bounding rectangle fill ratio, and
number of black pixels.
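A minimal sketch of such a comparison follows; the 15% relative tolerance and the exact shape of the feature tuple are illustrative assumptions, not values taken from the algorithm above.

```python
def entity_features(patch):
    """The three geometric criteria: bounding-rectangle area, fill ratio
    and black-pixel count. `patch` is the binary crop of one Entity."""
    h, w = patch.shape
    black = int(patch.sum())
    return (h * w, black / (h * w), black)

def entities_match(a, b, tol=0.15):
    """Two Entities are a candidate match when every criterion agrees
    within a relative tolerance (tol is an illustrative choice)."""
    return all(abs(x - y) <= tol * max(x, y)
               for x, y in zip(entity_features(a), entity_features(b)))
```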
The parameters returned by the rotation-translation search described
above are accumulated in three histograms (t_x, t_y, alpha). These
histograms are then filtered using a triangle window filter of size 7,
in which the peak retains 100% of the initial value, decreasing by 25%
for each neighbouring bin. The algorithm then tries to match the two
images according to all plausible height differences between peaks in
the histograms, choosing the transformation that yields the smallest
difference in peak height. Once these measurements have been made on all
three histograms, the rotation-translation function is computed.
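The triangle window described above can be written down directly; the sketch below smooths a vote histogram and reads off its dominant peak. Only the window shape (100% at the peak, dropping 25% per neighbour) comes from the text; the rest is illustrative.

```python
import numpy as np

# Size-7 triangle window: 100% at the peak, falling 25% per neighbouring bin.
TRIANGLE = np.array([0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25])

def smooth(hist):
    """Filter a t_x, t_y or alpha vote histogram with the triangle window."""
    return np.convolve(hist, TRIANGLE, mode="same")

def dominant_peak(hist):
    """Bin index of the strongest peak after smoothing; the three peak
    positions together give the rotation-translation parameters."""
    return int(np.argmax(smooth(hist)))
```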
[FIGURE 4 OMITTED]
3.3 Non-overlapping input images:
The first step of the algorithm for merging non-overlapping images
is the detection of text lines (Zheng et al., 2003) from an input array
of Entities; the elements of the two sections are then compared at the
level of text lines.
As the input images may be skewed when scanned, a deskewing
algorithm (Yuan & Tan, 2003) has to be applied independently to each
input image, so as to have the text lines horizontal in the output. This
will lead to a better "fit" of the two page portions after
applying the third and final step.
The third step obtains a "Text Characteristic"
measurement for each detected text line; based on a comparison of these
measurements, the algorithm finds the translation function (on the y
axis only) that best fits the lines of the two input images.
The "Text Characteristic" measurement is composed of a
number of routines that detect parameters related to the geometry of the
text. The main stages of the "Text Characteristics" extraction
are:
* Filtering of entities: in order to take into consideration only
relevant entities (excluding noise, punctuation marks, etc.), the input
entity array is filtered. These filters join broken parts of a letter
and eliminate separator lines and punctuation marks.
* Font Size measurement: the heights of the bounding boxes of the
entities are placed in a histogram, upon which a triangle filter is
applied. The histogram will present three distinct peaks: the first
(lowest heights) representing noise or punctuation marks that were not
eliminated by the filters, the second the height of lowercase letters
(the x-height), and the third the height of capital letters and
ascenders. The last two peaks are considered, together representing the
font size of the text.
* Boldness measurement: the boldness, or pen width, of the text is
measured by finding, for each pixel inside a letter, the longest
vertical and horizontal (black) segments it belongs to. The smaller of
these two values is placed in a histogram, and the highest peak of this
histogram represents the boldness (a sketch of this measurement is given
after this list).
* Italics measurement: in order to check whether a letter is italic,
the width of the letter's bounding rectangle is compared to the width
of the rotated letter's bounding rectangle. The rotation is performed
by -16 and +16 degrees (16 degrees being a common value for the slant
angle of italic characters). The decision is made by applying the rule:
if the width of the bounding rectangle increases after rotating by +16
degrees but decreases when rotating by -16 degrees, the letter is
italic.
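As referenced in the boldness item above, the following is a minimal sketch of the pen-width measurement; the helper names and the use of numpy are assumptions, but the min-of-runs histogram logic follows the description.

```python
import numpy as np

def run_lengths(line):
    """For each pixel, the length of the black run it belongs to (0 if white)."""
    out = np.zeros(len(line), dtype=int)
    start = None
    for i, v in enumerate(np.append(line, 0)):  # sentinel closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            out[start:i] = i - start
            start = None
    return out

def pen_width(patch):
    """Boldness estimate: the histogram peak of min(horizontal, vertical)
    run length taken over all black pixels of a letter."""
    horiz = np.apply_along_axis(run_lengths, 1, patch)
    vert = np.apply_along_axis(run_lengths, 0, patch)
    mins = np.minimum(horiz, vert)[patch.astype(bool)]
    return int(np.bincount(mins).argmax()) if mins.size else 0
```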
By applying all the above measurements, "Text
Characteristic" values are assigned to each text line in both
images. The images are then compared line by line: costs are computed
for the differences in characteristics between a line in the first image
and a line in the second, and a match is declared between the lines with
the lowest cost. Once lines are matched from the characteristics point
of view, the algorithm finds the y-translation value necessary to match
these lines geometrically.
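A simplified sketch of this matching step is given below. Reducing the characteristics to a plain numeric tuple, the equal cost weights and the selection of a single best pair are all simplifying assumptions; the algorithm above matches lines pairwise by lowest cost.

```python
def line_cost(a, b, weights=(1.0, 1.0, 1.0)):
    """Cost of matching two text lines from their characteristic tuples,
    e.g. (font_size, boldness, italic_ratio); the weights are illustrative."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def y_translation(left_lines, right_lines):
    """Pick the line pair with the lowest characteristic cost and return
    the y-offset that aligns them; each item is (y_position, characteristics)."""
    best = min(((line_cost(lc, rc), ry - ly)
                for ly, lc in left_lines
                for ry, rc in right_lines),
               default=(0.0, 0))
    return best[1]
```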
4. CONCLUSIONS
The approach presented in this paper is based on geometrical
statistics and, due to its nature, provides high-confidence results. In
the case of overlapping input images, geometrical measurements are
obtained and used to search for corresponding elements in the comparison
bands of the two images. When the input images are non-overlapping, the
algorithm searches for matching characteristics of the text, relying on
higher-order geometrical computations. By using the methods presented in
this paper, the task of image merging can be accomplished in most
scenarios, as tests have shown. In both cases, a single output image is
obtained, serving as a better starting point for content extraction
procedures. As a consequence, the digital content resulting from these
images is closer to the physical documents, thus reducing or eliminating
altogether the need for manual corrections.
5. REFERENCES
Baird, H.S. (2003). Digital Libraries and Document Image Analysis,
Proceedings of the Seventh International Conference on Document Analysis
and Recognition, Vol. 1, pp. 2-15, ISBN 0-7695-1960-1, Edinburgh,
Scotland, August 2003
Bovik, A. (2000). Handbook of Image and Video Processing, Academic
Press, ISBN 0-12-119792-1
Di Zenzo, S.; Cinque, L. & Levialdi, S. (1996). Run-Based
Algorithms for Binary Image Analysis and Processing, IEEE Transactions
on Pattern Analysis and Machine Intelligence, Vol. 18, No. 1, January
1996, pp. 83-89, ISSN 0162-8828
Yuan, B. & Tan, C.L. (2003). Skewscope: The Textual Document
Skew Detector, Proceedings of the Seventh International Conference on
Document Analysis and Recognition, Vol. 1, pp. 49-53, ISBN
0-7695-1960-1, Edinburgh, Scotland, August 2003
Zheng, Y.; Li, H. & Doermann, D. (2003). A Model-based Line
Detection Algorithm in Documents, Proceedings of the Seventh
International Conference on Document Analysis and Recognition, Vol. 1,
pp. 44-48, ISBN 0-7695-1960-1, Edinburgh, Scotland, August 2003