Local thresholding image binarization using variable-window standard deviation response.
Boiangiu, Costin-Anton ; Olteanu, Alexandra ; Stefanescu, Alexandru Victor et al.
1. INTRODUCTION
In content conversion systems, image binarization aims to
separate pixels into two regions, one representing information
and the other representing background. The result of binarization is
affected by the degradations of poorly preserved documents, as well as by
inappropriate lighting conditions and handling during the acquisition
stage. Since no solution overcomes all of the aforementioned
problems, this remains an open and active area of research.
Because document image binarization is the processing
step at the base of a conversion system, its output must be of high
quality, since it influences all subsequent processing
steps. The algorithm presented in this paper aims to provide a reliable
binarization mechanism.
2. RELATED WORK
Binarization techniques are mostly based on thresholding
methods, which are classified according to the way they operate as
either global (Otsu, 1979) or local (Niblack, 1986), (Gatos et al.,
2006). The major problem with binarization, especially with global
thresholding, is that it does not consider any relation between pixels
besides their intensity. In addition, a single threshold is computed and
used to classify all image pixels into foreground and background. To
overcome these problems, local adaptive methods have been proposed
(Boiangiu et al., 2009). Adaptive methods use local area
information to determine a threshold value for each pixel. However,
minor modifications to the threshold value can bring significant changes
in the output in some cases and only minor differences in others,
depending on the density of objects in the local region.
One of the most popular local threshold binarization methods was
proposed by Niblack (Niblack, 1986). This method involves calculating,
for each image pixel, the mean and the standard deviation of the gray
level values of the neighboring pixels found in a window of a
predefined size. This size influences the quality of the output and
should be small enough to preserve local details and large
enough to suppress noise. The formula for determining the threshold is:
T = mean + k × stdev (1)
where mean is the average value of the pixels in the local area,
stdev is the standard deviation of the same pixels, and k is a constant,
preselected coefficient. Shortcomings of this method include the
persistence of background noise and a significant sensitivity to the
window size. To reduce the amount of background noise in homogeneous
regions larger than the window size, an improved version of
Niblack's method was proposed by Sauvola (Kasar et al., 2007).
This method assumes that the gray values of text pixels are close to 0
and those of background pixels are close to 255. The formula for
determining the threshold is:
T = mean × [1 + k × (stdev/R - 1)] (2)
where the parameters R, the dynamic range of the standard deviation,
and k are fixed to 128 and 0.5, respectively. Sauvola's and
Niblack's algorithms are rigid in their approach because they
compute local statistics over an image-dependent,
fixed window size. The current approach tries to combine the
advantages of local statistics with a variable window size for local
threshold computation.
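As a rough illustration of equations (1) and (2), the sketch below evaluates both thresholds for a single window given as a flat list of gray levels. The function names and the sample windows are hypothetical; the defaults k = -0.2 for Niblack and k = 0.5, R = 128 for Sauvola are the values quoted in this paper, and the population standard deviation is an assumption.

```python
import math

def window_stats(pixels):
    """Return (mean, population standard deviation) of a list of gray levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return mean, math.sqrt(variance)

def niblack_threshold(pixels, k=-0.2):
    """Equation (1): T = mean + k * stdev."""
    mean, stdev = window_stats(pixels)
    return mean + k * stdev

def sauvola_threshold(pixels, k=0.5, R=128.0):
    """Equation (2): T = mean * (1 + k * (stdev / R - 1))."""
    mean, stdev = window_stats(pixels)
    return mean * (1.0 + k * (stdev / R - 1.0))

# Hypothetical windows: one mixing dark and light pixels, one homogeneous.
mixed = [100, 100, 200, 200]   # mean 150, stdev 50
flat = [200, 200, 200, 200]    # mean 200, stdev 0

print(niblack_threshold(mixed), sauvola_threshold(mixed))
print(niblack_threshold(flat), sauvola_threshold(flat))
```

For the homogeneous window the Niblack threshold equals the mean (200), so any pixel slightly darker than its neighbors is classified as text, whereas the Sauvola threshold drops to half the mean (100). This is the background-noise behavior described above.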
3. ALGORITHM DESCRIPTION
The problem with binarization is that there is no mature
algorithm able to deal effectively with all of the problems that can be
found in the wide variety of scanned document images. To address
this, we propose a modification to Niblack's method,
which is based on sliding a square window over the document image and
calculating the mean and standard deviation of the gray pixel values in
each window. However, instead of using a predefined value, the window
size is computed dynamically.
The window is gradually grown until the product of the standard
deviation of the gray levels within the window and the
logarithm of the window size reaches its first local
maximum (see Figure 1). Stopping at the first local maximum is a
compromise between the quality of the output and the speed of conversion.
[FIGURE 1 OMITTED]
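The window-growing criterion can be illustrated on a toy image. The sketch below is not the authors' code and rests on several assumptions the text leaves open: the window is a square centered on the pixel and clipped at the image borders, "window size" is read as the side length 2r + 1, the logarithm is natural, and the standard deviation is the population one. All names are hypothetical. For clarity it computes the whole series of responses and then picks the first local maximum, instead of stopping as soon as the response drops.

```python
import math

def window_score(image, cx, cy, r):
    """Standard deviation of the clipped square window of radius r around
    (cx, cy), multiplied by the natural log of the window side 2r + 1."""
    h, w = len(image), len(image[0])
    values = [image[y][x]
              for y in range(max(0, cy - r), min(h, cy + r + 1))
              for x in range(max(0, cx - r), min(w, cx + r + 1))]
    mean = sum(values) / len(values)
    stdev = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return stdev * math.log(2 * r + 1)

def first_local_maximum(scores):
    """Index of the first score that is followed by a strictly smaller one.
    Falls back to the last index if the series never decreases."""
    for i in range(len(scores) - 1):
        if scores[i + 1] < scores[i]:
            return i
    return len(scores) - 1

# Toy 9x9 image: light background (200) with a dark 3x3 blob (0) in the middle.
image = [[200] * 9 for _ in range(9)]
for y in range(3, 6):
    for x in range(3, 6):
        image[y][x] = 0

scores = [window_score(image, 4, 4, r) for r in range(1, 5)]  # radii 1..4
best = first_local_maximum(scores)
print("selected window side:", 2 * (best + 1) + 1)  # 5 for this toy image
```

For the center pixel the 3x3 window is entirely dark (response 0), the 5x5 window gives the largest response, and the 7x7 window gives a smaller one, so the 5x5 window is selected.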
The window size determined as above is then used for computing the
mean of gray level values of pixels within the current window. The
threshold T for determining if the pixel is converted to black or white
is calculated using:
T = mean × (1 + t) (3)
where t is a preselected coefficient. Because the mean value
depends on the window size, which is in turn determined by the standard
deviation criterion, the direct contribution of the standard deviation
to the threshold value has been removed.
Furthermore, the preselected coefficient t plays a major role. This
coefficient should be chosen negative when the window size imposed by
the standard deviation criterion is small, in order to suppress noise,
and positive when the computed window size is larger, in order to
preserve local details.
The main algorithm steps are:
1. For each image pixel, compute the sum and the squared sum of the
gray levels of pixels contained in a rectangular area defined by the
current pixel and the pixel in the top-left corner;
2. For each image pixel do:
a) grow the window size until the standard deviation of the current
window multiplied by the logarithm of the window size is smaller than
the value computed for the previous window;
b) determine the mean of the current window and then compute the
local threshold using (3);
c) set the pixel to 0 (black) if lower than the obtained threshold,
or to 255 (white), otherwise.
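The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: windows are squares centered on the pixel and clipped at the image borders, "window size" is read as the side length 2r + 1, the logarithm is natural, the window kept is the last one before the response drops (the first local maximum), and the max_radius cap is a hypothetical safeguard for flat regions. All function names are illustrative.

```python
import math

def integral_tables(image):
    """Step 1: summed-area tables of gray levels and of squared gray levels.
    Entry [y][x] holds the sum over rows < y and columns < x, so each table
    carries one extra leading row and column of zeros."""
    h, w = len(image), len(image[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    sq = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            v = image[y][x]
            s[y + 1][x + 1] = v + s[y][x + 1] + s[y + 1][x] - s[y][x]
            sq[y + 1][x + 1] = v * v + sq[y][x + 1] + sq[y + 1][x] - sq[y][x]
    return s, sq

def rect_sum(table, x0, y0, x1, y1):
    """Sum over columns x0..x1-1 and rows y0..y1-1 in constant time."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]

def window_mean_stdev(s, sq, cx, cy, r, w, h):
    """Mean and population stdev of the clipped window of radius r."""
    x0, y0 = max(0, cx - r), max(0, cy - r)
    x1, y1 = min(w, cx + r + 1), min(h, cy + r + 1)
    n = (x1 - x0) * (y1 - y0)
    mean = rect_sum(s, x0, y0, x1, y1) / n
    variance = max(0.0, rect_sum(sq, x0, y0, x1, y1) / n - mean * mean)
    return mean, math.sqrt(variance)

def binarize(image, t=0.0, max_radius=20):
    """Return a binarized copy of a gray-level image (list of rows)."""
    h, w = len(image), len(image[0])
    s, sq = integral_tables(image)
    out = [[255] * w for _ in range(h)]
    for cy in range(h):
        for cx in range(w):
            # Step 2a: grow the window until stdev * log(side) falls below
            # the value obtained for the previous window.
            mean, stdev = window_mean_stdev(s, sq, cx, cy, 1, w, h)
            best_mean, best_score = mean, stdev * math.log(3)
            for r in range(2, max_radius + 1):
                mean, stdev = window_mean_stdev(s, sq, cx, cy, r, w, h)
                score = stdev * math.log(2 * r + 1)
                if score < best_score:
                    break
                best_mean, best_score = mean, score
            # Step 2b: local threshold from equation (3).
            threshold = best_mean * (1.0 + t)
            # Step 2c: black if below the threshold, white otherwise.
            out[cy][cx] = 0 if image[cy][cx] < threshold else 255
    return out
```

With the summed-area tables, the mean and standard deviation of any window cost four table lookups each, so growing the window per pixel does not require rescanning its contents.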
This approach assumes that a neighborhood defined by a rectangular
window of size corresponding to the first local maximum in the series of
standard deviation values mentioned above offers adequate local
statistics so that the most appropriate threshold value for the
window's center pixel can be determined based on the mean gray
level value in the neighborhood window. In this way, the most
appropriate window size is also selected in order to preserve local
properties and to suppress noise at the same time.
4. VISUAL EXAMPLES
The tests have been performed on scanned documents consisting of
various old library documents and newspapers. The majority of document
scans used for testing presented problems such as low and uneven
contrast, noise, inconsistent lighting across the page, etc.
The results obtained using the proposed algorithm were compared with the
results obtained by employing Niblack's original method (15x15
window size and k = -0.2), Sauvola's method (He et al., 2005)
(15x15 window size and k = 0.5) and a global threshold method proposed
by Otsu. For the proposed method the parameter t was set to 0.
[FIGURE 2 OMITTED]
Two test images are used to illustrate the obtained results. The first
image, shown in Figure 2, is representative of historical books
containing Fraktur-style fonts, with heavy background noise, high
text density and difficulties due to uneven exposure during
document acquisition. Compared with the other methods used for
testing, the proposed approach correctly classified the object
pixels without misclassifying noise pixels, giving the best result,
very similar to the one produced by Sauvola's method.
In the second test,
presented in Figure 3, the proposed method produced by far the best
results, even in comparison to Sauvola's. This case is
representative of historical newspapers with large areas of uneven
contrast and lighting, Antiqua-style fonts, medium to low text density
and blurry textual regions.
[FIGURE 3 OMITTED]
5. CONCLUSIONS AND FUTURE WORK
The proposed method is useful in content conversion systems,
mainly due to its resistance to noise and imperfections, which allows
subsequent stages to benefit from a clean object/background
classification of image data. The comparison between the results
obtained using well-known image binarization algorithms, both local and
global, and those of the proposed algorithm has shown that the latter
produced the best results on the tested documents. Our future work will
focus on improving the results in the case of large compact noise zones.
6. ACKNOWLEDGMENT
The research presented in this paper is supported by the national
project "Excelenta in cercetare prin programe postdoctorale in
domenii prioritare ale societatii bazate pe cunoastere (EXCEL)",
Project POSDRU/89/1.5/S/62557.
7. REFERENCES
C.-A. Boiangiu, A. I. Dvornic, and D. C. Cananau. Binarization for
digitization projects using hybrid foreground-reconstruction.
Proceedings of the IEEE 5th International Conference on Intelligent
Computer Communication and Processing, Cluj-Napoca, Romania, August
27-29, 2009, pp. 141-144.
B. Gatos, I. Pratikakis, and S. J. Perantonis. Adaptive degraded
document image binarization. New York, NY, USA: Elsevier Science Inc.,
2006, vol. 39, no. 3, pp. 317-327.
J. He, Q. D. M. Do, A. C. Downton, and J. H. Kim. A comparison of
binarization methods for historical archive documents. Proceedings of
the Eighth International Conference on Document Analysis and
Recognition. Washington, DC, USA: IEEE Computer Society, 2005, pp.
538-542.
T. Kasar, J. Kumar, and A. G. Ramakrishnan. Font and background
color independent text binarization. Second International Workshop on
Camera-Based Document Analysis and Recognition, Bangalore, India, 2007.
W. Niblack. An introduction to digital image processing.
Englewood Cliffs, NJ: Prentice-Hall, 1986, pp. 115-116.
N. Otsu. A threshold selection method from gray-level
histograms. IEEE Transactions on Systems, Man and Cybernetics, 1979,
vol. 9, pp. 62-66. ISSN 0018-9472.