Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2019
Volume: 10
Issue: 12
Pages: 537-541
Publisher: Science and Information Society (SAI)
Abstract: Line segmentation is a critical phase of Optical
Character Recognition (OCR) that separates the individual text
lines in a document image. The accuracy of an OCR tool depends
directly on the accuracy of line segmentation, and then on that
of word/character segmentation. In this context, an algorithm
named height_based_segmentation is proposed for text line
segmentation of printed Odia documents. The proposed algorithm
computes the average height of a text line, which helps to
minimize cases of overlapped text lines. The algorithm also
includes post-processing steps that combine the modifier zone
with the base zone. Its performance is evaluated against the
ground truth and by comparison with existing segmentation
approaches.
Keywords: Document image analysis; line segmentation; word
segmentation; database creation; printed Odia document
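
The abstract gives the idea of the approach but not its details, so the following is only a minimal sketch of one way a height-based line segmenter could work, not the paper's height_based_segmentation algorithm. It assumes a binarized image given as a list of rows with 1 for ink and 0 for background, and it uses a horizontal projection (rows with or without ink) to find candidate bands. The function names, the thresholds split_ratio and merge_ratio, the equal-parts split of tall bands, and the nearest-neighbour rule for attaching short bands are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

Band = Tuple[int, int]  # (first_row, last_row_exclusive)


def row_runs(image: List[List[int]]) -> List[Band]:
    """Return maximal runs of consecutive rows that contain at least one ink pixel."""
    runs: List[Band] = []
    start: Optional[int] = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            runs.append((start, y))
            start = None
    if start is not None:
        runs.append((start, len(image)))
    return runs


def segment_lines(image: List[List[int]],
                  split_ratio: float = 1.5,
                  merge_ratio: float = 0.5) -> List[Band]:
    """Sketch of height-based text line segmentation (thresholds are assumptions)."""
    runs = row_runs(image)
    if not runs:
        return []

    # Average band height, used as the estimate of a typical text line height.
    avg = sum(e - s for s, e in runs) / len(runs)

    # Step 1: a band much taller than average is assumed to hold overlapped
    # lines and is cut into roughly average-height parts.
    bands: List[Band] = []
    for s, e in runs:
        h = e - s
        n = max(1, round(h / avg)) if h > split_ratio * avg else 1
        step = h / n
        for i in range(n):
            bands.append((s + round(i * step), s + round((i + 1) * step)))

    # Step 2 (post-processing): a band much shorter than average is assumed to
    # be a modifier zone and is attached to the nearest neighbouring band.
    lines: List[Band] = []
    pending: Optional[int] = None  # start row of a short band to merge downward
    for i, (s, e) in enumerate(bands):
        if pending is not None:
            lines.append((pending, e))  # absorb the short band above this one
            pending = None
            continue
        if (e - s) >= merge_ratio * avg:
            lines.append((s, e))
            continue
        gap_up = s - lines[-1][1] if lines else None
        gap_down = bands[i + 1][0] - e if i + 1 < len(bands) else None
        if gap_up is not None and (gap_down is None or gap_up <= gap_down):
            lines[-1] = (lines[-1][0], e)  # merge into the line above
        elif gap_down is not None:
            pending = s                    # merge into the line below
        else:
            lines.append((s, e))           # isolated short band: keep as is
    return lines
```

Calling segment_lines(image) returns a list of (first_row, last_row_exclusive) pairs, one per detected text line. A plain mean is used for the height estimate because the abstract speaks of the average height; a median would be less sensitive to modifier zones and merged lines, and real Odia pages would likely need noise filtering before the projection step.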