期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN:2158-107X
电子版ISSN:2156-5570
出版年度:2019
卷号:10
期号:12
DOI:10.14569/IJACSA.2019.0101271
出版社:Science and Information Society (SAI)
摘要:Line segmentation is a critical phase of the Optical Character Recognition (OCR) which separates the individual lines from the image documents. The accuracy rate of the OCR tool is directly proportional to the line segmentation accuracy followed by the word/character segmentation. In this context, an algorithm, named height_based_segmentation is proposed for the text line segmentation of printed Odia documents. The proposed algorithm finds the average height of a text line and it helps to minimize the overlapped text line cases. The algorithm also includes post-processing steps to combine the modifier zone with the base zone. The performance of the algorithm is evaluated through the ground truth and also by comparing it with the existing segmentation approaches.
关键词:Document image analysis; line segmentation; word segmentation; database creation; printed Odia document