Journal: International Journal of Image Processing (IJIP)
Electronic ISSN: 1985-2304
Year: 2010
Volume: 4
Issue: 2
Pages: 175-191
Publisher: Computer Science Journals
Abstract: In most of our official papers and school textbooks, English words are interspersed within text written in Indian languages. There is therefore a need for an Optical Character Recognition (OCR) system that can recognize such bilingual documents and store them for future use. In this paper we present an OCR system for printed documents that contain an Indian script, Oriya, together with the Roman script. To do this, the different scripts must be separated before being fed to their respective OCR systems. First, skew is corrected, and the document is then segmented. We propose line-wise script differentiation. The method relies on the upper and lower matras, which occur in Oriya but are absent in English. Horizontal histograms are used to distinguish lines belonging to different scripts. After separation, the lines of each script are sent to their respective recognition engines.
Keywords: Script separation; Indian script; Bilingual (English-Oriya) OCR; Horizontal profiles
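The horizontal-histogram idea in the abstract can be illustrated with a minimal Python sketch. It assumes a deskewed binary page stored as a list of rows (1 = ink, 0 = background). It computes the horizontal projection profile, splits the page into text lines at empty rows, and measures how much of each line's ink falls outside its dense core band, where Oriya upper and lower matras would appear. The core-band rule, `core_frac = 0.5` and `threshold = 0.2` are hypothetical choices made for illustration, not the paper's actual decision rule. English ascenders and descenders also place ink outside the core band, so a practical separator would need tuned or learned thresholds.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink pixel, 0 = background


def horizontal_profile(img: Image) -> List[int]:
    """Horizontal projection histogram: number of ink pixels in each row."""
    return [sum(row) for row in img]


def segment_lines(profile: List[int]) -> List[Tuple[int, int]]:
    """Return text lines as maximal runs of non-empty rows, (start, end) with end exclusive."""
    lines: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y  # first inked row of a new line
        elif count == 0 and start is not None:
            lines.append((start, y))  # an empty row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line touching the bottom edge
    return lines


def zone_ink_ratio(profile: List[int], start: int, end: int, core_frac: float = 0.5) -> float:
    """Fraction of a line's ink lying above or below its core band.

    Hypothetical feature: rows with at least core_frac * peak ink define the core
    band; ink outside it is attributed to upper/lower marks such as matras.
    """
    rows = profile[start:end]
    peak = max(rows)
    core = [i for i, c in enumerate(rows) if c >= core_frac * peak]
    top, bottom = core[0], core[-1]
    outside = sum(rows[:top]) + sum(rows[bottom + 1:])
    return outside / sum(rows)


def classify_line(profile: List[int], start: int, end: int, threshold: float = 0.2) -> str:
    """Label a line 'Oriya' if enough ink lies outside its core band (hypothetical threshold)."""
    return "Oriya" if zone_ink_ratio(profile, start, end) > threshold else "English"
```

On a page with one plain two-row line followed by a line carrying sparse marks above and below its body, the first line has no ink outside its core band and is labelled English. The second has a quarter of its ink outside the core band and is labelled Oriya.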