Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Online ISSN: 1817-3195
Publication year: 2013
Volume: 52
Issue: 2
Publisher: Journal of Theoretical and Applied
Abstract: An Optical Character Recognition (OCR) system aims to convert an optically scanned text image into machine-editable text. Multiple approaches to pre-processing and segmentation exist for various scripts; however, only a limited combination of them has been tried on Devanagari script. This paper presents a study that explores an alternative, efficient strategy for pre-processing and segmentation in OCR of Devanagari script. The efficiency of the proposed alternative was evaluated on documents with varying degrees of noise severity and border artifacts. The experimental results confirm that the proposed approach is superior to other conventional methodologies for implementing OCR for Devanagari script. The paper also describes in detail the conventional pre-processing performed in the initial stage of OCR, including noise-removal techniques, along with other conventional approaches to segmentation. The proposed alternative has been applied down to the character and top-character segmentation levels.
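The abstract refers to conventional noise removal and to segmentation down to the character and top-character levels. As a rough illustration of what such a conventional pipeline often looks like (not the alternative proposed in the paper), the minimal sketch below assumes a binarized image given as a list of 0/1 rows. It removes isolated speckle pixels, splits text lines with a horizontal projection profile, treats the densest row of each line as the header line (shirorekha), and splits the core zone below it and the top zone above it into column spans with vertical projections. All function names are hypothetical. The one-row header and the zero-gap split rule are simplifying assumptions.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # hypothetical binarized image: 1 = ink, 0 = background
Span = Tuple[int, int]   # half-open index range [start, end)


def remove_isolated_pixels(img: Image) -> Image:
    """Clear ink pixels with no ink among their 8 neighbours (speckle noise)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] and not any(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if (yy, xx) != (y, x)
            ):
                out[y][x] = 0
    return out


def runs(profile: List[int]) -> List[Span]:
    """Return the ranges where a projection profile is non-zero."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Image]:
    """Split a page into text lines at rows whose horizontal projection is zero."""
    profile = [sum(row) for row in img]
    return [img[a:b] for a, b in runs(profile)]


def segment_zones(line: Image) -> Tuple[int, List[Span], List[Span]]:
    """Take the densest row as the header line (shirorekha), then split the
    core zone (below it) and the top zone (above it) by vertical projection."""
    header = max(range(len(line)), key=lambda r: sum(line[r]))
    width = len(line[0])
    core = [sum(line[r][c] for r in range(header + 1, len(line))) for c in range(width)]
    top = [sum(line[r][c] for r in range(header)) for c in range(width)]
    return header, runs(core), runs(top)


if __name__ == "__main__":
    line = [
        [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # mark above the header (top zone)
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # header line joining the characters
        [1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
        [1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    ]
    speck = [0] * 11
    speck[5] = 1  # one isolated noise pixel between the two lines
    page = line + [[0] * 11, speck, [0] * 11] + line
    print(len(segment_lines(page)))  # 3: the speck is mistaken for a line
    lines = segment_lines(remove_isolated_pixels(page))
    print(len(lines))  # 2
    print(segment_zones(lines[0]))  # (1, [(0, 3), (5, 8)], [(8, 9)])
```

The toy run shows why noise removal precedes segmentation: a single speck becomes a spurious "line" under a zero-gap projection rule. Real scans would typically need a header band several rows thick and tolerance thresholds on the projection gaps rather than exact zeros.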