
Article Information

  • Title: A Comparative Analysis of Classifiers Accuracies for Bilingual Printed Documents (Oriya-English)
  • Authors: Sanghamitra Mohanty; Himadri Nandini Das Bebartta
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year of publication: 2011
  • Volume: 2
  • Issue: 2
  • Pages: 916-923
  • Publisher: TechScience Publications
  • Abstract: Bilingual document recognition has been the subject of intensive research; our focus is on recognizing Oriya-English bilingual documents. In most official papers and school textbooks, English words are interspersed within text in Indian languages, so there is a need for an Optical Character Recognition (OCR) system that can recognize such bilingual documents and store them for future use. In this paper we present an OCR system developed to recognize printed documents in an Indian script, Oriya, and in Roman script. For this purpose, the scripts must be separated before being fed to their individual OCR systems. First the skew is corrected, and then the page is segmented. We propose to differentiate the scripts line by line, relying on the upper and lower matras that are associated with Oriya and absent in English. A horizontal histogram is used to distinguish lines belonging to different scripts. After separation, the scripts are sent to their individual recognition engines. Recognizing the scripts in a document page image is of primary importance for a system that processes bilingual documents. We had earlier communicated a paper that used a single classifier; here three classifiers, k-nearest neighbor (KNN), convolutional neural networks (CNN), and Support Vector Machines (SVM), are compared on recognition accuracy. SVM was observed to outperform the other classifiers.
  • Keywords: Script separation; Indian script; Bilingual (English-Oriya) OCR; Horizontal profiles; nearest neighbour.
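
The abstract mentions using a horizontal histogram to pick out text lines before script separation. As a rough illustration only, the line-finding step might look like the sketch below. It assumes a binarized page stored as a list of rows with 1 for ink and 0 for background; the names `horizontal_profile`, `segment_lines`, and the `min_ink` threshold are hypothetical and do not come from the paper, and the matra-based test that tells Oriya lines from English ones is not shown because the abstract does not specify it.

```python
def horizontal_profile(image):
    """Count ink pixels (value 1) in each row of a binarized page image."""
    return [sum(row) for row in image]


def segment_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    A text line is taken to be a maximal run of consecutive rows whose
    ink count is at least min_ink (an assumed threshold, not from the paper).
    """
    lines = []
    start = None
    for y, count in enumerate(horizontal_profile(image)):
        if count >= min_ink and start is None:
            start = y                      # first inked row of a new line
        elif count < min_ink and start is not None:
            lines.append((start, y))       # blank row closes the current line
            start = None
    if start is not None:
        lines.append((start, len(image)))  # line runs to the bottom edge
    return lines


# Toy 5x4 "page": two text lines separated by one blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

Each returned row range could then be cropped and examined separately, for example by looking at how much ink falls in the upper and lower zones of the line, which is where the abstract says Oriya matras appear.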