Article Information

  • Title: Analyzing Word Error Rate on Optical Character Recognition (OCR) for Myanmar Printed Document Image
  • Authors: Saeedeh SHOJAEE ; Hossein KESHAVARZ ; Mahboobeh SALIMI
  • Journal: International Journal of Computer Trends and Technology
  • Electronic ISSN: 2231-2803
  • Publication year: 2019
  • Volume: 67
  • Issue: 8
  • Pages: 51-57
  • DOI: 10.14445/22312803/IJCTT-V67I8P109
  • Publisher: Seventh Sense Research Group
  • Abstract: Printed documents in Myanmar are written in the Myanmar language, and it is sometimes useful to convert them easily into text documents. This paper describes an effective method for recognizing Myanmar printed document images as editable text and for calculating the resulting error rate. The Myanmar language contains many words, most of which are similar, so the accuracy of an Optical Character Recognition (OCR) system for Myanmar may be low, especially for small fonts. To obtain a more accurate system, the input image is enhanced by removing noise and making some corrections to variants. A method for isolating character images is proposed that uses connected component analysis to handle characters wrongly segmented by projection alone. The paper also proposes a method for obtaining more detail about the actual translation errors in the generated output by using word error rate (WER), based on the neural network classifier used to recognize the character images. We investigate the use of WER for automatic error analysis using a dynamic programming algorithm such as Levenshtein distance over the segmentation. This gives a better overview of the nature of the translation errors. Finally, the proposed algorithms have been tested on a variety of Myanmar printed documents, and the experimental results indicate that the methods can reduce the segmentation error rate as well as the translation error rate.
  • Keywords: Neural Network; OCR; Printed Document; WER
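
The abstract refers to computing word error rate (WER) with a dynamic programming algorithm such as Levenshtein distance. The following is a minimal sketch of that general technique, not the authors' implementation. The function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumption made for illustration: Myanmar text is often written without spaces between words, so a real system would likely need a dedicated word segmenter before this step.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative sketch only: tokens are obtained by whitespace splitting,
    which is an assumption and not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref)
```

For a four-word reference and a hypothesis with one substituted word and one missing word, the edit distance is 2, so the sketch returns 2 / 4 = 0.5. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.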