
Article Information

  • Title: Ontologies and Bigram-based approach for Isolated Non-word Errors Correction in OCR System
  • Authors: Aicha Eutamene; Mohamed Khireddine Kholladi; Hacene Belhadef
  • Journal: International Journal of Electrical and Computer Engineering
  • Electronic ISSN: 2088-8708
  • Year: 2015
  • Volume: 5
  • Issue: 6
  • Pages: 1458-1467
  • DOI: 10.11591/ijece.v5i6.pp1458-1467
  • Language: English
  • Publisher: Institute of Advanced Engineering and Science (IAES)
  • Abstract: In this paper, we describe a new approach to the post-processing step of an OCR system. The approach relies on a new spelling-correction method that automatically corrects misspelled words produced by the character-recognition step on scanned documents. It combines ontologies and bigram coding to build a robust system that automatically overcomes the shortcomings of classical approaches. The proposed hybrid method proceeds in two stages: first, character recognition using an ontological model; second, word recognition using a spelling-correction approach based on bigram codification to detect and correct errors. Spelling errors are broadly classified into two categories: non-word errors and real-word errors. This paper addresses only the detection and correction of non-word errors, because this is the only type of error handled by an OCR system. In addition, using an online external resource such as WordNet proves necessary to improve the system's performance.
  • Keywords: Information System; Computer and Informatics; Bigram; Ontology; OCR; Spelling Correction; WordNet; Word Recognition
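The abstract does not spell out how the bigram codification works. The sketch below is only a rough illustration that assumes one common reading. Under that reading, a token absent from the lexicon is flagged as a non-word error. Lexicon entries are then ranked by the Dice coefficient of their character-bigram multisets. The toy lexicon, the `#` word-boundary padding, the Dice measure, and all function names are hypothetical choices, not the authors' method. The ontological character-recognition stage and the WordNet lookup are not modelled.

```python
from collections import Counter


def char_bigrams(word: str) -> Counter:
    """Multiset of character bigrams; '#' padding (an assumption) marks word boundaries."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def dice(a: Counter, b: Counter) -> float:
    """Dice coefficient between two bigram multisets (1.0 means identical)."""
    shared = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2.0 * shared / total if total else 0.0


def is_non_word(token: str, lexicon: set[str]) -> bool:
    """A non-word error is a token that does not appear in the lexicon."""
    return token.lower() not in lexicon


def correct(token: str, lexicon: set[str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank lexicon entries by bigram similarity to a detected non-word.

    Valid words are returned unchanged with score 1.0.
    """
    if not is_non_word(token, lexicon):
        return [(token.lower(), 1.0)]
    grams = char_bigrams(token)
    scored = [(w, dice(grams, char_bigrams(w))) for w in lexicon]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical toy lexicon; a real system would use a full dictionary.
    LEXICON = {"recognition", "character", "ontology", "document", "scanned"}
    for token in ["charactcr", "documcnt", "ontology"]:
        print(token, "->", correct(token, LEXICON, top_k=1))
```

The OCR-style confusion of "e" with "c" in "charactcr" leaves 8 of the 10 padded bigrams intact, so the Dice score against "character" is 2 × 8 / 20 = 0.8. In practice the toy set would be replaced by a full word list, or by WordNet lemmas, in line with the abstract's remark about external resources. Dice is just one of several possible bigram similarity measures.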
© National Center for Philosophy and Social Sciences Documentation. All rights reserved.