Article Information

  • Title: Optical Character Recognition of Amharic Documents
  • Authors: Million Meshesha; C V Jawahar; International Institute of Information Technology, Hyderabad, India
  • Journal: African Journal of Information & Communication Technology
  • Print ISSN: 1449-2679
  • Publication year: 2007
  • Volume: 3
  • Issue: 2
  • DOI: 10.5130/ajict.v3i2.543
  • Language: English
  • Publisher: University of Technology, Sydney
  • Abstract: Around 2,500 languages are spoken in Africa, and some of them have their own indigenous scripts. Accordingly, a large body of printed documents is available in libraries, information centers, museums and offices. Digitizing these documents makes it possible to apply existing information technologies to local information needs and development. This paper presents an Optical Character Recognition (OCR) system for converting digitized documents in local languages. An extensive literature survey reveals that this is the first attempt to report the challenges in recognizing indigenous African scripts and a possible solution for the Amharic script. Research on the recognition of African indigenous scripts faces major challenges due to (i) the large number of characters used in writing and (ii) the existence of a large set of visually similar characters. In this paper, we propose a novel feature extraction scheme using principal component analysis and linear discriminant analysis, followed by a decision directed acyclic graph (DDAG) based support vector machine classifier. Recognition results are presented on real-life degraded documents such as books, magazines and newspapers to demonstrate the performance of the recognizer.
  • Keywords: Optical Character Recognition; African Scripts; Feature Extraction; Classification; Amharic Documents
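
The abstract mentions a decision directed acyclic graph (DDAG) classifier built from pairwise classifiers. The sketch below illustrates only the DDAG evaluation order: the candidate list starts with all classes, each node compares the first and last remaining class, and the loser is eliminated until one class is left. Everything here is an assumption for illustration and not the authors' implementation: the names `train_centroids`, `pairwise_decide` and `ddag_predict` are hypothetical, the toy 2-D vectors stand in for the PCA/LDA features, and a nearest-class-mean rule stands in for each trained binary support vector machine.

```python
def train_centroids(samples):
    """Compute one mean vector per class.

    samples: dict mapping class label -> list of equal-length feature vectors.
    Stand-in for training one binary SVM per class pair (assumption).
    """
    return {
        label: [sum(col) / len(col) for col in zip(*vectors)]
        for label, vectors in samples.items()
    }


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise_decide(centroids, a, b, x):
    """Hypothetical binary decision between classes a and b.

    A real system would call the SVM trained on (a, b); here the class
    whose mean is nearer to x wins.
    """
    return a if sq_dist(x, centroids[a]) <= sq_dist(x, centroids[b]) else b


def ddag_predict(centroids, x):
    """Classify x by walking the DDAG: N classes need N - 1 pairwise tests."""
    candidates = sorted(centroids)
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if pairwise_decide(centroids, first, last, x) == first:
            candidates.pop()    # 'last' lost, so eliminate it
        else:
            candidates.pop(0)   # 'first' lost, so eliminate it
    return candidates[0]


# Toy data: three made-up classes with 2-D features (illustrative only).
toy_samples = {
    "A": [[0.0, 0.0], [0.2, 0.1]],
    "B": [[5.0, 5.0], [5.2, 4.9]],
    "C": [[0.0, 5.0], [0.1, 5.2]],
}
toy_centroids = train_centroids(toy_samples)
print(ddag_predict(toy_centroids, [0.1, 4.8]))
```

With many classes, as in a script with a large character set, this structure needs only N - 1 pairwise evaluations per test sample even though N(N - 1)/2 pairwise classifiers are trained.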