Article Information

  • Title: A Zone Classification Approach for Arabic Documents using Hybrid Features
  • Authors: Amany M. Hesham; Sherif Abdou; Amr Badr
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Electronic ISSN: 2156-5570
  • Year of publication: 2016
  • Volume: 7
  • Issue: 7
  • DOI: 10.14569/IJACSA.2016.070722
  • Publisher: Science and Information Society (SAI)
  • Abstract: Zone segmentation and classification are important steps in document layout analysis. A scanned document is decomposed into zones, and each zone is classified as text or non-text so that only text zones are passed to a recognition engine. This eliminates the garbage output produced when non-text zones are sent to the engine. This paper proposes a framework for zone segmentation and classification. Zones are segmented using morphological operations and connected component analysis. Features are then extracted from each zone to classify it as text or non-text. The features are a hybrid of texture-based and connected-component-based features. Effective features are selected using a genetic algorithm, and the selected features are fed into a linear SVM classifier for zone classification. System evaluation shows that the proposed zone classification works well on multi-font and multi-size documents with a variety of layouts, even on historical documents.
  • Keywords: segmentation; layout analysis; texture features; connected component analysis; Arabic script; genetic algorithms
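
The segmentation step described in the abstract (a morphological operation followed by connected component analysis) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the square structuring element, the `radius` parameter, the function names, and the two toy features (`ink_density`, `component_count`) are assumptions chosen for illustration. The texture features, the genetic-algorithm feature selection, and the linear SVM from the paper are not shown.

```python
from collections import deque

def dilate(img, radius=1):
    """Binary dilation with a (2*radius+1) square structuring element (assumed shape)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out

def connected_components(img):
    """Bounding boxes (top, left, bottom, right), inclusive, of 8-connected regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill over one region
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

def segment_zones(img, radius=1):
    """Merge nearby ink by dilation, then take each merged region as one zone."""
    return connected_components(dilate(img, radius))

def zone_features(img, box):
    """Two illustrative connected-component-style features for one zone."""
    top, left, bottom, right = box
    crop = [row[left:right + 1] for row in img[top:bottom + 1]]
    area = len(crop) * len(crop[0])
    ink = sum(map(sum, crop))
    return {
        "ink_density": ink / area,
        "component_count": len(connected_components(crop)),
    }

# Toy binary page: scattered small marks on the left, a solid block on the right.
page = [
    [1, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
]
zones = segment_zones(page, radius=1)
features = [zone_features(page, z) for z in zones]
```

On the toy page, the four isolated marks on the left merge into one zone with many small components, while the solid block on the right forms a second zone with a single component. In a full pipeline, feature vectors of this kind would be passed to a classifier, which the abstract states is a linear SVM.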