
Article information

  • Title: An efficient, font independent word and character segmentation algorithm for printed Arabic text
  • Authors: Aziz Qaroush; Bassam Jaber; Khader Mohammad
  • Journal: Journal of King Saud University - Computer and Information Sciences
  • Print ISSN: 1319-1578
  • Publication year: 2022
  • Volume: 34
  • Issue: 1
  • Pages: 1330-1344
  • Language: English
  • Publisher: Elsevier
  • Abstract: Character segmentation is a necessary and the most critical stage in an Arabic OCR system, and it has attracted the interest of a wide range of researchers. However, the cursive nature of Arabic script poses extra challenges that need further investigation, so a reliable and efficient Arabic OCR system that is independent of font variations is highly desirable. This paper investigates an indirect, font-independent word and character segmentation algorithm for printed Arabic text. The proposed algorithm takes a binary line image as input and produces a set of binary images, each containing one character or ligature, as output. Segmentation is performed at two levels. At the first level, word segmentation applies a vertical projection to the input line image and uses the interquartile range (IQR) method to differentiate gaps between words from gaps within words. At the second level, a projection profile method is used together with a set of font-independent statistical and topological features to identify the correct segmentation points among all candidate points. The algorithm was tested on the APTI dataset with a variety of font types, sizes, and styles. On 1800 lines (approximately 24,816 words), it achieved an average accuracy of 97.7% for word segmentation and 97.51% for character segmentation.
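
The first segmentation level described in the abstract (vertical projection plus an IQR test on gap widths) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that ink pixels are nonzero in the binary line image and that a gap counts as a word gap when its width exceeds Q3 + 1.5 × IQR (Tukey's usual outlier fence); the abstract does not state the exact rule. The function names `find_gaps`, `word_gaps`, and `split_words` and the parameter `k` are hypothetical.

```python
import numpy as np


def find_gaps(line_img):
    """Return (start, end) column ranges, end exclusive, that contain no ink.

    Only gaps lying between the first and last ink columns are reported.
    Assumption: ink pixels are nonzero, background pixels are zero.
    """
    profile = np.asarray(line_img).sum(axis=0)  # vertical projection profile
    ink_cols = np.flatnonzero(profile > 0)
    gaps = []
    # Two consecutive ink columns more than one apart enclose an empty run.
    for a, b in zip(ink_cols[:-1], ink_cols[1:]):
        if b - a > 1:
            gaps.append((int(a) + 1, int(b)))
    return gaps


def word_gaps(gaps, k=1.5):
    """Keep the gaps whose width is an upper outlier by the IQR rule.

    Assumption: threshold = Q3 + k * IQR with k = 1.5 (Tukey's fence);
    the paper may use a different cut-off.
    """
    widths = np.array([end - start for start, end in gaps], dtype=float)
    if widths.size == 0:
        return []
    q1, q3 = np.percentile(widths, [25, 75])
    threshold = q3 + k * (q3 - q1)
    return [g for g, w in zip(gaps, widths) if w > threshold]


def split_words(line_img, k=1.5):
    """Return (start, end) column ranges of words, in image column order.

    Arabic reads right to left, so the caller may want to reverse the list.
    """
    ink_cols = np.flatnonzero(np.asarray(line_img).sum(axis=0) > 0)
    if ink_cols.size == 0:
        return []
    words = []
    start = int(ink_cols[0])
    for gap_start, gap_end in word_gaps(find_gaps(line_img), k):
        words.append((start, gap_start))
        start = gap_end
    words.append((start, int(ink_cols[-1]) + 1))
    return words
```

Each returned range can be used to crop the line image, for example `line_img[:, start:end]`, before the second-level character segmentation. An outlier test of this kind needs a reasonable number of gaps per line and some variation in their widths; a line with only a handful of gaps may require a fallback rule, which this sketch does not include.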