
Basic Article Information

  • Title: QTID: Quran Text Image Dataset
  • Authors: Mahmoud Badry; Hesham Hassan; Hanaa Bayomi
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Electronic ISSN: 2156-5570
  • Publication year: 2018
  • Volume: 9
  • Issue: 3
  • DOI: 10.14569/IJACSA.2018.090351
  • Publisher: Science and Information Society (SAI)
  • Abstract: Improving the accuracy of Arabic text recognition in imagery requires a large, modern dataset, since data is the fuel for many modern machine learning models. This paper proposes a new dataset called QTID (Quran Text Image Dataset), the first Arabic dataset that includes Arabic marks. It consists of 309,720 different 192x64 annotated Arabic word images, taken from the Holy Quran, that contain 2,494,428 characters in total. These finely annotated images were randomly divided into training, validation, and test sets of 90%, 5%, and 5%, respectively. To analyze QTID, several dataset statistics are presented. Experimental evaluation shows that the current best Arabic text recognition engines, such as Tesseract and ABBYY FineReader, do not perform well on word images from the proposed dataset.
  • Keywords: HDF5 dataset; Arabic script; Holy Quran text image; handwritten text recognition; Arabic OCR; text image datasets
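
The split sizes implied by the abstract can be checked with a short calculation. The following is a minimal sketch, not the authors' procedure: the image and character counts come from the abstract, while the function name `split_indices`, the seed value, and the shuffle-then-slice approach are assumptions made for illustration. This record does not say how the random division was actually performed.

```python
import random

TOTAL_IMAGES = 309_720        # word images, as stated in the abstract
TOTAL_CHARACTERS = 2_494_428  # characters, as stated in the abstract


def split_indices(n, train_pct=90, val_pct=5, seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Hypothetical helper: the seed and slicing scheme are assumptions,
    not details taken from the paper.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # whatever remains goes to the test set
    return train, val, test


train, val, test = split_indices(TOTAL_IMAGES)
print(len(train), len(val), len(test))             # 278748 15486 15486
print(round(TOTAL_CHARACTERS / TOTAL_IMAGES, 2))   # 8.05 characters per image on average
```

Because 309,720 is divisible by 20, the 90/5/5 split yields whole numbers with no rounding remainder.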