Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2022
Volume: 13
Issue: 4
DOI: 10.14569/IJACSA.2022.0130447
Language: English
Publisher: Science and Information Society (SAI)
Abstract: Character segmentation in unconstrained Arabic handwriting is a complex and challenging task because words and letters frequently overlap or touch. These issues have not been widely investigated in the literature. Addressing them at the segmentation stage reduces segmentation errors, which plays a significant role in improving the accuracy of Arabic optical character recognition. This paper therefore proposes a hybrid approach to improve the segmentation accuracy of interconnected, overlapping, or touching characters. The proposed method consists of several stages. First, extraneous shapes such as signatures are removed from the document. Next, individual words are detected and extracted directly from the document using morphological operations, connected-component analysis, and bounding-box detection. Finally, touching characters are segmented based on background thinning and a computational analysis of each word's region. The proposed method was tested on the KHATT and IFN/ENIT databases and on our own collected dataset. The experimental results showed that the proposed method achieved high performance and improved accuracy compared with other methods.
Keywords: Arabic handwritten character recognition; connected components; word segmentation; character segmentation; morphological operators; overlapping and touching characters
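The word-extraction stage summarized in the abstract (morphological operations, connected components, bounding-box detection) can be illustrated with a minimal sketch. It assumes the page has already been binarized into a list of rows with 1 for ink and 0 for background. A square dilation merges nearby letters and dots of the same word into one 8-connected component, and the function returns tight ink bounding boxes ordered right to left, as Arabic is read. The function names, the `radius` and `min_pixels` parameters, and the right-to-left ordering are hypothetical choices for illustration, not the authors' implementation; signature removal and background-thinning character segmentation are not covered.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]          # binarized page: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def dilate(img: Image, radius: int = 1) -> Image:
    """Binary dilation with a square (2*radius+1) x (2*radius+1) structuring element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out


def connected_components(img: Image) -> List[List[Tuple[int, int]]]:
    """Group 8-connected foreground pixels with a breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                pixels = []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def bounding_box(pixels: List[Tuple[int, int]]) -> Box:
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return (min(ys), min(xs), max(ys), max(xs))


def extract_word_boxes(img: Image, radius: int = 1, min_pixels: int = 1) -> List[Box]:
    """Hypothetical word extractor: dilate, label components, box the original ink."""
    merged = dilate(img, radius)
    boxes = []
    for comp in connected_components(merged):
        # Keep only original ink pixels so the box is not inflated by dilation.
        ink = [(y, x) for (y, x) in comp if img[y][x]]
        if len(ink) >= min_pixels:  # assumption: tiny specks are treated as noise
            boxes.append(bounding_box(ink))
    # Assumption: order words right to left by their right edge.
    boxes.sort(key=lambda b: b[3], reverse=True)
    return boxes
```

Larger `radius` values merge strokes more aggressively, so in practice the structuring element would have to be tuned to the gap between words versus the gap between letters of the same word.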