Article Information

  • Title: Text segmentation in degraded historical document images
  • Authors: A.S. Kavitha ; P. Shivakumara ; G.H. Kumar
  • Journal: Egyptian Informatics Journal
  • Print ISSN: 1110-8665
  • Year: 2016
  • Volume: 17
  • Issue: 2
  • Pages: 189-197
  • DOI: 10.1016/j.eij.2015.11.003
  • Publisher: Elsevier
  • Abstract: Text segmentation from degraded historical Indus script images helps an Optical Character Recognizer (OCR) achieve good recognition rates for Indus scripts; however, it is challenging due to the complex backgrounds in such images. In this paper, we present a new method for segmenting text and non-text in Indus documents based on the fact that text components are less cursive than non-text ones. To achieve this, we propose a new combination of Sobel and Laplacian operations for enhancing degraded low-contrast pixels. The proposed method then generates skeletons for text components in the enhanced images to reduce the computational burden, which in turn helps in studying component structures efficiently. We propose to study the cursiveness of components based on branch information in order to remove false text components. The proposed method introduces a nearest neighbor criterion for grouping components in the same line, which results in clusters. Furthermore, the proposed method classifies these clusters into text and non-text clusters based on characteristics of text components. We evaluate the proposed method on a large dataset containing a variety of images. The results are compared with existing methods to show that the proposed method is effective in terms of recall and precision.
  • Keywords: Text enhancement ; Sobel and Laplacian operations ; Indus document ; Clustering ; Text line segmentation
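
The enhancement step named in the abstract (combining Sobel and Laplacian responses to lift low-contrast pixels) can be illustrated with a minimal sketch. The abstract does not say how the two operators are combined, so the rule used here (sum of the normalized Sobel gradient magnitude and the normalized absolute Laplacian, then renormalized) is an assumption and not the authors' formulation. The function names `filter3x3`, `normalize`, and `enhance` are hypothetical, and the image is a plain list of lists so that the example needs no external libraries.

```python
from typing import List

Image = List[List[float]]

# Standard 3x3 kernels for first-order (Sobel) and second-order (Laplacian) derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def filter3x3(img: Image, kernel: List[List[int]]) -> Image:
    """Apply a 3x3 kernel (cross-correlation) with replicated borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def normalize(img: Image) -> Image:
    """Scale values linearly into [0, 1]; a constant image maps to all zeros."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [[0.0] * len(row) for row in img]
    return [[(v - lo) / (hi - lo) for v in row] for row in img]


def enhance(img: Image) -> Image:
    """Combine Sobel and Laplacian responses into one enhancement map."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    lap = filter3x3(img, LAPLACIAN)
    sobel_mag = normalize(
        [[(a * a + b * b) ** 0.5 for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]
    )
    lap_abs = normalize([[abs(v) for v in row] for row in lap])
    # ASSUMPTION: element-wise sum of the two normalized maps, then renormalize.
    # The abstract does not specify the actual combination rule.
    return normalize(
        [[s + l for s, l in zip(rs, rl)] for rs, rl in zip(sobel_mag, lap_abs)]
    )


# A low-contrast vertical stroke boundary: gray level 100 on the left, 110 on the right.
image = [[100.0, 100.0, 100.0, 110.0, 110.0, 110.0] for _ in range(5)]
enhanced = enhance(image)
```

In this toy image the two sides differ by only 10 gray levels, yet the pixels on either side of the boundary receive the maximum value in the enhanced map while the flat regions stay at zero. A real pipeline would then binarize and skeletonize that map, which this sketch does not attempt.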