Journal: International Journal of Advanced Research In Computer Science and Software Engineering
Print ISSN: 2277-6451
Electronic ISSN: 2277-128X
Publication year: 2013
Volume: 3
Issue: 8
Publisher: S.S. Mishra
Abstract: Image archival and retrieval systems archive documents and support their retrieval for content-based retrieval systems, image search engines, and script recognition systems. This paper proposes a combined edge-based technique for separating text and non-text regions in a document image. The maximum edge magnitude is detected by convolving the image with compass masks in eight major directions. Next, in the localization step, the edge magnitude is compared with a threshold value to generate the edge map. Morphological operators are then applied to find the non-text connected-component regions. Finally, a statistical feature analysis of the text region is performed, and the text is extracted from the image background so that the OCR system receives readable input. The combined algorithm was tested on a large set of document images, and the authors report that the results appear better than those of existing techniques.
Keywords: Compass mask; Threshold; Morphological Operators; Statistical Measures; Text extraction
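
Illustrative sketch: the first two steps named in the abstract (maximum compass-mask response in eight directions, then thresholding into an edge map) might look roughly like the Python below. The abstract does not name the mask family or the threshold, so the Kirsch kernels, the edge-replicated padding, and the helper names `kirsch_kernels`, `correlate3x3`, and `compass_edge_map` are all assumptions made for illustration, not the authors' implementation. The morphological and statistical steps are not shown.

```python
import numpy as np


def kirsch_kernels():
    """Build eight 3x3 Kirsch compass kernels (assumed mask family)."""
    # Outer ring of a 3x3 kernel, listed clockwise from the top-left corner.
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    base = [5, 5, 5, -3, -3, -3, -3, -3]
    kernels = []
    for shift in range(8):
        k = np.zeros((3, 3), dtype=np.int64)  # centre weight stays 0
        for i, (r, c) in enumerate(ring):
            k[r, c] = base[(i - shift) % 8]  # rotate the weights by 45 degrees
        kernels.append(k)
    return kernels


def correlate3x3(image, kernel):
    """3x3 cross-correlation with edge-replicated padding (assumed border rule)."""
    padded = np.pad(image.astype(np.int64), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.int64)
    for dr in range(3):
        for dc in range(3):
            out += kernel[dr, dc] * padded[dr:dr + h, dc:dc + w]
    return out


def compass_edge_map(image, threshold):
    """Return (max directional response, boolean edge map)."""
    responses = np.stack([correlate3x3(image, k) for k in kirsch_kernels()])
    magnitude = responses.max(axis=0)  # strongest of the eight directions
    return magnitude, magnitude > threshold


if __name__ == "__main__":
    # Toy grayscale image with a vertical step; the threshold is a hypothetical value.
    img = np.zeros((6, 6), dtype=np.uint8)
    img[:, 3:] = 10
    magnitude, edges = compass_edge_map(img, threshold=100)
    print(magnitude)
    print(edges.astype(int))
```

Because the eight kernels are rotations of one another, taking the maximum over cross-correlation responses gives the same result as taking it over true convolution responses, so the sketch skips the kernel flip.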