Journal: International Journal of Computer Science and Engineering Communications
Electronic ISSN: 2347-8586
Year: 2015
Volume: 3
Issue: 3
Pages: 1062-1068
Language: English
Publisher: Scientist Link Group of Publications
Abstract: Text detection in natural scene images is important for many content-based image analysis tasks. In this paper, an accurate method is used to detect text in natural scene images. An effective pruning algorithm, together with the Tesseract algorithm, is designed to extract Maximally Stable Extremal Regions (MSERs) as character candidates. Character candidates are grouped into text candidates by the single-link clustering algorithm. The distance weights and the clustering threshold are learned automatically by a novel self-training distance metric learning algorithm. A character classifier identifies the text candidates that correspond to non-text, and a text classifier then eliminates these non-text candidates. In our system, documents are scanned as images; after scanning, the Tesseract algorithm recognizes the content of each image and the text is extracted automatically. A Text-To-Speech (TTS) engine converts the extracted text to speech, and a translation module translates the text into a user-defined language. The text and the images are stored in a database, from which the stored images can be retrieved for later use.
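The grouping step in the abstract (character candidates merged into text candidates by single-link clustering under a weighted distance with a threshold) can be illustrated with a minimal Python sketch. It assumes each candidate has already been reduced to a bounding box, and it uses a hypothetical weighted L1 distance over box centre and size. The weights and threshold are passed in as fixed values; in the paper they are learned by the self-training distance metric learning algorithm, which is not reproduced here. The names `CharCandidate`, `features`, `weighted_distance` and `single_link_clusters` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class CharCandidate:
    """A character candidate, e.g. the bounding box of one MSER (assumed)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def features(c: CharCandidate) -> List[float]:
    # Hypothetical feature vector: box centre (cx, cy) and box size (w, h).
    return [c.x + c.w / 2, c.y + c.h / 2, c.w, c.h]


def weighted_distance(a: CharCandidate, b: CharCandidate,
                      weights: Sequence[float]) -> float:
    # Assumed form: weighted L1 distance between the two feature vectors.
    return sum(w * abs(p - q) for w, p, q in zip(weights, features(a), features(b)))


def single_link_clusters(cands: Sequence[CharCandidate],
                         weights: Sequence[float],
                         threshold: float) -> List[List[int]]:
    """Single-link clustering cut at `threshold`.

    Two candidates share a group if a chain of pairwise distances, each
    below the threshold, connects them. That is exactly the set of connected
    components of the graph whose edges are the pairs with distance < threshold,
    so a union-find over those edges computes it.
    """
    parent = list(range(len(cands)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(cands)):
        for j in range(i + 1, len(cands)):
            if weighted_distance(cands[i], cands[j], weights) < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(len(cands)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Three characters on one line, plus one isolated candidate.
    cands = [
        CharCandidate(0, 0, 10, 20),
        CharCandidate(12, 0, 10, 20),
        CharCandidate(24, 1, 10, 20),
        CharCandidate(200, 150, 10, 20),
    ]
    print(single_link_clusters(cands, [1.0, 1.0, 0.5, 0.5], 15.0))
    # prints [[0, 1, 2], [3]]
```

In the usage example, candidates 0 and 2 are 25 units apart, above the threshold, yet they end up in the same group because each lies within the threshold of candidate 1. This chaining is what distinguishes single-link clustering, and it lets a text line be assembled one neighbouring character at a time. The pairwise loop is quadratic in the number of candidates, which is acceptable for a sketch but not necessarily for a production detector.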