Title: Development of text extraction technique using optical character recognition and morphological reconstruction to eliminate artifacts of image’s background
Journal: Eastern-European Journal of Enterprise Technologies
Print ISSN: 1729-3774
Online ISSN: 1729-4061
Publication year: 2022
Volume: 1
Issue: 2
Pages: 50-57
DOI: 10.15587/1729-4061.2022.252803
Language: English
Publisher: PC Technology Center
Abstract: Text recognition in images is useful for a wide range of computer vision applications, such as robot navigation, document analysis, and image search. Optical character recognition (OCR) offers a simple way to add text recognition functionality to many industrial and educational applications. OCR gives its best results when the background of the text image is uniform and the image resembles a scanned document. In contrast, accurate recognition becomes difficult when the image has a non-uniform background, which requires further preprocessing to obtain an acceptable OCR result. This work discusses three scenarios. First, OCR is tested on an ordinary business card as an example of an image with a uniform background. Next, the work discusses text recognition on a keypad image containing digits on a non-uniform background. Here, two preprocessing algorithms are used to enhance OCR, overcome the negative effect of the non-uniform background, and detect text with high accuracy. Finally, the developed OCR method is tested on different scanned bills, and the variation in the obtained results is discussed. The two algorithms are morphological reconstruction, which eliminates artifacts and produces cleaner images for further processing by OCR, and Region of Interest (ROI)-based OCR, which targets specific regions in a tested image. The two methods were compared on a dataset of scanned electricity bill images: the morphological-based OCR reached an accuracy of 98.2 %, while the ROI-based OCR reached only about 89.3 %.
Keywords: Morphological Reconstruction; Optical Character Recognition (OCR); document images; non-uniform illumination images
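
The abstract describes morphological reconstruction as a way to remove background artifacts before OCR. The following is a minimal sketch of that idea on a tiny binary image, not the authors' implementation. It assumes the image is already binarized (1 = foreground), uses a 3x3 square erosion to build the marker, and uses 8-connectivity; the function names `erode`, `reconstruct`, and `remove_small_artifacts` are hypothetical. Shapes large enough to survive erosion are restored to their original outline, while isolated specks are dropped.

```python
def erode(img):
    """Binary erosion with a 3x3 square; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(img[y + dy][x + dx]
                                for dy in (-1, 0, 1)
                                for dx in (-1, 0, 1)))
    return out


def reconstruct(marker, mask):
    """Binary reconstruction by dilation: grow the marker inside the mask
    (8-connectivity) until no more pixels can be added."""
    h, w = len(mask), len(mask[0])
    out = [[marker[y][x] & mask[y][x] for x in range(w)] for y in range(h)]
    stack = [(y, x) for y in range(h) for x in range(w) if out[y][x]]
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and mask[ny][nx] and not out[ny][nx]):
                    out[ny][nx] = 1
                    stack.append((ny, nx))
    return out


def remove_small_artifacts(mask):
    """Keep only connected shapes that survive a 3x3 erosion (assumption:
    text strokes are thicker than the noise specks)."""
    return reconstruct(erode(mask), mask)


# Toy input: one character-like blob plus a single-pixel speck at (2, 7).
noisy = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
cleaned = remove_small_artifacts(noisy)
for row in cleaned:
    print("".join("#" if v else "." for v in row))
```

In a real pipeline the cleaned image would then be passed to an OCR engine. For grayscale images, library routines such as `skimage.morphology.reconstruction` or MATLAB's `imreconstruct` provide the same operation.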