Journal: International Journal of Computer Trends and Technology
Electronic ISSN: 2231-2803
Publication year: 2016
Volume: 39
Issue: 1
Pages: 32-39
DOI: 10.14445/22312803/IJCTT-V39P107
Publisher: Seventh Sense Research Group
Abstract: With the advent of digital cameras and other handheld imaging devices, a new type of text-containing image has emerged that traditional optical character recognition (OCR) technology is unable to handle. These camera-based images impose a number of challenges that are absent in scanned or born-digital images. To detect and extract text from camera-based images, a Text Information Extraction (TIE) process is carried out that detects the presence of text in an image and separates it from the background. This paper presents a detailed comparison between camera-based, born-digital, and scanned images. A database of camera-based images containing text, particularly in Devanagari script, is created. Various available benchmark databases are surveyed and, keeping in view the challenges of camera-based images, an exhaustive dataset of images is prepared. The paper also discusses the evaluation metrics used to compute the accuracy of text detection and extraction from camera-based images.
Keywords: Born Digital Images; Camera Images; Scanned Images; Text Information Extraction (TIE); Optical Character Recognition (OCR).