Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Publication year: 2016
Volume: 5
Issue: 10
Pages: 2540-2543
Publisher: Shri Pannalal Research Institute of Technology
Abstract: Processing huge volumes of data simultaneously on a distributed platform is difficult. Present-day relational databases (RDBMSs), which are SQL-based, cannot handle this workload, so a NoSQL data organization that can support large-scale processing is needed. Taking images in parallel and performing OCR on them adds considerable complexity in terms of computational speed and ETL. We therefore base the OCR processing on Tesseract, an effective OCR engine, but the underlying database does not let it reach its full efficiency. We propose a model that combines Tesseract with Cassandra, a NoSQL-based distributed database. By integrating Tesseract with Cassandra, we achieve high efficiency and high throughput.