Journal title: International Journal of Computer Trends and Technology
Electronic ISSN: 2231-2803
Year of publication: 2014
Volume: 10
Issue: 3
Pages: 177-179
DOI: 10.14445/22312803/IJCTT-V10P130
Publisher: Seventh Sense Research Group
Abstract: Data mining plays a vital role in text extraction, because large amounts of data are now available in scientific research, biomedical literature and on the web. Existing data retrieval approaches process the data sequentially. This is suitable for one-time processing, but performance degrades when the data change: whenever new data are added to the existing information, the entire data set must be reprocessed to perform extraction, which takes as long as the initial processing did. If the existing data are modified frequently, reprocessing consumes a large amount of time. The same happens when a new extraction goal is required for the same existing data. Demand for information extraction (IE) is high, but available frameworks such as UIMA and GATE perform IE with a file-based approach and do not use a relational database in the extraction process. The key challenge in extracting from incremental data is identifying which part of the data is affected by a change to any component or extraction goal. To achieve this, the large corpus is stored in a special type of data store and retrieved with optimized queries. This requires more storage than the existing approach, but storage capacity is no longer a key constraint. The new approach also generates queries automatically from the available input data to improve performance. Whenever the data are modified, the method reduces processing time by ninety percent compared with the existing approach.
Keywords: information extraction; data mining; incremental extraction; PTDB; PTQL