Article Information

Title: A Vision Based Approach for Web Data Extraction Using Enhanced Cocitation Algorithm
Authors: R. Vijay; K. Prasadh
Journal: International Journal of Computer Science Issues
Print ISSN: 1694-0784
Online ISSN: 1694-0814
Year: 2013
Volume: 10
Issue: 5
Publisher: IJCSI Press
Abstract: The World Wide Web maintains many databases that store data records retrieved through web query interfaces. The information held in these databases is hidden and can be retrieved through dynamic script pages; such information is termed deep web content. Deep web content is normally accessed through web queries, but extracting structured data from web databases is complex. To address this issue, Wei Liu et al. presented ViDE, a programming-language-independent, vision-based approach that uses the visual features of deep web pages for web data extraction and covers both data record extraction and data item extraction. However, Liu's vision-based approach leaves two issues unsolved: it processes deep web pages only within a single data region of a page, and it consumes additional time extracting the visual information of web pages. To address these shortcomings of ViDE, a novel vision-based approach for deep web data extraction is presented. In this work, we describe a framework that processes deep web pages containing multiple data regions. The framework uses an enhanced co-citation algorithm which, instead of relying on a newly developed set of APIs to extract visual information, retrieves the visual information of deep web pages directly from the web database. Empirical studies on a large set of databases show that the proposed vision-based approach (VBEC) offers high precision and accurate recall for similar queries, with lower time consumption than other extraction methods.
Keywords: deep web data; vision-based approach; multiple data regions; co-citation algorithm; visual features; web data extraction