Journal title: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2011
Volume: 2
Issue: 6
Pages: 2734-2736
Publisher: TechScience Publications
Abstract: Extracting data records from the response pages returned by web databases or search engines is a challenging problem posed by the World Wide Web. Traditional web crawlers focus only on the surface web, while the deep web keeps expanding behind the scenes. Deep web pages are created dynamically as a result of queries posed to specific web databases. Extracting structured data from deep Web pages is difficult because of the intricate underlying structure of such pages. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations, whether they are dependent on or independent of the Web page programming language. Because the Web is a popular two-dimensional medium, the contents of Web pages are always displayed in a regular layout for users to browse. This motivates us to seek a different way of extracting deep Web data, one that overcomes the limitations of previous work by exploiting common visual features of deep Web pages. This paper proposes a novel vision-based approach for extracting data from the deep web. The approach splits the process into two phases. The first phase covers query analysis and query translation, and the second covers vision-based extraction of data from the dynamically created deep web pages. Several established approaches exist for extracting data from deep web pages; the proposed method aims to overcome their inherent limitations, and to compare the extracted data items and present them in the proper order.