Basic article information

  • Title: Dynamic Vision-Based Approach in Web Data Extraction
  • Authors: D. Raghu; V. Sridhar Reddy; Ch. Raja Jacob
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Publication year: 2011
  • Volume: 2
  • Issue: 6
  • Pages: 2734-2736
  • Publisher: TechScience Publications
  • Abstract: This paper addresses the problem of extracting data records from the response pages returned by web databases or search engines. The World Wide Web poses a challenging problem for extracting relevant data. Traditional web crawlers focus only on the surface web, while the deep web keeps expanding behind the scenes. Deep web pages are created dynamically as a result of queries posed to specific web databases. Extracting structured data from deep web pages is challenging because of the intricate underlying structures of such pages. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they are Web-page-programming-language dependent. Because the Web is a popular two-dimensional medium, the contents of web pages are always displayed in a regular layout for users to browse. This motivates us to seek a different way of extracting deep web data, one that overcomes the limitations of previous work by exploiting common visual features of deep web pages. This paper proposes a novel vision-based approach for extracting data from the deep web. The approach splits the process into two phases. The first phase includes query analysis and query translation, and the second covers vision-based extraction of data from the dynamically created deep web pages. Several established approaches exist for extracting data from deep web pages, but the proposed method aims to overcome their inherent limitations; it also aims to compare the data items and present them in the proper order.