Article Information

  • Title: Visual Architecture based Web Information Extraction
  • Authors: S. Oswalt Manoj; Nisha Soms; N.V. Shibu
  • Journal: Bonfring International Journal of Data Mining
  • Print ISSN: 2250-107X
  • Electronic ISSN: 2277-5048
  • Publication year: 2011
  • Volume: 1
  • Issue: Inaugural Special Issue
  • Pages: 06-11
  • DOI: 10.9756/BIJDM.I1002
  • Language: English
  • Publisher: Bonfring
  • Abstract: The World Wide Web hosts a growing number of online Web databases that can be searched through their Web query interfaces. Deep Web contents are accessed by queries submitted to Web databases, and the returned data records are wrapped in dynamically generated Web pages. Extracting structured data from deep Web pages is a challenging task because of the complicated underlying structures of such pages. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they are Web-page-programming-language-dependent. As a popular two-dimensional medium, Web pages always display their contents in a regular layout for users to browse. This motivates us to seek a different way of extracting deep Web data, one that overcomes the limitations of previous work by exploiting common visual features of deep Web pages. In this paper, a novel vision-based approach that is Web-page-programming-language-independent is proposed. The approach primarily uses the visual features of deep Web pages to perform deep Web data extraction, including data record extraction and data item extraction.
  • Keywords: Data Records; Data Items; Crawler; Web Data Extractor; Vision-based Data Item Extractor (ViDE); Vision-based Data Record Extractor (ViDRE)