Article Information

  • Title: DEUDS: Data Extraction Using DOM Tree and Selectors
  • Authors: Vinayak B. Kadam; Ganesh K. Pakle
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year of publication: 2014
  • Volume: 5
  • Issue: 2
  • Pages: 1403-1410
  • Publisher: TechScience Publications
  • Abstract: Web data analysis applications, such as extracting mutual fund information from a website or extracting a stock's daily opening and closing prices from a web page, involve web data extraction. Each time the data needs to be analyzed, a number of websites must be visited, and constructing wrappers to visit those sites and collect the data is very time-consuming. In this paper, we propose a technique called DEUDS, a page-level data extraction system that automatically discovers the extraction pattern for a selected data section of a web page and extracts the data. DEUDS uses visual cues to identify data records while ignoring noise items such as advertisements and navigation bars.
  • Keywords: DOM Tree; CSS selector; semi-structured web pages; Web data extraction