Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2014
Volume: 5
Issue: 2
Pages: 1403-1410
Publisher: TechScience Publications
Abstract: Web data analysis applications, such as extracting mutual fund information from a website or extracting the daily opening and closing prices of a stock from a web page, involve web data extraction. Each time the data must be analyzed, a number of websites have to be visited. Constructing a wrapper to visit those sites and collect the data is a very time-consuming process. In this paper, we propose a technique called DEUDS, a page-level data extraction system that automatically discovers an extraction pattern from web pages for a selected data section and extracts the data. DEUDS uses visual cues to identify data records while ignoring noise items such as advertisements and navigation bars.
Keywords: DOM Tree; CSS selector; semi-structured web pages; Web data extraction