Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2014
Volume: 5
Issue: 2
Pages: 1655-1658
Publisher: TechScience Publications
Abstract: A vast amount of information is available on the web. Data analysis applications, such as extracting mutual fund information from a website or extracting the daily opening and closing prices of a stock from a web page, involve web data extraction. Many researchers have made considerable efforts to automate the process of web data scraping. Many techniques depend on the structure of the web page, i.e., its HTML structure or DOM tree structure, to scrape data from it. In this paper, we present a survey of HTML-aware web scraping techniques.
Keywords: DOM tree; HTML structure; semi-structured; web pages; web scraping; web data extraction