Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2013
Volume: 4
Issue: 6
Pages: 766-770
Publisher: TechScience Publications
Abstract: Web page noise cleaning is a recent research area concerned with removing noise patterns from web pages to make web mining more effective. The World Wide Web contains a large number of web pages that are accessible to users. Compared with conventional data or text, web pages generally contain a large amount of noise, that is, information that is not part of a page's main content, e.g., advertisement banners, navigation bars, and disclaimer/copyright notices. The main objective of this area is to remove such irrelevant information (i.e., web page noise or local noise), which can seriously harm web mining tasks such as clustering and classification. This paper reviews and discusses the major research work done in this area and identifies its challenges and open issues.
Keywords: WWW; Web Page Cleaning; Noise Block; DOM Tree; Web Mining; Web Pages