Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Online ISSN: 2319-8753
Year of publication: 2014
Volume: 3
Issue: 5
Page: 12146
Publisher: S&S Publications
Abstract: Data mining is the process of extracting information from large data sets. A web page contains many blocks. Besides the main content blocks, it contains blocks such as copyright notices, privacy notices, and advertisements. These blocks are not part of the main content and are known as noisy blocks or noisy information. Eliminating this noise improves web data mining. In this paper, we discuss how to identify such noise to improve the efficiency of web mining, and how to remove it using a simple LRU algorithm. The Least Recently Used (LRU) algorithm is a less time-consuming and less complex algorithm for web mining.
Keywords: Content Extraction; DOM Tree; LRU; Web Mining