Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year: 2014
Volume: 60
Issue: 3
Publisher: Journal of Theoretical and Applied
Abstract: Web mining is the application of data mining techniques to automatically discover and extract information from Web data. It also uses these techniques to make the Web more profitable and to make our interaction with it more effective. Users expect search engines to return the most accurate results possible. Unfortunately, most web pages contain more unnecessary information than actual content. This unnecessary information is termed templates. Templates degrade search engine performance because non-content material is retrieved for users. Search engine performance can therefore be improved by making web pages free of templates. Our method focuses on algorithmically detecting and extracting templates from heterogeneous web pages. The locality sensitive hashing algorithm finds the similarity between the input web documents and, in terms of execution time, performs well compared with the Minimum Description Length (MDL) principle and the hash cluster process.
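
The abstract does not spell out the authors' algorithm, so the following is only a minimal sketch of generic MinHash-based locality sensitive hashing, one common way to find similar documents without comparing every pair in full. It assumes each page is reduced to a set of word shingles. The function names, the shingle size, and the numbers of hash functions and bands are illustrative assumptions and are not taken from the paper.

```python
import hashlib
import re
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a page's text (assumed representation)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    """Deterministic 64-bit hash of an item; the seed selects the hash function."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=32):
    """MinHash signature: for each seed, the minimum hash value over all items."""
    return [
        min((_hash(seed, item) for item in items), default=0)
        for seed in range(num_hashes)
    ]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands=16):
    """Split each signature into bands; pages sharing any band bucket are candidates."""
    if not signatures:
        return set()
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical input: page identifiers mapped to their extracted text.
pages = {
    "p1": "Home About Contact Login Today the council approved the new budget",
    "p2": "Home About Contact Login Scientists report a newly discovered comet",
    "p3": "An entirely different layout with no shared navigation words at all",
}
signatures = {doc_id: minhash_signature(shingles(text)) for doc_id, text in pages.items()}
candidates = lsh_candidate_pairs(signatures)
```

In a template-detection setting, candidate pairs would then be grouped into clusters of pages that plausibly share a template. Only pages that land in a common bucket are compared further, which is where an execution-time advantage over exhaustive pairwise comparison would be expected to come from.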