Article information

  • Title: SIMPLIFIED SCHEME FOR PERFORMANCE AUGMENTATION OF WEB DATA EXTRACTION
  • Authors: G.NAVEENSUNDAR ; D.NARMADHA ; DR.A.P.HARAN
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Year: 2014
  • Volume: 60
  • Issue: 3
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Web mining is the application of data mining techniques to automatically discover and extract information from Web data. It also uses these techniques to make the web more profitable and to make our interaction with it more effective. Users expect highly accurate results from search engines. Unfortunately, most web pages contain more unnecessary information than actual content. This unnecessary information is termed templates. Templates degrade search engine performance because non-content material is retrieved for users. Search engine performance can therefore be improved by removing templates from web pages. Our method focuses on detecting and extracting templates from heterogeneous web pages by means of an algorithm. The locality-sensitive hashing algorithm finds the similarity between the input web documents and, in terms of execution time, performs well compared with the Minimum Description Length (MDL) principle and the hash cluster process.
  • Keywords: Cluster; Non-Content Path; Template Detection; Locality Sensitive Hash; Minimum Description Length
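
The abstract says that locality-sensitive hashing is used to find similarity between input web documents, but this record does not say which LSH variant the authors use. As a rough illustration only, the sketch below assumes a common variant: MinHash signatures with banding. It also assumes each page is represented as a non-empty set of DOM path strings. The function names, the path representation, and the parameters `num_hashes` and `bands` are all hypothetical and are not taken from the paper.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature: the minimum hash of the set under each seed.

    Assumes `tokens` is a non-empty collection of strings.
    """
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def lsh_candidate_pairs(docs, num_hashes=32, bands=16):
    """Return document pairs that share at least one LSH band bucket.

    `docs` maps a document name to its set of tokens (here: DOM paths,
    an assumed representation). Pairs with high Jaccard similarity are
    likely, but not guaranteed, to be returned.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, tokens in docs.items():
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            # Documents whose signatures agree on every row of a band
            # fall into the same bucket for that band.
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs


if __name__ == "__main__":
    # Hypothetical pages: two share a template, one is unrelated.
    pages = {
        "page1": {"html/body/div/nav", "html/body/div/footer", "html/body/p"},
        "page2": {"html/body/div/nav", "html/body/div/footer", "html/body/h1"},
        "page3": {"html/body/table/tr", "html/body/table/td", "html/body/ul"},
    }
    print(lsh_candidate_pairs(pages))
```

Candidate pairs found this way could then be grouped into clusters, with paths shared across a cluster treated as template (non-content) paths. That grouping step is outside this sketch.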