Article information

  • Title: Focused Web Crawling Using Decay Concept and Genetic Programming
  • Authors: Mahdi Bazarganigilani; Ali Syed; Sandid Burki
  • Journal: International Journal of Data Mining & Knowledge Management Process
  • Print ISSN: 2231-007X
  • Electronic ISSN: 2230-9608
  • Year of publication: 2011
  • Volume: 1
  • Issue: 1
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: The ongoing rapid growth of web information is a theme of research in many papers. In this paper, we introduce a new optimized method for web crawling. Using genetic programming enhances the accuracy of the similarity measurement. This measurement applies to different parts of a web page, including the title and the body. The crawler then uses this optimized similarity measurement to traverse the pages. To enhance the accuracy of crawling, we use the decay concept to limit the crawler to the web pages that are relevant to the search criteria. The decay measurement gives every page a score according to the search criteria. The score decreases as the crawler traverses to greater depth, and it can be revised according to the similarity of the page to the search criteria. We use three kinds of measurement to set the thresholds. The results show that using genetic programming along with dynamic decay thresholds leads to the best accuracy.
  • Keywords: Focused Web Crawler; Genetic Programming; Decay Concept; Similarity Space Model.
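
The decay idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The multiplicative decay factor, the rule that a sufficiently similar page resets its score to 1.0, the fixed thresholds, and all names (`crawl_with_decay`, `graph`, `similarity`, `decay`, `sim_threshold`, `min_score`) are assumptions made for illustration. The record does not give the paper's formulas, so the three threshold measurements and the genetic-programming similarity function are not reproduced here; similarity is supplied as precomputed values over an in-memory link graph.

```python
from collections import deque


def crawl_with_decay(graph, similarity, seeds,
                     decay=0.5, sim_threshold=0.6, min_score=0.2):
    """Breadth-first traversal of a link graph with a decaying page score.

    graph:      dict mapping a page to the list of pages it links to
    similarity: dict mapping a page to a precomputed similarity in [0, 1]
    All parameters and update rules are illustrative assumptions.
    """
    best = {}            # highest score with which each page was accepted
    visited_order = []   # pages in the order they were first accepted
    queue = deque((seed, 1.0) for seed in seeds)

    while queue:
        page, score = queue.popleft()

        # Revise the score: a page that is similar enough to the
        # search criteria restores the full score (assumed rule).
        if similarity.get(page, 0.0) >= sim_threshold:
            score = 1.0

        # Stop following this path once the score has decayed too far,
        # or if the page was already accepted with an equal or better score.
        if score < min_score or best.get(page, -1.0) >= score:
            continue

        if page not in best:
            visited_order.append(page)
        best[page] = score

        # Each link followed away from this page decays the score.
        for child in graph.get(page, []):
            queue.append((child, score * decay))

    return visited_order, best
```

With these assumed defaults, a chain of dissimilar pages is abandoned after a few hops because the score halves at each hop, while a similar page restores the full score and lets the crawl continue from it.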