
Basic article information

  • Title: CT-FC: more Comprehensive Traversal Focused Crawler
  • Authors: Siti Maimunah; Husni S Sastramihardja; Dwi H Widyantoro
  • Journal: TELKOMNIKA (Telecommunication Computing Electronics and Control)
  • Print ISSN: 2302-9293
  • Publication year: 2012
  • Volume: 10
  • Issue: 1
  • Pages: 189-198
  • DOI: 10.12928/telkomnika.v10i1.777
  • Language: English
  • Publisher: Universitas Ahmad Dahlan
  • Abstract: In today’s world, people depend increasingly on information from the WWW, including professionals who must analyze data in their domain to maintain and improve their business. Such data analysis requires information that is comprehensive and relevant to the domain. A focused crawler, a topic-based Web indexing agent, is used to meet this information need. In increasing precision, however, focused crawlers face the problem of low recall. Studies of the characteristics of the WWW hyperlink structure indicate that many Web documents are not strongly connected but are related only through co-citation and co-reference. A conventional focused crawler that uses a forward crawling strategy cannot visit documents with these characteristics. This study proposes a more comprehensive traversal framework. As a proof, CT-FC (a focused crawler with the new traversal framework) was run on DMOZ data that is representative of WWW characteristics. The results show that this strategy can increase recall significantly.
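
The recall gap described in the abstract can be illustrated with a toy link graph. The following is a minimal sketch, not the authors' CT-FC algorithm: it assumes that a "more comprehensive traversal" can be approximated by following backlinks as well as outlinks, and it omits the relevance classifier a real focused crawler would use. The graph, the page names, and the functions `crawl` and `recall` are all hypothetical.

```python
from collections import deque


def crawl(links, seeds, use_backlinks=False):
    """Breadth-first traversal of a link graph given as {page: [outlinks]}.

    With use_backlinks=True the crawler also moves from a page to the pages
    that link to it, which makes co-cited and co-referencing pages reachable.
    (Assumption: backlinks are available; on the real Web they would have to
    be obtained from an external index.)
    """
    # Build the reverse index: page -> set of pages linking to it.
    backlinks = {}
    for src, dsts in links.items():
        for dst in dsts:
            backlinks.setdefault(dst, set()).add(src)

    visited = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        neighbours = set(links.get(page, ()))
        if use_backlinks:
            neighbours |= backlinks.get(page, set())
        for nxt in sorted(neighbours):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return visited


def recall(visited, relevant):
    """Fraction of the relevant pages that the crawl reached."""
    return len(visited & relevant) / len(relevant)


# Hypothetical graph: "hub" cites both "a" and "b" (co-citation);
# "a" and "e" both reference "c" (co-reference).
links = {
    "hub": ["a", "b"],
    "a": ["c"],
    "b": ["d"],
    "e": ["c"],
}
relevant = {"a", "b", "c", "d", "e"}

forward = crawl(links, ["a"])
full = crawl(links, ["a"], use_backlinks=True)
```

Starting from seed `a`, the forward-only crawl reaches only `a` and `c`, while the crawl that also follows backlinks reaches every relevant page in this toy graph through `hub` and `c`.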