Basic article information

  • Title: REDUCING DISTRIBUTED URLS CRAWLING TIME : A COMPARISON OF GUIDS AND IDS
  • Authors: I. SHAKIR ; S. ABDUL SAMAD ; H. BURAIRAH
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2014
  • Volume: 67
  • Issue: 1
  • Publisher: Journal of Theoretical and Applied
  • Abstract: A web crawler visits websites for the purpose of indexing. The dynamic nature of today's web makes the crawling process harder than before, as web content is continuously updated. In addition, crawling speed is important given the tsunami of big data that competing search engines need to index. This research project aims to provide a survey of current problems in distributed web crawlers. It then investigates which technique gives the better crawling speed: dynamic globally unique identifiers (GUIDs) or traditional static identifiers (IDs). Experiments are done by implementing Arachnot.net web crawlers to index up to 20000 locally generated URLs using both techniques. The results show that URL crawling time can be reduced by up to 7% by using GUIDs instead of IDs.
  • Keywords: Distributed systems; Web Crawler; GUID; Search Engine.