Article information

  • Title: Estimating the size of Arabic indexed web content
  • Authors: Abdulrahman Alarifi; Mansour Alghamdi; Mohammad Zarour
  • Journal: Scientific Research and Essays
  • Print ISSN: 1992-2248
  • Year: 2012
  • Volume: 7
  • Issue: 28
  • Pages: 2472-2483
  • DOI: 10.5897/SRE11.1708
  • Language: English
  • Publisher: Academic Journals
  • Abstract: Various initiatives designed to increase Arabic Web content have been undertaken in recent years, and search engines now report that the Arabic portion of Web content has grown relative to overall Web content. An accurate estimate of Arabic Web content is crucial for those interested in studying and enriching it. In this paper, we propose a statistics-based system to estimate the size of Arabic indexed Web content using three popular search engines: Google, Yahoo and Bing. Our system relies on selecting sample words from an Arabic corpus to estimate the size of the Arabic Web content indexed by the search engines and the overlap among them. We have used Arabic Wikipedia as a corpus, as it provides diversified content accessed by a large number of Internet users. Our results show that, as of December 2010, the size of the Arabic indexed Web content was estimated at 2 to 2.1 billion pages.
  • Keywords: World Wide Web; the Web; search engine; index size; Arabic content; Internet; corpus