
Article Information

  • Title: Language Specific Crawler for Myanmar Web Pages
  • Authors: Pann Yu Mon; Chew Yew Choong; Yoshiki Mikami
  • Journal: International Journal of Computer Science Issues
  • Print ISSN: 1694-0784
  • Online ISSN: 1694-0814
  • Year: 2011
  • Volume: 8
  • Issue: 2
  • Publisher: IJCSI Press
  • Abstract: With the enormous growth of the World Wide Web, search engines play a critical role in retrieving information from the borderless Web. Many search engines can search content in numerous major languages, but they cannot search pages in less-computerized languages such as Myanmar, because Myanmar Web pages use multiple non-standard encodings. The Web is a distributed, dynamic and rapidly growing information resource, so a general-purpose Web crawler cannot download all pages. A language-specific search engine therefore needs a Language Specific Crawler (LSC) to collect the targeted pages. This paper presents an LSC implemented as multi-threaded objects that run concurrently with a language identifier, designed to collect as many Myanmar Web pages as possible. In experiments, the implemented algorithm collected Myanmar pages with a satisfactory level of coverage. The paper also discusses an evaluation of the LSC on two criteria, recall and precision, and a method to measure the total number of Myanmar Web pages on the entire Web. Finally, it presents an analysis of where the servers hosting Myanmar Web content are located.
  • Keywords: Language Specific Crawling; Myanmar; Web Search; Language Identification
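The abstract describes the LSC only at a high level: concurrent crawling threads paired with a language identifier, evaluated by precision and recall. The following minimal Python sketch shows that kind of loop. It rests on assumptions that are not from the paper:

- The language check is a crude heuristic: the fraction of characters in the Unicode Myanmar block, U+1000 to U+109F, compared against a hypothetical threshold of 0.3.
- Downloads run in a `ThreadPoolExecutor` through an injected `fetch(url)` callback.
- Links are followed only within a hypothetical `max_off_topic_hops` of a Myanmar page.

The helpers `myanmar_ratio`, `is_myanmar`, `crawl` and `precision_recall` are illustrative names, not the authors' code.

```python
"""Sketch of a language-focused crawl loop with a Myanmar-script filter.

Assumptions (not from the paper): the Unicode-block heuristic, the 0.3
threshold, the off-topic hop limit, and the injected fetch() callback.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Set, Tuple

# Unicode Myanmar block. Pages in legacy font encodings that reuse Latin
# code points are NOT detected by this heuristic.
MYANMAR_FIRST, MYANMAR_LAST = 0x1000, 0x109F

# Hypothetical callback: fetch(url) -> (page text, outgoing links);
# it should return ("", []) when a download fails.
Fetch = Callable[[str], Tuple[str, List[str]]]


def myanmar_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the Myanmar Unicode block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if MYANMAR_FIRST <= ord(c) <= MYANMAR_LAST)
    return hits / len(chars)


def is_myanmar(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical stand-in for a real language identifier."""
    return myanmar_ratio(text) >= threshold


def crawl(seeds: Iterable[str], fetch: Fetch, max_pages: int = 1000,
          max_off_topic_hops: int = 1, workers: int = 8) -> Set[str]:
    """Breadth-first crawl that only expands links near Myanmar pages.

    Each frontier entry carries the number of consecutive non-Myanmar pages
    on the path that reached it; links are followed only while that count
    stays within max_off_topic_hops.
    """
    frontier: List[Tuple[str, int]] = [(u, 0) for u in dict.fromkeys(seeds)]
    seen: Set[str] = {u for u, _ in frontier}
    found: Set[str] = set()
    fetched = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and fetched < max_pages:
            batch = frontier[: max_pages - fetched]
            frontier = frontier[len(batch):]
            # Downloads run in worker threads; bookkeeping stays on this thread.
            pages = list(pool.map(lambda item: fetch(item[0]), batch))
            fetched += len(batch)
            for (url, hops), (text, links) in zip(batch, pages):
                if is_myanmar(text):
                    found.add(url)
                    child_hops = 0
                else:
                    child_hops = hops + 1
                if child_hops > max_off_topic_hops:
                    continue  # too far from any Myanmar page: prune its links
                for link in links:
                    if link not in seen:
                        seen.add(link)
                        frontier.append((link, child_hops))
    return found


def precision_recall(collected: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Standard set-based precision and recall."""
    tp = len(collected & relevant)
    precision = tp / len(collected) if collected else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall
```

The hop limit is the usual focused-crawling bet that relevant pages link to relevant pages. Raising it lets the crawl cross more non-Myanmar hub pages, which can improve recall, at the cost of more wasted downloads. In practice the denominator of recall (all Myanmar pages on the Web) is unknown, which is why the paper discusses a way to measure it. The paper's exact precision and recall definitions may differ from the generic ones above.

The block heuristic also cannot separate the non-standard encodings the abstract mentions:

- Font encodings that reuse the Myanmar Unicode code points look identical to it.
- Legacy fonts that reuse Latin code points are missed entirely.

A real identifier has to handle both cases.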