
Article Information

  • Title: Study on the Distributed Crawling for Processing Massive Data in the Distributed Network Environment
  • Author: Chang-Su Kim
  • Journal: International Journal of Multimedia and Ubiquitous Engineering
  • Print ISSN: 1975-0080
  • Publication year: 2015
  • Volume: 10
  • Issue: 10
  • Pages: 375-384
  • DOI: 10.14257/ijmue.2015.10.10.37
  • Publisher: SERSC
  • Abstract: With the development of IT, the spread of smartphones, and the growing use of SNS, various types of content are being produced and consumed on the Internet. As the volume of data has risen sharply, information search technology has become important. However, search technology requires considerable background knowledge and has therefore been regarded as difficult to approach. The main obstacles with previous search engines were the number of qualified personnel with that background knowledge and the large development expenses they required. As a result, search engines have been regarded as the exclusive domain of leading IT companies and specialized organizations. This study proposes a search engine with an index structure that makes information convenient to search effectively, built by distributed crawling of a massive number of websites and web documents in a distributed environment. The proposed search engine is implemented on a Hadoop architecture to support distributed processing.
  • Keywords: Distributed Crawling; Hadoop; Massive Data; MapReduce; Nutch; Search Engine; Solr; YARN