
Basic article information

  • Title: A Focused Crawler Based on Correlation Analysis
  • Authors: Qiuli Qin; Xin Peng
  • Journal: International Journal of Future Generation Communication and Networking
  • Print ISSN: 2233-7857
  • Publication year: 2014
  • Volume: 7
  • Issue: 6
  • Pages: 13-20
  • DOI: 10.14257/ijfgcn.2014.7.6.02
  • Publisher: SERSC
  • Abstract: With the rapid development of network and information technology, a huge amount of data is available on the internet. A major problem facing researchers is how to effectively filter information on a particular subject or field out of this data. In this paper, we build a focused crawler based on the vector space model (VSM) and TF-IDF text correlation analysis. We take the seed URL as the collection entry point and fetch web pages from the internet. We then analyze each page using techniques such as web content extraction and page link analysis to obtain the main content of the page. Using the correlation analysis method based on VSM and TF-IDF, we calculate the correlation between pages and the predefined topics, so that the crawl stays focused on the target subject areas of the web.
  • Keywords: Focused Crawler; web crawler; VSM; TF-IDF
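
The abstract describes scoring pages against a predefined topic with VSM and TF-IDF but does not give the formulas. The following is a minimal sketch of one common way to do this, not the authors' implementation: each text becomes a sparse TF-IDF vector and relevance is the cosine of the angle between the topic vector and the page vector. The function names, the smoothed IDF variant, the simple tokenizer, and the 0.1 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document.

    TF is the relative term frequency. IDF is the smoothed variant
    log((1 + N) / (1 + df)) + 1, an assumption chosen so that terms
    appearing in every document keep a non-zero weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance_scores(topic, pages):
    """Score each page's main text against the topic description."""
    vectors = tfidf_vectors([topic] + list(pages))
    topic_vec = vectors[0]
    return [cosine(topic_vec, page_vec) for page_vec in vectors[1:]]


def is_relevant(score, threshold=0.1):
    """Keep a page when its score reaches the threshold (hypothetical value)."""
    return score >= threshold
```

In a focused crawler, a score like this would typically decide whether a fetched page is kept and whether its outgoing links are queued for crawling. The threshold would need tuning against the topic and corpus at hand.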