Basic Article Information

  • Title: A Focused Crawler for Borderlands Situation Information with Geographical Properties of Place Names
  • Authors: Hou, Dongyang; Wu, Hao; Chen, Jun
  • Journal: Sustainability
  • Print ISSN: 2071-1050
  • Year of publication: 2014
  • Volume: 6
  • Issue: 10
  • Pages: 6529-6552
  • Publisher: MDPI, Open Access Journal
  • Abstract: Place names are an important component of borderlands situation information and play a significant role in collecting it from the Internet with focused crawlers. However, current focused crawlers treat a place name like any other common keyword, ignoring its geographical properties, which may reduce their effectiveness. To solve this problem, this paper first discusses the importance of place names in focused crawlers in terms of location and spatial relations, and then proposes a two-tuple-based topic representation method to express place names and common keywords separately. Spatial relations between place names are then introduced into the calculation of the relevance between given topics and webpages, which can make the calculation more accurate. On this basis, a focused crawler prototype for borderlands situation information collection is designed and implemented. Crawling speed and F-Score are adopted to evaluate its efficiency and effectiveness. Experimental results indicate that the efficiency of the proposed focused crawler is consistent with the polite access interval and that it could meet the daily demand of borderlands situation information collection. Additionally, the F-Score of the proposed focused crawler increases by around 7%, which means that it is more effective than the traditional best-first focused crawler.
  • Keywords: focused crawler; place name; web information collection; borderlands situation; relevance calculation; spatial relations
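
The abstract evaluates effectiveness with an F-Score but this record does not give the formula. As a minimal sketch, assuming the usual balanced F-Score (the harmonic mean of precision and recall), the measure could be computed as below. The function name `f_score` and the page counts in the usage note are hypothetical and are not taken from the paper.

```python
def f_score(relevant_retrieved: int, retrieved: int, relevant_total: int) -> float:
    """Balanced F-Score: harmonic mean of precision and recall.

    Assumption: the standard F1 definition; the paper may weight
    precision and recall differently.
    """
    # No crawled pages or no relevant pages at all: define the score as 0.
    if retrieved == 0 or relevant_total == 0:
        return 0.0
    precision = relevant_retrieved / retrieved      # share of crawled pages that are relevant
    recall = relevant_retrieved / relevant_total    # share of relevant pages that were crawled
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, with made-up counts, `f_score(60, 100, 80)` uses a precision of 0.6 and a recall of 0.75. Comparing this value between two crawlers on the same page set is the kind of comparison the abstract summarizes.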