Journal: International Journal of Grid and Distributed Computing
Print ISSN: 2005-4262
Year: 2016
Volume: 9
Issue: 1
Pages: 135-144
DOI: 10.14257/ijgdc.2016.9.1.14
Publisher: SERSC
Abstract: With the rapid development of Weibo, the most popular microblogging service in China, research on the platform has attracted growing attention. Gathering precise data from Weibo is the groundwork for this research, so a novel, highly efficient Weibo crawler (WCrawler) based on login simulation is designed. A priority evaluation is described to ensure the correlation between entries. MD5 is introduced to check for duplicates among crawled URLs. Experiments demonstrate that the novel crawler offers better efficiency and completeness of information collection than an API crawler. In addition, we present a summary of the data collected from the Weibo social network by WCrawler.
Keywords: Weibo Crawler; Login Simulation; Web Information Extraction
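
The abstract states that MD5 is used to check for duplicate crawled URLs but gives no implementation details. The following is a minimal sketch of how such a check might look, not WCrawler's actual code: the class name `SeenUrls`, the method `add_if_new`, and the whitespace-stripping step are all assumptions made for illustration.

```python
import hashlib


class SeenUrls:
    """Hypothetical duplicate filter: remembers URLs by their MD5 digest."""

    def __init__(self):
        # Store fixed-length hex digests rather than the full URL strings.
        self._digests = set()

    @staticmethod
    def digest(url: str) -> str:
        # Assumption: URLs are compared after trimming surrounding whitespace.
        return hashlib.md5(url.strip().encode("utf-8")).hexdigest()

    def add_if_new(self, url: str) -> bool:
        """Return True and record the URL if unseen; return False otherwise."""
        d = self.digest(url)
        if d in self._digests:
            return False
        self._digests.add(d)
        return True


if __name__ == "__main__":
    seen = SeenUrls()
    for url in ["https://weibo.com/u/1", "https://weibo.com/u/2", "https://weibo.com/u/1"]:
        status = "crawl" if seen.add_if_new(url) else "skip (duplicate)"
        print(url, "->", status)
```

Storing a 32-character digest per URL keeps memory use predictable regardless of URL length. MD5 is used here only as a fingerprint, not for security.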