Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Electronic ISSN: 2156-5570
Year of publication: 2010
Volume: 1
Issue: 2
DOI: 10.14569/IJACSA.2010.010209
Publisher: Science and Information Society (SAI)
Abstract: The massive adoption of social media has provided new ways for individuals to express their opinions online. The blogosphere, an inherent part of this trend, contains a vast array of information about a variety of topics. It is a huge think tank that creates an enormous and ever-changing archive of open source intelligence. The higher-level aim of the research presented here is to mine and model this vast pool of data in order to extract, exploit, and describe meaningful knowledge, and thereby leverage the structures and dynamics of emerging networks within the blogosphere. Our proprietary development of a tailor-made feed crawler framework meets exactly this need. While the main concept, as well as the basic techniques and implementation details of the crawler, have already been dealt with in earlier publications, this paper focuses on several recent optimization efforts made on the crawler framework that proved to be crucial for the performance of the overall framework.
Keywords: thesai; IJACSA; thesai.org; journal; IJACSA papers; weblogs; rss-feeds; data mining; knowledge discovery; blogosphere; crawler; information extraction