期刊名称:Journal of Emerging Technologies in Web Intelligence
印刷版ISSN:1798-0461
出版年度:2011
卷号:3
期号:1
页码:11-19
DOI:10.4304/jetwi.3.1.11-19
语种:English
出版社:Academy Publisher
摘要:This paper presents a method for finding the content similarity for microblogs. In particular, we process data from Twitter for a breaking news detection and tracking application. The goal is to find a collection of similar messages. The method gives two levels of collections. In the first level, similarity is defined by TF-IDF. Since contents in microblogs have short lengths, we emphasize on specific terms called named entities. Message groups are obtained in the first level. In the second level, we construct a network from the message groups and named entities and perform a community detection. We evaluate and visualize the community results based on several community detection algorithms. We demonstrate that this method can be used to explore similar messages with results in both tightly and loosely coupled manners.
关键词:Twitter; Topic Detection and Tracking; Information Retrieval; Network Analysis