Journal: Bulletin of the Technical Committee on Data Engineering
Year: 2013
Volume: 36
Issue: 3
Publisher: IEEE Computer Society
Abstract: The firehose of data generated by users on social networking and microblogging sites such as Facebook and Twitter is enormous. The data can be classified into two categories: the textual content written by the users and the topological structure of the connections among users. Real-time analytics on such data is challenging, with most current efforts largely focusing on the efficient querying and retrieval of recently produced data. In this article, we present a dynamic pattern-driven approach to summarize social network content and topology. The resulting family of algorithms relies on the common principles of summarization via pattern utilities and ranking (SPUR). SPUR and its dynamic variant (D-SPUR) rely on an in-memory summary while retaining sufficient information to facilitate a range of user-specific and topic-specific temporal analytics. We then follow up by describing variants that take the implicit graph of connections into account to realize the graph-based SPUR variant (G-SPUR). Finally, we describe scalable algorithms for implementing these ideas on commercial GPU-based systems. We examine the effectiveness of the summarization approaches along the axes of storage cost, query accuracy, and efficiency using real data from Twitter.