Journal: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Year: 2017
Volume: 5
Issue: 3
Page: 5292
DOI: 10.15680/IJIRCCE.2017.0503301
Publisher: S&S Publications
Abstract: Documents created and distributed on the Internet change constantly and take various forms. Most existing work is devoted to topic modeling and the evolution of individual topics, while the sequential relations of topics in successive documents published by a specific user are ignored. In order to characterize and detect personalized and abnormal behaviours of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. These patterns are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviours. We present a solution to this mining problem in three phases: pre-processing to extract probabilistic topics and identify sessions for different users, generating all STP candidates with (expected) support values for each user by pattern-growth, and selecting URSTPs by performing user-aware rarity analysis on the derived STPs. Experiments on both real (Twitter) and synthetic datasets show that our approach can discover special users and interpretable URSTPs effectively and efficiently, and that the discovered patterns reflect users’ characteristics.
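
The three phases in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm, only a simplified illustration under stated assumptions: each document is reduced to a single deterministic topic label (the paper uses probabilistic topics and expected support), sessions are given as input rather than identified, candidates are enumerated by brute force up to a fixed length rather than by pattern-growth, and rarity is approximated as the gap between a user's support and the mean support over all users. The function names and the thresholds `max_global` and `min_gap` are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, session):
    """True if the topics in `pattern` occur in `session` in the same order."""
    remaining = iter(session)
    return all(topic in remaining for topic in pattern)


def candidate_patterns(sessions, max_len=2):
    """Brute-force candidates: every ordered subsequence of up to `max_len` topics.

    Assumption: this stands in for pattern-growth, which the abstract names
    but does not specify.
    """
    candidates = set()
    for session in sessions:
        for length in range(1, max_len + 1):
            candidates.update(combinations(session, length))
    return candidates


def user_support(pattern, sessions):
    """Fraction of one user's sessions that contain `pattern`."""
    if not sessions:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sessions) / len(sessions)


def mine_rare_patterns(user_sessions, max_global=0.5, min_gap=0.4, max_len=2):
    """Return (user, pattern, user_support, global_support) tuples.

    A pattern is kept for a user when it is globally rare (mean support over
    all users <= max_global) yet frequent for that user (user support exceeds
    the global support by at least min_gap). Both thresholds are hypothetical.
    """
    all_patterns = set()
    for sessions in user_sessions.values():
        all_patterns |= candidate_patterns(sessions, max_len)

    results = []
    for pattern in sorted(all_patterns):
        supports = {u: user_support(pattern, s) for u, s in user_sessions.items()}
        global_support = sum(supports.values()) / len(supports)
        if global_support > max_global:
            continue  # too common overall to count as rare
        for user, support in sorted(supports.items()):
            if support - global_support >= min_gap:
                results.append((user, pattern, support, global_support))
    return results


if __name__ == "__main__":
    # Toy data: each inner list is one session of topic labels for a user.
    user_sessions = {
        "alice": [["sports", "news"], ["sports", "news"]],
        "bob": [["news", "sports"], ["news", "music"]],
        "carol": [["news", "sports"], ["news", "sports"]],
    }
    for row in mine_rare_patterns(user_sessions):
        print(row)
```

In this toy data, the ordering "sports" then "news" appears only in alice's sessions, so it is rare overall but frequent for her, which is the kind of pattern the abstract describes as a URSTP.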