
Article Information

  • Title: Analysis of Efficient Way to Identify User Aware Rare Sequential Pattern in Document Stream
  • Authors: Swati V. Mengje; Prof. Rajeshri R. Shelke
  • Journal: International Journal of Innovative Research in Computer and Communication Engineering
  • Print ISSN: 2320-9798
  • Electronic ISSN: 2320-9801
  • Publication year: 2017
  • Volume: 5
  • Issue: 3
  • Page: 5292
  • DOI: 10.15680/IJIRCCE.2017.0503301
  • Publisher: S&S Publications
  • Abstract: Documents created and distributed on the Internet are ever changing in various forms. Most existing works are devoted to topic modeling and the evolution of individual topics, while sequential relations of topics in successive documents published by a specific user are ignored. In order to characterize and detect personalized and abnormal behaviours of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. These patterns are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviours. We present a solution to this mining problem in three phases: pre-processing to extract probabilistic topics and identify sessions for different users; generating all the STP candidates with (expected) support values for each user by pattern-growth; and selecting URSTPs by performing user-aware rarity analysis on the derived STPs. Experiments on both real (Twitter) and synthetic datasets show that our approach can discover special users and interpretable URSTPs effectively and efficiently, and that these patterns significantly reflect users' characteristics.
  • Keywords: Web mining; sequential patterns; document streams; rare events; pattern-growth; dynamic programming.
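
The idea summarized in the abstract, patterns that are rare overall but relatively frequent for a specific user, can be illustrated with a small sketch. This is not the authors' algorithm. It assumes each document has already been reduced to a single topic label (the paper works with probabilistic topics and expected support), it enumerates subsequences by brute force instead of pattern-growth, and the function names and the thresholds `global_max` and `local_min` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def subsequences(session, max_len):
    """Distinct ordered subsequences of one session, up to max_len topics."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(session)), k):
            found.add(tuple(session[i] for i in idx))
    return found


def user_supports(sessions, max_len=2):
    """Support of a pattern for one user: fraction of sessions containing it."""
    if not sessions:
        return {}
    counts = defaultdict(int)
    for session in sessions:
        for pattern in subsequences(session, max_len):
            counts[pattern] += 1
    return {p: c / len(sessions) for p, c in counts.items()}


def rare_patterns(sessions_by_user, global_max, local_min, max_len=2):
    """Patterns frequent for a user (>= local_min) but rare on average
    across all users (<= global_max). Thresholds are illustrative only."""
    per_user = {u: user_supports(s, max_len) for u, s in sessions_by_user.items()}
    n_users = len(per_user)

    # Global support: mean of the per-user supports (0 where a user lacks it).
    global_support = defaultdict(float)
    for supports in per_user.values():
        for pattern, s in supports.items():
            global_support[pattern] += s / n_users

    result = {}
    for user, supports in per_user.items():
        hits = sorted(
            p for p, s in supports.items()
            if s >= local_min and global_support[p] <= global_max
        )
        if hits:
            result[user] = hits
    return result


if __name__ == "__main__":
    # Toy data: each inner list is one session of topic labels for a user.
    sessions_by_user = {
        "alice": [["news", "sports"], ["news", "sports"]],
        "bob": [["news", "sports"], ["news"]],
        "carol": [["news", "travel"], ["news", "travel"]],
        "dave": [["news"], ["news", "sports"]],
    }
    print(rare_patterns(sessions_by_user, global_max=0.3, local_min=0.8))
```

On this toy data only `carol` is reported: the topic `travel` and the sequence `news` then `travel` appear in all of her sessions but in no other user's. Brute-force enumeration grows quickly with session length, which is one reason a pattern-growth strategy such as the one named in the abstract is preferred in practice.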