
Basic article information

  • Title: Mining Asynchronous Interesting Sequential Patterns based on Frequency and Self-Information
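  • Authors: Junpei Murata ; Koji Iwanuma ; Naoki Ohtsuka
  • Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  • Print ISSN: 1346-0714
  • Online ISSN: 1346-8030
  • Publication year: 2010
  • Volume: 25
  • Issue: 3
  • Pages: 464-474
  • DOI: 10.1527/tjsai.25.464
  • Publisher: The Japanese Society for Artificial Intelligence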
  • Abstract: In this paper, we propose new methods and give a system, called IFMAP, for extracting interesting patterns from long sequential data based on frequency and self-information, and we experimentally evaluate the proposed methods on a newspaper article corpus. Sequential data mining methods based on frequency have been studied intensively so far. These methods, however, are neither effective nor valuable for applications in which almost all high-frequency patterns should be regarded as meaningless noise. The concept of information gain is important for suppressing these noisy patterns, and its integration with a frequency criterion has already been studied. Yang et al. gave a sequential mining system, InfoMiner, which can find periodic synchronous patterns that are interesting and well balanced from the viewpoints of both frequency and self-information. In this paper, we refine and extend the InfoMiner technologies in the following respects: first, our method can handle ordinary, i.e., asynchronous and non-periodic, patterns by using a sliding window mechanism, whereas InfoMiner cannot; second, we give several combination measures for choosing valuable patterns based on frequency and self-information, whereas InfoMiner has just one measure, which, as we show in this paper, is neither appropriate nor effective for handling newspaper article corpora; third, we propose a new unified method for pruning the search space of sequential data mining, which can be applied uniformly to any of the combination measures proposed here. We conduct experiments to evaluate the effectiveness and efficiency of the proposed method with respect to runtime and the amount of noisy patterns excluded.
  • Keywords: pattern mining ; self-information ; frequency ; sequential data mining