Abstract: Sequential pattern mining has received considerable attention in recent years, but several challenges remain unresolved, such as large search spaces and poor performance on highly similar, dense, and long sequences. This paper focuses on designing effective search-space pruning methods to accelerate the mining process. We present a novel structure, the Prefix-Frequent-Items Graph (PFI-Graph), which represents, for each item, its prefix frequent items in sequential patterns. Based on the PFI-Graph, we propose an efficient algorithm, PFI-PrefixSpan (Prefix-Frequent-Items PrefixSpan). It avoids redundant data scanning and thus speeds up the discovery of new patterns. Extensive experiments on synthetic and real sequence datasets show that the proposed algorithm is substantially more efficient than PrefixSpan with either physical projection or pseudo-projection, especially on dense and highly similar sequence databases.
Keywords: sequential pattern mining; dense database; highly similar sequence; long sequence; prefix frequent items