Abstract: With the explosive growth of data available on the World Wide Web, discovering and analyzing useful
information from the Web has become a practical necessity. A web access pattern, that is, a sequence of
accesses that users follow frequently, is an interesting and useful kind of knowledge in practice. Sequential pattern
mining is the process of applying data mining techniques to a sequence database in order to discover the
correlations that exist among ordered lists of events. Web access pattern tree (WAP-tree) mining is a
sequential pattern mining technique for web log access sequences. It first stores the original web access
sequence database in a prefix tree, similar to the frequent pattern tree (FP-tree) used to store non-sequential data.
The WAP-tree algorithm then mines the frequent sequences from the WAP-tree by recursively reconstructing
intermediate trees, starting with suffix sequences and ending with prefix sequences. An attempt has been made to
modify the WAP-tree approach to improve its efficiency. The modified approach, mWAP, eliminates the need for
repeated reconstruction of intermediate WAP-trees during mining and considerably reduces execution time.
Keywords: WAP-tree, data mining, sequential data mining, frequent pattern tree
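
To make the storage step concrete, the following is a minimal Python sketch of loading access sequences into a prefix tree with per-node counts. It is a simplified illustration only, not the WAP-tree or mWAP algorithm itself: it omits the filtering of infrequent events, the header links, and the mining phase. The names `Node` and `build_prefix_tree` and the sample sequences are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the prefix tree: an event label and a visit count."""
    label: str
    count: int = 0
    children: dict = field(default_factory=dict)


def build_prefix_tree(sequences):
    """Insert each access sequence along a path from the root, sharing common prefixes."""
    root = Node(label="<root>")
    for seq in sequences:
        node = root
        for event in seq:
            child = node.children.get(event)
            if child is None:
                # First time this event follows the current prefix.
                child = Node(label=event)
                node.children[event] = child
            child.count += 1  # one more sequence passes through this prefix
            node = child
    return root


# Made-up access sequences; each letter stands for one page (assumption).
sequences = ["abdac", "abcac", "babfa"]
tree = build_prefix_tree(sequences)
```

Sequences that begin with the same pages share a single path, so the count on each node records how many sequences contain that prefix.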