Article Information

Title: Mining Approximate Frequent Itemsets Using Pattern Growth Approach
Authors: Shariq Bashir; Daphne Teck Ching Lai
Journal: Engineering Economics
Print ISSN: 2029-5839
Publication year: 2021
Volume: 50
Issue: 4
DOI: 10.5755/j01.itc.50.4.29060
Language: English
Publisher: Kaunas University of Technology
Abstract: Approximate frequent itemset (AFI) mining from noisy databases is computationally more expensive than traditional frequent itemset mining, because AFI mining algorithms generate a large number of candidate itemsets. This article proposes an algorithm that mines AFIs using a pattern growth approach. Its major contribution is that it mines core patterns and examines the approximate conditions of candidate AFIs directly, in a single phase and with two full scans of the database. Related algorithms apply an Apriori-based candidate generate-and-test approach and require multiple phases to obtain the complete set of AFIs: the first phase generates core patterns, and the second phase examines the approximate conditions of those core patterns. Specifically, the article proposes novel techniques for mapping transactions onto an approximate FP-tree and for mining AFIs from the conditional patterns of the approximate FP-tree. The approximate FP-tree maps transactions onto shared branches when the transactions share a similar set of items. This reduces the size of the database and helps to compute the approximate conditions of candidate itemsets efficiently. We compare the performance of our algorithm with state-of-the-art AFI mining algorithms on benchmark databases. The experiments compare the processing time of the algorithms and their scalability with varying database size and transaction length. The results show that the pattern growth approach mines AFIs in less processing time than related Apriori-based algorithms.
Keywords: Data Mining; Pattern Growth; Association Rules Mining; Frequent Itemset Mining; Approximate Frequent Itemset Mining; Data Science
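
Illustrative note: the abstract says that transactions sharing similar items are mapped onto shared branches of a tree built with two database scans. The sketch below shows that idea for the classic, exact FP-tree only; it is not the article's approximate FP-tree, and it does not check any approximate (noise-tolerant) conditions. The names `Node`, `build_fp_tree`, and `count_nodes`, as well as the toy transactions, are hypothetical and chosen for illustration.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, a count, and child nodes keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_support}

    root = Node(None)
    # Scan 2: insert each transaction with its frequent items sorted by
    # descending support (ties broken by item name), so that transactions
    # with common items follow the same branch from the root.
    for t in transactions:
        items = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-counts[item], item),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
tree = build_fp_tree(transactions, min_support=2)
print(count_nodes(tree))  # number of tree nodes after prefix sharing
```

In this toy database the five transactions contain 12 item occurrences in total, while the tree stores them in fewer nodes because common prefixes are merged. That compression is what the abstract refers to when it says the tree reduces the size of the database.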