Journal: International Journal of Database Management Systems
Print ISSN: 0975-5985
Electronic ISSN: 0975-5705
Publication year: 2011
Volume: 3
Issue: 3
DOI: 10.5121/ijdms.2011.3302
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Finding frequent itemsets in a data source is a fundamental operation behind Association Rule Mining. Most algorithms use either a bottom-up or a top-down approach to find these frequent itemsets. When the frequent itemsets to be found are long, traditional algorithms enumerate all frequent itemsets from length 1 to length n, which is a costly process. This problem can be solved by mining only the Maximal Frequent Itemsets (MFS). Maximal Frequent Itemsets are frequent itemsets that have no proper frequent superset. Generating only the maximal frequent itemsets therefore reduces both the number of itemsets and the time needed to generate all frequent itemsets, because each maximal itemset of length m implies the presence of 2^m - 2 further frequent itemsets (its non-empty proper subsets). Furthermore, mining only the maximal frequent itemsets is sufficient in many data mining applications, such as minimal key discovery and theory extraction. In this paper, we suggest a novel method for finding the maximal frequent itemsets in huge data sources using the concept of segmentation of the data source and prioritization of segments. Empirical evaluation shows that this method outperforms various other known methods.
Keywords: Knowledge discovery in data sources; maximal frequent itemset; association rules; data mining; segmentation
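
The definition of a maximal frequent itemset given in the abstract, and the 2^m - 2 subset count, can be illustrated with a minimal brute-force sketch. This is not the segmentation-and-prioritization method the paper proposes, which the abstract does not describe in detail; it only enumerates itemsets exhaustively on a toy data set. The function names, the sample transactions, and the support threshold of 2 are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration (illustrative only, exponential in the item count).

    Returns a dict mapping each frequent itemset to its support count.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            if count >= min_support:
                frequent[cset] = count
                found_any = True
        if not found_any:
            # Downward closure: if no k-itemset is frequent, no larger one is.
            break
    return frequent


def maximal_frequent_itemsets(frequent):
    """Keep only frequent itemsets that have no proper frequent superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


def proper_nonempty_subsets(itemset):
    """All non-empty proper subsets of an itemset; there are 2^m - 2 of them."""
    elems = sorted(itemset)
    return [frozenset(c) for k in range(1, len(elems)) for c in combinations(elems, k)]


# Hypothetical toy data: four transactions over the items a, b, c, d.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("abd"), frozenset("cd")]
frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_frequent_itemsets(frequent)

abc = frozenset("abc")
implied = proper_nonempty_subsets(abc)
print(sorted("".join(sorted(s)) for s in maximal))
print(len(implied), all(s in frequent for s in implied))
```

On this toy data the maximal itemset {a, b, c} has m = 3, so it accounts for 2^3 - 2 = 6 smaller frequent itemsets without their being listed separately, which is the saving the abstract refers to.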