Journal title: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Electronic ISSN: 0976-8491
Publication year: 2012
Volume: III
Issue: I – Ver 4
Publisher: Ayushmaan Technologies
Abstract: This paper describes the issues and remedies in mining distributed databases. Directly applying sequential algorithms to distributed databases is not effective, as it incurs a large communication overhead. In this paper, an efficient algorithm for mining distributed databases is proposed. It minimizes the number of candidate sets and exchanged messages through local and global pruning. At each local site, it runs an application based on the improved C-Matrix algorithm, which is used to calculate local support counts. By numbering the global frequent itemsets generated at the end of the k-th iteration from 1 to m, the algorithm encodes every candidate (k+1)-itemset as a pair of those numbers, of the form (x, y), to compress the transmitted content and to query the corresponding support counts in the C-Matrix. This approach also reduces the average transaction size and the size of the datasets, which in turn reduces scan time. The performance study shows that the proposed algorithm has superior running efficiency, lower communication cost, and stronger scalability than the direct application of a sequential algorithm in distributed databases.
Keywords: Association Rule Mining; Distributed Database; C-Matrix; Meta-vector; Global Frequent Itemsets
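
The pair encoding mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two numbers x and y are chosen or how the C-Matrix is laid out, so the code below assumes a standard Apriori-style join (two frequent k-itemsets that share their first k-1 items form one candidate) and leaves the C-Matrix out entirely. All function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def number_itemsets(frequent):
    """Assign the numbers 1..m to the global frequent k-itemsets.

    Itemsets are sorted first so that every site derives the same numbering.
    (Assumption: the paper's actual numbering scheme is not given.)
    """
    ordered = sorted(tuple(sorted(s)) for s in frequent)
    return {i + 1: s for i, s in enumerate(ordered)}


def encode_candidates(numbered):
    """Return each candidate (k+1)-itemset as a pair (x, y) of numbers.

    Assumed join rule (classic Apriori): itemsets x and y are joined
    when they agree on their first k-1 items.
    """
    pairs = []
    for (x, a), (y, b) in combinations(sorted(numbered.items()), 2):
        if a[:-1] == b[:-1]:
            pairs.append((x, y))
    return pairs


def decode(pair, numbered):
    """Recover the (k+1)-itemset that a pair (x, y) stands for."""
    x, y = pair
    return tuple(sorted(set(numbered[x]) | set(numbered[y])))


# Hypothetical global frequent 2-itemsets after the second iteration.
frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
numbered = number_itemsets(frequent_2)
pairs = encode_candidates(numbered)
```

Under these assumptions a site transmits two small integers per candidate instead of k+1 item identifiers, which is the compression effect the abstract describes. The receiving site rebuilds the itemset with `decode` using the numbering that all sites share.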