Journal title: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Electronic ISSN: 0976-8491
Publication year: 2012
Volume: 3
Issue: 1
Pages: 794-797
Language: English
Publisher: Ayushmaan Technologies
Abstract: This paper describes the issues and remedies in mining distributed databases. A direct application of sequential algorithms to distributed databases is not effective, as it requires a large amount of communication overhead. In this paper, an efficient algorithm for mining distributed databases is proposed. It minimizes the number of candidate sets and exchanged messages through local and global pruning. At local sites, it runs an application based on the improved algorithm, C Matrix, which is used to calculate local support counts. By numbering the global frequent itemsets generated at the end of the k-th iteration from 1 to m, the algorithm codes every candidate (k+1)-itemset as a pair of those numbers, of the form (x, y), to compress the content transmitted and to query the corresponding support counts in C Matrix. This approach also reduces the average transaction size and the dataset size, which leads to a reduction in scan time. The performance study shows that the proposed algorithm has superior running efficiency, lower communication cost, and stronger scalability than direct application of a sequential algorithm in distributed databases.
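The pair-coding idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify how candidates are joined or how C Matrix is laid out. The sketch assumes an Apriori-style join (two frequent k-itemsets that differ in exactly one item form a candidate) and models the matrix as a dictionary keyed by (x, y). The function names `number_itemsets`, `encode_candidates`, and `local_c_matrix` are hypothetical.

```python
from itertools import combinations


def number_itemsets(frequent):
    """Number the global frequent k-itemsets from 1 to m (sorted for determinism)."""
    ordered = sorted(tuple(sorted(s)) for s in frequent)
    return {i + 1: frozenset(s) for i, s in enumerate(ordered)}


def encode_candidates(numbered):
    """Map each candidate (k+1)-itemset to a pair (x, y) with x < y.

    Assumption: a candidate is the union of two frequent k-itemsets that
    differ in exactly one item. Only the first pair that yields a given
    candidate is kept, so each candidate is transmitted once.
    """
    codes = {}
    for x, y in combinations(sorted(numbered), 2):
        union = numbered[x] | numbered[y]
        if len(union) == len(numbered[x]) + 1 and union not in codes:
            codes[union] = (x, y)
    return codes


def local_c_matrix(transactions, numbered, pairs):
    """Count, for each pair (x, y), the local transactions containing both itemsets.

    Assumption: this dictionary stands in for the paper's C Matrix; entry
    (x, y) is the local support count of the candidate coded as (x, y).
    """
    counts = {pair: 0 for pair in pairs}
    for transaction in transactions:
        items = set(transaction)
        present = {n for n, s in numbered.items() if s <= items}
        for x, y in counts:
            if x in present and y in present:
                counts[(x, y)] += 1
    return counts
```

Under these assumptions, a site would send only the small integer pairs and their counts instead of full itemsets, which is the compression effect the abstract describes.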