Journal title: International Journal of Software Engineering and Its Applications
Print ISSN: 1738-9984
Publication year: 2008
Volume: 2
Issue: 2
Publisher: SERSC
Abstract: In this paper, we propose a new boosting algorithm for distributed databases. The main idea is to exploit the parallelism of distributed databases to build an ensemble of classifiers. In each round, every site processes its own data locally and computes the information needed for that round. A central site collects this information from all sites and builds a global classifier, which becomes one member of the ensemble. Each site then uses this global classifier to compute the information required for the next round. Repeating this process produces, from the distributed databases, an ensemble of classifiers that is almost identical to the one built on the whole data. Experiments were performed on 5 different datasets from the UCI repository [9]. The results show that the accuracy of the proposed algorithm is almost equal to or higher than that of the boosting algorithm applied to the whole database.
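
The round structure described in the abstract can be illustrated with a minimal sketch. The abstract does not say which base learner is used or exactly what information the sites send, so everything specific here is an assumption: an AdaBoost-style weight update, decision stumps as the weak classifiers, a candidate stump list shared by all sites, and per-site weighted error sums as the exchanged information. The names `Site`, `distributed_boost`, `stump_predict`, and `predict` are hypothetical and do not come from the paper.

```python
import math

def stump_predict(stump, x):
    """Predict +1 or -1 with a one-feature threshold rule."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

class Site:
    """One distributed site (hypothetical): local data plus local example weights."""
    def __init__(self, X, y):
        self.X, self.y = X, y
        self.w = [1.0] * len(y)  # unnormalized weights; they never leave the site

    def local_stats(self, candidates):
        """Return (local total weight, local weighted error of each candidate)."""
        errors = [
            sum(w for x, t, w in zip(self.X, self.y, self.w)
                if stump_predict(c, x) != t)
            for c in candidates
        ]
        return sum(self.w), errors

    def update_weights(self, stump, alpha):
        """Reweight local examples using the global classifier sent by the center."""
        self.w = [w * math.exp(-alpha * t * stump_predict(stump, x))
                  for x, t, w in zip(self.X, self.y, self.w)]

def distributed_boost(sites, candidates, rounds):
    """Center site (assumed AdaBoost-style): aggregate statistics, pick one stump per round."""
    ensemble = []
    for _ in range(rounds):
        stats = [s.local_stats(candidates) for s in sites]
        total = sum(tw for tw, _ in stats)
        # Global weighted error = sum of local error sums / sum of local weights.
        global_err = [sum(errs[i] for _, errs in stats) / total
                      for i in range(len(candidates))]
        best = min(range(len(candidates)), key=global_err.__getitem__)
        eps = min(max(global_err[best], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((candidates[best], alpha))
        for s in sites:  # broadcast the global classifier for the next round
            s.update_weights(candidates[best], alpha)
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * stump_predict(stump, x) for stump, alpha in ensemble)
    return 1 if score > 0 else -1
```

Under these assumptions only aggregate numbers travel to the center, never raw records. Because the global weighted error is a sum over examples, adding up the per-site sums gives the same quantity a single site holding all the data would compute, up to floating-point rounding. Calling `distributed_boost` with a single `Site` that holds the whole dataset gives a centralized baseline for comparison.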