Journal: Indian Journal of Computer Science and Engineering
Print ISSN: 2231-3850
Online ISSN: 0976-5166
Year: 2021
Volume: 12
Issue: 2
Pages: 445-455
DOI: 10.21817/indjcse/2021/v12i2/211202131
Publisher: Engg Journals Publications
Abstract: Class imbalance is common in real-world datasets and is an active research topic in machine learning and data mining. This paper proposes a hybrid sampling algorithm that handles class imbalance with minimal loss of existing information and minimal additional computational cost. The algorithm undersamples the majority class by identifying within-class sub-concepts and then deriving an optimal subset of majority-class instances using ant colony optimization. This optimal majority subset is combined with the minority-class data to form a representative training dataset. If the representative training dataset is still imbalanced, new synthetic minority-class instances are created. The novelty of the proposed hybrid sampling algorithm lies in how it selects majority-class instances for imbalanced data classification. The algorithm is evaluated on 15 imbalanced datasets, and the experimental results show that it performs well on two performance measures, G-Mean and AUC. When compared with popular sampling ensemble algorithms, the proposed algorithm outperforms them.
Keywords: Imbalanced data; Classification; k-means clustering; Hybrid sampling; Ant colony optimization
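The abstract names the stages of the hybrid sampler (k-means sub-concepts, ant colony selection of a majority subset, synthetic minority oversampling) but not their details. The following NumPy sketch shows one way these stages could fit together. It is not the authors' implementation, and every specific choice is an assumption: all function names are hypothetical, k=3 clusters is arbitrary, the ant fitness is the 1-nearest-neighbour G-Mean on the training set, pheromone starts with equal total mass per cluster, and the oversampling step is a swapped-in SMOTE-style interpolation.

```python
import numpy as np

def kmeans_labels(X, k, iters=50, rng=None):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = rng if rng is not None else np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def g_mean(y_true, y_pred):
    """Geometric mean of true-positive and true-negative rates (minority = 1)."""
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return float(np.sqrt(tpr * tnr))

def one_nn_predict(X_train, y_train, X_query):
    """1-nearest-neighbour label for each query row."""
    dist = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[dist.argmin(axis=1)]

def aco_select(X_maj, X_min, m, k=3, ants=10, rounds=20, rho=0.1, rng=None):
    """Hypothetical simplified ACO: pick m majority rows maximising 1-NN G-Mean."""
    rng = rng if rng is not None else np.random.default_rng(0)
    labels = kmeans_labels(X_maj, k, rng=rng)
    # Assumption: initial pheromone gives every sub-concept equal total mass,
    # so small clusters are not drowned out by large ones.
    sizes = np.bincount(labels, minlength=k)
    tau = 1.0 / sizes[labels]
    X_all = np.vstack([X_maj, X_min])
    y_all = np.r_[np.zeros(len(X_maj), int), np.ones(len(X_min), int)]
    best_idx, best_fit = None, -1.0
    for _ in range(rounds):
        for _ in range(ants):
            # Each ant samples m distinct majority rows with pheromone-weighted odds.
            idx = rng.choice(len(X_maj), size=m, replace=False, p=tau / tau.sum())
            X_tr = np.vstack([X_maj[idx], X_min])
            y_tr = np.r_[np.zeros(m, int), np.ones(len(X_min), int)]
            fit = g_mean(y_all, one_nn_predict(X_tr, y_tr, X_all))
            if fit > best_fit:
                best_idx, best_fit = idx, fit
        tau *= 1.0 - rho           # evaporation
        tau[best_idx] += best_fit  # reinforce the best subset found so far
    return best_idx, best_fit

def smote_like(X_min, n_new, k=5, rng=None):
    """SMOTE-style: interpolate a minority row toward a random near minority neighbour."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)  # needs at least two minority rows
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

def hybrid_sample(X_maj, X_min, m, rng=None):
    """Undersample the majority class to m rows, then oversample the minority if needed."""
    rng = rng if rng is not None else np.random.default_rng(0)
    X_maj = np.asarray(X_maj, dtype=float)
    X_min = np.asarray(X_min, dtype=float)
    idx, _ = aco_select(X_maj, X_min, m, rng=rng)
    deficit = m - len(X_min)
    if deficit > 0:  # imbalance remains in the representative training set
        X_min = np.vstack([X_min, smote_like(X_min, deficit, rng=rng)])
    X = np.vstack([X_maj[idx], X_min])
    y = np.r_[np.zeros(m, int), np.ones(len(X_min), int)]
    return X, y

rng = np.random.default_rng(42)
X_maj = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(6, 1, (50, 2))])
X_min = rng.normal(3, 0.5, (20, 2))
X, y = hybrid_sample(X_maj, X_min, m=40, rng=rng)
print(X.shape, np.bincount(y))  # (80, 2) [40 40]
```

In this toy run, the two majority sub-concepts (150 and 50 rows) are reduced to 40 rows chosen by the ant search, and the 20 minority rows are topped up with 20 interpolated rows. Seeding pheromone per cluster rather than uniformly is one plausible way to keep the smaller sub-concept represented.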