期刊名称:The International Arab Journal of Information Technology
印刷版ISSN:1683-3198
出版年度:2021
卷号:18
期号:1
DOI:10.34028/iajit/18/1/12
语种:English
出版社:Zarqa Private University
摘要:One of the noteworthy difficulties in the classification of nonstationary data is handling data with class imbalance. Imbalanced data possess the characteristics of having a lot of samples of one class than the other. It, thusly, results in the biased accuracy of a classifier in favour of a majority class. Streaming data may have inherent imbalance resulting from the nature of dataspace or extrinsic imbalance due to its nonstationary environment. In streaming data, timely varying class priors may lead to a shift in imbalance ratio. The researchers have contemplated ensemble learning, online learning, issue of class imbalance and cost-sensitive algorithms autonomously. They have scarcely ever tended to every one of these issues mutually to deal with imbalance shift in nonstationary data. This correspondence shows a novel methodology joining these perspectives to augment G-mean in no stationary data with Recurrent Imbalance Shifts (RIS). This research modifies the state-of-the-art boosting algorithms,1) AdaC2 to get G-mean based Online AdaC2 for Recurrent Imbalance Shifts (GOA-RIS) and AGOA-RIS (Ageing and G-mean based Online AdaC2 for Recurrent Imbalance Shifts), and 2) CSB2 to get G-mean based Online CSB2 for Recurrent Imbalance Shifts (GOC-RIS) and Ageing and G-mean based Online CSB2 for Recurrent Imbalance Shifts (AGOC-RIS). The study has empirically and statistically analysed the performances of the proposed algorithms and Online AdaC2 (OA) and Online CSB2 (OC) algorithms using benchmark datasets. The test outcomes demonstrate that the proposed algorithms globally beat the performances of OA and OC.
关键词:Cost-sensitive algorithms;data stream classification;imbalanced data;online learning;population shift;skewed data stream