Journal: International Journal of Applied Mathematics and Computer Science
Electronic ISSN: 2083-8492
Publication year: 2019
Volume: 29
Issue: 4
Pages: 1-13
DOI: 10.2478/amcs-2019-0057
Publisher: De Gruyter Open
Abstract: The relations between multiple imbalanced classes can be handled with a specialized approach which evaluates types of examples’ difficulty based on an analysis of the class distribution in the examples’ neighborhood, additionally exploiting information about the similarity of neighboring classes. In this paper, we demonstrate that such an approach can be implemented as a data preprocessing technique and that it can improve the performance of various classifiers on multiclass imbalanced datasets. It has led us to the introduction of a new resampling algorithm, called Similarity Oversampling and Undersampling Preprocessing (SOUP), which resamples examples according to their difficulty. Its experimental evaluation on real and artificial datasets has shown that it is competitive with the most popular decomposition ensembles and better than specialized preprocessing techniques for multi-imbalanced problems.
Keywords: imbalanced data; multi-class learning; re-sampling; data difficulty factors; similarity degrees