期刊名称:International Journal of Electrical and Computer Engineering
电子版ISSN:2088-8708
出版年度:2020
卷号:10
期号:4
页码:4331-4339
DOI:10.11591/ijece.v10i4.pp4331-4339
出版社:Institute of Advanced Engineering and Science (IAES)
摘要:STORET is one method to determine the river water quality into four classes (very good , good, medium and bad) based on the data of water for each attribute or feature. The success of the formation of pattern recognition model much depends on the quality of data. There are two issues as the concern of this research as follows: the data having disproportionate amount among the classes (imbalance class) and the finding of noise on its attribute. Therefore, this research integrates the SMOTE Technique and bootstrapping to handle the problem of imbalance class. While an experiment is conducted to eliminate the noise on the attribute by using some feature selection algorithms with filter approach (information gain, rule, derivation, correlation and chi square). This research has some stages as follows: data understanding, pre-processing, imbalance class, feature selection, classification and performance evaluation. Based on the result of testing using 10-fold cross validation, it shows that the use of the SMOTE-bootstrapping technique is able to increase the accurate value from 83.3% to be 98.8%. While the process of noise elimination on the data attribute is also able to increase the accuracy to be 99.5% (the use of feature subset produced by the information gain algorithm and the decision tree classification algorithm).
关键词:Feature selection water quality status;Imbalance class;Bootstrapping;SMOTE;STORET