Journal: BVICAM's International Journal of Information Technology
Print ISSN: 0973-5658
Year: 2013
Volume: 5
Issue: 2
Language: English
Publisher: Bharati Vidyapeeth's Institute of Computer Applications and Management
Abstract: Data preprocessing is a very important task in machine learning applications. It includes the methods of data cleaning, normalization, integration, transformation, reduction, feature extraction and selection. Feature selection is the technique of selecting a smaller feature subset from the superset of original features/attributes in order to discard irrelevant and redundant features/attributes in the dataset and thereby increase the accuracy of machine learning algorithms. However, a problem arises when further removal of features causes the accuracy to decrease. Therefore, we need to find an optimal subset of features, neither too large nor too small, from the superset of original features. This paper reviews the different feature selection methods (filter, wrapper and embedded) that help in selecting optimal feature subsets. Further, the paper shows the effects of feature selection on different machine learning algorithms (NaiveBayes, RandomForest and kNN). The results show different effects on accuracy when features are selected at different margins.
Keywords: Data preprocessing; feature extraction; feature selection; dataset.
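As a hedged illustration of the idea described in the abstract (not the paper's actual experimental setup), the sketch below applies a filter-style feature selector at several subset sizes and compares the cross-validated accuracy of NaiveBayes, RandomForest and kNN classifiers. The dataset, subset sizes, scoring function and all parameter values are assumptions chosen for a self-contained example.

```python
# Minimal sketch of filter-based feature selection at different "margins"
# (subset sizes). Dataset and parameters are illustrative assumptions,
# not the paper's experimental configuration.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_breast_cancer(return_X_y=True)  # 30 original features

classifiers = {
    "NaiveBayes": GaussianNB(),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}

# Keep the top-k features by mutual information, then train each classifier.
# Accuracy typically rises as redundant features are dropped, then falls
# once informative features start being removed.
for k in (30, 20, 10, 5):
    print(f"--- top {k} features ---")
    for name, clf in classifiers.items():
        model = make_pipeline(SelectKBest(mutual_info_classif, k=k), clf)
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        print(f"{name:12s} accuracy = {acc:.3f}")
```

The selector is wrapped in a pipeline so that feature scoring is refit inside each cross-validation fold, which avoids leaking information from the held-out data into the selection step.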