Article Information

Title: Efficient Feature Subset Selection Algorithm for High Dimensional Data
Authors: Smita Chormunge; Sudarson Jena
Journal: International Journal of Electrical and Computer Engineering
Electronic ISSN: 2088-8708
Year: 2016
Volume: 6
Issue: 4
Pages: 1880-1888
DOI: 10.11591/ijece.v6i4.pp1880-1888
Language: English
Publisher: Institute of Advanced Engineering and Science (IAES)
Abstract: Feature selection addresses the dimensionality problem by removing irrelevant and redundant features. Existing feature selection algorithms take considerable time to obtain a feature subset for high-dimensional data. This paper proposes a feature selection algorithm for high-dimensional data based on information gain, termed IFSA (Information gain based Feature Selection Algorithm), which produces an optimal feature subset in less time and improves the computational performance of learning algorithms. IFSA works in two stages: first, a filter is applied to the dataset; second, a small feature subset is produced using the information gain measure. Extensive experiments compare the proposed algorithm with other methods using two classifiers (Naive Bayes and IBK) on microarray and text datasets. The results demonstrate that IFSA not only produces a highly selective feature subset efficiently but also improves classifier performance.
Keywords: Computer and Informatics; feature selection; information gain; filters; naive Bayes; k-nearest neighbors; classifiers
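The abstract describes ranking features by information gain, IG(Y; X) = H(Y) - H(Y | X), after a filtering stage. The following is a minimal sketch of that general idea, not the authors' IFSA: the abstract does not say which filter is applied or how the subset size is chosen, so this sketch assumes discrete feature values, uses a hypothetical "drop constant features" filter as stage one, and keeps a fixed number `k` of top-ranked features in stage two. The function names `entropy`, `information_gain`, and `select_features` are illustrative only.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        # Labels of the rows where the feature takes this value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def select_features(rows, labels, k):
    """Hypothetical two-stage selection (assumption, not IFSA itself).

    Stage 1 (assumed filter): drop features that take a single value.
    Stage 2: rank the survivors by information gain and keep the top k.
    Returns the column indices of the selected features.
    """
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    survivors = [j for j in range(n_features) if len(set(columns[j])) > 1]
    gains = {j: information_gain(columns[j], labels) for j in survivors}
    ranked = sorted(survivors, key=lambda j: gains[j], reverse=True)
    return ranked[:k]
```

For example, with `rows = [(1, 0, 'a'), (1, 1, 'a'), (0, 0, 'a'), (0, 1, 'a')]` and `labels = ['pos', 'pos', 'neg', 'neg']`, column 2 is removed by the constant-feature filter, column 0 separates the classes perfectly (gain 1 bit), and column 1 carries no information (gain 0), so `select_features(rows, labels, k=1)` returns `[0]`. A real pipeline for microarray data would also need to discretize continuous expression values before computing entropies.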