Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Online ISSN: 1817-3195
Year: 2017
Volume: 95
Issue: 24
Pages: 7020
Publisher: Journal of Theoretical and Applied
Abstract: Dimensionality reduction or feature selection is an essential pre-processing step before applying machine learning algorithms to a data set. For medium-dimensional data sets it is optional, an on-demand requirement, but for high-dimensional data sets it is mandatory, and its importance grows when accurate and relevant output is expected from the machine learning algorithm. Most existing methods fall into two types, dimensionality reduction and feature selection, and the gap between them is narrow. Dimensionality reduction relies more on mathematical analysis with transformations, and the resulting features may or may not be a subset of the original features. Feature selection is an application of feature engineering and requires domain knowledge. However, any algorithm applied to high-dimensional data needs more processing time and storage resources. We take processing time as the basis of our problem statement and implement a distributed feature selection algorithm for high-dimensional data, named the Distributed Progressive Feature Selection algorithm with KNN+ReliefF. In this paper, the MapReduce concept is applied to select the final subset of relevant features in a progressive manner. Simulation results show the selected features with their weights for various parameters.
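To make the combination of KNN+ReliefF weighting and MapReduce more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes numeric features only, uses every instance in a partition as a ReliefF sample, searches the k nearest hits and misses only inside each partition, and normalises feature differences by each partition's local min/max range. The "map" step computes partial weight sums per partition, and the "reduce" step adds them and divides by the total instance count. The progressive selection stage described in the abstract is not modelled, and the function names (`relieff_partition`, `combine`, `distributed_relieff`) are hypothetical.

```python
from collections import Counter
from functools import reduce


def diff(a, b, lo, hi):
    """Normalised difference of one numeric feature (0 when the range is zero)."""
    return 0.0 if hi == lo else abs(a - b) / (hi - lo)


def relieff_partition(rows, labels, k):
    """Map step (hypothetical): ReliefF weight sums over one data partition.

    Assumption of this sketch: neighbours are searched only inside this
    partition, and min/max normalisation uses this partition's ranges.
    Returns (weight_sums, instance_count) so the reducer can average.
    """
    n_feat = len(rows[0])
    lo = [min(r[f] for r in rows) for f in range(n_feat)]
    hi = [max(r[f] for r in rows) for f in range(n_feat)]
    prior = {c: cnt / len(labels) for c, cnt in Counter(labels).items()}

    def dist(x, y):
        # Manhattan distance over normalised feature differences
        return sum(diff(x[f], y[f], lo[f], hi[f]) for f in range(n_feat))

    w = [0.0] * n_feat
    for i, (r, c) in enumerate(zip(rows, labels)):
        for cls in prior:
            # k nearest neighbours of r within class `cls`, excluding r itself
            cand = [rows[j] for j in range(len(rows)) if j != i and labels[j] == cls]
            near = sorted(cand, key=lambda x: dist(r, x))[:k]
            if not near:
                continue
            # Hits lower the weight; misses raise it, scaled by the class prior
            scale = -1.0 if cls == c else prior[cls] / (1.0 - prior[c])
            for f in range(n_feat):
                total = sum(diff(r[f], h[f], lo[f], hi[f]) for h in near)
                w[f] += scale * total / len(near)
    return w, len(rows)


def combine(a, b):
    """Reduce step (hypothetical): add partial weight sums and instance counts."""
    (wa, na), (wb, nb) = a, b
    return [x + y for x, y in zip(wa, wb)], na + nb


def distributed_relieff(partitions, k=3):
    """Run the map step on each (rows, labels) partition, then reduce and average."""
    partials = map(lambda p: relieff_partition(p[0], p[1], k), partitions)
    w, n = reduce(combine, partials)
    return [x / n for x in w]


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    parts = [
        ([[0, 5], [1, 1], [10, 4], [11, 2]], [0, 0, 1, 1]),
        ([[0, 3], [2, 0], [12, 5], [10, 1]], [0, 0, 1, 1]),
    ]
    print(distributed_relieff(parts, k=1))  # roughly [0.72, -0.45]
```

On this toy input the class-separating feature receives a clearly positive weight and the noise feature a negative one, so ranking by weight and keeping the top features yields the selected subset. Searching neighbours only within a partition keeps each mapper independent, which is what makes the map step parallel, at the cost of approximating the neighbours a single-machine ReliefF would find.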