Basic Article Information

Title: Survey On: Comparison of Clustering Based Feature Subset Selection Algorithms for High Dimensional Data
Authors: Vishnu M. Tore; Prof. P. M. Chawan; Prof. S. A. Khedkar et al.
Journal: International Journal of Innovative Research in Science, Engineering and Technology
Print ISSN: 2347-6710
Online ISSN: 2319-8753
Publication year: 2016
Volume: 5
Issue: 2
Page: 1505
DOI: 10.15680/IJIRSET.2016.0502046
Publisher: S&S Publications
Abstract: In data mining, feature selection is widely used to prepare high-dimensional input data for effective mining. Feature selection identifies the most relevant features among all available ones. A selection algorithm can be evaluated in terms of effectiveness and efficiency: effectiveness concerns the quality of the selected feature subset, while efficiency concerns the time required to find it. The main premise of feature selection is that the input data contains many redundant as well as irrelevant features. Existing algorithms have drawbacks: some eliminate irrelevant features but fail to handle redundant ones, while others remove irrelevant features and also take care of redundant ones. Based on these constraints, a fast clustering-based feature selection algorithm (FAST) is proposed. The FAST algorithm has two steps: (1) features are divided into clusters using non-linear clustering methods; (2) the most relevant feature, the one most strongly related to the target classes, is selected from each cluster to form a subset of features. For the clustering, we use an MST (Minimum Spanning Tree) based method built with Kruskal's algorithm.
Keywords: Data mining; Feature subset selection; Feature selection; relevant features; redundant features
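
The two steps named in the abstract (cluster the features with a Kruskal-built minimum spanning tree, then keep the most relevant feature from each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how relevance or redundancy is measured, so absolute Pearson correlation is used here as a stand-in. The edge-cut threshold `cut` and the names `abs_corr`, `DisjointSet`, `kruskal_mst`, and `select_features` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation (assumed stand-in for the paper's measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov / sqrt(vx * vy))


class DisjointSet:
    """Union-find structure used by Kruskal's algorithm."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # joining a and b would close a cycle
        self.parent[ra] = rb
        return True


def kruskal_mst(n, edges):
    """Return the MST edges of a graph given as (weight, u, v) tuples."""
    ds = DisjointSet(n)
    return [(w, u, v) for w, u, v in sorted(edges) if ds.union(u, v)]


def select_features(columns, target, cut=0.5):
    """Cluster features via an MST, then keep one feature per cluster.

    columns: list of feature columns (each a list of numbers).
    cut: assumed distance threshold; longer MST edges are removed.
    """
    n = len(columns)
    # Complete graph over features; distance = 1 - |correlation|.
    edges = [(1.0 - abs_corr(columns[i], columns[j]), i, j)
             for i, j in combinations(range(n), 2)]
    mst = kruskal_mst(n, edges)

    # Step 1: dropping long MST edges splits the tree into clusters.
    ds = DisjointSet(n)
    for w, u, v in mst:
        if w <= cut:
            ds.union(u, v)
    clusters = {}
    for i in range(n):
        clusters.setdefault(ds.find(i), []).append(i)

    # Step 2: from each cluster keep the feature most related to the target.
    return sorted(max(members, key=lambda i: abs_corr(columns[i], target))
                  for members in clusters.values())
```

Called as `select_features(columns, target)`, the function returns the indices of the retained features. Two nearly collinear columns end up in the same cluster, so only the one more strongly correlated with the target survives, which is how redundant features are dropped in this sketch.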