Basic Article Information

Title: An Approach for Hybrid Cluster Based Feature Selection on High Dimensional Data
Authors: P. Ramasita; S. Rama Sree
Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Year of publication: 2015
Volume: 6
Issue: 5
Pages: 4736-4739
Publisher: TechScience Publications
Abstract: Data mining refers to searching large datasets in an attempt to detect hidden or low-level patterns. Feature selection, also called variable selection or attribute selection, is used as a preprocessing step in machine learning tasks. It selects the best subset of features so that the feature space is optimally reduced. Existing systems fail to remove irrelevant data because their computational complexity is high, and the quality of their results is not assured. In the proposed method, irrelevant features are removed using the T-Relevance and F-Correlation metrics, and a spanning tree is then built using a greedy graph-theoretic algorithm that finds a minimum spanning tree for a connected weighted graph. The method can effectively and efficiently remove both irrelevant and redundant features to find a good feature subset, so it can be applied to high-dimensional data in both offline and online datasets. In future work, time and space complexity could be reduced using more advanced algorithms, which could be enhanced in a cloud environment.
Keywords: Feature subset selection; filter method; dimensionality reduction
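
Illustrative sketch: the abstract names two metrics (T-Relevance and F-Correlation) and a greedy minimum spanning tree step but gives no formulas. The Python sketch below is one possible reading, not the authors' implementation. It assumes that both metrics are symmetric uncertainty over discrete values (feature versus class for T-Relevance, feature versus feature for F-Correlation) and that the greedy algorithm is Prim's algorithm. The function names, the `threshold` parameter, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Assumed metric: SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_information = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_information / (hx + hy)


def relevant_features(features, target, threshold):
    """Keep features whose T-Relevance (SU with the class) exceeds threshold."""
    return [name for name, column in features.items()
            if symmetric_uncertainty(column, target) > threshold]


def minimum_spanning_tree(names, features):
    """Prim's greedy algorithm on the complete graph over `names`.

    Edge weights are the assumed F-Correlation (SU between two features).
    Returns a list of (u, v, weight) edges.
    """
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Greedy step: cheapest edge leaving the current tree.
        weight, u, v = min(
            (symmetric_uncertainty(features[u], features[v]), u, v)
            for u in in_tree for v in names if v not in in_tree
        )
        edges.append((u, v, weight))
        in_tree.add(v)
    return edges


# Hypothetical toy data: "b" duplicates "a"; "c" is independent of the class.
features = {"a": [0, 0, 1, 1], "b": [0, 0, 1, 1], "c": [0, 1, 0, 1]}
target = [0, 0, 1, 1]
kept = relevant_features(features, target, threshold=0.1)
tree = minimum_spanning_tree(kept, features)
```

The abstract does not say how the tree is subsequently split into clusters or how a representative feature is chosen from each cluster, so the sketch stops at the spanning tree.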