Journal title: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Online ISSN: 2320-9801
Publication year: 2017
Volume: 5
Issue: 2
Pages: 2020
DOI: 10.15680/IJIRCCE.2017.0502174
Publisher: S&S Publications
Abstract: Classification problems in high-dimensional data with a small number of observations are becoming more common, especially with microarray data. The increasing amount of text on Internet web pages affects clustering analysis [1]. Text clustering is a favored analysis technique for partitioning a massive amount of information into clusters. However, a major problem affecting text clustering is the presence of uninformative and sparse features in text documents. A broad class of boosting algorithms can be interpreted as performing coordinate-wise gradient descent to minimize some potential function of the margins of a data set [1]. This paper proposes a new evaluation measure, the Q-statistic, which incorporates the stability of the selected feature subset in addition to prediction accuracy. We then propose Booster, which is applied to a feature selection (FS) algorithm and boosts that algorithm's Q-statistic value.
Keywords: high-dimensional data classification; feature selection; stability; Q-statistic; Booster
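The abstract does not state the formula for the Q-statistic. The sketch below therefore illustrates only the general idea of scoring an FS method on both prediction accuracy and the stability of its selected feature subsets. It assumes that stability is the average pairwise Jaccard similarity of the subsets an FS algorithm selects on different resamples of the data, and that stability and accuracy are combined through a harmonic mean. Both choices, and the names `jaccard`, `subset_stability`, and `stability_accuracy_score`, are hypothetical. They are not the paper's definition of the Q-statistic or of Booster.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity between two sets of selected feature indices."""
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def subset_stability(subsets: Sequence[FrozenSet[int]]) -> float:
    """Average pairwise Jaccard similarity of the feature subsets chosen
    by one FS algorithm on different resamples (an assumed stability measure)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0  # a single subset is trivially stable
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def stability_accuracy_score(accuracy: float, stability: float) -> float:
    """Hypothetical combined score: harmonic mean of accuracy and stability.
    This is an illustration only, NOT the paper's Q-statistic."""
    if accuracy + stability == 0:
        return 0.0
    return 2 * accuracy * stability / (accuracy + stability)


if __name__ == "__main__":
    # Feature subsets an FS algorithm might pick on three resamples (made-up data).
    subsets = [frozenset({0, 1, 2}), frozenset({0, 1, 3}), frozenset({0, 1, 2})]
    stab = subset_stability(subsets)  # (0.5 + 1.0 + 0.5) / 3
    print(f"stability = {stab:.4f}")
    print(f"combined  = {stability_accuracy_score(0.9, stab):.4f}")
```

Using a harmonic mean means the combined score stays low unless both accuracy and stability are reasonably high. This matches the abstract's point that accuracy alone is not enough to judge a selected feature subset.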