Abstract: As a basic research topic in pattern recognition, machine learning, data mining, and other fields, feature extraction aims to reduce data dimensionality with low information loss. Among dimensionality reduction algorithms, those based on classical statistical theory are the most widely used, and the variance total contribution ratio (VTCR) of the features is the criterion most often used to evaluate feature extraction. The traditional VTCR considers only the eigenvalues of the samples' correlation matrix and not an information measure, which results in a large loss of information during feature extraction. Shannon information entropy is therefore introduced into the feature extraction algorithm: the generalized class probability and the class information function are defined, and the contribution ratio used in VTCR is improved. The number of extracted feature dimensions is then determined by calculating the accumulate information ratio (AIR), which provides a sound evaluation from the viewpoint of information theory. By combining the new method with principal component analysis (PCA) and factor analysis (FA) respectively, an optimized VTCR feature dimensionality reduction algorithm based on information entropy is established, in which the number of extracted feature dimensions is calculated by AIR. Experimental results show that the low-dimensional data are more interpretable and that the new algorithm achieves a higher compression ratio.
Keywords: Feature Extraction; Variance Total Contribution Ratio; Shannon Entropy; Accumulate Information Ratio
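
To make the selection criterion concrete, the following is a minimal sketch of how a cumulative ratio over the correlation-matrix eigenvalues can be used to choose the number of retained dimensions. The traditional VTCR part follows the standard definition (cumulative eigenvalue share). The entropy-weighted part is only an assumption for illustration: the abstract does not give the definitions of the generalized class probability, the class information function, or AIR, so here the normalized eigenvalue is treated as a probability and its Shannon entropy term as its information weight. The function names and the 0.85 threshold are likewise hypothetical.

```python
import numpy as np


def correlation_eigenvalues(X):
    """Eigenvalues of the sample correlation matrix, in descending order.

    X has shape (n_samples, n_features); columns are features.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]   # eigvalsh returns ascending order
    return np.clip(eigvals, 0.0, None)      # guard against tiny negative round-off


def cumulative_ratio(weights):
    """Cumulative share of the total carried by the first k weights."""
    weights = np.asarray(weights, dtype=float)
    return np.cumsum(weights) / np.sum(weights)


def entropy_terms(eigvals):
    """ASSUMPTION (not taken from the paper): treat p_i = lambda_i / sum(lambda)
    as a probability and use the Shannon term -p_i * ln(p_i) as its weight."""
    eigvals = np.asarray(eigvals, dtype=float)
    p = eigvals / eigvals.sum()
    terms = np.zeros_like(p)
    nonzero = p > 0                          # convention: 0 * ln(0) = 0
    terms[nonzero] = -p[nonzero] * np.log(p[nonzero])
    return terms


def select_dims(cum_ratio, threshold=0.85):
    """Smallest k whose cumulative ratio reaches the threshold."""
    k = int(np.searchsorted(cum_ratio, threshold)) + 1
    return min(k, len(cum_ratio))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 3 latent factors mixed into 8 observed features plus noise.
    latent = rng.normal(size=(200, 3))
    X = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))

    eigvals = correlation_eigenvalues(X)
    vtcr = cumulative_ratio(eigvals)                  # traditional criterion
    air = cumulative_ratio(entropy_terms(eigvals))    # illustrative entropy-weighted criterion

    print("dimensions kept by VTCR:", select_dims(vtcr))
    print("dimensions kept by entropy-weighted ratio:", select_dims(air))
```

The two criteria share the same thresholding step and differ only in the weights that are accumulated, which is the point at which the paper's own definition of the class information function would be substituted.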