摘要:At present, most of the researches on feature selection do not consider the relevance between a term and its own category, the redundancy among terms. In order to solve this problem efficiently, we propose a new feature selection based on analyzing how to measure the relevance and the redundancy, which use Euclidean distance as the similarity calculation method. R2, the new feature selection algorithm, can obtain the optimal feature subset which has considered the correlations between term and category and filtered the redundant terms. Finally, the validity of the new algorithm in feature selection is validated by the classification experiments on Chinese classification corpus by two classifiers, including KNN and Centroid-based classifier.