Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year: 2012
Volume: 43
Issue: 2
Pages: 267-273
Publisher: Journal of Theoretical and Applied
Abstract: In text classification, the purity form of the Gini index can be used to weight features. The greater the purity value, the more information the attribute carries and the stronger the feature's distinguishing capability. However, when the Gini purity formula is applied directly to feature weighting, the classification results are not very good. One of the main reasons is that rare words appearing in only one category, and in no others, cannot be strictly differentiated. To solve this problem, an improved feature weighting method based on the Gini index is proposed. By introducing the approximation quality of feature terms within the categories and adjusting each term's weight according to its category-distinguishing ability, the method resolves this problem relative to weighting with the purity formula alone and can effectively improve text classification performance. Experiments verify the feasibility of the proposed method.
Keywords: Gini Index; Approximation Quality; Term Weight; Text Classification
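
The rare-word problem described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used purity form Gini(t) = Σ_i P(c_i | t)², estimated from document counts; the record does not give the exact formula used in the paper, and the function name `gini_purity` and the toy corpus are hypothetical. This is not the improved method the authors propose, only the baseline purity score it is meant to correct.

```python
from collections import Counter


def gini_purity(term, docs):
    """Purity of `term`: sum over categories of P(category | term)^2.

    `docs` is a list of (category, set_of_terms) pairs. P(category | term)
    is estimated as the share of term-containing documents in that category.
    (Assumed formula; hypothetical helper for illustration.)
    """
    counts = Counter(cat for cat, terms in docs if term in terms)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # term never occurs, so it carries no information
    return sum((n / total) ** 2 for n in counts.values())


# Toy corpus (hypothetical data).
docs = [
    ("sports", {"match", "goal", "team"}),
    ("sports", {"goal", "team", "zugzwang"}),
    ("finance", {"market", "team"}),
    ("finance", {"market", "stock"}),
]

for term in ("goal", "zugzwang", "team"):
    print(term, round(gini_purity(term, docs), 4))
```

In this toy corpus, "goal" (present in every sports document) and "zugzwang" (present in a single document) both reach the maximum purity of 1.0, so the purity score alone cannot tell a representative category term from a rare one. That is the gap the abstract says the approximation-quality adjustment is intended to close.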