Journal: IAENG International Journal of Computer Science
Print ISSN: 1819-656X
Electronic ISSN: 1819-9224
Publication year: 2021
Volume: 48
Issue: 1
Language: English
Publisher: IAENG - International Association of Engineers
Abstract: Text classification is the process of automatically assigning text documents to categories based on their content. In text classification, term weighting is an important stage that assigns importance values to the terms of each document. In the authors' previous study, a supervised term weighting scheme (TF-Assoc) was introduced that uses the concept of association to optimize the term weight distribution in multiclass classification. To improve text categorization performance, this paper proposes a term weighting scheme with a modified association concept, mTF-IDF-Assoc. The proposed scheme takes Document Length (DL) into account: DL is used to normalize the term frequency by dividing it by the length of the document's vector, and the normalized frequency is then combined with IDF and Assoc to compute the weight of each word. The results show that mTF-IDF-Assoc, used with an SVM classifier and 10-fold cross-validation, outperforms the TF-IDF, TF-ICF, and TF-Assoc weighting schemes, with an average accuracy of 82.322%.
Keywords: text classification; document length; supervised term weighting; association; confidence
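
Illustrative sketch: the abstract describes the weighting only in words, so the following Python is a rough reading of it, not the paper's formula. It assumes that "the length of the document's vector" means the Euclidean norm of the raw term-frequency vector, that IDF is the smoothed form log(N/df) + 1, and that Assoc is the association-rule confidence of a term for its most strongly associated class (suggested by the keyword "confidence"). The function name `mtf_idf_assoc` and the product form of the final weight are hypothetical.

```python
import math
from collections import Counter


def mtf_idf_assoc(docs, labels):
    """Return one {term: weight} dict per document.

    docs   -- list of token lists
    labels -- class label of each document (same order as docs)
    All three factors below are assumptions, not the paper's definitions.
    """
    n_docs = len(docs)
    df = Counter()        # number of documents containing each term
    df_class = Counter()  # number of documents of a class containing a term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_class[(term, label)] += 1

    classes = set(labels)
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        # Assumed document length: Euclidean norm of the raw TF vector.
        norm = math.sqrt(sum(count * count for count in tf.values()))
        doc_weights = {}
        for term, count in tf.items():
            mtf = count / norm                       # length-normalized TF
            idf = math.log(n_docs / df[term]) + 1.0  # assumed smoothed IDF
            # Assumed Assoc: highest confidence P(class | term) over classes.
            assoc = max(df_class[(term, c)] / df[term] for c in classes)
            doc_weights[term] = mtf * idf * assoc
        weights.append(doc_weights)
    return weights
```

Under these assumptions, a term that occurs in documents of a single class receives Assoc = 1, while a term spread evenly over k classes receives 1/k, so class-specific terms are weighted up relative to plain length-normalized TF-IDF.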