Article Information

  • Title: Comparative Study of Feature Selection Approaches for Urdu Text Categorization
  • Authors: Tehseen Zia; Muhammad Pervez Akhter; Qaiser Abbas
  • Journal: Malaysian Journal of Computer Science
  • Print ISSN: 0127-9084
  • Year: 2015
  • Volume: 28
  • Issue: 2
  • Publisher: University of Malaya, Faculty of Computer Science and Information Technology
  • Abstract: This paper presents a comparative study of feature selection methods for Urdu text categorization. Five well-known feature selection methods were analyzed using six recognized classification algorithms: support vector machines (with linear, polynomial, and radial basis function kernels), naive Bayes, k-nearest neighbour (KNN), and a decision tree (J48). Experiments were performed on two test collections: the standard EMILLE collection and a naive collection. We found that the information gain, chi-square statistic, and symmetrical uncertainty feature selection methods performed comparably in most cases. We also found that no single feature selection technique is best for every classifier. For instance, naive Bayes and J48 perform better with gain ratio than with the other feature selection methods, whereas the support vector machine (SVM) and KNN classifiers achieve their best performance with information gain. In general, linear SVM with any of the feature selection methods outperformed the other classifiers on the moderate-size naive collection. Conversely, naive Bayes with any feature selection technique has an advantage over the other classifiers on the small EMILLE corpus.
  • Keywords: Text Categorization; Feature Selection; Urdu; Performance Evaluation; Test Collection
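The abstract compares several term-scoring criteria (information gain, gain ratio, chi-square, symmetrical uncertainty) without defining them. Below is a minimal sketch of their standard textbook definitions, assuming each feature is a binary flag recording whether a term occurs in a document. It is not the authors' implementation; the function names and the toy corpus are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def conditional_entropy(classes, term_present):
    """H(C | T): class entropy inside each term value, weighted by that value's frequency."""
    n = len(classes)
    total = 0.0
    for value in set(term_present):
        subset = [c for c, t in zip(classes, term_present) if t == value]
        total += (len(subset) / n) * entropy(subset)
    return total


def information_gain(classes, term_present):
    """IG(T) = H(C) - H(C | T)."""
    return entropy(classes) - conditional_entropy(classes, term_present)


def gain_ratio(classes, term_present):
    """Information gain normalised by the split information H(T)."""
    split_info = entropy(term_present)
    return information_gain(classes, term_present) / split_info if split_info > 0 else 0.0


def symmetrical_uncertainty(classes, term_present):
    """SU = 2 * IG / (H(C) + H(T)), which lies in [0, 1]."""
    denom = entropy(classes) + entropy(term_present)
    return 2.0 * information_gain(classes, term_present) / denom if denom > 0 else 0.0


def chi_square(classes, term_present):
    """Pearson chi-square statistic over the (term value x class) contingency table."""
    n = len(classes)
    observed = Counter(zip(term_present, classes))
    term_counts = Counter(term_present)
    class_counts = Counter(classes)
    stat = 0.0
    for t, n_t in term_counts.items():
        for c, n_c in class_counts.items():
            expected = n_t * n_c / n
            stat += (observed[(t, c)] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Hypothetical toy corpus: one class label per document and, for each
    # romanized Urdu token, a 0/1 flag saying whether it occurs in that document.
    classes = ["sports", "sports", "politics", "politics"]
    terms = {
        "krikit": [1, 1, 0, 0],  # "cricket": occurs only in sports documents
        "aur": [1, 0, 1, 0],     # "and": spread evenly across both classes
    }
    for name, flags in terms.items():
        print(name,
              round(information_gain(classes, flags), 3),
              round(gain_ratio(classes, flags), 3),
              round(symmetrical_uncertainty(classes, flags), 3),
              round(chi_square(classes, flags), 3))
```

On this toy data the class-specific token scores 1.0 on information gain, gain ratio, and symmetrical uncertainty and 4.0 on chi-square, while the evenly spread token scores 0 on all four. In a real pipeline, each criterion would rank the vocabulary and only the top-scoring terms would be passed to the classifier. Gain ratio and symmetrical uncertainty divide by entropy terms, which is why they can rank terms differently from raw information gain when a term's presence is very unevenly distributed.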