Basic Article Information

  • Title: A Comparative Study of Effective Supervised Learning Methods on Arabic Text Classification
  • Author: Rachid Sammouda
  • Journal: International Journal of Computer Science and Network Security
  • Print ISSN: 1738-7906
  • Year of publication: 2017
  • Volume: 17
  • Issue: 12
  • Pages: 130-133
  • Publisher: International Journal of Computer Science and Network Security
  • Abstract: Arabic Text Classification (ATC) is attracting researchers' attention in many fields, including text mining, web search, social media, and security. Text Classification or Categorization (TC) is the process of assigning text documents to the proper categories based on their contents. Few studies have compared supervised learning (SL) methods on ATC; this paper therefore addresses the classification of Arabic documents. The approach adopted for this comparative study consists of three steps: (i) a document pre-processing step, in which Arabic stop words, punctuation, diacritics, and common prefixes and suffixes (Arabic light stemmer) are removed from the Arabic documents; (ii) a document filtering step, in which the word strings are converted into vectors of individual words using the term frequency transform (TFT) technique, the inverse document frequency transform (IDFT) technique, and both; and (iii) a classification step, in which eight well-known, effective SL methods are compared on ATC. The impact of using TFT, IDFT, and both on the effectiveness of these SL methods is also studied. The results show that the LSVM classifier with the IDFT technique achieves the highest accuracy under 10-fold cross-validation among the SL methods used in this study. This outcome can serve as guidance for future developers of ATC applications.
  • Keywords: Text Classification of Arabic documents; Supervised Learning Methods; Arabic Light Stemmer; Weka Tool.
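
Step (ii) of the abstract, turning tokenized documents into weighted word vectors, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact formulas, so the sketch assumes a log-scaled term frequency, log(1 + f), and an inverse document frequency factor, log(N / df). The function names `build_vocabulary` and `weight_documents`, the flags `use_tf` and `use_idf`, and the transliterated sample tokens are all hypothetical.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({tok for doc in docs for tok in doc})


def weight_documents(docs, use_tf=True, use_idf=True):
    """Convert tokenized documents into weighted word vectors.

    Assumed formulas (not taken from the paper):
      TF transform:  log(1 + f), where f is the raw count of the term
      IDF transform: value * log(N / df), where N is the number of
                     documents and df the number containing the term
    """
    vocab = build_vocabulary(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for doc in docs for tok in set(doc))

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        row = []
        for term in vocab:
            value = float(counts[term])
            if use_tf:
                value = math.log(1.0 + value)
            if use_idf:
                value *= math.log(n_docs / df[term])
            row.append(value)
        vectors.append(row)
    return vocab, vectors


if __name__ == "__main__":
    # Hypothetical, already pre-processed documents (transliterated tokens).
    sample = [["kitab", "madrasa", "kitab"], ["madrasa", "riyada"]]
    for label, tf, idf in [("TFT", True, False), ("IDFT", False, True), ("both", True, True)]:
        vocab, vectors = weight_documents(sample, use_tf=tf, use_idf=idf)
        print(label, vocab, vectors)
```

Under these assumptions, a term that occurs in every document gets an IDF factor of log(1) = 0, so it drops out of the vector whenever the IDF transform is enabled. The three flag combinations correspond to the three weighting settings the abstract compares.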