
Article Information

  • Title: Utilizing Arabic WordNet Relations in Arabic Text Classification: New Feature Selection Methods
  • Authors: Suhad A. Yousif; Zainab N. Sultani; Venus W. Samawi
  • Journal: IAENG International Journal of Computer Science
  • Print ISSN: 1819-656X
  • Online ISSN: 1819-9224
  • Publication year: 2019
  • Volume: 46
  • Issue: 4
  • Pages: 750-761
  • Publisher: IAENG - International Association of Engineers
  • Abstract: The availability of Arabic text documents on the Internet calls for suitable Arabic text classification (TC) techniques. Arabic TC requires extensive work to analyze the content of valuable Arabic documents. The Arabic language is characterized by a rich vocabulary, semantic ambiguity, and words linked by semantic relations; therefore, a bag-of-words (BoW) text representation model may yield unsatisfactory results. This study is concerned with using the synsets and semantic relations of the original words to enhance Arabic TC accuracy. These relations are extracted using the Arabic WordNet (AWN) thesaurus as a lexical and semantic source. AWN provides various semantic relations for each original word, and some relations are more beneficial than others with respect to the dataset content. Consequently, we suggest either assigning a weight to each relation, so that the effect of weak relations is minimized and strong relations are boosted, or selecting appropriate semantic relations. In this paper, two approaches are suggested: a relation weighting scheme and a relation grouping scheme. In the first approach, a weighting scheme is developed that assigns weights to relations and their respective words on the basis of the Akhbar Al Khaleej dataset. This method generates a large training file that contains the original words, the corresponding relations extracted from AWN, and their weights. The second approach is based on relation grouping, in which two different types of relations are grouped according to one of three criteria: related semantic meaning, the frequency of occurrence (FO) of relations in AWN, or the ratio of a relation's FO in the dataset to its FO in AWN. Naive Bayes is used as the classifier, and the F1 measure is used to assess the performance of the proposed methods. A tenfold cross-validation scheme is used to reduce the variability of the results.
The effectiveness of the suggested methods is illustrated through the weighting scheme and the semantic relation grouping. Results show that the proposed methods outperform the classic BoW model and statistical feature selection methods (Chi-Square and Information Gain). The grouping methods enhance classification accuracy and reduce feature dimensionality.
  • Keywords: Feature Selection; Machine Learning; Naive Bayes; Relations AWN; Semantic; Text Classification
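The third grouping criterion in the abstract compares how often a relation type occurs in the dataset with how often it occurs in AWN. The following is a minimal Python sketch of that ratio, assuming the relation counts are already available as dictionaries. The relation-type names, the counts, and the rule of grouping the two highest-ratio relations (`group_top_two`) are hypothetical illustrations, not the authors' exact procedure.

```python
def relation_ratios(fo_dataset: dict[str, int], fo_awn: dict[str, int]) -> dict[str, float]:
    """Ratio of each relation type's FO in the dataset to its FO in AWN.

    Relation types that never occur in AWN are skipped to avoid division by zero.
    """
    return {
        rel: fo_dataset.get(rel, 0) / count
        for rel, count in fo_awn.items()
        if count > 0
    }


def group_top_two(ratios: dict[str, float]) -> tuple[str, str]:
    """Hypothetical grouping rule: pair the two relation types with the highest ratio."""
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    return ranked[0], ranked[1]


# Hypothetical counts for illustration only; these are not data from the paper.
fo_awn = {"has_hyponym": 100, "has_hyperonym": 80, "near_synonym": 50}
fo_dataset = {"has_hyponym": 30, "has_hyperonym": 40, "near_synonym": 5}
ratios = relation_ratios(fo_dataset, fo_awn)
print(ratios)                 # {'has_hyponym': 0.3, 'has_hyperonym': 0.5, 'near_synonym': 0.1}
print(group_top_two(ratios))  # ('has_hyperonym', 'has_hyponym')
```

A high ratio suggests that a relation type is comparatively well represented in this particular dataset, which is one plausible reason to prefer it when grouping.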
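The relation weighting approach can be illustrated with a small sketch. It assumes a hypothetical AWN lookup (`awn`) that maps each word to (relation type, related word) pairs, plus hypothetical per-relation weights (`rel_weights`). Each original word contributes a count of 1.0, and each related word contributes its relation's weight. These weighted counts feed a multinomial Naive Bayes classifier, and a macro-averaged F1 helper is included. The feature scheme, the Laplace smoothing, and the macro averaging are assumptions for illustration. The paper's actual weighting formula and its tenfold cross-validation setup are not reproduced here.

```python
import math
from collections import defaultdict


def expand(tokens, awn, rel_weights):
    """Weighted features: 1.0 per original token, plus the relation weight per related word.

    `awn` is a hypothetical dict: word -> list of (relation_type, related_word).
    Relations with weight <= 0 are dropped, i.e. weak relations are suppressed.
    """
    feats = defaultdict(float)
    for tok in tokens:
        feats[tok] += 1.0
        for rel, related in awn.get(tok, []):
            weight = rel_weights.get(rel, 0.0)
            if weight > 0:
                feats[related] += weight
    return dict(feats)


class WeightedMultinomialNB:
    """Multinomial Naive Bayes that accepts real-valued (weighted) feature counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {f for d in docs for f in d}
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        totals = {c: defaultdict(float) for c in self.classes}
        for doc, y in zip(docs, labels):
            for f, v in doc.items():
                totals[y][f] += v
        v_size = len(self.vocab)
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * v_size
            self.log_lik[c] = {
                f: math.log((totals[c][f] + self.alpha) / denom) for f in self.vocab
            }
        return self

    def predict(self, doc):
        def score(c):
            # Features never seen during training are ignored.
            return self.log_prior[c] + sum(
                v * self.log_lik[c][f] for f, v in doc.items() if f in self.vocab
            )
        return max(self.classes, key=score)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


# Toy, hypothetical data (English stand-ins for Arabic words).
awn = {"goal": [("near_synonym", "score")], "vote": [("has_hyperonym", "election")]}
rel_weights = {"near_synonym": 0.8, "has_hyperonym": 0.5}
train = [(["goal", "match"], "sports"), (["score", "team"], "sports"),
         (["vote", "party"], "politics"), (["election", "minister"], "politics")]
docs = [expand(t, awn, rel_weights) for t, _ in train]
labels = [y for _, y in train]
clf = WeightedMultinomialNB().fit(docs, labels)
preds = [clf.predict(expand(t, awn, rel_weights)) for t in (["goal"], ["vote"])]
print(preds, macro_f1(["sports", "politics"], preds))  # ['sports', 'politics'] 1.0
```

Raising a relation's weight lets its related words pull a document more strongly toward the classes where those words occur. Setting a weight to zero removes the relation entirely, which mirrors the abstract's idea of minimizing weak relations and boosting strong ones.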