首页    期刊浏览 2024年11月08日 星期五
登录注册

文章基本信息

  • 标题:The Effectiveness of Stemming in the Stylometric Authorship Attribution in Arabic
  • 本地全文:下载
  • 作者:Abdulfattah Omar ; Wafya Ibrahim Hamouda
  • 期刊名称:International Journal of Advanced Computer Science and Applications(IJACSA)
  • 印刷版ISSN:2158-107X
  • 电子版ISSN:2156-5570
  • 出版年度:2020
  • 卷号:11
  • 期号:1
  • 页码:116-121
  • 出版社:Science and Information Society (SAI)
  • 摘要:The recent years have witnessed the development of numerous approaches to authorship attribution including statistical and linguistic methods. Stylometric authorship attribution, however, remains among the most widely used due to its accuracy and effectiveness. Nevertheless, many authorship problems remain unresolved in terms of Arabic. This can be attributed to different factors including linguistic peculiarities that are not usually considered in standard authorship systems. In the case of Arabic, the morphological features carry unique stylistic features that can be usefully used in testing authorship in controversial texts and writings. The hypothesis is that much of these morphological features are lost due to the execution of stemming. As such, this study is concerned with investigating the effectiveness of stemming in the stylometric applications to authorship attribution in Arabic. In so doing, three Arabic stemmers GOLD stemmer, Khoga stemmer, Light 10 stemmer are used. By way of illustration, a corpus of 2400 news articles written by different 97 authors is designed. To evaluate the effectiveness of stemming, the selected articles (both stemmed and unstemmed texts) are clustered using cluster analysis methods. Comparisons are made between clustering structures based on stemmed and unstemmed datasets. The results indicate that stemming has negative impacts on the accuracy of the clustering performance and thus on the reliability of stylometric authorship testing in Arabic. The peculiar stylistic features of the affixation processes in Arabic can, thus, be usefully used for improving the performance of authorship attribution applications in Arabic. It can be finally concluded that stemming is not effective in the stylometric authorship applications in Arabic.
  • 关键词:Authorship attribution; cluster analysis; GOLD stemmer; Khoga stemmer; Light 10 stemmer; stemming; stylometry
国家哲学社会科学文献中心版权所有