
Article Information

  • Title: Towards Improving Rule-Based Arabic Root Extraction Algorithm for Non-Vocalized Text
  • Authors: Nisrean Thalji; Zyad Thalji; Sohair Al-Hakeem
  • Journal: International Journal of Computer and Information Technology
  • Print ISSN: 2279-0764
  • Year of publication: 2018
  • Volume: 7
  • Issue: 6
  • Pages: 235-242
  • Publisher: International Journal of Computer and Information Technology
  • Abstract: Rooting algorithms remove affixes from a word and extract the root from which the input word is derived. Rooting helps standardize terms that refer to the same concept. These algorithms are widely used in Arabic language applications such as information retrieval systems, indexes, text mining, text classifiers, data compression, spelling checkers, text summarization, question answering systems, machine translation, part-of-speech tagging systems, stemmers, and morphological analyzers. Khoja’s algorithm is a standard Arabic root extraction algorithm, but it has a number of flaws. The proposed algorithm extends Khoja’s algorithm and resolves most of those flaws. Testing was conducted on Thalji’s corpus, which was built mainly to test and compare Arabic root extraction algorithms and contains 720,000 word-root pairs derived from 12,000 roots. On this corpus, the proposed algorithm obtained higher root extraction accuracy than Khoja’s algorithm: 92% versus 63%.
  • Keywords: Root Extraction; stem; rules; pattern; prefix; suffix; infix