Journal: International Journal of Computer and Information Technology
Print ISSN: 2279-0764
Publication year: 2018
Volume: 7
Issue: 6
Pages: 235-242
Publisher: International Journal of Computer and Information Technology
Abstract: Rooting algorithms remove affixes from words and extract
the root from which each input word is derived. Rooting helps to
standardize terms that refer to the same concept. These algorithms
are widely used in Arabic language applications such as information
retrieval systems, indexes, text mining, text classifiers, data
compression, spelling checkers, text summarization, question
answering systems, machine translation, part-of-speech tagging
systems, stemmers, and morphological analyzers. Khoja’s algorithm is
a standard Arabic root extraction algorithm, but it has a number of
flaws. The proposed algorithm extends Khoja’s algorithm and resolves
most of these flaws. Testing was conducted on Thalji’s corpus, which
was built mainly to test and compare Arabic root extraction
algorithms. The corpus contains 720,000 word-root pairs derived from
12,000 roots. The performance of the proposed algorithm was compared
with that of Khoja’s algorithm, and the proposed algorithm obtained
higher accuracy: Khoja’s algorithm achieved 63% root extraction
accuracy, while the proposed algorithm achieved 92%.
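
The evaluation described in the abstract, scoring a root extractor against word-root pairs, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `strip_affixes` and `accuracy`, the tiny affix lists, and the three sample pairs are hypothetical and are not taken from the paper, from Khoja’s algorithm, or from Thalji’s corpus. The toy extractor only strips a few affixes, so it fails on words whose root requires pattern matching, which is the kind of gap a full root extraction algorithm has to close.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical, deliberately tiny affix lists (illustration only).
PREFIXES = ("ال", "ي")
SUFFIXES = ("ون", "ات")


def strip_affixes(word: str) -> str:
    """Toy extractor: strip at most one prefix and one suffix,
    keeping at least three letters. Not Khoja's algorithm."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def accuracy(extract_root: Callable[[str], str],
             pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (word, gold_root) pairs whose extracted root
    exactly matches the gold root; 0.0 for an empty input."""
    total = 0
    correct = 0
    for word, gold_root in pairs:
        total += 1
        if extract_root(word) == gold_root:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical sample pairs (word, expected root).
PAIRS = [
    ("الكتب", "كتب"),   # prefix stripping is enough
    ("يدرسون", "درس"),  # prefix and suffix stripping are enough
    ("كتاب", "كتب"),    # needs pattern matching; the toy extractor fails
]

if __name__ == "__main__":
    print(f"accuracy = {accuracy(strip_affixes, PAIRS):.2f}")
```

Exact string match against the gold root is the simplest possible scoring rule; the paper may define accuracy differently, so the figures of 63% and 92% should not be assumed to come from this exact procedure.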