Journal: International Journal of Computer and Information Technology
Print ISSN: 2279-0764
Publication year: 2016
Volume: 5
Issue: 1
Pages: 55
Publisher: International Journal of Computer and Information Technology
Abstract: Arabic is one of the ten most widely spoken languages in the world and belongs to the Semitic language group. Language technology for Arabic has developed slowly because of the language's morphological and structural complexity, and effective information retrieval in Arabic requires good stemming. Many light stemmers have been developed, but they still suffer from weaknesses and high error rates, and no standard approach has yet emerged. In this paper, a new and effective light stemming algorithm is developed that overcomes many limitations of previous approaches. Using simple rules, the new technique truncates word infixes in addition to prefixes and suffixes. The proposed stemming method was found to outperform the other stemming methods. It was tested and compared with the root-based stemmer developed by Khoja [11], and the correctness, strength, and similarity of both stemming algorithms are reported.
Keywords: Arabic stemmer; Stemming Algorithm; Light stemmer; Information Retrieval
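The abstract states that the proposed light stemmer removes prefixes, suffixes, and infixes using simple rules, but it does not list those rules. The following Python sketch illustrates the general idea under stated assumptions: the affix lists, the three-letter minimum stem length, and the two four-letter infix patterns are hypothetical choices made for illustration and are not the paper's rule set. It also computes the mean number of words per conflation class (unique words divided by unique stems), one common way to measure stemmer strength; the paper's exact measures may differ.

```python
from collections import defaultdict

# Hypothetical affix lists for illustration only; NOT the paper's rule set.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]  # longest first
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]  # longest first
MIN_STEM = 3  # assumed minimum stem length (most Arabic roots are triliteral)


def normalize(word: str) -> str:
    """Unify alef variants and drop the tatweel (kashida) character."""
    for ch in "أإآ":
        word = word.replace(ch, "ا")
    return word.replace("ـ", "")


def strip_prefix(word: str) -> str:
    """Remove the first (longest) matching prefix if enough letters remain."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            return word[len(p):]
    return word


def strip_suffixes(word: str) -> str:
    """Repeatedly remove the longest matching suffix while the stem stays long enough."""
    changed = True
    while changed:
        changed = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
                word = word[: -len(s)]
                changed = True
                break
    return word


def strip_infix(word: str) -> str:
    """Hypothetical infix rules for four-letter stems:
    - faa'il pattern (e.g. كاتب): drop the alef in position 2.
    - fa'iil / fa'uul patterns (e.g. كبير, رسول): drop ي or و in position 3.
    """
    if len(word) == 4:
        if word[1] == "ا":
            return word[0] + word[2:]
        if word[2] in "يو":
            return word[:2] + word[3]
    return word


def light_stem(word: str) -> str:
    """Normalize, then strip prefix, suffixes, and infix, in that order."""
    word = normalize(word)
    word = strip_prefix(word)
    word = strip_suffixes(word)
    return strip_infix(word)


def mean_words_per_class(words: list[str]) -> float:
    """Strength measure: unique words divided by unique stems (conflation classes)."""
    classes: dict[str, set[str]] = defaultdict(set)
    for w in words:
        classes[light_stem(w)].add(w)
    return len(set(words)) / len(classes)


if __name__ == "__main__":
    sample = ["والكاتب", "الكتب", "كاتب", "رسولها", "رسول", "كبير"]
    for w in sample:
        print(w, "->", light_stem(w))
    print("mean words per conflation class:", mean_words_per_class(sample))
```

A larger mean conflation-class size indicates a stronger (more aggressive) stemmer. Comparing this measure against a root-based stemmer such as Khoja's would require running both stemmers over the same word list.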