Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2013
Volume: 50
Issue: 1
Publisher: Journal of Theoretical and Applied
Abstract: The continuous growth of information on the Internet and the availability of a large volume of electronic documents in Arabic give Natural Language Processing (NLP) tasks an important role in improving access to, and exploitation of, information. Among the available NLP tasks, we focus on Arabic Topic Detection. Our objective is to build an indexing system capable of identifying the general topics discussed in unvowelized Arabic documents. The proposed topic detection system for Arabic texts is based on Mutual Information for building the Topic Oriented Vocabulary (TOV) and on classification according to Jaccard and adapted TF-IDF indicators. The experimental results are presented in terms of precision, recall, and F1 measure, and evaluate the influence of factors such as vocabulary length and morphological analysis on Arabic Topic Detection.
Keywords: Natural Language Processing (NLP); Topic Detection (TD); Topic Oriented Vocabulary (TOV); Mutual Information (MI); Jaccard Indicator; TF-IDF
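
The pipeline named in the abstract (a Mutual Information based Topic Oriented Vocabulary followed by Jaccard classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their formulas, so the pointwise form of mutual information over document-level term presence, the tie-breaking rule, and all function names (`mutual_information`, `topic_vocabulary`, `jaccard`, `detect_topic`) are assumptions made for illustration. The "adapted TF-IDF" indicator is not sketched because its adaptation is not described here. Input documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def mutual_information(docs, labels, topic):
    """Score each term by pointwise MI with `topic` (assumed formulation).

    docs   -- list of token lists
    labels -- topic label of each document
    Uses document-level presence: log(P(term, topic) / (P(term) * P(topic))).
    """
    n = len(docs)
    n_topic = sum(1 for lab in labels if lab == topic)
    df = Counter()        # number of documents containing the term
    df_topic = Counter()  # number of documents of `topic` containing the term
    for tokens, lab in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            if lab == topic:
                df_topic[term] += 1
    # Terms never seen in the topic are skipped (their score would be -inf).
    return {
        term: math.log((joint * n) / (df[term] * n_topic))
        for term, joint in df_topic.items()
    }


def topic_vocabulary(docs, labels, topic, size):
    """Keep the `size` highest-scoring terms as the topic's vocabulary."""
    scores = mutual_information(docs, labels, topic)
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:size])


def jaccard(a, b):
    """Jaccard indicator |A ∩ B| / |A ∪ B| between two term collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_topic(tokens, vocabularies):
    """Return the topic whose vocabulary is most similar to the document."""
    return max(sorted(vocabularies),
               key=lambda t: jaccard(tokens, vocabularies[t]))
```

In this sketch the `size` argument plays the role of the vocabulary length factor mentioned in the abstract. Morphological analysis (for example stemming of Arabic tokens) would happen before these functions are called.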