Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Year: 2013
Volume: 58
Issue: 3
Publisher: Journal of Theoretical and Applied
Abstract: While a wide range of methods has been applied to English terminology extraction, relatively few studies have addressed Arabic term extraction from Islamic corpora. In this paper, we present an efficient approach for the automatic extraction of Arabic terminology, covering both single-word terms (SWTs) and multi-word terms (MWTs). The approach relies on two main filtering steps: a linguistic filter, in which a simple part-of-speech (POS) tagger is used to extract candidate MWTs matching given syntactic patterns, and a statistical filter, in which several statistical methods (PMI, Kappa, chi-square, t-test, Piatetsky-Shapiro, and rank aggregation) are used to rank the candidate MWTs, while TF-IDF is used to rank the candidate SWTs. The approach extracts bigram candidates for Islamic MWTs from the corpus, and the association measures (for SWTs and MWTs) are evaluated using the n-best evaluation method.
Keywords: Term Extraction; SWTs; MWTs; Association measures; n-best evaluation
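
The statistical filter described in the abstract can be illustrated with one of the listed association measures, pointwise mutual information (PMI), followed by an n-best cut-off. This is only a minimal sketch and not the authors' implementation. The function names `rank_bigrams_by_pmi` and `n_best`, the `min_count` parameter, and the maximum-likelihood probability estimates are assumptions made for illustration. The input is assumed to be a token list that has already been tokenized, with the linguistic (POS-pattern) filter applied separately.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(tokens, min_count=1):
    """Rank adjacent word pairs by PMI, highest first.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated from raw corpus counts (an assumption of this sketch).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # optional frequency threshold (hypothetical parameter)
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def n_best(ranked, n):
    """Keep the n top-ranked candidates, as in an n-best evaluation."""
    return ranked[:n]
```

PMI tends to favor rare pairs, which is one reason a study of this kind compares several measures and aggregates their rankings instead of relying on a single score.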