Article Information

Title: Automatic Kurdish Dialects Identification
Authors: Hossein Hassani; Dzejla Medjedovic
Journal: Computer Science & Information Technology
Electronic ISSN: 2231-5403
Publication year: 2016
Volume: 6
Issue: 3
Pages: 61-78
DOI: 10.5121/csit.2016.60307
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Automatic dialect identification is a necessary Language Technology for processing multi-dialect languages in which the dialects are linguistically far from each other. Particularly, this becomes crucial where the dialects are mutually unintelligible. Therefore, to perform computational activities on these languages, the system needs to identify the dialect that is the subject of the process. The Kurdish language encompasses various dialects. It is written using several different scripts. The language lacks a standard orthography. This situation makes Kurdish dialect identification more interesting and required, both from the research and from the application perspectives. In this research, we have applied a classification method, based on supervised machine learning, to identify the dialects of Kurdish texts. The research has focused on the two most widely spoken and dominant Kurdish dialects, namely Kurmanji and Sorani. The approach could be applied to the other Kurdish dialects as well. The method is also applicable to languages which are similar to Kurdish in their dialectal diversity and differences.
Keywords: Dialect identification; NLP; Kurdish language; Kurmanji; Sorani