Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Automatic dialect identification is a necessary language technology for processing multidialect languages in which the dialects are linguistically far from each other. It becomes particularly crucial where the dialects are mutually unintelligible. To perform computational tasks on such languages, a system needs to identify the dialect of the text being processed. The Kurdish language encompasses various dialects, is written in several different scripts, and lacks a standard orthography. This situation makes Kurdish dialect identification both more interesting and more necessary, from the research and the application perspectives alike. In this research, we applied a classification method based on supervised machine learning to identify the dialects of Kurdish texts. The research focused on the two most widely spoken and dominant Kurdish dialects, namely Kurmanji and Sorani. The approach could be applied to the other Kurdish dialects as well. The method is also applicable to languages that are similar to Kurdish in their dialectal diversity and differences.