Journal: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2012
Volume: 2012
Publisher: ACL Anthology
Abstract: We present a method for generating Colloquial
Egyptian Arabic (CEA) from morphologically disambiguated
Modern Standard Arabic (MSA).
When used in POS tagging, this process improves
the accuracy from 73.24% to 86.84% on unseen
CEA text, and reduces the percentage of out-of-vocabulary
words from 28.98% to 16.66%. The
process holds promise for any NLP task targeting
the dialectal varieties of Arabic; e.g., this approach
may provide a cheap way to leverage MSA data
and morphological resources to create resources
for colloquial Arabic to English machine translation.
It can also considerably speed up the annotation
of Arabic dialects.