Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2006
Volume: 2006
Publisher: ACL Anthology
Abstract: The Arabic language is a collection of spoken dialects with important phonological, morphological, lexical, and syntactic differences, along with a standard written language, Modern Standard Arabic (MSA). Since the spoken dialects are not officially written, it is very costly to obtain adequate corpora to use for training dialect NLP tools such as parsers. In this paper, we address the problem of parsing transcribed spoken Levantine Arabic (LA). We do not assume the existence of any annotated LA corpus (except for development and testing), nor of a parallel LA-MSA corpus. Instead, we use explicit knowledge about the relation between LA and MSA.