Journal: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2010
Volume: 2010
Publisher: ACL Anthology
Abstract: In this paper we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. We employ a non-parametric Bayesian framework to simultaneously capture both low-level character mappings and high-level morphemic correspondences. This formulation enables us to encode some of the linguistic intuitions that have guided human decipherers. When applied to the ancient Semitic language Ugaritic, the model correctly maps 29 of 30 letters to their Hebrew counterparts, and deduces the correct Hebrew cognate for 60% of the Ugaritic words that have cognates in Hebrew.