Publication venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2010
Volume: 2010
Publisher: ACL Anthology
Abstract: This paper demonstrates that using ensemble methods and carefully calibrating the decision threshold can significantly improve the performance of machine learning methods for morphological word decomposition. We employ two algorithms from a family of generative probabilistic models. The models treat segment boundaries as hidden variables and include probabilities for letter transitions within segments. The advantage of this model family is that it can learn from small datasets and generalises easily to larger datasets. The first algorithm, PROMODES, which participated in Morpho Challenge 2009 (an international competition for unsupervised morphological analysis), employs a lower-order model, whereas the second algorithm, PROMODES-H, is a novel development of the first that uses a higher-order model. We present the mathematical description of both algorithms, conduct experiments on Zulu, a morphologically rich language, and compare the characteristics of both algorithms based on the experimental results.
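To make "segment boundaries as hidden variables" with "letter transition" probabilities concrete, here is a minimal sketch of a toy lower-order model. It is not the PROMODES formulation, which the abstract does not spell out. It assumes that every gap between two adjacent letters carries an independent hidden boundary variable b, that the letter pair spanning the gap is emitted from P(pair | b), and that these probabilities are estimated by counting over a few pre-segmented words with add-one smoothing. The class name `ToyBoundaryModel`, its methods, and the training words are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyBoundaryModel:
    """Hypothetical toy model, not PROMODES: one hidden boundary per letter gap.

    Each gap's boundary b is independent with prior P(b = 1); the letter pair
    (left, right) spanning the gap is emitted from P(pair | b).
    """

    alphabet_size: int = 26
    counts: dict = field(default_factory=lambda: {0: Counter(), 1: Counter()})
    gaps: Counter = field(default_factory=Counter)

    def fit(self, segmented_words):
        """Count letter pairs inside segments (b=0) and across boundaries (b=1)."""
        for word in segmented_words:
            letters = word.replace("-", "")
            # Offsets in the unsegmented word after which a boundary occurs.
            cuts, pos = set(), 0
            for segment in word.split("-")[:-1]:
                pos += len(segment)
                cuts.add(pos)
            for i in range(1, len(letters)):
                b = 1 if i in cuts else 0
                self.counts[b][(letters[i - 1], letters[i])] += 1
                self.gaps[b] += 1
        return self

    def _emission(self, pair, b):
        # Add-one smoothing over all possible letter pairs (an assumption).
        vocab = self.alphabet_size ** 2
        return (self.counts[b][pair] + 1) / (self.gaps[b] + vocab)

    def boundary_posteriors(self, word):
        """Return P(b_i = 1 | word) for each gap i, via Bayes' rule."""
        total = self.gaps[0] + self.gaps[1]
        prior1 = (self.gaps[1] + 1) / (total + 2)  # smoothed boundary prior
        posteriors = []
        for i in range(1, len(word)):
            pair = (word[i - 1], word[i])
            p1 = prior1 * self._emission(pair, 1)
            p0 = (1 - prior1) * self._emission(pair, 0)
            posteriors.append(p1 / (p0 + p1))
        return posteriors


if __name__ == "__main__":
    # Illustrative English-like toy data, not data from the paper.
    model = ToyBoundaryModel().fit(["walk-ing", "talk-ed", "walk-ed", "jump-ing"])
    print(model.boundary_posteriors("talking"))
```

On this toy data the gap between `k` and `i` in `talking` receives the highest posterior, yet smoothing keeps it below 0.5, so a fixed 0.5 cut-off would miss the boundary. That is the kind of situation in which calibrating the decision threshold can matter.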
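The other two ingredients named in the abstract, a model ensemble and a calibrated decision threshold, can be illustrated independently of any particular model. The sketch below assumes that each model outputs a boundary probability for every letter gap. It combines models by simple averaging, which is one possible ensemble rule and not necessarily the paper's. It then picks the threshold that maximises boundary F-measure on a small labelled development set. The function names and toy numbers are hypothetical and are not results from the paper.

```python
from typing import Sequence


def ensemble(prob_lists: Sequence[Sequence[float]]) -> list[float]:
    """Average per-gap boundary probabilities from several models for one word."""
    n = len(prob_lists)
    return [sum(ps) / n for ps in zip(*prob_lists)]


def boundary_f1(probs, gold, threshold):
    """F-measure of boundaries predicted where probs[w][i] >= threshold."""
    tp = fp = fn = 0
    for word_probs, word_gold in zip(probs, gold):
        for p, g in zip(word_probs, word_gold):
            pred = p >= threshold
            tp += int(pred and g)
            fp += int(pred and not g)
            fn += int(not pred and g)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate_threshold(probs, gold):
    """Choose the threshold that maximises F-measure on development data.

    Only the distinct predicted probabilities need to be tried, because the
    set of predicted boundaries changes only when the threshold crosses one.
    """
    candidates = sorted({p for word in probs for p in word})
    return max(candidates, key=lambda t: boundary_f1(probs, gold, t))


if __name__ == "__main__":
    # Hypothetical per-gap probabilities from two models on two dev words.
    model_a = [[0.10, 0.40, 0.05], [0.35, 0.08]]
    model_b = [[0.20, 0.30, 0.15], [0.25, 0.12]]
    gold = [[False, True, False], [True, False]]
    combined = [ensemble([a, b]) for a, b in zip(model_a, model_b)]
    t = calibrate_threshold(combined, gold)
    print(t, boundary_f1(combined, gold, t), boundary_f1(combined, gold, 0.5))
```

With these toy numbers, no averaged probability reaches 0.5, so the default cut-off predicts no boundaries at all. The calibrated threshold of about 0.3 recovers both gold boundaries without false positives.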