期刊名称:Conference on European Chapter of the Association for Computational Linguistics (EACL)
出版年度:2011
卷号:2011
出版社:ACL Anthology
摘要:In this work we address the problem of
unsupervised part-of-speech induction
by bringing together several strands of
research into a single model. We develop a
novel hidden Markov model incorporating
sophisticated smoothing using a hierarchical
Pitman-Yor processes prior, providing an
elegant and principled means of incorporating
lexical characteristics. Central to our
approach is a new type-based sampling
algorithm for hierarchical Pitman-Yor models
in which we track fractional table counts.
In an empirical evaluation we show that our
model consistently out-performs the current
state-of-the-art across 10 languages.