Journal name: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2011
Volume: 2011
Publisher: ACL Anthology
Abstract: Several results in the word segmentation literature suggest that description length provides a useful estimate of segmentation quality in fully unsupervised settings. However, since the space of potential segmentations grows exponentially with the length of the corpus, no tractable algorithm follows directly from the Minimum Description Length (MDL) principle. Therefore, it is necessary to generate a set of candidate segmentations and select between them according to the MDL principle. We evaluate several algorithms for generating these candidate segmentations on a range of natural language corpora, and show that the Bootstrapped Voting Experts algorithm consistently outperforms other methods when paired with MDL.
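
The selection step described in the abstract, choosing among candidate segmentations by description length, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not state the exact description-length formula, so the two-part code used here (a unigram-entropy cost for the segmented corpus plus a letter-entropy cost for spelling out the lexicon) is an assumption, and the names `entropy_bits`, `description_length`, and `select_by_mdl` are hypothetical. Candidate generation (for example by Bootstrapped Voting Experts) is not shown; the candidates are hard-coded toy data.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy (bits per symbol) of an empirical count distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def description_length(segmentation):
    """Assumed two-part code length in bits: lexicon cost + corpus cost.

    `segmentation` is a list of utterances, each a list of word strings.
    """
    words = [w for utterance in segmentation for w in utterance]
    word_counts = Counter(words)

    # Corpus cost: every token is coded at the unigram entropy of the words.
    corpus_bits = len(words) * entropy_bits(word_counts)

    # Lexicon cost: every word type is spelled out letter by letter,
    # with "#" as an end-of-word marker (an illustrative choice).
    letter_counts = Counter(ch for w in word_counts for ch in w + "#")
    lexicon_bits = sum(letter_counts.values()) * entropy_bits(letter_counts)

    return corpus_bits + lexicon_bits


def select_by_mdl(candidates):
    """Return the candidate segmentation with the smallest description length."""
    return min(candidates, key=description_length)


# Toy candidates for the same three unsegmented utterances.
candidates = [
    [["thedog"], ["thecat"], ["thedog"]],                  # no boundaries
    [["the", "dog"], ["the", "cat"], ["the", "dog"]],      # word-level boundaries
    [list("thedog"), list("thecat"), list("thedog")],      # boundary after every letter
]
best = select_by_mdl(candidates)
```

Because only a fixed list of candidates is scored, the cost is linear in the number of candidates rather than exponential in corpus length, which is the motivation the abstract gives for generating candidates first.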