Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2011
Volume: 2011
Publisher: ACL Anthology
Abstract: It is well known that parsing accuracy suffers
when a model is applied to out-of-domain
data. It is also known that the most beneficial
data to parse a given domain is data that
matches the domain (Sekine, 1997; Gildea,
2001). Hence, an important task is to select
appropriate domains. However, most previous
work on domain adaptation relied on the
implicit assumption that domains are somehow
given. As more and more data becomes
available, automatic ways to select data that is
beneficial for a new (unknown) target domain
are becoming attractive. This paper evaluates
various ways to automatically acquire related
training data for a given test set. The results
show that an unsupervised technique based on
topic models is effective: it outperforms random
data selection on both languages examined,
English and Dutch. Moreover, the technique
works better than manually assigned labels
gathered from meta-data that is available
for English.
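
The selection idea described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate training article and the test set have already been mapped to a topic distribution by some topic model, and it ranks candidates by Jensen-Shannon divergence to the test set. The divergence measure, the function names, and the toy distributions are all assumptions made for illustration; the abstract does not state which similarity measure or data the paper uses.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-probability terms of p are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: a symmetric, bounded divergence between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def select_training_data(target, candidates, k):
    """Return the names of the k candidates whose topic distribution is closest to the target."""
    ranked = sorted(candidates, key=lambda name: js_divergence(target, candidates[name]))
    return ranked[:k]


# Hypothetical topic distributions over three topics (assumed to come from a topic model).
target = [0.7, 0.2, 0.1]
candidates = {
    "finance": [0.6, 0.3, 0.1],
    "sports": [0.1, 0.1, 0.8],
    "politics": [0.3, 0.5, 0.2],
}

selected = select_training_data(target, candidates, k=2)
print(selected)
```

In this sketch the selected articles would then serve as training data for the parser, with a random sample of the same size as the baseline for comparison.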