Article information

  • Title: Effective Measures of Domain Similarity for Parsing
  • Authors: Barbara Plank; Gertjan van Noord
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2011
  • Volume: 2011
  • Publisher: ACL Anthology
  • Abstract: It is well known that parsing accuracy suffers when a model is applied to out-of-domain data. It is also known that the most beneficial data to parse a given domain is data that matches the domain (Sekine, 1997; Gildea, 2001). Hence, an important task is to select appropriate domains. However, most previous work on domain adaptation relied on the implicit assumption that domains are somehow given. As more and more data becomes available, automatic ways to select data that is beneficial for a new (unknown) target domain are becoming attractive. This paper evaluates various ways to automatically acquire related training data for a given test set. The results show that an unsupervised technique based on topic models is effective: it outperforms random data selection on both languages examined, English and Dutch. Moreover, the technique works better than manually assigned labels gathered from meta-data that is available for English.