
Article Information

  • Title: CorporAl: a Method and Tool for Handling Overlapping Parallel Corpora
  • Authors: Mark Fishel; Heiki-Jaan Kaalep
  • Journal: The Prague Bulletin of Mathematical Linguistics
  • Print ISSN: 0032-6585
  • Electronic ISSN: 1804-0462
  • Publication year: 2010
  • Volume: 94
  • Issue: 1
  • Pages: 67-76
  • DOI: 10.2478/v10108-010-0021-7
  • Language: English
  • Publisher: Walter de Gruyter GmbH
  • Abstract: This work introduces a method and tool for handling overlapping parallel corpora, i.e. corpora that are based on the same source material. The method is insensitive to minor changes in the text, to different segmentation levels of the corpora, and to material omitted from either corpus. The aim is to detect matching sentence pairs and either produce combinations of the overlapping corpora or compare them and assess their quality relative to each other. The introduced tool enables the user to define the desired behavior when combining corpus pairs, resulting in pure comparison, maximum-size or maximum-quality versions of the combinations. We test the tool on two cases of overlapping parallel corpora and five language pairs. We also evaluate the impact of using the method on two translation systems: a phrase-based one and a parsing-based one.