
Article Information

  • Title: Noise-aware Character Alignment for Extracting Transliteration Fragments
  • Authors: Katsuhito Sudoh; Shinsuke Mori; Masaaki Nagata
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Year of publication: 2015
  • Volume: 10
  • Issue: 1
  • Pages: 88-112
  • DOI: 10.11185/imt.10.88
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: This paper proposes a novel noise-aware character alignment method for automatically extracting transliteration fragments in phrase pairs that are extracted from parallel corpora. The proposed method extends a many-to-many Bayesian character alignment method by distinguishing transliteration (signal) parts from non-transliteration (noise) parts. The model can be trained efficiently by a state-based blocked Gibbs sampling algorithm with signal and noise states. The proposed method bootstraps statistical machine transliteration using the extracted transliteration fragments to train transliteration models. In experiments using Japanese-English patent data, the proposed method was able to extract transliteration fragments with much less noise than an IBM-model-based baseline, and achieved better transliteration performance than sample-wise extraction in transliteration bootstrapping.
  • Keywords: Statistical Machine Transliteration; Bayesian Many-to-many Alignment; Machine Translation