Article Information

  • Title: Statistical Models for Unsupervised, Semi-Supervised, and Supervised Transliteration Mining
  • Authors: Hassan Sajjad; Helmut Schmid; Alexander Fraser
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Electronic ISSN: 1530-9312
  • Publication year: 2017
  • Volume: 43
  • Issue: 2
  • Pages: 349-375
  • DOI: 10.1162/COLI_a_00286
  • Language: English
  • Publisher: MIT Press
  • Abstract: We present a generative model that efficiently mines transliteration pairs in a consistent fashion in three different settings: unsupervised, semi-supervised, and supervised transliteration mining. The model interpolates two sub-models, one for the generation of transliteration pairs and one for the generation of non-transliteration pairs (i.e., noise). The model is trained on noisy unlabeled data using the EM algorithm. During training the transliteration sub-model learns to generate transliteration pairs and the fixed non-transliteration model generates the noise pairs. After training, the unlabeled data is disambiguated based on the posterior probabilities of the two sub-models. We evaluate our transliteration mining system on data from a transliteration mining shared task and on parallel corpora. For three out of four language pairs, our system outperforms all semi-supervised and supervised systems that participated in the NEWS 2010 shared task. On word pairs extracted from parallel corpora with fewer than 2% transliteration pairs, our system achieves up to 86.7% F-measure with 77.9% precision and 97.8% recall.
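
The interpolation of a trainable transliteration sub-model with a fixed noise sub-model, trained by EM and read out through posteriors, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the sub-models, so the sketch assumes equal-length words aligned character by character for the transliteration sub-model and independent character unigrams for the noise sub-model. The names `mine_transliterations`, `theta`, and `lam` are hypothetical.

```python
import math
from collections import Counter


def _translit_prob(src, tgt, theta):
    """Transliteration sub-model: product of joint character-pair probabilities.

    Assumption of this sketch: characters are aligned position by position.
    """
    p = 1.0
    for a, b in zip(src, tgt):
        p *= theta.get((a, b), 0.0)
    return p


def _e_step(pairs, noise_probs, theta, lam):
    """Return the transliteration posterior of each pair and the log-likelihood."""
    posteriors, log_lik = [], 0.0
    for (src, tgt), p_noise in zip(pairs, noise_probs):
        p_trans = (1.0 - lam) * _translit_prob(src, tgt, theta)
        total = p_trans + lam * p_noise  # interpolated mixture probability
        posteriors.append(p_trans / total)
        log_lik += math.log(total)
    return posteriors, log_lik


def mine_transliterations(pairs, iterations=20, lam=0.5):
    """Run EM on (source, target) word pairs; lam is the weight of the noise model."""
    if not pairs or any(not s or len(s) != len(t) for s, t in pairs):
        raise ValueError("this sketch assumes non-empty, equal-length word pairs")

    # Fixed noise sub-model (assumed): source and target characters are drawn
    # independently from unigram distributions. It is never re-estimated.
    src_counts = Counter(ch for s, _ in pairs for ch in s)
    tgt_counts = Counter(ch for _, t in pairs for ch in t)
    n_src, n_tgt = sum(src_counts.values()), sum(tgt_counts.values())
    noise_probs = [
        math.prod(src_counts[a] / n_src for a in s)
        * math.prod(tgt_counts[b] / n_tgt for b in t)
        for s, t in pairs
    ]

    # Trainable transliteration sub-model starts uniform over all character pairs.
    uniform = 1.0 / (len(src_counts) * len(tgt_counts))
    theta = {(a, b): uniform for a in src_counts for b in tgt_counts}

    log_liks = []
    for _ in range(iterations):
        posteriors, log_lik = _e_step(pairs, noise_probs, theta, lam)
        log_liks.append(log_lik)

        # M-step: fractional character-pair counts weighted by the posterior.
        counts = Counter()
        for (src, tgt), gamma in zip(pairs, posteriors):
            for a, b in zip(src, tgt):
                counts[(a, b)] += gamma
        norm = sum(counts.values())
        theta = {pair: c / norm for pair, c in counts.items()}
        lam = 1.0 - sum(posteriors) / len(pairs)

    # Final posteriors under the trained parameters.
    posteriors, log_lik = _e_step(pairs, noise_probs, theta, lam)
    log_liks.append(log_lik)
    return posteriors, lam, log_liks
```

Under these assumptions, pairs whose posterior exceeds a threshold such as 0.5 would be kept as mined transliterations. On a handful of toy pairs the posteriors are only illustrative, since a trainable sub-model this simple may absorb the noise pairs as well.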