Article Information

  • Title: A Dirichlet Process Mixture Based Name Origin Clustering and Alignment Model for Transliteration
  • Authors: Chunyue Zhang; Tiejun Zhao; Tingting Li
  • Journal: Advances in Artificial Intelligence
  • Print ISSN: 1687-7470
  • Electronic ISSN: 1687-7489
  • Publication year: 2015
  • Volume: 2015
  • DOI:10.1155/2015/927063
  • Publisher: Hindawi Publishing Corporation
  • Abstract: In machine transliteration, the names to be transliterated into the target language commonly come from multiple language origins. A conventional single model trained by maximum likelihood cannot handle this well and often suffers from overfitting. In this paper, we use a coupled Dirichlet process mixture model (cDPMM) to address both overfitting and the clustering of names by origin, simultaneously, in the transliteration sequence alignment step over the name pairs. After the alignment step, the cDPMM automatically clusters the name pairs into many groups according to their origin information. In the decoding step, to make full use of the learned origin information, we use a cluster combination method (CCM) to build cluster-specific transliteration models. The CCM combines small clusters into larger ones based on the perplexities of the name language model and the transliteration model, which ensures that each origin cluster has enough data to train a transliteration model. On three different Western-Chinese multiorigin name corpora, the cDPMM outperforms two state-of-the-art baseline models in both top-1 accuracy and mean F-score, and the CCM further improves significantly on the cDPMM.