Article information

  • Title: Neural machine translation system for the Kazakh language based on synthetic corpora
  • Authors: Ualsher Tukeyev; Aidana Karibayeva; Balzhan Abduali
  • Journal: MATEC Web of Conferences
  • Electronic ISSN: 2261-236X
  • Publication year: 2019
  • Volume: 252
  • DOI: 10.1051/matecconf/201925203006
  • Language: English
  • Publisher: EDP Sciences
  • Abstract: Large parallel corpora are lacking for the Kazakh language, which seriously impairs the quality of machine translation from and into Kazakh. This article considers neural machine translation of the Kazakh language on the basis of synthetic corpora. Kazakh belongs to the Turkic languages, which are characterised by rich morphology, and neural machine translation of natural languages requires large training data. The article presents a model for creating synthetic corpora, namely the generation of sentences based on the complete system of suffixes of the Kazakh language; generating sentences from this complete suffix system is the novelty of the approach. By using the generated synthetic corpora, we improve translation quality in neural machine translation for the Kazakh-English and Kazakh-Russian pairs.