Article Information

  • Title: Pseudotext Injection and Advance Filtering of Low-Resource Corpus for Neural Machine Translation
  • Authors: Michael Adjeisah; Guohua Liu; Douglas Omwenga Nyabuga
  • Journal: Computational Intelligence and Neuroscience
  • Print ISSN: 1687-5265
  • Electronic ISSN: 1687-5273
  • Publication year: 2021
  • Volume: 2021
  • Pages: 1-10
  • DOI: 10.1155/2021/6682385
  • Publisher: Hindawi Publishing Corporation
  • Abstract: Scaling natural language processing (NLP) to low-resourced languages to improve machine translation (MT) performance remains enigmatic. This research contributes to the domain on a low-resource English-Twi translation based on filtered synthetic-parallel corpora. It is often perplexing to learn and understand what a good-quality corpus looks like in low-resource conditions, mainly where the target corpus is the only sample text of the parallel language. To improve the MT performance in such low-resource language pairs, we propose to expand the training data by injecting synthetic-parallel corpus obtained by translating a monolingual corpus from the target language based on bootstrapping with different parameter settings. Furthermore, we performed unsupervised measurements on each sentence pair engaging squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we extensively use three different sentence-level similarity metrics after round-trip translation. Experimental results on a diverse amount of available parallel corpus demonstrate that injecting pseudoparallel corpus and extensive filtering with sentence-level similarity metrics significantly improves the original out-of-the-box MT systems for low-resource language pairs. Compared with existing improvements on the same original framework under the same structure, our approach exhibits tremendous developments in BLEU and TER scores.
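
The filtering step mentioned in the abstract, scoring each sentence pair by its squared Mahalanobis distance, can be illustrated with a minimal sketch. The abstract does not say which per-pair features the authors use, so the two-dimensional feature vectors, the ridge term, the threshold value, and all function names below are assumptions made for illustration only; this is not the paper's implementation. The sketch uses only the Python standard library and solves the linear system directly instead of inverting the covariance matrix.

```python
# Hypothetical sketch: filter sentence pairs by squared Mahalanobis distance.
# Feature choice, ridge, and threshold are illustrative assumptions.

def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]

def covariance(rows, mu, ridge=1e-6):
    """Sample covariance matrix with a small ridge on the diagonal for stability."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x

def squared_mahalanobis(v, mu, cov):
    """Return (v - mu)^T cov^{-1} (v - mu)."""
    diff = [x - m for x, m in zip(v, mu)]
    return sum(d * y for d, y in zip(diff, solve(cov, diff)))

def filter_pairs(pairs, features, threshold):
    """Keep pairs whose feature vector lies within `threshold` of the corpus mean."""
    mu = mean_vector(features)
    cov = covariance(features, mu)
    return [p for p, f in zip(pairs, features)
            if squared_mahalanobis(f, mu, cov) <= threshold]

# Toy data: placeholder sentence pairs with two hypothetical features each.
pairs = [("src 1", "tgt 1"), ("src 2", "tgt 2"), ("src 3", "tgt 3"),
         ("src 4", "tgt 4"), ("src 5", "tgt 5"), ("noisy src", "noisy tgt")]
features = [[1.0, 1.1], [1.0, 0.9], [1.1, 1.0],
            [0.9, 1.0], [1.0, 1.0], [2.0, 2.0]]

kept = filter_pairs(pairs, features, threshold=3.0)
print(kept)
```

In this toy setup the last pair sits far from the cluster formed by the others, so it is the one the threshold removes. In practice the threshold would need tuning on held-out data, and the statistics could instead be estimated from a trusted parallel subset; both are design choices the abstract leaves open.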