Article Information

Title: A Large-scale Evaluation of Neural Machine Transliteration for Indic Languages
Authors: Anoop Kunchukuttan; Siddharth Jain; Rahul Kejriwal et al.
Journal name: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2021
Volume: 2021
Pages: 3469-3475
DOI: 10.18653/v1/2021.eacl-main.303
Language: English
Publisher: ACL Anthology
Abstract: We take up the task of large-scale evaluation of neural machine transliteration between English and Indic languages, with a focus on multilingual transliteration to utilize orthographic similarity between Indian languages. We create a corpus of 600K word pairs mined from parallel translation corpora and monolingual corpora, which is the largest transliteration corpus for Indian languages mined from public sources. We perform a detailed analysis of multilingual transliteration and propose an improved multilingual training recipe for Indic languages. We analyze various factors affecting transliteration quality, such as language family, transliteration direction, and word origin.