Basic article information

Title: Synthetic data with neural machine translation for automatic correction in arabic grammar
Authors: Aiman Solyman; Wang Zhenyu; Tao Qian et al.
Journal: Egyptian Informatics Journal
Print ISSN: 1110-8665
Year of publication: 2021
Volume: 22
Issue: 3
Pages: 303-315
DOI: 10.1016/j.eij.2020.12.001
Language: English
Publisher: Elsevier
Abstract: The automatic correction of grammar and spelling errors is important for students, second language learners, and some Natural Language Processing (NLP) tasks such as part-of-speech tagging and text summarization. Recently, Neural Machine Translation (NMT) has become a well-established, high-performing approach to Grammar Error Correction (GEC). Arabic GEC (AGEC) is still developing because of challenges such as the scarcity of training data and the complexity of the Arabic language. To overcome these issues, we introduced an unsupervised method that generates large-scale synthetic training data based on a confusion function, increasing the size of the training set. Furthermore, we introduced a supervised NMT model for AGEC called SCUT AGEC. SCUT AGEC is a convolutional sequence-to-sequence model consisting of nine encoder-decoder layers with an attention mechanism. We applied fine-tuning to improve performance. Convolutional Neural Networks (CNNs) give our model the ability to perform feature extraction and classification jointly in one task, and we showed that this is an efficient way to capture features of the local context. Moreover, stacking convolutional layers makes it easy to capture long-term dependencies. Our proposed model is the first supervised AGEC system based on convolutional sequence-to-sequence learning to outperform the current state-of-the-art neural AGEC models.
Keywords: Natural language processing; Convolutional neural networks; Arabic grammar error correction
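
Illustrative note: the abstract mentions generating synthetic training data with a confusion function but does not define that function. The following is a minimal sketch of one plausible reading, assuming character-level substitution among Arabic letters that are commonly confused in writing. The confusion sets, the names `CONFUSION_SETS`, `corrupt`, and `make_pairs`, and the `error_rate` parameter are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical confusion sets (an assumption, not the paper's table):
# groups of Arabic letters that writers often interchange.
CONFUSION_SETS = [
    ["\u0627", "\u0623", "\u0625", "\u0622"],  # alef variants
    ["\u0629", "\u0647"],                      # ta marbuta vs. ha
    ["\u064A", "\u0649"],                      # ya vs. alef maqsura
]

# Map each letter to the alternatives it may be confused with.
CONFUSABLE = {
    ch: [alt for alt in group if alt != ch]
    for group in CONFUSION_SETS
    for ch in group
}


def corrupt(sentence, error_rate=0.3, rng=None):
    """Replace each confusable letter with a confusable alternative,
    independently with probability error_rate."""
    rng = rng or random.Random()
    out = []
    for ch in sentence:
        alternatives = CONFUSABLE.get(ch)
        if alternatives and rng.random() < error_rate:
            out.append(rng.choice(alternatives))
        else:
            out.append(ch)
    return "".join(out)


def make_pairs(clean_sentences, error_rate=0.3, seed=0):
    """Build (noisy source, clean target) pairs from clean text only."""
    rng = random.Random(seed)
    return [(corrupt(s, error_rate, rng), s) for s in clean_sentences]
```

Because the pairs are derived from clean text alone, no human-annotated corrections are needed, which is consistent with the abstract's description of the generation step as unsupervised. A sequence-to-sequence model would then be trained to map the noisy side back to the clean side.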