Article information

  • Title: Synthetic data with neural machine translation for automatic correction in Arabic grammar
  • Authors: Aiman Solyman; Wang Zhenyu; Tao Qian
  • Journal: Egyptian Informatics Journal
  • Print ISSN: 1110-8665
  • Publication year: 2021
  • Volume: 22
  • Issue: 3
  • Pages: 303-315
  • DOI: 10.1016/j.eij.2020.12.001
  • Language: English
  • Publisher: Elsevier
  • Abstract: The automatic correction of grammar and spelling errors is important for students, second language learners, and some Natural Language Processing (NLP) tasks such as part-of-speech tagging and text summarization. Recently, Neural Machine Translation (NMT) has become a well-established, high-performing approach to Grammar Error Correction (GEC). Arabic GEC (AGEC) is still developing because of challenges such as the scarcity of training data and the complexity of the Arabic language. To overcome these issues, we introduce an unsupervised method that generates large-scale synthetic training data based on a confusion function, increasing the size of the training set. Furthermore, we introduce a supervised NMT model for AGEC called SCUT AGEC. SCUT AGEC is a convolutional sequence-to-sequence model consisting of nine encoder-decoder layers with an attention mechanism. We apply fine-tuning to improve its performance. Convolutional Neural Networks (CNNs) let our model perform feature extraction and classification jointly in one task, and we show that this is an efficient way to capture features of the local context. Moreover, long-term dependencies are easy to capture because the convolutional layers are stacked. Our proposed model is the first supervised AGEC system based on convolutional sequence-to-sequence learning to outperform the current state-of-the-art neural AGEC models.
  • Keywords: Natural language processing; Convolutional neural networks; Arabic grammar error correction
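
The abstract mentions generating synthetic training data with a confusion function but gives no details of it. The Python sketch below illustrates the general idea only: clean sentences are corrupted by swapping characters for commonly confused alternatives, which yields (noisy source, clean target) pairs for a sequence-to-sequence corrector. The confusion sets, the `error_rate` parameter, and the function names `corrupt` and `make_pairs` are all assumptions made for illustration and are not taken from the paper.

```python
import random

# Hypothetical confusion sets (illustrative only, not the paper's):
# each Arabic letter maps to letters it is commonly confused with in writing.
CONFUSION_SETS = {
    "ا": ["أ", "إ", "آ"],   # bare alif vs. hamza/madda forms
    "أ": ["ا", "إ"],
    "إ": ["ا", "أ"],
    "ه": ["ة"],             # ha vs. ta marbuta
    "ة": ["ه"],
    "ي": ["ى"],             # ya vs. alif maqsura
    "ى": ["ي"],
}


def corrupt(sentence, error_rate=0.1, rng=None):
    """Replace each confusable character with probability error_rate."""
    if rng is None:
        rng = random.Random()
    out = []
    for ch in sentence:
        if ch in CONFUSION_SETS and rng.random() < error_rate:
            out.append(rng.choice(CONFUSION_SETS[ch]))
        else:
            out.append(ch)
    return "".join(out)


def make_pairs(clean_sentences, error_rate=0.1, seed=0):
    """Build (noisy source, clean target) training pairs from clean text."""
    rng = random.Random(seed)
    return [(corrupt(s, error_rate, rng), s) for s in clean_sentences]


if __name__ == "__main__":
    for noisy, clean in make_pairs(["ذهب الولد إلى المدرسة"], error_rate=0.5):
        print(noisy, "->", clean)
```

Because the corruption needs only clean monolingual text, a procedure of this kind can produce arbitrarily many training pairs without manual annotation. A realistic confusion function would likely also cover word-level, punctuation, and insertion or deletion errors.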