Basic Article Information

  • Title: Comparison of Korean Preprocessing Performance according to Tokenizer in NMT Transformer Model
  • Authors: Geumcheol Kim ; Sang-Hong Lee
  • Journal: Journal of Advances in Information Technology
  • Print ISSN: 1798-2340
  • Year: 2020
  • Volume: 11
  • Issue: 4
  • Pages: 228-232
  • DOI: 10.12720/jait.11.4.228-232
  • Publisher: Academy Publisher
  • Abstract: Machine translation using neural networks is making rapid progress in natural language processing. With the development of natural language processing models and tokenizers, accurate translation is becoming possible. In this paper, we build a Transformer model, which has recently shown high performance, and compare English-to-Korean translation performance according to the tokenizer. We built a neural-network-based Neural Machine Translation (NMT) model using a Transformer and compared the Korean translation results according to the tokenizer. The Byte Pair Encoding (BPE)-based tokenizer had a small vocabulary and trained quickly, but due to the characteristics of Korean, its translation results were not good. The morphological-analysis-based tokenizer showed that when the parallel corpus and the vocabulary are large, performance is higher regardless of the characteristics of the language.
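As a rough illustration of the BPE approach the abstract compares against morphological analysis, the sketch below learns subword merges from a toy Korean corpus. The corpus, merge count, and helper names are illustrative assumptions, not the authors' implementation.

```python
# Minimal BPE sketch (illustrative assumption; not the paper's code).
# Words start as character sequences; the most frequent adjacent symbol
# pair is merged repeatedly, growing a subword vocabulary.
import re
from collections import Counter

def pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in words.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge(pair, words):
    """Merge every standalone occurrence of `pair` into one symbol."""
    bigram = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {bigram.sub("".join(pair), word): freq
            for word, freq in words.items()}

# Toy corpus (assumed data): repeated stems such as 학교 and 갑니다
# should surface as merged subword units.
corpus = "저는 학교에 갑니다 친구도 학교에 갑니다 우리는 학교를 좋아합니다"
words = Counter(" ".join(tok) + " </w>" for tok in corpus.split())

for step in range(8):           # the merge count is an arbitrary knob here;
    pairs = pair_counts(words)  # real systems target a vocabulary size
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    words = merge(best, words)
    print(f"merge {step + 1}: {best[0]} + {best[1]} -> {''.join(best)}")
```

For the morphological-analysis setting, the character split would instead be replaced by morpheme segmentation from a Korean analyzer (for example, one of the taggers in KoNLPy), which tends to yield a larger vocabulary, consistent with the abstract's observation.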