Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Electronic ISSN: 2156-5570
Year of publication: 2021
Volume: 12
Issue: 9
DOI: 10.14569/IJACSA.2021.0120914
Language: English
Publisher: Science and Information Society (SAI)
Abstract: This paper presents a solution to the problem of summarizing Kazakh texts. Kazakh text summarization is treated as a sequence of two tasks: extracting the most important sentences of the text and simplifying the extracted sentences. Sentence extraction is performed with the TF-IDF method, and sentence simplification with a neural sequence-to-sequence (Seq2Seq) model. The obstacle to using a neural machine translation (NMT) method for Kazakh simplification was the absence of a Kazakh dataset for training. To solve this problem, this work proposes using transfer learning. Transfer learning made it possible to use a ready-made model trained on a parallel corpus of Simple English Wikipedia rather than creating a Kazakh simplification corpus from scratch. To this end, a transfer learning technology for simplifying Kazakh sentences was developed, based on training a neural model for simplifying English sentences. The main scientific contribution of this work is a transfer learning technology for simplifying Kazakh sentences using a parallel corpus for English simplification.
Keywords: Summarization; text simplification; low-resource language; seq2seq; transfer learning
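
Illustrative sketch (not from the paper): the abstract names TF-IDF as the method for extracting the most important sentences but does not give the weighting scheme or the preprocessing used. The Python below is one minimal way such a step could look, assuming each sentence is treated as its own "document", a sentence is scored by the length-normalised sum of its terms' TF-IDF weights, and the top k sentences are returned in their original order. The names `tokenize` and `top_sentences` and the parameter `k` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercased word tokens; \w is Unicode-aware in Python 3,
    # so Cyrillic (Kazakh) text is handled as well.
    return re.findall(r"\w+", sentence.lower())


def top_sentences(sentences, k=2):
    """Return the k highest-scoring sentences, kept in original order.

    Assumption: each sentence is one 'document' for the IDF statistics.
    """
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        # Sum of (relative term frequency * inverse document frequency).
        score = sum((c / len(doc)) * math.log(n / df[t]) for t, c in tf.items())
        scores.append(score)

    # Pick the k best indices, then restore document order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

In the pipeline the abstract describes, the sentences selected this way would then be passed to the simplification model; that second stage depends on the trained Seq2Seq model and is not sketched here.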