
Article Information

  • Title: Sentence Matrix Normalization Using Most Likely N-grams Vector
  • Authors: Mohamad Abdolahi ; Morteza Zahedi
  • Journal: International Journal of Mechatronics, Electrical and Computer Technology
  • Print ISSN: 2305-0543
  • Year: 2018
  • Volume: 8
  • Issue: 30
  • Pages: 4018-4028
  • Publisher: Austrian E-Journals of Universal Scientific Organization
  • Abstract: Word embedding is one of the most interesting fields of natural language processing and has been shown to be a great asset for a large variety of NLP tasks. N-gram language models are another important text processing method, based on statistics of how likely words are to follow each other. A growing body of literature recognizes the benefit of combining the two methods. However, a major problem in all previous approaches based on word vectors is that sentence matrices differ in size. Some studies have shown the benefit of averaging the matrix columns to create a one-dimensional vector of N elements, but this approach has disadvantages: it is insensitive to word order, it ignores sentence length, and the resulting sentence vectors lie very close to each other. This paper proposes a new methodology that combines the word2Vec approach with an n-gram language model into an efficient, simple and rich statistical model that produces sentence matrices of a unique, fixed size. The resulting fixed-size matrix does not depend on the language or its semantic concepts. Our results demonstrate that certain models capture complementary aspects of coherence evaluation, text summarization, automatic essay scoring, detection of fake and copied texts, and text topic comparison, and thus can be combined to improve performance. (An illustrative sketch of the averaging baseline versus a fixed-size matrix follows this listing.)
  • Keywords: n-grams normalization; natural language processing (NLP); sentence matrix; text
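The abstract describes the approach only at a high level. Below is a minimal, hypothetical Python sketch of the contrast it draws: the common baseline that column-averages word vectors (losing word order and sentence length) versus a bigram-weighted sentence matrix folded into a fixed number of rows. The toy embeddings, the bigram counts, and the row-folding scheme are illustrative assumptions, not the authors' actual model.

```python
# Hypothetical sketch: column-averaging baseline vs. a bigram-weighted,
# fixed-size sentence matrix. Embeddings and counts are toy stand-ins.
import numpy as np

EMB_DIM = 4
# Toy word embeddings (in practice produced by a trained word2vec model).
embeddings = {
    "the": np.array([0.1, 0.2, 0.0, 0.3]),
    "cat": np.array([0.5, 0.1, 0.4, 0.2]),
    "sat": np.array([0.3, 0.6, 0.1, 0.0]),
    "mat": np.array([0.4, 0.0, 0.5, 0.1]),
}
# Toy bigram/unigram counts standing in for an n-gram language model.
bigram_counts = {("the", "cat"): 9, ("cat", "sat"): 7, ("sat", "the"): 3, ("the", "mat"): 8}
unigram_counts = {"the": 20, "cat": 9, "sat": 7, "mat": 8}

def average_vector(tokens):
    """Baseline: average the word vectors column-wise into one N-dim vector.
    Word order and sentence length are lost, so different sentences end up close."""
    return np.mean([embeddings[t] for t in tokens], axis=0)

def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing over the toy counts."""
    vocab = len(unigram_counts)
    return (bigram_counts.get((prev, word), 0) + 1) / (unigram_counts.get(prev, 0) + vocab)

def fixed_size_matrix(tokens, rows=3):
    """Illustrative fixed-size sentence matrix: each word vector is weighted by
    the likelihood of the bigram that produced it, then the weighted vectors are
    folded into `rows` buckets so every sentence yields a rows x EMB_DIM matrix
    regardless of its length."""
    weighted = [embeddings[tokens[0]]]
    for prev, word in zip(tokens, tokens[1:]):
        weighted.append(bigram_prob(prev, word) * embeddings[word])
    weighted = np.array(weighted)
    matrix = np.zeros((rows, EMB_DIM))
    counts = np.zeros(rows)
    for i, vec in enumerate(weighted):      # fold variable-length input into `rows` buckets
        r = i * rows // len(weighted)
        matrix[r] += vec
        counts[r] += 1
    return matrix / np.maximum(counts[:, None], 1)

sentence = ["the", "cat", "sat", "the", "mat"]
print("averaged vector  :", average_vector(sentence))
print("fixed-size matrix:\n", fixed_size_matrix(sentence))
```

In this sketch the averaged vector collapses any ordering of the same words to the same point, while the fixed-size matrix keeps a coarse positional structure and reflects how likely each word is given its predecessor, which is the general motivation stated in the abstract.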