Journal: International Journal of Mechatronics, Electrical and Computer Technology
Print ISSN: 2305-0543
Year: 2018
Volume: 8
Issue: 30
Pages: 4018-4028
Publisher: Austrian E-Journals of Universal Scientific Organization
Abstract: Word embeddings are one of the most interesting areas of natural language processing (NLP) and have been shown to be a great asset for a large variety of NLP tasks. N-gram language models are another important text-processing method, based on statistics of how likely words are to follow one another. A growing body of literature recognizes the value of combining the two methods. However, a major problem in all previous word-vector-based approaches is that sentence matrices differ in size. Some studies have shown the benefit of averaging the columns of the sentence matrix to create a one-dimensional vector of N elements. This approach, however, has disadvantages: it is insensitive to word order, it ignores sentence length, and the resulting sentence vectors lie very close to one another. This paper proposes a new methodology: an efficient and very simple rich statistical model that combines the word2vec approach with an n-gram language model to produce fixed-size sentence matrices. The resulting fixed-size matrix does not depend on the language or its semantic concepts. Our results demonstrate that certain models capture complementary aspects of coherence evaluation, text summarization, automatic essay scoring, detection of fake and copied texts, and text-topic comparison, and thus can be combined to improve performance.
Keywords: n-grams normalization; natural language processing (NLP); sentence matrix; text
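The column-averaging baseline that the abstract criticizes can be sketched in a few lines. The following is a minimal illustration (the embedding values are made up, not from a trained word2vec model) of how mean-pooling collapses a variable-length sentence matrix into a fixed-size N-element vector, and why it is insensitive to word order:

```python
import numpy as np

# Hypothetical sentence of 4 words, each embedded as an N = 5 vector.
# Rows are word vectors; the row count varies with sentence length.
N = 5
sentence_matrix = np.array([
    [0.1, 0.3, -0.2, 0.0, 0.5],
    [0.4, -0.1, 0.2, 0.3, 0.1],
    [0.0, 0.2, 0.1, -0.4, 0.3],
    [0.2, 0.0, -0.1, 0.1, 0.2],
])

# Column-wise mean: a (words x N) matrix of any length becomes
# a single one-dimensional vector of N elements.
sentence_vector = sentence_matrix.mean(axis=0)
assert sentence_vector.shape == (N,)

# The drawback noted in the abstract: permuting the rows (reordering
# the words of the sentence) yields exactly the same vector.
shuffled = sentence_matrix[[3, 1, 0, 2]]
assert np.allclose(shuffled.mean(axis=0), sentence_vector)
```

Because the mean discards both row order and row count, any two sentences with similar word vocabularies map to nearby vectors regardless of length or word order, which motivates the fixed-size sentence-matrix representation the paper proposes instead.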