
Article Information

  • Title: Sentence Matrix Normalization Using Most Likely N-grams Vector
  • Authors: Mohamad Abdolahi ; Morteza Zahedi
  • Journal: International Journal of Mechatronics, Electrical and Computer Technology
  • Print ISSN: 2305-0543
  • Year: 2018
  • Volume: 8
  • Issue: 30
  • Pages: 4018-4028
  • Publisher: Austrian E-Journals of Universal Scientific Organization
  • Abstract: Word embedding is one of the most interesting fields of natural language processing and has been shown to be a great asset for a large variety of NLP tasks. N-gram language models are another important text processing method, based on statistics of how likely words are to follow each other. A growing body of literature recognizes the benefit of combining the two methods. However, a major problem in all previous approaches based on word vectors is that sentence matrices differ in size. Some studies have shown the benefit of averaging the matrix columns to create a one-dimensional vector of N elements, but this approach has disadvantages: it is insensitive to word order, it ignores sentence length, and the resulting sentence vectors lie very close to each other. This paper proposes a new methodology that combines the word2Vec approach with an n-gram language model into an efficient, simple and rich statistical model that produces sentence matrices of a unique, fixed size. The resulting fixed-size matrix does not depend on the language or its semantic concepts. Our results demonstrate that certain models capture complementary aspects of coherence evaluation, text summarization, automatic essay scoring, detection of fake and copied texts, and text topic comparison, and thus can be combined to improve performance. (An illustrative sketch of the averaging baseline versus a fixed-size matrix follows this listing.)
  • Keywords: n-grams normalization; natural language processing (NLP); sentence matrix; text
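The abstract describes the approach only at a high level. Below is a minimal, hypothetical Python sketch of the contrast it draws: the common baseline that column-averages word vectors (losing word order and sentence length) versus a bigram-weighted sentence matrix folded into a fixed number of rows. The toy embeddings, the bigram counts, and the row-folding scheme are illustrative assumptions, not the authors' actual model.

```python
# Hypothetical sketch: column-averaging baseline vs. a bigram-weighted,
# fixed-size sentence matrix. Embeddings and counts are toy stand-ins.
import numpy as np

EMB_DIM = 4
# Toy word embeddings (in practice produced by a trained word2vec model).
embeddings = {
    "the": np.array([0.1, 0.2, 0.0, 0.3]),
    "cat": np.array([0.5, 0.1, 0.4, 0.2]),
    "sat": np.array([0.3, 0.6, 0.1, 0.0]),
    "mat": np.array([0.4, 0.0, 0.5, 0.1]),
}
# Toy bigram/unigram counts standing in for an n-gram language model.
bigram_counts = {("the", "cat"): 9, ("cat", "sat"): 7, ("sat", "the"): 3, ("the", "mat"): 8}
unigram_counts = {"the": 20, "cat": 9, "sat": 7, "mat": 8}

def average_vector(tokens):
    """Baseline: average the word vectors column-wise into one N-dim vector.
    Word order and sentence length are lost, so different sentences end up close."""
    return np.mean([embeddings[t] for t in tokens], axis=0)

def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing over the toy counts."""
    vocab = len(unigram_counts)
    return (bigram_counts.get((prev, word), 0) + 1) / (unigram_counts.get(prev, 0) + vocab)

def fixed_size_matrix(tokens, rows=3):
    """Illustrative fixed-size sentence matrix: each word vector is weighted by
    the likelihood of the bigram that produced it, then the weighted vectors are
    folded into `rows` buckets so every sentence yields a rows x EMB_DIM matrix
    regardless of its length."""
    weighted = [embeddings[tokens[0]]]
    for prev, word in zip(tokens, tokens[1:]):
        weighted.append(bigram_prob(prev, word) * embeddings[word])
    weighted = np.array(weighted)
    matrix = np.zeros((rows, EMB_DIM))
    counts = np.zeros(rows)
    for i, vec in enumerate(weighted):      # fold variable-length input into `rows` buckets
        r = i * rows // len(weighted)
        matrix[r] += vec
        counts[r] += 1
    return matrix / np.maximum(counts[:, None], 1)

sentence = ["the", "cat", "sat", "the", "mat"]
print("averaged vector  :", average_vector(sentence))
print("fixed-size matrix:\n", fixed_size_matrix(sentence))
```

In this sketch the averaged vector collapses any ordering of the same words to the same point, while the fixed-size matrix keeps a coarse positional structure and reflects how likely each word is given its predecessor, which is the general motivation stated in the abstract.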