Basic Article Information

  • Title: Corpus-Based Paraphrase Detection Experiments and Review
  • Authors: Tedo Vrbanec; Ana Meštrović
  • Journal: Information
  • Electronic ISSN: 2078-2489
  • Publication year: 2020
  • Volume: 11
  • Issue: 5
  • Pages: 241-265
  • DOI: 10.3390/info11050241
  • Publisher: MDPI Publishing
  • Abstract: Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, and text mining in general. In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection. We report the results of eight models (LSI, TF-IDF, Word2Vec, Doc2Vec, GloVe, FastText, ELMO, and USE) evaluated on three publicly available corpora: Microsoft Research Paraphrase Corpus, Clough and Stevenson, and Webis Crowd Paraphrase Corpus 2011. Through a great number of experiments, we decided on the most appropriate approaches for text pre-processing, hyper-parameters, sub-model selection where sub-models exist (e.g., Skipgram vs. CBOW), distance measures, and the semantic similarity/paraphrase detection threshold. Our findings and those of other researchers who have used deep learning models show that DL models are very competitive with traditional state-of-the-art approaches and have potential that should be further developed.
  • Keywords: semantic similarity; deep learning; paraphrasing corpora; experiments; natural language processing
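
The abstract describes a common pipeline: represent each text as a vector with a corpus-based model, compare the vectors with a distance measure, and call the pair a paraphrase when the similarity passes a threshold. The sketch below illustrates that idea with TF-IDF and cosine similarity, using only the Python standard library. It is not the authors' implementation: the function names, the tokenizer, the smoothed IDF formula, and the default threshold of 0.5 are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption), so shared terms keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: c * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_paraphrase(text_a, text_b, threshold=0.5):
    """Hypothetical decision rule: similarity at or above the threshold."""
    vec_a, vec_b = tfidf_vectors([text_a, text_b])
    return cosine_similarity(vec_a, vec_b) >= threshold
```

In this sketch the IDF weights are computed from the two texts being compared only. In practice they would be fitted on a larger corpus, and the threshold would be tuned on labeled pairs, which is the kind of choice the abstract says was settled experimentally.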