Basic article information

  • Title: Semantic Similarity Analysis for Corpus Development and Paraphrase Detection in Arabic
  • Authors: Adnen Mahmoud; Mounir Zrigui
  • Journal: The International Arab Journal of Information Technology
  • Print ISSN: 1683-3198
  • Publication year: 2021
  • Volume: 18
  • Issue: 1
  • DOI: 10.34028/iajit/18/1/1
  • Language: English
  • Publisher: Zarqa Private University
  • Abstract: Paraphrase detection allows determining how original and suspect documents convey the same meaning. It has attracted attention from researchers in many Natural Language Processing (NLP) tasks such as plagiarism detection, question answering, and information retrieval. Traditional methods (e.g., Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), and Latent Semantic Analysis (LSA)) cannot efficiently capture hidden semantic relations when sentences contain no common words or when words rarely co-occur. Therefore, we proposed a deep learning model based on Global Word embedding (GloVe) and Recurrent Convolutional Neural Network (RCNN). It was efficient at capturing more contextual dependencies between word vectors with precise semantic meanings. Given the lack of publicly available resources for the Arabic language, we developed a paraphrased corpus automatically. It preserved the syntactic and semantic structures of Arabic sentences using a word2vec model and Part-Of-Speech (POS) annotation. Overall, experiments showed that our proposed model outperformed the state-of-the-art methods in terms of precision and recall.
  • Keywords: Arabic language processing; word2vec; part-of-speech annotation; paraphrasing; semantic analysis; recurrent convolutional neural networks