首页    期刊浏览 2024年11月24日 星期日
登录注册

文章基本信息

  • 标题:Arabic duplicate questions detection based on contextual representation, class label matching, and structured self attention
  • 本地全文:下载
  • 作者:Alami Hamza ; Said El Alaoui Ouatik ; Khalid Alaoui Zidani
  • 期刊名称:Journal of King Saud University @?C Computer and Information Sciences
  • 印刷版ISSN:1319-1578
  • 出版年度:2022
  • 卷号:34
  • 期号:6
  • 页码:3758-3765
  • 语种:English
  • 出版社:Elsevier
  • 摘要:Question Answering Systems (QAS) are rising solutions providing exact and precise answers to natural questions. Duplicate Question Detection (DQD), which aims to reuse previous answers, has shown its ability to improve user experience and reduce significantly the response time. However, few Arabic QAS integrate solutions able to detect duplicate questions in their workflow. In this paper, we build a DQD method based on contextual word representation, question classification and forward/backward structured self attention. First, we extract contextual word representation Embeddings from Language Models (ELMo) to map questions into a vector space. Next, we train two models to classify question embedding according to two taxonomies: Hamza et al. and Li & Roth. Then, we introduce a class label matching step to filter out questions that have different class labels. Finally, we propose a Bidirectional Attention Bidirectional LSTM (BiAttention BiLSTM) model that focuses only on keywords to predict whether a question pair is a duplicate or not. We also apply a data augmentation strategy based on symmetry, reflexivity, and transitivity relations to improve the generalization of our model. Various experimentations are performed to evaluate the impact of question classification and pre-processing step on DQD model. The obtained results show that our model achieves good performances as compared to the baseline results.
国家哲学社会科学文献中心版权所有