期刊名称:Journal of King Saud University @?C Computer and Information Sciences
印刷版ISSN:1319-1578
出版年度:2022
卷号:34
期号:6
页码:3758-3765
语种:English
出版社:Elsevier
摘要:Question Answering Systems (QAS) are rising solutions providing exact and precise answers to natural questions. Duplicate Question Detection (DQD), which aims to reuse previous answers, has shown its ability to improve user experience and reduce significantly the response time. However, few Arabic QAS integrate solutions able to detect duplicate questions in their workflow. In this paper, we build a DQD method based on contextual word representation, question classification and forward/backward structured self attention. First, we extract contextual word representation Embeddings from Language Models (ELMo) to map questions into a vector space. Next, we train two models to classify question embedding according to two taxonomies: Hamza et al. and Li & Roth. Then, we introduce a class label matching step to filter out questions that have different class labels. Finally, we propose a Bidirectional Attention Bidirectional LSTM (BiAttention BiLSTM) model that focuses only on keywords to predict whether a question pair is a duplicate or not. We also apply a data augmentation strategy based on symmetry, reflexivity, and transitivity relations to improve the generalization of our model. Various experimentations are performed to evaluate the impact of question classification and pre-processing step on DQD model. The obtained results show that our model achieves good performances as compared to the baseline results.