Abstract: This paper proposes a novel preprocessing method for large-scale sets of real questions collected from community question-answering systems. The method trains a classifier with a KNN-based active learning algorithm and uses the neighborhood category system to classify the questions efficiently. The classified question set is then partitioned into equivalence classes by combining statistical information, semantic information, and topic information based on the LDA model, so that semantically related questions are grouped together. Experimental results demonstrate the effectiveness of the proposed method.
Keywords: Question Set; Preprocessing; KNN; Active Learning; Equivalence Class
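
As a rough illustration of the KNN-based active learning step summarized in the abstract, the following minimal sketch may help. It is not the authors' implementation. It assumes bag-of-words cosine similarity between questions and a simple uncertainty criterion (the question whose k nearest labeled neighbors agree least is sent for labeling next); the abstract specifies neither. All function names (`knn_votes`, `predict`, `most_uncertain`, `active_learn`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems need more)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_votes(question, labeled, k=3):
    """Count the category labels of the k labeled questions most similar to `question`."""
    q = Counter(tokenize(question))
    ranked = sorted(
        labeled,
        key=lambda item: cosine(q, Counter(tokenize(item[0]))),
        reverse=True,
    )
    return Counter(label for _, label in ranked[:k])


def predict(question, labeled, k=3):
    """Assign the majority category among the k nearest labeled questions."""
    return knn_votes(question, labeled, k).most_common(1)[0][0]


def most_uncertain(pool, labeled, k=3):
    """Pick the unlabeled question whose neighbors agree least (assumed criterion)."""
    def confidence(question):
        votes = knn_votes(question, labeled, k)
        return votes.most_common(1)[0][1] / sum(votes.values())
    return min(pool, key=confidence)


def active_learn(labeled, pool, oracle, rounds, k=3):
    """Repeatedly ask `oracle` (e.g. a human annotator) to label the most uncertain question."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(min(rounds, len(pool))):
        question = most_uncertain(pool, labeled, k)
        pool.remove(question)
        labeled.append((question, oracle(question)))
    return labeled


# Toy data, invented purely for illustration.
seed = [
    ("how to install python on windows", "computers"),
    ("how to fix a slow laptop", "computers"),
    ("best food for a sick cat", "pets"),
    ("how to train a puppy", "pets"),
]
```

In this sketch the labeled set grows by one question per round, so annotation effort is spent on the questions the current KNN classifier is least sure about. A call such as `active_learn(seed, unlabeled_questions, ask_annotator, rounds=10)` would return the enlarged training set, which `predict` can then use to categorize the remaining questions.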