Basic article information

  • Title: Using Semi-supervised Learning for Question Classification
  • Authors: Tri Thanh Nguyen; Le Minh Nguyen; Akira Shimazu
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Year of publication: 2008
  • Volume: 3
  • Issue: 1
  • Pages: 112-130
  • DOI: 10.11185/imt.3.112
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: Question classification, an important phase in question answering systems, is the task of identifying the type of a given question from a set of predefined types. This study combines unlabeled questions with labeled questions in semi-supervised learning to improve the precision of question classification. As the semi-supervised algorithm, we selected Tri-training because it is a simple but efficient co-training-style algorithm. However, Tri-training is not well suited to question data, so we propose two modifications to make it more suitable. To give its three classifiers different initial hypotheses, Tri-training bootstrap-samples the original labeled set to obtain a different training set for each classifier. This bootstrap sampling reduces the precision of the three classifiers. To avoid this drawback, each classifier should be trained initially on the full original labeled set while the diversity of the three classifiers is still ensured. Our first proposal is therefore to use multiple algorithms for the classifiers in Tri-training; the second is to use multiple algorithms in combination with multiple views. Our experiments show promising results.
  • Keywords: Computational Linguistics; Question classification; Semi-supervised learning; Tri-training algorithm
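
The labeling step behind Tri-training can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the usual Tri-training agreement rule, which this record does not spell out: an unlabeled question is proposed as a new training example for one classifier when the other two classifiers predict the same type. The sketch omits the error-rate check that the full algorithm applies before accepting proposals. The three rule-based functions are hypothetical stand-ins for three different learning algorithms trained on the same labeled set, and the type names (HUMAN, LOCATION, NUMERIC, ENTITY) and the function name propose_labels are illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A classifier maps a question string to a predicted question type.
Classifier = Callable[[str], str]


def propose_labels(
    classifiers: Sequence[Classifier], unlabeled: Sequence[str]
) -> Dict[int, List[Tuple[str, str]]]:
    """For each classifier i, collect the unlabeled questions on which the
    other two classifiers agree, paired with the agreed type."""
    if len(classifiers) != 3:
        raise ValueError("Tri-training uses exactly three classifiers")
    proposals: Dict[int, List[Tuple[str, str]]] = {0: [], 1: [], 2: []}
    for question in unlabeled:
        preds = [clf(question) for clf in classifiers]
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            if preds[j] == preds[k]:
                proposals[i].append((question, preds[j]))
    return proposals


# Hypothetical toy classifiers. In the setting the abstract describes, these
# would be three different learning algorithms, each trained on the whole
# labeled set rather than on a bootstrap sample of it.
def first_word_rule(question: str) -> str:
    first = question.lower().split()[0]
    return {"who": "HUMAN", "where": "LOCATION", "when": "NUMERIC"}.get(first, "ENTITY")


def keyword_rule(question: str) -> str:
    text = question.lower()
    if "city" in text or "country" in text:
        return "LOCATION"
    if "year" in text or "many" in text:
        return "NUMERIC"
    if "who" in text:
        return "HUMAN"
    return "ENTITY"


def last_word_rule(question: str) -> str:
    last = question.lower().rstrip("?").split()[-1]
    return {"born": "NUMERIC", "located": "LOCATION"}.get(last, "ENTITY")


unlabeled_questions = ["Where is the Louvre located?", "Who wrote Hamlet?"]
proposals = propose_labels(
    [first_word_rule, keyword_rule, last_word_rule], unlabeled_questions
)
```

With these toy rules, the first and third classifiers agree on the Louvre question, so it is proposed to the second classifier, and the first and second agree on the Hamlet question, so it is proposed to the third. Because the three classifiers differ in how they decide rather than in what data they saw, their disagreements do not depend on bootstrap sampling, which is the diversity argument the abstract makes.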