首页    期刊浏览 2025年12月28日 星期日
登录注册

文章基本信息

  • 标题:Topic Models Ensembles for AD-HOC Information Retrieval
  • 本地全文:下载
  • 作者:Pablo Ormeño ; Marcelo Mendoza ; Carlos Valle
  • 期刊名称:Information
  • 电子版ISSN:2078-2489
  • 出版年度:2021
  • 卷号:12
  • 期号:9
  • 页码:360
  • DOI:10.3390/info12090360
  • 语种:English
  • 出版社:MDPI Publishing
  • 摘要:Ad hoc information retrieval (ad hoc IR) is a challenging task consisting of ranking text documents for bag-of-words (BOW) queries. Classic approaches based on query and document text vectors use term-weighting functions to rank the documents. Some of these methods’ limitations consist of their inability to work with polysemic concepts. In addition, these methods introduce fake orthogonalities between semantically related words. To address these limitations, model-based IR approaches based on topics have been explored. Specifically, topic models based on Latent Dirichlet Allocation (LDA) allow building representations of text documents in the latent space of topics, the better modeling of polysemy and avoiding the generation of orthogonal representations between related terms. We extend LDA-based IR strategies using different ensemble strategies. Model selection obeys the ensemble learning paradigm, for which we test two successful approaches widely used in supervised learning. We study Boosting and Bagging techniques for topic models, using each model as a weak IR expert. Then, we merge the ranking lists obtained from each model using a simple but effective top-<i>k</i> list fusion approach. We show that our proposal strengthens the results in precision and recall, outperforming classic IR models and strong baselines based on topic models.
国家哲学社会科学文献中心版权所有