Article information

  • Title: Correcting the hub occurrence prediction bias in many dimensions
  • Authors: Tomašev, Nenad; Buza, Krisztian; Mladenić, Dunja
  • Journal: Computer Science and Information Systems
  • Print ISSN: 1820-0214
  • Electronic ISSN: 2406-1018
  • Publication year: 2015
  • Pages: 39-39
  • DOI: 10.2298/CSIS140929039T
  • Publisher: ComSIS Consortium
  • Abstract: Data reduction is a common pre-processing step for k-nearest neighbor classification (kNN). The existing prototype selection methods implement different criteria for selecting relevant points to use in classification, which constitutes a selection bias. This study examines the nature of the instance selection bias in intrinsically high-dimensional data. In high-dimensional feature spaces, hubs are known to emerge as centers of influence in kNN classification. These points dominate most kNN sets and are often detrimental to classification performance. Our experiments reveal that different instance selection strategies bias the predictions of the behavior of hub-points in high-dimensional data in different ways. We propose to introduce an intermediate un-biasing step when training the neighbor occurrence models and we demonstrate promising improvements in various hubness-aware classification methods, on a wide selection of high-dimensional synthetic and real-world datasets.
  • Keywords: instance selection; data reduction; classification; bias; k-nearest neighbor; hubness; curse of dimensionality
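
The abstract describes hubs as points that appear in many kNN sets. As a rough illustration of that notion only (not the un-biasing method proposed in the article), the sketch below counts how often each point occurs in the k-nearest-neighbor lists of the other points. The function name `k_occurrences`, the choice of Euclidean distance, and the random demo data are assumptions made for this example.

```python
import numpy as np


def k_occurrences(X, k):
    """Count how many times each point appears among the k nearest
    neighbors of the other points (hypothetical helper for illustration)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances (assumed metric).
    # Uses O(n^2 * d) memory, so this is only suitable for small data sets.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # A point is not counted as its own neighbor.
    np.fill_diagonal(d2, np.inf)
    # Indices of the k closest points for every row.
    knn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    # Occurrence count of each index across all neighbor lists.
    return np.bincount(knn.ravel(), minlength=n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))  # synthetic data, assumed for the demo
    counts = k_occurrences(X, k=5)
    # The mean occurrence count always equals k; points whose count is far
    # above the mean are candidate hubs.
    print("mean:", counts.mean(), "max:", counts.max())
```

Points with an occurrence count well above k would be the hubs in this sense; how those counts are predicted and corrected after instance selection is the subject of the article itself and is not reproduced here.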