首页    期刊浏览 2024年07月05日 星期五
登录注册

文章基本信息

  • 标题:A New Hashing based Nearest Neighbors Selection Technique for Big Datasets
  • 本地全文:下载
  • 作者:Jude Tchaye-Kondi ; Yanlong Zhai ; Liehuang Zhu
  • 期刊名称:Computer Science & Information Technology
  • 电子版ISSN:2231-5403
  • 出版年度:2021
  • 卷号:11
  • 期号:7
  • 语种:English
  • 出版社:Academy & Industry Research Collaboration Center (AIRCC)
  • 摘要:KNN has the reputation of being a simple and powerful supervised learning algorithm used for either classification or regression. Although KNN prediction performance highly depends on the size of the training dataset, when this one is large, KNN suffers from slow decision making. This is because each decision-making process requires the KNN algorithm to look for nearest neighbors within the entire dataset. To overcome this slowness problem, we propose a new technique that enables the selection of nearest neighbors directly in the neighborhood of a given data point. The proposed approach consists of dividing the data space into sub-cells of a virtual grid built on top of the dataset. The mapping between data points and sub-cells is achieved using hashing. When it comes to selecting the nearest neighbors of a new observation, we first identify the central cell where the observation is contained. Once that central cell is known, we then start looking for the nearest neighbors from it and the cells around. From our experimental performance analysis of publicly available datasets, our algorithm outperforms the original KNN with a predictive quality as good and offers competitive performance with solutions such as KDtree.
  • 关键词:Machine learning;Nearest neighbors;Hashing;Big data
国家哲学社会科学文献中心版权所有