Article Information

Title: A Bloom Filter for High Dimensional Vectors
Authors: Chunyan Shuai; Hengcheng Yang; Xin Ouyang et al.
Journal: Information
Electronic ISSN: 2078-2489
Publication year: 2018
Volume: 9
Issue: 7
Page: 159
DOI: 10.3390/info9070159
Language: English
Publisher: MDPI Publishing
Abstract: Regardless of the type of data, traditional Bloom filters treat each element of a set as a string and, by iterating over every character of the string, discretize all data randomly and uniformly. However, as data size and dimensionality increase, these variants become inefficient. To better discretize high-dimensional numerical vectors, this paper converts the string hashes into integer hashes. Based on the integer hashes and a counter array, we propose a new variant, the high-dimensional Bloom filter (HDBF), which extends the Bloom filter into high-dimensional spaces and can represent and query the numerical vectors of a large set with a low false positive probability. This paper theoretically analyzes the feasibility of using integer hashes to discretize data and discusses the relationships among the parameters of the HDBF. The experiments illustrate that, in high-dimensional numerical spaces, the HDBF shows better randomness in distribution and entropy than the counting Bloom filter. Compared with the parallel Bloom filters, for a fixed false positive probability, the HDBF displays time-space overheads, and is more suitable for handling high-dimensional numerical vectors.
Keywords: Bloom filter; high-dimensional numerical vector; high-dimensional Bloom filter; integer hash functions
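
Illustration: the abstract describes combining integer hashes with a counter array so that numerical vectors can be inserted and queried. The sketch below shows that general idea only; it is not the HDBF algorithm from the paper, whose hash functions and parameters are not given in this record. The class name `VectorCountingBloom`, the splitmix64-style mixer `mix64`, the double-hashing scheme, and the parameters `m` and `k` are all assumptions made for illustration.

```python
import struct

MASK64 = (1 << 64) - 1


def mix64(x: int) -> int:
    """Integer hash: a splitmix64-style finalizer (an assumed choice, not the paper's)."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


class VectorCountingBloom:
    """Counting Bloom filter keyed on numerical vectors (hypothetical sketch)."""

    def __init__(self, m: int, k: int):
        self.m = m                # number of counters
        self.k = k                # number of probe positions per vector
        self.counters = [0] * m

    def _positions(self, vec):
        # Fold each component's 64-bit IEEE-754 pattern into two running
        # integer hashes, without converting the vector to a string.
        h1 = 0
        h2 = 0x9E3779B97F4A7C15
        for v in vec:
            bits = struct.unpack("<Q", struct.pack("<d", float(v)))[0]
            h1 = mix64(h1 ^ bits)
            h2 = mix64((h2 + bits) & MASK64)
        h2 |= 1  # keep the stride non-zero
        # Double hashing: derive k positions from the two base hashes.
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, vec):
        for p in self._positions(vec):
            self.counters[p] += 1

    def remove(self, vec):
        # Only valid for vectors that were previously added.
        for p in self._positions(vec):
            if self.counters[p] > 0:
                self.counters[p] -= 1

    def __contains__(self, vec):
        # True may be a false positive; False is always correct.
        return all(self.counters[p] > 0 for p in self._positions(vec))
```

For example, after `f = VectorCountingBloom(m=1024, k=4)` and `f.add([0.5, 1.25, -3.0, 42.0])`, the expression `[0.5, 1.25, -3.0, 42.0] in f` evaluates to `True`. Using counters rather than single bits is what makes `remove` possible.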