文章基本信息

标题：Constant-Time Retrieval with O(log m) Extra Bits
本地全文：下载
作者：Martin Dietzfelbinger ; Stefan Walzer
期刊名称：LIPIcs : Leibniz International Proceedings in Informatics
电子版ISSN：1868-8969
出版年度：2019
卷号：126
页码：1-16
DOI：10.4230/LIPIcs.STACS.2019.24
出版社：Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
摘要：For a set U (the universe), retrieval is the following problem. Given a finite subset S subseteq U of size m and f : S -> {0,1}^r for a small constant r, build a data structure D_f with the property that for a suitable query algorithm query we have query(D_f,x) = f(x) for all x in S. For x in U setminus S the value query(D_f,x) is arbitrary in {0,1}^r. The number of bits needed for D_f should be (1+epsilon)r m with overhead epsilon = epsilon(m) >= 0 as small as possible, while the query time should be small. Of course, the time for constructing D_f is relevant as well. We assume fully random hash functions on U with constant evaluation time are available. It is known that with epsilon ~= 0.09 one can achieve linear construction time and constant query time, and with overhead epsilon_k ~= e^{-k} it is possible to have O(k) query time and O(m^{1+alpha}) construction time, for arbitrary alpha>0. Furthermore, a theoretical construction with epsilon =O((log log m)/sqrt{log m}) gives constant query time and linear construction time. Known constructions avoiding all overhead, except for a seed value of size O(log log m), require logarithmic query time. In this paper, we present a method for treating the retrieval problem with overhead epsilon = O((log m)/m), which corresponds to O(1) extra memory words (O(log m) bits), and an extremely simple, constant-time query operation. The price to pay is a construction time of O(m^2). We employ the usual framework for retrieval data structures, where construction is effected by solving a sparse linear system of equations over the 2-element field F_2 and a query is effected by a dot product calculation. Our main technical contribution is the design and analysis of a new and natural family of sparse random linear systems with m equations and (1+epsilon)m variables, which combines good locality properties with high probability of having full rank. Paying a larger overhead of epsilon = O((log m)/m^alpha), the construction time can be reduced to O(m^{1+alpha}) for arbitrary constant 0 < alpha < 1. In combination with an adaptation of known techniques for solving sparse linear systems of equations, our approach leads to a highly practical algorithm for retrieval. In a particular benchmark with m = 10^7 we achieve an order-of-magnitude improvement over previous techniques with epsilon = 0.24% instead of the previously best result of epsilon ~= 3%, with better query time and no significant sacrifices in construction time.
关键词：Retrieval; Hashing; Succinct Data Structure; Randomised Data Structure; Structured Gaussian Elimination; Method of Four Russians