Abstract: In this paper, we focus on lower bounds for data structures that support orthogonal range querying on m points in n dimensions in the semigroup model. Such a data structure usually maintains a family of "canonical subsets" of the given set of points and, on a range query, outputs a disjoint union of the appropriate subsets. Fredman showed that in order to prove lower bounds in the semigroup model, it suffices to prove a lower bound on a certain combinatorial tradeoff between two parameters: (a) the total size of the canonical subsets, and (b) the total number of canonical subsets required to cover all query ranges. In particular, he showed that the arithmetic mean of these two parameters is Omega(m log^n m). We strengthen this tradeoff by showing that the geometric mean of the same two parameters is Omega(m log^n m). Our second result is an alternate proof of Fredman's tradeoff in the one-dimensional setting. The problem of answering range queries using canonical subsets can be formulated as factoring a specific boolean matrix as a product of two boolean matrices, one representing the canonical subsets and the other capturing the disjoint unions of canonical subsets that produce all possible range queries. In this formulation, we can ask two questions: what is an optimal data structure, i.e., a data structure that minimizes the sum of the two parameters mentioned above, and how does the balanced binary search tree compare with this optimal data structure in the two parameters? Finding an optimal data structure is a non-linear optimization problem. In one dimension, Fredman's result implies that the minimum value of the objective function is Omega(m log m), which means that at least one of the parameters has to be Omega(m log m). We show that both parameters in an optimal solution have to be Omega(m log m). This implies that balanced binary search trees are near-optimal data structures for range querying in one dimension.
We also derive intermediate results, which may be of independent interest, on factoring matrices (not necessarily boolean) while minimizing the norms of the factors.
Keywords: range querying; lower bounds; matrix factorization; Lagrange dual function
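
The matrix formulation mentioned in the abstract can be illustrated with a small sketch for the one-dimensional case. It is not the paper's construction: it assumes the queries are all contiguous intervals over m sorted points, takes the canonical subsets to be the nodes of a balanced binary tree over those points, and uses hypothetical helper names (`build_canonical`, `decompose`, `factorization`, `matmul`). Under those assumptions, A marks which points belong to each canonical subset, B marks which canonical subsets each query uses, and the integer product B·A reproduces the 0/1 query matrix Q exactly, which also confirms that the unions are disjoint.

```python
def build_canonical(lo, hi, out):
    """Collect the node intervals [lo, hi) of a balanced binary tree over points lo..hi-1."""
    out.append((lo, hi))
    if hi - lo > 1:
        mid = (lo + hi) // 2
        build_canonical(lo, mid, out)
        build_canonical(mid, hi, out)


def decompose(lo, hi, qlo, qhi, out):
    """Write the query interval [qlo, qhi) as a disjoint union of tree nodes."""
    if qlo >= hi or qhi <= lo:
        return  # node and query do not overlap
    if qlo <= lo and hi <= qhi:
        out.append((lo, hi))  # node lies fully inside the query
        return
    mid = (lo + hi) // 2
    decompose(lo, mid, qlo, qhi, out)
    decompose(mid, hi, qlo, qhi, out)


def factorization(m):
    """Return (A, B, Q) for all interval queries on m points (illustrative assumption)."""
    nodes = []
    build_canonical(0, m, nodes)
    index = {node: k for k, node in enumerate(nodes)}
    # A[k][p] = 1 iff point p belongs to canonical subset k.
    A = [[1 if lo <= p < hi else 0 for p in range(m)] for lo, hi in nodes]
    queries = [(i, j) for i in range(m) for j in range(i + 1, m + 1)]
    # B[q][k] = 1 iff query q uses canonical subset k.
    B = []
    for qlo, qhi in queries:
        used = []
        decompose(0, m, qlo, qhi, used)
        row = [0] * len(nodes)
        for node in used:
            row[index[node]] = 1
        B.append(row)
    # Q[q][p] = 1 iff point p lies in query q.
    Q = [[1 if qlo <= p < qhi else 0 for p in range(m)] for qlo, qhi in queries]
    return A, B, Q


def matmul(B, A):
    """Plain integer matrix product B * A."""
    return [[sum(b * a for b, a in zip(row, col)) for col in zip(*A)] for row in B]


if __name__ == "__main__":
    A, B, Q = factorization(8)
    assert matmul(B, A) == Q  # disjoint unions reproduce every query
    print("total size of canonical subsets:", sum(map(sum, A)))
    print("total subsets used over all queries:", sum(map(sum, B)))
```

The two printed totals correspond to parameters (a) and (b) of the tradeoff, measured here as the number of ones in A and in B for this particular tree-based factorization.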