文章基本信息

标题：Instance-by-instance optimal identity testing
本地全文：下载
作者：Gregory Valiant ; Paul Valiant
期刊名称：Electronic Colloquium on Computational Complexity
印刷版ISSN：1433-8092
出版年度：2013
卷号：2013
出版社：Universität Trier, Lehrstuhl für Theoretische Computer-Forschung
摘要：
We consider the problem of verifying the identity of a distribution: Given the description of a distribution over a discrete support p=(p1p2pn), how many samples (independent draws) must one obtain from an unknown distribution, q, to distinguish, with high probability, the case that p=q from the case that the total variation distance (L1 distance) p−q1 ? We resolve this question, up to constant factors, on an instance by instance basis: there exist universal constants cc and a function f(p) on distributions and error parameters, such that our tester distinguishes p=q from p−q1 using f(p) samples with success probability 2/3">23 , but no tester can distinguish p=q from p−q1c when given cf(p) samples. The function f(p) is upper-bounded by a multiple of 2p23, but is more complicated, and is significantly smaller in cases when p has many small domain elements. This result significantly generalizes and tightens previous results: since distributions of support at most n have L23 norm bounded by n this result immediately shows that for such distributions, O(n2) samples suffice, tightening the previous bound of O(4npolylogn) for this class of distributions, and matching the (tight) results for the case that p is the uniform distribution over support n.
The analysis of our very simple testing algorithm involves several hairy inequalities. To facilitate this analysis, we give a complete characterization of a general class of inequalities---generalizing Cauchy-Schwarz, Holder's inequality, and the monotonicity of Lp norms. Specifically, we characterize the set of sequences a=a1am b=b1bm c=c1cm for which it holds that for all finite sequences of positive numbers x=x1 and y=y1 i jxjaiyjbi ci1 The characterization is of a perhaps non-traditional nature in that it uses linear programming to compute a derivation that may otherwise have to be sought through trial and error, by hand. We do not believe such a characterization has appeared in the literature, and hope its computational nature will facilitate analyses like the one here
关键词：distribution testing; Holder`s inequality; identity testing; lower bounds; Sample Complexity