Article information
- Title: Testing Data Binnings
- Authors: Clément L. Canonne; Karl Wimmer
- Journal: LIPIcs: Leibniz International Proceedings in Informatics
- Electronic ISSN: 1868-8969
- Publication year: 2020
- Volume: 176
- Pages: 24:1-24:13
- DOI: 10.4230/LIPIcs.APPROX/RANDOM.2020.24
- Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
- Abstract: Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) q and samples from an unknown distribution p, both over [n] = {1, 2, …, n}, whether p equals q, or is significantly different from it. In this paper, we introduce the related question of identity up to binning, where the reference distribution q is over k ≪ n elements: the question is then whether there exists a suitable binning of the domain [n] into k intervals such that, once "binned," p is equal to q. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne [Clément L. Canonne, 2019]. Finally, we discuss several extensions and related research directions.
- Keywords: property testing; distribution testing; identity testing; hypothesis testing
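
To make the "identity up to binning" condition from the abstract concrete, here is a minimal Python sketch of the noiseless version of the question, in which p is fully known rather than accessed through samples. It is not the paper's tester and says nothing about sample complexity. The function name `find_binning`, the tolerance `tol`, and the convention that an interval may be empty are assumptions made for illustration. The check relies on one observation: a binning into consecutive intervals exists exactly when every cumulative sum of q also appears among the prefix sums of p.

```python
from itertools import accumulate


def find_binning(p, q, tol=1e-12):
    """Return interval end indices b_1 <= ... <= b_k such that bin j is
    p[b_{j-1}:b_j] and has total mass q[j], or None if no such binning exists.

    Assumptions (illustrative, not from the paper): p is known exactly,
    intervals may be empty, and equality is checked up to `tol`.
    """
    if not q:
        return None
    # prefix[i] is the total mass of the first i elements of p.
    prefix = [0.0] + list(accumulate(p))
    ends = []
    i = 0
    for target in accumulate(q):
        # Move right until the prefix sum reaches the cumulative mass of q.
        while i < len(p) and prefix[i] < target - tol:
            i += 1
        if abs(prefix[i] - target) > tol:
            return None  # the target falls strictly inside an element of p
        ends.append(i)
    # Any elements left over must carry no mass; the last bin absorbs them.
    if abs(prefix[-1] - prefix[i]) > tol:
        return None
    ends[-1] = len(p)
    return ends


if __name__ == "__main__":
    p = [0.25, 0.25, 0.5]
    print(find_binning(p, [0.5, 0.5]))    # bins {1,2} and {3}
    print(find_binning(p, [0.75, 0.25]))  # no interval binning matches
```

With exact knowledge of p this is a linear scan; the problem studied in the paper is harder because p is only observed through samples.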