
Article Information

  • Title: Testing Data Binnings
  • Authors: Clement Canonne; Karl Wimmer
  • Journal: Electronic Colloquium on Computational Complexity
  • Print ISSN: 1433-8092
  • Publication year: 2020
  • Volume: 2020
  • Pages: 1-14
  • Publisher: Universität Trier, Lehrstuhl für Theoretische Computer-Forschung
  • Abstract: Motivated by the question of data quantization and “binning,” we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) q and samples from an unknown distribution p, both over [n] = {1, 2, …, n}, whether p equals q or is significantly different from it. In this paper, we introduce the related question of identity up to binning, where the reference distribution q is over k ≪ n elements: the question is then whether there exists a suitable binning of the domain [n] into k intervals such that, once “binned,” p is equal to q. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne [Can19]. Finally, we discuss several extensions and related research directions.
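
To make the definition of "identity up to binning" concrete, here is a minimal Python sketch. It is not the paper's tester: it assumes p is known explicitly as a list of exact probabilities, rather than accessed through samples, and it assumes intervals are allowed to be empty. The function name `equal_up_to_binning` is hypothetical. Under these assumptions, a valid binning exists exactly when every cumulative mass of q appears among the prefix sums of p.

```python
from fractions import Fraction
from itertools import accumulate


def equal_up_to_binning(p, q):
    """Return True if the domain of p can be cut into len(q) consecutive
    (possibly empty) intervals whose masses under p are q[0], q[1], ... in order.

    Assumption: p and q are given explicitly as sequences of Fractions.
    The testing problem in the abstract only has sample access to p.
    """
    # Prefix sums of p, including the empty prefix (mass 0).
    prefix_sums = {Fraction(0)} | set(accumulate(p))
    # Both must carry the same total mass, and each cumulative mass of q
    # must be reachable by cutting p at some position.
    return sum(p) == sum(q) and all(c in prefix_sums for c in accumulate(q))


F = Fraction
p = [F(1, 4), F(1, 4), F(1, 8), F(1, 8), F(1, 4)]  # distribution over n = 5 elements

print(equal_up_to_binning(p, [F(1, 2), F(1, 2)]))  # bins {1,2} and {3,4,5}
print(equal_up_to_binning(p, [F(1, 3), F(2, 3)]))  # no cut point has mass 1/3
```

Exact rationals are used so that equality of masses is not affected by floating-point rounding. In the sample-based setting of the abstract, exact equality is replaced by a distance criterion ("significantly different"), which this sketch does not model.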