
Basic Article Information

  • Title: Learning Discrete Distributions from Untrusted Batches
  • Authors: Mingda Qiao; Gregory Valiant
  • Journal: LIPIcs: Leibniz International Proceedings in Informatics
  • Electronic ISSN: 1868-8969
  • Year of Publication: 2018
  • Volume: 94
  • Pages: 47:1-47:20
  • DOI: 10.4230/LIPIcs.ITCS.2018.47
  • Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik
  • Abstract: We consider the problem of learning a discrete distribution in the presence of an epsilon fraction of malicious data sources. Specifically, we consider the setting where there is some underlying distribution, p, and each data source provides a batch of >= k samples, with the guarantee that at least a (1 - epsilon) fraction of the sources draw their samples from a distribution with total variation distance at most eta from p. We make no assumptions on the data provided by the remaining epsilon fraction of sources; this data can even be chosen as an adversarial function of the (1 - epsilon) fraction of "good" batches. We provide two algorithms: one with runtime exponential in the support size, n, but polynomial in k, 1/epsilon, and 1/eta, that takes O((n + k)/epsilon^2) batches and recovers p to error O(eta + epsilon/sqrt(k)). This recovery accuracy is information-theoretically optimal, to constant factors, even given an infinite number of data sources. Our second algorithm applies to the eta = 0 setting and also achieves an O(epsilon/sqrt(k)) recovery guarantee, though it runs in poly((nk)^k) time. This second algorithm, which approximates a certain tensor via a rank-1 tensor minimizing l_1 distance, is surprising in light of the hardness of many low-rank tensor approximation problems, and may be of independent interest. (A minimal simulation sketch of this data model appears after this list.)
  • Keywords: robust statistics; information-theoretic optimality
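
The sketch below is a minimal, self-contained simulation of the untrusted-batches data model described in the abstract; it is not the paper's algorithm. The parameter values (n, k, m, epsilon) and the particular adversarial corruption are hypothetical choices made for illustration, and the only estimator computed is the naive pooled empirical distribution, whose error is governed by epsilon rather than the information-theoretically optimal epsilon/sqrt(k) rate quoted above.

```python
# Minimal simulation of the untrusted-batches model (illustrative only;
# NOT the paper's algorithm). All parameter values are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

n, k = 10, 100                    # support size and samples per batch
m = 500                           # number of batches (data sources)
epsilon = 0.1                     # fraction of adversarial batches
p = rng.dirichlet(np.ones(n))     # underlying distribution p over {1, ..., n}

batches = []
num_bad = int(epsilon * m)
for i in range(m):
    if i < num_bad:
        # Adversarial batch: here, every sample is the same fixed symbol.
        # A real adversary may instead choose these batches as a function
        # of the good batches.
        counts = np.zeros(n)
        counts[0] = k
    else:
        # Good batch: k i.i.d. samples from p (the eta = 0 case).
        counts = rng.multinomial(k, p)
    batches.append(counts)
batches = np.array(batches)

# Naive estimator: pool all samples from all batches and normalize.
p_hat = batches.sum(axis=0) / (m * k)

tv_error = 0.5 * np.abs(p_hat - p).sum()
print(f"TV error of naive pooled estimate: {tv_error:.4f}")
print(f"epsilon / sqrt(k) benchmark:       {epsilon / np.sqrt(k):.4f}")
```

In this particular simulation, the naive pooled estimate typically incurs total variation error on the order of epsilon, because an epsilon fraction of the pooled samples can be arbitrarily skewed; the algorithms in the paper instead exploit the batch structure to drive the error down to O(eta + epsilon/sqrt(k)).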