A discrete distribution p, over [n], is a k -histogram if its probability distribution function can berepresented as a piece-wise constant function with k pieces. Such a functionisrepresented by a list of k intervals and k corresponding values. We considerthe following problem: given a collection of samples from a distribution p,find a k -histogram that (approximately) minimizes the 2 distance to thedistribution p.We give time and sample efficient algorithms for this problem.
We further provide algorithms that distinguish distributions that have theproperty of being a k -histogram from distributions that are \eps-far fromanyk -histogram in the 1 distance and 2 distance respectively.