Abstract: The title poses a deceptively simple question that must be addressed by
any statistical model or computational algorithm for the clustering of points. Two
distinct interpretations are possible, one connected with the number of clusters in
the sample and one with the number in the population. Under suitable conditions,
these questions may have essentially the same answer, but it is logically possible
for one answer to be finite and the other infinite. This paper reformulates the
standard Dirichlet allocation model as a cluster process in such a way that these
and related questions can be addressed directly. Our conclusion is that the data
are sometimes informative for clustering points in the sample, but they seldom
contain much information about parameters such as the number of clusters in the
population.
Keywords: Cluster process; Dirichlet partition; Gauss-Ewens process; Random
sub-clusters; Species-counting model