期刊名称:Electronic Colloquium on Computational Complexity
印刷版ISSN:1433-8092
出版年度:1999
卷号:1999
出版社:Universität Trier, Lehrstuhl für Theoretische Computer-Forschung
摘要:We address the problem of partitioning a set of n points into
clusters, so as to minimize the sum, over all intracluster pairs of
points, of the cost associated with each pair. We obtain a randomized
approximation algorithm for this problem, for the cost functions
221 and 2, as well as any cost function
isometrically embeddable in 22.
The maximization problem (maximize the costs of intercluster edges) is
approximated with high probability to within multiplicative factor
(1−\ep); while the minimization problem either receives a
multiplicative 1+\ep approximation, or else the optimal clustering
is correctly identified except for a mislabelling of an \ep fraction
of the point set. Given a fixed approximation parameter \ep, the
runtime is linear in n for 22 problems of dimension
o(lognloglogn) ; and nO(loglogn) in the general
case.
The case 22 is addressed by combining three elements: (a)
Variable-probability sampling of the given points, to reduce the size
of the data set. (b) Near-isometric dimension reduction. (c) A
deterministic exact algorithm which runs in time exponential in the
dimension (rather than the number of points). The remaining cases are
addressed by reduction to 22.