New strategies to fix degeneracy in the k-means algorithm
K-means, Degeneracy, Clustering, Heuristics.
K-means is a benchmark algorithm used in cluster analysis. It belongs to the
large category of heuristics based on location-allocation steps that alternately
locate cluster centers and allocate data points to them until no further
improvement is possible. Such heuristics are known to suffer from a phenomenon
called degeneracy, in which some of the clusters are empty and hence unused.
In this thesis, we propose and compare a series of strategies to circumvent
degenerate solutions during a k-means execution. Our computational
experiments demonstrate that these strategies are efficient, leading to better
clustering solutions in the vast majority of the cases in which degeneracy appears
in k-means.
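
To make the phenomenon concrete, the following is a minimal sketch of a location-allocation loop in which degeneracy can be detected and repaired. It is an illustration only: the function names (allocate, locate, empty_clusters, repair, kmeans) are hypothetical, and the repair rule used here (move an empty cluster's center onto the point farthest from its current center) is one commonly described heuristic, assumed for the example and not necessarily one of the strategies studied in this work. The sketch also assumes at least as many data points as clusters.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def allocate(points, centers):
    """Allocation step: index of the nearest center for each point."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


def locate(points, labels, centers):
    """Location step: move each center to the mean of its points.

    A cluster with no points keeps its previous center.
    """
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(tuple(old))
    return new_centers


def empty_clusters(labels, k):
    """Indices of clusters that received no points (the degenerate ones)."""
    used = set(labels)
    return [j for j in range(k) if j not in used]


def repair(points, labels, centers):
    """Assumed repair rule, for illustration only.

    Each empty cluster takes over the point that is farthest from its own
    center, chosen among clusters holding more than one point so that the
    repair does not empty another cluster.
    """
    labels, centers = list(labels), list(centers)
    for j in empty_clusters(labels, len(centers)):
        counts = Counter(labels)
        candidates = [i for i in range(len(points)) if counts[labels[i]] > 1]
        far = max(candidates, key=lambda i: sq_dist(points[i], centers[labels[i]]))
        centers[j] = tuple(points[far])
        labels[far] = j
    return labels, centers


def kmeans(points, centers, max_iter=100):
    """Alternate allocation and location, repairing empty clusters as they appear."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = allocate(points, centers)
        new_labels, centers = repair(points, new_labels, centers)
        if new_labels == labels:
            break
        labels = new_labels
        centers = locate(points, labels, centers)
    return labels, centers


# Usage: the second initial center is so far away that it attracts no points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
initial = [(0, 0.5), (100, 100)]
print(empty_clusters(allocate(points, initial), 2))  # [1]: cluster 1 is empty
labels, final_centers = kmeans(points, initial)
print(labels)  # [0, 0, 1, 1]
```

In this toy instance, plain allocation leaves the second cluster empty on the first pass; with the repair step the run ends with both clusters in use. Other repair rules could be substituted in repair without changing the rest of the loop.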