On the use of the p-median model for semi-supervised clustering
semi-supervised clustering, data mining, p-median.
Clustering is a powerful tool for the automated analysis of data. It addresses the following general problem: given a set of entities, find subsets, or clusters, that are homogeneous and/or well separated. The main challenge in data clustering is to find a criterion that separates the data into homogeneous groups, so that these groups convey useful information to the user. To address this challenge, it has been suggested that the user provide a priori information about the data set. Clustering under this assumption is called semi-supervised clustering. This work explores the semi-supervised clustering problem using a new model: the data are clustered by solving the p-median problem. Results show that this new approach is able to cluster data efficiently in many different domains.
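
To make the underlying model concrete, the following is a minimal sketch of clustering by the classical p-median problem: choose p of the entities as medians so that the total distance from every entity to its nearest median is minimal. It is an illustration under stated assumptions, not the method of this work: it uses Euclidean distance, solves the problem by exhaustive enumeration (practical only for tiny instances), and leaves out the a priori information that makes the clustering semi-supervised. The function name `p_median` and the sample points are hypothetical.

```python
from itertools import combinations


def p_median(points, p):
    """Exhaustively solve a tiny p-median instance (illustrative only).

    Returns (medians, labels, cost): the indices of the chosen medians,
    the median index assigned to each point, and the total distance.
    """
    n = len(points)
    # Pairwise Euclidean distances (an assumption; any dissimilarity works).
    dist = [
        [sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
         for j in range(n)]
        for i in range(n)
    ]
    best = None
    # Try every subset of p points as the set of medians.
    for medians in combinations(range(n), p):
        # Assign each point to its closest median.
        labels = [min(medians, key=lambda m: dist[i][m]) for i in range(n)]
        cost = sum(dist[i][labels[i]] for i in range(n))
        if best is None or cost < best[2]:
            best = (medians, labels, cost)
    return best


# Two well-separated groups of three points each (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
medians, labels, cost = p_median(points, 2)
print(medians, labels, cost)
```

On this toy input the two chosen medians are the corner points of each group, and every point is assigned to the median of its own group. Realistic instances call for an integer programming solver or a heuristic, since the number of candidate median sets grows combinatorially.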