Abstract:
Internal evaluation criteria such as DB and BWP require an upper bound kmax on the number of clusters to be given in advance when they are used to determine the number of clusters of a dataset, so that the number of clusters k satisfies k≤kmax. However, there is so far no theory for finding this upper bound kmax. To address this problem, a new F-statistic, Fr, is proposed in this paper. Fr can serve as a new criterion for judging whether a clustering algorithm has converged, so that the number of clusters of a dataset can be found automatically without giving the upper bound kmax in advance. In addition, Fr is applied to the fast K-medoids clustering algorithm as the criterion for deciding whether its iterations continue, and the geodesic distance between samples on the minimum spanning tree (MST), that is, the length of the path between samples along the MST, is introduced in place of the direct Euclidean distance to measure their similarity, yielding an adaptive fast K-medoids algorithm. The adaptive K-medoids algorithm avoids the deficiencies of existing K-medoids algorithms, which require the number of clusters to be given in advance and cannot detect clusters of arbitrary shape. The power of the proposed criterion Fr and of the adaptive fast K-medoids algorithm is tested on real-world datasets from the UCI machine learning repository and on synthetic datasets. All experimental results demonstrate that the proposed criterion Fr is a valid clustering criterion, and that the adaptive fast K-medoids algorithm based on Fr and the geodesic distance can recognize clusters of arbitrary shape, detect the number of clusters automatically, and is robust to noise as well.
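The MST-based geodesic distance mentioned above can be illustrated with a minimal sketch (not the paper's implementation; it assumes SciPy and uses Euclidean edge weights on the complete graph, with path lengths along the MST taken as the geodesic distances):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform

def mst_geodesic_distances(X):
    """Pairwise geodesic distances: lengths of paths along the MST.

    X: (n_samples, n_features) array.
    Returns an (n_samples, n_samples) matrix of MST path lengths.
    """
    # Dense matrix of pairwise Euclidean distances (the complete graph).
    D = squareform(pdist(X))
    # Minimum spanning tree of the complete graph (sparse result).
    mst = minimum_spanning_tree(D)
    # Path length between every pair of samples along the (undirected) MST.
    return shortest_path(mst, method="D", directed=False)

# Example: three points forming a right angle. The direct Euclidean
# distance between the endpoints is sqrt(2), but the geodesic distance
# along the MST (two unit edges) is 2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
G = mst_geodesic_distances(X)
```

On elongated or curved clusters, two samples in the same cluster stay connected by short MST edges, so their geodesic distance remains small even when their direct Euclidean distance is large; this is what lets the algorithm follow arbitrary cluster shapes.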