An Improved Global K-Means Clustering Algorithm - Journal of Shaanxi Normal University Press website

Journal of Shaanxi Normal University (Natural Science Edition)

Mathematics and Computer Science

An Improved Global K-Means Clustering Algorithm

XIE Juan-ying1,2, JIANG Shuai1, WANG Chun-xia1, ZHANG Yan1, XIE Wei-xin2

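(1 College of Computer Science, Shaanxi Normal University, Xi'an 710062, Shaanxi, China; 2 School of Electronic Engineering, Xidian University, Xi'an 710071, Shaanxi, China)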

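XIE Juan-ying, female, associate professor, doctoral candidate; her research focuses on intelligent information processing and pattern recognition. E-mail: xiejuany@snnu.edu.cn.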

Abstract:

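The idea that the fast K-medoids clustering algorithm uses to determine its initial medoids is applied to the global K-means clustering algorithm, improving the way that algorithm selects the best initial center for the next cluster. The new method selects, as the best initial center of the next cluster, a sample whose surrounding samples are relatively densely distributed and which lies relatively far from the centers of the existing clusters; the result is an improved global K-means clustering algorithm. The improved algorithm not only avoids taking a noise point as the best initial center of the next cluster, but also shortens clustering time without degrading clustering quality. Experiments on data sets from the UCI Machine Learning Repository and on randomly generated synthetic data show that the improved global K-means clustering algorithm is superior in clustering time to both the global K-means clustering algorithm and the fast global K-means clustering algorithm.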

Keywords:

K-means; global K-means; fast global K-means; K-medoids method

Received:

2009-09-25

CLC number:

TP18

Document code:

Article ID:

1672-4291(2010)02-0018-05

Funding:

Supported by the National Natural Science Foundation of China (30670250)

Doi:

An improved global K-means clustering algorithm

XIE Juan-ying1,2, JIANG Shuai1, WANG Chun-xia1, ZHANG Yan1, XIE Wei-xin2

(1 College of Computer Science, Shaanxi Normal University, Xi'an 710062, Shaanxi, China; 2 School of Electronic Engineering, Xidian University, Xi'an 710071, Shaanxi, China)

Abstract:

An improved global K-means clustering algorithm is proposed. Inspired by the K-medoids clustering algorithm suggested by Park et al., it introduces a novel method for generating the next optimal initial center. The new method chooses a point that lies in a high-density region and is far from the centers of the existing clusters. This both avoids choosing a noisy datum as the optimal candidate center and reduces computation time without degrading the performance of the global K-means clustering algorithm. The improved algorithm is tested on well-known data sets from UCI and on synthetic data containing noise, and the results demonstrate that it significantly outperforms the global K-means clustering algorithm and the fast global K-means clustering algorithm in clustering time.

Keywords:

K-means; global K-means; fast global K-means; K-medoids clustering
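
The selection rule described in the abstract (prefer a sample in a dense region that is also far from the existing cluster centers) can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual formula, so everything here is an assumption made for illustration: the function names `euclidean`, `knn_density` and `next_initial_center`, the k-nearest-neighbour density estimate, the parameter `k`, and the score "separation times density" are all hypothetical and should not be read as the authors' method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_density(points, i, k):
    """Assumed density estimate: inverse mean distance from points[i]
    to its k nearest other samples (needs at least two samples)."""
    dists = sorted(euclidean(points[i], q) for j, q in enumerate(points) if j != i)
    nearest = dists[:k]
    # The small constant avoids division by zero for duplicated samples.
    return 1.0 / (sum(nearest) / len(nearest) + 1e-12)


def next_initial_center(points, centers, k=2):
    """Return the index of the sample with the highest hypothetical score:
    (distance to the nearest existing center) * (local density)."""
    best_index, best_score = -1, -math.inf
    for i, p in enumerate(points):
        separation = min(euclidean(p, c) for c in centers)
        score = separation * knn_density(points, i, k)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Two tight groups of four samples plus one isolated noise point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (30.0, -20.0),
]
centers = [(0.5, 0.5)]  # center of the first group, assumed already found

chosen = next_initial_center(points, centers)
print(points[chosen])  # (11.0, 11.0)
```

In this toy data the isolated point (30, -20) is the farthest sample from the existing center, so a rule based on distance alone would pick it. Weighting by local density steers the choice to the second group instead, which is the behaviour the abstract attributes to the improved algorithm.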