
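Journal of Shaanxi Normal University (Natural Science Edition)

Mathematics and Computer Science

A neighborhood-based K-medoids clustering algorithm

XIE Juan-ying1,2, GUO Wen-juan1, XIE Wei-xin2,3

(1 College of Computer Science, Shaanxi Normal University, Xi'an 710062, Shaanxi, China; 2 College of Electronic Engineering, Xidian University, Xi'an 710071, Shaanxi, China; 3 National Laboratory of Automatic Target Recognition (ATR), College of Information Engineering, Shenzhen University, Shenzhen 518060, Guangdong, China)

XIE Juan-ying, female, associate professor. Her research focuses on intelligent information processing, pattern recognition, machine learning, and data mining. E-mail: xiejuany@snnu.edu.cn.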

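Abstract:

This paper proposes a neighborhood-based K-medoids clustering algorithm. The algorithm uses the natural distribution of the samples in a dataset to define a neighborhood radius and the corresponding neighborhood for each data object, and selects K data objects that lie in densely populated regions and are far apart from one another as the initial cluster centers. The aim is to remedy a potential defect of the fast K-medoids algorithm, whose initial-medoid selection may place several initial centers in the same cluster. Experiments on datasets from the UCI machine learning repository and on randomly generated synthetic datasets containing noise points show that the proposed algorithm not only clusters well but also runs quickly and is highly robust to noisy data, outperforming both the traditional K-medoids algorithm and the fast K-medoids algorithm of Park et al.

Keywords:

neighborhood; K-medoids algorithm; sample density; clustering; sample space distribution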

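Received:

2011-06-16

CLC number:

TP181.1

Document code:

Article ID:

1672-4291(2012)04-0016-05

Funding:

Natural Science Basic Research Program of Shaanxi Province (2010JM3004); Key Project of the Fundamental Research Funds for the Central Universities (GK201102007).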

Doi:

A neighborhood-based K-medoids clustering algorithm

XIE Juan-ying1,2, GUO Wen-juan1, XIE Wei-xin2,3

(1 College of Computer Science, Shaanxi Normal University, Xi'an 710062, Shaanxi, China; 2 College of Electronic Engineering, Xidian University, Xi'an 710071, Shaanxi, China; 3 National Laboratory of Automatic Target Recognition (ATR), College of Information Engineering, Shenzhen University, Shenzhen 518060, Guangdong, China)

Abstract:

A new K-medoids algorithm based on the neighborhood of samples in a dataset is proposed. The algorithm defines a neighborhood radius and the corresponding neighborhood for each sample according to the distribution of the samples, and selects samples that lie in high-density regions and are far away from each other as the initial seeds. This addresses a potential weakness of the fast K-medoids algorithm, which may select samples from the same cluster as the initial seeds of different clusters. The proposed algorithm is tested on well-known datasets from the UCI machine learning repository and on synthetic datasets with noisy samples. The experimental results show that it achieves excellent clustering results in a short running time and is not sensitive to noisy data. It outperforms both the traditional K-medoids algorithm (Partitioning Around Medoids, PAM) and the fast K-medoids algorithm of Park et al.
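
The abstract does not state the exact neighborhood-radius formula or the seed-selection rule, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that the radius is a fraction of the mean pairwise distance, that the density of a sample is the number of samples inside its neighborhood, and that only samples with at least average density may become seeds. The names `pairwise_distances`, `neighborhood_seeds`, `assign`, `k_medoids` and the parameter `radius_scale` are hypothetical. The refinement step is a simple alternating assign-and-update loop, which is likewise an assumption.

```python
import math


def pairwise_distances(points):
    """Return the full Euclidean distance matrix as a list of lists."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist


def neighborhood_seeds(dist, k, radius_scale=0.5):
    """Pick k seeds that are dense and mutually far apart (assumed rule).

    Requires at least two samples and k no larger than the sample count.
    """
    n = len(dist)
    # ASSUMPTION: radius = radius_scale * mean pairwise distance.
    total = sum(dist[i][j] for i in range(n) for j in range(i + 1, n))
    radius = radius_scale * total / (n * (n - 1) / 2)
    # Density = number of samples inside the neighborhood (self included).
    density = [sum(1 for d in row if d <= radius) for row in dist]
    # ASSUMPTION: only samples with at least average density are candidates,
    # which keeps isolated noise points out of the seed set.
    mean_density = sum(density) / n
    candidates = [i for i in range(n) if density[i] >= mean_density]
    # First seed: the densest candidate.
    seeds = [max(candidates, key=lambda i: density[i])]
    # Remaining seeds: the candidate farthest from all seeds chosen so far.
    while len(seeds) < k:
        pool = [i for i in candidates if i not in seeds]
        if not pool:  # fall back to all samples if candidates run out
            pool = [i for i in range(n) if i not in seeds]
        seeds.append(max(pool, key=lambda i: min(dist[i][s] for s in seeds)))
    return seeds


def assign(dist, medoids):
    """Label every sample with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(points, k, max_iter=100):
    """Alternate assignment and medoid update, starting from the seeds."""
    dist = pairwise_distances(points)
    medoids = neighborhood_seeds(dist, k)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster is empty
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            updated.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
    noise = [(5, 30)]  # an isolated sample far from both blobs
    medoids, labels = k_medoids(blob_a + blob_b + noise, k=2)
    print("medoid indices:", medoids)
    print("labels:", labels)
```

In this toy setup the isolated sample has a neighborhood containing only itself, so it falls below the average density and is never chosen as a seed, while the two seeds come from different dense groups. Whether the published algorithm uses the same radius and threshold cannot be determined from the abstract alone.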

Keywords:

neighborhood; K-medoids clustering algorithm; sample density; clustering; sample space distribution