Journal of Shaanxi Normal University (Natural Science Edition)

Special Topic on Artificial Intelligence

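An overlapping co-clustering algorithm by maximizing modularity

WEI Jiahui1, MA Huifang1,2*, HE Xiangchun1, LI Zhixin2

(1 College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China; 2 Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, Guangxi, China)

MA Huifang, female, professor, Ph.D. Research interests: artificial intelligence, data mining, and machine learning. E-mail: mahuifang@yeah.net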

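Abstract:

Many real-world datasets contain not only substantial overlap between row and column clusters but also outliers that belong to no cluster. To address this, an overlapping co-clustering method that maximizes modularity (OMMCC) is proposed: both row clusters and column clusters are allowed to overlap, and the row and column outliers of the data matrix are not assigned to any cluster. Specifically, a unified framework is designed that incorporates the non-exhaustive and overlapping constraints into the objective function. An iterative alternating optimization procedure directly maximizes modularity and efficiently yields a better block-diagonal, non-exhaustive, overlapping co-clustering, and the parameters that control the degree of overlap and of non-exhaustiveness are easy to interpret. Experimental results show that the proposed method is effective and stable and that it outperforms other co-clustering algorithms.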
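The two constraints named above can be made concrete with a small sketch. It encodes a clustering as a binary indicator matrix in which an item may belong to several clusters (overlap) or to none (outlier). The function name `assignment_stats` and the two summary ratios are assumptions made here for illustration; the abstract does not state how the paper defines its overlap and non-exhaustiveness parameters.

```python
import numpy as np

def assignment_stats(Z):
    """Summarize a binary assignment matrix Z of shape (items, clusters).

    Illustrative definitions only (not taken from the paper):
      overlap_degree   = assignments beyond one per assigned item, divided by n
      outlier_fraction = items with no cluster, divided by n
    """
    Z = np.asarray(Z, dtype=int)
    n = Z.shape[0]
    memberships = Z.sum(axis=1)                     # clusters per item
    outliers = np.flatnonzero(memberships == 0)     # items in no cluster
    overlapping = np.flatnonzero(memberships > 1)   # items in 2+ clusters
    extra = Z.sum() - (n - outliers.size)           # surplus assignments
    return overlapping, outliers, extra / n, outliers.size / n

# Four items, two clusters: item 1 is shared, item 3 is an outlier.
Z = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 0]])
overlapping, outliers, overlap_degree, outlier_fraction = assignment_stats(Z)
print(overlapping, outliers, overlap_degree, outlier_fraction)
```

The same representation applies to columns, so a co-clustering is a pair of such matrices, one for rows and one for columns.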

Keywords:

co-clustering; modularity; overlapping; outlier; block diagonal matrix; artificial intelligence

Received:

2019-05-24

CLC number:

TP391

Document code:

Article ID:

1672-4291(2019)05-0025-09

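Funding:

National Natural Science Foundation of China (61762078, 61966004, 61363058); Open Fund of the Guangxi Key Lab of Multi-source Information Mining and Security (MIMS18-08)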

Doi:

An overlapping co-clustering algorithm by maximizing modularity

WEI Jiahui1, MA Huifang1,2*, HE Xiangchun1, LI Zhixin2

(1 College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China; 2 Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, Guangxi, China)

Abstract:

Traditional co-clustering algorithms simultaneously partition the rows and the columns of a two-dimensional data matrix into a predetermined number of clusters. Most existing co-clustering algorithms are designed for non-overlapping, exhaustive co-clustering. However, many real-world datasets contain not only substantial overlap between row and column clusters but also outliers that do not belong to any cluster. To address this, an overlapping co-clustering algorithm that maximizes modularity (OMMCC) is proposed: both row clusters and column clusters are allowed to overlap, and the row and column outliers of the data matrix are not assigned to any cluster. Specifically, a unified framework is designed that adds the non-exhaustive and overlapping constraints to the objective function. By using an iterative alternating optimization process to directly maximize modularity, a better block-diagonal, non-exhaustive, overlapping co-clustering is obtained efficiently. In addition, the parameters that control the degree of overlap and of non-exhaustiveness are easy to interpret. The experimental results show that the proposed method is effective and stable and that it outperforms other co-clustering algorithms.
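A minimal sketch of alternating modularity maximization may help fix ideas. It is not the OMMCC algorithm: it assumes the commonly used bipartite modularity Q = (1/a) Σ_ij (a_ij − a_i· a_·j / a) Σ_k z_ik w_jk, where a is the sum of all entries, and it drops the paper's overlap and non-exhaustiveness constraints. With the column assignment W fixed, Q is linear in the row assignment Z, so the unconstrained binary maximizer sets z_ik = 1 exactly where the gain is positive. Rows with several positive gains overlap, and rows with none stay unassigned. The function names are hypothetical.

```python
import numpy as np

def modularity_matrix(A):
    """B[i, j] = a_ij - (row sum i) * (column sum j) / (total sum)."""
    A = np.asarray(A, dtype=float)
    return A - np.outer(A.sum(axis=1), A.sum(axis=0)) / A.sum()

def coclustering_modularity(A, Z, W):
    """Modularity of row assignment Z (n x g) and column assignment W (d x g).

    (Z @ W.T)[i, j] counts the co-clusters shared by row i and column j,
    so it can exceed 1 when both overlap.
    """
    B = modularity_matrix(A)
    return float((B * (Z @ W.T)).sum() / np.asarray(A, dtype=float).sum())

def alternating_maximization(A, W0, n_iter=20):
    """Alternate unconstrained row and column updates (simplified sketch)."""
    B = modularity_matrix(A)
    W = np.asarray(W0, dtype=int)
    for _ in range(n_iter):
        Z = (B @ W > 0).astype(int)        # row i joins cluster k if gain > 0
        W_new = (B.T @ Z > 0).astype(int)  # same rule for columns
        if np.array_equal(W_new, W):       # fixed point reached
            break
        W = W_new
    return Z, W

# Three 2x2 diagonal blocks plus a last row that spans blocks 0 and 1.
A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 1],
              [1, 1, 1, 1, 0, 0]])
W0 = np.repeat(np.eye(3, dtype=int), 2, axis=0)  # initial column clusters
Z, W = alternating_maximization(A, W0)
q = coclustering_modularity(A, Z, W)
print(Z[6], q)
```

On this toy matrix the last row ends up in clusters 0 and 1, which is the kind of overlap the abstract describes. A faithful implementation would additionally limit how many extra assignments and outliers are allowed, as the paper's constraints require.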

Keywords:

co-clustering; modularity; overlapping; outlier; block diagonal matrix; artificial intelligence