WEI Jiahui1, MA Huifang1,2*, HE Xiangchun1, LI Zhixin2
(1 College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, Gansu, China; 2 Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin 541004, Guangxi, China)
Abstract:
Traditional co-clustering algorithms simultaneously partition the rows and columns of a two-dimensional data matrix into a predetermined number of clusters. Most existing co-clustering algorithms are designed for non-overlapping, exhaustive co-clustering, in which every row and every column is assigned to exactly one cluster. However, many real-world datasets contain not only substantial overlap among row clusters and among column clusters, but also outliers that do not belong to any cluster. To address this, an overlapping co-clustering algorithm based on modularity maximization (OMMCC) is proposed, in which both row clusters and column clusters are allowed to overlap, and the row and column outliers of the data matrix are not assigned to any cluster. Specifically, a unified framework is designed that adds non-exhaustive and overlapping constraints to the objective function. An iterative alternating optimization process directly maximizes the modularity, so that a better block-diagonal, non-exhaustive, overlapping co-clustering can be obtained efficiently. In addition, the parameters that control the degree of overlap and of non-exhaustiveness are easy to interpret. Experimental results show that the proposed method is effective and stable, and that it outperforms other co-clustering algorithms.
Keywords:
co-clustering; modularity; overlapping; outlier; block diagonal matrix; artificial intelligence
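Illustrative sketch (not the OMMCC implementation): the abstract does not give the objective function, so the code below assumes one common form of bipartite modularity, Q = (1/N) * sum_ij (a_ij - a_i. * a_.j / N) * |R_i ∩ C_j|, where N is the sum of all matrix entries, a_i. and a_.j are row and column sums, and R_i and C_j are the sets of cluster ids assigned to row i and column j. The function name `co_cluster_modularity` and the set-based membership encoding are hypothetical choices made for this example. A row or column in several clusters has several ids (overlap); an outlier has an empty set (non-exhaustive).

```python
def co_cluster_modularity(matrix, row_members, col_members):
    """Score a co-clustering with an assumed bipartite modularity.

    matrix       -- list of rows of non-negative numbers
    row_members  -- list of sets; row_members[i] holds the cluster ids of row i
    col_members  -- list of sets; col_members[j] holds the cluster ids of column j
    An empty set marks an outlier; a set with several ids marks an overlap.
    """
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]

    score = 0.0
    for i, row in enumerate(matrix):
        for j, a_ij in enumerate(row):
            # Number of co-clusters that contain both row i and column j.
            shared = len(row_members[i] & col_members[j])
            if shared:
                expected = row_sums[i] * col_sums[j] / total
                score += shared * (a_ij - expected)
    return score / total


# Hypothetical toy data: two blocks, one overlapping row, one outlier row.
toy_matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],  # row 1 is active in both blocks
    [0, 0, 1, 1],
    [1, 0, 0, 1],  # row 3 fits neither block well
]
toy_rows = [{0}, {0, 1}, {1}, set()]   # row 1 overlaps, row 3 is an outlier
toy_cols = [{0}, {0}, {1}, {1}]

print(co_cluster_modularity(toy_matrix, toy_rows, toy_cols))
```

An alternating scheme of the kind the abstract describes would hold the column memberships fixed while updating the row memberships to raise this score, then swap roles and repeat; how OMMCC bounds the amount of overlap and the number of outliers is not specified here and is left out of the sketch.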