OREGON STATE UNIVERSITY

You are here

Learning multiple nonredundant clusterings

TitleLearning multiple nonredundant clusterings
Publication TypeJournal Article
Year of Publication2010
AuthorsCui, Y., X. Z. Fern, and J. G. Dy
JournalACM Transactions on Knowledge Discovery from Data
Volume4
Issue3
Pagination1 - 32
Date Published10/2010
ISSN15564681
Keywordsalgorithms, data mining, disparate clustering, diverse clustering, nonredundant clustering, orthogonalization
Abstract

Real-world applications often involve complex data that can be interpreted in many different ways. When clustering such data, there may exist multiple groupings that are reasonable and interesting from different perspectives. This is especially true for high-dimensional data, where different feature subspaces may reveal different structures of the data. However, traditional clustering is restricted to finding only one single clustering of the data. In this article, we propose a new clustering paradigm for exploratory data analysis: find all non-redundant clustering solutions of the data, where data points in the same cluster in one solution can belong to different clusters in other partitioning solutions. We present a framework to solve this problem and suggest two approaches within this framework: (1) orthogonal clustering, and (2) clustering in orthogonal subspaces. In essence, both approaches find alternative ways to partition the data by projecting it to a space that is orthogonal to the current solution. The first approach seeks orthogonality in the cluster space, while the second approach seeks orthogonality in the feature space. We study the relationship between the two approaches. We also combine our framework with techniques for automatically finding the number of clusters in the different solutions, and study stopping criteria for determining when all meaningful solutions are discovered. We test our framework on both synthetic and high-dimensional benchmark data sets, and the results show that indeed our approaches were able to discover varied clustering solutions that are interesting and meaningful.

DOI10.1145/1839490.1839496
Short TitleACM Trans. Knowl. Discov. DataTKDD