MultiClust: 1st International Workshop on
Discovering, Summarizing and Using Multiple Clusterings
MultiClust is a workshop held in conjunction with the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2010), July 25-28, 2010 in Washington, DC.
Workshop Description
Data is often multi-faceted by nature. Given a single data set, one can interpret it in several different ways. This is particularly true of the complex data that has become prevalent in the data mining community: text, video, images and biological data, to name just a few. Yet many data mining algorithms, and clustering algorithms in particular, extract and present only a single clustering/summarization even though multiple good alternatives exist. Practitioners often find that the clustering solution provided by an algorithm is not what they are looking for. Why limit the output to one clustering solution? Why not provide all possible alternative and interesting clustering solutions?

Interest in discovering multiple clustering solutions from complex data has emerged recently. To avoid redundancy and an excessive burden on the data analyst, it is key to extract clustering solutions that are informative yet non-redundant with respect to one another. Toward this goal, important research issues include: how to define redundancy among clusterings; whether existing algorithms can be modified to accommodate this goal; how many solutions to extract; how to select, among exponentially many possible solutions, which ones to present to the data analyst; and how to most effectively help the data analyst find what he or she is searching for. Existing work approaches this problem by looking for non-redundant, alternative, disparate or orthogonal clusterings. Research in this area is still developing and can benefit from well-established, closely related areas such as ensemble clustering, constraint-based clustering, and compression and coding theory.

In this workshop, we plan to bring together researchers from the above areas to discuss important issues in multiple clustering discovery, compression and summarization. Our objectives are to: 1) further increase general interest in this important topic in the broader research community; 2) bring together experts from closely related areas (e.g., cluster ensembles and constraint-based clustering) to shed light on how this emerging research direction can benefit from other well-established areas; 3) provide a venue for active researchers to exchange ideas and explore important research issues in this area.

Suggested topics:

  • Alternative clustering: discovering new clusterings that are different from previously known clusterings
  • Algorithms for simultaneously learning multiple diverse clusterings
  • Visualization of multiple clustering solutions
  • Interactive exploration of multiple clustering solutions
  • Multiple high dimensional subspace clusterings
  • Disparate Clustering
  • Meta Clustering
  • Model selection for non-redundant clustering: how many clusterings and how many clusters?
  • Non-redundant frequent patterns
  • Non-redundant subspace clustering
  • Relation between cluster ensembles and disparate clustering
  • Constraint-based Clustering for alternative clustering
  • Evaluation Metrics for Multiple Clusterings
  • Applications
Keynote Speeches

Joydeep Ghosh (University of Texas, Austin)
Multi-Clust Systems: When Many Views are Better than One
As opposed to multi-classifier systems where the primary goal is to improve classification accuracy, multiple clusterings over a common set of objects can provide a wide range of benefits. For example, they can facilitate detecting multi-membership objects, allow knowledge reuse, provide alternative ways of model selection and can imbibe a variety of domain knowledge to constrain the consensus solution. Examples from disciplines ranging from psychology to marketing will be presented to highlight these capabilities.

James Bailey (University of Melbourne) (Talk slides)
Alternative Clusterings: Current Progress and Open Challenges
This talk will review the state of the art for discovering alternative clusterings, identify key applications and highlight current challenges for this important and emerging area.

Rich Caruana (Microsoft)
Clustering With Side Information vs. Multi/Meta Clustering
Early clustering work usually assumed there was one true clustering of the data, and that the goal of clustering was to find a clustering as close to the correct one as possible, as efficiently as possible. It is now clear that complex data sets can be clustered in many different ways, and that different clusterings are useful for different purposes. The goal now is to efficiently find multiple, significantly different, yet high-quality clusterings, and to allow users to efficiently find among these the clustering(s) that are most useful for them. In this talk we'll compare two competing approaches for accomplishing this: clustering with side information and multi/meta clustering. One surprising result from our experiments is that the most useful clustering is often not a very compact one under common definitions of compactness.

Important Dates
Submission date for full papers: May 4, 2010
Author notification: May 25, 2010
Submission of camera-ready paper: May 28, 2010
Half-day workshop at ACM SIGKDD conference: July 25, 2010
Tentative program
Multi-Clust Fairfax 2pm-5pm
2pm Sharp Start

Session - 1
2pm - 2:10pm: Opening Remarks
Ian Davidson and Jennifer Dy

2:10pm - 2:40pm (5 minutes for questions/discussion)
Invited Talk
Multi-Clust Systems: When Many Views are Better than One
Joydeep Ghosh, University of Texas - Austin

Variational Inference for Nonparametric Multiple Clustering
Guan, Dy (Northeastern), Niu and Ghahramani

Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other? (Talk slides)
Kriegel and Zimek, University of Munich

3:05pm - 3:15pm
Coffee Break (note shortened break)

Session - 2
3:15pm - 3:45pm (5 minutes for questions/discussion)
Invited Talk
Alternative Clusterings: Current Progress and Open Challenges
James Bailey, University of Melbourne

3:45pm - 4:00pm
Uncovering Many Views of Biological Networks Using Ensembles of Near-Optimal Partitions
Duggal, Navlakha, Girvan, Kingsford, University of Maryland - College Park

4:00pm - 4:15pm
Incorporating Spatial Similarity into Ensemble Clustering
Ansari, Fillmore, Coen, University of Wisconsin - Madison

Less is More: Non-redundant subspace clustering (Talk slides)
Assent (Aalborg University), Muller, Gunnemann, Krieger, Seidl

4:25-4:30pm (short break)

Session - 3
Invited Talk
Clustering With Side Information vs. Multi/Meta Clustering
Caruana, Microsoft Research

5:00pm - 5:15pm
On Using Class-Labels in Evaluation of Clusterings (Talk Slides)
Farber (RWTH Aachen University), Gunnemann, Kriegel, Kroger, Muller, Schubert, Seidl, Zimek

5:15pm - 5:25pm
ASCLU: Alternative Sub-Space Clustering (Talk Slides)
Gunnemann, Farber, Muller and Seidl
The workshop proceedings will be distributed on a USB drive to all participants.
Workshop Organizers
Xiaoli Z. Fern
Oregon State University, School of Electrical Engineering and Computer Science
1148 Kelly Engineering Center, Corvallis, OR, 97330
xfern [at]

Ian Davidson
The University of California - Davis, Computer Science
1 Shields Avenue, Davis, CA, 95616
davidson [at]

Jennifer Dy
Northeastern University, Department of Electrical and Computer Engineering
409 Dana Bldg., Boston, MA, 02115
jdy [at]

Program Committee (not complete)
  • Ira Assent (Aalborg University, Denmark)
  • James Bailey (University of Melbourne)
  • Arindam Banerjee (University of Minnesota, USA)
  • Rich Caruana (Microsoft, USA)
  • Chris Ding (University of Texas at Arlington, USA)
  • Tao Li (Florida International University, USA)
  • Emmanuel Muller (RWTH Aachen University, Germany)
  • Naren Ramakrishnan
  • Thomas Seidl (RWTH Aachen University, Germany)
  • Alexander Topchy (Nielsen Media Research)
  • Kiri Wagstaff (NASA - JPL, USA)