



Research Interests: Unsupervised learning / Cluster analysis



Cluster analysis is one of the most frequently used data analysis methods, with applications in a variety of fields ranging from bioinformatics and computer vision to ecology, geophysics, and astronomy.

One of the most challenging problems in cluster analysis is determining the number of clusters that can be resolved from a given data set. This is related to controlling the complexity of a model, a crucial task in machine learning. We derived a complexity control term within an information-theoretic framework. It allows us to determine the maximal number of clusters that can be resolved given the size of the data set.

S. Still and W. Bialek. How many clusters? An information theoretic perspective. Neural Computation, 16(12):2483-2506, 2004.  

This approach allows us to revisit geometric clustering algorithms, including the widely used k-means algorithm, which can be understood in terms of this framework. The approach suggests a quenched annealing method that improves on the k-means algorithm, dramatically increasing the likelihood of finding the global optimum.

S. Still and W. Bialek. Geometric Clustering using the Information Bottleneck method. In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press. 2004.
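
To illustrate the general idea of combining k-means with annealing, here is a minimal sketch of a soft k-means loop in which an inverse temperature is raised step by step. It is a generic textbook-style scheme, not the method from the paper above. The function name, the `betas` schedule, the `jitter` size, and the iteration count are all assumptions chosen for illustration.

```python
import math
import random


def soft_kmeans_annealed(points, k, betas=(0.1, 0.5, 1.0, 5.0, 25.0),
                         iters=25, jitter=1e-3, seed=0):
    """Soft k-means with an increasing inverse temperature beta (illustrative).

    points: list of equal-length tuples of floats.
    Returns a list of k centroids, each a list of floats.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Start from k data points chosen at random (an assumption of this sketch).
    centers = [list(p) for p in rng.sample(points, k)]
    for beta in betas:
        # Small random perturbation so that coinciding centroids can separate.
        centers = [[c + rng.uniform(-jitter, jitter) for c in ctr]
                   for ctr in centers]
        for _ in range(iters):
            sums = [[0.0] * dim for _ in range(k)]
            weights = [0.0] * k
            for p in points:
                d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
                dmin = min(d2)
                # Soft assignment proportional to exp(-beta * squared distance);
                # subtracting dmin keeps the exponentials numerically stable.
                r = [math.exp(-beta * (d - dmin)) for d in d2]
                z = sum(r)
                for j in range(k):
                    w = r[j] / z
                    weights[j] += w
                    for t in range(dim):
                        sums[j][t] += w * p[t]
            # Move each centroid to its weighted mean; keep it if it has no weight.
            centers = [
                [s / weights[j] for s in sums[j]] if weights[j] > 0 else centers[j]
                for j in range(k)
            ]
    return centers
```

Calling `soft_kmeans_annealed(points, 2)` on a list of 2D tuples returns two centroids. At small `beta` every point influences every centroid, and as `beta` grows the assignments approach the hard assignments of ordinary k-means.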


Student Projects (499, 699) and thesis projects:

Applications:

1. Astronomy
I am looking for a student who is interested in applying these methods to astronomical data.

2. Bioinformatics

Cluster analysis is frequently used to analyze gene expression array data. However, existing methods have crucial shortcomings. In this project, we will address some of them.

3. Computer vision
Image segmentation is an important step in computer vision. This project deals with new approaches to segmentation.

4. Movement primitives
Much work has gone into decomposing human movements in order to model them better. This has applications in animation, for example. In this project, we will take a new approach to the problem.

Theory:

1. Foundations of unsupervised learning
This project addresses open problems in the theoretical foundations of unsupervised learning.