Research Interests: Unsupervised learning / Cluster analysis
Cluster analysis is one of the most frequently used data analysis
methods with applications in a variety of fields ranging from
bioinformatics to computer vision to ecology, geophysics, and astronomy.
One of the most challenging problems in cluster analysis is to
determine the number of clusters we can resolve from a given data set.
It is related to controlling the complexity of a model, a crucial task
in machine learning. We derived a complexity control term within an
information-theoretic framework. This allows us to determine the
maximal number of clusters
which can be resolved given the size of the data set.
This approach allows us to revisit geometric clustering
algorithms, including the widely used k-means
clustering algorithm, which can be understood in terms of this
framework. The approach suggests a quenched annealing method that
allows one to improve the k-means algorithm, dramatically increasing
the likelihood of finding the global optimum.
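To illustrate why the global optimum is hard to find, here is a minimal sketch of the standard k-means (Lloyd) iteration with random restarts. It is not the annealing method described above; restarts are only a common baseline, swapped in here for illustration. The function names (sq_dist, kmeans, kmeans_restarts) and all parameters are assumptions made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd iterations from one random initialization.

    Returns (centers, cost), where cost is the sum of squared distances
    from each point to its nearest center.
    """
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                mean = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
                new_centers.append(mean)
            else:
                new_centers.append(centers[j])  # empty cluster: keep old center
        if new_centers == centers:
            break  # converged to a fixed point, possibly only a local optimum
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


def kmeans_restarts(points, k, n_restarts=10, seed=0):
    """Run kmeans from several random initializations; keep the cheapest run."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1])
```

Each call to kmeans converges to a fixed point that depends on the initial centers, so different runs can end at different costs. Calling kmeans_restarts(points, k) on a list of coordinate tuples returns the lowest-cost run it found, which is still not guaranteed to be the global optimum.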
Student Projects (499, 699) and thesis projects:
Applications:
1. Astronomy
I am looking for a student who is interested in applying these
methods to astronomical data.
2. Bioinformatics
Cluster analysis is frequently used to
analyze gene expression array data. However, existing methods have
crucial shortcomings. In this project, we will address some of them.
3. Computer vision
Image segmentation is an important step
in computer vision. This project deals with new approaches to
segmentation.
4. Movement primitives
Much work has gone into decomposing human movements in order to
model them better. This has applications in, for example, animation.
In this project, we will take a new approach to the problem.
Theory:
1. Foundations of unsupervised learning
This project addresses open problems at the theoretical foundations
of unsupervised learning.