

ICS 636:  Information Theory in Machine Learning.

Information theory offers an elegant theoretical foundation for understanding information processing, learning, and adaptation, which are ubiquitous throughout the living world and also play a crucial role in modern intelligent systems. It has important ties to statistical mechanics and serves as the basis for many data analysis methods.

This course will discuss the role that information theory plays in areas such as
  • statistical inference
  • statistical mechanics
  • time series analysis
  • unsupervised learning and cluster analysis
  • modeling dynamical systems

This is a graduate-level course for PhD students and for Master's students with a serious interest in research. Students from other departments are welcome! The course may be of particular interest to students in Physics, Geosciences, Astronomy, and other disciplines that rely heavily on data analysis, such as Engineering and (applied) Mathematics.

Format of the class:

This course is organized into thematic blocks. Within each block, we start with a series of lectures to introduce the subject and then move on to discussions in which we go through research papers and open research questions. There are opportunities to do research projects, and to collaborate within the scope of the course.



Syllabus and Readings: (subject to change; check frequently)

Reference Books:
  • (R1) MacKay, "Information Theory, Inference and Learning Algorithms" online at http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
  • (R2) Cover and Thomas, "Elements of Information Theory". See Cover's website.

Section 1: Basics of Information Theory.

Readings:
C. E. Shannon: A mathematical theory of communication, Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October, 1948.
R1: Chap. 1-3, 34.
R2: Chap. 2 and 3.


Papers:
  • R. Linsker, Self-organization in a perceptual network. Computer 21 105-17
  • A. Bell and T. Sejnowski. An information-maximisation approach to blind separation and blind deconvolution. Neural Computation 7 1129-1159 (1995)
  • E. T. Jaynes. Information Theory and Statistical Mechanics. Physical Review 106(4), 1957.

Section 2: Unsupervised Learning and Cluster Analysis.

Readings:
R1: Chap. 20, 22, 23, 28, 31, 33.
R2: Chap. 13.
A. D. Gordon, "Classification". 
Cluster analysis (online book).

Papers:
  • C. Fraley, A. E. Raftery (1998): How many clusters? Which clustering method? Answers via model-based cluster analysis
  • K. Rose: "Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems," Proceedings of the IEEE, vol. 86, pp. 2210-2239, November 1998.
  • N Tishby, FC Pereira, & W Bialek, The information bottleneck method, in Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, B Hajek & RS Sreenivas, eds, pp 368-377 (University of Illinois, 1999).
  • J. Buhmann, M. Held (2000): Model selection in clustering by uniform convergence bounds. in NIPS Proceedings.
  • S. Still, W. Bialek (2004): How Many Clusters? An Information-Theoretic Perspective 
  • S. Still, W. Bialek, L. Bottou (2004): Geometric Clustering using the Information Bottleneck method.
  • Q. Song: A Robust Information Clustering Algorithm (2005).
  • http://www.cs.uwaterloo.ca/~shai/LuxburgBendavid05.pdf
  • X-means. D. Pelleg and A. Moore: http://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf
  • A. Raj, C. H. Wiggins: An information-theoretic derivation of min-cut based clustering.

Section 3: Time series analysis and dynamical systems modeling.

Readings:
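L. R. Rabiner: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE 77(2), 1989.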

Papers:

  • W. Bialek, I. Nemenman, N. Tishby: http://www.princeton.edu/~wbialek/our_papers/bnt_01a.pdf
  • J. Crutchfield, K. Young: http://users.cse.ucdavis.edu/~cmg/papers/ISC.pdf and http://users.cse.ucdavis.edu/~cmg/papers/CompOnset.pdf
  • C. Shalizi, J. Crutchfield: http://users.cse.ucdavis.edu/~cmg/papers/cmppss.pdf
  • More papers by J. Crutchfield and his collaborators can be found at: http://users.cse.ucdavis.edu/~cmg/compmech/pubs.htm

Other related interesting reading:

  • "On Discovery and Learning of Models with Predictive Representations of State for Agents with Continuous Actions and Observations" by David Wingate and Satinder Singh. In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007.
    This and other related papers can be found on the PI's homepage -> publications -> reinforcement learning: http://www.eecs.umich.edu/~baveja/
  • Judea Pearl "Causality", 2000. http://bayes.cs.ucla.edu/BOOK-2K/
  • Milner's Bisimulation. Tutorial: http://users.ecs.soton.ac.uk/ps/teaching/bisimulation.pdf

Section 4: Interactive Learning.





REMINDER: Please remember to take part in the CAFE evaluations at the end of the semester! Your feedback is important.