ICS 691 Machine Learning

Class objectives

This course introduces machine learning: its central concepts, typical problems, frequently used methods, and classical work. It then focuses on information-theoretic approaches to machine learning, beginning with an introduction to information theory. Motivations from information processing in the brain are discussed, followed by applications to bioinformatics and other areas of interest.

Organization

The course combines lectures with seminars in which students present a paper. The presentations count for 40% of the final grade; the remaining 60% comes from a final exam. Students may choose to do a class project instead of the exam.

Schedule

Week | Subject | Lecture / Seminar: discussed papers |

1 | Introduction to machine learning. Supervised learning: Regression. Cross-validation. Bayesian estimation. | Lecture |

2 | Neural Networks intro: Perceptrons. | Lecture |

3 | Reinforcement Learning. | Guest Lecture by Dr. Chris Watkins, Royal Holloway, University of London. |

4 | Neural Networks. Support vector machines. | Lecture |

5 | Unsupervised learning and cluster analysis. | Lecture |

6 | Quantifying information transmission and learning. Introduction to information theory. | Lecture |

7 | Optimality in neural information processing systems. | W. Bialek, F. Rieke, R. de Ruyter van Steveninck, & D. Warland, Reading a neural code, Science 252, 1854-57 (1991); Fairhall AL, Lewen GD, Bialek W, de Ruyter Van Steveninck RR, Efficiency and ambiguity in an adaptive neural code, Nature, 412(6849), 787-92, August 2001 |

8 | Applications of Neural Networks | [Ning et al., 2005]: Toward Automatic Phenotyping of Developing Embryos from Videos (IEEE Trans. Image Processing, 2005) |

9 | Lossy compression and unsupervised learning. 1: Rate distortion theory and clustering. | Lecture |

10 | 2: Compression and relevance. | Lecture |

11 | 3: Complexity control. | Lecture |

12 | Behavioral and interactive learning. | Lecture |

13 | Advanced topics or applications. | 2 Papers of student choice. Possible examples. |

14 | Applications to molecular biology and bioinformatics. | 2 Papers of student choice. |

15 | Student project reports. |

Reference books (not required):

- Mitchell, "Machine Learning"

- MacKay, "Information Theory, Inference and Learning Algorithms"

- Duda, Hart and Stork, "Pattern Classification"

- Alpaydin, "Introduction to Machine Learning"
- Hastie, Tibshirani and Friedman, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction"
- Cristianini and Shawe-Taylor, "An Introduction to Support Vector Machines"

- Sutton and Barto, "Reinforcement Learning"
- Cover and Thomas, "Elements of Information Theory". See Cover's website.
- Gordon, "Classification"
- Hertz, Krogh and Palmer, "Introduction to the Theory of Neural Computation" (read Chapter 2 to review the material on Hopfield nets)

Journal of Machine Learning Research http://jmlr.csail.mit.edu/

Kernel Machines http://www.kernel-machines.org

MacKay, "Information Theory, Inference and Learning Algorithms" online at http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

Independent Component Analysis http://www.cnl.salk.edu/~tewon/ica_cnl.html

Citeseer: http://citeseer.ist.psu.edu/

NIPS Proceedings: books.nips.cc

Some papers, etc.:

- Kurt Hornik, Maxwell B. Stinchcombe, Halbert White: Multilayer feedforward networks are universal approximators. Neural Networks 2(5): 359-366 (1989)
- George Cybenko. *Continuous valued neural networks with two hidden layers are sufficient*. Technical report, Department of Computer Science, Tufts University, Medford, MA, 1988.
- Nettalk
- Digit recognition
- Hopfield net: Neural networks and physical systems with emergent collective computational abilities. PNAS 79, 2554, 1982.

- Kernel PCA: Schölkopf, B., A.J. Smola and K.-R. Müller: Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation **10**(5), 1299-1319 (1998)
- Clustering: http://www.cs.uwaterloo.ca/~shai/LuxburgBendavid05.pdf ; J. Buhmann and M. Held (2000): Model selection in clustering by uniform convergence bounds, NIPS Proceedings.
- Rose et al. (1999)

Mini-quizzes (not graded):

The intention here is to give you a reminder of all the subjects we covered in the lecture. You can take a little time at the end of each lecture to write down the main ideas for each subject. This may help you to assess your understanding, and to formulate questions if necessary.

Notes and announcements:

- (9/7) Guest Lecture by Dr. Chris Watkins, Royal Holloway, University of London.

- (9/7) Students should start now to choose papers to present and a project to work on.
