

Spring 2008. ICS 491: Special Topics. T/R 1:00-2:15.  FLYER


Neuroinformatics and Machine Learning:
From synapses to algorithms. An Introduction.

Overview:

Machine learning and machine intelligence have rapidly gained importance, both in science and in everyday life. As processor speeds and storage capacities increase drastically, we often face the problem that there is too much data to analyze by hand, and we need automated tools. This is evident in everyday problems such as email spam filtering, and also in modern branches of science, such as astronomy, genomics and bioinformatics.

Questions of the type "Can machines think?" have been asked since the invention of the computer. Half a century later, we understand much more about information processing in the brain and about learning theory. The advances made in machine learning over the past two decades have resulted in drastic improvements, allowing computer programs to learn from examples and to adapt.

This course provides an accessible picture of the concepts underlying machine learning algorithms. It also introduces computational modeling of biological systems and draws connections between the two areas, with particular attention to computational neuroscience/neuroinformatics. Basics of information theory are introduced and applied to learning.

By the end of the course, students will understand the basics of information processing and learning in the nervous system, be familiar with a variety of important machine learning methods and computational models, and be able to apply them to selected problems.


Students from other departments, especially Math, Physics and Biology, are very welcome.

Syllabus:

Introduction to neuroinformatics / computational neuroscience

Models of neurons: Hodgkin-Huxley model, Morris-Lecar model, integrate-and-fire model (see the sketch after this list).
Models of learning: Hebbian learning, spike-timing dependent plasticity.
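
To give a concrete preview of the simplest neuron model listed above, here is a toy leaky integrate-and-fire simulation. All parameter values, units and names are arbitrary illustrative choices, not values used in the course.

    # Minimal leaky integrate-and-fire neuron, forward-Euler integration.
    # Parameter values are illustrative choices only.
    def simulate_lif(I=1.5, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=0.1, t_max=200.0):
        """Integrate tau * dV/dt = -V + I; emit a spike and reset when V crosses v_thresh."""
        v = v_reset
        spike_times = []
        for k in range(int(t_max / dt)):
            v += (dt / tau) * (-v + I)       # leaky integration of the constant input current
            if v >= v_thresh:                # threshold crossing: record a spike
                spike_times.append(k * dt)
                v = v_reset                  # reset the membrane potential
        return spike_times

    print(simulate_lif()[:5])                # first few spike times, in the chosen time units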

Introduction to machine learning algorithms

Supervised learning: The perceptron algorithm (see the sketch after this list), feed-forward neural networks and the backpropagation algorithm, Support Vector Machines.
Unsupervised learning / Cluster Analysis: Associative memory (Hopfield network), Cluster analysis, K-means algorithm.
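
As a taste of the first supervised method above, here is a minimal perceptron learning loop on a made-up, linearly separable toy problem. The data, learning-loop limits and variable names are illustrative choices, not course material.

    # Minimal perceptron learning rule on a toy 2-D problem (illustrative only).
    import numpy as np

    X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])  # toy inputs
    y = np.array([1, 1, -1, -1])                                        # labels in {-1, +1}

    w = np.zeros(2)
    b = 0.0
    for epoch in range(100):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified: apply the perceptron update
                w += yi * xi
                b += yi
                errors += 1
        if errors == 0:                          # no mistakes in a full pass: converged
            break

    print(w, b)                                  # a separating hyperplane for the toy data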

Introduction to information theory

Entropy as a measure of uncertainty; mutual information; channel capacity; the channel coding theorem; rate-distortion theory.
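
For orientation, the basic quantities in this block, written in standard discrete form:

    H(X)   = -\sum_x p(x) \log p(x)
    H(X|Y) = -\sum_{x,y} p(x,y) \log p(x|y)
    I(X;Y) = H(X) - H(X|Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\, p(y)}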

Information theory in machine learning

Rate-distortion theory and clustering: soft K-means and deterministic annealing (see the equations after this list).
Compression through relevance: the Information Bottleneck method. Complexity control.
Time series prediction and optimal predictive inference.
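
For orientation, the soft K-means / deterministic annealing connection mentioned above can be summarized in standard form, roughly as in the Rose and Tishby et al. references below; the notation here is an illustrative choice and may differ from lecture. For squared-error distortion d(x, \mu_c) = \|x - \mu_c\|^2, the soft assignments and center updates at inverse temperature \beta are

    p(c|x) = \frac{p(c)\, e^{-\beta d(x,\mu_c)}}{\sum_{c'} p(c')\, e^{-\beta d(x,\mu_{c'})}} , \qquad
    \mu_c  = \frac{\sum_x p(x)\, p(c|x)\, x}{\sum_x p(x)\, p(c|x)} , \qquad
    p(c)   = \sum_x p(x)\, p(c|x) .

Hard K-means is recovered in the limit \beta \to \infty; deterministic annealing slowly increases \beta while iterating these self-consistent equations.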

Detailed Syllabus:

Jan   
15    Introduction   
17    Biophysics of Neurons
22    Neurons: HH   
24    Simplifications: ML
29    IF model; Synapses   
31    Quiz 1
Feb   
5    Supervised Learning; Perceptron   
7    Feed Forward ANNs
12    Learning in FF ANNs   
14    Backpropagation Alg.
19    Associative Memory   
21    HNN
26    Quiz 2
28    Statistical Learning Theory
March  
4    Support Vector Machines   
6    Kernel Machines
11    Midterm Exam   
13    K-means Alg.
18    Cluster Analysis   
20    Bayesian Inference
April   
1    Time Series Analysis   
3    HMMs
8    Quiz 3   
10    Information Theory: Intro
15    Information Theory: Intro   
17    Rate Distortion Theory
22    Applications to Learning Theory: DA   
24    Soft K-means and information theory
29    Complexity Control   
May
1    Prediction
6    Student Homework Presentations   

Lecture (number of class meetings in parentheses) -- Homework (tentative -- the homework column will change)

Overview and Introduction.
Computations in single neurons: Biophysical modeling (Hodgkin-Huxley model). (2) -- Film; Homework 1
Single neurons and synaptic connections. Simple learning rules. (2)
Supervised learning: The perceptron algorithm. -- Homework 2
Feed-forward artificial neural networks (ANNs): Backpropagation. (2) -- Homework 3
Recurrent ANNs, associative memory: Hopfield network. (2) (See the sketch after this table.)
Worrying about generalization error: Intro to the VC dimension, structural risk minimization and support vector machines (SVMs). SVM algorithm. (3) -- Homework (voluntary -- extra credit!)
Unsupervised Learning and Cluster Analysis. K-means algorithm. (2) -- Homework 4
Bayesian Inference.
Time Series Analysis; Hidden Markov Models. (2)
Information theory: Uncertainty and Entropy. Conditional entropy and mutual information. (2)
Rate-distortion theory. Lossy compression through relevance.
Application to clustering; Deterministic annealing; soft K-means algorithm. (2) -- Homework 5
Complexity control.
Prediction. Optimal causal inference.
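
The associative-memory entry above (Hopfield network) can be illustrated with a tiny example: store two patterns with the Hebbian outer-product rule, then recover one of them from a corrupted cue. The patterns and network size below are made up for the illustration.

    # Tiny Hopfield network: Hebbian storage of two patterns, recall from a noisy cue.
    import numpy as np

    patterns = np.array([[ 1, -1,  1, -1,  1, -1],
                         [ 1,  1, -1, -1,  1,  1]])
    W = sum(np.outer(p, p) for p in patterns).astype(float)   # outer-product (Hebbian) rule
    np.fill_diagonal(W, 0.0)                                   # no self-connections

    state = np.array([1, -1, 1, 1, 1, -1])   # first stored pattern with one bit flipped
    for _ in range(10):                       # synchronous updates until the state is stable
        new_state = np.sign(W @ state)        # (lectures may use asynchronous updates;
        new_state[new_state == 0] = 1         #  synchronous is enough for this tiny demo)
        if np.array_equal(new_state, state):
            break
        state = new_state

    print(state)                              # should match the first stored pattern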



Reference books:
Useful web sites:

arXiv.org
Journal of Machine Learning Research http://jmlr.csail.mit.edu/
Kernel Machines http://www.kernel-machines.org
Independent Component Analysis http://www.cnl.salk.edu/~tewon/ica_cnl.html
Citeseer: http://citeseer.ist.psu.edu/
NIPS Proceedings: books.nips.cc

Additional Reading material (optional):

Turing, A.M. (1950). Computing machinery and intelligence. Mind, 59, 433-460
John von Neumann. (1958). The Computer and the Brain, Yale Univ. Press, New Haven
Gerstner and Kistler: Spiking Neuron Models. Single Neurons, Populations, Plasticity; Cambridge University Press, 2002.
Computation in single neurons:  Hodgkin and Huxley revisited. B Agüera y Arcas, AL Fairhall, & W Bialek, Neural Comp 15, 1715-1749 (2003); physics/0212113.
Malenka RC, Nicoll RA: Long-term potentiation--a decade of progress? Science. 1999 Sep 17;285(5435):1870-4
Bi GQ, Poo MM. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J Neurosci. 1998 Dec 15;18(24):10464-72.
Multilayer feedforward networks are universal approximators. K Hornik, M Stinchcombe, H White (1989) Neural Networks 2(5), 359-366.
K. Fukushima: "Neural network model for a mechanism of pattern recognition unaffected by shift in position — Neocognitron —", Trans. IECE, J62-A[10], pp. 658-665 (Oct. 1979).
Y. LeCun: LeNet
W. Maass, E. D. Sontag, "Neural systems as nonlinear filters," Neural Computation 12 (2000): 1743-1772.
D. MacKay: Bayesian Interpolation, Neural Computation 4(3), 415-447 (1992).
V. Balasubramanian, Statistical Inference, Occam's Razor and Statistical Mechanics on The Space of Probability Distributions, Neural Computation, Vol.9, No.2, Feb. 1997
Field theories for learning probability distributions.  W Bialek, CG  Callan & SP Strong,  Phys Rev Lett  77, 4693-4697 (1996)
Occam factors and model-independent Bayesian learning of continuous distributions. I Nemenman & W Bialek, Phys Rev E
  65, 026137 (2002)
J. Hopfield: Neural networks and physical systems with emergent collective computational abilities. PNAS 79, 2554, 1982.
J. Hopfield: Neurons with graded response have collective computational properties like those of two-state neurons. PNAS 81, 3088, 1984.

J. Hopfield, C. D. Brody: Learning rules and network repair in spike-timing based computation networks (2004) Proc. Natl. Acad. Sci. USA 101, 337-342.
B. Schölkopf: SVM and kernel methods, 2001. Tutorial given at the NIPS Conference
B. Schölkopf, et al.: Introduction to Support Vector Learning, 1999. In: Advances in Kernel Methods - Support Vector Learning, MIT Press.
C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:273-297, 1995.
Schölkopf, B.: Support Vector Learning. 176, R. Oldenbourg Verlag, Munich (1997)
A. J. Smola. Regression estimation with support vector learning machines. Diplomarbeit, Technische Universität München, 1996.
http://www.cs.uwaterloo.ca/~shai/LuxburgBendavid05.pdf  
J. Buhmann, M. Held (2000): Model selection in clustering by uniform convergence bounds. in NIPS Proceedings.
R. Linsker, Self-organization in a perceptual network. Computer 21 105-17
A. Bell and T. Sejnowski. An information-maximisation approach to blind separation and blind deconvolution. Neural Computation 7 1129-1159 (1995)
Maximum Likelihood Blind Source Separation: A Context-Sensitive Generalization of ICA. B. A. Pearlmutter and L. C. Parra in Advances in Neural Information Processing Systems 9. (1997).
W. Bialek, F. Rieke, R. de Ruyter van Steveninck, & D. Warland, Reading a neural code, Science 252 1854-57 (1991).
Fairhall AL, Lewen GD, Bialek W, de Ruyter Van Steveninck RR, Efficiency and ambiguity in an adaptive neural code, Nature, 412(6849), 787-92, August 2001.
A sensory source for motor variation.  LC Osborne, SG Lisberger & W Bialek, Nature 437, 412-416 (2005).
Thinking about the brain. W Bialek, in Physics of Biomolecules and Cells: Les Houches Session LXXV, H Flyvbjerg, F Jülicher, P Ormos, & F David, eds, pp 485-577 (EDP Sciences, Springer-Verlag, Berlin, 2002);
T. Toyoizumi, J.-P. Pfister, K. Aihara, W. Gerstner: Spike-timing Dependent Plasticity and Mutual Information Maximization for a Spiking Neuron Model. Advances in Neural Information Processing Systems 17, MIT Press.
J.-P. Nadal & N. Parga, "Sensory coding: information maximization and redundancy reduction", in Neural information processing, G. Burdet, P. Combe and O. Parodi Eds., World Scientific Series in Mathematical Biology and Medicine Vol. 7 (Singapore, 1999), pp. 164-171.
K. Rose, "Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems," Proceedings of the IEEE, vol. 86, pp. 2210-2239, November 1998.
N Tishby, FC Pereira, & W Bialek, The information bottleneck method, in Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, B Hajek & RS Sreenivas, eds, pp 368-377 (University of Illinois, 1999);
S. Still, W. Bialek and L. Bottou. Geometric Clustering using the Information Bottleneck method. In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16 [Neural Information Processing Systems, NIPS 2003, December 8-13, 2003, Vancouver and Whistler, British Columbia, Canada], 2004. MIT Press.
S. Still and W. Bialek. How many clusters? An information theoretic perspective. Neural Computation, 16(12):2483-2506, 2004.
A tutorial on hidden Markov models and selected applications in speech recognition. L. R. Rabiner, Proceedings of the IEEE 77(2), 257-286, Feb 1989.
W Bialek, I Nemenman & N Tishby, Complexity through nonextensivity Physica A 302, 89-99 (2001)
C. R. Shalizi and J. P. Crutchfield, "Computational Mechanics: Pattern and Prediction, Structure and Simplicity", Journal of Statistical Physics 104 (2001) 819--881.
D. P. Feldman and J. P. Crutchfield, "Structural Information in Two-Dimensional Patterns: Entropy Convergence and Excess Entropy", Physical Review E 67 (2003) 051104.
S. Still. Statistical Mechanics approach to interactive learning. 2005/2007(revised). http://arxiv.org/abs/0709.1948
S. Still and J. P. Crutchfield. Structure or Noise? 2007. http://lanl.arxiv.org/abs/0708.0654
S. Still, J. P. Crutchfield and C. J. Ellison. Optimal Causal Inference. 2007. http://lanl.arxiv.org/abs/0708.1580
Chris Watkins and Peter Dayan. Q-Learning. Machine Learning, 8:279--292, 1992.

(Please report bugs and discontinued links.)

Workload / Grading.

Homework: 20% of the final grade.
Midterm: 30%.
Final: 50%.

All reading is voluntary; the material is listed for students' information.

Office Hours by appointment.
