
[ICS 435] Machine Learning Fundamentals

Syllabus (subject to change):

Schedule and homework:
First HW given. (Perceptron)
Remarks on presenting scientific results.
First HW due.
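As background for the first homework, the classic mistake-driven perceptron update can be sketched as follows. This is a minimal illustration, not the actual assignment; the toy data, learning rate, and epoch count are all made up for the example.

```python
# Minimal perceptron learning rule on a toy linearly separable set.
# Data, labels, and hyperparameters here are illustrative only.

def perceptron_train(data, labels, epochs=20, lr=1.0):
    """Learn weights w and bias b with the mistake-driven perceptron update."""
    n = len(data[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(data, labels):          # y in {-1, +1}
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:             # misclassified: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:                         # converged on separable data
            break
    return w, b

def perceptron_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy AND-like separable problem
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = perceptron_train(X, y)
```

On linearly separable data such as this, the perceptron convergence theorem guarantees the loop terminates with zero training errors.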
Supervised learning I: generalization error vs. training error. Introduction to statistical learning theory and support vector machines (SVM).
Support vector learning: support vector machines (SVM) and support vector regression (SVR).
Second HW given. (SVM/R)
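Ahead of the SVM/SVR homework, here is a minimal linear soft-margin SVM trained by batch subgradient descent on the regularized hinge loss (a Pegasos-style schedule). The data, regularization constant, and step count are illustrative assumptions, not the homework's.

```python
def svm_train(data, labels, lam=0.1, steps=500):
    """Batch subgradient descent on the linear soft-margin SVM objective:
        lam/2 * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i <w, x_i>).
    A constant first feature of 1.0 plays the role of the bias term."""
    m, n = len(data), len(data[0])
    w = [0.0] * n
    for t in range(1, steps + 1):
        eta = 1.0 / (lam * t)                   # classic 1/(lam*t) step size
        # subgradient: lam*w, minus the average of y*x over margin violators
        grad = [lam * wi for wi in w]
        for x, y in zip(data, labels):
            if y * sum(wi * xi for wi, xi in zip(w, x)) < 1:
                for j in range(n):
                    grad[j] -= y * x[j] / m
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w

def svm_predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Toy separable data; the first coordinate is the constant bias feature.
X = [(1.0, 2.0, 2.0), (1.0, 3.0, 1.0), (1.0, -2.0, -2.0), (1.0, -1.0, -3.0)]
y = [1, 1, -1, -1]
w = svm_train(X, y)
```

The same subgradient template extends to SVR by swapping the hinge loss for the epsilon-insensitive loss.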
Introduction to regression and Bayesian Inference.
Extra credit HW given. (Bayesian Interpolation)
Guest lecture: Prasad Santhanam on linear regression.
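To connect the regression and Bayesian-inference lectures, here is the simplest conjugate case: Bayesian linear regression with a single weight, a zero-mean Gaussian prior, and Gaussian noise, where the posterior is available in closed form. The data and precision parameters are illustrative assumptions.

```python
def bayes_linreg_1d(xs, ys, alpha, beta):
    """Posterior over a single weight w in the model y = w*x + noise.
    Prior: w ~ N(0, 1/alpha).  Likelihood: y_i ~ N(w*x_i, 1/beta).
    Conjugacy gives a Gaussian posterior in closed form."""
    precision = alpha + beta * sum(x * x for x in xs)        # posterior precision
    mean = beta * sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, 1.0 / precision                             # posterior mean, variance

# Illustrative data, roughly y = 2x with noise
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
mean, var = bayes_linreg_1d(xs, ys, alpha=1.0, beta=25.0)
```

The posterior mean is the least-squares estimate shrunk toward the prior mean of zero; as alpha goes to zero the two coincide, which is a useful sanity check when doing the extra credit homework.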

Supervised learning II: Introduction to artificial neural networks and deep learning.
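The core mechanics of neural network training are the forward pass and backpropagation of the loss gradient via the chain rule. Below is a sketch for a one-hidden-layer tanh network with a linear readout and squared loss; the architecture, weights, and input are illustrative, and a finite-difference check is the standard way to verify such gradients.

```python
import math

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer network: x -> tanh(W1 x + b1) -> linear readout."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    out = sum(w * hi for w, hi in zip(W2, h)) + b2
    return h, out

def mlp_loss_and_grads(x, target, W1, b1, W2, b2):
    """Squared loss 0.5*(out - target)^2 and its backprop gradients."""
    h, out = mlp_forward(x, W1, b1, W2, b2)
    err = out - target
    loss = 0.5 * err * err
    gW2 = [err * hi for hi in h]
    gb2 = err
    # chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    gW1 = [[err * w2 * (1 - hi * hi) * xi for xi in x]
           for w2, hi in zip(W2, h)]
    gb1 = [err * w2 * (1 - hi * hi) for w2, hi in zip(W2, h)]
    return loss, gW1, gb1, gW2, gb2

# A tiny fixed example (all values are arbitrary illustrative choices)
x, target = [0.5, -1.0], 0.3
W1, b1 = [[0.1, 0.2], [-0.3, 0.4]], [0.0, 0.1]
W2, b2 = [0.7, -0.5], 0.05
loss, gW1, gb1, gW2, gb2 = mlp_loss_and_grads(x, target, W1, b1, W2, b2)
```

Gradient descent on these gradients, stacked over more layers, is the backbone of the deep learning methods covered in this part of the course.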

Guest tutorial: Giacomo Indiveri on Neuromorphic engineering.
Second HW due.
Introduction to unsupervised learning.

Third HW given. (k-means)
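For orientation on the k-means homework, here is Lloyd's algorithm, which alternates between assigning points to the nearest center and recomputing each center as its cluster mean. The toy two-blob data and the initialization scheme are illustrative, not the assignment's.

```python
import random

def _dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def _mean(points):
    """Coordinate-wise mean of a nonempty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)             # naive init: k distinct points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: _dist2(p, centers[c]))
            clusters[j].append(p)
        new_centers = [_mean(cl) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:              # assignments stable: done
            break
        centers = new_centers
    return centers, clusters

# Two well-separated toy blobs
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centers, clusters = kmeans(pts, 2)
```

Each iteration can only decrease the total within-cluster squared distance, so the loop always terminates, though possibly at a local optimum depending on initialization.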
Introduction to the use of information theory in unsupervised learning. 
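The information-theoretic quantities that recur in this part of the course are entropy and mutual information; a minimal sketch of computing both from a discrete joint distribution (the example tables below are illustrative):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p log2 p, in bits (0 log 0 taken as 0)."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint distribution table,
    where joint[i][j] = P(X = i, Y = j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    hxy = entropy([p for row in joint for p in row])
    return entropy(px) + entropy(py) - hxy
```

For example, a fair coin has entropy 1 bit, two independent fair coins have zero mutual information, and two perfectly correlated fair coins share exactly 1 bit. Methods like the information bottleneck (see the reading list) optimize trade-offs between such quantities.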

From thermodynamics to information theory to machine learning.
Behavioral learning (pending student interest and time), OR work on Final Project. Third HW due. Extra credit HW due.
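The canonical behavioral-learning algorithm is tabular Q-learning (Watkins & Dayan, cited in the reading list below). A minimal sketch on a made-up 4-state chain environment; the environment, rewards, and hyperparameters are illustrative assumptions.

```python
import random

def q_learning(episodes=2000, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular Q-learning on a toy 4-state chain: states 0..3, action 0
    moves left (floored at 0), action 1 moves right; reaching state 3
    yields reward 1 and ends the episode."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(4)]          # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(50):                     # step cap per episode
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda act: Q[s][act])
            s2 = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s2 == 3 else 0.0
            # off-policy TD target: terminal states bootstrap nothing
            target = r + (0.0 if s2 == 3 else gamma * max(Q[s2]))
            Q[s][a] += alpha * (target - Q[s][a])
            if s2 == 3:
                break
            s = s2
    return Q

Q = q_learning()   # greedy policy should move right in every nonterminal state
```

With discount 0.9 the learned values approach Q*(2, right) = 1, Q*(1, right) = 0.9, Q*(0, right) = 0.81.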
Work on Final Project.
Thanksgiving break.

Final Project Presentations. Final Project due.
Study period. Last day of instruction is 12/7; the final is taken online the following week.
Final Exam.


Students from other departments, especially Math, Physics and Biology, are welcome!

Useful web sites:

Citeseer: http://citeseer.ist.psu.edu/
Journal of Machine Learning Research http://jmlr.csail.mit.edu/
NIPS Proceedings: http://books.nips.cc/

Some reference books and additional reading material (all optional):

Turing, A.M. (1950). Computing machinery and intelligence. Mind, 59, 433-460
John von Neumann. (1958). The Computer and the Brain, Yale Univ. Press, New Haven
Computation in single neurons:  Hodgkin and Huxley revisited. B Agüera y Arcas, AL Fairhall, & W Bialek, Neural Comp 15, 1715-1749 (2003); physics/0212113.
Malenka RC, Nicoll RA: Long-term potentiation--a decade of progress? Science. 1999 Sep 17;285(5435):1870-4
Bi GQ, Poo MM. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J Neurosci. 1998 Dec 15;18(24):10464-72.
Multilayer feedforward networks are universal approximators. K Hornik, M Stinchcombe, H White (1989) Neural Networks, Volume 2, Issue 5, pp. 359-366.
K. Fukushima: "Neural network model for a mechanism of pattern recognition unaffected by shift in position — Neocognitron —", Trans. IECE, J62-A[10], pp. 658-665 (Oct. 1979).
Y. LeCun: LeNet
W. Maass, E. D. Sontag, "Neural systems as nonlinear filters," Neural Computation 12 (2000): 1743-1772
D. MacKay: Bayesian Interpolation, Neural Computation 4(3): 415-447 (1992)
V. Balasubramanian, Statistical Inference, Occam's Razor and Statistical Mechanics on The Space of Probability Distributions, Neural Computation, Vol.9, No.2, Feb. 1997
Field theories for learning probability distributions.  W Bialek, CG  Callan & SP Strong,  Phys Rev Lett  77, 4693-4697 (1996)
Occam factors and model-independent Bayesian learning of continuous distributions. I Nemenman & W Bialek, Phys Rev E  65, 026137 (2002)
J. Hopfield: Neural networks and physical systems with emergent collective computational abilities. PNAS 79, 2554, 1982.
J. Hopfield: Neurons with graded response have collective computational properties like those of two-state neurons. PNAS 81, 3088, 1984.

J. Hopfield, C. D. Brody: Learning rules and network repair in spike-timing based computation networks (2004) Proc. Natl. Acad. Sci. USA 101, 337-342.
B. Schölkopf: SVM and kernel methods, 2001. Tutorial given at the NIPS Conference
B. Schölkopf, et al.: Introduction to Support Vector Learning, 1999. In: Advances in Kernel Methods - Support Vector Learning, MIT Press.
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995
Schölkopf, B.: Support Vector Learning. 176, R. Oldenbourg Verlag, Munich (1997)
A. J. Smola. Regression estimation with support vector learning machines. Diplomarbeit, Technische Universität München, 1996.
J. Buhmann, M. Held (2000): Model selection in clustering by uniform convergence bounds. in NIPS Proceedings.
R. Linsker, Self-organization in a perceptual network. Computer, 21(3), 105-117 (1988)
A. Bell and T. Sejnowski. An information-maximisation approach to blind separation and blind deconvolution. Neural Computation 7 1129-1159 (1995)
Maximum Likelihood Blind Source Separation: A Context-Sensitive Generalization of ICA. B. A. Pearlmutter and L. C. Parra in Advances in Neural Information Processing Systems 9. (1997).
W. Bialek, F. Rieke, R. de Ruyter van Steveninck, & D. Warland, Reading a neural code, Science 252 1854-57 (1991).
Fairhall AL, Lewen GD, Bialek W, de Ruyter Van Steveninck RR, Efficiency and ambiguity in an adaptive neural code, Nature, 412(6849), 787-92, August 2001.
A sensory source for motor variation.  LC Osborne, SG Lisberger & W Bialek, Nature 437, 412-416 (2005).
Thinking about the brain. W Bialek, in Physics of Biomolecules and Cells: Les Houches Session LXXV, H Flyvbjerg, F Jülicher, P Ormos, & F David, eds, pp 485-577 (EDP Sciences, Springer-Verlag, Berlin, 2002);
T. Toyoizumi, J.-P. Pfister, K. Aihara, W. Gerstner: Spike-timing Dependent Plasticity and Mutual Information Maximization for a Spiking Neuron Model. Advances in Neural Information Processing Systems 17, MIT Press.
J.-P. Nadal & N. Parga, "Sensory coding: information maximization and redundancy reduction", in Neural information processing, G. Burdet, P. Combe and O. Parodi Eds., World Scientific Series in Mathematical Biology and Medicine Vol. 7 (Singapore, 1999), pp. 164-171.
K. Rose, "Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems," Proceedings of the IEEE, vol. 86, pp. 2210-2239, November 1998.
N Tishby, FC Pereira, & W Bialek, The information bottleneck method, in Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, B Hajek & RS Sreenivas, eds, pp 368-377 (University of Illinois, 1999);
S. Still, W. Bialek and L. Bottou. Geometric Clustering using the Information Bottleneck method. In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16 [Neural Information Processing Systems, NIPS 2003, December 8-13, 2003, Vancouver and Whistler, British Columbia, Canada], 2004. MIT Press.
S. Still and W. Bialek. How many clusters? An information theoretic perspective. Neural Computation, 16(12):2483-2506, 2004.
A tutorial on hidden Markov models and selected applications in speech recognition; Rabiner, L.R. Proceedings of the IEEE, Volume 77, Issue 2, Feb 1989, pp. 257-286
W Bialek, I Nemenman & N Tishby, Complexity through nonextensivity Physica A 302, 89-99 (2001)
C. R. Shalizi and J. P. Crutchfield, "Computational Mechanics: Pattern and Prediction, Structure and Simplicity", Journal of Statistical Physics 104 (2001) 819-881.
D. P. Feldman and J. P. Crutchfield, "Structural Information in Two-Dimensional Patterns: Entropy Convergence and Excess Entropy", Physical Review E 67 (2003) 051104.
Chris Watkins and Peter Dayan. Q-Learning. Machine Learning, 8:279-292, 1992.
S. Still (2009) Information theoretic approach to interactive learning EPL 85, 28005.
S. Still (2014) Information Bottleneck Approach to Predictive Inference Entropy 16(2):968-989
S. Still (2017) Thermodynamic cost and benefit of data representations arXiv:1705.00612

(Page under construction and subject to change. Please report bugs and broken links.)
