
[ICS 435]   Machine Learning Fundamentals

Syllabus (subject to change):

Homework etc.
First HW given. (Perceptron)
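As a reference point for the first homework, here is a minimal perceptron in Python/NumPy. The toy dataset and parameters are illustrative, not the assignment's:

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Rosenblatt perceptron for labels y in {-1, +1}; returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi             # classic perceptron update
                b += yi
                mistakes += 1
        if mistakes == 0:                # converged: training data separated
            break
    return w, b

# Toy linearly separable problem: logical AND with {-1, +1} labels.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1., -1., -1., 1.])
w, b = perceptron(X, y)
```

The convergence theorem guarantees a finite number of updates whenever the data are linearly separable; on non-separable data the loop simply runs out of epochs.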
Remarks on presenting scientific results.
First HW due.
Supervised learning I: generalization error vs. training error. Introduction to statistical learning theory and support vector machines (SVM).
Support vector learning: support vector machines (SVM) and support vector regression (SVR).
Second HW given. (SVM/R)
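A from-scratch baseline can be useful for checking SVM homework results. Below is a linear soft-margin SVM trained by stochastic subgradient descent on the hinge loss (Pegasos-style); the data and hyperparameters are placeholders. SVR follows the same pattern with the epsilon-insensitive loss in place of the hinge:

```python
import numpy as np

def linear_svm_sgd(X, y, lam=0.01, epochs=50, seed=0):
    """Soft-margin linear SVM via Pegasos-style subgradient descent.
    Minimizes (lam/2)||w||^2 + avg. hinge loss; labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            if y[i] * (X[i] @ w + b) < 1:    # margin violated: hinge is active
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                            # correct side: only shrink w
                w = (1 - eta * lam) * w
    return w, b

# Two well-separated Gaussian blobs as a stand-in dataset.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
w, b = linear_svm_sgd(X, y)
```

Unlike the perceptron, the regularized hinge objective keeps shrinking w even after the data are separated, which is what drives the solution toward the maximum-margin hyperplane.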
Introduction to regression and Bayesian Inference.
Extra credit HW given. (Bayesian Interpolation)
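The quantity at the heart of MacKay's Bayesian Interpolation (see the reading list) is the log evidence, i.e. the marginal likelihood of the model with its weights integrated out. A minimal sketch for polynomial models with a Gaussian prior; the hyperparameters alpha, beta and the toy data are illustrative choices, not part of the assignment:

```python
import numpy as np

def log_evidence(x, y, degree, alpha=0.01, beta=100.0):
    """Log marginal likelihood of a Bayesian polynomial model:
    y = Phi(x) w + noise, prior w ~ N(0, I/alpha), noise ~ N(0, 1/beta)."""
    Phi = np.vander(x, degree + 1, increasing=True)   # design matrix
    n, d = Phi.shape
    A = alpha * np.eye(d) + beta * Phi.T @ Phi        # posterior precision
    m = beta * np.linalg.solve(A, Phi.T @ y)          # posterior mean
    E = 0.5 * beta * np.sum((y - Phi @ m) ** 2) + 0.5 * alpha * (m @ m)
    return (0.5 * d * np.log(alpha) + 0.5 * n * np.log(beta) - E
            - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * n * np.log(2 * np.pi))

# Toy data from a quadratic, noise std 0.1 (matching beta = 1/0.1**2).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = 1.0 - x**2 + rng.normal(0, 0.1, 30)
```

Comparing log_evidence across degrees implements the Occam's-razor model comparison of the paper: too simple a model fits poorly, too complex a model pays an Occam factor through the determinant term.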
Guest lecture: Prasad Santhanam on linear regression.

Supervised learning II: Introduction to artificial neural networks and deep learning.

Guest tutorial: Giacomo Indiveri on Neuromorphic engineering.
Second HW due.
Introduction to unsupervised learning.

Cluster analysis
Third HW given. (k-means)
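For the clustering homework, a minimal NumPy version of Lloyd's k-means algorithm may help as a sanity check; the blob data below is illustrative:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        # Update step: each center moves to the mean of its cluster.
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs; k-means should recover them.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-5, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, labels = kmeans(X, 2)
```

Each step can only decrease the within-cluster sum of squares, so the loop always terminates; the result depends on initialization, which is why practical implementations restart from several random seeds.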
Introduction to the use of information theory in unsupervised learning. 
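The information-theoretic methods covered here (and in the reading list, e.g. the information bottleneck) are built on entropy and mutual information. A minimal sketch of I(X;Y) for a discrete joint distribution; the two example tables are illustrative:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits for a joint probability table p_xy (rows: x, cols: y)."""
    p_x = p_xy.sum(axis=1, keepdims=True)        # marginal over y
    p_y = p_xy.sum(axis=0, keepdims=True)        # marginal over x
    mask = p_xy > 0                              # 0 log 0 = 0 convention
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])).sum())

independent = np.full((2, 2), 0.25)              # X, Y independent: I = 0
correlated = np.array([[0.5, 0.0],               # Y a copy of X: I = 1 bit
                       [0.0, 0.5]])
```

Mutual information is the KL divergence between the joint and the product of marginals, so it is zero exactly when the variables are independent and is maximal when one determines the other.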

Non-instructional day
Work on Final Project. Third HW due. Extra-credit HW due.
Thanksgiving break.

Final Project Presentations. Final Project due.
Study period. Last day of instruction is 12/7. Final taken online the following week.
Final Exam.

40% Homework
40% Final Project (final report is mandatory; presentation is optional)
20% Final Exam


Students from other departments, especially Math, Physics and Biology, are welcome!

Useful web sites:

Citeseer: http://citeseer.ist.psu.edu/
Journal of Machine Learning Research: http://jmlr.csail.mit.edu/
NIPS Proceedings: http://books.nips.cc/

Some reference books (none are mandatory reading):
Additional reading material (optional):

Turing, A.M. (1950). Computing machinery and intelligence. Mind, 59, 433-460
John von Neumann. (1958). The Computer and the Brain, Yale Univ. Press, New Haven
Computation in single neurons:  Hodgkin and Huxley revisited. B Agüera y Arcas, AL Fairhall, & W Bialek, Neural Comp 15, 1715-1749 (2003); physics/0212113.
Malenka RC, Nicoll RA: Long-term potentiation--a decade of progress? Science. 1999 Sep 17;285(5435):1870-4
Bi GQ, Poo MM. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. J Neurosci. 1998 Dec 15;18(24):10464-72.
Multilayer feedforward networks are universal approximators. K Hornik, M Stinchcombe, H White (1989) Neural Networks, Volume 2, Issue 5, pp. 359-366
K. Fukushima: "Neural network model for a mechanism of pattern recognition unaffected by shift in position — Neocognitron —", Trans. IECE, J62-A[10], pp. 658-665 (Oct. 1979).
Y. LeCun: LeNet
W. Maass, E.D. Sontag, "Neural systems as nonlinear filters," Neural Computation 12 (2000): 1743-1772
D. MacKay: Bayesian Interpolation, Neural Computation 4(3): 415-447 (1992)
V. Balasubramanian, Statistical Inference, Occam's Razor and Statistical Mechanics on The Space of Probability Distributions, Neural Computation, Vol.9, No.2, Feb. 1997
Field theories for learning probability distributions.  W Bialek, CG  Callan & SP Strong,  Phys Rev Lett  77, 4693-4697 (1996)
Occam factors and model-independent Bayesian learning of continuous distributions. I Nemenman & W Bialek, Phys Rev E  65, 026137 (2002)
J. Hopfield: Neural networks and physical systems with emergent collective computational abilities. PNAS 79, 2554, 1982.
J. Hopfield: Neurons with graded response have collective computational properties like those of two-state neurons. PNAS 81, 3088, 1984.

J. Hopfield, C. D. Brody: Learning rules and network repair in spike-timing based computation networks (2004) Proc. Natl. Acad. Sci. USA 101, 337-342.
B. Schölkopf: SVM and kernel methods, 2001. Tutorial given at the NIPS Conference
B. Schölkopf, et al.: Introduction to Support Vector Learning, 1999. In: Advances in Kernel Methods - Support Vector Learning, MIT Press.
C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:273-297, 1995
Schölkopf, B.: Support Vector Learning. 176, R. Oldenbourg Verlag, Munich (1997)
A. J. Smola. Regression estimation with support vector learning machines. Diplomarbeit, Technische Universität München, 1996.
J. Buhmann, M. Held (2000): Model selection in clustering by uniform convergence bounds. in NIPS Proceedings.
R. Linsker, Self-organization in a perceptual network. Computer 21(3), 105-117 (1988)
A. Bell and T. Sejnowski. An information-maximisation approach to blind separation and blind deconvolution. Neural Computation 7 1129-1159 (1995)
Maximum Likelihood Blind Source Separation: A Context-Sensitive Generalization of ICA. B. A. Pearlmutter and L. C. Parra in Advances in Neural Information Processing Systems 9. (1997).
W. Bialek, F. Rieke, R. de Ruyter van Steveninck, & D. Warland, Reading a neural code, Science 252 1854-57 (1991).
Fairhall AL, Lewen GD, Bialek W, de Ruyter Van Steveninck RR, Efficiency and ambiguity in an adaptive neural code, Nature, 412(6849), 787-92, August 2001.
A sensory source for motor variation.  LC Osborne, SG Lisberger & W Bialek, Nature 437, 412-416 (2005).
Thinking about the brain. W Bialek, in Physics of Biomolecules and Cells: Les Houches Session LXXV, H Flyvbjerg, F Jülicher, P Ormos, & F David, eds, pp 485-577 (EDP Sciences, Springer-Verlag, Berlin, 2002)
T. Toyoizumi, J.-P. Pfister, K. Aihara, W. Gerstner: Spike-timing Dependent Plasticity and Mutual Information Maximization for a Spiking Neuron Model. Advances in Neural Information Processing Systems 17, MIT Press
J.-P. Nadal & N. Parga, "Sensory coding: information maximization and redundancy reduction", in Neural information processing, G. Burdet, P. Combe and O. Parodi Eds., World Scientific Series in Mathematical Biology and Medicine Vol. 7 (Singapore, 1999), pp. 164-171.
K. Rose, "Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems," Proceedings of the IEEE, vol. 86, pp. 2210-2239, November 1998.
N Tishby, FC Pereira, & W Bialek, The information bottleneck method, in Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, B Hajek & RS Sreenivas, eds, pp 368-377 (University of Illinois, 1999);
S. Still, W. Bialek and L. Bottou. Geometric Clustering using the Information Bottleneck method. In Sebastian Thrun, Lawrence K. Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16 [Neural Information Processing Systems, NIPS 2003, December 8-13, 2003, Vancouver and Whistler, British Columbia, Canada], 2004. MIT Press.
S. Still and W. Bialek. How many clusters? An information theoretic perspective. Neural Computation, 16(12):2483-2506, 2004.
A tutorial on hidden Markov models and selected applications in speech recognition. L.R. Rabiner, Proceedings of the IEEE, Volume 77, Issue 2, Feb 1989, pp. 257-286
W Bialek, I Nemenman & N Tishby, Complexity through nonextensivity, Physica A 302, 89-99 (2001)
C. R. Shalizi and J. P. Crutchfield, "Computational Mechanics: Pattern and Prediction, Structure and Simplicity", Journal of Statistical Physics 104 (2001) 819-881.
D. P. Feldman and J. P. Crutchfield, "Structural Information in Two-Dimensional Patterns: Entropy Convergence and Excess Entropy", Physical Review E 67 (2003) 051104.
Chris Watkins and Peter Dayan. Q-Learning. Machine Learning, 8:279-292, 1992.
S. Still (2009) Information theoretic approach to interactive learning EPL 85, 28005.
S. Still (2014) Information Bottleneck Approach to Predictive Inference Entropy 16(2):968-989
S. Still (2017) Thermodynamic cost and benefit of data representations arXiv:1705.00612

(Page under construction and subject to change. Please report errors and broken links.)
