Selected research interests of Susanne Still

Fundamental physical limits to information processing

Optimal data representation and the Information Bottleneck framework

Quantum machine learning

Rules of Information Acquisition and Processing

Selected Applications

Students interested in doing a project or joining the group should email me.

The purpose of this text is to give interested readers a synopsis of some of my work. There is, of course, a large body of relevant work by others in these areas; references can be found in my papers.

Fundamental physical limits to information processing


These ultimate limits are achievable only under conditions that are unrealistic for real-world information processing systems: (i) operations must be performed arbitrarily slowly, (ii) all relevant information must be observable, and (iii) all actions must be chosen optimally. But interactive observers (often called "agents") in the real world usually do not have the luxury of operating arbitrarily slowly. On the contrary, speed is crucial, and these systems are therefore mostly not in thermodynamic equilibrium. They also typically find themselves in partially observable situations, and they usually face constraints on what they can do, having only limited control.

What are general bounds on thermodynamic efficiency that apply to real-world systems? When those bounds are optimized over all possible strategies with which an agent can represent data and act on it, do general rules for optimal information processing emerge? Can this optimization, furthermore, give us concrete design principles for learning algorithms?

The implications of this line of reasoning are broad. On the one hand, it could lead to a unifying theory of learning and adaptation that is well grounded in physical reality; on the other hand, it could yield design principles for novel, highly energy-efficient computing hardware.

In the most general setup, information-to-work conversion happens within partially observable systems. I showed that the thermodynamic efficiency of generalized, partially observable information engines is limited by how much irrelevant information is retained by the data representation [1]. Optimizing for energy efficiency thus leads to a general rule for data acquisition and information processing: retain only information that is predictive of the quantities to be inferred. In other words, predictive inference can be derived from a physical principle. The generalized lower bound on dissipation can be minimized directly over all possible data representation strategies to yield strategies that least preclude efficiency. Mathematically, this procedure derives a concrete, widely used method for lossy compression and machine learning: the Information Bottleneck (Tishby, Pereira and Bialek, 1999).
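The Information Bottleneck method mentioned above has a concrete algorithmic form: a set of self-consistent equations that are iterated to convergence. The sketch below (Python/NumPy; the function and variable names are my own choices, not from the papers cited here) illustrates the standard iterative solution for a discrete joint distribution p(x, y):

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Iterate the self-consistent Information Bottleneck equations
    (Tishby, Pereira & Bialek, 1999) for a discrete joint p(x, y).
    beta trades compression against preserved relevant information.
    Returns the soft encoder p(t|x)."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                      # marginal p(x)
    p_y_x = p_xy / p_x[:, None]                 # conditional p(y|x)

    # random soft initialization of the encoder p(t|x)
    enc = rng.random((len(p_x), n_clusters))
    enc /= enc.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t = enc.T @ p_x                       # marginal p(t)
        # decoder p(y|t) via Bayes' rule
        p_y_t = (enc * p_x[:, None]).T @ p_y_x / (p_t[:, None] + 1e-12)
        # KL( p(y|x) || p(y|t) ) for every (x, t) pair
        log_ratio = (np.log(p_y_x[:, None, :] + 1e-12)
                     - np.log(p_y_t[None, :, :] + 1e-12))
        kl = (p_y_x[:, None, :] * log_ratio).sum(axis=2)
        kl -= kl.min(axis=1, keepdims=True)     # stabilize the exponent
        # encoder update: p(t|x) proportional to p(t) * exp(-beta * KL)
        enc = (p_t[None, :] + 1e-12) * np.exp(-beta * kl)
        enc /= enc.sum(axis=1, keepdims=True)
    return enc
```

As beta grows, the encoder becomes nearly deterministic and keeps only information about x that is relevant for predicting y; at small beta it compresses more aggressively.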

The theory we developed here lays the foundation for our ongoing investigations into the thermodynamics of interactive learning (more about interactive learning below).

To test the predictions of [1, 2, 6], and others made for information engines that run at finite speed, we proposed a series of experiments that should allow us either to develop concrete building blocks for a "thermodynamic computer", or to clarify why and how reversible computation is infeasible. This is a joint project with John Bechhoefer (experiment) and David Sivak at SFU, funded by the Foundational Questions Institute. We are currently interviewing candidates for a postdoctoral position. Interested candidates should send me an email.

- [1] S. Still (2019) Thermodynamic cost and benefit of data representations. Phys. Rev. Lett. (in press). arXiv:1705.00612
- [2] E. Stopnitzky, S. Still, T. E. Ouldridge and L. Altenberg (2019) Physical limitations of work extraction from temporal correlations. Phys. Rev. E 99, 042115. arXiv:1808.06908
- [3] J. Song, S. Still, R. Díaz Hernández Rojas, I. Pérez Castillo and M. Marsili (2019) Optimal work extraction and mutual information in a generalized Szilárd engine. Phys. Rev. E (under review). arXiv:1910.04191
- [4] G. E. Crooks and S. Still (2019) Marginal and Conditional Second Laws of Thermodynamics. EPL (Europhysics Letters) 125, 4, 40005. arXiv:1611.04628
- [5] S. Still (2014) Lossy is lazy. Proc. Seventh Workshop on Information Theoretic Methods in Science and Engineering (WITMSE-2014), eds. J. Rissanen, P. Myllymäki, T. Roos and N. P. Santhanam.
- [6] S. Still, D. A. Sivak, A. J. Bell and G. E. Crooks (2012) Thermodynamics of Prediction. Phys. Rev. Lett. 109, 120604.

(2018-2020) "Thermodynamics of Agency" (partly); Foundational Questions Institute with the Fetzer Franklin Fund.

(2019-2021) "Maxwell's demon in the real world"; with John Bechhoefer (PI) and David Sivak; Foundational Questions Institute.

- 11/14-18/2019 Montreal Artificial Intelligence and Neuroscience (MAIN), Montreal, Canada.
- 07/20-25/2019 The Foundational Questions Institute 6th International Conference, Tuscany, Italy.
- 07/11-12/2019 The Physics of Evolution, Francis Crick Institute, London.
- 08/26-31/2018 Runde Workshop, Runde Island, Norway.
- 02/08/2018 Non-equilibrium dynamics and information processing in biology, Okinawa Institute of Science and Technology, Japan (remote talk).
- 11/18/2016 Statistical Physics, Information Processing and Biology, Santa Fe Institute, Santa Fe, NM.
- 09/25/2016 Information, Control, and Learning--The Ingredients of Intelligent Behavior, Hebrew University, Jerusalem, Israel (remote talk).
- 08/20/2016 Foundational Questions Institute, 5th International Conference, Banff, Canada.
- 04/25/2016 Spring College in the Physics of Complex Systems, International Center for Theoretical Physics (ICTP), Trieste, Italy.
- 7/14-17/2015 Conference on Sensing, Information and Decision at the Cellular Level, ICTP.
- 5/4-6/2015 Workshop "Nature as Computation", Beyond Center for Fundamental Concepts in Science.
- 4/8-10/2015 Workshop on Entropy and Information in Biological Systems, National Institute for Mathematical and Biological Synthesis (NIMBioS).
- 10/26-31/2014 Biological and Bio-Inspired Information Theory, Banff, Canada.
- 7/5-8/2014 Seventh Workshop on Information Theoretic Methods in Science and Engineering.
- 5/8-10/2014 Statistical Mechanics Foundations of Complexity–Where do we stand? Santa Fe Institute.
- 1/14-16/2014 The Foundational Questions Institute Fourth International Conference, Vieques Island, PR.
- 6/26-28/2013 Modeling Neural Activity (MONA), Kauai, HI.
- 01/2011 Workshop on measures of complexity, Santa Fe Institute, Santa Fe, NM.
- 01/2011 Berkeley Mini Stat. Mech. Meeting.

- 08/2018 - Institute for Theoretical Physics (ITP), ETH Zuerich, Switzerland.
- 08/2018 - Institute for Neuroinformatics, University of Zuerich, Switzerland.
- 07/2018 - IST, Austria.
- 06/2018 - Google Deepmind, Montreal, Canada.
- 06/2018 - Facebook AI, Montreal, Canada.
- 11/2016 - Condensed Matter Seminar, UC Santa Cruz.
- 08/2016 - Biophysics Seminar, Simon Fraser University, Vancouver, Canada.
- 06/2013 - Max Planck Institute for Dynamics and Self-organization, Göttingen, Germany.
- 04/2013 - Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy.
- 03/2013 - Physics Department, The University of Auckland, Auckland, NZ.
- 03/2013 - Physics Department, The University of the South Pacific, Suva, Fiji.
- 11/2012 - Center for Mind, Brain and Computation, Stanford University.
- 09/2012 - Physics Colloquium, University of Hawaii at Manoa.
- 10/2011 - Redwood Center for Neuroscience, University of California at Berkeley.
- 08/2011 - Institute for Neuroinformatics, ETH/UNI Zürich, Switzerland.
- 11/2011 - Symposium in honor of W. Bialek's 50th Birthday, Princeton University, Princeton, NJ.

- 2/19/2015 Nostalgia Just Became a Law of Nature (by S. DeDeo)
- 10/9/2014 Life's Quantum Crystal Ball (by C. Piekema)
- 10/4/2012 Proteins remember the past to predict the future (by P. Ball) Nature News.

Optimal data representation and the Information Bottleneck framework

Living systems learn by interacting with their environment; in a sense, they "ask questions and do experiments", not only by actively filtering the data but also by perturbing, and to some degree controlling, the environment that they are learning about. Ultimately, one would like to understand the emergence of complex behaviors from simple first principles. To ask which simple characteristics of policies allow an agent to optimally capture predictive information, I extended the Information Bottleneck approach to the situation with feedback from the learner, and showed that optimal encoding in the presence of feedback requires action strategies to balance exploration with control [10]. Both aspects, exploration and control, emerge in this treatment as necessary ingredients for behaviors with maximal predictive power. The reason both emerge is the feedback itself.

This study resulted in a novel algorithm for recursively learning optimal models and policies from data, which my student Lisa Miller has applied to selected problems in robotics (work in progress). In the context of reinforcement learning, this approach allowed us to study [8] how exploration emerges as an optimal strategy, driven by the need to gather information, rather than being put in by hand as action policy randomization.
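As a generic illustration of this idea (my own sketch, not the specific algorithm of [8]; the Dirichlet model and names below are assumptions), an intrinsic "curiosity" reward can be defined as the information gained about a transition model from a single observation, here measured as the KL divergence between the model's predictive distribution after and before the update:

```python
import numpy as np

def info_gain(counts, observed, alpha=1.0):
    """Information-gain bonus for observing next state `observed`,
    given per-next-state visit counts for one (state, action) pair.
    The transition model is a Dirichlet(alpha)-smoothed multinomial;
    the bonus is KL(predictive after update || predictive before)."""
    c = counts.astype(float) + alpha          # Dirichlet pseudo-counts
    p_old = c / c.sum()                       # predictive before update
    c[observed] += 1.0
    p_new = c / c.sum()                       # predictive after update
    return float(np.sum(p_new * np.log(p_new / p_old)))
```

A novel transition yields a large bonus, while a well-explored one yields a vanishing bonus, so an agent maximizing this reward is driven to gather information where its model is most uncertain, with no explicit randomization of the action policy.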

- [7] S. Still (2014) Information Bottleneck Approach to Predictive Inference. *Entropy* 16(2), 968-989.
- [8] S. Still and D. Precup (2012) An information-theoretic approach to curiosity-driven reinforcement learning. Theory in Biosciences 131(3), 139-148.
- [9] S. Still, J. P. Crutchfield and C. J. Ellison (2010) Optimally Predictive Causal Inference. CHAOS 20, 037111.
- [10] S. Still (2009) Information theoretic approach to interactive learning. EPL 85, 28005.
- [11] S. Still and W. Bialek (2004) How many clusters? An information theoretic perspective. Neural Computation 16(12), 2483-2506.

- 09/2010 Eighth Fall Course on Computational Neuroscience, Bernstein Center for Computational Neuroscience, and Max Planck Institute for Dynamics and Self-Organization, Goettingen, Germany.
- 08/2009 Keynote Lecture, 2nd International Conference on Guided Self-Organization (GSO), Leipzig, Germany.
- 07/2009 Chaos/Xaoc, Conference Center of the National Academy of Sciences in Woods Hole, MA.
- 08/2008 Santa Fe Institute Complex Systems Summer School at the Institute of Theoretical Physics, Chinese Academy of Sciences (CAS), Beijing, China.
- 09/2008 Ecole Recherche Multimodale d'Information Techniques & Sciences (ERMITES); Universite du Sud Toulon-Var, Laboratoire des Sciences de l'Information et des Systemes, Association Francaise de la Communication Parlee; Giens, France.
- 09/2009 European Conference on Complex Systems, Warwick (ECCS '09), Workshop on Information, Computation, and Complex Systems.
- 04/2006 Bellairs Reinforcement Learning Workshop, Barbados.
- 12/2005 Neural Information Processing Systems (NIPS), Workshop on "Models of Behavioral Learning", Vancouver, BC, Canada.
- 07/2004 Kavli Institute for Theoretical Physics (KITP), University of California, Santa Barbara. Program: Understanding the Brain.

- 04/2010 University of British Columbia, Canada, Physics Colloquium.
- 03/2010 University of Victoria, Canada, Physics Colloquium.
- 01/2010 University of California at Berkeley, Redwood Center for Theoretical Neuroscience.
- 12/2009 Universitaet Koeln, Germany, Physics Department.
- 11/2009 International Center of Theoretical Physics (ICTP), Trieste, Italy.
- 04/2009 University of California at Davis, Computational Science and Engineering Center, Davis, CA.
- 10/2008 Max Planck Institute for Biological Cybernetics, Machine Learning Seminar, Tuebingen, Germany.
- 09/2007 University of Montreal, Montreal, Canada, Department of Computer Science.
- 09/2007 McGill University, Montreal, Canada, McGill-UdeM-MITACS Machine Learning Seminar.
- 03/2007 University of California at Davis, Computational Science and Engineering Center, Davis, CA.
- 01/2007 TU Munich, Institute of Computer Science, Munich, Germany.
- 01/2007 ETH Zuerich, Institute for Neuroinformatics, Zuerich, Switzerland.
- 01/2007 IDSIA, Institute for Artificial Intelligence (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale), Lugano, Switzerland.
- 01/2007 ETH Zuerich, Institute of Computer Sciences, Zuerich, Switzerland.
- 01/2007 University of Hawai'i at Manoa, Physics Colloquium.
- 07/2006 Max Planck Institute for Biological Cybernetics, Tübingen, Germany.
- 06/2006 McGill University, Montreal, Canada, Department of Computer Science.
- 04/2005 University of Hawai'i at Manoa, Honolulu, HI, Mathematics Colloquium.
- 09/2005 University College Dublin, Dublin, Ireland.
- 04/2005 University of Hawai'i, Hilo, Hilo, HI, Department of Computer Science.
- 04/2005 University of Hawai'i, Manoa, Honolulu, HI, Department of Electrical Engineering.
- 03/2003 University of British Columbia, Vancouver, Canada, Department of Physics.
- 08/2003 Humboldt University, Berlin, Germany, Theoretical Biology Seminar.
- 08/2003 Hamilton Institute, National University of Ireland, Maynooth, Ireland, Machine Learning and Cognitive Neuroscience Seminar.
- 08/2003 University of Hawai'i, Honolulu, HI, Department of Electrical Engineering.
- 07/2003 Max Planck Institute for Biological Cybernetics, Tuebingen, Germany, Machine Learning Seminar.
- 07/2003 ETH Zuerich, Switzerland, Institute for Neuroinformatics.
- 04/2003 Columbia University, New York, NY, Applied Mathematics Seminar.

Quantum machine learning

We generalized the Information Bottleneck framework to quantum information processing [3]. With Renato Renner (ETH Zurich) and members of his group, we are now working toward extending the approach I proposed in [1] to quantum systems.

Students interested in this research should email me about possible projects, at the Master's or PhD thesis level.

- [12] A. L. Grimsmo and S. Still (2016) Quantum Predictive Filtering. Phys. Rev. A 94, 012338.

(2018-2020) "Thermodynamics of Agency" (partly); Foundational Questions Institute with the Fetzer Franklin Fund.

(08/24-10/31, 2019) Pauli Center for Theoretical Studies, ETH Zuerich, Switzerland.

Rules of Information Acquisition and Processing

Observers, biological and man-made alike, do not gain anything by choosing strategies for acquiring and representing data that would not allow them to operate, in principle, as close to the physical limits as possible. This does not mean that they always will operate optimally: in certain situations that may be impossible, or even a disadvantage (for example, excess dissipation may be necessary to make sure a process runs in one direction). It just means that observers should choose the structure of their strategies, not the execution, to allow for achieving the limits. They then have the freedom to invest other resources to achieve the limits whenever necessary (e.g. invest time to achieve energy efficiency).

Hence, we may postulate the following principle: observers choose the general rules they use to acquire and process information in such a way that the fundamental physical limits to information processing can be reached as closely as possible.

(Again, note that "can be reached" and "will be achieved" are two different statements; the second would obviously not yield a reasonable postulate, as counterexamples exist in nature.)

Applied to energy efficiency, we know that the "rule" that emerges is to retain predictive information and leave out irrelevant information. What rule(s) might emerge from speed limits? What are the speed limits for mesoscopic and macroscopic observers? What rules emerge from limits on robustness? What are those limits, and how should we even quantify robustness of information processing? Is it possible that the emerging set of rules might serve as axioms for an operational approach to quantum mechanics?

(2018-2019) "Thermodynamics of Agency" (partly); Foundational Questions Institute with the Fetzer Franklin Fund.

Selected Applications

Textbook portfolio optimization methods used in quantitative finance produce solutions that are not stable under sample fluctuations when used in practice. This effect was discovered by a team of physicists, led by Imre Kondor, and characterized using methods from statistical physics. The instability poses a fundamental problem, because solutions that are not stable under sample fluctuations may look optimal for a given sample but are, in effect, very far from optimal with respect to the average risk. In the bigger picture, instabilities of this type show up in many places in finance, in the economy at large, and in other complex systems. Understanding systemic risk has become a priority since the recent financial crisis, partly because this understanding could help to determine the right regulation.

The instability was discovered in the regime in which the number of assets is large and comparable to the number of data points, as is typically the case in large institutions such as banks and insurance companies. I realized that the instability is related to over-fitting, and pointed out that portfolio optimization needs to be regularized to fix the problem. The main insight is that large portfolios are selected by minimizing an empirical risk measure in a regime in which there is not enough data to guarantee small actual risk, i.e. not enough data to ensure that empirical averages converge to expectation values. This is the case because the practical situation for selecting large institutional portfolios dictates that the amount of historical data is more or less comparable to the number of assets. The problem can be addressed by known regularization methods.

Interestingly, when one uses the fashionable "expected shortfall" risk measure, the regularized portfolio problem results in an algorithm that is closely related to support vector regression. Support vector algorithms have met with considerable success in machine learning, and it is highly desirable to be able to exploit them for portfolio selection as well. We gave a detailed derivation of the algorithm [18], which differs slightly from a previously known SVM algorithm due to the nature of the portfolio selection problem. We also showed that the proposed regularization corresponds to a diversification "pressure". This means that diversification, besides counteracting downward fluctuations in some assets by upward fluctuations in others, is also crucial for improving the stability of the solution. The approach we provide allows for the simultaneous treatment of optimization and diversification in one framework, which lets the investor trade off between the two, depending on the size of the available data set.
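As a minimal illustration of the regularization idea (my own sketch, not the expected-shortfall algorithm of [18]; the function name and penalty strength eta are hypothetical), an L2 penalty added to the empirical minimum-variance problem has a closed-form solution:

```python
import numpy as np

def regularized_min_variance(returns, eta):
    """Minimum-variance portfolio weights with an L2 (ridge) penalty,
    subject to the budget constraint sum(w) = 1.
    Minimizing w'Cw + eta * w'w with a Lagrange multiplier for the
    budget constraint gives w proportional to (C + eta*I)^{-1} 1.
    eta = 0 recovers the unregularized textbook solution, which is
    unstable when the number of assets is comparable to the number
    of observations."""
    cov = np.cov(returns, rowvar=False)   # empirical covariance; rows = observations
    n = cov.shape[0]
    w = np.linalg.solve(cov + eta * np.eye(n), np.ones(n))
    return w / w.sum()
```

As eta grows, the weights are shrunk toward the equal-weight portfolio, which is one way to read the "diversification pressure" interpretation above.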

In two follow-up papers [14, 16] we characterized the typical behavior of the optimal liquidation strategies in the limit of large portfolio sizes by means of a replica calculation, showing how regularization can remove the instability. We furthermore showed how regularization emerges naturally when the market impact of portfolio liquidation is taken into account. The idea is that an investor should care about the risk of the cash flow that could be generated by the portfolio if it were liquidated. But the liquidation of large positions influences prices, and that has to be taken into account when computing the risk of the cash that could be generated from the portfolio. We showed which market impact functions correspond to different regularizers, and systematically analyzed their effects on performance [14]. Importantly, we found that the instability is cured (meaning that the divergence goes away) for all Lp norms with p > 1. For the fashionable L1 norm, however, things are more complicated: there is a way of implementing it that does cure the instability, but the most naive implementation may not; it may only shift the divergence.

Robert Wright of HIGP has 18 years of thermal emission data from 110 volcanoes around the Earth, recorded twice daily. We are analyzing these data with a variety of machine learning techniques, with the ultimate goal of predicting thermal output trends and classifying volcanoes by their activity profiles.

This project is perfect for a student in CS, or HIGP, or
Physics, and can be done as 499 or 699.

This project is perfect for a CS student, and could be done as 499 or 699.

- [13] E. Stopnitzky and S. Still (2019) Non-equilibrium odds for the emergence of life. Phys. Rev. E 99, 052101.
- [14] F. Caccioli, I. Kondor, M. Marsili and S. Still (2016) Liquidity Risk And Instabilities In Portfolio Optimization. *Int. J. Theor. Appl. Finan.* 19, 1650035.
- [15] L. J. Miller, R. Gazan and S. Still (2014) Unsupervised Document Classification and Visualization of Unstructured Text for the Support of Interdisciplinary Collaboration. Proc. 17th ACM Conf. Computer Supported Cooperative Work and Social Computing (CSCW-2014).
- [16] F. Caccioli, I. Kondor, M. Marsili and S. Still (2013) Optimal liquidation strategies regularize portfolio selection. *The European Journal of Finance* 19(6), 554-571. arXiv:1004.4169
- [17] C. W. Hamilton, C. Beggan, S. Still, M. Beuthe, R. Lopes, D. Williams, J. Radebaugh and W. Wright (2013) Spatial distribution of volcanoes on Io: implications for tidal heating and magma ascent. Earth and Planetary Science Letters 361, 272-286. (Paper was reported on by several news agencies, including NBC and the LA Times.)
- [18] S. Still and I. Kondor (2010) Regularizing Portfolio Optimization. *New Journal of Physics* 12, 075034.

- 11/2016 - Santa Fe Institute, NM.

- 11/2011 - Applied Math Seminar, City College New York, NY.

* This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.*