
Selected research interests of Susanne Still

The purpose of this text is to give interested readers a synopsis of some of my work. Obviously, there is a large body of relevant work done by others in these areas, but since this text is not intended as a tutorial, I limit references to others' work to the most essential. Please find further references in my papers.

Physics of Computation, and Optimal Information Processing
Selected Machine Learning Applications

Physics of Computation, and Optimal Information Processing


This work is driven by my interest in efficient computation, and by my desire to find simple physical principles that could provide theoretical guidance for optimization principles in learning and information processing. There are an enormous number of algorithms in machine learning and statistical data analysis, many of which can be cast as optimization problems. One may ask not only for common guiding principles, but also whether there is any physical reality behind any particular optimization. The ultimate goal is, of course, to understand the principles underlying the remarkable complexity of the biological machinery involved in learning and intelligent behavior. Biological learning systems have in common that they use energy to process information, energy which they ultimately have to obtain from their environment. To function and survive, these systems have to be good at extracting useful information from the environment without wasting too much energy. A fascinating question is where, when and how biology implements simple optimization principles, and what those may be. Two favorite candidates, which have been discussed at great length in the literature, are:
  • energy efficiency.  
  • efficient predictive inference: the ability to produce a model of environmental variables that has predictive power at smallest possible model complexity.
But are these two not related? What is the energetic cost of computation and inference? Landauer argued that a lower bound is set by the Second Law of thermodynamics: to erase one bit of information, heat in the amount of at least kT ln(2) has to be dissipated. While Landauer's bound is based on an analysis of thermodynamic equilibrium states, living systems typically operate (arbitrarily far) away from equilibrium. Neither living systems nor man-made computing hardware has the luxury of running processes that take an arbitrary amount of time; on the contrary, speed is crucial.

A commonly shared intuition for what a good model should do is that it should be able to predict, and it should do so without being overly complicated. This intuition can be put into information-theoretic terms, resulting in inference and compression methods based on Shannon’s rate-distortion theory, such as the Information Bottleneck method (Tishby, Pereira and Bialek, 1999). The Information Bottleneck method filters relevant bits from data, discarding irrelevant information. It is a lossy compression scheme that has found many applications, in particular in machine learning. In earlier work, we studied the effect of finite sampling errors and derived an information criterion that allows complexity control to avoid over-fitting [9].
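To make these two ingredients concrete (standard statements, written here in my own notation), Landauer's bound and the Information Bottleneck trade-off read

\[
Q_{\mathrm{erase}} \;\ge\; k T \ln 2 \quad \text{per bit erased},
\qquad\qquad
\min_{p(t|x)} \; \Big[ I(X;T) \;-\; \beta\, I(T;Y) \Big],
\]

where X denotes the observed data, Y the relevant (for example, future) variable, T the compressed representation, and the Lagrange multiplier β sets the balance between compression and retained relevant information.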

Thermodynamics of prediction

We study the energetics of learning machines using far-from-equilibrium thermodynamics, an area that has gained increasing traction since Jarzynski's work relation was published in 1997. We addressed the thermodynamics of prediction [5] for systems driven arbitrarily far from thermodynamic equilibrium by a stochastic environment. These systems, by means of executing their dynamics, produce (implicit) models of the signal that drives them. The core contribution of this paper is to show that there is an intimate relation between thermodynamic inefficiency, measured by dissipation, and model inefficiency, measured by the difference between instantaneous memory and instantaneous predictive power. As a corollary, Landauer's principle is refined and extended to systems that operate arbitrarily far away from thermodynamic equilibrium. Our work indicates that the dynamics of far-from-equilibrium systems with finite memory must be predictive in order to achieve optimal efficiency.
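Schematically, and in simplified notation (see [5] for the precise statement), the central inequality relates the work dissipated during one driving step to the non-predictive part of the system's instantaneous memory:

\[
\big\langle W_{\mathrm{diss}}[x_t \to x_{t+1}] \big\rangle \;\ge\; k T \,\big[\, I_{\mathrm{mem}}(t) - I_{\mathrm{pred}}(t) \,\big],
\qquad
I_{\mathrm{mem}}(t) = I(s_t; x_t), \quad I_{\mathrm{pred}}(t) = I(s_t; x_{t+1}),
\]

where s_t is the system state and x_t the driving signal; memory that carries no predictive power thus sets a floor on the average dissipation.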

Thermodynamics of inference and optimal data representation

More recent work focuses on generalizations of the above treatment. Common wisdom places information theory outside of physics, highlighting the broad applicability of a discipline rooted in probability theory. The connection to statistical mechanics is, however, tight, as was emphasized, for example, by E. T. Jaynes. I have shown how both Shannon’s rate-distortion theory and Shannon’s channel capacity can be directly motivated from thermodynamic arguments [3]: the maximum work potential a data representation can have is related to Shannon’s channel capacity, while the least effort required to make a data representation (consisting of measurement and memory) is governed by the information the representation captures about the data. Inference, or compression, methods that extract relevant information are thus not outside of physics; they have, instead, a very tangible physical justification: minimizing the least physical effort necessary for representing given data, subject to some fixed fidelity (or utility), produces an encoding that is optimal in the sense of Shannon’s rate-distortion theory [3]. In the most general setup, the thermodynamic efficiency of an information engine is, loosely speaking, limited by how much irrelevant information is retained by the data representation [11]. The Information Bottleneck method then arises naturally from physical arguments, by choosing a data representation that minimizes expended work while maximizing work potential [3, 11]. Thermodynamically efficient predictive information processing is a special case of this general treatment, allowing for the generalization from instantaneous non-predictive information (as in [5]) to information about longer time sequences (as discussed, e.g., in [4, 7]).
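The following is a heavily compressed sketch of this argument, in my own paraphrase of [3, 11], with constants and additive terms suppressed: the least work needed to create a representation M of data X grows with the information captured, while the work potential of that representation is bounded by the relevant information it retains,

\[
\langle W_{\mathrm{cost}} \rangle \;\gtrsim\; k T \, I(X;M),
\qquad\qquad
\langle W_{\mathrm{potential}} \rangle \;\lesssim\; k T \, I(M;Y),
\]

so that choosing the encoding p(m|x) to minimize net cost at fixed utility reproduces the rate-distortion / Information Bottleneck trade-off written above.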

It is not inconceivable that von Neumann, Wiener and Shannon had these ideas in the back of their minds when they developed measures of information. However, the analysis we use here hinges upon the notion of nonequilibrium (or generalized) free energy, which emerged much later, and which is becoming a common tool in the study of systems operating far from thermodynamic equilibrium (such as living systems). Since inference is an activity of the human mind, which is obviously not in thermodynamic equilibrium, it comes as no great surprise that the concept of generalized free energy helps us understand the thermodynamics of inference and communication.
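For readers unfamiliar with the term: the nonequilibrium (generalized) free energy of a distribution p over microstates with energies E is commonly defined as

\[
F[p] \;=\; \langle E \rangle_p \;-\; T\, S[p],
\qquad\qquad
F[p] - F_{\mathrm{eq}} \;=\; k T \, D_{\mathrm{KL}}\!\big( p \,\big\|\, p_{\mathrm{eq}} \big) \;\ge\; 0,
\]

so it reduces to the familiar equilibrium free energy when p is the Boltzmann distribution, and exceeds it by the Kullback-Leibler divergence from equilibrium otherwise.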

Optimal data representation and the Information Bottleneck framework

The Information Bottleneck framework then provides not only a constructive method for predictive inference from which learning algorithms can be derived, but also a general information-theoretic framework for data processing that is well grounded in physics, as I have argued [4]. The framework can be generalized to dynamical learning, yielding a recursive algorithm [4], and further to interactive learning [8] (see also the next paragraph). The generalized Information Bottleneck framework then offers not only a way to better understand known models of dynamical systems [4, 7], but also a way to learn them from data [4, 7], and to extend them to situations with feedback [8].
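As an illustration of the constructive side, here is a minimal sketch of the standard self-consistent Information Bottleneck iteration (following Tishby, Pereira and Bialek, 1999, not the generalized recursive algorithm of [4]); the function name, parameters and numerical details are my own choices:

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0, eps=1e-12):
    """Self-consistent IB iteration (Tishby, Pereira & Bialek, 1999) -- a sketch.

    p_xy       : joint distribution over (x, y), shape (n_x, n_y), entries sum to 1.
    n_clusters : cardinality of the compressed variable t.
    beta       : trade-off parameter between compression and relevance.
    Returns the soft encoder p(t|x), shape (n_x, n_clusters).
    """
    rng = np.random.default_rng(seed)
    n_x, _ = p_xy.shape
    p_x = p_xy.sum(axis=1)                          # marginal p(x)
    p_y_given_x = p_xy / (p_x[:, None] + eps)       # conditional p(y|x)

    # random soft initialization of the encoder p(t|x)
    p_t_given_x = rng.random((n_x, n_clusters))
    p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        p_t = p_x @ p_t_given_x                                      # marginal p(t)
        p_xt = p_t_given_x * p_x[:, None]                            # joint p(x, t)
        p_y_given_t = (p_xt.T @ p_y_given_x) / (p_t[:, None] + eps)  # decoder p(y|t)

        # D_KL[ p(y|x) || p(y|t) ] for every pair (x, t)
        log_ratio = np.log(p_y_given_x[:, None, :] + eps) - np.log(p_y_given_t[None, :, :] + eps)
        kl = (p_y_given_x[:, None, :] * log_ratio).sum(axis=2)

        # encoder update: p(t|x) proportional to p(t) * exp(-beta * KL)
        logits = np.log(p_t + eps)[None, :] - beta * kl
        p_t_given_x = np.exp(logits - logits.max(axis=1, keepdims=True))
        p_t_given_x /= p_t_given_x.sum(axis=1, keepdims=True)

    return p_t_given_x
```

Sweeping beta traces out the trade-off curve between compression, I(X;T), and retained relevant information, I(T;Y).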

Recently, we generalized the Information Bottleneck framework to quantum information processing [1]. This work enables a quantitative assessment of the advantages of using a quantum memory over a classical memory. All systems ultimately have to obey quantum mechanics, and with the advent of quantum computers, and with mounting evidence for the importance of quantum effects in certain biological systems, understanding the efficient use of quantum information has become increasingly important. In this context, I joined a collaboration studying light harvesting complexes. We found indications for possible adaptation mechanisms in a model of nonphotochemical quenching [2], and are now trying to understand how the thermodynamics of information processing comes to bear on the subject. This project is relevant to bio-fuel production; I believe that the future of humanity hinges upon the efficient use of renewable energy sources.

Predictive inference in the presence of feedback from the learner

Living systems learn by interacting with their environment; in a sense they "ask questions and do experiments", not only by actively filtering the data but also by perturbing, and, to some degree, controlling the environment that they are learning about. Ultimately, one would like to understand the emergence of complex behaviors from simple first principles. To ask which simple characteristics of policies would allow an agent to optimally capture predictive information, I extended the Information Bottleneck approach to the situation with feedback from the learner, and showed that optimal encoding in the presence of feedback requires action strategies to balance exploration with control [8]. Both aspects, exploration and control, emerge in this treatment as necessary ingredients for behaviors with maximal predictive power. This study resulted in a novel algorithm for computing optimal models and policies from data, which my student Lisa Miller has applied to selected problems in robotics (work in progress). In the context of reinforcement learning, this approach allowed us to study [6] how exploration emerges as an optimal strategy, driven by the need to gather information, rather than being put in by hand as action policy randomization.
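Very schematically, and in my own notation rather than the precise functional of [8]: one looks for an internal state S and an action policy that together maximize predictive power about the future of the process at limited coding cost,

\[
\max_{\,p(s\,|\,\mathrm{history}),\;\pi(a\,|\,s)} \;\Big[ I\big(S; X_{\mathrm{future}}\big) \;-\; \lambda \, I\big(\mathrm{history}; S\big) \Big],
\]

with the crucial difference from passive prediction being that the future now depends on the actions themselves; exploration and control both emerge from this dependence.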

Strongly coupled systems - marginalized and conditioned Second Law

Ultimately, information engines and feedback learners are a subset of the class of strongly coupled systems. Much recent work has been devoted to understanding the thermodynamics of strongly coupled, interacting systems. The Second Law of thermodynamics was originally stated by Clausius as “The entropy of the universe tends to a maximum.” In practice, measuring the entropy of the entire universe is difficult. Alternatively, the Second Law can be applied to any system isolated from outside interactions (a universe unto itself). Of course, perfectly isolating any system or collection of systems from outside influences is also difficult. Over the last 150 years, thermodynamics has progressed by adopting various idealizations which allow us to isolate and measure the part of the total universal entropy change that is relevant to the behavior of the system at hand. These idealizations include heat reservoirs, work sources, measurement devices, and information engines. We showed that we do not, in principle, need to resort to the usual idealizations [10]: conditional and marginalized versions of the Second Law hold locally, even when the system of interest is strongly coupled to other driven, non-equilibrium systems.

Other ongoing and future projects

With Matteo Marsili (ICTP) and a team of students from this year's "Spring College in the Physics of Complex Systems", where I gave a lecture series, we have studied the statistical physics of generalized Szilard-type information engines (we are currently writing the manuscript). With my student Elan Stopnitzky, we have applied Gavin Crooks' notion of non-equilibrium maximum entropy hyperensembles to questions about the abundance of building blocks of life in the context of the origin of life (we are currently writing the manuscript).


Recent (2011-2016):

Publications prior to 2011:

Manuscripts under revision (working papers will be shared upon request):

  • [10] G. E. Crooks and S. Still, Marginal and Conditional Second Law of Thermodynamics for Strongly Coupled Systems.
  • [11] S. Still, Thermodynamics of Inference and Optimal Information Processing.


"Foundations of information processing in living systems";  Foundational Questions Institute: "Physics of Information" Program, Large Grant.

Recent talks (since 2011):

Invited Conference Talks

  1. 11/18/2016 (planned) - Statistical Physics, Information Processing and Biology, Santa Fe Institute, Santa Fe, NM.
  2. 09/25/2016 - Information, Control, and Learning: The Ingredients of Intelligent Behavior, Hebrew University, Jerusalem, Israel (remote talk).
  3. 08/20/2016 - Foundational Questions Institute, 5th International Conference, Banff, Canada.
  4. 04/25/2016 - Spring College in the Physics of Complex Systems, International Center for Theoretical Physics (ICTP), Trieste, Italy.
  5. 7/14-17/2015 - Conference on Sensing, Information and Decision at the Cellular Level, ICTP.
  6. 5/4-6/2015 - Workshop "Nature as Computation", Beyond Center for Fundamental Concepts in Science.
  7. 4/8-10/2015 - Workshop on Entropy and Information in Biological Systems, National Institute for Mathematical and Biological Synthesis (NIMBioS).
  8. 10/26-31/2014 - Biological and Bio-Inspired Information Theory, Banff, Canada.
  9. 7/5-8/2014 - Seventh Workshop on Information Theoretic Methods in Science and Engineering.
  10. 5/8-10/2014 - Statistical Mechanics Foundations of Complexity–Where do we stand? Santa Fe Institute.
  11. 1/14-16/2014 - The Foundational Questions Institute Fourth International Conference, Vieques Island, PR.
  12. 6/26-28/2013 - Modeling Neural Activity (MONA), Kauai, HI.
  13. 01/2011 - Workshop on measures of complexity, Santa Fe Institute, Santa Fe, NM.
  14. 01/2011 - Berkeley Mini Stat. Mech. Meeting.

Invited Talks

  1. 11/2016 (planned) - Condensed Matter Seminar, UC Santa Cruz.
  2. 08/2016 - Biophysics Seminar, Simon Fraser University, Vancouver, Canada.
  3. 06/2013 - Max Planck Institute for Dynamics and Self-organization, Göttingen, Germany.
  4. 04/2013 - Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy.
  5. 03/2013 - Physics Department, The University of Auckland, Auckland, NZ.
  6. 03/2013 - Physics Department, The University of the South Pacific, Suva, Fiji.
  7. 11/2012 - Center for Mind, Brain and Computation, Stanford University.
  8. 09/2012 - Physics Colloquium, University of Hawaii at Manoa.
  9. 10/2011 - Redwood Center for Neuroscience, University of California at Berkeley.
  10. 08/2011 - Institute for Neuroinformatics, ETH/UNI Zürich, Switzerland.
  11. 11/2011 - Symposium in honor of W. Bialek's 50th Birthday, Princeton University, Princeton, NJ.
  12. 11/2011 - Applied Math Seminar, City College New York, NY.

Contributed Talk:

  1. 03/18/2013 - APS March meeting; Session: Fluctuations in Non-Equilibrium Systems; Chair: Chris Jarzynski.


Selected Machine Learning Applications

With students and collaborators, my lab applies machine learning methods to data analysis. I try to keep the focus on applications that are of scientific relevance and/or have some potential positive impact on society.

Regularized Portfolio Optimization

Since 2008, the world has been reminded of the importance of having a stable global financial system. It is usually the poor who suffer most from crashes, and therefore preventing instability becomes a moral imperative. As scientists, we have little or no control over most relevant factors, such as political decision making. But it does fall within my area of expertise to work on improving the mathematical tools used in the finance sector.

Textbook portfolio optimization methods used in quantitative finance produce solutions that are not stable under sample fluctuations when used in practice. This effect was discovered by a team of physicists, led by Imre Kondor, and characterized using methods from statistical physics. The instability poses a fundamental problem, because solutions that are not stable under sample fluctuations may look optimal for a given sample, but are, in effect, very far from optimal with respect to the average risk. In the bigger picture, instabilities of this type show up in many places in finance, in the economy at large, and also in other complex systems. Understanding systemic risk has become a priority since the recent financial crisis, partly because this understanding could help to determine the right regulation.

The instability was discovered in the regime in which the number of assets is large and comparable to the number of data points, as is typically the case in large institutions, such as banks and insurance companies. I realized that the instability is related to over-fitting, and pointed out that portfolio optimization needs to be regularized to fix the problem. The main insight is that large portfolios are selected by minimization of an empirical risk measure, in a regime in which there is not enough data to guarantee small actual risk, i.e. there is not enough data to ensure that empirical averages converge to expectation values. This is the case because the practical situation for selecting large institutional portfolios dictates that the amount of historical data is more or less comparable to the number of assets. The problem can be addressed by known regularization methods. Interestingly, when one uses the fashionable "expected shortfall" risk measure, the regularized portfolio problem results in an algorithm that is closely related to support vector regression. Support vector algorithms have met with considerable success in machine learning, and it is highly desirable to be able to exploit them also for portfolio selection. We gave a detailed derivation of the algorithm [16], which differs slightly from a previously known SVM algorithm due to the nature of the portfolio selection problem. We also showed that the proposed regularization corresponds to a diversification "pressure". This means that diversification, besides counteracting downward fluctuations in some assets by upward fluctuations in others, is also crucial for improving the stability of the solution. The approach we provide allows for the simultaneous treatment of optimization and diversification in one framework, allowing the investor to trade off between the two, depending on the size of the available data set.
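To make the general idea concrete, here is a minimal sketch (not the algorithm derived in [16]; the synthetic data, parameter names and the use of cvxpy are illustrative assumptions) of expected-shortfall minimization with an L2 diversification penalty, using the standard Rockafellar-Uryasev linearization; note the formal similarity to support-vector-style objectives, a hinge-type loss plus a norm penalty:

```python
import numpy as np
import cvxpy as cp

def regularized_es_portfolio(returns, alpha=0.95, lam=0.1):
    """Minimize empirical expected shortfall plus an L2 penalty (sketch)."""
    N, n = returns.shape
    w = cp.Variable(n)            # portfolio weights
    nu = cp.Variable()            # auxiliary threshold (VaR-level variable)
    losses = -returns @ w         # per-observation portfolio loss
    es = nu + cp.sum(cp.pos(losses - nu)) / ((1 - alpha) * N)
    problem = cp.Problem(cp.Minimize(es + lam * cp.sum_squares(w)),
                         [cp.sum(w) == 1])
    problem.solve()
    return w.value

# Toy usage with synthetic return data (illustrative only):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r = rng.normal(0.001, 0.02, size=(250, 10))   # 250 days, 10 assets
    print(regularized_es_portfolio(r))
```

The penalty strength lam controls the trade-off between fitting the in-sample risk and exerting diversification pressure on the weights.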

In two follow-up papers [12, 14], we characterized the typical behavior of the optimal liquidation strategies, in the limit of large portfolio sizes, by means of a replica calculation, showing how regularization can remove the instability. We furthermore showed how regularization naturally emerges when the market impact of portfolio liquidation is taken into account. The idea is that an investor should care about the risk of the cash flow that could be generated by the portfolio if it were liquidated. But the liquidation of large positions will influence prices, and that has to be taken into account when computing the risk of the cash that could be generated from the portfolio. We showed which market impact functions correspond to different regularizers, and systematically analyzed their effects on performance [12]. Importantly, we found that the instability is cured (meaning that the divergence goes away) for all Lp norms with p > 1. For the fashionable L1 norm, however, things are more complicated: there is a way of implementing it that does cure the instability, but the most naive implementation may not; it may only shift the divergence.

Geospatial analysis

Geological surface processes are relevant for understanding the physical mechanisms underlying volcanism, and the temporal evolution of planets and moons in the Solar System. In a team including former students C. Hamilton (lead) and W. Wright, we analyzed the spatial distribution of volcanic craters on Io, a moon of Jupiter believed to be more volcanically active than any other object in the Solar System. The extreme volcanism on Io results from tidal heating, but its tidal dissipation mechanisms and magma ascent processes are poorly constrained. Our results may help narrow down the possible mechanisms underlying Io's volcanism by putting constraints on physical models [15].

Social sciences/Document Classification

Large-scale multi-disciplinary research efforts often face the problem that synergy can be impeded by not knowing which research from other disciplines relates to (sometimes in unexpected ways), or may inspire, one's own research. Document classification can be applied here to build visualization interfaces as library-science tools to aid multi-disciplinary collaborations. This is of relevance, as it may help to increase communication between groups, increase synergy, reduce redundancy, and perhaps even occasionally spur scientific creativity. In the "old days", each trip to the library could turn into an adventure as one walked down the aisles and got distracted from the main purpose of the trip by some interesting-looking titles. Before one knew it, one was reading something unexpected, and had found a new idea, a new approach. The taste of these adventures has changed in the digital age. To a certain extent, they have been replaced by online browsing, but the sheer volume of information is often a limiting factor, and there may well be use for tools that help organize relevant information in an intuitive way. As part of a NASA-funded study, together with my student L. Miller and my colleague R. Gazan, we developed such a tool in the context of astrobiology, an area that spans many fields, from chemistry to biology to astronomy [13]. Along the way, we tested a new document pre-processing algorithm that I proposed. Pre-processing is a crucial step in the document classification process.
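For readers unfamiliar with the pipeline, a generic pre-processing and grouping step might look like the sketch below (standard TF-IDF bag-of-words processing, not the new pre-processing algorithm tested in [13]; the toy corpus and cluster count are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy corpus standing in for multi-disciplinary abstracts (illustrative only).
docs = [
    "prebiotic chemistry of amino acids",
    "spectral signatures of exoplanet atmospheres",
    "microbial ecology of hydrothermal vents",
]

# Standard bag-of-words pre-processing: lowercasing, stop-word removal, TF-IDF weighting.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# A simple unsupervised grouping of the documents into topical clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

Real document collections would of course require more careful tokenization, vocabulary pruning, and evaluation.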


Prior to 2011:

  • [16] S. Still and I. Kondor: Regularizing Portfolio Optimization (2010), New Journal of Physics 12, 075034 (Special Issue on Statistical Physics Modeling in Economics and Finance).

