
Selected research interests of Susanne Still

S. Still, D. A. Sivak, A. J. Bell, and G. E. Crooks: The Thermodynamics of Prediction, Phys. Rev. Lett. 109, 120604 (2012).


This work is driven by my interest in efficient computation, and by my desire to look for simple principles underlying the remarkable complexity of biological machinery. The systems I am interested in all have in common that they process information and that they utilize energy from their environment. To do this well, these systems have to be good at extracting useful information from the environment without wasting energy. I am fascinated by whether, where, and when biology implements simple optimization principles. Two favorite candidates, which have been discussed at length in the literature, are:
  • energetic, or thermodynamic, efficiency: studying this is complicated by the fact that biology operates far from thermodynamic equilibrium.
  • predictive inference: the ability to produce a model of environmental variables which has predictive power at limited model complexity.
This paper studies systems that are embedded in a stochastic environment and driven far from thermodynamic equilibrium by signals from that environment. Such systems, by executing their dynamics, produce (implicit) models of the stochastic signal that drives them. The core contribution of this paper is to show that there is a fundamental equivalence between thermodynamic inefficiency, measured by dissipation, and model inefficiency, measured by the difference between instantaneous memory and instantaneous predictive power.
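Schematically, writing instantaneous memory and instantaneous predictive power as mutual informations between the system state and the driving signal, the central identity can be sketched as follows (dissipated work in units of k_B T):

```latex
% Memory and predictive power of the system state s_t about the driving
% signal x_t, written as mutual informations:
%   I_mem(t) = I[s_t; x_t],    I_pred(t) = I[s_t; x_{t+1}]
% The equivalence between dissipation and model inefficiency then reads:
\beta \left\langle W_{\mathrm{diss}}[x_t \rightarrow x_{t+1}] \right\rangle
    = I[s_t; x_t] - I[s_t; x_{t+1}]
    = I_{\mathrm{mem}}(t) - I_{\mathrm{pred}}(t)
```

Since the average dissipated work on the left is nonnegative, any memory that is not predictive is necessarily paid for in dissipation.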
This has the following consequences:
  • to achieve thermodynamic efficiency, a system has to implement predictive inference;
  • the philosophy underlying predictive inference may have a very simple physical underpinning, since predictive inference allows for energetic efficiency;
  • Landauer's principle can be extended to systems arbitrarily far away from thermodynamic equilibrium, and it can be refined: to achieve the Landauer limit, a system has to be predictive.

Press: http://www.nature.com/news/proteins-remember-the-past-to-predict-the-future-1.11544


  • 03/2013 - APS March meeting Fluctuations in Non-Equilibrium Systems, Chair: Chris Jarzynski.
  • 11/2012 - Center for Mind, Brain and Computation Stanford University.
  • 09/2012 - Physics Colloquium University of Hawaii at Manoa.
  • 10/2011 - Seminar University of California at Berkeley, Redwood Center for Neuroscience.
  • 01/2011 - Invited Talk Berkeley Mini Stat. Mech. Meeting.

S. Still: Information-theoretic approach to interactive learning, EPL 85 (2009) 28005.


This paper addresses the issue of efficient information processing in the presence of feedback. Let me specify what I mean by feedback in this context: an interactive learner is a learning entity (this may be a living agent or an AI system) that actively probes and changes the process/environment it is learning about. This interactive learning scenario differs from the problem of active learning studied in machine learning, because here the learner's actions change the very process that is being learned. This goes beyond optimal sampling.

The central questions addressed in this paper are: Which types of behavior arise as a result of efficient information processing, in the presence of this feedback? What characteristics do information-theoretically optimal models (codes) have in this interactive scenario? What are optimal action strategies for learning in the presence of feedback?

The original motivation, and also the overarching goal of this research, is to try to understand the emergence of complex behavior from simple first principles. As a fundamental and simple first step, I was curious to see if one can derive characteristics that optimal behaviors must have from some very simple and general considerations of efficient information processing. I find that if one wants to construct optimally predictive models (codes) in the presence of feedback, then the action policies that best allow for this to happen need to balance control and exploration. Both aspects, exploration and control, emerge as necessary ingredients of any optimal data acquisition policy.

This derivation also results in an algorithm for computing these optimal models and policies from the data. My student Lisa Miller is currently applying this algorithm in robotics. We collaborate with J. P. Crutchfield's group at UC Davis and F. Sommer's group at Berkeley.

In a follow-up paper ("An information-theoretic approach to curiosity-driven reinforcement learning", with D. Precup, under review) we have connected the above theory to reinforcement learning by including explicit behavioral goals. Interestingly, the resulting optimal exploration strategy does not "soften up" the "greedy optimal" policy by randomization, as, for example, Boltzmann exploration does. Instead, it contains an explorative component even in the absence of randomness, that is, when the agent/robot takes specific actions in response to a specific stimulus with probability one. Conceptually, this results in an interesting new point of view: exploration emerges as an optimal strategy in our framework, whereas before it has often simply been introduced "by hand" via policy randomization. This could be an interesting approach for understanding the exploration behavior of intelligent animals.


  • 09/2010 - IPAB Seminar University of Edinburgh, Institute for Perception, Action and Behaviour, School of Informatics.
  • 04/2010 - Physics Colloquium (covering in part this work), University of British Columbia, Canada.
  • 03/2010 - Invited Conference Presentation American Physical Society March Meeting, Focus Session on Physics of Behavior, Portland, Oregon.
  • 01/2010 - Seminar University of California at Berkeley, Redwood Center for Neuroscience.
  • 11/2009 - Seminar International Centre for Theoretical Physics (ICTP), Trieste, Italy.
  • 08/2009 - Keynote Lecture. The Second International Conference on Guided Self-Organization (GSO), Leipzig, Germany.
  • 04/2009 - Invited Talk University of California at Davis, Computational Science and Engineering Center.
  • 08/2008 - Invited Lecture Series (covering in part this work), Santa Fe Institute Complex Systems Summer School, Institute of Theoretical Physics, Chinese Academy of Sciences (CAS), Beijing, China.
  • 01/2007 - Physics Colloquium, University of Hawai'i at Manoa.
  • 01/2007 - Invited Talk ETH/Uni Zuerich, Institute for Neuroinformatics, Zuerich, Switzerland.
  • 01/2007 - Invited Talk TU Munich (covering in part this work), Institute of Computer Science, Munich, Germany.
  • 01/2007 - Invited Talk IDSIA, (Institute for Artificial Intelligence/Istituto Dalle Molle di Studi sull'Intelligenza Artificiale), Lugano, Switzerland.
  • 01/2007 - Invited Talk ETH Zuerich, Institute of Computer Sciences, Switzerland.
  • 07/2006 - Invited Talk Max Planck Institute for Biological Cybernetics, Tuebingen, Germany.
  • 04/2006 - Invited Workshop Talk Bellairs Reinforcement Learning Workshop, Barbados.
  • 12/2005 - Invited Workshop Talk Workshop on ``Models of Behavioral Learning''. Neural Information Processing Systems (NIPS).
  • 04/2005 - Math Colloquium, University of Hawai’i at Manoa.
A preliminary version of this work (with W. Bialek) dates from 2005. The first archived version is at http://arxiv.org/abs/0709.1948
A follow-up paper (with D. Precup) will appear in Theory in Biosciences, Special Issue on “Guided Self-Organization”: An information-theoretic approach to curiosity-driven reinforcement learning.

S. Still and I. Kondor: Regularizing Portfolio Optimization, New Journal of Physics 12, 075034 (2010) (Special Issue on Statistical Physics Modeling in Economics and Finance).


The paper discusses an important issue in modern economics: the optimization of large portfolios displays an inherent instability to estimation error. This poses a fundamental problem, because solutions that are not stable under sample fluctuations may look optimal for a given sample, but are, in effect, very far from optimal with respect to the average risk. In the bigger picture, instabilities of this type show up in many places in finance, in the economy at large, and in other complex systems. They are related to systemic risk. Understanding systemic risk has become a priority since the last financial crash, partly because this understanding could help to determine the right regulation.

We provide a solution to the problem by arguing that portfolio optimization must be regularized. We explain the occurrence of the instability by noting that large portfolios are selected by minimizing an empirical risk measure in a regime in which there is not enough data to guarantee small actual risk, i.e. not enough data to ensure that empirical averages converge to expectation values. The practical situation for selecting a large, say an institutional, portfolio dictates that the amount of historical data is more or less comparable to the number of assets. This problem can be addressed by known regularization methods.

Interestingly, when one uses the fashionable "expected shortfall" risk measure, the regularized portfolio problem results in an algorithm that is closely related to support vector regression. Support vector algorithms have met with considerable success in machine learning, and it is highly desirable to be able to exploit them for portfolio selection as well. The paper gives a detailed derivation of the algorithm, which differs slightly from previously known support vector algorithms due to the nature of the portfolio selection problem. We also show that the proposed regularization corresponds to a diversification "pressure". This means that diversification, besides counteracting downward fluctuations in some assets by upward fluctuations in others, is also crucial for improving the stability of the solution. Our approach allows for the simultaneous treatment of optimization and diversification in one framework, which lets the investor trade off between the two, depending on the size of the available data set.
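As a toy illustration of the instability and its cure (a generic ridge-regularized minimum-variance example, not the expected-shortfall algorithm derived in the paper; all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_obs = 50, 60   # historical sample barely larger than the portfolio

# Simulated i.i.d. daily returns; the sample covariance is ill-conditioned.
returns = rng.normal(0.0, 0.01, size=(n_obs, n_assets))

def min_variance_weights(returns, ridge=0.0):
    """Minimum-variance weights w ~ (Sigma + ridge*I)^{-1} 1, normalized to sum to 1."""
    cov = np.cov(returns, rowvar=False) + ridge * np.eye(returns.shape[1])
    w = np.linalg.solve(cov, np.ones(returns.shape[1]))
    return w / w.sum()

w_plain = min_variance_weights(returns)            # unstable: extreme positions
w_reg = min_variance_weights(returns, ridge=1e-3)  # diversification "pressure"

# The penalty shrinks the largest (long or short) position, i.e. it
# pushes the solution toward a more diversified portfolio.
print(np.abs(w_plain).max(), np.abs(w_reg).max())
```

The ridge term plays the role of the diversification pressure discussed above: it trades a little in-sample optimality for stability under sample fluctuations.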

This work has been presented at the Third Financial Risks International Forum, Paris, March 25/26, 2010.

In a follow-up paper,

F. Caccioli, S. Still, M. Marsili, and I. Kondor Optimal Liquidation Strategies Regularize Portfolio Selection. http://xxx.lanl.gov/abs/1004.4169

we have characterized the typical behavior of the optimal liquidation strategies in the limit of large portfolio sizes by means of a replica calculation. In doing so, we showed how regularization removes the instability of expected shortfall. We also showed how regularization emerges naturally when the market impact of portfolio liquidation is taken into account. The idea is that an investor should care about the risk of the cash flow that the portfolio could generate if it were liquidated. But the liquidation of large positions influences prices, and this has to be taken into account when computing the risk of the cash that could be generated from the portfolio.

We are currently investigating (i) the effect of different regularizers in the context of portfolio selection, (ii) the effect of regularization on market dynamics, in particular market stability, and (iii) how much better regularized portfolio optimization performs on data compared to standard methods.


  • 04/2011 - Seminar, Santa Fe Institute, Santa Fe, NM.
  • 11/2010 - CUNY Applied Math Seminar, City College New York.

S. Still and J. P. Crutchfield. Structure or Noise? (under review)

S. Still, J. P. Crutchfield, and C. J. Ellison: Optimally Predictive Causal Inference, CHAOS (2010), in press.


These papers deal with automated model-making of dynamical systems from available data. The main observation here is that an approach to lossy data compression, the Information Bottleneck method (IB), which is a form of rate-distortion theory, when applied to time series analysis in a certain way, contains in a limit an approach to modeling dynamical systems known as computational mechanics, which has been applied widely to many problems in physics and chemistry. The information-theoretic framework we discuss here allows us to generalize the computational mechanics approach to include probabilistic models.

The original motivation for this work was that the last decade has witnessed a new era of collecting truly vast data sets, the volume of which far exceeds what any human can analyze directly by hand. Examples range from contemporary experiments in particle physics and astronomy to genomics, automated language translation, and web social organization. In all these areas automated methods are crucial. One challenge, then, is to develop methods that make as few ad hoc assumptions as possible (ideally none), in particular about the (unknown) causal and statistical structure of the observed process. We try to step back and ask if there are some simple principles of information processing and causal modeling that can be used to derive a data analysis method which makes minimal assumptions.

A commonly shared intuition for what a good model should do is that it should be able to predict, and it should do so without being overly complicated. This intuition can be put into information-theoretic terms: a model's predictive power is reflected by the amount of information that the model contains about future data, and the coding rate measures how concise the model is. Rate-distortion theory/IB then allows one to specify the family of (probabilistic) models that all have the maximal predictive power achievable at a fixed coding rate, or, vice versa, that are maximally concise at fixed predictive power. It also results in an iterative algorithm with which to compute these representations. The models contain as a parameter the trade-off between predictive power and model complexity. By specifying this parameter, one can control the allowed level of detail, or, vice versa, the desired predictive power of the model. In the limit in which this parameter tends to zero, which corresponds to relaxing the constraint on the coding rate, the procedure arrives at a representation of the data that contains as much predictive information as the data itself. In this limit, the resulting model becomes deterministic.
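The self-consistent equations behind that iterative algorithm can be sketched generically as follows (the toy joint distribution of pasts and futures, and all parameter values, are illustrative; strictly positive probabilities are assumed to keep the logarithms finite):

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Iterate the self-consistent IB equations: compress x (pasts) into
    clusters t while preserving information about y (futures)."""
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                        # marginal over pasts
    p_y_given_x = p_xy / p_x[:, None]             # conditional future given past
    q = rng.dirichlet(np.ones(n_clusters), size=len(p_x))  # q(t|x), random init
    for _ in range(n_iter):
        p_t = q.T @ p_x                                    # cluster weights q(t)
        p_y_given_t = (q * p_x[:, None]).T @ p_y_given_x / p_t[:, None]
        # D_KL[ p(y|x) || p(y|t) ] for every past x and cluster t
        kl = (p_y_given_x[:, None, :]
              * np.log(p_y_given_x[:, None, :] / p_y_given_t[None, :, :])).sum(axis=2)
        q = p_t[None, :] * np.exp(-beta * kl)              # Boltzmann form
        q /= q.sum(axis=1, keepdims=True)
    return q

# Toy joint distribution over 4 pasts (rows) and 2 futures (columns).
p_xy = np.array([[0.30, 0.05],
                 [0.25, 0.05],
                 [0.05, 0.10],
                 [0.05, 0.15]])
q = information_bottleneck(p_xy, n_clusters=2, beta=5.0)   # soft assignments q(t|x)
```

Here beta is the inverse of the temperature-like trade-off parameter mentioned above: large beta relaxes the constraint on the coding rate, and the soft assignments harden toward a deterministic partition.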

Here we prove that the procedure finds, in this limit, the "causal state partition" of computational mechanics. This is a positive result, because the causal states are known to be unique and minimal sufficient statistics for the prediction of the time series. It shows that the model inferred by the information-theoretic method has structural meaning. The causal states have been used in the literature, for example, to calculate fundamental invariants of the original system, such as symmetries, entropy rate, and stored information. We show with examples how rate-distortion theory provides a principled way of constructing more compact approximations to the causal state partition.
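As a concrete toy example of what the causal state partition looks like (the Golden Mean process and the brute-force grouping below are illustrative choices, not the constructions used in the papers):

```python
import numpy as np
from collections import defaultdict

# Golden Mean process: a 1 is never followed by a 1; after a 0 the next
# symbol is 0 or 1 with equal probability.
rng = np.random.default_rng(2)
seq = [0]
for _ in range(200_000):
    seq.append(0 if seq[-1] == 1 else int(rng.integers(0, 2)))

# Estimate the predictive distribution p(next symbol | last k symbols).
k = 3
counts = defaultdict(lambda: np.zeros(2))
for i in range(k, len(seq)):
    counts[tuple(seq[i - k:i])][seq[i]] += 1

# Pasts with the same predictive distribution belong to the same causal
# state; grouping pasts by their (rounded) estimates recovers the partition.
states = defaultdict(list)
for past, freq in counts.items():
    states[tuple(np.round(freq / freq.sum(), 1))].append(past)

print(len(states))  # 2: pasts ending in 1 vs. pasts ending in 0
```

Only the last symbol matters for prediction in this process, so the length-3 pasts collapse onto just two causal states; lowering the allowed coding rate in the rate-distortion treatment would merge such states further into coarser, approximate partitions.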

Earlier version of "Structure or Noise?" at http://lanl.arxiv.org/abs/0708.0654 


  • 04/2010 - Physics Colloquium (covering in part this work), University of British Columbia, Canada.
  • 03/2010 - Physics Colloquium University of Victoria, Canada.
  • 07/2009 - Invited Conference Talk Chaos/Xaoc, Conference Center of the National Academy of Sciences, Woods Hole, MA.
  • 10/2008 - Machine Learning Seminar Max Planck Institute for Biological Cybernetics, Tuebingen, Germany.
  • 08/2008 - Invited Lecture Series (covering in part this work), Santa Fe Institute Complex Systems Summer School, Institute of Theoretical Physics, Chinese Academy of Sciences (CAS), Beijing, China.
  • 01/2007 - Invited Talk TU Munich (covering in part this work), Institute of Computer Science, Munich, Germany.

S. Still and W. Bialek: How many clusters? An information theoretic perspective (2004) Neural Computation, 16(12):2483-2506. 


The information bottleneck method (IB) is an instantiation of rate-distortion theory that captures only the relevant bits of a data set. What is considered relevant is specified by means of a relevance variable that co-occurs with the data. IB assumes that the underlying joint probability distribution is known. In practice, however, one has to estimate this distribution, and doing so from finite data introduces errors that lead to an overestimate of the available relevant information. This may cause over-fitting.

In this paper we use perturbation theory to derive a correction that counteracts this effect, and thereby also obtain a method for finding the maximal number of clusters that can be resolved without over-fitting in a data set of size N, i.e. clusters that are stable under sample fluctuations.
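The effect being corrected can be seen in a toy experiment (the bias formula used below is the generic first-order finite-sample result for the plug-in estimator, quoted for illustration; it is not the perturbative correction derived in the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
Kx, Ky, N, trials = 5, 5, 200, 500   # alphabet sizes, sample size, repetitions

def plugin_mi(x, y, Kx, Ky):
    """Naive (plug-in) mutual information estimate, in nats."""
    joint = np.zeros((Kx, Ky))
    np.add.at(joint, (x, y), 1.0)    # histogram of co-occurrences
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return (joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])).sum()

# x and y are drawn independently, so the true mutual information is zero,
# yet the plug-in estimate is systematically positive.
est = np.mean([plugin_mi(rng.integers(0, Kx, N), rng.integers(0, Ky, N), Kx, Ky)
               for _ in range(trials)])
bias = (Kx - 1) * (Ky - 1) / (2 * N)   # first-order bias of the estimator, nats
print(est, bias)   # the average estimate sits near the predicted bias, not zero
```

Spurious relevant information of this kind is what drives over-fitting; subtracting an estimate of it is a crude analogue of the systematic correction developed in the paper.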


  • 12/2009 - Invited Talk Universitaet Koeln, Physics Department, Koeln, Germany
  • 06/2006 - Invited Talk McGill University, Montreal, Canada. Department of Computer Science.
  • 10/2005 - Guest Lecture University of Hawai’i, Manoa, Honolulu, HI, Medical Informatics, ICS.
  • 09/2005 - Invited Talk University College Dublin, Dublin, Ireland.
  • 04/2005 - Guest Lecture University of Hawai’i at Hilo, Hilo, HI, Department of Computer Science.
  • 04/2005 - Guest Lecture University of Hawai’i at Manoa, Honolulu, HI, Department of Electrical Engineering.
  • 10/2008 - Machine Learning Seminar Max Planck Institute for Biological Cybernetics, Tuebingen, Germany.
  • 08/2003 - Theoretical Biology Seminar Humboldt University, Berlin, Germany.
  • 08/2003 - Machine Learning and Cognitive Neuroscience Seminar Hamilton Institute, National University of Ireland, Maynooth, Ireland.
  • 08/2003 - Invited Talk University of Hawai’i, Honolulu, HI. Department of Electrical Engineering.
  • 07/2003 - Invited Talk ETH Zuerich, Institute of Computer Sciences, Switzerland.
  • 04/2003 - Applied Mathematics Seminar Columbia University, New York, NY.
  • 03/2003 - Invited Talk University of British Columbia, Department of Physics, Vancouver, Canada.


This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by the authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
