Selected research interests of Susanne
Still
Synopsis:
This work is driven by my interest in efficient computation, and my
desire to look for simple principles underlying the remarkable
complexity of biological machinery. The systems I am interested in all
have in common that they process information and that they utilize
energy from their environment. To do this well, these systems have to
be good at extracting useful information from the environment, without
wasting energy. I am fascinated by whether,
where and when biology implements simple optimization
principles. Two favorite candidates, that have been
discussed at length in the literature, are:
- energetic, or thermodynamic efficiency--studying this is
complicated by the fact that biology operates far from thermodynamic
equilibrium.
- predictive inference: the ability to produce a model of
environmental variables which has predictive power at limited model
complexity.
This paper studies systems which are embedded into a stochastic
environment, and which are driven far from thermodynamic equilibrium by
factors in their environment. These systems, by
means of executing their dynamics, produce (implicit) models of the
stochastic signal that drives them. The
core contribution of this paper
is to show that there
is a fundamental equivalence between thermodynamic inefficiency,
measured by dissipation,
and model inefficiency, measured by the difference between
instantaneous memory and instantaneous predictive power.
This has the
following consequences:
- to achieve thermodynamic efficiency, a system has to
implement predictive inference;
- the philosohy underlying predictive inference may have a very
simple physical underpinning, since it allows for energetic efficiency;
- Landauer's principle can be extended to systems arbitrarily far
away from
thermodynamic equilibrium, and it can be refined: to achieve the
Landauer limit,
a system has to be predictive.
Talks:
- 03/2013 - APS March meeting
Fluctuations in Non-Equilibrium Systems, Chair: Chris Jarzynski.
- 11/2012 - Center for Mind,
Brain and Computation Stanford
University.
- 09/2012 - Physics Colloquium University of Hawaii at Manoa.
- 10/2011 - Seminar University of
California at Berkeley, Redwood
Center for Neuroscience.
- 01/2011 - Invited Talk Berkeley Mini Stat. Mech. Meeting.
Synopsis:
This paper addresses the issue of efficient information processing in
the
presence of feedback. Let me specify what I mean by feedback in this
context: An interactive learner is a learning entity (this may be a
living agent or an AI system) which
actively probes and
changes the process/environment that it is learning about. This
interactive learning scenario
is
somewhat
different
from
the
problem
of
active
learning,
studied in machine learning. The difference is due to the fact the here
the learner's actions change the process which it is
being learned. This goes beyond optimal sampling.
The central questions addressed in this paper are: Which types of
behavior arise as a result of efficient information processing, in the
presence of this feedback? What characteristics do
information-theoretically optimal models (codes) have in this
interactive scenario? What are optimal action strategies for learning
in the
presence of feedback?
The original motivation, and also the overarching goal of this
research, is to try to understand the
emergence of complex
behavior from simple first principles. As a fundamental
and simple
first
step, I
was curious to see if one can derive characteristics that optimal
behaviors must have from some very simple and general
considderations of efficient information processing. I find that if
one wants to construct optimally predictive models (codes) in the
presence of
feedback, then the action policies that best allow for this to happen
need to
balance
control and exploration. Both aspects, exploration
and control
emerge
as necessary ingredients to any optimal data aquisition
policy.
This derivation also results in an algorithm for computing these
optimal models and policies from the data. My student Lisa Miller is
currently applying this algorithm in robotics. We collaborate with J.
P. Crutchfield's group at UC Davis and F. Sommer's group at Berkeley.
In a follow up paper (
"An
information-theoretic approach to curiosity-driven reinforcement
learning", with D. Precup, under review) we have connected the
above
theory to reinforcement learning by
including explicit
behavioral goals. The resulting
optimal exploration strategy, interestingly does not "soften" up the
"greedy optimal" policy by randomization, as does, for example,
Boltzmann exploration. Instead, it contains an explorative component
even in the absence of randomness -- when the agent/robot takes
specific actions in response to a specific stimulus with probability
one. Conceptually, this results in an interesting new point of view:
exploration emerges as an optimal strategy in our framework, whereas
before it has often simply been introduced "by hand" via policy
randomization. This could potentially be and interesting approach for
understanding the exploration behavior of intelligent animals.
Talks:
- 09/2010 - IPAB Seminar University of Edinburgh,
Institute for Perception, Action and
Behaviour, School of Informatics.
- 04/2010 - Physics Colloquium
(covering in part this work), University
of
British
Columbia, Canada.
- 03/2010 - Invited Conference
Presentation American
Physical Society
March Meeting, Focus Session on
Physics of Behavior, Portland, Oregon.
- 01/2010 - Seminar University of
California at Berkeley, Redwood
Center for Neuroscience.
- 11/2009 - Seminar International Center of
Theoretical Physics (ICTP), Trieste, Italy.
- 08/2009 - Keynote Lecture.
The
Second International Conference on Guided Self-Organization
(GSO),
Leipzig, Germany.
- 04/2009 - Invited Talk University
of
California at Davis, Computational Science and Engineering
Center.
- 08/2008 - Invited Lecture Series
(covering in part this work), Santa
Fe Institute Complex Systems Summer School, Institute of
Theoretical Physics, Chinese Academy of Sciences (CAS), Beijing, China.
- 01/2007 - Physics Colloquium,
University
of Hawai'i at Manoa.
- 01/2007 - Invited Talk ETH/Uni
Zuerich,
Institute for Neuroinformatics, Zuerich, Switzerland.
- 01/2007 - Invited Talk TU
Munich (covering in part this work), Institute of Computer Science,
Munich, Germany.
- 01/2007 - Invited Talk IDSIA,
(Institute
for Artificial Intelligence/Istituto Dalle Molle di Studi
sull'Intelligenza Artificiale), Lugano,
Switzerland.
- 01/2007 - Invited Talk ETH Zuerich, Institute
of
Computer Sciences, Switzerland.
- 07/2006 - Invited Talk
Max Planck Institute for Biological Cybernetics, Tuebingen,
Germany.
- 04/2006 - Invited Workshop
Talk Bellairs Reinforcement
Learning Workshop, Barbados.
- 12/2005 - Invited Workshop
Talk Workshop on ``Models of
Behavioral Learning''. Neural
Information Processing Systems (NIPS).
- 04/2005 - Math Colloquium,
University of Hawai’i at Manoa.
A preliminary version of this work (with W. Bialek) dates from 2005.
The first
archived version is at
http://arxiv.org/abs/0709.1948
A follow-up paper (with D. Precup) is going to appear in
Theory in Biosciences, Special
Issue on “Guided Self Organization”:
An
information-theoretic approach to
curiosity-driven reinforcement learning
S. Still and I. Kondor: Regularizing
Portfolio
Optimization (2010) New
Journal
of
Physics 12 075034
(Special
Issue on Statistical Physics Modeling in Economics and Finance)
Synopsis:
The paper discusses an imporatant issue in modern economics: the
optimization of large portfolios displays an inherent instability to
estimation error. This poses a fundamental problem, because solutions
that are not stable under sample fluctuations may look optimal for a
given sample, but are, in effect, very far from optimal with respect to
the average risk. In the bigger picture, instabilities of this type
show up in many places in finance, in the economy at large, and also in
other complex systems.
They are related to systemic risk. Understanding systemic risk has
become a priority since the last financial crash, partly because this
understanding could help to determine the right regulation.
We provide a solution to the problem by arguing that portfolio
optimization must be regularized. We explain the occurance of the
instability by noting that large portfolios are selected by
minimization of an emperical risk measure, in a regime in which there
is not
enough data to guarantee small actual risk, i.e. there is not enough
data to ensure that empirical averages converge to expectation values.
The practical situation for selecting a large, say an institutional,
portfolio dictates that the ammount of historical data is more or less
comparable to
the number of assets. This problem can be addressed by known
regularization methods. Interestingly, when one uses the fashionable
"expected shortfall" risk measure, then the regularized portfolio
problem results in an algorithm that is closely related to support
vector regression. Support vector algorithms have met with considerable
success in machine learning and it is highly desirable to be able to
exploit them also for portfolio selection. The paper gives a detailled
derivation of the algorithm, which slightly differs from previously
know SVM algorithm due to the nature of the portfolio selection
problem. We
also show that the proposed regularization corresponds to a
diversification ''pressure". This then means that diversification,
besides counteracting downward fluctuations in some assets by upward
fluctuations in others, is also crucial for improving the stability of
the solution. The approach we provide here allows for the simultaneous
treatment of optimization and diversification in one framework which
allows the investor to trade-off between the two, depending on the size
of the available data set.
This work has been presented at the
Third
Financial
Risks
International
Forum, Paris, March 25/26, 2010.
In a follow-up paper,
F. Caccioli, S. Still, M. Marsili, and I. Kondor Optimal Liquidation Strategies Regularize
Portfolio Selection. http://xxx.lanl.gov/abs/1004.4169
we have characterized the typical behavior of the
optimal liquidation strategies, in the limit of large portfolio sizes,
by means of a replica calculation. By doing so, we showed how
regularization removes the instability of ES. We also show how
regularization naturally emerges when market impact of portfolio
liquidation is taken into account. The idea is that an investor should
care about the risk of the cashflow that could be generated by the
portfolio if it was liquidated. But the liquidation of large positions
will influence prices, and that has to be taken into account when
computing the risk of the cash that could be generated from the
portfolio.
We are currently investigating (i) the effect of different regularizors
in the context of portfolio selection, (ii) the effect of
regularization on market dynamics, in particular market stability, and
(iii) testing on data how much better RPO is compared to standard
methods.
Talk:
- 04/2011 - Seminar, Santa Fe Institute, Santa Fe, NM.
- 11/2010 - CUNY Applied Math
Seminar, City College New
York.
S. Still and J. P. Crutchfield. Structure
or Noise? (under review)
Synopsis:
These papers deal with automated model-making of dynamical systems from
available data. The main observation here is that an approach to lossy
data compression, called "Information Bottleneck method (IB)" (which is
a
form of rate-distortion theory), when applied to time series data
analysis in a certain way, contains in a limit an approach to modeling
of dynamical systems that is known as "computational mechanics", which
has been applied widely to a many problems in physics and chemistry.
The information theoretic framework we discuss
here allows us to generalize the computational mechanics approach to
include probabilistic models.
The original motivation for this work was that the last decade has
witnessed a new era in collecting truly vast data sets, the volume of
which far exceeds what any human can analyze directly by hand.
Examples include contemporary experiments in particle physics and
astronomy, but range to genomics, automated language translation, and
web social organization. In all these areas automated methods are
crucial. One challenge then is to develop methods that are not making
too many ad hoc assumptions (ideally none), in particular about the
(unknown) causal and statistical structure of the observed process. We
try to step back and ask if there are some simple principles of
information processing and causal modeling that can be used to derive a
data analysis method which makes minimal assumptions.
A commonly shared intuition for what a good model should do is
that it
should be able to predict, and it should do so without being overly
complicated. This
intuition can be put into information theoretic terms: a model's
predictive power is
reflected by the amount of information that the model contains about
future data, and the coding rate measures how concise a model is.
Rate-distortion theory/IB then allows one to specify the family of
(probabilistic) models which all have the maximal predictive power that
can be achieved at a fixed coding rate, or, vice versa, which are
maximally concise at fixed predictive power. It also results in an
iterative algorithm with which to compute these representations. They
contain as a parameter the trade-off between predictive power and model
complexity. By specifying this parameter, one can control the allowed
level of detail, or, vice versa, the desired predictive power of the
model. In the limit in which this parameter tends to zero, which
corresponds to relaxing the constraint on the coding rate, the
procedure arrives at a representation of the data that contains as much
predictive information as the data itself. In this limit, the resulting
model becomes deterministic.
Here we proof that the procedure finds, in
this limit, the "causal state partition" of computational mechanics.
This is a positive result, because the causal states are known to be
unique and
minimal sufficient statistics for the prediction of the time series.
This result shows that the model which is inferred by the information
theoretic inference method has structural meaning. The causal states
have been used in the literature, for example, to calculate fundamental
invariants of the original system, such as symmetries, entropy rate,
and stored information. We show with examples how rate-distortion
theory provides us with a
principled way of constructing more compact approximations to the
causal state partition.
Earlier
version
of "Structure or Noise?" at
http://lanl.arxiv.org/abs/0708.0654
Talks:
- 04/2010 - Physics Colloquium
(covering in part this work), University
of
British
Columbia, Canada.
- 03/2010 - Physics Colloquium University of Victoria, Canada.
- 07/2009 - Invited Conference
Talk Chaos/Xaoc,
Conference Center of the National Academy of Sciences, Woods Hole, MA.
- 08/2008 - Invited Lecture Series
(covering in part this work), Santa
Fe Institute Complex Systems Summer School, Institute of
Theoretical Physics, Chinese Academy of Sciences (CAS), Beijing, China.
- 10/2008 - Machine Learning
Seminar Max Planck Institute for Biological Cybernetics,
Tuebingen, Germany.
- 01/2007 - Invited Talk TU
Munich (covering in part this work), Institute of Computer Science,
Munich, Germany.
Synopsis:
The information bottleneck method (IB) is an instantiation of rate
distortion theory that captures only the relevant bits of a data set.
What is considered relevant is specified by means of a relevant
variable that co-occurs with the data. IB assumes that the underlying
joint probability distribution is known. However, in practice one has
to estimate this distribution, and doing so from finite data results in
errors which result in an overestimate of the available relevant
information. This may lead to over-fitting.
In this paper we use perturbation theory to derive a correction that
counteracts this effect and thereby also provides a method for finding
the maximal number of clusters in a data set of size N, which can be
resolved without over-fitting, i.e.
which are stable under
sample fluctuations.
Talks:
- 12/2009 - Invited Talk Universitaet Koeln, Physics
Department, Koeln, Germany
- 06/2006 - Invited Talk McGill University, Montreal,
Canada. Department of Computer Science.
- 10/2005 - Guest Lecture University of Hawai’i, Manoa,
Honolulu, HI, Medical Informatics, ICS.
- 09/2005 - Invited Talk University College Dublin, Dublin,
Ireland.
- 04/2005 - Guest Lecture University of Hawai’i at Hilo,
Hilo, HI, Department of Computer Science.
- 04/2005 - Guest Lecture University of Hawai’i at Manoa,
Honolulu, HI, Department of Electrical Engineering.
- 04/2003 - Applied Mathematics
Seminar Columbia University,
New
York,
NY.
- 03/2003 - Invited Talk University of British Columbia,
Department of Physics, Vancouver, Canada.
- 08/2003 - Theoretical Biology
Seminar Humboldt University,
Berlin,
Germany,
- 08/2003 - Machine Learning and
Cognitive Neuroscience Seminar Hamilton
Institute, National University of Ireland, Maynooth, Ireland.
- 08/2003 - Invited Talk University of Hawai’i, Honolulu,
HI. Department of Electrical Engineering.
- 10/2008 - Machine Learning
Seminar Max Planck Institute for Biological Cybernetics,
Tuebingen, Germany.
- 07/2003 - Invited Talk ETH Zuerich, Institute
of
Computer Sciences, Switzerland.
Disclaimer:
This material is presented to ensure timely dissemination of
scholarly and technical work. Copyright and all rights therein
are retained by authors or by other copyright holders.
All person copying this information are expected to adhere to
the terms and constraints invoked by each author's copyright.
In most cases, these works may not be reposted
without the explicit permission of the copyright holder.
Les documents contenus dans ces répertoires sont rendus
disponibles
par les auteurs qui y ont contribué en vue d'assurer la
diffusion
à temps de travaux savants et techniques sur une base
non-commerciale.
Les droits de copie et autres droits sont gardés par les auteurs
et par les détenteurs du copyright, en dépit du fait
qu'ils présentent
ici leurs travaux sous forme électronique. Les personnes copiant
ces
informations doivent adhérer aux termes et contraintes couverts
par
le copyright de chaque auteur. Ces travaux ne peuvent pas être
rendus disponibles ailleurs sans la permission explicite du
détenteur
du copyright.