Anders Høst-Madsen

Information Theory for Machine Learning and Graphs

Coding, in the information theoretic sense, is the transformation of data into sequences of numbers (usually 0 and 1) so that the data can be quickly transmitted from a source to a receiver. The challenge of coding is how to find the most efficient transformation such that 1) the data can be reconstructed perfectly by the receiver, 2) the data is represented by the shortest possible sequence of numbers; the length of this sequence is called the description length. Minimizing description length is both of interest for storing and transmitting data and because the description length reveals important information about structure of data. For example, description length can be used to bound the error of a learning model in machine learning, thereby allowing the algorithm to select the best model for data classification or regression tasks. The project will expand the applications of description length to new types of problems and data. Furthermore, the project also seeks to expand the type of data traditionally considered in coding by going from sequential type to data on graphs, e.g., social networks like Facebook. The project will make it easier to share and store scientific data by developing efficient coding methods based on graph coding and learned coding. It will broaden the applicability of machine learning by developing efficient methods for model selection. The project will also develop new methods for anomaly detection in graphs that have applicability in medicine, such as the detection on onset seizures, power grids, and computer network security, such as fraudulent transactions.

Model selection is a central problem in many areas of science. Minimum description length (MDL) was developed by Rissanen to provide a formal criterion for model selection. It is based on coding data losslessly together with the model describing data, and choosing the model that results in the shortest total code length. The goal of the project is to extend minimum description length in new directions by going back to basics, namely that description length is based on coding. The focus is on developing new lossless coding methods. This includes the coding of graph structures, graphs with attributes, as well as combining machine learning and source coding. The research consists of three thrusts. In the first thrust, theory and practical algorithms for learned coding, which is coding based on training data, will be developed. Learned coding willbe used for model selection and hyperparameter optimization in machine learning methods. In the second thrust, coding methods for graphs will be developedand be used to select graph models of data. In the third thrust, algorithms for anomaly detection and novelty detection in graph-based data will be developed by combining learned coding and graph coding.

Delay and Energy in Wireless Communications

Energy consumption has become of increasing importance by the transition in communications to mobile devices with limited battery capacity. At the same time, delay needs to be carefully controlled due to the need to support the proliferation of time-sensitive services, such as video. Seen from a user perspective, these are the main things that affect the mobile experience: energy consumption (how long does the battery last), delay (how long to wait for content). From a societal perspective, wireless communications contribute significantly to global energy consumption and CO2 emissions. An estimate is that currently 0.5-1% of global CO2 emissions is due directly to wireless communications, comparable to that of air traffic (2%). At the same time, the number of devices that share the wireless medium is increasing dramatically. Yet, the desire for low delay and low energy consumption in congested systems conflict. In order to design future communication systems, it is therefore necessary to understand the fundamental tradeoff between energy and delay in congested systems. The insights from this fundamental analysis will be used to jointly design new communications software and energy efficient communication hardware, which will be tested in a wireless testbed. This hardware can be used as basis of future wireless communication devices.

The project aims to understand the factors that influence energy and delay such as the time characteristics of data to be transmitted and the characteristics of the channel. To obtain a fundamental understanding of these factors, it is necessary to analyze the information theory underlying delay and energy, taking into account realistic features of the systems, including networking and hardware. The project builds on the recent line of work in information theory on finite block length communications. Basically, if the block length is infinite, as in traditional information theory, the delay is also infinite. Therefore, in order to reach a more fundamental understanding of delay, finite block length analysis is needed. However, block length is not a direct indicator of delay. Therefore, finite block length theory has to be developed specifically for investigating delay and energy. Preliminary results show that delay affects energy consumption much more strongly than one would expect from bandwidth constraints alone. The project attacksthe tradeoff between delay and energy in four steps. First it develops the basic information theory relating delay and energy. It then applies this to multiuser systems, where spectral efficiency is essential. In the next step, it extends the models with realistic hardware constraints. In the final step, the theory is tested on a hardware testbed with state-of-the-art coding.

Learning Lossless and Lossy Coding

An estimated 330,000 billion bytes of data is generated daily in various forms: video, images, and music, but also scientific, economic and industrial content. This enormous amount of data has already transformed modern life in ways that are transparent (social media) and in ways that are not immediately visible (furthering scientific, business and economic goals through better modeling, forecast and use of data). Data is also used communicated on massive scales in many formats: videos, images and music, and in real time applications such as gaming, streaming content, video calls and telemedicine. Data generated is often compressed by algorithms that examines the data to understand the structure underlying data, remove redundant descriptions, and thus use fewer bits to represent the same. Therefore compression is used both to produce a more compact representation to aid in transmission over the Internet, but also to promote better understanding of content, and are used to build better scientific models. A recent focus from both researchers and big tech companies, including Amazon, Google and Microsoft, is to leverage AI and machine learning to aid in uncovering these patterns in data. This research will quantify how AI and machine learning can assist in compression, thus developing a better modeling of data and shorter descriptions of the same to use in communication.

We adopt a probabilistic framework that has been transformational in machine learning, called the "Probably Approximately Correct" framework, and use it to explore how algorithms can balance the amount of data they examine with the quality of the representative models they generate for it. We enhance this understanding with an active learning framework, where the algorithms can adapt how much data they need to examine, using more data for more subtle models and less data for simpler models, and figuring how when the underlying model may be simple with what is technically known as a "stopping rule". Compression methods include "lossless approaches" which can reproduce the original data exactly from the compressed representation (necessary for computer files and programs), and "lossy approaches" where the compressed representation irreversibly loses information, for example in images and video, but in a way that does not sacrifice human perception---we build our approaches for both. It has always been appreciated from the time of Francis Bacon and Occam that prediction is different from modeling. By using insights involving machine learning and compression together, this research will shed light into their interplay under the ubiquitous modeling scenario where one section of data influences another. In effect, this research anticipates not just better algorithms for compression, thereby also for estimation and prediction, but also fundamentally alter our understanding of how to better predict and model data.