Machine Learning for Algorithms (ML4A)

Virtual Workshop. July 13-14, 2021


Registration is now closed.


While algorithms in computer science and optimization have classically supported machine learning, in recent years there has been increasing interest in the reverse direction: using machine learning to improve the performance of classical algorithms in computer science, by fine-tuning their behavior to adapt to the properties of the input distribution. Many applications involve processing streams of data (video, data logs, customer activity etc) by executing the same algorithm on an hourly, daily or weekly basis. These data sets are typically not "random" or "worst-case"; instead, they come from some distribution which does not change rapidly from execution to execution. This makes it possible to design better algorithms tailored to the specific data distribution, trained on past instances of the problem. This "data-driven" or "learning-based" approach to algorithm design has the potential to significantly improve the efficiency of some of the most widely used algorithms. For example, they have been used to design better data structures, online algorithms, streaming and sketching algorithms, market mechanisms and algorithms for combinatorial optimization, similarity search and inverse problems. This workshop will feature talks from experts at the forefront of this exciting area.


The workshop will take place virtually on July 13-14, 2021. Links for virtual participation will be shared to registrants in advance of the workshop.

Confirmed speakers

  • Alex Dimakis (UT Austin)
  • Yonina Eldar (Weizmann)
  • Anna Goldie (Google Brain, Stanford)
  • Reinhard Heckel (Technical University of Munich)
  • Stefanie Jegelka (MIT)
  • Tim Kraska (MIT)
  • Benjamin Moseley (CMU)
  • David Parkes (Harvard)
  • Ola Svensson (EPFL)
  • Tuomas Sandholm (CMU, Optimized Markets, Strategy Robot, Strategic Machine)
  • Sergei Vassilvitski (Google)
  • Ellen Vitercik (CMU/UC Berkeley)
  • David Woodruff (CMU)


  • Costis Daskalakis (MIT)
  • Paul Hand (Northeastern)
  • Piotr Indyk (MIT)
  • Michael Mitzenmacher (Harvard)
  • Ronitt Rubinfeld (MIT)
  • Jelani Nelson (UC Berkeley)


Day 1: Tuesday, July 13, 2021

Time (EDT) Speaker Title
11:15 - 11:30 Opening remarks
11:30 - 12:10 Alex Dimakis Deep Generative models and Inverse Problems[Abstract]
Modern deep generative models like GANs, VAEs and invertible flows are demonstrating excellent performance in representing high-dimensional distributions, especially for images. We will show how they can be used to solve inverse problems like denoising, filling missing data, and recovery from linear projections. We generalize compressed sensing theory beyond sparsity, extending Restricted Isometries to sets created by deep generative models. We will present the general framework, recent results and open problems in this space.
12:10 - 12:50 Yonina Eldar Model Based Deep Learning: Applications to Imaging and Communications[Abstract]

Deep neural networks provide unprecedented performance gains in many real-world problems in signal and image processing. Despite these gains, the future development and practical deployment of deep networks are hindered by their black-box nature, i.e., a lack of interpretability and the need for very large training sets.

On the other hand, signal processing and communications have traditionally relied on classical statistical modeling techniques that utilize mathematical formulations representing the underlying physics, prior information and additional domain knowledge. Simple classical models are useful but sensitive to inaccuracies and may lead to poor performance when real systems display complex or dynamic behavior. Here we introduce various approaches to model based learning which merge parametric models with optimization tools leading to efficient, interpretable networks from reasonably sized training sets. We will consider examples of such model-based deep networks to image deblurring, image separation, super resolution in ultrasound and microscopy, efficient communications systems, and finally we will see how model-based methods can also be used for efficient diagnosis of COVID19 using X-ray and ultrasound.

12:50 - 1:30 Reinhard Heckel Measuring Robustness in Deep Learning Based Compressive Sensing[Abstract]

Traditional algorithms for reconstructing images from few and noisy measurements are handcrafted. Today, algorithms in form of deep networks learned on training data outperform traditional, handcrafted algorithms in computational cost and image quality.

However, recent works have raised concerns that deep-learning-based image reconstruction methods are sensitive to perturbations and are less robust than traditional, handcrafted, methods: Neural networks may be sensitive to small, yet adversarially-selected perturbations, may perform poorly under distribution shifts, and may fail to recover small but important features in an image. To understand the sensitivity to such perturbations, we measured the robustness of a variety of deep network based and traditional methods.

Perhaps surprisingly, in the context of accelerated magnetic resonance imaging, we find no evidence that deep learning based algorithms are less robust than classical, un-trained methods. Even for natural distribution shifts, we find that classical algorithms with a single hyper-parameter tuned on a training set compromise as much in performance than a neural network with 50 million parameters. Our results indicate that the state-of-the-art deep-learning-based image reconstruction methods provide improved performance than traditional methods without compromising robustness.

1:30 - 2:00 BREAK
2:00 - 2:40 Tuomas Sandholm Configuring algorithms automatically: From practice to theory[Abstract]

In the first part of the talk I describe how in my first startup we developed automated algorithm configuration for winner determination for $60 billion of combinatorial auctions 2001-2010. In the second part I will present research on configuration with generalization guarantees. I present a new general theory for setting parameters for algorithms or other artifacts. If a parameter vector yields strong average performance on training instances, it will ideally perform well on future instances. However, if the training set is too small, average performance will not generalize to previously unseen instances. How large is large enough? I give the first generalization guarantees for tree search algorithms. I then answer this for any artifact that satisfies a new, more relaxed ubiquitous property: its performance is a piecewise-structured function of its parameters. I also present a strong impossibility for the status quo approach, namely discretizing the parameter space. I will also present results on how one can 1) approximate the performance function with significantly fewer pieces and thus get away with dramatically fewer training instances, 2) learn a finite set of promising parameter vectors from an infinite set, which enables one to train frugally in run time while still offering generalization guarantees, and 3) learning both a portfolio of candidate algorithm configurations and a selector for choosing, on a per-instance basis, a configuration from the portfolio while achieving generalization guarantees. I will discuss applications to optimizing algorithm speed (branching and cutting planes in integer programming), to optimizing solution quality (in linkage-based clustering), and optimizing proxy objectives for settings where costly physical experiments yield the ground truth for training instances (in various sequencing problems in computational biology). I will also discuss applications to profit maximizing auctions, pricing mechanisms, and lotteries, and to welfare maximization in voting and redistribution.

The second part of the talk covers joint work with Ellen Vitercik, Nina Balcan, Travis Dick, Carl Kingsford and Dan DeBlasio.

2:40 - 3:20 Ellen Vitercik How much data is sufficient to learn high-performing algorithms?[Abstract]
Algorithms often have tunable parameters that have a considerable impact on their runtime and solution quality. A growing body of research has demonstrated that data-driven algorithm design can lead to significant gains in runtime and solution quality. Data-driven algorithm design uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide a broadly applicable theory for deriving generalization guarantees for data-driven algorithm design, which bound the difference between the algorithm's expected performance and its average performance over the training set. The challenge is that for many combinatorial algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause a cascade of changes in the algorithm’s behavior. Prior research has proved generalization bounds by employing case-by-case analyses of parameterized greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We uncover a unifying structure which we use to prove extremely general guarantees, yet we recover the bounds from prior research. Our guarantees apply whenever an algorithm's performance is a piecewise-constant, -linear, or—more generally—piecewise-structured function of its parameters. As we demonstrate, our theory also implies novel bounds for dynamic programming algorithms used in computational biology. This talk is based on joint work with Nina Balcan, Dan DeBlasio, Travis Dick, Carl Kingsford, and Tuomas Sandholm.
3:20 - 3:50 BREAK
3:50 - 4:30 Stefanie Jegelka Learning in Graph Neural Networks for Algorithms[Abstract]

Graph Neural Networks (GNNs) have become a popular tool for learning certain algorithmic tasks, in particular related to combinatorial optimization. In this talk, we will focus on the “algorithmic reasoning” task of learning a full algorithm.

While GNNs have shown promising empirical results, their generalization properties are less well understood. In particular, the task and data properties, architecture and learning algorithm all affect the results. For instance, empirically, we observe an interplay between the structure of the task — or target algorithm — and the inductive biases of the architecture: although many networks may be able to represent a task, some architectures learn it better than others. I will show an approach to formalize this relationship, and empirical and theoretical implications for generalization within and out of the training distribution.

This talk is based on joint work with Keyulu Xu, Jingling Li, Mozhi Zhang, Simon S. Du and Ken-ichi Kawarabayashi.

4:30 - 5:10 Anna Goldie Model Parallelism and Chip Floorplanning with Deep Reinforcement Learning[Abstract]
Rapid progress in machine learning (ML) has been fueled by advances in computer systems and hardware, but with the end of Moore's Law and Dennard Scaling, it is time for ML to return the favor and transform the way in which we design systems and hardware. In this talk, I will describe learning-based approaches to combinatorial optimization problems in computer systems. First, I will describe our approach to device placement (aka model parallelism), the task of partitioning a machine learning model across multiple, heterogeneous hardware devices for the purpose of minimizing runtime at training or inference time. Through repeated interactions with models running on real hardware, our RL agent implicitly learns to tradeoff load balancing and communication between the available hardware devices, achieving reductions in runtime over the best performing baselines. Next, I will discuss our work on a domain-transferable reinforcement learning method for chip floorplanning, a long pole in the overall chip design process. Our objective is to minimize PPA (power, performance, and area), and we show that, in under 6 hours, our learning-based method can generate placements that are superhuman or comparable on modern accelerator chips, whereas the strongest baselines require human experts in the loop and can take several weeks or months.

Day 2: Wednesday, July 14, 2021

Time (EDT) Speaker Title
11:30 - 12:10 Tim Kraska Learned Index Structure: A progress report[Abstract]
Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this talk, I will first provide an overview of the recent advances in Learned Index Structures and afterwards outline present our exchaustive benchmark study that compares well-tuned implementations of three learned index structures against several state-of-the-art “traditional” baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array.
12:10 - 12:50 David Woodruff Learning-Augmented Data Stream Algorithms[Abstract]
The data stream model is a fundamental model for processing massive data sets with limited memory and fast processing time. Recently Hsu et al. (2019) incorporated machine learning techniques into the data stream model in order to learn relevant patterns in the input data. Such techniques were encapsulated by training an oracle to predict item frequencies in the streaming model. We explore the full power of such an oracle, showing that it can be applied to a wide array of problems in data streams, sometimes resulting in the first optimal bounds for such problems. Namely, we apply the oracle to counting distinct elements on the difference of streams, estimating frequency moments, estimating cascaded aggregates, and estimating moments of geometric data streams. For the distinct elements problem, we obtain the first memory-optimal algorithms. For estimating the p-th frequency moment for 0 < p < 2 we obtain the first algorithms with optimal update time. For estimating the p-th frequency moment for p > 2 we obtain a quadratic saving in memory. We empirically validate our results, demonstrating also our improvements in practice. Based on joint work with Tanqiu Jiang, Yi Li, Honghao Li, and Yisong Ruan
12:50 - 1:20 BREAK
1:20 - 2:00 Sergei Vassilvitskii Faster Matchings via Learned Duals[Abstract]
To study how maching learning can be used to speed up algorithms we combine the idea of machine-learned predictions with the idea of "warm-starting" primal-dual algorithms. We consider one of the most important primitives in combinatorial optimization: weighted bipartite matching and its generalization to b-matching. We identify three key challenges when using learned dual variables in a primal-dual algorithm. First, predicted duals may be infeasible, so we give an algorithm that efficiently maps predicted infeasible duals to nearby feasible solutions. Second, once the duals are feasible, they may not be optimal, so we show that they can be used to quickly find an optimal solution. Finally, such predictions are useful only if they can be learned, so we show that the problem of learning duals for matching has low sample complexity. We validate our theoretical findings through experiments on both real and synthetic data. As a result we give a rigorous, practical, and empirically effective method to compute bipartite matchings.
2:00 - 2:40 Ola Svensson Learning-Augmented Online Algorithms and the Primal-Dual Method[Abstract]

The design of learning-augmented online algorithms is a new and active research area. The goal is to understand how to best incorporate predictions of the future provided e.g. by machine learning algorithms that rarely come with guarantees on their accuracy.

In the absence of guarantees, the difficulty in the design of such learning-augmented algorithms is to find a good balance: on the one hand, following blindly the prediction might lead to a very bad solution if the prediction is misleading. On the other hand, if the algorithm does not trust the prediction at all, it will simply never benefit from an excellent prediction. An explosion of recent results solve this issue by designing smart algorithms that exploit the problem structure to achieve a good trade-off between these two cases.

In this talk, we will discuss this emerging line of work. In particular, we will show how to unify and generalize some of these results by extending the powerful primal-dual method for online algorithms to the learning augmented setting.

This is joint work with Etienne Bamas and Andreas Maggiori.

2:40 - 3:10 BREAK
3:10 - 3:50 Ben Moseley Learnable and Instance-Robust Predictions for Matchings, Flows and Load Balancing[Abstract]
This talk will discuss a model for augmenting algorithms with useful predictions that go beyond worst-case bounds on the algorithm performance. The model ensures predictions are formally learnable and instance robust. Learnability guarantees that predictions can be efficiently constructed from past data. Instance robustness formally ensures a prediction is robust to modest changes in the problem input. Further, the robustness model ensures two different predictions can be objectively compared. This talk will discuss predictions which satisfy these properties for a network flow allocation problem and the restricted assignment makespan minimization problem. The talk will further discuss an empirical investigation of these algorithms for online allocations which demonstrates the connection between theory and practice.
3:50 - 4:30 David Parkes Differentiable economics[Abstract]
In this talk, I will introduce and provide illustrations of the use of differentiable representations of economic rules, together with suitable objectives, architectures, and constraints, with which to use the methods of deep learning to discover incentive-aligned mechanisms that approximately optimize various kinds of desiderata. Time permitting, I will illustrate this approach to the design of auctions and two-sided matching mechanisms.