Computational Complexity of Statistical Problems Workshop

MIT. Cambridge, MA. June 14-16, 2023 (tutorial days June 12-13)

TUTORIAL LECTURES AND WORKSHOP

Register here (registration is free but mandatory).

Synopsis

The objective of the workshop is to help advance the methodology for reasoning about the computational complexity of statistical estimation. Over the last decade several communities and lines of work have begun to make progress on the core questions of the field. This workshop aims to highlight recent progress and form new collaborations among researchers from complexity theory, algorithms, statistics, learning theory, probability, information theory, and statistical physics.
  

Program

The workshop will take place at MIT on June 14-16, 2023, in Building 6, Room 6-120. The two tutorial days, June 12-13, will be held in the same location (6-120).

POSTER SUBMISSIONS: There will be a poster session on the first day of the workshop, June 14; it will be held in LIDS (6th Floor of 32D Stata Dreyfoos Tower). Poster submissions are welcome from all. To present a poster, please submit a brief (1-3 paragraph) abstract to FODSI.CCSI@gmail.com by May 19; responses will be sent by May 23.
  

Organizers

  • Guy Bresler (MIT)
  • Sam Hopkins (MIT)
  • Prasad Raghavendra (UC Berkeley)
  • Nike Sun (MIT)

Tutorial Lectures

Day 1: Monday, June 12, 2023

Time (EDT) Speaker Title
9:30 - 10:00 Coffee and snacks
10:00 - 11:00 Vinod Vaikuntanathan (MIT) Lattices, Learning with Errors and Statistical Problems
11:10 - 12:10 Vinod Vaikuntanathan (MIT) Lattices, Learning with Errors and Statistical Problems
12:15 - 2:00 Lunch break
2:00 - 3:00 Virginia Vassilevska Williams (MIT) A fine-grained approach to algorithms and complexity
3:00 - 3:30 Coffee and Tea
3:30 - 4:30 Virginia Vassilevska Williams (MIT) A fine-grained approach to algorithms and complexity
4:30 - 5:00 Refreshments
  

Day 2: Tuesday, June 13, 2023

Time (EDT) Speaker Title
9:30 - 10:00 Coffee and snacks
10:00 - 11:00 Kuikui Liu (MIT) Spectral independence and applications to Markov chain mixing[Abstract]
I will discuss recent progress on the theory of Markov chain mixing times via the spectral independence framework. Through examples, I will describe intimate connections with statistical physics, the geometry of polynomials, and more. Finally, as an application, I will describe recent advances on our understanding of the uniform distribution over proper colorings of graphs.
11:10 - 12:10 Kuikui Liu (MIT) Spectral independence and applications to Markov chain mixing
12:15 - 2:00 Lunch break
2:00 - 3:00 Will Perkins (Georgia Tech) The cluster expansion and its statistical and algorithmic applications[Abstract]
I will introduce the cluster expansion, a classical tool from statistical physics used to prove absence of phase transition and describe the phase diagram of spin models on lattices. I'll then show how the cluster expansion can be applied in algorithmic and statistical settings to design efficient sampling algorithms and compute statistical indistinguishability thresholds.
3:00 - 3:30 Coffee and Tea
3:30 - 4:30 Will Perkins (Georgia Tech) The cluster expansion and its statistical and algorithmic applications
4:30 - 5:00 Refreshments
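
A pointer for Kuikui Liu's tutorial above: one standard way to state spectral independence (sketched here in LaTeX for a two-spin system; conventions in the lectures may differ) is via the pairwise influence matrix

\[
\Psi_\mu(i,j) \;=\; \Pr_\mu[\sigma_j = +1 \mid \sigma_i = +1] \;-\; \Pr_\mu[\sigma_j = +1 \mid \sigma_i = -1] \quad (i \neq j), \qquad \Psi_\mu(i,i) = 0,
\]

and the distribution \(\mu\) is called \(\eta\)-spectrally independent if \(\lambda_{\max}(\Psi_\nu) \le \eta\) for \(\mu\) and all of its conditional (pinned) distributions \(\nu\). Bounds of this form are what drive the Markov chain mixing results discussed in the tutorial.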
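
Likewise, for Will Perkins's tutorial: in the abstract polymer-model formulation commonly used in this area, the partition function and its cluster expansion take the form (LaTeX sketch; conventions may differ from the lectures)

\[
Z \;=\; \sum_{\substack{\Gamma \subseteq \mathcal{P} \\ \text{pairwise compatible}}} \prod_{\gamma \in \Gamma} w(\gamma),
\qquad
\log Z \;=\; \sum_{\text{clusters } \Gamma} \phi(\Gamma) \prod_{\gamma \in \Gamma} w(\gamma),
\]

where the second sum runs over clusters (multisets of polymers) and \(\phi\) is the Ursell function. Convergence criteria such as the Kotecký–Preiss condition make this series absolutely convergent, which underlies both the phase-diagram applications and the algorithmic and statistical applications in the talk.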
  

Workshop Program

Day 1: Wednesday, June 14, 2023

Time (EDT) Speaker Title
9:30 - 10:00 Coffee and snacks
10:00 - 10:45 David Steurer (ETH) Algorithms approaching the threshold for semi-random planted clique[Abstract]
We design new polynomial-time algorithms for recovering planted cliques in the semi-random graph model introduced by Feige and Kilian (2001). The previous best algorithms for this model succeed if the planted clique has size at least n^(2/3) in a graph with n vertices (Mehta, McKenzie, Trevisan 2019 and Charikar, Steinhardt, Valiant 2017). Our algorithms work for planted-clique sizes approaching n^(1/2) -- the information-theoretic threshold in the semi-random model (Steinhardt 2017) and a conjectured computational threshold even in the easier fully-random model. This result comes close to resolving open questions by Feige and Steinhardt. Our algorithms are based on higher constant-degree sum-of-squares relaxations and rely on a new conceptual connection that translates certificates of upper bounds on biclique numbers in unbalanced bipartite Erdős–Rényi random graphs into algorithms for semi-random planted clique. The use of higher constant-degree sum-of-squares is essential in our setting: we prove a lower bound on the basic SDP for certifying bicliques, showing that the basic SDP cannot succeed for planted cliques of size k = o(n^(2/3)). We also provide some evidence that the information-computation trade-off of our current algorithms may be inherent, by proving an average-case lower bound for unbalanced bicliques in the low-degree-polynomials model. The talk is based on joint work with Rares-Darius Buhai and Pravesh K. Kothari.
10:45 - 11:15 Michael Celentano (UC Berkeley) Sudakov-Fernique post-AMP, and a new proof of the local convexity of the TAP free energy[Abstract]
In many problems in modern statistics and machine learning, it is often of interest to establish that a first order method on a non-convex risk function eventually enters a region of local strong convexity. In this talk, I will present an asymptotic comparison inequality, which we call the Sudakov-Fernique post-AMP inequality, which, in a certain class of problems involving a GOE matrix, is able to probe properties of an optimization landscape locally around the iterates of an approximate message passing (AMP) algorithm. As an example of its use, I will describe a new proof of a result in Celentano, Fan, Mei (2021), which establishes that the so-called TAP free energy in the Z2-synchronization problem is locally convex in the region to which AMP converges. As a consequence, the proof also confirms a conjecture of El Alaoui, Montanari, Sellke (2022) that their algorithm efficiently samples from the Sherrington-Kirkpatrick Gibbs measure throughout the "easy" regime.
11:30 - 12:15 Tselil Schramm (Stanford) Spectral clustering in high-dimensional Gaussian mixture block models [Abstract]
The Gaussian mixture block model is a simple generative model for networks: to generate a sample, we associate each node with a latent feature vector sampled from a mixture of Gaussians, and we add an edge between nodes if and only if their feature vectors are sufficiently similar. The different components of the Gaussian mixture represent the fact that there may be several types of nodes with different distributions over features -- for example, in a social network each component represents the different attributes of a distinct community. In this talk I will discuss recent results on the performance of spectral clustering algorithms on networks sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors grows as the size of the network goes to infinity. Our results merely begin to sketch out the information-computation landscape for clustering in these models, and I will make an effort to emphasize open questions. Based on joint work with Shuangping Li.
12:15 - 2:00 Lunch Break
2:00 - 2:45 Surbhi Goel (UPenn) Beyond Worst-case Sequential Prediction: Adversarial Robustness via Abstention[Abstract]
In this talk, we will focus on sequential prediction beyond the worst case. In particular, we will consider the stochastic setting with an adversary that is allowed to inject clean-label adversarial (or out-of-distribution) examples. Traditional algorithms designed to handle purely stochastic data tend to fail in the presence of such adversarial examples, often leading to erroneous predictions, whereas assuming fully adversarial data leads to very pessimistic bounds that are often vacuous in practice. To mitigate this, we will introduce a new model of sequential prediction that sits between the purely stochastic and fully adversarial settings by allowing the learner to abstain from making a prediction at no cost on adversarial examples. Assuming access to the marginal distribution on the non-adversarial examples, we will present a learner whose error scales with the VC dimension of the hypothesis class (mirroring the stochastic setting), as opposed to the Littlestone dimension, which characterizes the fully adversarial setting. Furthermore, we will design learners for certain special hypothesis classes, including VC dimension 1 classes, which work even in the absence of access to the marginal distribution. We will conclude with several open questions and plausible connections of our framework with existing models. This is based on joint work with Steve Hanneke, Shay Moran, and Abhishek Shetty.
2:45 - 3:15 Shyam Narayanan (MIT) Robustness Implies Privacy in Statistical Estimation[Abstract]
We study the relationship between adversarial robustness and differential privacy in high-dimensional algorithmic statistics. We give the first black-box reduction from privacy to robustness which can produce private estimators with optimal tradeoffs among sample complexity, accuracy, and privacy for a wide range of fundamental high-dimensional parameter estimation problems, including mean and covariance estimation. We show that this reduction can be implemented in polynomial time in some important special cases. In particular, using nearly optimal polynomial-time robust estimators for the mean and covariance of high-dimensional Gaussians which are based on the Sum-of-Squares method, we design the first polynomial-time private estimators for these problems with nearly optimal samples-accuracy-privacy tradeoffs. Our algorithms are also robust to a constant fraction of adversarially-corrupted samples. This talk is based on joint work with Sam Hopkins, Gautam Kamath, and Mahbod Majid.
3:15 - 3:45 Coffee and Tea
3:45 - 4:30 Alex Wein (UC Davis) New Techniques for Low-Degree Lower Bounds [Abstract]
One form of rigorous evidence for computational hardness of statistical problems is low-degree lower bounds, i.e. provable failure of any algorithm that can be expressed as a low-degree multivariate polynomial in the observations. Inspired by sum-of-squares, low-degree lower bounds were first introduced in the context of distinguishing a "planted" distribution from an iid "null" distribution. I will give an overview of some new techniques that allow us to prove low-degree lower bounds in more complicated settings, including "planted-versus-planted" testing, estimation problems, and tensor decomposition.
4:30 - 5:00 Break
5:00 - 6:30 Poster Session Poster Session and Reception at LIDS (6th Floor of 32D Stata Dreyfoos Tower)
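
To make the generative model in Tselil Schramm's talk concrete, here is a minimal Python sketch of one natural instantiation of a Gaussian mixture block model; the inner-product similarity rule and all parameter choices below are illustrative assumptions, not necessarily those used in the talk.

import numpy as np

def sample_gmbm(n, d, means, tau, seed=None):
    """Sample an n-node graph: node i gets a latent vector z_i drawn from a
    mixture of standard Gaussians centered at the given means, and an edge
    (i, j) is added whenever <z_i, z_j> >= tau."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(len(means), size=n)      # mixture component of each node
    z = np.stack(means)[labels] + rng.standard_normal((n, d))
    gram = z @ z.T                                 # pairwise inner products
    adj = gram >= tau
    np.fill_diagonal(adj, False)                   # no self-loops
    return adj.astype(int), labels

# Example: two latent communities in dimension d = 50.
d = 50
means = [np.ones(d) / np.sqrt(d), -np.ones(d) / np.sqrt(d)]
A, communities = sample_gmbm(n=200, d=d, means=means, tau=1.0, seed=0)

Spectral clustering would then be run on A alone, with the goal of recovering the hidden labels; the talk studies the regime where d grows with n.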
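
For reference alongside Alex Wein's talk: in the original planted-versus-null setting, the power of degree-D polynomial tests is typically measured by the low-degree advantage (a standard formulation, in LaTeX; the newer settings in the talk go beyond it),

\[
\mathrm{Adv}_{\le D}(\mathbb{P}, \mathbb{Q}) \;=\; \sup_{\deg f \le D} \frac{\mathbb{E}_{Y \sim \mathbb{P}}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim \mathbb{Q}}[f(Y)^2]}} \;=\; \big\| L^{\le D} \big\|_{\mathbb{Q}},
\]

where \(L = d\mathbb{P}/d\mathbb{Q}\) is the likelihood ratio and \(L^{\le D}\) is its projection onto polynomials of degree at most D in \(L^2(\mathbb{Q})\). Boundedness of this quantity as the problem size grows is interpreted as evidence that all low-degree (and, conjecturally, all efficient) tests fail.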
  

Day 2: Thursday, June 15, 2023

Time (EDT) Speaker Title
9:30 - 10:00 Coffee and snacks
10:00 - 10:45 Suriya Gunasekar (Microsoft) TBD
10:45 - 11:15 Sitan Chen (Harvard) Recent Progress on Provably Learning Neural Networks[Abstract]
We present recent upper and lower bounds for the well-studied problem of learning neural networks from Gaussian examples. In the first half, we give the first polynomial-time algorithm for learning arbitrary one-hidden-layer networks of constant width. The algorithm, which can be interpreted as solving a certain tensor decomposition problem in the absence of condition number bounds, is an iterative procedure for recovering “clumps” of neurons at multiple scales. In the second half, we give computational hardness results for learning two-hidden-layer networks. Contrary to conventional wisdom, we are able to prove statistical query lower bounds for a real-valued, noiseless learning problem. The talk is based on joint works with Zehao Dou, Surbhi Goel, Aravind Gollakota, Adam Klivans, and Raghu Meka.
11:30 - 12:15 Ronitt Rubinfeld (MIT) Properly learning monotone functions via local correction [Abstract]
We give a 2^(Õ(√n/ε))-time algorithm for properly learning monotone Boolean functions under the uniform distribution over {0,1}^n. Our algorithm is robust to adversarial label noise and has a running time nearly matching that of the state-of-the-art improper learning algorithm of Bshouty and Tamon (JACM '96) and an information-theoretic lower bound of Blais et al. (RANDOM '15). Prior to this work, no proper learning algorithm with running time smaller than 2^(Ω(n)) was known to exist. The core of our proper learner is a local computation algorithm for sorting binary labels on a poset. Our algorithm is built on a body of work on distributed graph algorithms; specifically, we rely on a recent work of Ghaffari (FOCS '22), which gives an efficient algorithm for computing maximal matchings in a graph in the LCA model of Rubinfeld et al. and Alon et al. (ICS '11, SODA '12). The applications of our local sorting algorithm extend beyond learning on the Boolean cube: we also give a tolerant tester for Boolean functions over general posets that distinguishes functions that are ε/3-close to monotone from those that are ε-far. Previous tolerant testers for the Boolean cube only distinguished between functions that are ε/Ω(√n)-close to monotone and those that are ε-far. Joint work with Jane Lange and Arsen Vasilyan.
12:15 - 2:00 Lunch Break
2:00 - 2:45 Li-Yang Tan (Stanford) Properly learning decision trees with queries is NP-hard[Abstract]
We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavín-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While there has been a long line of work, dating back to (Pitt–Valiant 1988), establishing the hardness of properly learning decision trees from random examples, the more challenging setting of query learners necessitates different techniques and there were no previous lower bounds. En route to our main result, we simplify and strengthen the best known lower bounds for a different problem of decision tree minimization (Zantema–Bodlaender 2000; Sieling 2003). On a technical level, we introduce the notion of hardness distillation, which we study for decision tree complexity but can be considered for any complexity measure: for a function that requires large decision trees, we give a general method for identifying a small set of inputs that is responsible for its complexity. Our technique even rules out query learners that are allowed constant error. This contrasts with existing lower bounds for the setting of random examples which only hold for inverse-polynomial error. Joint work with Caleb Koch and Carmen Strassle.
2:45 - 3:15 Sophie Yu (Duke) Random graph matching at Otter's tree-counting threshold via counting chandeliers[Abstract]
Given a pair of graphs, the problem of graph matching or network alignment refers to finding the underlying vertex correspondence that maximally aligns the edges. This is a ubiquitous problem arising in a variety of applications across diverse fields such as network privacy, computational biology, computer vision, and natural language processing. Graph matching is an instance of the notoriously difficult quadratic assignment problem (QAP), which is NP-hard to solve or approximate. Despite the worst-case computational hardness of QAP, I will present a computationally efficient graph matching algorithm based on counting a special family of trees. When the two networks are Erdős–Rényi random graphs with correlated edges through the hidden vertex correspondence, we show that our algorithm correctly matches all but a vanishing fraction of vertices with high probability as soon as the edge correlation exceeds the square root of Otter's constant. Moreover, we further upgrade the almost exact recovery to exact recovery whenever it is information-theoretically possible. This is the first polynomial-time algorithm that achieves exact and almost exact matching with an explicit constant correlation for both dense and sparse graphs. https://arxiv.org/pdf/2209.12313.pdf
3:15 - 3:45 Coffee and Tea
3:45 - 4:30 David Gamarnik (MIT) Combinatorial NLTS From the Overlap Gap Property [Abstract]
In an important recent development, Anshu, Breuckmann, and Nirkhe [ABN22] resolved positively the so-called No Low-Energy Trivial State (NLTS) conjecture of Freedman and Hastings [FH14]. The conjecture postulated the existence of linear-size local Hamiltonians on n-qubit systems for which no near-ground state can be prepared by a shallow (sublogarithmic-depth) circuit and which are thus "hard" (complex) to describe. Their construction, as well as all prior similar constructions, is based on quantum codes. We provide the first "non-code" construction of a class of Hamiltonians satisfying a combinatorial version of NLTS. Our proof uses the complex solution-space geometry of the random K-SAT model. Specifically, it is known that above a certain clause-to-variable density the set of satisfying assignments of random K-SAT exhibits the overlap gap property, which implies that it can be partitioned into exponentially many clusters, each constituting at most an exponentially small fraction of the total set of satisfying solutions. We show that for our constructed Hamiltonians every combinatorial near-ground state induces a near-uniform distribution supported on this set. Standard arguments are then used to show that such distributions cannot be prepared by quantum circuits with depth o(log n). Joint work with Eric R. Anschuetz (MIT) and Bobak Kiani (MIT).
4:30 - 5:00 Refreshments
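
A minimal Python sketch of the correlated Erdős–Rényi model behind Sophie Yu's talk, under one common parametrization (each edge of a parent G(n, p) graph is kept independently with probability s in each of the two copies, and one copy is relabeled by a hidden permutation); the exact parametrization in the paper may differ.

import numpy as np

def correlated_erdos_renyi(n, p, s, seed=None):
    """Return adjacency matrices (A, B) of two edge-correlated G(n, p*s) graphs
    together with the hidden vertex permutation pi relating them."""
    rng = np.random.default_rng(seed)
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)   # potential edges
    parent = upper & (rng.random((n, n)) < p)           # parent graph G(n, p)
    A0 = parent & (rng.random((n, n)) < s)              # subsample copy 1
    B0 = parent & (rng.random((n, n)) < s)              # subsample copy 2
    A = A0 | A0.T                                       # symmetrize
    B0 = B0 | B0.T
    pi = rng.permutation(n)                             # hidden correspondence
    B = B0[np.ix_(pi, pi)]                              # relabel the second copy
    return A.astype(int), B.astype(int), pi

A, B, pi = correlated_erdos_renyi(n=500, p=0.01, s=0.8, seed=0)

Graph matching asks to recover pi from (A, B) alone. In this parametrization the edge correlation is roughly s when p is small, and the result in the talk concerns the regime where the correlation exceeds the square root of Otter's constant (about 0.581).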
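
For readers less familiar with the overlap gap property in David Gamarnik's talk, a simplified version of the statement (LaTeX sketch; the version needed for NLTS involves additional structure) is that above a suitable clause density there exist constants 0 ≤ ν1 < ν2 ≤ 1 such that, with high probability, no two satisfying assignments of the random formula Φ have intermediate overlap:

\[
\sigma, \tau \in \mathrm{SAT}(\Phi) \;\Longrightarrow\; \frac{1}{n}\,\big|\{ i : \sigma_i = \tau_i \}\big| \notin (\nu_1, \nu_2).
\]

This forbidden band of overlaps is what forces the solution set to split into the well-separated clusters exploited in the construction.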
  

Day 3: Friday, June 16, 2023

Time (EDT) Speaker Title
9:30 - 10:00 Coffee and snacks
10:00 - 10:45 Joan Bruna (NYU) Recent Progress on learning sparse high-dimensional functions[Abstract]
High-dimensional learning with provable guarantees is a delicate game where structural data assumptions interact with algorithms. Over the recent years, the class of sparse (or 'multi-index') functions has emerged as a model with both practical motivations and rich structure, enabling a mathematical theory of feature learning. In this talk, I will review recent progress on this front, by describing (i) the ability of gradient-descent algorithms to efficiently learn the multi-index class over Gaussian data, (ii) the robustness of such GD algorithms to non-Gaussian data, and (iii) SQ lower bounds for the single-index class. Joint work with Loucas Pillaud-Vivien, Aaron Zweig and Alex Damian.
10:45 - 11:15 Frederic Koehler (Stanford) When MCMC meets Variational Inference[Abstract]
Gibbs sampling fails to mix when the underlying distribution is multimodal. Nevertheless, such sampling problems may be computationally tractable. Rigorous study of the naive mean field approximation gives rise to a large and natural class of distributions which are "low complexity". Inspired by these ideas, we design a new algorithm combining MCMC and VI ideas which samples from a large class of (low threshold-rank) Ising models. Based on joint work with Holden Lee and Andrej Risteski.
11:30 - 12:15 Dana Yang (Cornell) TBD
12:15 - 2:00 Lunch Break
2:00 - 2:45 Aravindan Vijayaraghavan (Northwestern) Finding planted elements in linear sections of varieties[Abstract]
We consider a class of algorithmic problems about finding elements in the intersection of a given algebraic variety with a given linear subspace. This has connections to important questions in quantum information theory and tensor decompositions, even in the special case when the variety is the set of rank-1 matrices. I will introduce a natural planted setting of this problem that captures "generic" instances of the problem. I will then describe new polynomial time algorithms along with their implications in certain parameter regimes, and also highlight some open questions. This is based on joint work with Nathaniel Johnston and Benjamin Lovitz.
2:45 - 3:15 Brice Huang (MIT) Algorithmic Threshold for Multi-Species Spherical Spin Glasses[Abstract]
This talk focuses on optimizing the random and non-convex Hamiltonians of spherical spin glasses with multiple species. Our main result is a variational formula for the best possible value ALG achievable by the class of Lipschitz algorithms, together with a matching algorithm in this class based on approximate message passing. Our hardness result is proved using the Branching OGP introduced in our previous work [H-Sellke 21] to identify ALG for single-species spin glasses. This and all other OGPs for spin glasses had previously been proved using Guerra's interpolation method. We introduce a new method to prove the Branching OGP which is both simpler and more robust: it works even for models in which the true maximum value of the objective function remains unknown. Based on joint work with Mark Sellke.
3:15 - 3:45 Coffee and Tea
3:45 - 4:30 Greg Valiant (Stanford) Optimization with Limited Memory?[Abstract]
In many high-dimensional optimization settings, there are significant gaps between the amount of data or number of iterations required by algorithms whose memory usage scales linearly with the dimension versus more complex and memory-intensive algorithms. Do some problems inherently require super-linear (or even quadratic) memory to be solved efficiently? In this talk, we will survey the lay of the land of fundamental tradeoffs between the amount of available memory and the amount of data or queries to the function being optimized. This will include a discussion of our work showing that for optimizing a convex function, given the ability to query the function value/gradient at points, a significantly super-linear amount of memory is required to achieve the optimal rate of convergence obtained by algorithms using more than quadratic memory. Throughout, we will emphasize the many open problems in this area.
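
The sparse (multi-index) function class in Joan Bruna's talk is commonly defined as follows (LaTeX sketch; conventions vary between papers):

\[
f(x) \;=\; g\big(\langle u_1, x\rangle, \ldots, \langle u_k, x\rangle\big) \;=\; g(Ux), \qquad x \in \mathbb{R}^d, \quad U \in \mathbb{R}^{k \times d}, \quad k \ll d,
\]

i.e., the target depends only on a low-dimensional projection of the input (the single-index case is k = 1). The learning question is how efficiently gradient-based or SQ algorithms can recover the hidden subspace spanned by u_1, ..., u_k from, e.g., Gaussian data.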
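
For context on Frederic Koehler's talk: for an Ising model, the naive mean-field approximation restricts attention to product distributions and optimizes the corresponding variational free energy. A standard formulation (LaTeX sketch; the "low complexity" class and low threshold-rank condition in the talk refine this picture) is

\[
p(x) \;\propto\; \exp\!\Big(\tfrac{1}{2} x^\top J x + h^\top x\Big), \; x \in \{\pm 1\}^n,
\qquad
F_{\mathrm{MF}}(m) \;=\; \tfrac{1}{2} m^\top J m + h^\top m + \sum_{i=1}^n H\!\Big(\tfrac{1+m_i}{2}\Big),
\]

with J having zero diagonal and H the binary entropy function. Maximizing F_MF over mean vectors m in [-1, 1]^n gives a lower bound on log Z, and quantifying the quality of that bound is what the rigorous mean-field theory addresses.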
     

See also the 2021 FODSI workshop on ML 4 Algorithms here and the 2022 FODSI workshop on sublinear algorithms here.