FODSI Seminar

Spring 2021

fodsi seminar

Suvrit Sra (MIT)

SGD without replacement: optimal rate analysis and more

Stochastic gradient descent (SGD) is omnipresent in machine learning. Two fundamental versions of SGD exist: (i) one that picks stochastic gradients with replacement, and (ii) one that picks gradients without replacement. Ironically, version (ii) is what is used in practice, while version (i) is what most theoretical works analyze. This mismatch is well-known. It arises because without replacement sampling leads to non-independent stochastic gradients, which makes analysis hard.
In this talk I will present recent progress on analyzing without replacement SGD, focusing in particular on two key variants: RandomShuffle and SingleShuffle. I will summarize the best known convergence rates, as well as important refinements that result under additional assumptions on the loss functions. The results presented remove drawbacks common to most previous works on this topic.
Based on work with: Chulhee Yun and Kwangjun Ahn
fodsi seminar

Jiantao Jiao (UC Berkeley)

Sharp Minimax Rates for Imitation Learning

We establish sharp minimax bounds on Imitation Learning (IL) in episodic Markov Decision Processes (MDPs), where the learner is provided a dataset of demonstrations from an expert. It is known that Behavior Cloning (BC) achieves suboptimality growing quadratically in horizon, which is termed as error compounding in the literature. We show that when the MDP transition function is unknown, all algorithms have to suffer a suboptimality that grows quadratically with the horizon, even if the algorithm can interactively query the expert such as in the setting of DAGGER. We then propose the setting of known transitions and show that one can provably break the quadratic dependence and improve the exponent to 3/2, which is shown to be tight. Our upper bound is established using a computationally efficient algorithm which we name as Mimic-MD, and the lower bound is established by proving a two-way reduction between IL and the value estimation problem of the unknown expert policy under any given reward function, as well as linear functional estimation with subsampled observations. We further show that under the additional assumption that the expert is optimal for the true reward function, there exists an efficient algorithm, which we term as Mimic-Mixture, that provably achieves suboptimality independent of the horizon for arbitrary 3-state MDPs with rewards only at the terminal layer. In contrast, no algorithm can achieve suboptimality growing slower than the square root of the horizon with high probability if the expert is not constrained to be optimal. We formally establish the benefit of expert optimal assumption in the known transition setting and show that this additional assumption does not help when the transition functions are unknown. Based on joint work with Nived Rajaraman, Yanjun Han, Lin F. Yang, and Kannan Ramchandran.
fodsi seminar

Costis Daskalakis (MIT)

Equilibrium Computation and the Foundations of Deep Learning

Deep Learning has recently yielded important advances in single-agent learning challenges, much of that progress being fueled by the empirical success of gradient descent and its variants in computing local optima of non-convex optimization problems. In multi-agent learning applications, the role of single-objective optimization is played by equilibrium computation, yet our understanding of its complexity in settings that are relevant for Deep Learning remains sparse. In this talk we focus on min-max optimization of nonconvex-nonconcave objectives, which has found applications in GANs, and other adversarial learning problems. Here, not only are there no known gradient-descent based methods converging to even local and approximate min-max equilibria, but the computational complexity of identifying them remains poorly understood. We show that finding approximate local min-max equilibria of Lipschitz and smooth objectives requires a number of queries to the function and its gradient that is exponential in the relevant parameters, in sharp contrast to the polynomial number of queries required to find approximate local minima of non-convex objectives. Our oracle lower bound is a byproduct of a complexity-theoretic result showing that finding approximate local min-max equilibria is computationally equivalent to finding Brouwer fixed points, and Nash equilibria in non zero-sum games, and thus PPAD-complete. Minimal complexity theory knowledge will be assumed in the talk. Joint work with Stratis Skoulakis and Manolis Zampetakis.
fodsi seminar

Peng Ding (UC Berkeley)

Multiply robust estimation of causal effects under principal ignorability

Causal inference concerns not only the average effect of the treatment on the outcome but also the underlying mechanism through an intermediate variable of interest. Principal stratification characterizes such mechanism by targeting subgroup causal effects within principal strata, which are defined by the joint potential values of an intermediate variable. Due to the fundamental problem of causal inference, principal strata are inherently latent, rendering it challenging to identify and estimate subgroup effects within them. A line of research leverages the principal ignorability assumption that the latent principal strata are mean independent of the potential outcomes conditioning on the observed covariates. Under principal ignorability, we derive various nonparametric identification formulas for causal effects within principal strata in observational studies, which motivate estimators relying on the correct specifications of different parts of the observed-data distribution. Appropriately combining these estimators further yields new triply robust estimators for the causal effects within principal strata. These new estimators are consistent if two of the treatment, intermediate variable, and outcome models are correctly specified, and they are locally efficient if all three models are correctly specified. We show that these estimators arise naturally from either the efficient influence functions in the semiparametric theory or the model-assisted estimators in the survey sampling theory. We evaluate different estimators based on their finite-sample performance through simulation, apply them to two observational studies, and implement them in an open-source software package. Joint work with Zhichao Jiang and Shu Yang.
fodsi seminar

Michael I. Jordan (UC Berkeley)

Towards a Blend of Machine Learning and Economics

Much of the recent focus in machine learning has been on the pattern-recognition side of the field. The decision-making side of the field is in need of a similar level of attention, given that it is in the decisions supported by, or made by, learning systems that many consequential, real-world problems arise. Many of the fundamental challenges in decision-making involve learning systems that must cope with scarcity, competition, and allocations among many decision makers. Progress on such problems requires new blends of machine learning and microeconomics. I'll exemplify this blending with a discussion of exploration-exploitation tradeoffs for bandits that compete over a scarce resource, focusing on decentralized versions of this problem.