Author

Karthyek Murthy

Bio: Karthyek Murthy is an academic researcher from Singapore University of Technology and Design. The author has contributed to research in topics: Robust optimization & Estimator. The author has an h-index of 10 and has co-authored 33 publications receiving 670 citations. Previous affiliations of Karthyek Murthy include Columbia University & Tata Institute of Fundamental Research.

Papers
Journal ArticleDOI
TL;DR: In this article, the authors show that several machine learning estimators, including square-root LASSO (Least Absolute Shrinkage and Selection Operator) and regularized logistic regression, can be represented as solutions to distributionally robust optimization problems.
Abstract: We show that several machine learning estimators, including square-root LASSO (Least Absolute Shrinkage and Selection Operator) and regularized logistic regression, can be represented as solutions to distributionally robust optimization (DRO) problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as a result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (Robust Wasserstein Profile Inference), a novel inference methodology which extends the use of methods inspired by Empirical Likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence, we are able to choose regularization parameters for these machine learning estimators without the use of cross validation. Numerical experiments are also given to validate our theoretical findings.
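
To make the representation concrete, here is a minimal sketch (not the authors' code) of the square-root LASSO estimator that the abstract identifies with a Wasserstein DRO problem; in that view the regularization weight acts as the square root of the uncertainty-ball radius, which RWPI would set from a quantile of a limiting profile distribution instead of cross validation. The synthetic data, the use of cvxpy, and the radius value are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

# Sketch, assuming synthetic data: square-root LASSO, which the abstract identifies
# with a Wasserstein DRO formulation of linear regression.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
beta_true = np.zeros(d)
beta_true[:3] = [1.0, -2.0, 0.5]
y = X @ beta_true + 0.1 * rng.normal(size=n)

# In the DRO view, lam plays the role of sqrt(delta), the square root of the
# uncertainty-ball radius; RWPI would pick delta from a limiting quantile rather
# than by cross validation. The value below is purely illustrative.
delta = 1.0 / n
lam = np.sqrt(delta)

beta = cp.Variable(d)
rmse = cp.norm(y - X @ beta, 2) / np.sqrt(n)   # square root of the empirical squared loss
problem = cp.Problem(cp.Minimize(rmse + lam * cp.norm(beta, 1)))
problem.solve()
print("estimated beta:", np.round(beta.value, 3))
```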

230 citations

Journal ArticleDOI
TL;DR: In this paper, the problem of quantifying the impact of model misspecification when computing general expected values of interest is addressed, and a methodology that is applicable in great generality is proposed.
Abstract: This paper deals with the problem of quantifying the impact of model misspecification when computing general expected values of interest. The methodology that we propose is applicable in great generality...

192 citations

Posted Content
TL;DR: In this paper, the problem of quantifying the impact of model misspecification when computing general expected values of interest is addressed, and bounds are computed for the expectation of interest regardless of the probability measure used, as long as the measure lies within a prescribed tolerance measured in terms of a flexible class of distances from a suitable baseline model.
Abstract: This paper deals with the problem of quantifying the impact of model misspecification when computing general expected values of interest. The methodology that we propose is applicable in great generality, in particular, we provide examples involving path dependent expectations of stochastic processes. Our approach consists in computing bounds for the expectation of interest regardless of the probability measure used, as long as the measure lies within a prescribed tolerance measured in terms of a flexible class of distances from a suitable baseline model. These distances, based on optimal transportation between probability measures, include Wasserstein's distances as particular cases. The proposed methodology is well-suited for risk analysis, as we demonstrate with a number of applications. We also discuss how to estimate the tolerance region non-parametrically using Skorokhod-type embeddings in some of these applications.
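
As a toy illustration of the bounding technique described above, the sketch below evaluates the worst-case expectation of a linear functional over a Wasserstein ball (quadratic transport cost) around an empirical baseline, using the standard dual form of the inner problem; the baseline sample, the functional f(x) = x, and the tolerance value are assumptions made for the example, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Worst-case bound sketch:  sup { E_P[f(X)] : P within Wasserstein distance delta
# of the empirical baseline, cost c(x, x') = (x - x')^2 }, computed via the dual
#   inf_{lam >= 0}  lam * delta + (1/n) * sum_i sup_x { f(x) - lam * (x - x_i)^2 }.
# Illustrative choices: f(x) = x and a Gaussian baseline sample.

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)   # baseline (empirical) model
delta = 0.25                     # prescribed tolerance (Wasserstein budget)

def inner_sup(lam, x0):
    # For f(x) = x and quadratic cost, sup_x { x - lam * (x - x0)^2 } = x0 + 1 / (4 * lam).
    return x0 + 1.0 / (4.0 * lam)

def dual_objective(lam):
    return lam * delta + np.mean(inner_sup(lam, sample))

res = minimize_scalar(dual_objective, bounds=(1e-6, 1e3), method="bounded")
print("worst-case bound (dual):        ", res.fun)
print("closed form, mean + sqrt(delta):", sample.mean() + np.sqrt(delta))
```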

127 citations

Journal ArticleDOI
TL;DR: RWPI (Robust Wasserstein Profile Inference) is introduced, a novel inference methodology which extends the use of methods inspired by Empirical Likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case).
Abstract: We show that several machine learning estimators, including square-root least absolute shrinkage and selection and regularized logistic regression, can be represented as solutions to distributionally robust optimization problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as a result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (robust Wasserstein profile inference), a novel inference methodology which extends the use of methods inspired by empirical likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence we are able to choose regularization parameters for these machine learning estimators without the use of cross validation. Numerical experiments are also given to validate our theoretical findings.

110 citations

Journal ArticleDOI
TL;DR: The approach consists in computing bounds for the expectation of interest regardless of the probability measure used, as long as the measure lies within a prescribed tolerance measured in terms of a flexible class of distances from a suitable baseline model.
Abstract: This paper deals with the problem of quantifying the impact of model misspecification when computing general expected values of interest. The methodology that we propose is applicable in great generality, in particular, we provide examples involving path-dependent expectations of stochastic processes. Our approach consists in computing bounds for the expectation of interest regardless of the probability measure used, as long as the measure lies within a prescribed tolerance measured in terms of a flexible class of distances from a suitable baseline model. These distances, based on optimal transportation between probability measures, include Wasserstein’s distances as particular cases. The proposed methodology is well-suited for risk analysis, as we demonstrate with a number of applications. We also discuss how to estimate the tolerance region non-parametrically using Skorokhod-type embeddings in some of these applications.

102 citations


Cited by
Book ChapterDOI
01 Jan 2011
TL;DR: Weak convergence methods in metric spaces were studied in this article, with applications sufficient to show their power and utility, and the results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables.
Abstract: The author's preface gives an outline: "This book is about weak convergence methods in metric spaces, with applications sufficient to show their power and utility. The Introduction motivates the definitions and indicates how the theory will yield solutions to problems arising outside it. Chapter 1 sets out the basic general theorems, which are then specialized in Chapter 2 to the space C[0, 1] of continuous functions on the unit interval and in Chapter 3 to the space D[0, 1] of functions with discontinuities of the first kind. The results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables." The book develops and expands on Donsker's 1951 and 1952 papers on the invariance principle and empirical distributions. The basic random variables remain real-valued although, of course, measures on C[0, 1] and D[0, 1] are vitally used. Within this framework, there are various possibilities for a different and apparently better treatment of the material. More of the general theory of weak convergence of probabilities on separable metric spaces would be useful. Metrizability of the convergence is not brought up until late in the Appendix. The close relation of the Prokhorov metric and a metric for convergence in probability is (hence) not mentioned (see V. Strassen, Ann. Math. Statist. 36 (1965), 423-439; the reviewer, ibid. 39 (1968), 1563-1572). This relation would illuminate and organize such results as Theorems 4.1, 4.2 and 4.4, which give isolated, ad hoc connections between weak convergence of measures and nearness in probability. In the middle of p. 16, it should be noted that C*(S) consists of signed measures which need only be finitely additive if S is not compact. On p. 239, where the author twice speaks of separable subsets having nonmeasurable cardinal, he means "discrete" rather than "separable." Theorem 1.4 is Ulam's theorem that a Borel probability on a complete separable metric space is tight. Theorem 1 of Appendix 3 weakens completeness to topological completeness. After mentioning that probabilities on the rationals are tight, the author says it is an …

3,554 citations

Posted Content
TL;DR: The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization; a stochastic optimization algorithm with convergence guarantees is also introduced to efficiently train group DRO models.
Abstract: Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, the poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization---a stronger-than-typical L2 penalty or early stopping---we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
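
A minimal sketch, on assumed toy data, of the kind of online group reweighting the abstract alludes to: the worst-performing groups gain weight through an exponentiated-gradient update while the model takes descent steps on the reweighted loss, with an explicit L2 penalty of the sort the paper argues is needed. The linear model, step sizes, and penalty strength are illustrative, not the authors' released implementation.

```python
import numpy as np

# Group-DRO sketch (toy linear model, squared loss): groups share the signal but
# differ in noise level, so average-loss training neglects the noisiest group.
rng = np.random.default_rng(1)
n_groups, d = 3, 5
beta_star = rng.normal(size=d)
X = [rng.normal(size=(200, d)) for _ in range(n_groups)]
y = [x @ beta_star + rng.normal(scale=0.1 + g, size=200) for g, x in enumerate(X)]

w = np.zeros(d)                     # model parameters
q = np.ones(n_groups) / n_groups    # adversarial group weights
eta_w, eta_q, l2 = 0.05, 0.1, 0.01  # step sizes and the L2 penalty (illustrative)

for step in range(500):
    losses, grads = [], []
    for g in range(n_groups):
        r = X[g] @ w - y[g]
        losses.append(np.mean(r ** 2))
        grads.append(2 * X[g].T @ r / len(r))
    # exponentiated-gradient ascent on group weights: the worst group gains mass
    q = q * np.exp(eta_q * np.array(losses))
    q = q / q.sum()
    # descent step on the q-weighted loss, plus the L2 penalty
    grad = sum(q[g] * grads[g] for g in range(n_groups)) + 2 * l2 * w
    w = w - eta_w * grad

print("final per-group losses:", np.round(losses, 3))
```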

579 citations

Posted Content
TL;DR: In this paper, a training procedure that augments model parameter updates with worst-case perturbations of training data is proposed; for smooth losses it provably achieves moderate levels of robustness with little computational or statistical cost relative to empirical risk minimization.
Abstract: Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms. We address this problem through the principled lens of distributionally robust optimization, which guarantees performance under adversarial input perturbations. By considering a Lagrangian penalty formulation of perturbing the underlying data distribution in a Wasserstein ball, we provide a training procedure that augments model parameter updates with worst-case perturbations of training data. For smooth losses, our procedure provably achieves moderate levels of robustness with little computational or statistical cost relative to empirical risk minimization. Furthermore, our statistical guarantees allow us to efficiently certify robustness for the population loss. For imperceptible perturbations, our method matches or outperforms heuristic approaches.
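
The sketch below imitates the training loop described in the abstract: each example is perturbed by a few gradient-ascent steps on the Lagrangian-penalized objective (the loss minus gamma times the squared transport cost), and the model is then updated at the perturbed point. The logistic-regression model, the number of inner steps, and the values of gamma and the step sizes are illustrative assumptions, not the authors' settings.

```python
import numpy as np

# Lagrangian-penalty adversarial training sketch on synthetic data.
rng = np.random.default_rng(2)
n, d = 500, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def grad_loss_x(w, x, y_i):
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y_i) * w            # gradient of the logistic loss w.r.t. the input x

def grad_loss_w(w, x, y_i):
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y_i) * x            # gradient of the logistic loss w.r.t. parameters w

w = np.zeros(d)
gamma, eta_z, eta_w = 2.0, 0.1, 0.05
for epoch in range(20):
    for i in rng.permutation(n):
        z = X[i].copy()
        for _ in range(5):          # inner ascent: worst-case perturbation of the input
            z = z + eta_z * (grad_loss_x(w, z, y[i]) - 2 * gamma * (z - X[i]))
        w = w - eta_w * grad_loss_w(w, z, y[i])   # outer descent at the perturbed point

p = 1.0 / (1.0 + np.exp(-X @ w))
print("train accuracy:", np.mean((p > 0.5) == y))
```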

506 citations

Posted Content
TL;DR: The paper argues that the set of distributions hedged against should be chosen to be appropriate for the application at hand, and that some of the choices that have been popular until recently are, for many applications, not good choices.
Abstract: Distributionally robust stochastic optimization (DRSO) is an approach to optimization under uncertainty in which, instead of assuming that there is an underlying probability distribution that is known exactly, one hedges against a chosen set of distributions. In this paper, we consider sets of distributions that are within a chosen Wasserstein distance from a nominal distribution. We argue that such a choice of sets has two advantages: (1) The resulting distributions hedged against are more reasonable than those resulting from other popular choices of sets, such as the Φ-divergence ambiguity set. (2) The problem of determining the worst-case expectation has desirable tractability properties. We derive a dual reformulation of the corresponding DRSO problem and construct approximate worst-case distributions (or an exact worst-case distribution if it exists) explicitly via the first-order optimality conditions of the dual problem. Our contributions are five-fold. (i) We identify necessary and sufficient conditions for the existence of a worst-case distribution, which is naturally related to the growth rate of the objective function. (ii) We show that the worst-case distributions resulting from an appropriate Wasserstein distance have a concise structure and a clear interpretation. (iii) Using this structure, we show that data-driven DRSO problems can be approximated to any accuracy by robust optimization problems, and thereby many DRSO problems become tractable by using tools from robust optimization. (iv) To the best of our knowledge, our proof of strong duality is the first constructive proof for DRSO problems, and we show that the constructive proof technique is also useful in other contexts. (v) Our strong duality result holds in a very general setting, and we show that it can be applied to infinite dimensional process control problems and worst-case value-at-risk analysis.
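
For reference, the strong-duality identity the abstract refers to is commonly written as follows (notation as used broadly in this literature rather than taken verbatim from the paper: Ψ is the objective, ν the nominal distribution, c the transport cost, and δ the radius of the Wasserstein ambiguity set):

```latex
\sup_{\mu \,:\, W_c(\mu, \nu) \le \delta} \int \Psi \, \mathrm{d}\mu
\;=\;
\inf_{\lambda \ge 0} \left\{ \lambda \delta
  + \int \sup_{x} \big[ \Psi(x) - \lambda\, c(x, \xi) \big] \, \nu(\mathrm{d}\xi) \right\}
```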

505 citations

Posted Content
TL;DR: Main concepts and contributions to DRO are surveyed, along with its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization.
Abstract: The concepts of risk-aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has also witnessed rapid theoretical and applied growth by relying on these concepts. A modeling framework, called distributionally robust optimization (DRO), has recently received significant attention in both the operations research and statistical learning communities. This paper surveys main concepts and contributions to DRO, and its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization.

348 citations