Author

Karthyek Murthy

Bio: Karthyek Murthy is an academic researcher from Singapore University of Technology and Design. The author has contributed to research in topics: Robust optimization & Estimator. The author has an h-index of 10 and has co-authored 33 publications receiving 670 citations. Previous affiliations of Karthyek Murthy include Columbia University & Tata Institute of Fundamental Research.

Papers
Journal ArticleDOI
TL;DR: In this article, the authors show that several machine learning estimators, including square-root LASSO (Least Absolute Shrinkage and Selection Operator) and regularized logistic regression, can be represented as solutions to distributionally robust optimization problems.
Abstract: We show that several machine learning estimators, including square-root LASSO (Least Absolute Shrinkage and Selection Operator) and regularized logistic regression, can be represented as solutions to distributionally robust optimization (DRO) problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as a result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (Robust Wasserstein Profile Inference), a novel inference methodology which extends the use of methods inspired by Empirical Likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence, we are able to choose regularization parameters for these machine learning estimators without the use of cross validation. Numerical experiments are also given to validate our theoretical findings.
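
To make the representation concrete, here is a minimal sketch (not the authors' code) of the square-root LASSO estimator that the abstract identifies with a Wasserstein DRO problem; in that view the regularization weight acts as the square root of the uncertainty-ball radius, which RWPI would set from a quantile of a limiting profile distribution instead of cross validation. The synthetic data, the use of cvxpy, and the radius value are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

# Sketch, assuming synthetic data: square-root LASSO, which the abstract identifies
# with a Wasserstein DRO formulation of linear regression.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
beta_true = np.zeros(d)
beta_true[:3] = [1.0, -2.0, 0.5]
y = X @ beta_true + 0.1 * rng.normal(size=n)

# In the DRO view, lam plays the role of sqrt(delta), the square root of the
# uncertainty-ball radius; RWPI would pick delta from a limiting quantile rather
# than by cross validation. The value below is purely illustrative.
delta = 1.0 / n
lam = np.sqrt(delta)

beta = cp.Variable(d)
rmse = cp.norm(y - X @ beta, 2) / np.sqrt(n)   # square root of the empirical squared loss
problem = cp.Problem(cp.Minimize(rmse + lam * cp.norm(beta, 1)))
problem.solve()
print("estimated beta:", np.round(beta.value, 3))
```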

230 citations

Journal ArticleDOI
TL;DR: In this paper, the problem of quantifying the impact of model misspecification when computing general expected values of interest is addressed, and a methodology that is applicable in great generality is proposed.
Abstract: This paper deals with the problem of quantifying the impact of model misspecification when computing general expected values of interest. The methodology that we propose is applicable in great generality...

192 citations

Posted Content
TL;DR: In this paper, the problem of quantifying the impact of model misspecification when computing general expected values of interest is addressed, and bounds are computed for the expectation of interest regardless of the probability measure used, as long as the measure lies within a prescribed tolerance measured in terms of a flexible class of distances from a suitable baseline model.
Abstract: This paper deals with the problem of quantifying the impact of model misspecification when computing general expected values of interest. The methodology that we propose is applicable in great generality, in particular, we provide examples involving path dependent expectations of stochastic processes. Our approach consists in computing bounds for the expectation of interest regardless of the probability measure used, as long as the measure lies within a prescribed tolerance measured in terms of a flexible class of distances from a suitable baseline model. These distances, based on optimal transportation between probability measures, include Wasserstein's distances as particular cases. The proposed methodology is well-suited for risk analysis, as we demonstrate with a number of applications. We also discuss how to estimate the tolerance region non-parametrically using Skorokhod-type embeddings in some of these applications.
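
As a toy illustration of the bounding technique described above, the sketch below evaluates the worst-case expectation of a linear functional over a Wasserstein ball (quadratic transport cost) around an empirical baseline, using the standard dual form of the inner problem; the baseline sample, the functional f(x) = x, and the tolerance value are assumptions made for the example, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Worst-case bound sketch:  sup { E_P[f(X)] : P within Wasserstein distance delta
# of the empirical baseline, cost c(x, x') = (x - x')^2 }, computed via the dual
#   inf_{lam >= 0}  lam * delta + (1/n) * sum_i sup_x { f(x) - lam * (x - x_i)^2 }.
# Illustrative choices: f(x) = x and a Gaussian baseline sample.

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)   # baseline (empirical) model
delta = 0.25                     # prescribed tolerance (Wasserstein budget)

def inner_sup(lam, x0):
    # For f(x) = x and quadratic cost, sup_x { x - lam * (x - x0)^2 } = x0 + 1 / (4 * lam).
    return x0 + 1.0 / (4.0 * lam)

def dual_objective(lam):
    return lam * delta + np.mean(inner_sup(lam, sample))

res = minimize_scalar(dual_objective, bounds=(1e-6, 1e3), method="bounded")
print("worst-case bound (dual):        ", res.fun)
print("closed form, mean + sqrt(delta):", sample.mean() + np.sqrt(delta))
```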

127 citations

Journal ArticleDOI
TL;DR: RWPI (Robust Wasserstein Profile Inference) is introduced, a novel inference methodology which extends the use of methods inspired by Empirical Likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case).
Abstract: We show that several machine learning estimators, including square-root least absolute shrinkage and selection and regularized logistic regression, can be represented as solutions to distributionally robust optimization problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as a result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (robust Wasserstein profile inference), a novel inference methodology which extends the use of methods inspired by empirical likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence we are able to choose regularization parameters for these machine learning estimators without the use of cross validation. Numerical experiments are also given to validate our theoretical findings.

110 citations

Journal ArticleDOI
TL;DR: The approach consists in computing bounds for the expectation of interest regardless of the probability measure used, as long as the measure lies within a prescribed tolerance measured in terms of a flexible class of distances from a suitable baseline model.
Abstract: This paper deals with the problem of quantifying the impact of model misspecification when computing general expected values of interest. The methodology that we propose is applicable in great generality, in particular, we provide examples involving path-dependent expectations of stochastic processes. Our approach consists in computing bounds for the expectation of interest regardless of the probability measure used, as long as the measure lies within a prescribed tolerance measured in terms of a flexible class of distances from a suitable baseline model. These distances, based on optimal transportation between probability measures, include Wasserstein’s distances as particular cases. The proposed methodology is well-suited for risk analysis, as we demonstrate with a number of applications. We also discuss how to estimate the tolerance region non-parametrically using Skorokhod-type embeddings in some of these applications.

102 citations


Cited by
Book ChapterDOI
01 Jan 2011
TL;DR: Weak convergence methods in metric spaces were studied in this article, with applications sufficient to show their power and utility, and the results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables.
Abstract: The author's preface gives an outline: "This book is about weak convergence methods in metric spaces, with applications sufficient to show their power and utility. The Introduction motivates the definitions and indicates how the theory will yield solutions to problems arising outside it. Chapter 1 sets out the basic general theorems, which are then specialized in Chapter 2 to the space C[0, 1] of continuous functions on the unit interval and in Chapter 3 to the space D[0, 1] of functions with discontinuities of the first kind. The results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables." The book develops and expands on Donsker's 1951 and 1952 papers on the invariance principle and empirical distributions. The basic random variables remain real-valued although, of course, measures on C[0, 1] and D[0, 1] are vitally used. Within this framework, there are various possibilities for a different and apparently better treatment of the material. More of the general theory of weak convergence of probabilities on separable metric spaces would be useful. Metrizability of the convergence is not brought up until late in the Appendix. The close relation of the Prokhorov metric and a metric for convergence in probability is (hence) not mentioned (see V. Strassen, Ann. Math. Statist. 36 (1965), 423-439; the reviewer, ibid. 39 (1968), 1563-1572). This relation would illuminate and organize such results as Theorems 4.1, 4.2 and 4.4, which give isolated, ad hoc connections between weak convergence of measures and nearness in probability. In the middle of p. 16, it should be noted that C*(S) consists of signed measures which need only be finitely additive if S is not compact. On p. 239, where the author twice speaks of separable subsets having nonmeasurable cardinal, he means "discrete" rather than "separable." Theorem 1.4 is Ulam's theorem that a Borel probability on a complete separable metric space is tight. Theorem 1 of Appendix 3 weakens completeness to topological completeness. After mentioning that probabilities on the rationals are tight, the author says it is an …

3,554 citations

Posted Content
TL;DR: The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization; a stochastic optimization algorithm with convergence guarantees is also introduced to efficiently train group DRO models.
Abstract: Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, the poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization---a stronger-than-typical L2 penalty or early stopping---we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
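
A minimal sketch, on assumed toy data, of the kind of online group reweighting the abstract alludes to: the worst-performing groups gain weight through an exponentiated-gradient update while the model takes descent steps on the reweighted loss, with an explicit L2 penalty of the sort the paper argues is needed. The linear model, step sizes, and penalty strength are illustrative, not the authors' released implementation.

```python
import numpy as np

# Group-DRO sketch (toy linear model, squared loss): groups share the signal but
# differ in noise level, so average-loss training neglects the noisiest group.
rng = np.random.default_rng(1)
n_groups, d = 3, 5
beta_star = rng.normal(size=d)
X = [rng.normal(size=(200, d)) for _ in range(n_groups)]
y = [x @ beta_star + rng.normal(scale=0.1 + g, size=200) for g, x in enumerate(X)]

w = np.zeros(d)                     # model parameters
q = np.ones(n_groups) / n_groups    # adversarial group weights
eta_w, eta_q, l2 = 0.05, 0.1, 0.01  # step sizes and the L2 penalty (illustrative)

for step in range(500):
    losses, grads = [], []
    for g in range(n_groups):
        r = X[g] @ w - y[g]
        losses.append(np.mean(r ** 2))
        grads.append(2 * X[g].T @ r / len(r))
    # exponentiated-gradient ascent on group weights: the worst group gains mass
    q = q * np.exp(eta_q * np.array(losses))
    q = q / q.sum()
    # descent step on the q-weighted loss, plus the L2 penalty
    grad = sum(q[g] * grads[g] for g in range(n_groups)) + 2 * l2 * w
    w = w - eta_w * grad

print("final per-group losses:", np.round(losses, 3))
```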

579 citations

Posted Content
TL;DR: In this paper, a training procedure that augments model parameter updates with worst-case perturbations of training data is proposed; for smooth losses it provably achieves moderate levels of robustness with little computational or statistical cost relative to empirical risk minimization.
Abstract: Neural networks are vulnerable to adversarial examples and researchers have proposed many heuristic attack and defense mechanisms. We address this problem through the principled lens of distributionally robust optimization, which guarantees performance under adversarial input perturbations. By considering a Lagrangian penalty formulation of perturbing the underlying data distribution in a Wasserstein ball, we provide a training procedure that augments model parameter updates with worst-case perturbations of training data. For smooth losses, our procedure provably achieves moderate levels of robustness with little computational or statistical cost relative to empirical risk minimization. Furthermore, our statistical guarantees allow us to efficiently certify robustness for the population loss. For imperceptible perturbations, our method matches or outperforms heuristic approaches.
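
The sketch below imitates the training loop described in the abstract: each example is perturbed by a few gradient-ascent steps on the Lagrangian-penalized objective (the loss minus gamma times the squared transport cost), and the model is then updated at the perturbed point. The logistic-regression model, the number of inner steps, and the values of gamma and the step sizes are illustrative assumptions, not the authors' settings.

```python
import numpy as np

# Lagrangian-penalty adversarial training sketch on synthetic data.
rng = np.random.default_rng(2)
n, d = 500, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

def grad_loss_x(w, x, y_i):
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y_i) * w            # gradient of the logistic loss w.r.t. the input x

def grad_loss_w(w, x, y_i):
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y_i) * x            # gradient of the logistic loss w.r.t. parameters w

w = np.zeros(d)
gamma, eta_z, eta_w = 2.0, 0.1, 0.05
for epoch in range(20):
    for i in rng.permutation(n):
        z = X[i].copy()
        for _ in range(5):          # inner ascent: worst-case perturbation of the input
            z = z + eta_z * (grad_loss_x(w, z, y[i]) - 2 * gamma * (z - X[i]))
        w = w - eta_w * grad_loss_w(w, z, y[i])   # outer descent at the perturbed point

p = 1.0 / (1.0 + np.exp(-X @ w))
print("train accuracy:", np.mean((p > 0.5) == y))
```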

506 citations

Posted Content
TL;DR: The paper argues that the set of distributions hedged against should be chosen to be appropriate for the application at hand, and that some of the choices that have been popular until recently are, for many applications, not good choices.
Abstract: Distributionally robust stochastic optimization (DRSO) is an approach to optimization under uncertainty in which, instead of assuming that there is an underlying probability distribution that is known exactly, one hedges against a chosen set of distributions. In this paper, we consider sets of distributions that are within a chosen Wasserstein distance from a nominal distribution. We argue that such a choice of sets has two advantages: (1) The resulting distributions hedged against are more reasonable than those resulting from other popular choices of sets, such as the Φ-divergence ambiguity set. (2) The problem of determining the worst-case expectation has desirable tractability properties. We derive a dual reformulation of the corresponding DRSO problem and construct approximate worst-case distributions (or an exact worst-case distribution if it exists) explicitly via the first-order optimality conditions of the dual problem. Our contributions are five-fold. (i) We identify necessary and sufficient conditions for the existence of a worst-case distribution, which is naturally related to the growth rate of the objective function. (ii) We show that the worst-case distributions resulting from an appropriate Wasserstein distance have a concise structure and a clear interpretation. (iii) Using this structure, we show that data-driven DRSO problems can be approximated to any accuracy by robust optimization problems, and thereby many DRSO problems become tractable by using tools from robust optimization. (iv) To the best of our knowledge, our proof of strong duality is the first constructive proof for DRSO problems, and we show that the constructive proof technique is also useful in other contexts. (v) Our strong duality result holds in a very general setting, and we show that it can be applied to infinite dimensional process control problems and worst-case value-at-risk analysis.
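
For reference, the strong-duality identity the abstract refers to is commonly written as follows (notation as used broadly in this literature rather than taken verbatim from the paper: Ψ is the objective, ν the nominal distribution, c the transport cost, and δ the radius of the Wasserstein ambiguity set):

```latex
\sup_{\mu \,:\, W_c(\mu, \nu) \le \delta} \int \Psi \, \mathrm{d}\mu
\;=\;
\inf_{\lambda \ge 0} \left\{ \lambda \delta
  + \int \sup_{x} \big[ \Psi(x) - \lambda\, c(x, \xi) \big] \, \nu(\mathrm{d}\xi) \right\}
```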

505 citations

Posted Content
TL;DR: Main concepts and contributions to DRO are surveyed, along with its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization.
Abstract: The concepts of risk-aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has also witnessed rapid theoretical and applied growth by relying on these concepts. A modeling framework, called distributionally robust optimization (DRO), has recently received significant attention in both the operations research and statistical learning communities. This paper surveys main concepts and contributions to DRO, and its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization.

348 citations