Invariant Risk Minimization
Citations
492 citations
Cites background or methods from "Invariant Risk Minimization"
...Akuzawa et al. (2019) extend DANN by considering cases where there exists a statistical dependence between the domain and the class label variables. Albuquerque et al. (2019) extend DANN by considering one-versus-all adversaries that try to predict which training domain each example belongs to. Li et al. (2018b) employ GANs and the maximum mean discrepancy criterion (Gretton et al., 2012) to align feature distributions across domains. Matsuura and Harada (2019) leverage clustering techniques to learn domain-invariant features even when the separation between training domains is not given. Li et al. (2018c;d) learn a feature transformation φ such that the conditional distributions P(φ(X) | Y^d = y) match for all training domains d and label values y. Shankar et al. (2018) use a domain classifier to construct adversarial examples for a label classifier, and use a label classifier to construct adversarial examples for the domain classifier; this results in a label classifier with better domain generalization. Li et al. (2019a) train a robust feature extractor and classifier. The robustness comes from (i) asking the feature extractor to produce features such that a classifier trained on domain d can classify instances for domain d′ ≠ d, and (ii) asking the classifier to predict labels on domain d using features produced by a feature extractor trained on domain d′ ≠ d. Li et al. (2020) adopt a lifelong learning strategy to attack the problem of domain generalization. Motiian et al. (2017) learn a feature representation such that (i) examples from different domains but the same class are close, (ii) examples from different domains and classes are far, and (iii) training examples can be correctly classified. Ilse et al. (2019) train a variational autoencoder (Kingma and Welling, 2014) whose bottleneck representation factorizes knowledge about domain, class label, and residual variations in the input space. Fang et al. (2013) learn a structural SVM metric such that the neighborhood of each example contains examples from the same category and all training domains....
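The feature-alignment idea mentioned in the excerpt (matching feature distributions across domains with the maximum mean discrepancy) can be sketched as follows; this is a minimal numpy illustration of the MMD statistic itself, not the cited authors' training pipeline, and the kernel bandwidth `gamma` is an arbitrary choice:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=1.0):
    # Biased estimator of the squared maximum mean discrepancy between
    # samples x and y; it is small when the two feature distributions
    # match, so it can serve as an alignment penalty during training.
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())
```

In a domain-generalization setup one would add `mmd2(features_domain_a, features_domain_b)` to the task loss, pushing the feature extractor toward domain-invariant representations.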
[...]
400 citations
Cites background or methods from "Invariant Risk Minimization"
...Arjovsky et al. (2019) propose an extension of that work, called Invariant Risk Minimization (IRM), with the goal of learning a data representation that does not rely on spurious correlations....
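The per-environment gradient penalty at the core of the IRMv1 objective can be sketched as below; this substitutes a squared loss for the paper's logistic loss (our simplification), keeping the key idea of a fixed scalar "dummy" classifier w = 1 whose risk gradient should vanish in every environment:

```python
import numpy as np

def irm_penalty(phi, y):
    # Environment risk of the scaled predictor w * phi under squared loss:
    #   R(w) = mean((w * phi - y)^2),
    # so the gradient at the dummy classifier w = 1 is
    #   dR/dw |_{w=1} = mean(2 * (phi - y) * phi).
    # IRMv1 penalizes the squared norm of this gradient per environment.
    grad = np.mean(2.0 * (phi - y) * phi)
    return grad ** 2
```

Summing this penalty over training environments and adding it to the average risk discourages the representation `phi` from relying on correlations whose optimal scaling differs across environments.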
[...]
...Arjovsky et al. (2019) construct a binary classification problem (with 0-4 and 5-9 each collapsed into a single class) based on the MNIST dataset, using color as a spurious feature....
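The label/color construction described in the excerpt can be sketched as follows; this is a schematic of the Colored MNIST recipe (digit class determines a noisy binary label, and a per-environment color flip makes color a spurious cue), with the function name and 25% label-noise default reflecting our reading of the setup rather than the authors' code:

```python
import numpy as np

def make_colored_labels(digits, color_flip, rng, label_noise=0.25):
    # Binary label: 1 if the digit is in 5-9, then flipped with
    # probability label_noise. The color agrees with the noisy label
    # except with probability color_flip, which differs per training
    # environment -- so color is a strong but spurious feature whose
    # correlation with the label changes across environments.
    y = (digits >= 5).astype(int)
    y = y ^ (rng.random(len(y)) < label_noise)
    color = y ^ (rng.random(len(y)) < color_flip)
    return y.astype(int), color.astype(int)
```

With a small `color_flip` in training environments and a large one at test time, a classifier that latches onto color generalizes poorly, which is exactly the failure mode the benchmark probes.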
[...]
...…(Engstrom et al., 2019; Jacobsen et al., 2018) and non-adversarial (Hendrycks & Dietterich, 2019; Yin et al., 2019) robustness, causality (Arjovsky et al., 2019), and other works aimed at distinguishing statistical features from semantic features (Gowal et al., 2019; Geirhos et al.,…...
[...]
...In Section C, we provide results on the synthetic structural equation models from Arjovsky et al. (2019)....
[...]
374 citations
Cites background from "Invariant Risk Minimization"
...In particular, concerns regarding “spurious correlations” and “shortcut learning” in trained models are now widespread (e.g., Geirhos et al., 2020; Arjovsky et al., 2019)....
[...]
...In this context, Peters et al. (2016); Heinze-Deml et al. (2018); Arjovsky et al. (2019); Magliacane et al. (2018) propose approaches to overcome this structural bias, often by using data collected in multiple environments to identify causal invariances....
[...]
...We call these structural failure modes, because they are often diagnosed as a misalignment between the predictor learned by empirical risk minimization and the causal structure of the desired predictor (Schölkopf, 2019; Arjovsky et al., 2019)....
[...]
...In such cases, the iid-optimal predictors must necessarily incorporate spurious associations (Caruana et al., 2015; Arjovsky et al., 2019; Ilyas et al., 2019)....
[...]
311 citations
233 citations
References
29,480 citations
"Invariant Risk Minimization" refers background in this paper
...Also, there are problems where we predict parts of the input from other parts of the input, like in self-supervised learning [14]....
[...]
26,531 citations
"Invariant Risk Minimization" refers background in this paper
...Because most machine learning algorithms depend on the assumption that training and testing data are sampled independently from the same distribution [51], it is common practice to shuffle at random the training and testing examples....
[...]
12,606 citations
"Invariant Risk Minimization" refers background or methods in this paper
...A Structural Equation Model (SEM) C := (S, N) governing the random vector X = (X_1, ..., X_d) is a set of structural equations: S_i : X_i ← f_i(Pa(X_i), N_i), where Pa(X_i) ⊆ {X_1, ..., X_d} \ {X_i} are called the parents of X_i, and the N_i are independent noise random variables....
[...]
...Third, in some cases the features X will not be directly observed, but only a scrambled version X · S. Figure 3 summarizes the SEM generating the data (X^e, Y^e) for all environments e in these experiments....
[...]
...An intervention e on C consists of replacing one or several of its structural equations to obtain an intervened SEM C^e = (S^e, N^e), with structural equations: S^e_i : X^e_i ← f^e_i(Pa^e(X^e_i), N^e_i). The variable X^e_i is intervened if S_i ≠ S^e_i or N_i ≠ N^e_i....
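A minimal sketch of sampling from such an SEM and from an intervened version of it, assuming a toy three-variable chain of our own choosing (not the paper's Figure 3 model):

```python
import numpy as np

def sample_sem(n, rng, intervene_x2=None):
    # Toy SEM, sampled in topological order:
    #   X1 <- N1,   Y <- X1 + N_Y,   X2 <- Y + N2.
    # Passing intervene_x2 replaces X2's structural equation with the
    # constant assignment do(X2 := c), i.e. an intervened SEM C^e in
    # which the variable X2 is intervened.
    x1 = rng.normal(size=n)
    y = x1 + rng.normal(size=n)
    if intervene_x2 is None:
        x2 = y + rng.normal(size=n)
    else:
        x2 = np.full(n, float(intervene_x2))
    return x1, y, x2
```

The point of the construction: regressing Y on its cause X1 recovers the same coefficient in every environment, while the Y-X2 relation is destroyed by the intervention, which is the invariance that environment-based methods aim to exploit.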
[...]
...We begin by assuming that the data from all the environments share the same underlying Structural Equation Model, or SEM [55, 39]:...
[...]
...Consider a SEM C = (S, N)....
[...]
8,377 citations
Additional excerpts
...Rubin’s ignorability [44] plays the same role....
[...]
3,051 citations