Analysis of Representations for Domain Adaptation
Citations
18,616 citations
3,792 citations
Cites background from "Analysis of Representations for Dom..."
...(2009, Section 2) show the MMD minimizes the expected risk of a classifier with linear loss on the samples X and Y , and Ben-David et al. (2007, Section 4) use the error of a hyperplane classifier to approximate the A-distance between distributions (Kifer et al., 2004). Reid and Williamson (2011) provide further discussion and examples....
[...]
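The hyperplane-error approximation of the A-distance described in this excerpt can be sketched as follows: train a linear classifier to distinguish source samples from target samples, and convert its training error into a distance via the common proxy d_A = 2(1 − 2·err). This is a minimal NumPy sketch; the classifier, hyperparameters, and function names are our own illustrative choices, not the cited papers' exact procedure.

```python
import numpy as np

def proxy_a_distance(source, target, lr=0.1, steps=500):
    """Approximate the A-distance between two samples by the training error
    of a hyperplane (logistic) classifier asked to tell them apart.
    All names and hyperparameters here are illustrative."""
    X = np.vstack([source, target])
    X = np.hstack([X, np.ones((len(X), 1))])            # bias feature
    y = np.concatenate([np.zeros(len(source)), np.ones(len(target))])
    w = np.zeros(X.shape[1])
    for _ in range(steps):                              # plain gradient descent on logistic loss
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
    err = np.mean((X @ w > 0) != y)                     # domain-classification error
    return 2.0 * (1.0 - 2.0 * err)                      # low error -> distributions far apart

rng = np.random.default_rng(0)
near = proxy_a_distance(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
far = proxy_a_distance(rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2)))
# `far` approaches 2 (the domains are nearly perfectly separable); `near` stays small
```

The appeal of this proxy is that it only needs a trained classifier, so any off-the-shelf linear learner can stand in for the gradient-descent loop above.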
3,351 citations
Cites methods from "Analysis of Representations for Dom..."
...We provide an analysis of the expected target-domain risk of our approach, making use of the theory of domain transfer (Ben-David et al., 2007; 2010; Mansour et al., 2009) and the theory of kernel embedding of probability distributions (Sriperumbudur et al....
[...]
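The kernel-embedding distance this excerpt invokes (the MMD) has a simple finite-sample estimator: embed each sample as a kernel mean and take the RKHS distance between the embeddings. A minimal NumPy sketch under an RBF kernel; the function and parameter names are ours, not from the cited work.

```python
import numpy as np

def mmd2_rbf(x, y, gamma=1.0):
    """Biased finite-sample estimate of the squared maximum mean discrepancy
    (MMD) between samples x and y under an RBF kernel."""
    def k(a, b):
        # pairwise squared Euclidean distances -> RBF kernel matrix
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, (100, 2))
y_same = rng.normal(0.0, 1.0, (100, 2))
y_shift = rng.normal(3.0, 1.0, (100, 2))
# samples from the same distribution give a value near 0;
# a mean shift gives a clearly larger value
```

The biased estimator used here is always non-negative, which makes it convenient as a minimization target in adaptation objectives.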
References
26,531 citations
Additional excerpts
...$\epsilon_T(h) \le \lambda_T + \Pr_{\mathcal{D}_T}[Z_h \,\Delta\, Z_{h^*}] \le \lambda_T + \Pr_{\mathcal{D}_S}[Z_h \,\Delta\, Z_{h^*}] + \left|\Pr_{\mathcal{D}_S}[Z_h \,\Delta\, Z_{h^*}] - \Pr_{\mathcal{D}_T}[Z_h \,\Delta\, Z_{h^*}]\right| \le \lambda_T + \Pr_{\mathcal{D}_S}[Z_h \,\Delta\, Z_{h^*}] + d_{\mathcal{H}}(\tilde{\mathcal{D}}_S, \tilde{\mathcal{D}}_T) \le \lambda_T + \lambda_S + \epsilon_S(h) + d_{\mathcal{H}}(\tilde{\mathcal{D}}_S, \tilde{\mathcal{D}}_T) \le \lambda + \epsilon_S(h) + d_{\mathcal{H}}(\tilde{\mathcal{D}}_S, \tilde{\mathcal{D}}_T)$. The theorem now follows by a standard application of Vapnik–Chervonenkis theory [14] to bound the true $\epsilon_S(h)$ by its empirical estimate $\hat{\epsilon}_S(h)$....
[...]
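Plugging illustrative numbers into the final form of this chain, ε_T(h) ≤ λ + ε_S(h) + d_H(D̃_S, D̃_T) with λ = λ_S + λ_T, shows how the three terms combine; every value below is made up for illustration.

```python
lam_S, lam_T = 0.05, 0.03        # best achievable in-class error on source / target (made up)
lam = lam_S + lam_T              # the combined lambda of the final inequality
eps_S = 0.10                     # empirical source error of the hypothesis h (made up)
d_H = 0.20                       # H-distance between the induced distributions (made up)
eps_T_bound = lam + eps_S + d_H  # upper bound on the target error of h
print(round(eps_T_bound, 2))     # prints 0.38
```

The bound is vacuous once the three terms sum past the trivial error rate, which is why representations that shrink d_H without inflating ε_S or λ are the object of the paper's analysis.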
1,672 citations
"Analysis of Representations for Dom..." refers background or methods in this paper
...For PoS tagging, the original feature space consists of high-dimensional, sparse binary vectors [6]....
[...]
...We show experimentally that the heuristic choices made by the recently proposed structural correspondence learning algorithm [6] do lead to lower values of the relevant quantities in our theoretical analysis, providing insight as to why this algorithm achieves its empirical success....
[...]
...Indeed recent empirical work in natural language processing [11, 6] has been targeted at exactly this setting....
[...]
...However, the assumption does not hold for domain adaptation [5, 7, 13, 6]....
[...]
...Section 5 shows how the bound behaves for the structural correspondence learning representation [6] on natural language data....
[...]
1,182 citations
"Analysis of Representations for Dom..." refers methods in this paper
...We minimize a modified Huber loss using stochastic gradient descent, described more completely in [15]....
[...]
883 citations
"Analysis of Representations for Dom..." refers background or methods in this paper
...The relevant distributional divergence term can be written as the A-distance of Kifer et al. [9]....
[...]
...[9] show that the A-distance can be approximated arbitrarily well with increasing sample size....
[...]
...2 from [9], we can state a computable bound for the error on the target domain....
[...]
...We chose the A-distance, however, precisely because we can measure this from finite samples from the distributions D̃S and D̃T [9]....
[...]
...Unfortunately the variational distance between real-valued distributions cannot be computed from finite samples [2, 9] and therefore is not useful to us when investigating representations for domain adaptation on real-world data....
[...]
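The contrast drawn in this excerpt is concrete: the variational distance ranges over all measurable events, but the A-distance only over a fixed class A, so it can be computed exactly from the two finite samples. A sketch for the toy class of one-sided threshold sets in one dimension (our own choice of A, not the paper's experimental setup).

```python
import numpy as np

def a_distance_thresholds(s, t):
    """Empirical A-distance when A is the class of threshold sets {x <= c}:
    d_A = 2 * sup_c |Pr_S[x <= c] - Pr_T[x <= c]|. Only the observed sample
    points need to be tried as cutpoints, so the supremum is computable from
    finite samples (essentially a two-sample KS statistic scaled by 2)."""
    cuts = np.sort(np.concatenate([s, t]))
    gaps = [abs(np.mean(s <= c) - np.mean(t <= c)) for c in cuts]
    return 2.0 * max(gaps)

# disjoint samples are maximally distinguishable by a threshold:
print(a_distance_thresholds(np.array([0.0, 1.0, 2.0]),
                            np.array([10.0, 11.0, 12.0])))  # prints 2.0
```

Richer classes A (e.g. hyperplanes) make the supremum harder to compute exactly, which is why the hyperplane-error proxy is used in practice.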