A theory of learning from different domains
Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, Jennifer Wortman Vaughan
TL;DR
A classifier-induced divergence measure is given that can be estimated from finite, unlabeled samples from the domains, and it is shown how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class.
Abstract
Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time?
We address the first question by bounding a classifier's target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterize the target error of a source-trained classifier.
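As a minimal illustration of how such a divergence can be estimated from finite, unlabeled samples, the sketch below follows the common "proxy A-distance" recipe over a toy hypothesis class of one-dimensional threshold rules: train a classifier to tell the two domains apart, and convert its best achievable error into a divergence score. The function names are hypothetical, and this is an assumption-laden toy, not the paper's exact procedure.

```python
# Illustrative sketch (not the paper's exact procedure): estimating a
# classifier-induced divergence between two unlabeled 1-D samples via
# the "proxy A-distance" recipe, with threshold rules as the
# hypothesis class. All names here are hypothetical.

def domain_error(threshold, source, target):
    """Error of the rule 'x >= threshold -> target' when source points
    are labeled 0 and target points are labeled 1."""
    mistakes = sum(1 for x in source if x >= threshold)   # source mistaken for target
    mistakes += sum(1 for x in target if x < threshold)   # target mistaken for source
    return mistakes / (len(source) + len(target))

def proxy_a_distance(source, target):
    """2 * (1 - 2 * err), where err is the best domain-classification
    error over all thresholds and their negations. Close to 0 when the
    samples are indistinguishable, close to 2 when they are separable."""
    candidates = sorted(set(source) | set(target))
    best = min(
        min(domain_error(t, source, target),
            1.0 - domain_error(t, source, target))  # allow the flipped rule
        for t in candidates
    )
    return 2.0 * (1.0 - 2.0 * best)
```

On identical samples the best domain classifier does no better than chance, so the estimate is near 0; on well-separated samples a threshold classifies the domains perfectly and the estimate approaches 2.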
We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors.
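The convex combination described above can be sketched explicitly. Writing $\hat{\epsilon}_S, \hat{\epsilon}_T$ for the empirical source and target errors and $\alpha \in [0,1]$ for the target weight (consult the paper for the precise theorem statements):

```latex
\hat{\epsilon}_\alpha(h) = \alpha\,\hat{\epsilon}_T(h) + (1-\alpha)\,\hat{\epsilon}_S(h),
\qquad
\bigl|\epsilon_\alpha(h) - \epsilon_T(h)\bigr|
  \le (1-\alpha)\Bigl(\tfrac{1}{2}\,d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S,\mathcal{D}_T) + \lambda\Bigr),
```

where $\lambda = \min_{h} \bigl(\epsilon_S(h) + \epsilon_T(h)\bigr)$ is the combined error of the ideal joint hypothesis: the penalty for leaning on source data shrinks as $\alpha \to 1$, and vanishes entirely when the domains coincide.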
Citations
Book Chapter
Domain-adversarial training of neural networks
Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, Victor Lempitsky
TL;DR: In this article, a new representation-learning approach for domain adaptation is proposed for the setting where data at training and test time come from similar but different distributions; the approach promotes features that cannot discriminate between the training (source) and test (target) domains while remaining discriminative for the main learning task on the source domain.
Posted Content
Learning Transferable Features with Deep Adaptation Networks
TL;DR: A new Deep Adaptation Network (DAN) architecture is proposed, which generalizes deep convolutional neural networks to the domain adaptation scenario, learns transferable features with statistical guarantees, and scales linearly via an unbiased estimate of the kernel embedding.
Posted Content
Unsupervised Domain Adaptation by Backpropagation
Yaroslav Ganin, Victor Lempitsky
TL;DR: In this paper, a gradient reversal layer is proposed to promote the emergence of deep features that are discriminative for the main learning task on the source domain and invariant with respect to the shift between the domains.
Proceedings Article
Unsupervised Domain Adaptation by Backpropagation
Yaroslav Ganin, Victor Lempitsky
TL;DR: The method performs very well in a series of image classification experiments, achieving an adaptation effect in the presence of large domain shifts and outperforming the previous state-of-the-art on the Office datasets.
Proceedings Article
Deep transfer learning with joint adaptation networks
TL;DR: JAN aligns the joint distributions of multiple domain-specific layers across domains based on a joint maximum mean discrepancy (JMMD) criterion, making the distributions of the source and target domains more indistinguishable.
References
Statistical learning theory
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Journal Article
Sample Selection Bias as a Specification Error
TL;DR: In this article, the bias that results from using nonrandomly selected samples to estimate behavioral relationships is treated as an ordinary specification error, or "omitted variables" bias, and the asymptotic distribution of the estimator is derived.
Thumbs up? Sentiment Classification using Machine Learning Techniques
TL;DR: In this paper, the problem of classifying documents not by topic but by overall sentiment, e.g., determining whether a review is positive or negative, was considered, and three machine learning methods (naive Bayes, maximum entropy classification, and support vector machines) were employed.
Proceedings Article
Thumbs up? Sentiment Classification using Machine Learning Techniques
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Posted Content
Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
TL;DR: A simple unsupervised learning algorithm classifies a review as recommended (thumbs up) or not recommended (thumbs down) according to whether the average semantic orientation of its phrases is positive or negative.