Semi-Supervised Learning Literature Survey
Citations
18,616 citations
Cites background from "Semi-Supervised Learning Literature..."
...However, many machine learning methods work well only under a common assumption: the training and test data are drawn from the same feature space and the same distribution....
[...]
6,320 citations
Cites background from "Semi-Supervised Learning Literature..."
...The key idea of semi-supervised learning is to exploit the unlabeled examples by using the labeled examples to modify, refine, or reprioritize the hypothesis obtained from the labeled data alone [135]....
[...]
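The "modify, refine, or reprioritize" idea quoted above is easiest to see in self-training, one of the simplest semi-supervised wrappers the survey discusses. Below is a minimal sketch, not any cited system: the function name, classifier choice, and confidence threshold are all illustrative. A model fit on the labeled data repeatedly labels the unlabeled pool and absorbs only its most confident predictions.

```python
# Minimal self-training sketch: refine a classifier trained on labeled data
# by folding in its most confident predictions on unlabeled data.
# All names and thresholds here are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    model = LogisticRegression()
    for _ in range(max_rounds):
        model.fit(X, y)
        if len(pool) == 0:
            break
        proba = model.predict_proba(pool)
        sure = proba.max(axis=1) >= threshold   # keep only confident guesses
        if not sure.any():
            break                               # nothing confident left: stop
        X = np.vstack([X, pool[sure]])
        y = np.concatenate([y, model.classes_[proba[sure].argmax(axis=1)]])
        pool = pool[~sure]                      # shrink the unlabeled pool
    return model
```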
5,227 citations
Cites background from "Semi-Supervised Learning Literature..."
...Zhu (2005a) reports that annotation at the word level can take ten times longer than the actual audio (e.g., one minute of speech takes ten minutes to label), and annotating phonemes can take 400 times as long (e.g., nearly seven hours)....
[...]
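The quoted time factors imply the "nearly seven hours" figure directly; a quick restatement of that arithmetic (variable names are illustrative):

```python
# Sanity check of the annotation-cost figures quoted above.
audio_minutes = 1                             # one minute of recorded speech
word_label_minutes = 10 * audio_minutes       # "ten times longer": 10 minutes
phoneme_label_minutes = 400 * audio_minutes   # "400 times as long": 400 minutes
print(phoneme_label_minutes / 60)             # ~6.7 hours, i.e. "nearly seven hours"
```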
...Active learning and semi-supervised learning (for a good introduction, see Zhu, 2005b) both traffic in making the most out of unlabeled data....
[...]
2,194 citations
Cites background from "Semi-Supervised Learning Literature..."
...Existing generative approaches based on models such as Gaussian mixtures or hidden Markov models (Zhu, 2006) have not been very successful due to the need for a large number of mixture components or states to perform well....
[...]
...Existing generative approaches based on models such as Gaussian mixtures or hidden Markov models (Zhu, 2006) have not been very successful due to the limited capacity and the need for many states to perform well....
[...]
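The generative approach these excerpts criticize can be made concrete: fit one Gaussian per class and run EM, where labeled points keep fixed one-hot responsibilities and unlabeled points are soft-labeled each iteration. The sketch below is a minimal illustration of that scheme, not any specific cited system; all names and the regularization constant are assumptions.

```python
# Semi-supervised Gaussian mixture by EM, one Gaussian per class
# (a minimal sketch of the generative approach quoted above; illustrative only).
import numpy as np
from scipy.stats import multivariate_normal

def ssl_gmm(X_lab, y_lab, X_unlab, n_iter=50):
    classes = np.unique(y_lab)
    k, d = len(classes), X_lab.shape[1]
    # Initialize each class Gaussian from the labeled data alone.
    pi = np.array([(y_lab == c).mean() for c in classes])
    mu = np.array([X_lab[y_lab == c].mean(axis=0) for c in classes])
    cov = np.array([np.cov(X_lab[y_lab == c].T) + 1e-6 * np.eye(d)
                    for c in classes])
    for _ in range(n_iter):
        # E-step: soft responsibilities for the unlabeled points.
        dens = np.column_stack(
            [pi[j] * multivariate_normal.pdf(X_unlab, mu[j], cov[j])
             for j in range(k)])
        resp_u = dens / dens.sum(axis=1, keepdims=True)
        # Labeled points keep hard (one-hot) responsibilities.
        resp_l = (y_lab[:, None] == classes[None, :]).astype(float)
        X = np.vstack([X_lab, X_unlab])
        R = np.vstack([resp_l, resp_u])
        # M-step: refit weights, means, and covariances from all points.
        Nk = R.sum(axis=0)
        pi = Nk / Nk.sum()
        mu = (R.T @ X) / Nk[:, None]
        for j in range(k):
            Z = X - mu[j]
            cov[j] = (R[:, j][:, None] * Z).T @ Z / Nk[j] + 1e-6 * np.eye(d)
    return pi, mu, cov
```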
1,913 citations
Cites background from "Semi-Supervised Learning Literature..."
...For further reading on these and other semi-supervised learning topics, there is a book collection from a machine learning perspective [37], a survey article with up-to-date papers [208], a book written for computational linguists [1], and a technical report [151]....
[...]
References
30,570 citations
"Semi-Supervised Learning Literature..." refers background or methods in this paper
...Latent Dirichlet Allocation (LDA) (Blei et al., 2003) is one step further. It assumes the topic proportion of each document is drawn from a Dirichlet distribution. With variational approximation, each document is represented by a posterior Dirichlet over the topics. This is a much lower dimensional representation. Griffiths et al. (2005) extend the LDA model to 'HMM-LDA', which uses both short-term syntactic and long-term topical dependencies, as an effort to integrate semantics and syntax. Li and McCallum (2005) apply the HMM-LDA model to obtain word clusters, as a rudimentary way for semi-supervised learning on sequences. Some algorithms derive a metric entirely from the density of U. These are motivated by unsupervised clustering and based on the intuition that data points in the same high-density 'clump' should be close in the new metric. For instance, if U is generated from a single Gaussian, then the Mahalanobis distance induced by the covariance matrix is such a metric. Tipping (1999) generalizes the Mahalanobis distance by fitting U with a mixture of Gaussians, and defines a Riemannian manifold with the metric at x being the weighted average of the individual components' inverse covariances. The distance between x1 and x2 is computed along the straight line (in Euclidean space) between the two points. Rattray (2000) further generalizes the metric so that it only depends on the change in log probabilities of the density, not on a particular Gaussian mixture assumption. The distance is then computed along the curve that minimizes it. The new metric is invariant to linear transformation of the features, and connected regions of relatively homogeneous density in U will be close to each other. Such a metric is attractive, yet it depends on the homogeneity of the initial Euclidean space. Its application in semi-supervised learning needs further investigation. Sajama and Orlitsky (2005) analyze the lower and upper bounds on estimating data-density-based distance....
[...]
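Tipping's density-derived metric described in the excerpt can be sketched as follows: fit a Gaussian mixture to the unlabeled set U, take the metric tensor at x to be the responsibility-weighted average of component inverse covariances, and integrate path length along the straight line between x1 and x2. The implementation details below (component count, midpoint quadrature) are assumptions for illustration, not from the paper.

```python
# Sketch of a density-derived metric in the spirit of Tipping (1999):
# metric at x = responsibility-weighted average of component inverse
# covariances; distance = integral of sqrt(dx^T G(x) dx) along the
# straight segment from x1 to x2. Details are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_metric(U, n_components=5):
    gmm = GaussianMixture(n_components=n_components).fit(U)
    inv_covs = np.linalg.inv(gmm.covariances_)      # shape (k, d, d)
    def G(x):
        w = gmm.predict_proba(x.reshape(1, -1))[0]  # component responsibilities
        return np.einsum('k,kij->ij', w, inv_covs)  # weighted-average metric
    return G

def line_distance(G, x1, x2, n_steps=100):
    total, step = 0.0, (x2 - x1) / n_steps
    for t in range(n_steps):                        # midpoint rule on the segment
        x = x1 + (t + 0.5) * step
        total += np.sqrt(step @ G(x) @ step)
    return total
```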
26,531 citations
"Semi-Supervised Learning Literature..." refers background in this paper
...The decision boundary has the smallest generalization error bound on unlabeled data (Vapnik, 1998)....
[...]
...The name TSVM originates from the intention to work only on the observed data (though people use it for induction anyway), which according to Vapnik (1998) is solving a simpler problem....
[...]
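The transductive idea quoted above can be sketched with a simple heuristic: the current SVM labels the unlabeled points, then is retrained with a gradually increasing weight on them, which nudges the boundary toward low-density regions. This is a simplification (real TSVMs, e.g. Joachims, 1999, also constrain the predicted class ratio and swap labels pairwise); the names and annealing schedule below are illustrative assumptions.

```python
# Rough TSVM-style sketch: self-label the unlabeled data and retrain with
# an annealed weight on it. Illustrative only; not Joachims' full algorithm.
import numpy as np
from sklearn.svm import SVC

def tsvm_sketch(X_lab, y_lab, X_unlab, C=1.0, n_rounds=5):
    svm = SVC(kernel='linear', C=C).fit(X_lab, y_lab)
    C_u = 1e-3 * C                                  # tiny initial unlabeled weight
    for _ in range(n_rounds):
        y_u = svm.predict(X_unlab)                  # current guesses for U
        X = np.vstack([X_lab, X_unlab])
        y = np.concatenate([y_lab, y_u])
        w = np.concatenate([np.full(len(y_lab), C),
                            np.full(len(y_u), C_u)])
        svm = SVC(kernel='linear', C=1.0).fit(X, y, sample_weight=w)
        C_u = min(C, 2.0 * C_u)                     # anneal the unlabeled weight up
    return svm
```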