scispace - formally typeset
Search or ask a question
Author

Yuanlong Shao

Other affiliations: Ohio State University
Bio: Yuanlong Shao is an academic researcher from Zhejiang University. The author has contributed to research in topics: Automatic image annotation & Semi-supervised learning. The author has an hindex of 5, co-authored 8 publications receiving 269 citations. Previous affiliations of Yuanlong Shao include Ohio State University.

Papers
More filters
Journal ArticleDOI
TL;DR: This paper introduces a regularized probabilistic model based on manifold structure for data clustering, called Laplacian regularized Gaussian Mixture Model (LapGMM), which is modeled by a nearest neighbor graph, and the graph structure is incorporated in the maximum likelihood objective function.
Abstract: Gaussian Mixture Models (GMMs) are among the most statistically mature methods for clustering. Each cluster is represented by a Gaussian distribution. The clustering process thereby turns to estimate the parameters of the Gaussian mixture, usually by the Expectation-Maximization algorithm. In this paper, we consider the case where the probability distribution that generates the data is supported on a submanifold of the ambient space. It is natural to assume that if two points are close in the intrinsic geometry of the probability distribution, then their conditional probability distributions are similar. Specifically, we introduce a regularized probabilistic model based on manifold structure for data clustering, called Laplacian regularized Gaussian Mixture Model (LapGMM). The data manifold is modeled by a nearest neighbor graph, and the graph structure is incorporated in the maximum likelihood objective function. As a result, the obtained conditional probability distribution varies smoothly along the geodesics of the data manifold. Experimental results on real data sets demonstrate the effectiveness of the proposed approach.

209 citations

Journal ArticleDOI
Guofeng Zhang1, Wei Hua1, Xueying Qin1, Yuanlong Shao1, Hujun Bao1 
TL;DR: This paper presents a novel approach to stabilize video sequences based on a 3D perspective camera model that uses approximate geometry representation and analyze the resulting warping errors to show that by appropriately constraining warping error, visually plausible results can be achieved even using planar structures.
Abstract: This paper presents a novel approach to stabilize video sequences based on a 3D perspective camera model. Compared to previous methods which are based on simplified models, our stabilization system can work in situations where significant depth variations exist in the scenes and the camera undergoes large translational movement. We formulate the stabilization problem as a quadratic cost function on smoothness and similarity constraints. This allows us to precisely control the smoothness by solving a sparse linear system of equations. By taking advantage of the sparseness, our optimization process is very efficient. Instead of recovering dense depths, we use approximate geometry representation and analyze the resulting warping errors. We show that by appropriately constraining warping error, visually plausible results can be achieved even using planar structures. A variety of experiments have been implemented, which demonstrates the robustness and efficiency of our approach.

77 citations

Proceedings ArticleDOI
Yuanlong Shao1, Yuan Zhou1, Xiaofei He1, Deng Cai1, Hujun Bao1 
19 Oct 2009
TL;DR: A novel technique for semi-supervised image annotation is proposed which introduces a harmonic regularizer based on the graph Laplacian of the data into the probabilistic semantic model for learning latent topics of the images.
Abstract: We propose a novel technique for semi-supervised image annotation which introduces a harmonic regularizer based on the graph Laplacian of the data into the probabilistic semantic model for learning latent topics of the images. By using a probabilistic semantic model, we connect visual features and textual annotations of images by their latent topics. Meanwhile, we incorporate the manifold assumption into the model to say that the probabilities of latent topics of images are drawn from a manifold, so that for images sharing similar visual features or the same annotations, their probability distribution of latent topics should also be similar. We create a nearest neighbor graph to model the manifold and propose a regularized EM algorithm to simultaneously learn a generative model and assign probability density of latent topics to images discriminatively. In this way, databases with very few labeled images can be annotated better than previous works.

16 citations

Posted Content
TL;DR: A lifelong learning system that leverages a fast, though non-differentiable, content-addressable memory which can be exploited to encode both a long history of sequential episodic knowledge and semantic knowledge over many episodes for an unbounded number of domains is described.
Abstract: The long-term memory of most connectionist systems lies entirely in the weights of the system. Since the number of weights is typically fixed, this bounds the total amount of knowledge that can be learned and stored. Though this is not normally a problem for a neural network designed for a specific task, such a bound is undesirable for a system that continually learns over an open range of domains. To address this, we describe a lifelong learning system that leverages a fast, though non-differentiable, content-addressable memory which can be exploited to encode both a long history of sequential episodic knowledge and semantic knowledge over many episodes for an unbounded number of domains. This opens the door for investigation into transfer learning, and leveraging prior knowledge that has been learned over a lifetime of experiences to new domains.

11 citations

Journal ArticleDOI
TL;DR: The graph regularization approaches are extended to a more general case where the regularization is imposed on the factorized variational distributions, instead of posterior distributions implicitly involved in EM-like algorithms.
Abstract: Image annotation is a typical area where there are multiple types of attributes associated with each individual image. In order to achieve better performance, it is important to develop effective modeling by utilizing prior knowledge. In this article, we extend the graph regularization approaches to a more general case where the regularization is imposed on the factorized variational distributions, instead of posterior distributions implicitly involved in EM-like algorithms. In this way, the problem modeling can be more flexible, and we can choose any factor in the problem domain to impose graph regularization wherever there are similarity constraints among the instances. We formulate the problem formally and show its geometrical background in manifold learning. We also design two practically effective algorithms and analyze their properties such as the convergence. Finally, we apply our approach to image annotation and show the performance improvement of our algorithm.

6 citations


Cited by
More filters
Posted Content
TL;DR: This work proposes the Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaption techniques.
Abstract: When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaption techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.

1,037 citations

Posted Content
TL;DR: In this article, the authors proposed Variational Deep Embedding (VaDE), a novel unsupervised generative clustering approach within the framework of Variational Auto-Encoder (VAE).
Abstract: Clustering is among the most fundamental tasks in computer vision and machine learning. In this paper, we propose Variational Deep Embedding (VaDE), a novel unsupervised generative clustering approach within the framework of Variational Auto-Encoder (VAE). Specifically, VaDE models the data generative procedure with a Gaussian Mixture Model (GMM) and a deep neural network (DNN): 1) the GMM picks a cluster; 2) from which a latent embedding is generated; 3) then the DNN decodes the latent embedding into observables. Inference in VaDE is done in a variational way: a different DNN is used to encode observables to latent embeddings, so that the evidence lower bound (ELBO) can be optimized using Stochastic Gradient Variational Bayes (SGVB) estimator and the reparameterization trick. Quantitative comparisons with strong baselines are included in this paper, and experimental results show that VaDE significantly outperforms the state-of-the-art clustering methods on 4 benchmarks from various modalities. Moreover, by VaDE's generative nature, we show its capability of generating highly realistic samples for any specified cluster, without using supervised information during training. Lastly, VaDE is a flexible and extensible framework for unsupervised generative clustering, more general mixture models than GMM can be easily plugged in.

401 citations

Journal ArticleDOI
TL;DR: The proposed general Laplacian regularized low-rank representation framework for data representation takes advantage of the graph regularizer and can represent the global low-dimensional structures, but also capture the intrinsic non-linear geometric information in data.
Abstract: Low-rank representation (LRR) has recently attracted a great deal of attention due to its pleasing efficacy in exploring low-dimensional subspace structures embedded in data. For a given set of observed data corrupted with sparse errors, LRR aims at learning a lowest-rank representation of all data jointly. LRR has broad applications in pattern recognition, computer vision and signal processing. In the real world, data often reside on low-dimensional manifolds embedded in a high-dimensional ambient space. However, the LRR method does not take into account the non-linear geometric structures within data, thus the locality and similarity information among data may be missing in the learning process. To improve LRR in this regard, we propose a general Laplacian regularized low-rank representation framework for data representation where a hypergraph Laplacian regularizer can be readily introduced into, i.e., a Non-negative Sparse Hyper-Laplacian regularized LRR model (NSHLRR). By taking advantage of the graph regularizer, our proposed method not only can represent the global low-dimensional structures, but also capture the intrinsic non-linear geometric information in data. The extensive experimental results on image clustering, semi-supervised image classification and dimensionality reduction tasks demonstrate the effectiveness of the proposed method.

367 citations

Proceedings ArticleDOI
06 Nov 2017
TL;DR: This paper studies an important yet largely under-explored setting of graph embedding, i.e., embedding communities instead of each individual nodes, and proposes a novel community embedding framework that jointly solves the three tasks together.
Abstract: In this paper, we study an important yet largely under-explored setting of graph embedding, i.e., embedding communities instead of each individual nodes. We find that community embedding is not only useful for community-level applications such as graph visualization, but also beneficial to both community detection and node classification. To learn such embedding, our insight hinges upon a closed loop among community embedding, community detection and node embedding. On the one hand, node embedding can help improve community detection, which outputs good communities for fitting better community embedding. On the other hand, community embedding can be used to optimize the node embedding by introducing a community-aware high-order proximity. Guided by this insight, we propose a novel community embedding framework that jointly solves the three tasks together. We evaluate such a framework on multiple real-world datasets, and show that it improves graph visualization and outperforms state-of-the-art baselines in various application tasks, e.g., community detection and node classification.

345 citations

Journal ArticleDOI
TL;DR: This article focuses on the problem of transforming a set of input 2D motion trajectories so that they are both smooth and resemble visually plausible views of the imaged scene, and offers the first method that both achieves high-quality video stabilization and is practical enough for consumer applications.
Abstract: We present a robust and efficient approach to video stabilization that achieves high-quality camera motion for a wide range of videos. In this article, we focus on the problem of transforming a set of input 2D motion trajectories so that they are both smooth and resemble visually plausible views of the imaged scene; our key insight is that we can achieve this goal by enforcing subspace constraints on feature trajectories while smoothing them. Our approach assembles tracked features in the video into a trajectory matrix, factors it into two low-rank matrices, and performs filtering or curve fitting in a low-dimensional linear space. In order to process long videos, we propose a moving factorization that is both efficient and streamable. Our experiments confirm that our approach can efficiently provide stabilization results comparable with prior 3D methods in cases where those methods succeed, but also provides smooth camera motions in cases where such approaches often fail, such as videos that lack parallax. The presented approach offers the first method that both achieves high-quality video stabilization and is practical enough for consumer applications.

318 citations