Author

# Yong Yu

Other affiliations: University of Illinois at Urbana–Champaign, University of Texas at Austin, HEC Montréal ...read more

Bio: Yong Yu is an academic researcher from Shanghai Jiao Tong University. The author has contributed to research in topics: Reinforcement learning & Semantic Web. The author has an hindex of 78, co-authored 523 publications receiving 26956 citations. Previous affiliations of Yong Yu include University of Illinois at Urbana–Champaign & University of Texas at Austin.

##### Papers published on a yearly basis

##### Papers

More filters

••

Shanghai Jiao Tong University

^{1}, Peking University^{2}, National University of Singapore^{3}, Columbia University^{4}, Microsoft^{5}TL;DR: It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: When the data is clean, LRR exactly recovers the true subspace structures; when the data are contaminated by outliers, it is proved that under certain conditions LRR can exactly recover the row space of the original data.

Abstract: In this paper, we address the subspace clustering problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to cluster the samples into their respective subspaces and remove possible outliers as well. To this end, we propose a novel objective function named Low-Rank Representation (LRR), which seeks the lowest rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: When the data is clean, we prove that LRR exactly recovers the true subspace structures; when the data are contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outlier as well; for data corrupted by arbitrary sparse errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these further imply that LRR can perform robust subspace clustering and error correction in an efficient and effective way.

3,085 citations

•

13 Feb 2017TL;DR: SeqGAN as mentioned in this paper models the data generator as a stochastic policy in reinforcement learning (RL), and the RL reward signal comes from the discriminator judged on a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search.

Abstract: As a new way of training generative models, Generative Adversarial
Net (GAN) that uses a discriminative model to guide
the training of the generative model has enjoyed considerable
success in generating real-valued data. However, it has limitations
when the goal is for generating sequences of discrete
tokens. A major reason lies in that the discrete outputs from
the generative model make it difficult to pass the gradient update
from the discriminative model to the generative model.
Also, the discriminative model can only assess a complete
sequence, while for a partially generated sequence, it is nontrivial
to balance its current score and the future one once
the entire sequence has been generated. In this paper, we propose
a sequence generation framework, called SeqGAN, to
solve the problems. Modeling the data generator as a stochastic
policy in reinforcement learning (RL), SeqGAN bypasses
the generator differentiation problem by directly performing
gradient policy update. The RL reward signal comes from
the GAN discriminator judged on a complete sequence, and
is passed back to the intermediate state-action steps using
Monte Carlo search. Extensive experiments on synthetic data
and real-world tasks demonstrate significant improvements
over strong baselines.

1,869 citations

•

21 Jun 2010

TL;DR: Both theoretical and experimental results show that low-rank representation is a promising tool for subspace segmentation from corrupted data.

Abstract: We propose low-rank representation (LRR) to segment data drawn from a union of multiple linear (or affine) subspaces. Given a set of data vectors, LRR seeks the lowest-rank representation among all the candidates that represent all vectors as the linear combination of the bases in a dictionary. Unlike the well-known sparse representation (SR), which computes the sparsest representation of each data vector individually, LRR aims at finding the lowest-rank representation of a collection of vectors jointly. LRR better captures the global structure of data, giving a more effective tool for robust subspace segmentation from corrupted data. Both theoretical and experimental results show that LRR is a promising tool for subspace segmentation.

1,542 citations

••

20 Jun 2007TL;DR: In this paper, the authors proposed a transfer learning framework called TrAdaBoost, which allows users to utilize a small amount of newly labeled data to leverage the old data to construct a high-quality classification model for the new data.

Abstract: Traditional machine learning makes a basic assumption: the training and test data should be under the same distribution. However, in many cases, this identical-distribution assumption does not hold. The assumption might be violated when a task from one new domain comes, while there are only labeled data from a similar old domain. Labeling the new data can be costly and it would also be a waste to throw away all the old data. In this paper, we present a novel transfer learning framework called TrAdaBoost, which extends boosting-based learning algorithms (Freund & Schapire, 1997). TrAdaBoost allows users to utilize a small amount of newly labeled data to leverage the old data to construct a high-quality classification model for the new data. We show that this method can allow us to learn an accurate model using only a tiny amount of new data and a large amount of old data, even when the new data are not sufficient to train a model alone. We show that TrAdaBoost allows knowledge to be effectively transferred from the old data to the new. The effectiveness of our algorithm is analyzed theoretically and empirically to show that our iterative algorithm can converge well to an accurate model.

1,509 citations

••

15 Aug 2005

TL;DR: In this paper, clusters generated from the training data provide the basis for data smoothing and neighborhood selection and show that the new proposed approach consistently outperforms other state-of-art collaborative filtering algorithms.

Abstract: Memory-based approaches for collaborative filtering identify the similarity between two users by comparing their ratings on a set of items. In the past, the memory-based approach has been shown to suffer from two fundamental problems: data sparsity and difficulty in scalability. Alternatively, the model-based approach has been proposed to alleviate these problems, but this approach tends to limit the range of users. In this paper, we present a novel approach that combines the advantages of these two approaches by introducing a smoothing-based method. In our approach, clusters generated from the training data provide the basis for data smoothing and neighborhood selection. As a result, we provide higher accuracy as well as increased efficiency in recommendations. Empirical studies on two datasets (EachMovie and MovieLens) show that our new proposed approach consistently outperforms other state-of-art collaborative filtering algorithms.

706 citations

##### Cited by

More filters

•

28,685 citations

••

TL;DR: The relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift are discussed.

Abstract: A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.

18,616 citations

••

13 Aug 2016TL;DR: XGBoost as discussed by the authors proposes a sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning to achieve state-of-the-art results on many machine learning challenges.

Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

14,872 citations

••

TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.

Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

13,333 citations

••

[...]

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations