
Showing papers by Zhi-Hua Zhou published in 2010


Journal ArticleDOI
TL;DR: A statistical framework that generalizes the bag-of-words representation, in which the visual words are generated by a statistical process rather than a clustering algorithm, while the empirical performance remains competitive with clustering-based methods.
Abstract: The bag-of-words model is one of the most popular representation methods for object categorization. The key idea is to quantize each extracted key point into one of the visual words, and then represent each image by a histogram of the visual words. For this purpose, a clustering algorithm (e.g., K-means) is generally used for generating the visual words. Although a number of studies have shown encouraging results of the bag-of-words representation for object categorization, theoretical studies on properties of the bag-of-words model are almost untouched, possibly due to the difficulty introduced by using a heuristic clustering process. In this paper, we present a statistical framework which generalizes the bag-of-words representation. In this framework, the visual words are generated by a statistical process rather than using a clustering algorithm, while the empirical performance is competitive with clustering-based methods. A theoretical analysis based on statistical consistency is presented for the proposed framework. Moreover, based on the framework, we develop two algorithms which do not rely on clustering, while achieving competitive performance in object categorization when compared to clustering-based bag-of-words representations.
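As a rough illustration of the clustering-based baseline that the paper generalizes (not the proposed statistical framework), the sketch below builds a K-means vocabulary and a visual-word histogram; the descriptor dimensionality and vocabulary size are illustrative assumptions.

```python
# Minimal sketch of the clustering-based bag-of-words baseline (assumed setup,
# not the paper's statistical framework): quantize local descriptors with
# K-means and represent each image as a histogram over the visual words.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, n_words=200, seed=0):
    """Cluster descriptors from the training images into visual words."""
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(all_descriptors)

def bow_histogram(image_descriptors, vocabulary):
    """Map each descriptor to its nearest visual word and count occurrences."""
    words = vocabulary.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # normalize so images with different numbers of key points are comparable

# Toy usage with random "descriptors" standing in for SIFT-like key-point features.
rng = np.random.default_rng(0)
train_desc = rng.normal(size=(5000, 128))
vocab = build_vocabulary(train_desc, n_words=50)
print(bow_histogram(rng.normal(size=(300, 128)), vocab).shape)  # (50,)
```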

923 citations


Proceedings Article
06 Dec 2010
TL;DR: The proposed QUIRE approach provides a systematic way of measuring and combining the informativeness and representativeness of an unlabeled instance, and it is extended to multi-label learning by incorporating the correlation among labels and actively querying instance-label pairs.
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches.
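As a simplified illustration of combining the two criteria (not QUIRE's actual min-max formulation), the sketch below scores unlabeled instances with an uncertainty-based informativeness term plus a density-based representativeness term; the base model and the additive combination are assumptions.

```python
# Simplified illustration (not QUIRE's min-max formulation) of scoring unlabeled
# instances by both informativeness (prediction uncertainty) and
# representativeness (average similarity to the rest of the unlabeled pool).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

def select_query(X_lab, y_lab, X_unlab, beta=1.0):
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    informativeness = 1.0 - clf.predict_proba(X_unlab).max(axis=1)   # high when the model is unsure
    representativeness = rbf_kernel(X_unlab, X_unlab).mean(axis=1)   # high in dense regions of the pool
    scores = informativeness + beta * representativeness             # simple additive combination (assumption)
    return int(np.argmax(scores))                                    # index of the instance to query next

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(10, 2))
y_lab = np.array([0, 1] * 5)                                         # both classes present
X_unlab = rng.normal(size=(100, 2))
print(select_query(X_lab, y_lab, X_unlab))
```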

518 citations


Journal ArticleDOI
TL;DR: An introduction to research advances in disagreement-based semi-supervised learning is provided, where multiple learners are trained for the task and the disagreements among the learners are exploited during the semi-supervised learning process.
Abstract: In many real-world tasks, there are abundant unlabeled examples but the number of labeled training examples is limited, because labeling the examples requires human effort and expertise. Thus, semi-supervised learning, which tries to exploit unlabeled examples to improve learning performance, has become a hot topic. Disagreement-based semi-supervised learning is an interesting paradigm, where multiple learners are trained for the task and the disagreements among the learners are exploited during the semi-supervised learning process. This survey article provides an introduction to research advances in this paradigm.
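A minimal co-training-style sketch of this paradigm, assuming two feature views, naive Bayes base learners, and an arbitrary confidence threshold; it only illustrates the "learners teach each other on unlabeled data" idea surveyed here, not any specific algorithm from the survey.

```python
# Minimal co-training-style sketch of disagreement-based semi-supervised learning
# (illustrative assumptions throughout): two learners trained on different feature
# views label confident unlabeled examples for each other.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X_lab, y_lab, X_unlab, view1, view2, rounds=5, conf_thresh=0.95):
    X1_l, X2_l, y = X_lab[:, view1], X_lab[:, view2], y_lab.copy()
    for _ in range(rounds):
        c1 = GaussianNB().fit(X1_l, y)
        c2 = GaussianNB().fit(X2_l, y)
        if len(X_unlab) == 0:
            break
        p1 = c1.predict_proba(X_unlab[:, view1])
        p2 = c2.predict_proba(X_unlab[:, view2])
        # Each learner teaches the other with its most confident predictions.
        pick = np.where((p1.max(axis=1) > conf_thresh) | (p2.max(axis=1) > conf_thresh))[0]
        if len(pick) == 0:
            break
        labels1 = c1.classes_[p1.argmax(axis=1)]
        labels2 = c2.classes_[p2.argmax(axis=1)]
        labels = np.where(p1.max(axis=1)[pick] >= p2.max(axis=1)[pick], labels1[pick], labels2[pick])
        X1_l = np.vstack([X1_l, X_unlab[pick][:, view1]])
        X2_l = np.vstack([X2_l, X_unlab[pick][:, view2]])
        y = np.concatenate([y, labels])
        X_unlab = np.delete(X_unlab, pick, axis=0)
    return c1, c2

# Toy usage: the two "views" are simulated by splitting the feature columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(int)
c1, c2 = co_train(X[:20], y[:20], X[20:], view1=[0, 1], view2=[2, 3])
```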

373 citations


Journal ArticleDOI
TL;DR: A multilabel dimensionality reduction method, MDDM, with two kinds of projection strategies, is proposed, attempting to project the original data into a lower-dimensional feature space that maximizes the dependence between the original feature description and the associated class labels.
Abstract: Multilabel learning deals with data associated with multiple labels simultaneously. Like other data mining and machine learning tasks, multilabel learning also suffers from the curse of dimensionality. Dimensionality reduction has been studied for many years; however, multilabel dimensionality reduction remains almost untouched. In this article, we propose a multilabel dimensionality reduction method, MDDM, with two kinds of projection strategies, attempting to project the original data into a lower-dimensional feature space maximizing the dependence between the original feature description and the associated class labels. Based on the Hilbert-Schmidt Independence Criterion, we derive an eigen-decomposition problem which enables the dimensionality reduction process to be efficient. Experiments validate the performance of MDDM.
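A rough sketch of an HSIC-style projection in the spirit of MDDM, assuming a linear kernel over the label vectors and a plain eigen-projection; this is not the paper's exact formulation or either of its two projection strategies.

```python
# Rough sketch of an HSIC-style multilabel dimensionality reduction (assumptions:
# linear kernel on the label vectors, plain eigen-projection; not the paper's
# exact formulation).
import numpy as np

def hsic_projection(X, Y, n_dims):
    """X: (n, d) feature matrix; Y: (n, q) binary label matrix; returns a (d, n_dims) projection."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    L = Y @ Y.T                                # linear kernel over label vectors
    M = X.T @ H @ L @ H @ X                    # dependence between features and labels
    eigvals, eigvecs = np.linalg.eigh(M)       # M is symmetric, so eigh is appropriate
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    return eigvecs[:, order[:n_dims]]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Y = (rng.random(size=(100, 5)) > 0.7).astype(float)
P = hsic_projection(X, Y, n_dims=3)
X_reduced = X @ P                              # project into the lower-dimensional space
print(X_reduced.shape)                         # (100, 3)
```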

346 citations


Journal ArticleDOI
01 Aug 2010
TL;DR: This article shows that the rescaling approach works well when the costs are consistent, while directly applying it to multi-class problems with inconsistent costs may not be a good choice; based on this recognition, the consistency of the costs should be examined before rescaling is applied.
Abstract: Rescaling is possibly the most popular approach to cost-sensitive learning. This approach works by rebalancing the classes according to their costs, and it can be realized in different ways, for example, re-weighting or resampling the training examples in proportion to their costs, moving the decision boundaries of classifiers away from high-cost classes in proportion to costs, etc. This approach is very effective in dealing with two-class problems, yet some studies showed that it is often not so helpful on multi-class problems. In this article, we try to explore why the rescaling approach is often helpless on multi-class problems. Our analysis discloses that the rescaling approach works well when the costs are consistent, while directly applying it to multi-class problems with inconsistent costs may not be a good choice. Based on this recognition, we advocate that before applying the rescaling approach, the consistency of the costs must be examined first. If the costs are consistent, the rescaling approach can be conducted directly; otherwise it is better to apply rescaling after decomposing the multi-class problem into a series of two-class problems. An empirical study involving 20 multi-class data sets and seven types of cost-sensitive learners validates our proposal. Moreover, we show that the proposal is also helpful for class-imbalance learning.
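A minimal sketch of rescaling by re-weighting the training examples in proportion to their costs, assuming an illustrative cost matrix; the consistency check and the two-class decomposition advocated in the paper are omitted.

```python
# Minimal sketch of rescaling via example re-weighting (assumed cost setup):
# each training example is weighted by the cost of misclassifying its true class,
# so the learner pays more attention to high-cost classes.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# cost[i][j] = cost of predicting class j when the true class is i (illustrative values)
cost = np.array([[0.0, 1.0, 5.0],
                 [1.0, 0.0, 2.0],
                 [10.0, 3.0, 0.0]])

def rescale_weights(y, cost):
    """Weight each example by the total misclassification cost of its true class."""
    class_cost = cost.sum(axis=1)           # how expensive it is to get class i wrong
    return class_cost[y]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = rng.integers(0, 3, size=300)
weights = rescale_weights(y, cost)
clf = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=weights)
print(clf.predict(X[:5]))
```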

234 citations


Proceedings Article
11 Jul 2010
TL;DR: The WELL (WEak Label Learning) method is proposed, which considers that the classification boundary for each label should go across low-density regions, and that each label generally has a much smaller number of positive examples than negative examples.
Abstract: Multi-label learning deals with data associated with multiple labels simultaneously. Previous work on multi-label learning assumes that the "full" label set associated with each training instance is given by users. In many applications, however, getting the full label set for each instance is difficult and only a "partial" set of labels is available. In such cases, the appearance of a label means that the instance is associated with this label, while the absence of a label does not imply that this label is not proper for the instance. We call this kind of problem the "weak label" problem. In this paper, we propose the WELL (WEak Label Learning) method to solve the weak label problem. We consider that the classification boundary for each label should go across low-density regions, and that each label generally has a much smaller number of positive examples than negative examples. The objective is formulated as a convex optimization problem which can be solved efficiently. Moreover, we exploit the correlation between labels by assuming that there is a group of low-rank base similarities, and that the appropriate similarities between instances for different labels can be derived from these base similarities. Experiments validate the performance of WELL.

189 citations


Journal ArticleDOI
TL;DR: A spatio-temporal shot graph, derived from a hypergraph that encodes the correlations with different attributes among multi-view video shots in hyperedges, is constructed, and the summarization problem is formulated as a graph labeling task whose result is generated based on shot importance evaluated using a Gaussian entropy fusion scheme.
Abstract: Previous video summarization studies focused on monocular videos, and the results would not be good if they were applied to multi-view videos directly, due to problems such as the redundancy in multiple views. In this paper, we present a method for summarizing multi-view videos. We construct a spatio-temporal shot graph and formulate the summarization problem as a graph labeling task. The spatio-temporal shot graph is derived from a hypergraph, which encodes the correlations with different attributes among multi-view video shots in hyperedges. We then partition the shot graph and identify clusters of event-centered shots with similar contents via random walks. The summarization result is generated by solving a multi-objective optimization problem based on shot importance evaluated using a Gaussian entropy fusion scheme. Different summarization objectives, such as minimum summary length and maximum information coverage, can be accomplished in the framework. Moreover, multi-level summarization can be achieved easily by configuring the optimization parameters. We also propose the multi-view storyboard and event board for presenting multi-view summaries. The storyboard naturally reflects correlations among multi-view summarized shots that describe the same important event. The event board serially assembles event-centered multi-view shots in temporal order. A single video summary, which facilitates quick browsing of the summarized multi-view video, can be easily generated based on the event board representation.

183 citations


Proceedings Article
21 Jun 2010
TL;DR: In this analysis the co-training process is viewed as a combinative label propagation over two views; this provides a possibility to bring the graph-based and disagreement-based semi-supervised methods into a unified framework.
Abstract: In this paper, we present a new analysis on co-training, a representative paradigm of disagreement-based semi-supervised learning methods. In our analysis the co-training process is viewed as a combinative label propagation over two views; this provides a possibility to bring the graph-based and disagreement-based semi-supervised methods into a unified framework. With the analysis we get some insight that has not been disclosed by previous theoretical studies. In particular, we provide the sufficient and necessary condition for co-training to succeed. We also discuss the relationship to previous theoretical results and give some other interesting implications of our results, such as combination of weight matrices and view split.

182 citations


Journal ArticleDOI
TL;DR: A framework is proposed which formulates the face recognition problem as a multiclass cost-sensitive learning task, and two theoretically sound methods for this task are developed.
Abstract: Most traditional face recognition systems attempt to achieve a low recognition error rate, implicitly assuming that the losses of all misclassifications are the same. In this paper, we argue that this is far from a reasonable setting because, in almost all application scenarios of face recognition, different kinds of mistakes will lead to different losses. For example, it would be troublesome if a door locker based on a face recognition system misclassified a family member as a stranger such that she/he was not allowed to enter the house, but it would be a much more serious disaster if a stranger was misclassified as a family member and allowed to enter the house. We propose a framework which formulates the face recognition problem as a multiclass cost-sensitive learning task, and develop two theoretically sound methods for this task. Experimental results demonstrate the effectiveness and efficiency of the proposed methods.
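A small sketch of the generic cost-sensitive decision rule such a framework builds on: predict the class with minimum expected cost given posterior estimates. The classes and cost values below are illustrative assumptions, not the paper's two methods.

```python
# Hedged sketch of the generic cost-sensitive decision rule that such a framework
# builds on: choose the class minimizing expected cost rather than maximizing
# posterior probability. Cost values below are illustrative assumptions.
import numpy as np

# Classes: 0 = family member A, 1 = family member B, 2 = stranger.
# cost[i][j] = loss of predicting j when the truth is i.
cost = np.array([[0.0,  1.0,  5.0],   # locking out a family member is bad...
                 [1.0,  0.0,  5.0],
                 [50.0, 50.0, 0.0]])  # ...but letting a stranger in is far worse

def min_expected_cost_predict(posteriors, cost):
    """posteriors: (n, K) class-probability estimates; returns cost-sensitive predictions."""
    expected_cost = posteriors @ cost      # (n, K): expected loss of each possible decision
    return expected_cost.argmin(axis=1)

p = np.array([[0.45, 0.10, 0.45],          # ambiguous face: a plain error-rate rule would pick class 0
              [0.05, 0.05, 0.90]])
print(min_expected_cost_predict(p, cost))   # the ambiguous case resolves to "stranger" to avoid the big risk
```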

155 citations


Journal ArticleDOI
TL;DR: This special issue gathers the state-of-the-art research in social learning and is devoted to exhibiting some of the best representative works in this area.
Abstract: In recent years, social behavioral data have been exponentially expanding due to the tremendous success of various outlets on the social Web (aka Web 2.0) such as Facebook, Digg, Twitter, Wikipedia, and Delicious. As a result, there's a need for social learning to support the discovery, analysis, and modeling of human social behavioral data. The goal is to discover social intelligence, which encompasses a spectrum of knowledge that characterizes human interaction, communication, and collaborations. The social Web has thus become a fertile ground for machine learning and data mining research. This special issue gathers the state-of-the-art research in social learning and is devoted to exhibiting some of the best representative works in this area.

132 citations


Proceedings ArticleDOI
26 Oct 2010
TL;DR: This paper provides an evolving trajectory outlier detection method, named TOP-EYE, which continuously computes the outlying score for each trajectory in an accumulating way, and introduces a decay function to mitigate the influence of the past trajectories on the evolving outlying scores.
Abstract: The increasing availability of large-scale location traces creates unprecedented opportunities to change the paradigm for identifying abnormal moving activities. Indeed, various aspects of abnormality of moving patterns have recently been exploited, such as wrong direction and wandering. However, there is no recognized way of combining different aspects into a unified evolving abnormality score which has the ability to capture the evolving nature of abnormal moving trajectories. To that end, in this paper, we provide an evolving trajectory outlier detection method, named TOP-EYE, which continuously computes the outlying score for each trajectory in an accumulating way. Specifically, in TOP-EYE, we introduce a decay function to mitigate the influence of the past trajectories on the evolving outlying score, which is defined based on the evolving moving direction and density of trajectories. This decay function enables the evolving computation of accumulated outlying scores along the trajectories. An advantage of TOP-EYE is that it identifies evolving outliers at a very early stage with a relatively low false alarm rate. Finally, experimental results on real-world location traces show that TOP-EYE can effectively capture evolving abnormal trajectories.
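A toy sketch of the accumulate-with-decay idea; the per-point outlierness values and the decay constant are placeholders, since the paper defines outlierness from moving direction and trajectory density.

```python
# Toy sketch of an evolving, decayed outlying score in the spirit of TOP-EYE:
# the score accumulates along a trajectory while a decay factor down-weights the
# influence of older points. The per-point outlierness below is a placeholder;
# the paper derives it from moving direction and trajectory density.
def evolving_outlying_scores(point_outlierness, decay=0.9):
    """point_outlierness: per-point anomaly evidence along one trajectory."""
    scores, acc = [], 0.0
    for o in point_outlierness:
        acc = decay * acc + o          # old evidence fades, new evidence accumulates
        scores.append(acc)
    return scores

# A trajectory that behaves normally, then turns abnormal: the score rises early
# in the abnormal stretch instead of waiting for the full trajectory.
evidence = [0.1, 0.1, 0.1, 0.8, 0.9, 0.9]
print([round(s, 2) for s in evolving_outlying_scores(evidence)])
```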

Proceedings Article
11 Jul 2010
TL;DR: Experimental results on two aging face databases show remarkable advantages of the proposed label distribution learning algorithms over the compared single-label learning algorithms, either specially designed for age estimation or for general purpose.
Abstract: One of the main difficulties in facial age estimation is the lack of sufficient training data for many ages. Fortunately, the faces at close ages look similar since aging is a slow and smooth process. Inspired by this observation, in this paper, instead of considering each face image as an example with one label (age), we regard each face image as an example associated with a label distribution. The label distribution covers a number of class labels, representing the degree to which each label describes the example. In this way, in addition to the real age, one face image can also contribute to the learning of its adjacent ages. We propose an algorithm named IIS-LLD for learning from the label distributions, which is an iterative optimization process based on the maximum entropy model. Experimental results show the advantages of IIS-LLD over the traditional learning methods based on single-labeled data.
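A small sketch of turning a single age label into a label distribution over neighboring ages, assuming a discretized Gaussian; the width parameter is illustrative, and the IIS-LLD optimization itself is not shown.

```python
# Small sketch of converting a single age label into a label distribution over
# adjacent ages (assumed discretized Gaussian; the paper's IIS-LLD training
# procedure based on the maximum entropy model is not shown here).
import numpy as np

def age_label_distribution(true_age, min_age=0, max_age=80, sigma=2.0):
    ages = np.arange(min_age, max_age + 1)
    weights = np.exp(-0.5 * ((ages - true_age) / sigma) ** 2)
    return ages, weights / weights.sum()   # degrees sum to one across all class labels

ages, dist = age_label_distribution(true_age=25)
top = np.argsort(dist)[::-1][:5]
print([(int(ages[i]), round(float(dist[i]), 3)) for i in sorted(top)])
# Ages 23..27 all receive non-negligible degrees, so one face image also
# contributes to the learning of its neighboring ages.
```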

Book ChapterDOI
20 Sep 2010
TL;DR: Without using any density or distance measure, a new method called SCiForest is proposed to detect clustered anomalies; it maintains the ability of existing methods to detect scattered anomalies, and it has superior time and space complexities compared with existing distance-based and density-based methods.
Abstract: Detecting local clustered anomalies is an intricate problem for many existing anomaly detection methods. Distance-based and density-based methods are inherently restricted by their basic assumptions: anomalies are either far from normal points or sparse. Clustered anomalies are able to avoid detection since they defy these assumptions by being dense and, in many cases, in close proximity to normal instances. In this paper, without using any density or distance measure, we propose a new method called SCiForest to detect clustered anomalies. SCiForest separates clustered anomalies from normal points effectively even when clustered anomalies are very close to normal points. It maintains the ability of existing methods to detect scattered anomalies, and it has superior time and space complexities compared with existing distance-based and density-based methods.

Proceedings Article
11 Jul 2010
TL;DR: The CS4VM (Cost-Sensitive Semi-Supervised Support Vector Machine) is proposed, which closely approximates the supervised cost-sensitive SVM that has access to the ground-truth labels of all the unlabeled data.
Abstract: In this paper, we study cost-sensitive semi-supervised learning where many of the training examples are unlabeled and different misclassification errors are associated with unequal costs. This scenario occurs in many real-world applications. For example, in some disease diagnoses, the cost of erroneously diagnosing a patient as healthy is much higher than that of diagnosing a healthy person as a patient. Also, the acquisition of labeled data requires medical diagnosis, which is expensive, while the collection of unlabeled data such as basic health information is much cheaper. We propose the CS4VM (Cost-Sensitive Semi-Supervised Support Vector Machine) to address this problem. We show that the CS4VM, when given the label means of the unlabeled data, closely approximates the supervised cost-sensitive SVM that has access to the ground-truth labels of all the unlabeled data. This observation leads to an efficient algorithm which first estimates the label means and then trains the CS4VM with the plug-in label means by an efficient SVM solver. Experiments on a broad range of data sets show that the proposed method is capable of reducing the total cost and is computationally efficient.

Book ChapterDOI
07 Apr 2010
TL;DR: From the view of multi-information, the ensemble diversity can be decomposed over the component classifiers constituting the ensemble, and an approximation is given for estimating the diversity in practice.
Abstract: Understanding ensemble diversity is one of the most important fundamental issues in ensemble learning. Inspired by a recent work trying to explain ensemble diversity from the information theoretic perspective, in this paper we study the ensemble diversity from the view of multi-information. We show that from this view, the ensemble diversity can be decomposed over the component classifiers constituting the ensemble. Based on this formulation, an approximation is given for estimating the diversity in practice. Experimental results show that our formulation and approximation are promising.

Journal ArticleDOI
TL;DR: The main contributions of the paper—the subset search strategy (GASEN) introduced in Section 3 after Eqs.

Book ChapterDOI
30 Aug 2010
TL;DR: Many manifold clustering approaches require knowing the number of clusters and the intrinsic dimensions of the manifolds in advance, which is hard for the user to provide in practice; the proposed mumCluster approach addresses such issues, and experimental results show that its performance is encouraging.
Abstract: Manifold clustering, which regards clusters as groups of points around compact manifolds, has been realized as a promising generalization of traditional clustering. A number of linear or nonlinear manifold clustering approaches have been developed recently. Although they have attained better performance than traditional clustering methods in many scenarios, most of these approaches suffer from two weaknesses. First, when the data are drawn from hybrid modeling, i.e., some data manifolds are separated but some are intersected, existing approaches do not work well, although hybrid modeling often appears in real data. Second, many approaches require knowing the number of clusters and the intrinsic dimensions of the manifolds in advance, while it is hard for the user to provide such information in practice. In this paper, we propose a new manifold clustering approach, mumCluster, to address these issues. Experimental results show that the performance of the proposed mumCluster approach is encouraging.

Journal ArticleDOI
TL;DR: A lower bound of GLRAM's objective function is derived, which answers an open problem raised by Ye (Machine Learning, 2005), and the paper explores when and why GLRAM can perform well in terms of compression, a fundamental problem concerning the usability of GLRAM.
Abstract: Compared to singular value decomposition (SVD), generalized low-rank approximations of matrices (GLRAM) can consume less computation time, obtain a higher compression ratio, and yield competitive classification performance. GLRAM has been successfully applied to applications such as image compression and retrieval, and quite a few extensions have been successively proposed. However, in the literature, some basic properties and crucial problems with regard to GLRAM have not been explored or solved yet. For this sake, we revisit GLRAM in this paper. First, we reveal such a close relationship between GLRAM and SVD that GLRAM's objective function is identical to SVD's objective function except for the imposed constraints. Second, we derive a lower bound of GLRAM's objective function, and discuss when the lower bound can be attained. Moreover, from the viewpoint of minimizing the lower bound, we answer an open problem raised by Ye (Machine Learning, 2005), i.e., we give a theoretical justification of the experimental phenomenon that, for a given number of reduced dimensions, the lowest reconstruction error is obtained when the left and right transformations have an equal number of columns. Third, we explore when and why GLRAM can perform well in terms of compression, which is a fundamental problem concerning the usability of GLRAM.
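A sketch of the commonly described alternating procedure for GLRAM's objective (maximize the sum of ||L^T A_i R||_F^2 over column-orthonormal L and R); the initialization, dimensions, and iteration count are illustrative assumptions.

```python
# Sketch of the standard alternating procedure for GLRAM (as commonly described
# for Ye's formulation): find column-orthonormal L and R maximizing
# sum_i ||L^T A_i R||_F^2, i.e. minimizing the reconstruction error.
import numpy as np

def top_eigvecs(S, k):
    vals, vecs = np.linalg.eigh(S)              # S is symmetric positive semidefinite
    return vecs[:, np.argsort(vals)[::-1][:k]]  # eigenvectors of the k largest eigenvalues

def glram(As, l1, l2, n_iter=20):
    r, c = As[0].shape
    R = np.eye(c)[:, :l2]                       # common simple initialization
    for _ in range(n_iter):
        L = top_eigvecs(sum(A @ R @ R.T @ A.T for A in As), l1)
        R = top_eigvecs(sum(A.T @ L @ L.T @ A for A in As), l2)
    Ms = [L.T @ A @ R for A in As]              # compressed representations
    return L, R, Ms

rng = np.random.default_rng(0)
As = [rng.normal(size=(32, 32)) for _ in range(10)]
L, R, Ms = glram(As, l1=5, l2=5)
recon_err = sum(np.linalg.norm(A - L @ M @ R.T) ** 2 for A, M in zip(As, Ms))
print(round(recon_err, 2))
```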

Proceedings ArticleDOI
25 Jul 2010
TL;DR: The CISVM method, a support vector machine, is proposed to work with cost interval information; when additional cost distribution information is available, 60% more risk can be reduced compared with the standard cost-sensitive SVM which assumes the expected cost is the true value.
Abstract: Existing cost-sensitive learning methods require that the unequal misclassification costs be given as precise values. In many real-world applications, however, it is generally difficult to have a precise cost value since the user may only know that one type of mistake is much more severe than another type, yet it is infeasible to give a precise description. In such situations, it is more meaningful to work with a cost interval instead of a precise cost value. In this paper we report the first study along this direction. We propose the CISVM method, a support vector machine, to work with cost interval information. Experiments show that when only cost intervals are available, CISVM is significantly superior to standard cost-sensitive SVMs using any of the minimal cost, mean cost and maximal cost to learn. Moreover, considering that in some cases other information about costs can be obtained in addition to cost intervals, such as the distribution of costs, we propose a general approach, CODIS, for using the distribution information to help improve performance. Experiments show that this approach can reduce 60% more risk than the standard cost-sensitive SVM which assumes the expected cost is the true value.
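A sketch of the baselines mentioned above rather than CISVM itself: standard cost-sensitive SVMs trained with the minimal, mean, or maximal value taken from a cost interval; the interval, the data, and the class-weight encoding are illustrative assumptions.

```python
# Sketch of the baselines the paper compares against (not CISVM itself):
# standard cost-sensitive SVMs trained with the minimal, mean, or maximal value
# picked from the cost interval. Interval and data below are illustrative.
import numpy as np
from sklearn.svm import SVC

cost_interval = (2.0, 10.0)                      # misclassifying class 1 costs somewhere in [2, 10]
candidates = {"min": cost_interval[0],
              "mean": sum(cost_interval) / 2.0,
              "max": cost_interval[1]}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

models = {}
for name, c1 in candidates.items():
    # class_weight scales the penalty of errors on class 1 relative to class 0
    models[name] = SVC(kernel="rbf", class_weight={0: 1.0, 1: c1}).fit(X, y)
print({name: int(m.predict(X).sum()) for name, m in models.items()})
```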

Posted Content
TL;DR: It is proved that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be O(log 1/e), contrasting to single-view setting where the polynomial improvement is the best possible achievement.
Abstract: The sample complexity of active learning under the realizability assumption has been well-studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be $\widetilde{O}(\log\frac{1}{\epsilon})$, contrasting to single-view setting where the polynomial improvement is the best possible achievement. We also prove that in general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is $\widetilde{O}(\frac{1}{\epsilon})$, where the order of $1/\epsilon$ is independent of the parameter in Tsybakov noise, contrasting to previous polynomial bounds where the order of $1/\epsilon$ is related to the parameter in Tsybakov noise.

Proceedings Article
06 Dec 2010
TL;DR: In this article, the authors theoretically characterize the sample complexity of active learning in the non-realizable case under multi-view setting, and show that with unbounded Tsybakov noise, sample complexity can be O(log 1/e), contrasting to single-view settings where the polynomial improvement is the best possible achievement.
Abstract: The sample complexity of active learning under the realizability assumption has been well-studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be O(log 1/e), contrasting to single-view setting where the polynomial improvement is the best possible achievement. We also prove that in general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is O(1/e), where the order of 1/e is independent of the parameter in Tsybakov noise, contrasting to previous polynomial bounds where the order of 1/e is related to the parameter in Tsybakov noise.

Posted Content
TL;DR: The previous empirical Bernstein bounds of Maurer and Pontil are improved, and by incorporating factors such as average margin and variance, a generalization error bound that is heavily related to the whole margin distribution is presented.
Abstract: Margin theory provides one of the most popular explanations to the success of \texttt{AdaBoost}, where the central point lies in the recognition that \textit{margin} is the key for characterizing the performance of \texttt{AdaBoost}. This theory has been very influential, e.g., it has been used to argue that \texttt{AdaBoost} usually does not overfit since it tends to enlarge the margin even after the training error reaches zero. Previously the \textit{minimum margin bound} was established for \texttt{AdaBoost}; however, \cite{Breiman1999} pointed out that maximizing the minimum margin does not necessarily lead to a better generalization. Later, \cite{Reyzin:Schapire2006} emphasized that the margin distribution rather than minimum margin is crucial to the performance of \texttt{AdaBoost}. In this paper, we first present the \textit{$k$th margin bound} and further study its relationship to previous work such as the minimum margin bound and Emargin bound. Then, we improve the previous empirical Bernstein bounds \citep{Maurer:Pontil2009,Audibert:Munos:Szepesvari2009}, and based on such findings, we defend the margin-based explanation against Breiman's doubts by proving a new generalization error bound that considers exactly the same factors as \cite{Schapire:Freund:Bartlett:Lee1998} but is sharper than \cite{Breiman1999}'s minimum margin bound. By incorporating factors such as average margin and variance, we present a generalization error bound that is heavily related to the whole margin distribution. We also provide margin distribution bounds for the generalization error of voting classifiers in finite VC-dimension space.

Proceedings Article
11 Jul 2010
TL;DR: An effective model is proposed and an efficient algorithm is developed for the multi-instance dimensionality reduction problem: the objective considers orthonormality and sparsity constraints in the projection matrix for dimensionality reduction, and is solved by gradient descent along the tangent space of the orthonormal matrices.
Abstract: Multi-instance learning deals with problems that treat bags of instances as training examples. In single-instance learning problems, dimensionality reduction is an essential step for high-dimensional data analysis and has been studied for years. The curse of dimensionality also exists in multi-instance learning tasks, yet dimensionality reduction for multi-instance learning has not been studied before. Direct application of existing single-instance dimensionality reduction objectives to multi-instance learning tasks may not work well since it ignores the characteristic of multi-instance learning that the labels of bags are known while the labels of instances are unknown. In this paper, we propose an effective model and develop an efficient algorithm to solve the multi-instance dimensionality reduction problem. We formulate the objective as an optimization problem by considering orthonormality and sparsity constraints in the projection matrix for dimensionality reduction, and then solve it by gradient descent along the tangent space of the orthonormal matrices. We also propose an approximation for improving the efficiency. Experimental results validate the effectiveness of the proposed method.
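A generic sketch of one gradient step constrained to orthonormal projection matrices (tangent-space projection followed by a QR retraction); the objective used here is a placeholder, not the paper's multi-instance criterion or its sparsity term.

```python
# Generic sketch of a gradient step over orthonormal projection matrices
# (project the Euclidean gradient onto the tangent space of the Stiefel manifold,
# take a step, then retract with a QR decomposition). The objective below is a
# placeholder; the paper optimizes a multi-instance criterion with sparsity.
import numpy as np

def stiefel_step(P, euclid_grad, lr=0.1):
    """P: (d, k) with orthonormal columns; returns an updated orthonormal P."""
    sym = (P.T @ euclid_grad + euclid_grad.T @ P) / 2.0
    tangent_grad = euclid_grad - P @ sym          # remove the component leaving the manifold
    Q, _ = np.linalg.qr(P + lr * tangent_grad)    # QR retraction back onto orthonormal matrices
    return Q[:, :P.shape[1]]

rng = np.random.default_rng(0)
d, k = 10, 3
P = np.linalg.qr(rng.normal(size=(d, k)))[0]      # random orthonormal starting point
C = rng.normal(size=(d, d)); C = C @ C.T          # placeholder objective: maximize tr(P^T C P)
for _ in range(50):
    P = stiefel_step(P, 2.0 * C @ P, lr=0.05)     # gradient ascent step on the placeholder objective
print(np.allclose(P.T @ P, np.eye(k), atol=1e-6)) # columns remain orthonormal
```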

Proceedings ArticleDOI
13 Dec 2010
TL;DR: Experiments show that UDEED can effectively utilize unlabeled data for ensemble learning and is highly competitive to well-established semi-supervised ensemble methods.
Abstract: Ensemble learning aims to improve generalization ability by using multiple base learners. It is well-known that to construct a good ensemble, the base learners should be accurate as well as diverse. In this paper, unlabeled data is exploited to facilitate ensemble learning by helping augment the diversity among the base learners. Specifically, a semi-supervised ensemble method named UDEED is proposed. Unlike existing semi-supervised ensemble methods where error-prone pseudo-labels are estimated for unlabeled data to enlarge the labeled data to improve accuracy, UDEED works by maximizing accuracies of base learners on labeled data while maximizing diversity among them on unlabeled data. Experiments show that UDEED can effectively utilize unlabeled data for ensemble learning and is highly competitive to well-established semi-supervised ensemble methods.
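A rough sketch of the kind of objective described: empirical loss on labeled data minus a diversity term measured by pairwise disagreement on unlabeled data; the concrete losses and trade-off parameter are assumptions, not UDEED's exact formulation.

```python
# Rough sketch of an UDEED-style objective (assumed form): fit base learners to
# the labeled data while encouraging them to disagree on the unlabeled data.
# The 0/1 losses and the trade-off parameter below are illustrative.
import numpy as np

def ensemble_objective(preds_labeled, y, preds_unlabeled, lam=0.5):
    """
    preds_labeled:   (m, n_l) predictions in {-1,+1} of m base learners on labeled data
    preds_unlabeled: (m, n_u) predictions in {-1,+1} of m base learners on unlabeled data
    Lower is better: labeled error should shrink, and so should agreement on unlabeled data.
    """
    labeled_error = np.mean(preds_labeled != y)           # average 0/1 loss over learners
    m = preds_unlabeled.shape[0]
    agree, pairs = 0.0, 0
    for i in range(m):
        for j in range(i + 1, m):
            agree += np.mean(preds_unlabeled[i] == preds_unlabeled[j])
            pairs += 1
    diversity = 1.0 - agree / pairs                       # disagreement among base learners
    return labeled_error - lam * diversity

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=20)
pl = np.tile(y, (3, 1))                                   # three learners, perfect on labeled data
pu_same = np.tile(rng.choice([-1, 1], size=50), (3, 1))   # identical predictions on unlabeled data
pu_diverse = rng.choice([-1, 1], size=(3, 50))            # disagreeing predictions on unlabeled data
print(ensemble_objective(pl, y, pu_same), ensemble_objective(pl, y, pu_diverse))
# The diverse ensemble attains the lower (better) objective value.
```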


Journal ArticleDOI
01 Nov 2010
TL;DR: This paper formulates the Positive Class Expansion with single Snapshot (PCES) problem and proposes a framework which involves the incorporation of desirable biases based on user preferences, solving the problem by the Stochastic Gradient Boosting with Double Target approach.
Abstract: In many real-world data mining tasks, the connotation of the target concept may change as time goes by. For example, the connotation of “learned knowledge” of a student today may be different from his/her “learned knowledge” tomorrow, since the “learned knowledge” of the student is expanding every day. In order to learn a model capable of making accurate predictions, the evolution of the concept must be considered, and thus, a series of data sets collected at different times is needed. In many tasks, however, there is only a single data set instead of a series of data sets. In other words, only a single snapshot of the data along the time axis is available. In this paper, we formulate the Positive Class Expansion with single Snapshot (PCES) problem and discuss its difference from existing problem settings. To show that this new problem is addressable, we propose a framework which involves the incorporation of desirable biases based on user preferences. The resulting optimization problem is solved by the Stochastic Gradient Boosting with Double Target approach, which achieves encouraging performance on PCES problems in experiments.

Book ChapterDOI
11 Sep 2010
TL;DR: A general approach is presented which allows one to compare the runtime of an EA with recombination turned on and off, and thus helps to understand when a recombination operator works.
Abstract: Recombination (also called crossover) operators are widely used in EAs to generate offspring solutions. Although the usefulness of recombination has been well recognized, theoretical analysis of recombination operators remains a hard problem due to the irregularity of the operators and their complicated interactions with mutation operators. In this paper, as a step towards analyzing recombination operators theoretically, we present a general approach which allows one to compare the runtime of an EA with the recombination turned on and off, and thus helps to understand when a recombination operator works. The key of our approach is the Markov Chain Switching Theorem, which compares two Markov chains for the first hit of the target. As an illustration, we analyze some recombination operators in evolutionary search on the LeadingOnes problem using the proposed approach. The analysis identifies some insight into the choice of recombination operators, which is then verified in experiments.
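A toy sketch of the analyzed setting: the LeadingOnes fitness and a simple EA whose one-point crossover can be switched on or off; the population size, mutation rate, and crossover operator are illustrative assumptions rather than the paper's exact algorithms.

```python
# Toy sketch of the analyzed setting: the LeadingOnes fitness and a simple EA
# whose one-point crossover can be switched on or off. Population size, mutation
# rate, and the specific crossover are illustrative assumptions.
import random

def leading_ones(x):
    """Number of consecutive 1-bits counted from the left."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def run_ea(n=30, pop_size=10, use_crossover=True, max_evals=20000, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    evals = pop_size
    while evals < max_evals and max(map(leading_ones, pop)) < n:
        p1, p2 = rng.sample(pop, 2)
        if use_crossover:
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]                              # one-point crossover
        else:
            child = p1[:]
        child = [b ^ (rng.random() < 1.0 / n) for b in child]        # standard bit-flip mutation
        worst = min(range(pop_size), key=lambda i: leading_ones(pop[i]))
        if leading_ones(child) >= leading_ones(pop[worst]):
            pop[worst] = child                                       # replace the worst individual
        evals += 1
    return evals

# Compare the number of evaluations with recombination turned on and off.
print(run_ea(use_crossover=True), run_ea(use_crossover=False))
```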

Book ChapterDOI
06 Oct 2010
TL;DR: Approximation stability is introduced, and it is proved that AdaBoost has approximation stability and thus good generalization; an exponential bound for AdaBoost is also provided.
Abstract: Stability has been explored to study the performance of learning algorithms in recent years, and it has been shown that stability is sufficient for generalization and is sufficient and necessary for consistency of ERM in the general learning setting. Previous studies showed that AdaBoost has almost-everywhere uniform stability if the base learner has L1 stability. The L1 stability, however, is too restrictive, and we show that AdaBoost becomes a constant learner if the base learner is not a real-valued learner. Considering that AdaBoost is mostly successful as a classification algorithm, stability analysis for AdaBoost when the base learner is not a real-valued learner is an important yet unsolved problem. In this paper, we introduce approximation stability and prove that approximation stability is sufficient for generalization, and sufficient and necessary for learnability of AERM in the general learning setting. We prove that AdaBoost has approximation stability and thus has good generalization, and an exponential bound for AdaBoost is provided.

Posted Content
TL;DR: The S3VM-\emph{us} method is proposed, which uses hierarchical clustering to select the unlabeled instances and reduces the chance of performance degeneration of S3VMs.
Abstract: Semi-supervised support vector machines (S3VMs) are a popular kind of approach which tries to improve learning performance by exploiting unlabeled data. Though S3VMs have been found helpful in many situations, they may degenerate performance and the resultant generalization ability may be even worse than using the labeled data only. In this paper, we try to reduce the chance of performance degeneration of S3VMs. Our basic idea is that, rather than exploiting all unlabeled data, the unlabeled instances should be selected such that only the ones which are very likely to be helpful are exploited, while some highly risky unlabeled instances are avoided. We propose the S3VM-\emph{us} method by using hierarchical clustering to select the unlabeled instances. Experiments on a broad range of data sets over eighty-eight different settings show that the chance of performance degeneration of S3VM-\emph{us} is much smaller than that of existing S3VMs.
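A simplified stand-in (not the paper's exact selection rule) for using hierarchical clustering to pick unlabeled instances that look safe to exploit: cluster labeled and unlabeled data together and keep unlabeled points whose cluster contains labeled examples of only one class.

```python
# Simplified stand-in (not the paper's exact rule) for selecting unlabeled
# instances with hierarchical clustering: cluster labeled and unlabeled data
# together and keep only the unlabeled points whose cluster contains labeled
# examples of a single class, treating the rest as too risky to exploit.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def select_unlabeled(X_lab, y_lab, X_unlab, n_clusters=5):
    X = np.vstack([X_lab, X_unlab])
    clusters = fcluster(linkage(X, method="ward"), t=n_clusters, criterion="maxclust")
    lab_clusters, unlab_clusters = clusters[:len(X_lab)], clusters[len(X_lab):]
    keep = []
    for idx, c in enumerate(unlab_clusters):
        labels_here = set(y_lab[lab_clusters == c])
        if len(labels_here) == 1:        # all labeled points in this cluster share one class
            keep.append(idx)
    return np.array(keep, dtype=int)

rng = np.random.default_rng(0)
X_l = np.vstack([rng.normal(0, 1, size=(10, 2)), rng.normal(5, 1, size=(10, 2))])
y_l = np.array([0] * 10 + [1] * 10)
X_u = np.vstack([rng.normal(0, 1, size=(30, 2)), rng.normal(2.5, 1, size=(30, 2))])
print(len(select_unlabeled(X_l, y_l, X_u)))   # only the "safe" unlabeled points are kept
```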

Journal ArticleDOI
TL;DR: Theoretical analysis and simulation experiments show the superiority of the proposed aggregative-learning method; in this paradigm, every site maintains a local learner trained from its own data.
Abstract: Data on the Internet are scattered across different sites without deliberate arrangement, and are accumulated and updated frequently but not synchronously. It is infeasible to collect all the data together to train a global learner for prediction; even exchanging learners trained on different sites is costly. In this paper, aggregative-learning is proposed. In this paradigm, every site maintains a local learner trained from its own data. Upon receiving a request for prediction, an aggregative-learner at a local site activates and sends out many mobile agents taking the request to potential remote learners. The prediction of the aggregative-learner is made by combining the local prediction and the responses brought back by the agents. Theoretical analysis and simulation experiments show the superiority of the proposed method.
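A toy sketch of the prediction-combination step only: the local prediction is merged with whatever responses the agents bring back by weighted voting; the weights and the voting rule are assumptions, and the agent/network machinery itself is not modeled.

```python
# Toy sketch of the prediction-combination step in an aggregative-learning style
# setup: a local prediction is merged with whatever responses the mobile agents
# bring back from remote learners. Weights and the voting rule are assumptions.
from collections import defaultdict

def aggregate_prediction(local_pred, remote_responses, local_weight=2.0):
    """remote_responses: list of (predicted_label, weight) pairs, possibly empty
    if some agents fail to return before the deadline."""
    votes = defaultdict(float)
    votes[local_pred] += local_weight            # the local learner always gets a say
    for label, weight in remote_responses:
        votes[label] += weight
    return max(votes, key=votes.get)

# Local learner says "spam"; two of three contacted sites disagree strongly enough to flip it.
print(aggregate_prediction("spam", [("ham", 1.5), ("ham", 1.5), ("spam", 0.5)]))  # -> "ham"
```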