
Showing papers by Zhi-Hua Zhou published in 2010


Journal ArticleDOI
TL;DR: A statistical framework that generalizes the bag-of-words representation, in which the visual words are generated by a statistical process rather than a clustering algorithm, while the empirical performance remains competitive with clustering-based methods.
Abstract: The bag-of-words model is one of the most popular representation methods for object categorization. The key idea is to quantize each extracted key point into one of the visual words, and then represent each image by a histogram of the visual words. For this purpose, a clustering algorithm (e.g., K-means) is generally used for generating the visual words. Although a number of studies have shown encouraging results of the bag-of-words representation for object categorization, theoretical studies on properties of the bag-of-words model are almost untouched, possibly due to the difficulty introduced by using a heuristic clustering process. In this paper, we present a statistical framework which generalizes the bag-of-words representation. In this framework, the visual words are generated by a statistical process rather than using a clustering algorithm, while the empirical performance is competitive with clustering-based methods. A theoretical analysis based on statistical consistency is presented for the proposed framework. Moreover, based on the framework, we develop two algorithms which do not rely on clustering, while achieving competitive performance in object categorization when compared to clustering-based bag-of-words representations.
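As a rough illustration of the clustering-based baseline that the paper generalizes (not the proposed statistical framework), the sketch below builds a K-means vocabulary and a visual-word histogram; the descriptor dimensionality and vocabulary size are illustrative assumptions.

```python
# Minimal sketch of the clustering-based bag-of-words baseline (assumed setup,
# not the paper's statistical framework): quantize local descriptors with
# K-means and represent each image as a histogram over the visual words.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, n_words=200, seed=0):
    """Cluster descriptors from the training images into visual words."""
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(all_descriptors)

def bow_histogram(image_descriptors, vocabulary):
    """Map each descriptor to its nearest visual word and count occurrences."""
    words = vocabulary.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # normalize so images with different numbers of key points are comparable

# Toy usage with random "descriptors" standing in for SIFT-like key-point features.
rng = np.random.default_rng(0)
train_desc = rng.normal(size=(5000, 128))
vocab = build_vocabulary(train_desc, n_words=50)
print(bow_histogram(rng.normal(size=(300, 128)), vocab).shape)  # (50,)
```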

923 citations


Proceedings Article
06 Dec 2010
TL;DR: The proposed QUIRE approach provides a systematic way of measuring and combining the informativeness and representativeness of an unlabeled instance, and it is extended to multi-label learning by incorporating the correlation among labels and actively querying instance-label pairs.
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge by a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches.
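As a simplified illustration of combining the two criteria (not QUIRE's actual min-max formulation), the sketch below scores unlabeled instances with an uncertainty-based informativeness term plus a density-based representativeness term; the base model and the additive combination are assumptions.

```python
# Simplified illustration (not QUIRE's min-max formulation) of scoring unlabeled
# instances by both informativeness (prediction uncertainty) and
# representativeness (average similarity to the rest of the unlabeled pool).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

def select_query(X_lab, y_lab, X_unlab, beta=1.0):
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    informativeness = 1.0 - clf.predict_proba(X_unlab).max(axis=1)   # high when the model is unsure
    representativeness = rbf_kernel(X_unlab, X_unlab).mean(axis=1)   # high in dense regions of the pool
    scores = informativeness + beta * representativeness             # simple additive combination (assumption)
    return int(np.argmax(scores))                                    # index of the instance to query next

rng = np.random.default_rng(1)
X_lab = rng.normal(size=(10, 2))
y_lab = np.array([0, 1] * 5)                                         # both classes present
X_unlab = rng.normal(size=(100, 2))
print(select_query(X_lab, y_lab, X_unlab))
```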

518 citations


Journal ArticleDOI
TL;DR: An introduction to research advances in disagreement-based semi-supervised learning is provided, where multiple learners are trained for the task and the disagreements among the learners are exploited during the semi-supervised learning process.
Abstract: In many real-world tasks, there are abundant unlabeled examples but the number of labeled training examples is limited, because labeling the examples requires human effort and expertise. Thus, semi-supervised learning, which tries to exploit unlabeled examples to improve learning performance, has become a hot topic. Disagreement-based semi-supervised learning is an interesting paradigm, where multiple learners are trained for the task and the disagreements among the learners are exploited during the semi-supervised learning process. This survey article provides an introduction to research advances in this paradigm.
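A minimal co-training-style sketch of this paradigm, assuming two feature views, naive Bayes base learners, and an arbitrary confidence threshold; it only illustrates the "learners teach each other on unlabeled data" idea surveyed here, not any specific algorithm from the survey.

```python
# Minimal co-training-style sketch of disagreement-based semi-supervised learning
# (illustrative assumptions throughout): two learners trained on different feature
# views label confident unlabeled examples for each other.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X_lab, y_lab, X_unlab, view1, view2, rounds=5, conf_thresh=0.95):
    X1_l, X2_l, y = X_lab[:, view1], X_lab[:, view2], y_lab.copy()
    for _ in range(rounds):
        c1 = GaussianNB().fit(X1_l, y)
        c2 = GaussianNB().fit(X2_l, y)
        if len(X_unlab) == 0:
            break
        p1 = c1.predict_proba(X_unlab[:, view1])
        p2 = c2.predict_proba(X_unlab[:, view2])
        # Each learner teaches the other with its most confident predictions.
        pick = np.where((p1.max(axis=1) > conf_thresh) | (p2.max(axis=1) > conf_thresh))[0]
        if len(pick) == 0:
            break
        labels1 = c1.classes_[p1.argmax(axis=1)]
        labels2 = c2.classes_[p2.argmax(axis=1)]
        labels = np.where(p1.max(axis=1)[pick] >= p2.max(axis=1)[pick], labels1[pick], labels2[pick])
        X1_l = np.vstack([X1_l, X_unlab[pick][:, view1]])
        X2_l = np.vstack([X2_l, X_unlab[pick][:, view2]])
        y = np.concatenate([y, labels])
        X_unlab = np.delete(X_unlab, pick, axis=0)
    return c1, c2

# Toy usage: the two "views" are simulated by splitting the feature columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(int)
c1, c2 = co_train(X[:20], y[:20], X[20:], view1=[0, 1], view2=[2, 3])
```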

373 citations


Journal ArticleDOI
TL;DR: A multilabel dimensionality reduction method, MDDM, with two kinds of projection strategies, is proposed, attempting to project the original data into a lower-dimensional feature space that maximizes the dependence between the original feature description and the associated class labels.
Abstract: Multilabel learning deals with data associated with multiple labels simultaneously. Like other data mining and machine learning tasks, multilabel learning also suffers from the curse of dimensionality. Dimensionality reduction has been studied for many years; however, multilabel dimensionality reduction remains almost untouched. In this article, we propose a multilabel dimensionality reduction method, MDDM, with two kinds of projection strategies, attempting to project the original data into a lower-dimensional feature space maximizing the dependence between the original feature description and the associated class labels. Based on the Hilbert-Schmidt Independence Criterion, we derive an eigen-decomposition problem which enables the dimensionality reduction process to be efficient. Experiments validate the performance of MDDM.
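A rough sketch of an HSIC-style projection in the spirit of MDDM, assuming a linear kernel over the label vectors and a plain eigen-projection; this is not the paper's exact formulation or either of its two projection strategies.

```python
# Rough sketch of an HSIC-style multilabel dimensionality reduction (assumptions:
# linear kernel on the label vectors, plain eigen-projection; not the paper's
# exact formulation).
import numpy as np

def hsic_projection(X, Y, n_dims):
    """X: (n, d) feature matrix; Y: (n, q) binary label matrix; returns a (d, n_dims) projection."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    L = Y @ Y.T                                # linear kernel over label vectors
    M = X.T @ H @ L @ H @ X                    # dependence between features and labels
    eigvals, eigvecs = np.linalg.eigh(M)       # M is symmetric, so eigh is appropriate
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    return eigvecs[:, order[:n_dims]]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Y = (rng.random(size=(100, 5)) > 0.7).astype(float)
P = hsic_projection(X, Y, n_dims=3)
X_reduced = X @ P                              # project into the lower-dimensional space
print(X_reduced.shape)                         # (100, 3)
```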

346 citations


Journal ArticleDOI
01 Aug 2010
TL;DR: This article shows that the rescaling approach works well when the costs are consistent, while directly applying it to multi-class problems with inconsistent costs may not be a good choice; based on this recognition, the consistency of the costs should be examined before rescaling is applied.
Abstract: Rescaling is possibly the most popular approach to cost-sensitive learning. This approach works by rebalancing the classes according to their costs, and it can be realized in different ways, for example, re-weighting or resampling the training examples in proportion to their costs, moving the decision boundaries of classifiers away from high-cost classes in proportion to costs, etc. This approach is very effective in dealing with two-class problems, yet some studies showed that it is often not so helpful on multi-class problems. In this article, we try to explore why the rescaling approach is often helpless on multi-class problems. Our analysis discloses that the rescaling approach works well when the costs are consistent, while directly applying it to multi-class problems with inconsistent costs may not be a good choice. Based on this recognition, we advocate that before applying the rescaling approach, the consistency of the costs must be examined first. If the costs are consistent, the rescaling approach can be conducted directly; otherwise it is better to apply rescaling after decomposing the multi-class problem into a series of two-class problems. An empirical study involving 20 multi-class data sets and seven types of cost-sensitive learners validates our proposal. Moreover, we show that the proposal is also helpful for class-imbalance learning.
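A minimal sketch of rescaling by re-weighting the training examples in proportion to their costs, assuming an illustrative cost matrix; the consistency check and the two-class decomposition advocated in the paper are omitted.

```python
# Minimal sketch of rescaling via example re-weighting (assumed cost setup):
# each training example is weighted by the cost of misclassifying its true class,
# so the learner pays more attention to high-cost classes.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# cost[i][j] = cost of predicting class j when the true class is i (illustrative values)
cost = np.array([[0.0, 1.0, 5.0],
                 [1.0, 0.0, 2.0],
                 [10.0, 3.0, 0.0]])

def rescale_weights(y, cost):
    """Weight each example by the total misclassification cost of its true class."""
    class_cost = cost.sum(axis=1)           # how expensive it is to get class i wrong
    return class_cost[y]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = rng.integers(0, 3, size=300)
weights = rescale_weights(y, cost)
clf = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=weights)
print(clf.predict(X[:5]))
```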

234 citations


Proceedings Article
11 Jul 2010
TL;DR: The WELL (WEak Label Learning) method is proposed, which considers that the classification boundary for each label should go across low-density regions, and that each label generally has a much smaller number of positive examples than negative examples.
Abstract: Multi-label learning deals with data associated with multiple labels simultaneously. Previous work on multi-label learning assumes that the "full" label set associated with each training instance is given by users. In many applications, however, getting the full label set for each instance is difficult and only a "partial" set of labels is available. In such cases, the appearance of a label means that the instance is associated with this label, while the absence of a label does not imply that this label is not proper for the instance. We call this kind of problem the "weak label" problem. In this paper, we propose the WELL (WEak Label Learning) method to solve the weak label problem. We consider that the classification boundary for each label should go across low-density regions, and that each label generally has a much smaller number of positive examples than negative examples. The objective is formulated as a convex optimization problem which can be solved efficiently. Moreover, we exploit the correlation between labels by assuming that there is a group of low-rank base similarities, and that the appropriate similarities between instances for different labels can be derived from these base similarities. Experiments validate the performance of WELL.

189 citations


Journal ArticleDOI
TL;DR: A spatio-temporal shot graph, derived from a hypergraph that encodes the correlations with different attributes among multi-view video shots in hyperedges, is constructed, and the summarization problem is formulated as a graph labeling task whose result is generated based on shot importance evaluated using a Gaussian entropy fusion scheme.
Abstract: Previous video summarization studies focused on monocular videos, and the results would not be good if they were applied to multi-view videos directly, due to problems such as the redundancy in multiple views. In this paper, we present a method for summarizing multi-view videos. We construct a spatio-temporal shot graph and formulate the summarization problem as a graph labeling task. The spatio-temporal shot graph is derived from a hypergraph, which encodes the correlations with different attributes among multi-view video shots in hyperedges. We then partition the shot graph and identify clusters of event-centered shots with similar contents via random walks. The summarization result is generated by solving a multi-objective optimization problem based on shot importance evaluated using a Gaussian entropy fusion scheme. Different summarization objectives, such as minimum summary length and maximum information coverage, can be accomplished in the framework. Moreover, multi-level summarization can be achieved easily by configuring the optimization parameters. We also propose the multi-view storyboard and event board for presenting multi-view summaries. The storyboard naturally reflects correlations among multi-view summarized shots that describe the same important event. The event board serially assembles event-centered multi-view shots in temporal order. A single video summary, which facilitates quick browsing of the summarized multi-view video, can be easily generated based on the event board representation.

183 citations


Proceedings Article
21 Jun 2010
TL;DR: In this analysis the co-training process is viewed as a combinative label propagation over two views; this provides a possibility to bring the graph-based and disagreement-based semi-supervised methods into a unified framework.
Abstract: In this paper, we present a new analysis on co-training, a representative paradigm of disagreement-based semi-supervised learning methods. In our analysis the co-training process is viewed as a combinative label propagation over two views; this provides a possibility to bring the graph-based and disagreement-based semi-supervised methods into a unified framework. With the analysis we get some insight that has not been disclosed by previous theoretical studies. In particular, we provide the sufficient and necessary condition for co-training to succeed. We also discuss the relationship to previous theoretical results and give some other interesting implications of our results, such as combination of weight matrices and view split.

182 citations


Journal ArticleDOI
TL;DR: A framework is proposed which formulates the face recognition problem as a multiclass cost-sensitive learning task, and two theoretically sound methods for this task are developed.
Abstract: Most traditional face recognition systems attempt to achieve a low recognition error rate, implicitly assuming that the losses of all misclassifications are the same. In this paper, we argue that this is far from a reasonable setting because, in almost all application scenarios of face recognition, different kinds of mistakes will lead to different losses. For example, it would be troublesome if a door locker based on a face recognition system misclassified a family member as a stranger such that she/he was not allowed to enter the house, but it would be a much more serious disaster if a stranger was misclassified as a family member and allowed to enter the house. We propose a framework which formulates the face recognition problem as a multiclass cost-sensitive learning task, and develop two theoretically sound methods for this task. Experimental results demonstrate the effectiveness and efficiency of the proposed methods.
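A small sketch of the generic cost-sensitive decision rule such a framework builds on: predict the class with minimum expected cost given posterior estimates. The classes and cost values below are illustrative assumptions, not the paper's two methods.

```python
# Hedged sketch of the generic cost-sensitive decision rule that such a framework
# builds on: choose the class minimizing expected cost rather than maximizing
# posterior probability. Cost values below are illustrative assumptions.
import numpy as np

# Classes: 0 = family member A, 1 = family member B, 2 = stranger.
# cost[i][j] = loss of predicting j when the truth is i.
cost = np.array([[0.0,  1.0,  5.0],   # locking out a family member is bad...
                 [1.0,  0.0,  5.0],
                 [50.0, 50.0, 0.0]])  # ...but letting a stranger in is far worse

def min_expected_cost_predict(posteriors, cost):
    """posteriors: (n, K) class-probability estimates; returns cost-sensitive predictions."""
    expected_cost = posteriors @ cost      # (n, K): expected loss of each possible decision
    return expected_cost.argmin(axis=1)

p = np.array([[0.45, 0.10, 0.45],          # ambiguous face: a plain error-rate rule would pick class 0
              [0.05, 0.05, 0.90]])
print(min_expected_cost_predict(p, cost))   # the ambiguous case resolves to "stranger" to avoid the big risk
```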

155 citations


Journal ArticleDOI
TL;DR: This special issue gathers the state-of-the-art research in social learning and is devoted to exhibiting some of the best representative works in this area.
Abstract: In recent years, social behavioral data have been exponentially expanding due to the tremendous success of various outlets on the social Web (aka Web 2.0) such as Facebook, Digg, Twitter, Wikipedia, and Delicious. As a result, there's a need for social learning to support the discovery, analysis, and modeling of human social behavioral data. The goal is to discover social intelligence, which encompasses a spectrum of knowledge that characterizes human interaction, communication, and collaborations. The social Web has thus become a fertile ground for machine learning and data mining research. This special issue gathers the state-of-the-art research in social learning and is devoted to exhibiting some of the best representative works in this area.

132 citations


Proceedings ArticleDOI
26 Oct 2010
TL;DR: This paper provides an evolving trajectory outlier detection method, named TOP-EYE, which continuously computes the outlying score for each trajectory in an accumulating way, and introduces a decay function to mitigate the influence of the past trajectories on the evolving outlying scores.
Abstract: The increasing availability of large-scale location traces creates unprecedented opportunities to change the paradigm for identifying abnormal moving activities. Indeed, various aspects of abnormality of moving patterns have recently been exploited, such as wrong direction and wandering. However, there is no recognized way of combining different aspects into a unified evolving abnormality score which has the ability to capture the evolving nature of abnormal moving trajectories. To that end, in this paper, we provide an evolving trajectory outlier detection method, named TOP-EYE, which continuously computes the outlying score for each trajectory in an accumulating way. Specifically, in TOP-EYE, we introduce a decay function to mitigate the influence of the past trajectories on the evolving outlying score, which is defined based on the evolving moving direction and density of trajectories. This decay function enables the evolving computation of accumulated outlying scores along the trajectories. An advantage of TOP-EYE is that it identifies evolving outliers at a very early stage with a relatively low false alarm rate. Finally, experimental results on real-world location traces show that TOP-EYE can effectively capture evolving abnormal trajectories.
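A toy sketch of the accumulate-with-decay idea; the per-point outlierness values and the decay constant are placeholders, since the paper defines outlierness from moving direction and trajectory density.

```python
# Toy sketch of an evolving, decayed outlying score in the spirit of TOP-EYE:
# the score accumulates along a trajectory while a decay factor down-weights the
# influence of older points. The per-point outlierness below is a placeholder;
# the paper derives it from moving direction and trajectory density.
def evolving_outlying_scores(point_outlierness, decay=0.9):
    """point_outlierness: per-point anomaly evidence along one trajectory."""
    scores, acc = [], 0.0
    for o in point_outlierness:
        acc = decay * acc + o          # old evidence fades, new evidence accumulates
        scores.append(acc)
    return scores

# A trajectory that behaves normally, then turns abnormal: the score rises early
# in the abnormal stretch instead of waiting for the full trajectory.
evidence = [0.1, 0.1, 0.1, 0.8, 0.9, 0.9]
print([round(s, 2) for s in evolving_outlying_scores(evidence)])
```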

Proceedings Article
11 Jul 2010
TL;DR: Experimental results on two aging face databases show remarkable advantages of the proposed label distribution learning algorithms over the compared single-label learning algorithms, either specially designed for age estimation or for general purpose.
Abstract: One of the main difficulties in facial age estimation is the lack of sufficient training data for many ages. Fortunately, the faces at close ages look similar since aging is a slow and smooth process. Inspired by this observation, in this paper, instead of considering each face image as an example with one label (age), we regard each face image as an example associated with a label distribution. The label distribution covers a number of class labels, representing the degree to which each label describes the example. In this way, in addition to the real age, one face image can also contribute to the learning of its adjacent ages. We propose an algorithm named IIS-LLD for learning from the label distributions, which is an iterative optimization process based on the maximum entropy model. Experimental results show the advantages of IIS-LLD over the traditional learning methods based on single-labeled data.
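A small sketch of turning a single age label into a label distribution over neighboring ages, assuming a discretized Gaussian; the width parameter is illustrative, and the IIS-LLD optimization itself is not shown.

```python
# Small sketch of converting a single age label into a label distribution over
# adjacent ages (assumed discretized Gaussian; the paper's IIS-LLD training
# procedure based on the maximum entropy model is not shown here).
import numpy as np

def age_label_distribution(true_age, min_age=0, max_age=80, sigma=2.0):
    ages = np.arange(min_age, max_age + 1)
    weights = np.exp(-0.5 * ((ages - true_age) / sigma) ** 2)
    return ages, weights / weights.sum()   # degrees sum to one across all class labels

ages, dist = age_label_distribution(true_age=25)
top = np.argsort(dist)[::-1][:5]
print([(int(ages[i]), round(float(dist[i]), 3)) for i in sorted(top)])
# Ages 23..27 all receive non-negligible degrees, so one face image also
# contributes to the learning of its neighboring ages.
```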

Book ChapterDOI
20 Sep 2010
TL;DR: Without using any density or distance measure, a new method called SCiForest is proposed to detect clustered anomalies; it maintains the ability of existing methods to detect scattered anomalies, and it has superior time and space complexities compared with existing distance-based and density-based methods.
Abstract: Detecting local clustered anomalies is an intricate problem for many existing anomaly detection methods. Distance-based and density-based methods are inherently restricted by their basic assumptions: anomalies are either far from normal points or sparse. Clustered anomalies are able to avoid detection since they defy these assumptions by being dense and, in many cases, in close proximity to normal instances. In this paper, without using any density or distance measure, we propose a new method called SCiForest to detect clustered anomalies. SCiForest separates clustered anomalies from normal points effectively even when clustered anomalies are very close to normal points. It maintains the ability of existing methods to detect scattered anomalies, and it has superior time and space complexities compared with existing distance-based and density-based methods.

Proceedings Article
11 Jul 2010
TL;DR: The CS4VM (Cost-Sensitive Semi-Supervised Support Vector Machine) is proposed, which closely approximates the supervised cost-sensitive SVM that has access to the ground-truth labels of all the unlabeled data.
Abstract: In this paper, we study cost-sensitive semi-supervised learning where many of the training examples are unlabeled and different misclassification errors are associated with unequal costs. This scenario occurs in many real-world applications. For example, in some disease diagnoses, the cost of erroneously diagnosing a patient as healthy is much higher than that of diagnosing a healthy person as a patient. Also, the acquisition of labeled data requires medical diagnosis, which is expensive, while the collection of unlabeled data such as basic health information is much cheaper. We propose the CS4VM (Cost-Sensitive Semi-Supervised Support Vector Machine) to address this problem. We show that the CS4VM, when given the label means of the unlabeled data, closely approximates the supervised cost-sensitive SVM that has access to the ground-truth labels of all the unlabeled data. This observation leads to an efficient algorithm which first estimates the label means and then trains the CS4VM with the plug-in label means by an efficient SVM solver. Experiments on a broad range of data sets show that the proposed method is capable of reducing the total cost and is computationally efficient.

Book ChapterDOI
07 Apr 2010
TL;DR: From the view of multi-information, the ensemble diversity can be decomposed over the component classifiers constituting the ensemble, and an approximation is given for estimating the diversity in practice.
Abstract: Understanding ensemble diversity is one of the most important fundamental issues in ensemble learning. Inspired by a recent work trying to explain ensemble diversity from the information theoretic perspective, in this paper we study the ensemble diversity from the view of multi-information. We show that from this view, the ensemble diversity can be decomposed over the component classifiers constituting the ensemble. Based on this formulation, an approximation is given for estimating the diversity in practice. Experimental results show that our formulation and approximation are promising.

Journal ArticleDOI
TL;DR: The main contributions of the paper—the subset search strategy (GASEN) introduced in Section 3 after Eqs.

Book ChapterDOI
30 Aug 2010
TL;DR: Many manifold clustering approaches require knowing the number of clusters and the intrinsic dimensions of the manifolds in advance, which is hard for the user to provide in practice; the proposed mumCluster approach addresses such issues, and experimental results show that its performance is encouraging.
Abstract: Manifold clustering, which regards clusters as groups of points around compact manifolds, has been realized as a promising generalization of traditional clustering. A number of linear or nonlinear manifold clustering approaches have been developed recently. Although they have attained better performance than traditional clustering methods in many scenarios, most of these approaches suffer from two weaknesses. First, when the data are drawn from hybrid modeling, i.e., some data manifolds are separated but some are intersected, existing approaches do not work well, although hybrid modeling often appears in real data. Second, many approaches require knowing the number of clusters and the intrinsic dimensions of the manifolds in advance, while it is hard for the user to provide such information in practice. In this paper, we propose a new manifold clustering approach, mumCluster, to address these issues. Experimental results show that the performance of the proposed mumCluster approach is encouraging.

Journal ArticleDOI
TL;DR: A lower bound of GLRAM's objective function is derived, which answers an open problem raised by Ye (Machine Learning, 2005), and the paper explores when and why GLRAM can perform well in terms of compression, a fundamental problem concerning the usability of GLRAM.
Abstract: Compared to singular value decomposition (SVD), generalized low-rank approximations of matrices (GLRAM) can consume less computation time, obtain a higher compression ratio, and yield competitive classification performance. GLRAM has been successfully applied to applications such as image compression and retrieval, and quite a few extensions have been successively proposed. However, in the literature, some basic properties and crucial problems with regard to GLRAM have not been explored or solved yet. For this sake, we revisit GLRAM in this paper. First, we reveal such a close relationship between GLRAM and SVD that GLRAM's objective function is identical to SVD's objective function except for the imposed constraints. Second, we derive a lower bound of GLRAM's objective function, and discuss when the lower bound can be attained. Moreover, from the viewpoint of minimizing the lower bound, we answer an open problem raised by Ye (Machine Learning, 2005), i.e., we give a theoretical justification of the experimental phenomenon that, for a given number of reduced dimensions, the lowest reconstruction error is obtained when the left and right transformations have an equal number of columns. Third, we explore when and why GLRAM can perform well in terms of compression, which is a fundamental problem concerning the usability of GLRAM.
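A sketch of the commonly described alternating procedure for GLRAM's objective (maximize the sum of ||L^T A_i R||_F^2 over column-orthonormal L and R); the initialization, dimensions, and iteration count are illustrative assumptions.

```python
# Sketch of the standard alternating procedure for GLRAM (as commonly described
# for Ye's formulation): find column-orthonormal L and R maximizing
# sum_i ||L^T A_i R||_F^2, i.e. minimizing the reconstruction error.
import numpy as np

def top_eigvecs(S, k):
    vals, vecs = np.linalg.eigh(S)              # S is symmetric positive semidefinite
    return vecs[:, np.argsort(vals)[::-1][:k]]  # eigenvectors of the k largest eigenvalues

def glram(As, l1, l2, n_iter=20):
    r, c = As[0].shape
    R = np.eye(c)[:, :l2]                       # common simple initialization
    for _ in range(n_iter):
        L = top_eigvecs(sum(A @ R @ R.T @ A.T for A in As), l1)
        R = top_eigvecs(sum(A.T @ L @ L.T @ A for A in As), l2)
    Ms = [L.T @ A @ R for A in As]              # compressed representations
    return L, R, Ms

rng = np.random.default_rng(0)
As = [rng.normal(size=(32, 32)) for _ in range(10)]
L, R, Ms = glram(As, l1=5, l2=5)
recon_err = sum(np.linalg.norm(A - L @ M @ R.T) ** 2 for A, M in zip(As, Ms))
print(round(recon_err, 2))
```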

Proceedings ArticleDOI
25 Jul 2010
TL;DR: The CISVM method, a support vector machine, is proposed to work with cost interval information; when additional cost distribution information is available, 60% more risk can be reduced compared with the standard cost-sensitive SVM which assumes the expected cost is the true value.
Abstract: Existing cost-sensitive learning methods require that the unequal misclassification costs be given as precise values. In many real-world applications, however, it is generally difficult to have a precise cost value since the user may only know that one type of mistake is much more severe than another type, yet it is infeasible to give a precise description. In such situations, it is more meaningful to work with a cost interval instead of a precise cost value. In this paper we report the first study along this direction. We propose the CISVM method, a support vector machine, to work with cost interval information. Experiments show that when only cost intervals are available, CISVM is significantly superior to standard cost-sensitive SVMs using any of the minimal cost, mean cost and maximal cost to learn. Moreover, considering that in some cases other information about costs can be obtained in addition to cost intervals, such as the distribution of costs, we propose a general approach, CODIS, for using the distribution information to help improve performance. Experiments show that this approach can reduce 60% more risk than the standard cost-sensitive SVM which assumes the expected cost is the true value.
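A sketch of the baselines mentioned above rather than CISVM itself: standard cost-sensitive SVMs trained with the minimal, mean, or maximal value taken from a cost interval; the interval, the data, and the class-weight encoding are illustrative assumptions.

```python
# Sketch of the baselines the paper compares against (not CISVM itself):
# standard cost-sensitive SVMs trained with the minimal, mean, or maximal value
# picked from the cost interval. Interval and data below are illustrative.
import numpy as np
from sklearn.svm import SVC

cost_interval = (2.0, 10.0)                      # misclassifying class 1 costs somewhere in [2, 10]
candidates = {"min": cost_interval[0],
              "mean": sum(cost_interval) / 2.0,
              "max": cost_interval[1]}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

models = {}
for name, c1 in candidates.items():
    # class_weight scales the penalty of errors on class 1 relative to class 0
    models[name] = SVC(kernel="rbf", class_weight={0: 1.0, 1: c1}).fit(X, y)
print({name: int(m.predict(X).sum()) for name, m in models.items()})
```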

Posted Content
TL;DR: It is proved that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be O(log 1/e), contrasting to single-view setting where the polynomial improvement is the best possible achievement.
Abstract: The sample complexity of active learning under the realizability assumption has been well-studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be $\widetilde{O}(\log\frac{1}{\epsilon})$, contrasting to single-view setting where the polynomial improvement is the best possible achievement. We also prove that in general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is $\widetilde{O}(\frac{1}{\epsilon})$, where the order of $1/\epsilon$ is independent of the parameter in Tsybakov noise, contrasting to previous polynomial bounds where the order of $1/\epsilon$ is related to the parameter in Tsybakov noise.

Proceedings Article
06 Dec 2010
TL;DR: In this article, the authors theoretically characterize the sample complexity of active learning in the non-realizable case under multi-view setting, and show that with unbounded Tsybakov noise, sample complexity can be O(log 1/e), contrasting to single-view settings where the polynomial improvement is the best possible achievement.
Abstract: The sample complexity of active learning under the realizability assumption has been well-studied. The realizability assumption, however, rarely holds in practice. In this paper, we theoretically characterize the sample complexity of active learning in the non-realizable case under multi-view setting. We prove that, with unbounded Tsybakov noise, the sample complexity of multi-view active learning can be O(log 1/e), contrasting to single-view setting where the polynomial improvement is the best possible achievement. We also prove that in general multi-view setting the sample complexity of active learning with unbounded Tsybakov noise is O(1/e), where the order of 1/e is independent of the parameter in Tsybakov noise, contrasting to previous polynomial bounds where the order of 1/e is related to the parameter in Tsybakov noise.

Posted Content
TL;DR: The previous empirical Bernstein bounds of Maurer and Pontil are improved, and by incorporating factors such as average margin and variance, a generalization error bound that is heavily related to the whole margin distribution is presented.
Abstract: Margin theory provides one of the most popular explanations to the success of \texttt{AdaBoost}, where the central point lies in the recognition that \textit{margin} is the key for characterizing the performance of \texttt{AdaBoost}. This theory has been very influential, e.g., it has been used to argue that \texttt{AdaBoost} usually does not overfit since it tends to enlarge the margin even after the training error reaches zero. Previously the \textit{minimum margin bound} was established for \texttt{AdaBoost}; however, \cite{Breiman1999} pointed out that maximizing the minimum margin does not necessarily lead to a better generalization. Later, \cite{Reyzin:Schapire2006} emphasized that the margin distribution rather than minimum margin is crucial to the performance of \texttt{AdaBoost}. In this paper, we first present the \textit{$k$th margin bound} and further study its relationship to previous work such as the minimum margin bound and Emargin bound. Then, we improve the previous empirical Bernstein bounds \citep{Maurer:Pontil2009,Audibert:Munos:Szepesvari2009}, and based on such findings, we defend the margin-based explanation against Breiman's doubts by proving a new generalization error bound that considers exactly the same factors as \cite{Schapire:Freund:Bartlett:Lee1998} but is sharper than \cite{Breiman1999}'s minimum margin bound. By incorporating factors such as average margin and variance, we present a generalization error bound that is heavily related to the whole margin distribution. We also provide margin distribution bounds for the generalization error of voting classifiers in finite VC-dimension space.

Proceedings Article
11 Jul 2010
TL;DR: An effective model is proposed and an efficient algorithm is developed for the multi-instance dimensionality reduction problem: the objective considers orthonormality and sparsity constraints in the projection matrix for dimensionality reduction, and is solved by gradient descent along the tangent space of the orthonormal matrices.
Abstract: Multi-instance learning deals with problems that treat bags of instances as training examples. In single-instance learning problems, dimensionality reduction is an essential step for high-dimensional data analysis and has been studied for years. The curse of dimensionality also exists in multi-instance learning tasks, yet dimensionality reduction for multi-instance learning has not been studied before. Direct application of existing single-instance dimensionality reduction objectives to multi-instance learning tasks may not work well since it ignores the characteristic of multi-instance learning that the labels of bags are known while the labels of instances are unknown. In this paper, we propose an effective model and develop an efficient algorithm to solve the multi-instance dimensionality reduction problem. We formulate the objective as an optimization problem by considering orthonormality and sparsity constraints in the projection matrix for dimensionality reduction, and then solve it by gradient descent along the tangent space of the orthonormal matrices. We also propose an approximation for improving the efficiency. Experimental results validate the effectiveness of the proposed method.
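A generic sketch of one gradient step constrained to orthonormal projection matrices (tangent-space projection followed by a QR retraction); the objective used here is a placeholder, not the paper's multi-instance criterion or its sparsity term.

```python
# Generic sketch of a gradient step over orthonormal projection matrices
# (project the Euclidean gradient onto the tangent space of the Stiefel manifold,
# take a step, then retract with a QR decomposition). The objective below is a
# placeholder; the paper optimizes a multi-instance criterion with sparsity.
import numpy as np

def stiefel_step(P, euclid_grad, lr=0.1):
    """P: (d, k) with orthonormal columns; returns an updated orthonormal P."""
    sym = (P.T @ euclid_grad + euclid_grad.T @ P) / 2.0
    tangent_grad = euclid_grad - P @ sym          # remove the component leaving the manifold
    Q, _ = np.linalg.qr(P + lr * tangent_grad)    # QR retraction back onto orthonormal matrices
    return Q[:, :P.shape[1]]

rng = np.random.default_rng(0)
d, k = 10, 3
P = np.linalg.qr(rng.normal(size=(d, k)))[0]      # random orthonormal starting point
C = rng.normal(size=(d, d)); C = C @ C.T          # placeholder objective: maximize tr(P^T C P)
for _ in range(50):
    P = stiefel_step(P, 2.0 * C @ P, lr=0.05)     # gradient ascent step on the placeholder objective
print(np.allclose(P.T @ P, np.eye(k), atol=1e-6)) # columns remain orthonormal
```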

Proceedings ArticleDOI
13 Dec 2010
TL;DR: Experiments show that UDEED can effectively utilize unlabeled data for ensemble learning and is highly competitive to well-established semi-supervised ensemble methods.
Abstract: Ensemble learning aims to improve generalization ability by using multiple base learners. It is well-known that to construct a good ensemble, the base learners should be accurate as well as diverse. In this paper, unlabeled data is exploited to facilitate ensemble learning by helping augment the diversity among the base learners. Specifically, a semi-supervised ensemble method named UDEED is proposed. Unlike existing semi-supervised ensemble methods where error-prone pseudo-labels are estimated for unlabeled data to enlarge the labeled data to improve accuracy, UDEED works by maximizing accuracies of base learners on labeled data while maximizing diversity among them on unlabeled data. Experiments show that UDEED can effectively utilize unlabeled data for ensemble learning and is highly competitive to well-established semi-supervised ensemble methods.
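A rough sketch of the kind of objective described: empirical loss on labeled data minus a diversity term measured by pairwise disagreement on unlabeled data; the concrete losses and trade-off parameter are assumptions, not UDEED's exact formulation.

```python
# Rough sketch of an UDEED-style objective (assumed form): fit base learners to
# the labeled data while encouraging them to disagree on the unlabeled data.
# The 0/1 losses and the trade-off parameter below are illustrative.
import numpy as np

def ensemble_objective(preds_labeled, y, preds_unlabeled, lam=0.5):
    """
    preds_labeled:   (m, n_l) predictions in {-1,+1} of m base learners on labeled data
    preds_unlabeled: (m, n_u) predictions in {-1,+1} of m base learners on unlabeled data
    Lower is better: labeled error should shrink, and so should agreement on unlabeled data.
    """
    labeled_error = np.mean(preds_labeled != y)           # average 0/1 loss over learners
    m = preds_unlabeled.shape[0]
    agree, pairs = 0.0, 0
    for i in range(m):
        for j in range(i + 1, m):
            agree += np.mean(preds_unlabeled[i] == preds_unlabeled[j])
            pairs += 1
    diversity = 1.0 - agree / pairs                       # disagreement among base learners
    return labeled_error - lam * diversity

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=20)
pl = np.tile(y, (3, 1))                                   # three learners, perfect on labeled data
pu_same = np.tile(rng.choice([-1, 1], size=50), (3, 1))   # identical predictions on unlabeled data
pu_diverse = rng.choice([-1, 1], size=(3, 50))            # disagreeing predictions on unlabeled data
print(ensemble_objective(pl, y, pu_same), ensemble_objective(pl, y, pu_diverse))
# The diverse ensemble attains the lower (better) objective value.
```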


Journal ArticleDOI
01 Nov 2010
TL;DR: This paper formulates the Positive Class Expansion with single Snapshot (PCES) problem and proposes a framework which involves the incorporation of desirable biases based on user preferences, solving the problem by the Stochastic Gradient Boosting with Double Target approach.
Abstract: In many real-world data mining tasks, the connotation of the target concept may change as time goes by. For example, the connotation of “learned knowledge” of a student today may be different from his/her “learned knowledge” tomorrow, since the “learned knowledge” of the student is expanding every day. In order to learn a model capable of making accurate predictions, the evolution of the concept must be considered, and thus, a series of data sets collected at different times is needed. In many tasks, however, there is only a single data set instead of a series of data sets. In other words, only a single snapshot of the data along the time axis is available. In this paper, we formulate the Positive Class Expansion with single Snapshot (PCES) problem and discuss its difference from existing problem settings. To show that this new problem is addressable, we propose a framework which involves the incorporation of desirable biases based on user preferences. The resulting optimization problem is solved by the Stochastic Gradient Boosting with Double Target approach, which achieves encouraging performance on PCES problems in experiments.

Book ChapterDOI
11 Sep 2010
TL;DR: A general approach is presented which allows one to compare the runtime of an EA with recombination turned on and off, and thus helps to understand when a recombination operator works.
Abstract: Recombination (also called crossover) operators are widely used in EAs to generate offspring solutions. Although the usefulness of recombination has been well recognized, theoretical analysis of recombination operators remains a hard problem due to the irregularity of the operators and their complicated interactions with mutation operators. In this paper, as a step towards analyzing recombination operators theoretically, we present a general approach which allows one to compare the runtime of an EA with the recombination turned on and off, and thus helps to understand when a recombination operator works. The key of our approach is the Markov Chain Switching Theorem, which compares two Markov chains for the first hit of the target. As an illustration, we analyze some recombination operators in evolutionary search on the LeadingOnes problem using the proposed approach. The analysis identifies some insight into the choice of recombination operators, which is then verified in experiments.
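A toy sketch of the analyzed setting: the LeadingOnes fitness and a simple EA whose one-point crossover can be switched on or off; the population size, mutation rate, and crossover operator are illustrative assumptions rather than the paper's exact algorithms.

```python
# Toy sketch of the analyzed setting: the LeadingOnes fitness and a simple EA
# whose one-point crossover can be switched on or off. Population size, mutation
# rate, and the specific crossover are illustrative assumptions.
import random

def leading_ones(x):
    """Number of consecutive 1-bits counted from the left."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def run_ea(n=30, pop_size=10, use_crossover=True, max_evals=20000, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    evals = pop_size
    while evals < max_evals and max(map(leading_ones, pop)) < n:
        p1, p2 = rng.sample(pop, 2)
        if use_crossover:
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]                              # one-point crossover
        else:
            child = p1[:]
        child = [b ^ (rng.random() < 1.0 / n) for b in child]        # standard bit-flip mutation
        worst = min(range(pop_size), key=lambda i: leading_ones(pop[i]))
        if leading_ones(child) >= leading_ones(pop[worst]):
            pop[worst] = child                                       # replace the worst individual
        evals += 1
    return evals

# Compare the number of evaluations with recombination turned on and off.
print(run_ea(use_crossover=True), run_ea(use_crossover=False))
```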

Book ChapterDOI
06 Oct 2010
TL;DR: Approximation stability is introduced, and it is proved that AdaBoost has approximation stability and thus good generalization; an exponential bound for AdaBoost is also provided.
Abstract: Stability has been explored to study the performance of learning algorithms in recent years, and it has been shown that stability is sufficient for generalization and is sufficient and necessary for consistency of ERM in the general learning setting. Previous studies showed that AdaBoost has almost-everywhere uniform stability if the base learner has L1 stability. The L1 stability, however, is too restrictive, and we show that AdaBoost becomes a constant learner if the base learner is not a real-valued learner. Considering that AdaBoost is mostly successful as a classification algorithm, stability analysis for AdaBoost when the base learner is not a real-valued learner is an important yet unsolved problem. In this paper, we introduce approximation stability and prove that approximation stability is sufficient for generalization, and sufficient and necessary for learnability of AERM in the general learning setting. We prove that AdaBoost has approximation stability and thus has good generalization, and an exponential bound for AdaBoost is provided.

Posted Content
TL;DR: The S3VM-\emph{us} method is proposed, which uses hierarchical clustering to select the unlabeled instances and reduces the chance of performance degeneration of S3VMs.
Abstract: Semi-supervised support vector machines (S3VMs) are a popular kind of approach which tries to improve learning performance by exploiting unlabeled data. Though S3VMs have been found helpful in many situations, they may degenerate performance and the resultant generalization ability may be even worse than using the labeled data only. In this paper, we try to reduce the chance of performance degeneration of S3VMs. Our basic idea is that, rather than exploiting all unlabeled data, the unlabeled instances should be selected such that only the ones which are very likely to be helpful are exploited, while some highly risky unlabeled instances are avoided. We propose the S3VM-\emph{us} method by using hierarchical clustering to select the unlabeled instances. Experiments on a broad range of data sets over eighty-eight different settings show that the chance of performance degeneration of S3VM-\emph{us} is much smaller than that of existing S3VMs.
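A simplified stand-in (not the paper's exact selection rule) for using hierarchical clustering to pick unlabeled instances that look safe to exploit: cluster labeled and unlabeled data together and keep unlabeled points whose cluster contains labeled examples of only one class.

```python
# Simplified stand-in (not the paper's exact rule) for selecting unlabeled
# instances with hierarchical clustering: cluster labeled and unlabeled data
# together and keep only the unlabeled points whose cluster contains labeled
# examples of a single class, treating the rest as too risky to exploit.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def select_unlabeled(X_lab, y_lab, X_unlab, n_clusters=5):
    X = np.vstack([X_lab, X_unlab])
    clusters = fcluster(linkage(X, method="ward"), t=n_clusters, criterion="maxclust")
    lab_clusters, unlab_clusters = clusters[:len(X_lab)], clusters[len(X_lab):]
    keep = []
    for idx, c in enumerate(unlab_clusters):
        labels_here = set(y_lab[lab_clusters == c])
        if len(labels_here) == 1:        # all labeled points in this cluster share one class
            keep.append(idx)
    return np.array(keep, dtype=int)

rng = np.random.default_rng(0)
X_l = np.vstack([rng.normal(0, 1, size=(10, 2)), rng.normal(5, 1, size=(10, 2))])
y_l = np.array([0] * 10 + [1] * 10)
X_u = np.vstack([rng.normal(0, 1, size=(30, 2)), rng.normal(2.5, 1, size=(30, 2))])
print(len(select_unlabeled(X_l, y_l, X_u)))   # only the "safe" unlabeled points are kept
```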

Journal ArticleDOI
TL;DR: Theoretical analysis and simulation experiments show the superiority of the proposed aggregative-learning method; in this paradigm, every site maintains a local learner trained from its own data.
Abstract: Data on the Internet are scattered across different sites without deliberate arrangement, and are accumulated and updated frequently but not synchronously. It is infeasible to collect all the data together to train a global learner for prediction; even exchanging learners trained on different sites is costly. In this paper, aggregative-learning is proposed. In this paradigm, every site maintains a local learner trained from its own data. Upon receiving a request for prediction, an aggregative-learner at a local site activates and sends out many mobile agents taking the request to potential remote learners. The prediction of the aggregative-learner is made by combining the local prediction and the responses brought back by the agents. Theoretical analysis and simulation experiments show the superiority of the proposed method.
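A toy sketch of the prediction-combination step only: the local prediction is merged with whatever responses the agents bring back by weighted voting; the weights and the voting rule are assumptions, and the agent/network machinery itself is not modeled.

```python
# Toy sketch of the prediction-combination step in an aggregative-learning style
# setup: a local prediction is merged with whatever responses the mobile agents
# bring back from remote learners. Weights and the voting rule are assumptions.
from collections import defaultdict

def aggregate_prediction(local_pred, remote_responses, local_weight=2.0):
    """remote_responses: list of (predicted_label, weight) pairs, possibly empty
    if some agents fail to return before the deadline."""
    votes = defaultdict(float)
    votes[local_pred] += local_weight            # the local learner always gets a say
    for label, weight in remote_responses:
        votes[label] += weight
    return max(votes, key=votes.get)

# Local learner says "spam"; two of three contacted sites disagree strongly enough to flip it.
print(aggregate_prediction("spam", [("ham", 1.5), ("ham", 1.5), ("spam", 0.5)]))  # -> "ham"
```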