
Showing papers on "Semi-supervised learning published in 2006"


Journal Article
TL;DR: A semi-supervised framework that incorporates labeled and unlabeled data in a general-purpose learner is proposed, and properties of reproducing kernel Hilbert spaces are used to prove new Representer theorems that provide a theoretical basis for the algorithms.
Abstract: We propose a family of learning algorithms based on a new form of regularization that allows us to exploit the geometry of the marginal distribution. We focus on a semi-supervised framework that incorporates labeled and unlabeled data in a general-purpose learner. Some transductive graph learning algorithms and standard methods including support vector machines and regularized least squares can be obtained as special cases. We use properties of reproducing kernel Hilbert spaces to prove new Representer theorems that provide a theoretical basis for the algorithms. As a result (in contrast to purely graph-based approaches) we obtain a natural out-of-sample extension to novel examples and so are able to handle both transductive and truly semi-supervised settings. We present experimental evidence suggesting that our semi-supervised algorithms are able to use unlabeled data effectively. Finally, we briefly discuss unsupervised and fully supervised learning within our general framework.

3,919 citations
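
A minimal sketch of Laplacian Regularized Least Squares, one special case of the framework described above. It assumes an RBF kernel, a k-nearest-neighbor graph, and numpy/scikit-learn; `labeled` is an index array into `X`, and the hyperparameters are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import kneighbors_graph

def laprls(X, y_labeled, labeled, gamma_A=1e-2, gamma_I=1e-2, k=5):
    """Laplacian RLS sketch: fit f from labeled + unlabeled points.

    X: all points (labeled and unlabeled); labeled: index array into X;
    y_labeled: targets for X[labeled]."""
    n = X.shape[0]
    l = len(labeled)
    K = rbf_kernel(X)                              # kernel over all points
    W = kneighbors_graph(X, k, mode='connectivity', include_self=False)
    W = 0.5 * (W + W.T).toarray()                  # symmetrized adjacency
    L = np.diag(W.sum(axis=1)) - W                 # graph Laplacian
    J = np.zeros((n, n))
    J[labeled, labeled] = 1.0                      # selects the labeled points
    Y = np.zeros(n)
    Y[labeled] = y_labeled
    # By the Representer theorem, f(x) = sum_i alpha_i K(x, x_i); solve for alpha.
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n**2) * (L @ K)
    alpha = np.linalg.solve(A, Y)
    return alpha  # predictions on the training points: K @ alpha
```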


Proceedings ArticleDOI
25 Jun 2006
TL;DR: A large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps is presented.
Abstract: A number of supervised learning methods have been introduced in the last decade. Unfortunately, the last comprehensive empirical evaluation of supervised learning was the Statlog Project in the early 90's. We present a large-scale empirical comparison between ten supervised learning methods: SVMs, neural nets, logistic regression, naive bayes, memory-based learning, random forests, decision trees, bagged trees, boosted trees, and boosted stumps. We also examine the effect that calibrating the models via Platt Scaling and Isotonic Regression has on their performance. An important aspect of our study is the use of a variety of performance criteria to evaluate the learning methods.

2,450 citations
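
Both calibration methods examined in the study are available in scikit-learn; a brief sketch (the dataset and base model here are illustrative, not the paper's setup):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# method='sigmoid' is Platt Scaling; method='isotonic' is Isotonic Regression.
for method in ('sigmoid', 'isotonic'):
    clf = CalibratedClassifierCV(LinearSVC(), method=method, cv=5)
    clf.fit(X_tr, y_tr)
    print(method, clf.predict_proba(X_te)[:3])
```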


Journal ArticleDOI
TL;DR: Various classification algorithms are described, along with a recent approach for improving classification accuracy: ensembles of classifiers.
Abstract: Supervised classification is one of the tasks most frequently carried out by so-called Intelligent Systems. Thus, a large number of techniques have been developed based on Artificial Intelligence (Logic-based techniques, Perceptron-based techniques) and Statistics (Bayesian Networks, Instance-based techniques). The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown. This paper describes various classification algorithms and a recent approach for improving classification accuracy: ensembles of classifiers.

1,127 citations


01 Jan 2006
TL;DR: This research identifies a set of features that are key to the superior performance under the supervised learning setup, and shows that a small subset of features always plays a significant role in the link prediction job.
Abstract: Social network analysis has attracted much attention in recent years. Link prediction is a key research direction within this area. In this research, we study link prediction as a supervised learning task. Along the way, we identify a set of features that are key to superior performance under the supervised learning setup. The identified features are very easy to compute, and at the same time surprisingly effective in solving the link prediction problem. We also explain the effectiveness of the features from their class density distributions. Then we compare different classes of supervised learning algorithms in terms of their prediction performance using various performance metrics, such as accuracy, precision-recall, F-values, and squared error, with 5-fold cross-validation. Our results on two practical social network datasets show that most of the well-known classification algorithms (decision tree, k-NN, multilayer perceptron, SVM, RBF network) can predict links with high accuracy, but SVM beats all of them by a narrow margin on all performance measures. Again, ranking features with popular feature ranking algorithms shows that a small subset of features always plays a significant role in the link prediction task.

883 citations
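
A sketch of the general recipe: treat link prediction as binary classification over node-pair features. The three proximity features below (common neighbors, Jaccard coefficient, Adamic-Adar) are classical choices, not necessarily the paper's exact feature set, and the random graph stands in for real co-authorship data.

```python
import numpy as np
import networkx as nx
from sklearn.ensemble import RandomForestClassifier

G = nx.erdos_renyi_graph(200, 0.05, seed=1)        # stand-in for a real network
pos_pairs = list(G.edges())[:100]                  # linked pairs   -> class 1
neg_pairs = list(nx.non_edges(G))[:100]            # unlinked pairs -> class 0

def pair_features(G, pairs):
    cn = [len(list(nx.common_neighbors(G, u, v))) for u, v in pairs]
    jc = [s for _, _, s in nx.jaccard_coefficient(G, pairs)]
    aa = [s for _, _, s in nx.adamic_adar_index(G, pairs)]
    return np.column_stack([cn, jc, aa])

X = np.vstack([pair_features(G, pos_pairs), pair_features(G, neg_pairs)])
y = np.array([1] * len(pos_pairs) + [0] * len(neg_pairs))
clf = RandomForestClassifier(random_state=0).fit(X, y)
```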


01 Jan 2006
TL;DR: This chapter contains sections titled: Supervised, Unsupervised, and Semi-Supervised Learning; When Can Semi-Supervised Learning Work?; Classes of Algorithms and Organization of This Book.
Abstract: This chapter contains sections titled: Supervised, Unsupervised, and Semi-Supervised Learning, When Can Semi-Supervised Learning Work?, Classes of Algorithms and Organization of This Book

627 citations


Proceedings ArticleDOI
25 Jun 2006
TL;DR: A framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously and is more effective than the state-of-the-art algorithms for active learning.
Abstract: The goal of active learning is to select the most informative examples for manual labeling. Most of the previous studies in active learning have focused on selecting a single unlabeled example in each iteration. This could be inefficient since the classification model has to be retrained for every labeled example. In this paper, we present a framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously. The key computational challenge is how to efficiently identify the subset of unlabeled examples that can result in the largest reduction in the Fisher information. To resolve this challenge, we propose an efficient greedy algorithm that is based on the property of submodular functions. Our empirical studies with five UCI datasets and one real-world medical image classification show that the proposed batch mode active learning algorithm is more effective than the state-of-the-art algorithms for active learning.

473 citations
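
The paper's criterion is built on the Fisher information matrix; the sketch below only illustrates the greedy batch-selection idea with a much simpler uncertainty-plus-diversity surrogate, so it should not be read as the actual algorithm. It assumes a fitted probabilistic classifier `clf` and a candidate pool `X_pool`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def greedy_batch(clf, X_pool, k, lam=0.5):
    """Greedily pick k pool indices, trading uncertainty against redundancy.

    Surrogate objective only; the paper instead greedily maximizes a
    submodular function derived from the Fisher information matrix."""
    uncertainty = 1.0 - clf.predict_proba(X_pool).max(axis=1)  # least confidence
    sim = rbf_kernel(X_pool)
    chosen = []
    for _ in range(k):
        redundancy = sim[:, chosen].max(axis=1) if chosen else np.zeros(len(X_pool))
        score = uncertainty - lam * redundancy
        score[chosen] = -np.inf                    # never pick a point twice
        chosen.append(int(score.argmax()))
    return chosen
```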


Proceedings Article
04 Dec 2006
TL;DR: This paper formalizes multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels, and proposes the MIMLBOOST and MIMLSVM algorithms which achieve good performance in an application to scene classification.
Abstract: In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels Such a problem can occur in many real-world tasks, eg an image usually contains multiple patches each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multi-instance learning and multi-label learning Then, we propose the MIMLBOOST and MIMLSVM algorithms which achieve good performance in an application to scene classification

455 citations


Proceedings ArticleDOI
Kai Yu1, Jinbo Bi1, Volker Tresp1
25 Jun 2006
TL;DR: This paper considers the problem of selecting the most informative experiments x to get measurements y for learning a regression model y = f(x), and proposes a novel and simple concept for active learning, transductive experimental design, that scales better than classic experimental design methods.
Abstract: This paper considers the problem of selecting the most informative experiments x to get measurements y for learning a regression model y = f(x). We propose a novel and simple concept for active learning, transductive experimental design, that explores available unmeasured experiments (i.e., unlabeled data) and has better scalability than classic experimental design methods. Our in-depth analysis shows that the new method tends to favor experiments that are both hard to predict and representative of the remaining experiments. Efficient optimization of the new design problem is achieved through alternating optimization and sequential greedy search. Extensive experimental results on synthetic problems and three real-world tasks, including questionnaire design for preference learning, active learning for text categorization, and spatial sensor placement, highlight the advantages of the proposed approaches.

380 citations
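
A sketch of the sequential greedy variant of transductive experimental design, written from the commonly cited score-and-deflate form; `mu` is a regularization parameter, and the exact update should be checked against the paper.

```python
import numpy as np

def sequential_ted(K, m, mu=0.1):
    """Pick m design points from a kernel matrix K by score-and-deflate."""
    K = K.copy()
    selected = []
    for _ in range(m):
        scores = (K ** 2).sum(axis=0) / (np.diag(K) + mu)
        i = int(np.argmax(scores))                 # best-explaining column
        selected.append(i)
        ki = K[:, i:i + 1]
        K = K - (ki @ ki.T) / (K[i, i] + mu)       # remove the explained part
    return selected
```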


Journal Article
TL;DR: A new design of storage and numerical operations is proposed, which speeds up the training of an incremental SVM by a factor of 5 to 20, and various applications of the algorithm can be foreseen.
Abstract: Incremental Support Vector Machines (SVM) are instrumental in practical applications of online learning. This work focuses on the design and analysis of efficient incremental SVM learning, with the aim of providing a fast, numerically stable and robust implementation. A detailed analysis of convergence and of algorithmic complexity of incremental SVM learning is carried out. Based on this analysis, a new design of storage and numerical operations is proposed, which speeds up the training of an incremental SVM by a factor of 5 to 20. The performance of the new algorithm is demonstrated in two scenarios: learning with limited resources and active learning. Various applications of the algorithm, such as in drug discovery, online monitoring of industrial devices and surveillance of network traffic, can be foreseen.

369 citations


Proceedings ArticleDOI
09 Jun 2006
TL;DR: A graph-based semi-supervised learning algorithm is presented to address the sentiment analysis task of rating inference and achieves significantly better predictive accuracy over other methods that ignore the unlabeled examples during training.
Abstract: We present a graph-based semi-supervised learning algorithm to address the sentiment analysis task of rating inference. Given a set of documents (e.g., movie reviews) and accompanying ratings (e.g., "4 stars"), the task calls for inferring numerical ratings for unlabeled documents based on the perceived sentiment expressed by their text. In particular, we are interested in the situation where labeled data is scarce. We place this task in the semi-supervised setting and demonstrate that considering unlabeled reviews in the learning process can improve rating-inference performance. We do so by creating a graph on both labeled and unlabeled data to encode certain assumptions for this task. We then solve an optimization problem to obtain a smooth rating function over the whole graph. When only limited labeled data is available, this method achieves significantly better predictive accuracy over other methods that ignore the unlabeled examples during training.

348 citations
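
A generic sketch of the underlying computation: solve a quadratic smoothness objective over a similarity graph so that ratings propagate from labeled to unlabeled documents. The paper's specific graph construction and loss differ; this assumes a connected graph with at least one labeled node.

```python
import numpy as np

def graph_rating_inference(W, y_labeled, labeled_idx, alpha=1.0):
    """Minimize sum_labeled (f_i - y_i)^2 + alpha * f^T L f over all nodes.

    W: symmetric similarity matrix over labeled + unlabeled documents."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                 # unnormalized graph Laplacian
    C = np.zeros((n, n))
    C[labeled_idx, labeled_idx] = 1.0              # loss only on labeled nodes
    y = np.zeros(n)
    y[labeled_idx] = y_labeled
    # Setting the gradient to zero gives (C + alpha * L) f = C y.
    return np.linalg.solve(C + alpha * L, C @ y)
```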


Proceedings ArticleDOI
25 Jun 2006
TL;DR: An algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task, which relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent.
Abstract: Many applications of supervised learning require good generalization from limited labeled data. In the Bayesian setting, we can try to achieve this goal by using an informative prior over the parameters, one that encodes useful domain knowledge. Focusing on logistic regression, we present an algorithm for automatically constructing a multivariate Gaussian prior with a full covariance matrix for a given supervised learning task. This prior relaxes a commonly used but overly simplistic independence assumption, and allows parameters to be dependent. The algorithm uses other "similar" learning problems to estimate the covariance of pairs of individual parameters. We then use a semidefinite program to combine these estimates and learn a good prior for the current learning task. We apply our methods to binary text classification, and demonstrate a 20 to 40% test error reduction over a commonly used prior.

Proceedings ArticleDOI
25 Jun 2006
TL;DR: The first active learning algorithm which works in the presence of arbitrary forms of noise is stated and analyzed, and it is shown that A2 achieves an exponential improvement over the usual sample complexity of supervised learning.
Abstract: We state and analyze the first active learning algorithm which works in the presence of arbitrary forms of noise. The algorithm, A2 (for Agnostic Active), relies only upon the assumption that the samples are drawn i.i.d. from a fixed distribution. We show that A2 achieves an exponential improvement (i.e., requires only O(ln 1/ε) samples to find an ε-optimal classifier) over the usual sample complexity of supervised learning, for several settings considered before in the realizable case. These include learning threshold classifiers and learning homogeneous linear separators with respect to an input distribution which is uniform over the unit sphere.

Proceedings ArticleDOI
18 Dec 2006
TL;DR: Experiments show that the proposed algorithms, BalanceCascade and EasyEnsemble, have better AUC scores than many existing class-imbalance learning methods and have approximately the same training time as that of under-sampling, which trains significantly faster than other methods.
Abstract: Under-sampling is a class-imbalance learning method which uses only a subset of major class examples and thus is very efficient. The main deficiency is that many major class examples are ignored. We propose two algorithms to overcome the deficiency. EasyEnsemble samples several subsets from the major class, trains a learner using each of them, and combines the outputs of those learners. BalanceCascade is similar to EasyEnsemble except that it removes correctly classified major class examples of trained learners from further consideration. Experiments show that both of the proposed algorithms have better AUC scores than many existing class-imbalance learning methods. Moreover, they have approximately the same training time as that of under-sampling, which trains significantly faster than other methods.
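
A compact sketch of EasyEnsemble (BalanceCascade additionally removes correctly classified majority examples between rounds). Combining by averaging predicted probabilities is a simplification of the paper's combination rule.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def easy_ensemble(X, y, n_subsets=10, seed=0):
    """Train one AdaBoost learner per balanced subset: all minority (y==1)
    examples plus an equal-size random sample of the majority class."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    learners = []
    for _ in range(n_subsets):
        sample = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sample])
        learners.append(AdaBoostClassifier(random_state=0).fit(X[idx], y[idx]))
    return learners

def ensemble_score(learners, X):
    # Average the per-learner probabilities of the minority class.
    return np.mean([l.predict_proba(X)[:, 1] for l in learners], axis=0)
```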

Proceedings ArticleDOI
20 Aug 2006
TL;DR: This work proposes a semi-supervised technique for building time series classifiers and shows that special considerations must be made to make them both efficient and effective for the time series domain.
Abstract: The problem of time series classification has attracted great interest in the last decade. However current research assumes the existence of large amounts of labeled training data. In reality, such data may be very difficult or expensive to obtain. For example, it may require the time and expertise of cardiologists, space launch technicians, or other domain specialists. As in many other domains, there are often copious amounts of unlabeled data available. For example, the PhysioBank archive contains gigabytes of ECG data. In this work we propose a semi-supervised technique for building time series classifiers. While such algorithms are well known in text domains, we will show that special considerations must be made to make them both efficient and effective for the time series domain. We evaluate our work with a comprehensive set of experiments on diverse data sources including electrocardiograms, handwritten documents, and video datasets. The experimental results demonstrate that our approach requires only a handful of labeled examples to construct accurate classifiers.
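
The core loop is a self-training scheme; below is a sketch under simplifying assumptions (equal-length series, Euclidean distance rather than DTW, a fixed iteration budget instead of the paper's stopping criterion, and at least one labeled positive example).

```python
import numpy as np

def self_train_1nn(X_l, y_l, X_u, n_iters):
    """Move the unlabeled series closest to the labeled positives into
    the labeled set, one per iteration."""
    X_l, y_l, X_u = list(X_l), list(y_l), list(X_u)
    for _ in range(min(n_iters, len(X_u))):
        pos = [x for x, lab in zip(X_l, y_l) if lab == 1]
        dist = [min(np.linalg.norm(u - p) for p in pos) for u in X_u]
        i = int(np.argmin(dist))                   # nearest unlabeled series
        X_l.append(X_u.pop(i))
        y_l.append(1)                              # pseudo-label it positive
    return X_l, y_l
```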

Journal ArticleDOI
TL;DR: An incremental version of ISOMAP, one of the key manifold learning algorithms, is described and it is demonstrated that this modified algorithm can maintain an accurate low-dimensional representation of the data in an efficient manner.
Abstract: Understanding the structure of multidimensional patterns, especially in unsupervised cases, is of fundamental importance in data mining, pattern recognition, and machine learning. Several algorithms have been proposed to analyze the structure of high-dimensional data based on the notion of manifold learning. These algorithms have been used to extract the intrinsic characteristics of different types of high-dimensional data by performing nonlinear dimensionality reduction. Most of these algorithms operate in a "batch" mode and cannot be efficiently applied when data are collected sequentially. In this paper, we describe an incremental version of ISOMAP, one of the key manifold learning algorithms. Our experiments on synthetic data as well as real world images demonstrate that our modified algorithm can maintain an accurate low-dimensional representation of the data in an efficient manner.
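
For reference, batch ISOMAP is a one-liner in scikit-learn; the paper's contribution is updating such an embedding incrementally as new points arrive instead of recomputing it from scratch.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, random_state=0)
# Batch mode: the embedding is recomputed whenever the data changes.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1000, 2)
```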

Book ChapterDOI
16 Aug 2006
TL;DR: Two kernel-based reinforcement learning algorithms, ε-insensitive kernel-based reinforcement learning (ε-KRL) and least squares kernel-based reinforcement learning (LS-KRL), are proposed, and an example shows that the proposed methods can deal effectively with the reinforcement learning problem without having to explore many states.
Abstract: We consider the problem of approximating the cost-to-go functions in reinforcement learning. By mapping the state implicitly into a feature space, we perform a simple algorithm in the feature space, which corresponds to a complex algorithm in the original state space. Two kernel-based reinforcement learning algorithms, the ε-insensitive kernel-based reinforcement learning (ε-KRL) and the least squares kernel-based reinforcement learning (LS-KRL), are proposed. An example shows that the proposed methods can deal effectively with the reinforcement learning problem without having to explore many states.

Proceedings Article
04 Dec 2006
TL;DR: A local learning approach for clustering that ensures that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods.
Abstract: We present a local learning approach for clustering. The basic idea is that a good clustering result should have the property that the cluster label of each data point can be well predicted based on its neighboring data and their cluster labels, using current supervised learning methods. An optimization problem is formulated such that its solution has the above property. Relaxation and eigen-decomposition are applied to solve this optimization problem. We also briefly investigate the parameter selection issue and provide a simple parameter selection method for the proposed algorithm. Experimental results are provided to validate the effectiveness of the proposed approach.


Journal ArticleDOI
TL;DR: A simple learning algorithm capable of real-time learning which can automatically select appropriate values of neural quantizers and analytically determine the parameters (weights and bias) of the network at one time only is proposed.
Abstract: In some practical applications of neural networks, fast response to external events within an extremely short time is highly demanded and expected. However, the extensively used gradient-descent-based learning algorithms obviously cannot satisfy the real-time learning needs in many applications, especially for large-scale applications and/or when higher generalization performance is required. Based on Huang's constructive network model, this paper proposes a simple learning algorithm capable of real-time learning which can automatically select appropriate values of neural quantizers and analytically determine the parameters (weights and bias) of the network at one time only. The performance of the proposed algorithm has been systematically investigated on a large batch of benchmark real-world regression and classification problems. The experimental results demonstrate that our algorithm can not only produce good generalization performance but also have real-time learning and prediction capability. Thus, it may provide an alternative approach for the practical applications of neural networks where real-time learning and prediction implementation is required.
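
In the same spirit, though not the paper's exact algorithm: a random first layer plus an analytically solved output layer avoids gradient descent entirely. A generic sketch:

```python
import numpy as np

def analytic_train(X, y, n_hidden=50, seed=0):
    """Random hidden layer; output weights solved in closed form."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # analytic least squares
    return W, b, beta

def analytic_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```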

Book ChapterDOI
11 Oct 2006
TL;DR: This work presents a framework for the identification of a user's interests in an automatic way, based on the analysis of query logs, and establishes that the combination of supervised and unsupervised learning is a good alternative for finding users' goals.
Abstract: The identification of users' intentions or interests through the queries they submit to a search engine can be very useful for offering them more adequate results. In this work we present a framework for the identification of users' interests in an automatic way, based on the analysis of query logs. This identification is made from two perspectives: the objectives or goals of a user, and the categories in which these aims are situated. A manual classification of the queries was made in order to have a reference point, and then we applied supervised and unsupervised learning techniques. The results show that supervised learning is a good option in a considerable number of cases; however, through unsupervised learning we found relationships between users and behaviors that are not easy to detect from the query words alone. Also, through unsupervised learning we established that there are categories we were unable to determine, in contrast with other classes that were not considered but naturally appeared after the clustering process. This allowed us to establish that the combination of supervised and unsupervised learning is a good alternative for finding users' goals. With supervised learning we can identify the user interest given certain established goals and categories; on the other hand, with unsupervised learning we can validate the goals and categories used, refine them, and select those most appropriate to the users' needs.

Journal ArticleDOI
TL;DR: In this article, a multi-view active learning framework is proposed, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept.
Abstract: Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multi-view domains, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept. In this paper we make several contributions. First, we introduce Co-Testing, which is the first approach to multi-view active learning. Second, we extend the multi-view learning framework by also exploiting weak views, which are adequate only for learning a concept that is more general/specific than the target concept. Finally, we empirically show that Co-Testing outperforms existing active learners on a variety of real world domains such as wrapper induction, Web page classification, advertisement removal, and discourse tree parsing.
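
The heart of Co-Testing is querying "contention points" where classifiers trained on different views disagree. A minimal sketch, assuming the views are given as disjoint column-index lists and logistic regression as an illustrative base learner:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cotesting_query(X_l, y_l, X_pool, view1, view2):
    """Return the index of one contention point in X_pool, or None.

    view1/view2: disjoint column-index lists, each assumed sufficient
    to learn the target concept."""
    c1 = LogisticRegression(max_iter=1000).fit(X_l[:, view1], y_l)
    c2 = LogisticRegression(max_iter=1000).fit(X_l[:, view2], y_l)
    disagree = np.flatnonzero(
        c1.predict(X_pool[:, view1]) != c2.predict(X_pool[:, view2]))
    return int(disagree[0]) if len(disagree) else None
```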

Proceedings ArticleDOI
25 Jun 2006
TL;DR: Efficient algorithms for learning low-rank kernel matrices are proposed; these algorithms scale linearly in the number of data points and quadratically in the rank of the kernel.
Abstract: Kernel learning plays an important role in many machine learning tasks. However, algorithms for learning a kernel matrix often scale poorly, with running times that are cubic in the number of data points. In this paper, we propose efficient algorithms for learning low-rank kernel matrices; our algorithms scale linearly in the number of data points and quadratically in the rank of the kernel. We introduce and employ Bregman matrix divergences for rank-deficient matrices---these divergences are natural for our problem since they preserve the rank as well as positive semi-definiteness of the kernel matrix. Special cases of our framework yield faster algorithms for various existing kernel learning problems. Experimental results demonstrate the effectiveness of our algorithms in learning both low-rank and full-rank kernels.


Book ChapterDOI
07 May 2006
TL;DR: This paper investigates a new method of learning part-based models for visual object recognition, from training data that only provides information about class membership (and not object location or configuration), and shows that this weakly supervised technique produces better results.
Abstract: In this paper we investigate a new method of learning part-based models for visual object recognition, from training data that only provides information about class membership (and not object location or configuration). This method learns both a model of local part appearance and a model of the spatial relations between those parts. In contrast, other work using such a weakly supervised learning paradigm has not considered the problem of simultaneously learning appearance and spatial models. Some of these methods use a “bag” model where only part appearance is considered whereas other methods learn spatial models but only given the output of a particular feature detector. Previous techniques for learning both part appearance and spatial relations have instead used a highly supervised learning process that provides substantial information about object part location. We show that our weakly supervised technique produces better results than these previous highly supervised methods. Moreover, we investigate the degree to which both richer spatial models and richer appearance models are helpful in improving recognition performance. Our results show that while both spatial and appearance information can be useful, the effect on performance depends substantially on the particular object class and on the difficulty of the test dataset.

Proceedings ArticleDOI
20 Aug 2006
TL;DR: The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e., in multi-task learning problems) and derive an efficient EM learning algorithm for both models.
Abstract: Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g., in a classification or regression task, PCA is however not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e., in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S2PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e., in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S2PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.

Journal Article
TL;DR: In this article, the authors give dimension-free and data-dependent bounds for linear multi-task learning where a common linear operator is chosen to preprocess data for a vector of task specific linear-thresholding classifiers.
Abstract: We give dimension-free and data-dependent bounds for linear multi-task learning where a common linear operator is chosen to preprocess data for a vector of task specific linear-thresholding classifiers. The complexity penalty of multi-task learning is bounded by a simple expression involving the margins of the task-specific classifiers, the Hilbert-Schmidt norm of the selected preprocessor and the Hilbert-Schmidt norm of the covariance operator for the total mixture of all task distributions, or, alternatively, the Frobenius norm of the total Gramian matrix for the data-dependent version. The results can be compared to state-of-the-art results on linear single-task learning.

Journal ArticleDOI
TL;DR: Experiments show that using semisupervised learning and active learning simultaneously in CBIR is beneficial, and the proposed method achieves better performance than some existing methods.
Abstract: Relevance feedback is an effective scheme bridging the gap between high-level semantics and low-level features in content-based image retrieval (CBIR). In contrast to previous methods which rely on labeled images provided by the user, this article attempts to enhance the performance of relevance feedback by exploiting unlabeled images existing in the database. Concretely, this article integrates the merits of semisupervised learning and active learning into the relevance feedback process. In detail, in each round of relevance feedback two simple learners are trained from the labeled data, that is, images from user query and user feedback. Each learner then labels some unlabeled images in the database for the other learner. After retraining with the additional labeled data, the learners reclassify the images in the database and then their classifications are merged. Images judged to be positive with high confidence are returned as the retrieval result, while those judged with low confidence are put into the pool which is used in the next round of relevance feedback. Experiments show that using semisupervised learning and active learning simultaneously in CBIR is beneficial, and the proposed method achieves better performance than some existing methods.
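
A sketch of one round of the semi-supervised part: two different learners each pseudo-label their most confident unlabeled images for the other. The learner choices here are illustrative, and the active-learning half (pooling low-confidence images for the next feedback round) is omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

def cotrain_round(X_l, y_l, X_u, n_add=5):
    """One round: each learner labels its most confident unlabeled images,
    and those pseudo-labels are added to the *other* learner's training set."""
    learners = [LogisticRegression(max_iter=1000), GaussianNB()]
    picks = []
    for learner in learners:
        learner.fit(X_l, y_l)
        conf = learner.predict_proba(X_u).max(axis=1)
        top = np.argsort(conf)[-n_add:]            # most confident images
        picks.append((X_u[top], learner.predict(X_u[top])))
    for learner, (X_a, y_a) in zip(learners, reversed(picks)):
        learner.fit(np.vstack([X_l, X_a]), np.concatenate([y_l, y_a]))
    return learners
```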

Journal Article
TL;DR: This paper proposes a new active learning method, also based on importance-weighted least-squares learning, and proves that the proposed active learning criterion is a more accurate predictor of the single-trial generalization error than the existing criterion.
Abstract: The goal of active learning is to determine the locations of training input points so that the generalization error is minimized. We discuss the problem of active learning in linear regression scenarios. Traditional active learning methods using least-squares learning often assume that the model used for learning is correctly specified. In many practical situations, however, this assumption may not be fulfilled. Recently, active learning methods using "importance"-weighted least-squares learning have been proposed, which are shown to be robust against misspecification of models. In this paper, we propose a new active learning method also using the weighted least-squares learning, which we call ALICE (Active Learning using the Importance-weighted least-squares learning based on Conditional Expectation of the generalization error). An important difference from existing methods is that we predict the conditional expectation of the generalization error given training input points, while existing methods predict the full expectation of the generalization error. Due to this difference, the training input design can be fine-tuned depending on the realization of training input points. Theoretically, we prove that the proposed active learning criterion is a more accurate predictor of the single-trial generalization error than the existing criterion. Numerical studies with toy and benchmark data sets show that the proposed method compares favorably to existing methods.
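
The estimator underneath ALICE is importance-weighted least squares; a sketch of just that estimator (the paper's contribution is the criterion for choosing the training inputs, which is not shown here):

```python
import numpy as np

def importance_weighted_ls(X, y, w):
    """theta minimizing sum_i w_i * (x_i @ theta - y_i)**2,
    i.e. theta = (X^T W X)^{-1} X^T W y."""
    Xw = X * w[:, None]                            # scale each row by its weight
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)
```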

Journal Article
TL;DR: This paper introduces a learning method for two-layer feedforward neural networks based on sensitivity analysis, which uses a linear training algorithm for each of the two layers, which gives the local sensitivities of the least square errors with respect to input and output data.
Abstract: This paper introduces a learning method for two-layer feedforward neural networks based on sensitivity analysis, which uses a linear training algorithm for each of the two layers First, random values are assigned to the outputs of the first layer; later, these initial values are updated based on sensitivity formulas, which use the weights in each of the layers; the process is repeated until convergence Since these weights are learnt solving a linear system of equations, there is an important saving in computational time The method also gives the local sensitivities of the least square errors with respect to input and output data, with no extra computational cost, because the necessary information becomes available without extra calculations This method, called the Sensitivity-Based Linear Learning Method, can also be used to provide an initial set of weights, which significantly improves the behavior of other learning algorithms The theoretical basis for the method is given and its performance is illustrated by its application to several examples in which it is compared with several learning algorithms and well known data sets The results have shown a learning speed generally faster than other existing methods In addition, it can be used as an initialization tool for other well known methods with significant improvements