Journal ArticleDOI

Extreme learning machine: Theory and applications

01 Dec 2006 - Neurocomputing (Elsevier) - Vol. 70, Iss. 1, pp. 489-501
TL;DR: A new learning algorithm called the extreme learning machine (ELM) is proposed for single-hidden-layer feedforward neural networks (SLFNs); it randomly chooses the hidden nodes, analytically determines the output weights of the SLFN, and tends to provide good generalization performance at extremely fast learning speed.
About: This article is published in Neurocomputing. The article was published on 2006-12-01 and has received 10,217 citations to date. The article focuses on the topics: Extreme learning machine & Wake-sleep algorithm.
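The procedure summarized above, i.e. random hidden-node parameters followed by an analytic (least-squares) solution for the output weights, can be sketched in a few lines. This is a minimal illustrative NumPy sketch under assumed names (elm_fit, elm_predict) and an assumed tanh activation, not the authors' reference code.

    import numpy as np

    def elm_fit(X, T, n_hidden, rng=np.random.default_rng(0)):
        """Minimal ELM sketch: random hidden nodes, analytic output weights."""
        # Randomly chosen hidden-node parameters (never tuned afterwards)
        W = rng.standard_normal((X.shape[1], n_hidden))
        b = rng.standard_normal(n_hidden)
        H = np.tanh(X @ W + b)            # hidden-layer output matrix
        beta = np.linalg.pinv(H) @ T      # output weights via Moore-Penrose pseudo-inverse
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta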
Citations
Journal ArticleDOI
01 Apr 2012
TL;DR: ELM provides a unified learning platform with a wide range of feature mappings and can be applied to regression and multiclass classification applications directly; in theory, ELM can approximate any target continuous function and classify any disjoint regions.
Abstract: Due to the simplicity of their implementations, least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and that a unified learning framework of LS-SVM, PSVM, and other regularization algorithms, referred to as extreme learning machine (ELM), can be built. ELM works for “generalized” single-hidden-layer feedforward networks (SLFNs), but the hidden layer (also called the feature mapping) in ELM need not be tuned. Such SLFNs include but are not limited to SVM, polynomial networks, and the conventional feedforward neural networks. This paper shows the following: 1) ELM provides a unified learning platform with a wide range of feature mappings and can be applied to regression and multiclass classification applications directly; 2) from the optimization point of view, ELM has milder optimization constraints compared with LS-SVM and PSVM; 3) in theory, compared with ELM, LS-SVM and PSVM achieve suboptimal solutions and require higher computational complexity; and 4) in theory, ELM can approximate any target continuous function and classify any disjoint regions. As verified by the simulation results, ELM tends to have better scalability and achieve similar (for regression and binary-class cases) or much better (for multiclass cases) generalization performance at much faster learning speed (up to thousands of times) than traditional SVM and LS-SVM.
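The unified formulation in this citing paper leads to a closed-form, ridge-regression-like solution for the output weights. A hedged sketch of that regularized least-squares solution, beta = H^T (I/C + H H^T)^{-1} T, follows; H, T, and C denote the hidden-layer output matrix, the targets, and the regularization parameter, and the function name is an assumption of mine.

    import numpy as np

    def regularized_elm_beta(H, T, C=1.0):
        """Output weights for the regularized (unified) ELM formulation:
        beta = H^T (I/C + H H^T)^{-1} T."""
        n = H.shape[0]
        return H.T @ np.linalg.solve(np.eye(n) / C + H @ H.T, T)

Replacing H H^T with a kernel matrix gives the kernelized variant discussed in the abstract.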

4,835 citations


Cites background or methods from "Extreme learning machine: Theory an..."

  • ...ELM [12], [13] and its variants [14]–[16], [24]–[28] mainly focus on the regression applications....

  • ...ELM is to minimize the training error as well as the norm of the output weights [12], [13]...

  • ...The original solutions (21) of ELM [12], [13], [26], TERELM [22], and the weighted regularized ELM [21] are not able to apply kernels in their implementations....

  • ...The minimal norm least square method instead of the standard optimization method was used in the original implementation of ELM [12], [13]...

Journal ArticleDOI
TL;DR: This paper proves in an incremental constructive method that in order to let SLFNs work as universal approximators, one may simply randomly choose hidden nodes and then only need to adjust the output weights linking the hidden layer and the output layer.
Abstract: According to conventional neural network theories, single-hidden-layer feedforward networks (SLFNs) with additive or radial basis function (RBF) hidden nodes are universal approximators when all the parameters of the networks are allowed to be adjustable. However, as observed in most neural network implementations, tuning all the parameters of the networks may make learning complicated and inefficient, and it may be difficult to train networks with nondifferentiable activation functions such as threshold networks. Unlike conventional neural network theories, this paper proves with an incremental constructive method that, in order to let SLFNs work as universal approximators, one may simply randomly choose hidden nodes and then only needs to adjust the output weights linking the hidden layer and the output layer. In such SLFN implementations, the activation functions for additive nodes can be any bounded nonconstant piecewise continuous functions g: R → R, and the activation functions for RBF nodes can be any integrable piecewise continuous functions g: R → R with ∫_R g(x) dx ≠ 0. The proposed incremental method is efficient not only for SLFNs with continuous (including nondifferentiable) activation functions but also for SLFNs with piecewise continuous (such as threshold) activation functions. Compared to other popular methods, such a new network is fully automatic and users need not intervene in the learning process by manually tuning control parameters.
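A minimal sketch of the incremental constructive idea described above (the function name and the tanh additive node are assumptions of mine, not the paper's reference code): hidden nodes are added one at a time with random parameters, and only the new node's output weight is fitted to the current residual error.

    import numpy as np

    def ielm_fit(X, t, max_nodes=50, rng=np.random.default_rng(0)):
        """Incremental ELM sketch for a scalar target t:
        add random hidden nodes one by one, fit only the new output weight."""
        nodes, betas = [], []
        e = t.astype(float).copy()            # residual error
        for _ in range(max_nodes):
            w = rng.standard_normal(X.shape[1])
            b = rng.standard_normal()
            h = np.tanh(X @ w + b)            # output vector of the new hidden node
            beta = (e @ h) / (h @ h)          # least-squares weight for this node only
            e -= beta * h                     # update residual
            nodes.append((w, b))
            betas.append(beta)
        return nodes, np.array(betas)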

2,413 citations

Journal ArticleDOI
TL;DR: The challenges of using deep learning for remote-sensing data analysis are analyzed, recent advances are reviewed, and resources are provided that the authors hope will make deep learning in remote sensing seem ridiculously simple.
Abstract: Central to the looming paradigm shift toward data-intensive science, machine-learning techniques are becoming increasingly important. In particular, deep learning has proven to be both a major breakthrough and an extremely powerful tool in many fields. Shall we embrace deep learning as the key to everything? Or should we resist a black-box solution? These are controversial issues within the remote-sensing community. In this article, we analyze the challenges of using deep learning for remote-sensing data analysis, review recent advances, and provide resources we hope will make deep learning in remote sensing seem ridiculously simple. More importantly, we encourage remote-sensing scientists to bring their expertise into deep learning and use it as an implicit general model to tackle unprecedented, large-scale, influential challenges, such as climate change and urbanization.

2,095 citations


Cites methods from "Extreme learning machine: Theory an..."

  • ...The ELM was introduced for efficient feature pooling and classification, making the ship detection accurate and fast....

  • ...Tang et al. [106] offered a compressed-domain ship detection framework combined with SDA and an extreme learning machine (ELM) [107] for optical spaceborne images....

Journal ArticleDOI
TL;DR: The results show that the OS-ELM is faster than the other sequential algorithms and produces better generalization performance on benchmark problems drawn from the regression, classification and time series prediction areas.
Abstract: In this paper, we develop an online sequential learning algorithm for single hidden layer feedforward networks (SLFNs) with additive or radial basis function (RBF) hidden nodes in a unified framework. The algorithm is referred to as the online sequential extreme learning machine (OS-ELM) and can learn data one-by-one or chunk-by-chunk (a block of data) with fixed or varying chunk size. The activation functions for additive nodes in OS-ELM can be any bounded nonconstant piecewise continuous functions, and the activation functions for RBF nodes can be any integrable piecewise continuous functions. In OS-ELM, the parameters of the hidden nodes (the input weights and biases of additive nodes or the centers and impact factors of RBF nodes) are randomly selected and the output weights are analytically determined based on the sequentially arriving data. The algorithm uses the ideas of the ELM of Huang et al., developed for batch learning, which has been shown to be extremely fast with generalization performance better than other batch training methods. Apart from selecting the number of hidden nodes, no other control parameters have to be manually chosen. A detailed performance comparison of OS-ELM with other popular sequential learning algorithms is done on benchmark problems drawn from the regression, classification and time series prediction areas. The results show that the OS-ELM is faster than the other sequential algorithms and produces better generalization performance.
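The chunk-by-chunk update described above has the form of recursive least squares on the hidden-layer outputs. A hedged sketch follows (function names are my own; H0/T0 are the hidden-layer outputs and targets of the initial chunk, Hk/Tk those of a newly arriving chunk); it is not the authors' reference implementation.

    import numpy as np

    def oselm_init(H0, T0):
        """Initialization phase: ordinary batch ELM on the first chunk.
        Assumes H0 has at least as many rows as hidden nodes (full column rank)."""
        P = np.linalg.inv(H0.T @ H0)
        beta = P @ H0.T @ T0
        return P, beta

    def oselm_update(P, beta, Hk, Tk):
        """Sequential phase: recursive least-squares update for one new chunk."""
        n = Hk.shape[0]
        K = P @ Hk.T @ np.linalg.inv(np.eye(n) + Hk @ P @ Hk.T)
        P = P - K @ Hk @ P
        beta = beta + P @ Hk.T @ (Tk - Hk @ beta)
        return P, beta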

1,800 citations


Cites background or methods from "Extreme learning machine: Theory an..."

  • ...Huang et al. [27], the basic idea of the proof can be summarized as follows....

  • ...These have been formally stated in the following theorems [27]....

  • ...In real applications, the number of hidden nodes will always be less than the number of training samples and, hence, the training error cannot be made exactly zero but can approach a nonzero training error. The following theorem formally states this fact [27]....

  • ...OS-ELM originates from the batch learning extreme learning machine (ELM) [20]–[22], [27], [30] developed for SLFNs with additive and RBF nodes....

  • ...Huang et al. [20]–[22], [27], [30] to provide the necessary background for the development of OS-ELM in Section III....

Journal ArticleDOI
TL;DR: A survey on the extreme learning machine (ELM) and its variants, especially on (1) the batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensembles of ELM.
Abstract: Computational intelligence techniques have been used in a wide range of applications. Out of numerous computational intelligence techniques, neural networks and support vector machines (SVMs) have been playing the dominant roles. However, it is known that both neural networks and SVMs face some challenging issues such as: (1) slow learning speed, (2) trivial human intervention, and/or (3) poor computational scalability. The extreme learning machine (ELM), an emergent technique which overcomes some of the challenges faced by other techniques, has recently attracted attention from more and more researchers. ELM works for generalized single-hidden-layer feedforward networks (SLFNs). The essence of ELM is that the hidden layer of SLFNs need not be tuned. Compared with those traditional computational intelligence techniques, ELM provides better generalization performance at a much faster learning speed and with the least human intervention. This paper gives a survey on ELM and its variants, especially on (1) the batch learning mode of ELM, (2) fully complex ELM, (3) online sequential ELM, (4) incremental ELM, and (5) ensembles of ELM.

1,767 citations


Cites background from "Extreme learning machine: Theory an..."

  • ...The hidden layer of ELM need not be iteratively tuned [5, 6]....

  • ...[6–9]....

  • ...The ith row of H is the hidden layer feature mapping with respect to the ith input xi: h(xi). It has been proved [6] that, from the interpolation capability point of view, if the activation function g is infinitely differentiable in any interval, the hidden layer parameters can be randomly generated....

  • ...1 [6] Given any small positive value ε > 0, an activation function g : R → R which is infinitely differentiable in any interval, and N arbitrary distinct samples (xi, ti) ∈ R^n × R^m, there exists L ≤ N such that for any ...

  • ...The learning capability of extreme learning machines has been studied in two aspects: interpolation capability [6] and universal approximation capability [7–9]....

References
Book
16 Jul 1998
TL;DR: Thorough, well-organized, and completely up to date, this book examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks.
Abstract: From the Publisher: This book represents the most comprehensive treatment available of neural networks from an engineering perspective. Thorough, well-organized, and completely up to date, it examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks. Written in a concise and fluid manner, by a foremost engineering textbook author, to make the material more accessible, this book is ideal for professional engineers and graduate students entering this exciting field. Computer experiments, problems, worked examples, a bibliography, photographs, and illustrations reinforce key concepts.

29,130 citations

01 Jan 1998

12,940 citations


"Extreme learning machine: Theory an..." refers methods in this paper

  • ...The forest cover type [2] for 30 × 30 m cells was obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS) data....

  • ...Medium size classification applications: The ELM performance has also been tested on the Banana database(7) and some other multiclass databases from the Statlog collection [2]: Landsat satellite image (SatImage), Image segmentation (Segment) and Shuttle landing control database....

Proceedings Article
Yoav Freund1, Robert E. Schapire1
03 Jul 1996
TL;DR: This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.
Abstract: In an earlier paper, we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a "pseudo-loss", which is a method for forcing a learning algorithm of multi-label concepts to concentrate on the labels that are hardest to discriminate. In this paper, we describe experiments we carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems. We performed two sets of experiments. The first set compared boosting to Breiman's "bagging" method when used to aggregate various classifiers (including decision trees and single attribute-value tests). We compared the performance of the two methods on a collection of machine-learning benchmarks. In the second set of experiments, we studied in more detail the performance of boosting using a nearest-neighbor classifier on an OCR problem.

7,601 citations


"Extreme learning machine: Theory an..." refers methods in this paper

  • ...For this problem, as usually done in the literature [20,21,5,25] 75% and 25% samples are randomly chosen for training and testing at each trial, respectively....

  • ...57% with 20 nodes, which is obviously higher than all the results so far reported in the literature using various popular algorithms such as SVM [20], SAOCIF [21], Cascade-Correlation algorithm [21], bagging and boosting methods [5], C4....

01 Jan 1996

7,386 citations

Journal ArticleDOI
TL;DR: Decomposition implementations for two "all-together" multiclass SVM methods are given, and it is shown that for large problems, methods that consider all data at once generally need fewer support vectors.
Abstract: Support vector machines (SVMs) were originally designed for binary classification. How to effectively extend them for multiclass classification is still an ongoing research issue. Several methods have been proposed in which a multiclass classifier is typically constructed by combining several binary classifiers. Some authors have also proposed methods that consider all classes at once. As it is computationally more expensive to solve multiclass problems, comparisons of these methods using large-scale problems have not been seriously conducted. Especially for methods solving multiclass SVM in one step, a much larger optimization problem is required, so up to now experiments have been limited to small data sets. In this paper we give decomposition implementations for two such "all-together" methods. We then compare their performance with three methods based on binary classifications: "one-against-all," "one-against-one," and directed acyclic graph SVM (DAGSVM). Our experiments indicate that the "one-against-one" and DAG methods are more suitable for practical use than the other methods. Results also show that for large problems, methods that consider all data at once in general need fewer support vectors.
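For illustration only (not code from the cited paper), the two binary-decomposition strategies it compares can be exercised with scikit-learn; the dataset and parameter values here are arbitrary assumptions.

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    # "one-against-one": k(k-1)/2 binary SVMs, combined by voting
    ovo = OneVsOneClassifier(SVC(kernel="rbf", C=1.0, gamma="scale")).fit(X, y)
    # "one-against-all": k binary SVMs, highest decision value wins
    ovr = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale")).fit(X, y)
    print(ovo.score(X, y), ovr.score(X, y))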

6,562 citations


"Extreme learning machine: Theory an..." refers methods in this paper

  • ...As proposed by Hsu and Lin [8], for each problem, we estimate the generalized accuracy using different combinations of cost parameters C and kernel parameters γ: C = [2^12, 2^11, ..., 2^-1, 2^-2] and γ = [2^4, 2^3, ..., 2^-9, 2^-10]....