Proceedings Article

Analysis of function of rectified linear unit used in deep learning

TL;DR: The rectified linear unit (ReLU), proposed to speed up learning convergence in deep learning, is analyzed using a simpler network called the soft-committee machine, and the reasons for the speedup are clarified.
Abstract: Deep learning is attracting much attention in object recognition and speech processing. A benefit of deep learning is that it provides automatic pre-training. Several proposed methods, including the auto-encoder, are being used successfully in various applications. Moreover, deep learning uses a multilayer network that consists of many layers and a huge number of units, and it is trained on a huge amount of data. Executing deep learning therefore requires heavy computation, so it is usually run in parallel on many cores or many machines. Deep learning employs the gradient algorithm; however, this can trap learning at saddle points or local minima. To avoid this difficulty, the rectified linear unit (ReLU) has been proposed to speed up learning convergence. However, the reasons why convergence is sped up are not well understood. In this paper, we analyze the ReLU by using a simpler network called the soft-committee machine and clarify the reasons for the speedup. We also train the network in an on-line manner. The soft-committee machine provides a good test bed for analyzing deep learning. The results provide some reasons for the speedup of convergence in deep learning.
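The setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's experiment: it assumes the conventional soft-committee machine (a two-layer network whose hidden-to-output weights are fixed to 1), a teacher-student scenario with Gaussian inputs, and on-line gradient descent on the squared error. The network sizes, learning rate, initial weight scale, and the 1/sqrt(N) input scaling are all illustrative choices.

```python
import math
import random


def relu(u):
    """Rectified linear unit: max(0, u)."""
    return max(0.0, u)


def relu_grad(u):
    """Derivative of ReLU (taken as 0 at u = 0 by convention)."""
    return 1.0 if u > 0.0 else 0.0


def preactivations(weights, x):
    """Hidden-unit inputs w_k . x / sqrt(N) for every hidden unit k."""
    scale = 1.0 / math.sqrt(len(x))
    return [scale * sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def committee_output(weights, x):
    """Soft-committee machine: hidden activations summed with fixed unit output weights."""
    return sum(relu(u) for u in preactivations(weights, x))


def train_online(n_inputs=20, n_hidden=2, steps=5000, lr=0.1, seed=0):
    """Teacher-student on-line learning; returns the squared error at each step."""
    rng = random.Random(seed)
    # Hypothetical teacher (fixed) and student (trained) networks.
    teacher = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
    student = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    scale = 1.0 / math.sqrt(n_inputs)
    errors = []
    for _ in range(steps):
        # On-line learning: a fresh example is drawn at every step and used once.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        target = committee_output(teacher, x)
        pre = preactivations(student, x)
        delta = target - sum(relu(u) for u in pre)
        errors.append(0.5 * delta * delta)
        # Gradient step on 0.5 * delta^2 with respect to each hidden weight vector.
        for w, u in zip(student, pre):
            step = lr * delta * relu_grad(u) * scale
            for i in range(n_inputs):
                w[i] += step * x[i]
    return errors
```

Calling `train_online()` and averaging the returned errors over windows of a few hundred steps gives a rough learning curve; swapping `relu` and `relu_grad` for a sigmoid-like pair would allow an informal comparison of convergence speed under the same assumptions.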
Citations
Journal Article
TL;DR: This study provides a new computer-vision-based technique for detecting Alzheimer's disease efficiently using a convolutional neural network; the method increased classification accuracy by approximately 5% compared with state-of-the-art methods.
Abstract: Alzheimer’s disease (AD) is a progressive brain disease. The goal of this study is to provide a new computer-vision-based technique to detect it in an efficient way. The brain-imaging data of 98 AD patients and 98 healthy controls were collected using a data augmentation method. Then a convolutional neural network (CNN) was used, since the CNN is the most successful tool in deep learning. An 8-layer CNN was created with an optimal structure obtained by experience. Three activation functions (AFs) were tested: sigmoid, rectified linear unit (ReLU), and leaky ReLU. Three pooling functions were also tested: average pooling, max pooling, and stochastic pooling. The numerical experiments demonstrated that leaky ReLU and max pooling gave the best performance. This combination achieved a sensitivity of 97.96%, a specificity of 97.35%, and an accuracy of 97.65%. In addition, the proposed approach was compared with eight state-of-the-art approaches. The method increased the classification accuracy by approximately 5% compared to state-of-the-art methods.
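For readers unfamiliar with the building blocks named in this abstract, here is a minimal sketch of leaky ReLU and the three pooling functions applied to a single pooling window. It is not the cited paper's implementation: the slope of 0.01 is a common default assumed here, and stochastic pooling is written in its usual form, where one activation is sampled with probability proportional to its (non-negative) value.

```python
import random


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: identity for x > 0, small linear slope otherwise (slope is an assumed default)."""
    return x if x > 0 else slope * x


def max_pool(window):
    """Max pooling: keep the largest activation in the window."""
    return max(window)


def average_pool(window):
    """Average pooling: mean of the activations in the window."""
    return sum(window) / len(window)


def stochastic_pool(window, rng=random):
    """Stochastic pooling: sample one activation with probability proportional to its value.

    Negative activations are given zero probability (an assumption for this sketch).
    """
    weights = [max(v, 0.0) for v in window]
    if sum(weights) == 0:
        return 0.0
    return rng.choices(window, weights=weights, k=1)[0]
```

A 2x2 pooling region would be passed in flattened form, for example `max_pool([0.2, 1.5, 0.0, 0.7])`.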

229 citations


Cites background from "Analysis of function of rectified l..."

  • ...To solve this problem, the rectified linear unit (ReLU) became popular [31], since it accelerated the convergence of stochastic gradient descent compared to the sigmoid function....

    [...]

Journal Article
TL;DR: A hybrid deep network framework to improve classification accuracy of four-class MI-EEG signal is proposed and could be of great interest for real-life brain-computer interfaces (BCIs).
Abstract: Objective: Learning the structures and unknown correlations of a motor imagery electroencephalogram (MI-EEG) signal is important for its classification. It is also a major challenge to obtain good classification accuracy from the increased number of classes and increased variability from different people. In this study, a four-class MI task is investigated. Approach: An end-to-end novel hybrid deep learning scheme is developed to decode the MI task from EEG data. The proposed algorithm consists of two parts: (a) a one-versus-rest filter bank common spatial pattern is adopted to preprocess and pre-extract the features of the four-class MI signal; (b) a hybrid deep network based on the convolutional neural network and long-term short-term memory network is proposed to extract and learn the spatial and temporal features of the MI signal simultaneously. Main results: The main contribution of this paper is to propose a hybrid deep network framework to improve the classification accuracy of the four-class MI-EEG signal. The hybrid deep network is a subject-independent shared neural network, which means it can be trained by using the training data from all subjects to form one model. Significance: The classification performance obtained by the proposed algorithm on brain-computer interface (BCI) competition IV dataset 2a in terms of accuracy is 83% and Cohen's kappa value is 0.80. Finally, the shared hybrid deep network is evaluated by every subject respectively, and the experimental results illustrate that the shared neural network has satisfactory accuracy. Thus, the proposed algorithm could be of great interest for real-life BCIs.

116 citations

Journal Article
01 Jun 2019
TL;DR: The roles of many different types of activation functions, along with their respective advantages, disadvantages, and applicable fields, are discussed so that readers can choose appropriate activation functions to obtain better performance from ANNs.
Abstract: The development of Artificial Neural Networks (ANNs) has achieved many fruitful results so far, and the activation function is one of the principal factors affecting the performance of the networks. In this work, the roles of many different types of activation functions, along with their respective advantages, disadvantages, and applicable fields, are discussed so that readers can choose appropriate activation functions to obtain better performance from ANNs.

106 citations

Journal Article
Evgin Goceri
TL;DR: Experimental results and quantitative evaluations indicated that the proposed network model is able to extract the desired features from images and provides automated diagnosis with 98.06% accuracy.
Abstract: Alzheimer's disease is a progressive and irreversible neuropsychiatric disease for which there is no effective cure. However, early diagnosis plays an important role in planning treatment to delay its progression, since treatments have the most impact at the early stage of the disease. Neuroimages obtained by different imaging techniques (for example, diffusion tensor-based and magnetic resonance-based imaging) provide powerful information and help to diagnose the disease. In this work, a deeply supervised and robust method has been developed using three-dimensional features to provide objective and accurate diagnosis from magnetic resonance images. The main contributions are (a) a new three-dimensional convolutional neural network topology; (b) a new Sobolev gradient-based optimization with weight values for each decision parameter; (c) application of the proposed topology and optimizer to diagnose Alzheimer's disease; (d) comparisons with results obtained from recent techniques that have been implemented for Alzheimer's disease diagnosis. Experimental results and quantitative evaluations indicated that the proposed network model is able to extract the desired features from images and provides automated diagnosis with 98.06% accuracy.

85 citations


Cites methods from "Analysis of function of rectified l..."

  • ...However, training of neural networks with a gradient-based learning is not efficient when the activation function is sigmoid because the sigmoid function has a widespread saturation property [53]. To overcome this problem, ReLU, which is defined by F_ReLU(x) = max(0, x), has been used in many studies....

    [...]
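The saturation argument in this excerpt can be checked numerically. The sketch below is an illustration rather than code from either paper: it compares the derivative of the logistic sigmoid, which shrinks toward zero for large |x|, with the derivative of ReLU, which stays at 1 for any positive input. The ReLU derivative at exactly x = 0 is set to 0 by convention.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)); at most 0.25, vanishing as |x| grows."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    """ReLU as defined in the excerpt: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for x > 0, 0 otherwise (value at 0 chosen by convention)."""
    return 1.0 if x > 0 else 0.0


if __name__ == "__main__":
    for x in (0.0, 2.0, 5.0, 10.0):
        print(x, sigmoid_grad(x), relu_grad(x))
```

Because the back-propagated error is multiplied by this derivative at every layer, a saturated sigmoid unit passes almost no gradient, whereas an active ReLU unit passes it unchanged.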

Journal Article
TL;DR: A multi-layer bidirectional recurrent neural network model based on LSTM and GRU is proposed to forecast short-term power load and is validated on two data sets and shows that the proposed method is superior to the competition winner in the precision of forecasting on the European Intelligent Technology Network competition data.
Abstract: Accurate power load forecasting is of great significance to ensure the safety, stability, and economic operation of the power system. In particular, short-term power load forecasting is the basis for grid planning and decision making. In recent years, machine learning algorithms have been widely used for short-term power load forecasting. Specifically, long short-term memory (LSTM) and gated recurrent unit (GRU) are tailored to time series data. In this study, a multi-layer bidirectional recurrent neural network model based on LSTM and GRU is proposed to forecast short-term power load and is validated on two data sets. The experimental result shows that the proposed method is superior to the competition winner in the precision of forecasting on the European Intelligent Technology Network competition data. On power company data in Chongqing, considering the differences of the seasonal load, the hourly peak load of different types of load data is used for experiments. The authors separately forecast the seasonal load and compare it with LSTM, support vector regression and back propagation models. The results of the comparison show the priority of the proposed method in terms of forecasting accuracy as compared to the adopted models.

84 citations


Cites background from "Analysis of function of rectified l..."

  • ...The rectified linear unit (ReLU) [31] layer is added before and after the hidden layer to introduce non-linear factors, which makes the model's expression ability stronger....

    [...]
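Why a non-linearity makes the model more expressive can be seen in a toy case. The sketch below uses hypothetical weight matrices `W1` and `W2` chosen only for illustration: without an activation, the two layers collapse into a single linear map (here the zero map), while inserting ReLU between them yields |a - b|, which no linear map can represent.

```python
def linear(weights, v):
    """Matrix-vector product: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, x) for x in v]


# Hypothetical weights, chosen only to make the effect easy to see.
W1 = [[1.0, -1.0], [-1.0, 1.0]]  # hidden units compute a - b and b - a
W2 = [[1.0, 1.0]]                # output sums the two hidden units


def net_linear(v):
    """Two stacked linear layers: equivalent to the single linear map W2 @ W1."""
    return linear(W2, linear(W1, v))


def net_relu(v):
    """Same layers with ReLU in between: computes |a - b|."""
    return linear(W2, relu(linear(W1, v)))
```

For the input `[3.0, 1.0]`, `net_linear` returns `[0.0]` while `net_relu` returns `[2.0]`.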

References
Journal Article
TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Abstract: We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.

15,055 citations


"Analysis of function of rectified l..." refers background in this paper

  • ...In the field of neural networks and their applications, including object recognition and speech processing, deep learning [5] is attracting much attention....

    [...]

Proceedings Article
14 Jun 2011
TL;DR: This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability.
Abstract: While logistic sigmoid neurons are more biologically plausible than hyperbolic tangent neurons, the latter work better for training multi-layer neural networks. This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability

6,790 citations


"Analysis of function of rectified l..." refers background in this paper

  • ...There is a similar function called "softplus" [11] defined as ln(1 + exp(y_k'))....

    [...]
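The softplus function mentioned in this excerpt is a smooth approximation of ReLU. The sketch below is an illustration, not code from the paper: it evaluates ln(1 + exp(y)) in a rearranged form, max(y, 0) + ln(1 + exp(-|y|)), which is mathematically identical but avoids overflow for large inputs. The function and variable names are assumptions made for this example.

```python
import math


def softplus(y):
    """Softplus ln(1 + exp(y)), computed as max(y, 0) + log1p(exp(-|y|)) for stability."""
    return max(y, 0.0) + math.log1p(math.exp(-abs(y)))


def relu(y):
    """ReLU, the hard counterpart that softplus approaches for large |y|."""
    return max(0.0, y)


if __name__ == "__main__":
    for y in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(y, relu(y), softplus(y))
```

Softplus is never below ReLU, the gap is largest at y = 0 (where it equals ln 2), and its derivative is the logistic sigmoid.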

Proceedings Article
04 Dec 2006
TL;DR: These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
Abstract: Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.

4,385 citations


"Analysis of function of rectified l..." refers background in this paper

  • ...Key technology in deep learning is an automatic pre-training that extracts specifications of data while learning [5], [6]....

    [...]

Journal Article
TL;DR: In this paper, the authors used information geometry to calculate the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the spaces of linear dynamical systems for blind source deconvolution, and proved that Fisher efficient online learning has asymptotically the same performance as the optimal batch estimation of parameters.
Abstract: When a parameter space has a certain underlying structure, the ordinary gradient of a function does not represent its steepest direction, but the natural gradient does. Information geometry is used for calculating the natural gradients in the parameter space of perceptrons, the space of matrices (for blind source separation), and the space of linear dynamical systems (for blind source deconvolution). The dynamical behavior of natural gradient online learning is analyzed and is proved to be Fisher efficient, implying that it has asymptotically the same performance as the optimal batch estimation of parameters. This suggests that the plateau phenomenon, which appears in the backpropagation learning algorithm of multilayer perceptrons, might disappear or might not be so serious when the natural gradient is used. An adaptive method of updating the learning rate is proposed and analyzed.

2,504 citations

01 Jan 1988
TL;DR: A new learning algorithm is developed that is faster than standard backprop by an order of magnitude or more and that appears to scale up very well as the problem size increases.
Abstract: Most connectionist or "neural network" learning systems use some form of the back-propagation algorithm. However, back-propagation learning is too slow for many applications, and it scales up poorly as tasks become larger and more complex. The factors governing learning speed are poorly understood. I have begun a systematic, empirical study of learning speed in backprop-like algorithms, measured against a variety of benchmark problems. The goal is twofold: to develop faster learning algorithms and to contribute to the development of a methodology that will be of value in future studies of this kind. This paper is a progress report describing the results obtained during the first six months of this study. To date I have looked only at a limited set of benchmark problems, but the results on these are encouraging: I have developed a new learning algorithm that is faster than standard backprop by an order of magnitude or more and that appears to scale up very well as the problem size increases. This research was sponsored in part by the National Science Foundation under Contract Number EET-8716324 and by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 4976 under Contract F33615-87C-1499 and monitored by the Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Aeronautical Systems Division (AFSC), Wright-Patterson AFB, OH 45433-6543. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of these agencies or of the U.S. Government.

934 citations
