
Showing papers on "Activation function" published in 2021


Journal ArticleDOI
TL;DR: A taxonomy of trainable activation functions is proposed, common and distinctive properties of recent and past models are highlighted, the main advantages and limitations of this type of approach are discussed, and it is shown that many of the proposed approaches are equivalent to adding neuron layers which use fixed activation functions and some simple local rule that constrains the corresponding weight layers.

162 citations


Journal ArticleDOI
TL;DR: A novel particle swarm optimization (PSO) algorithm is put forward where a sigmoid-function-based weighting strategy is developed to adaptively adjust the acceleration coefficients, inspired by the activation function of neural networks.
Abstract: In this paper, a novel particle swarm optimization (PSO) algorithm is put forward where a sigmoid-function-based weighting strategy is developed to adaptively adjust the acceleration coefficients. The newly proposed adaptive weighting strategy takes into account both the distances from the particle to the global best position and from the particle to its personal best position, thereby having the distinguishing feature of enhancing the convergence rate. Inspired by the activation function of neural networks, the new strategy is employed to update the acceleration coefficients by using the sigmoid function. The search capability of the developed adaptive weighting PSO (AWPSO) algorithm is comprehensively evaluated via eight well-known benchmark functions including both the unimodal and multimodal cases. The experimental results demonstrate that the designed AWPSO algorithm substantially improves the convergence rate of the particle swarm optimizer and also outperforms some currently popular PSO algorithms.
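As a rough illustration of the adaptive weighting idea, here is a minimal Python/NumPy sketch in which the two acceleration coefficients are driven through a sigmoid of the particle's distances to its personal-best and global-best positions. The exact weighting formula, the coefficient range, and which distance drives which coefficient are not given in the abstract, so they are assumptions rather than the authors' AWPSO update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_coefficients(x, pbest, gbest, c_min=0.5, c_max=2.5):
    """Illustrative sigmoid-based weighting of PSO acceleration coefficients.

    The cognitive coefficient c1 grows with the distance to the personal best
    and the social coefficient c2 with the distance to the global best, each
    squashed through a sigmoid so it stays inside [c_min, c_max]. This is a
    plausible stand-in, not the formula from the paper.
    """
    d_personal = np.linalg.norm(x - pbest)
    d_global = np.linalg.norm(x - gbest)
    c1 = c_min + (c_max - c_min) * sigmoid(d_personal)
    c2 = c_min + (c_max - c_min) * sigmoid(d_global)
    return c1, c2

# The coefficients then enter the usual PSO velocity update:
# v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
```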

160 citations


Journal ArticleDOI
TL;DR: This article develops a new activation function, i.e., adaptively parametric rectifier linear units, and inserts the activation function into deep residual networks to improve the feature learning ability, so that each input signal is trained to have its own set of nonlinear transformations.
Abstract: Vibration signals under the same health state often have large differences due to changes in operating conditions. Likewise, the differences among vibration signals under different health states can be small under some operating conditions. Traditional deep learning methods apply fixed nonlinear transformations to all the input signals, which have a negative impact on the discriminative feature learning ability, i.e., projecting the intraclass signals into the same region and the interclass signals into distant regions. Aiming at this issue, this article develops a new activation function, i.e., adaptively parametric rectifier linear units, and inserts the activation function into deep residual networks to improve the feature learning ability, so that each input signal is trained to have its own set of nonlinear transformations. To be specific, a subnetwork is inserted as an embedded module to learn slopes to be used in the nonlinear transformation. The slopes are dependent on the input signal, and thereby the developed method has more flexible nonlinear transformations than the traditional deep learning methods. Finally, the improved performance of the developed method in learning discriminative features has been validated through fault diagnosis applications.
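One plausible PyTorch reading of such an adaptively parametric ReLU is sketched below: an embedded subnetwork maps per-channel statistics of the input feature map to slopes in (0, 1), which then scale the negative part of the signal. The subnetwork layout, the pooling of the statistics, and the layer sizes are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AdaptivelyParametricReLU(nn.Module):
    """Sketch: a small embedded subnetwork learns an input-dependent,
    per-channel slope for the negative part of the feature map."""

    def __init__(self, channels):
        super().__init__()
        self.slope_net = nn.Sequential(
            nn.Linear(channels, channels),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Linear(channels, channels),
            nn.Sigmoid(),          # keeps the learned slopes in (0, 1)
        )

    def forward(self, x):          # x: (batch, channels, ...) feature map
        pos = torch.clamp(x, min=0.0)
        neg = torch.clamp(x, max=0.0)
        # Global average of the absolute feature map summarises each channel.
        stats = torch.abs(x).flatten(2).mean(dim=2)        # (batch, channels)
        slope = self.slope_net(stats)                      # (batch, channels)
        slope = slope.view(*slope.shape, *([1] * (x.dim() - 2)))
        return pos + slope * neg

aprelu = AdaptivelyParametricReLU(channels=16)
out = aprelu(torch.randn(8, 16, 32, 32))   # same shape as the input
```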

116 citations


Book ChapterDOI
TL;DR: This research paper will evaluate the commonly used activation functions, such as swish, ReLU, Sigmoid, and so forth, followed by their properties, their pros and cons, and recommendations on where each formula applies.
Abstract: Activation functions are the primary decision-making units of neural networks. They determine the output of a network's neural nodes and are therefore essential to the performance of the whole network. Hence, it is critical to choose the most appropriate activation function for a neural network's calculations. Acharya et al. (2018) suggest that numerous recipes have been formulated over the years, though some of them are now considered deprecated because they fail to operate properly under some conditions. These functions have a variety of characteristics deemed essential to successful learning, such as their monotonicity, their derivatives, and the finiteness of their range. This research paper will evaluate the commonly used activation functions, such as swish, ReLU, Sigmoid, and so forth, followed by their properties, their pros and cons, and recommendations on where each formula applies.
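For reference, the textbook forms of three of the functions named above, written in NumPy; these are the standard definitions, not formulas taken from the chapter itself.

```python
import numpy as np

def sigmoid(x):
    # Bounded, monotonic, smooth; saturates for large |x|.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Unbounded above, zero for negative inputs; cheap to compute.
    return np.maximum(0.0, x)

def swish(x, beta=1.0):
    # Smooth and non-monotonic; approaches ReLU as beta grows.
    return x * sigmoid(beta * x)

x = np.linspace(-5, 5, 11)
for fn in (sigmoid, relu, swish):
    print(fn.__name__, np.round(fn(x), 3))
```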

113 citations


Journal ArticleDOI
TL;DR: In this paper, the authors study nonasymptotic high-probability bounds for deep feedforward neural networks and their use in semiparametric inference, and demonstrate the approach with an empirical application to direct mail marketing.
Abstract: We study deep neural networks and their use in semiparametric inference. We establish novel nonasymptotic high probability bounds for deep feedforward neural nets. These deliver rates of convergence that are sufficiently fast (in some cases minimax optimal) to allow us to establish valid second‐step inference after first‐step estimation with deep learning, a result also new to the literature. Our nonasymptotic high probability bounds, and the subsequent semiparametric inference, treat the current standard architecture: fully connected feedforward neural networks (multilayer perceptrons), with the now‐common rectified linear unit activation function, unbounded weights, and a depth explicitly diverging with the sample size. We discuss other architectures as well, including fixed‐width, very deep networks. We establish the nonasymptotic bounds for these deep nets for a general class of nonparametric regression‐type loss functions, which includes as special cases least squares, logistic regression, and other generalized linear models. We then apply our theory to develop semiparametric inference, focusing on causal parameters for concreteness, and demonstrate the effectiveness of deep learning with an empirical application to direct mail marketing.

96 citations


Journal ArticleDOI
TL;DR: In this article, an artificial neural network model was developed based on information from the well-tested HYDRUS-2D/3D model, and the methodological process for defining the drainage retention capacity of surface layers under conditions of unsteady-state groundwater flow was demonstrated.
Abstract: The methodological process for defining the drainage retention capacity of surface layers under conditions of unsteady-state groundwater flow was demonstrated. An artificial neural network model was developed based on information from the well-tested HYDRUS-2D/3D model. Artificial neural network modeling is reported as an alternative to physically based modeling of subsurface water distribution from trickle emitters. Three options are explored to create input-output functional relations from information generated by a numerical model (HYDRUS-2D). Artificial neural networks are effective tools for modeling non-linear systems in various engineering fields. Each artificial neural network includes an input layer and an output layer, between which there are one or more hidden layers, and each layer contains one or several processing elements, or neurons. The neurons of the input layer are the independent variables of the problem under study and the neurons of the output layer are its dependent variables. By applying weights to the inputs and passing the result through an activation function, an artificial neural network attempts to achieve the desired output. In this research, artificial neural networks were used to calculate the drain spacing under unsteady-state conditions in a region situated in the northeast of Ahwaz, Iran, with different soil properties and drain spacings. The neurons in the input layer were the specific yield, the hydraulic conductivity, the depth of the impermeable layer, and the height of the water table in the middle of the interval between the drains at two time steps; the neuron in the output layer was the drain spacing. The network designed in this research included one hidden layer with four neurons. The drain spacing computed via this method agreed well with real values and was more precise than other methods. This was done for three types of activation functions: linear, hyperbolic tangent, and sigmoid, with mean errors of 0.1455, 0.092, and 0.0491, respectively.
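A minimal NumPy sketch of the network shape described above (one hidden layer with four neurons and a single output neuron for the drain spacing), comparing the three activation functions mentioned. Five inputs are assumed on the reading that the water-table height enters at two time steps, and the weights are random placeholders rather than the trained values.

```python
import numpy as np

def forward(x, w_hidden, b_hidden, w_out, b_out, activation):
    """One-hidden-layer network: inputs -> 4 hidden neurons -> drain spacing."""
    h = activation(x @ w_hidden + b_hidden)
    return h @ w_out + b_out

rng = np.random.default_rng(0)
n_inputs = 5                                 # assumed input dimension
x = rng.normal(size=(1, n_inputs))           # placeholder, unscaled inputs
w_hidden = rng.normal(size=(n_inputs, 4))    # placeholder weights
b_hidden = np.zeros(4)
w_out = rng.normal(size=(4, 1))
b_out = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for name, act in [("linear", lambda z: z), ("tanh", np.tanh), ("sigmoid", sigmoid)]:
    print(name, forward(x, w_hidden, b_hidden, w_out, b_out, act))
```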

88 citations


Journal ArticleDOI
TL;DR: A general novel methodology, the scaled polynomial constant unit activation function "SPOCU," is introduced and shown to work satisfactorily on a variety of problems, and it is shown that SPOCU can outperform previously introduced activation functions with good properties on generic problems.
Abstract: We address the following problem: given a set of complex images or a large database, the numerical and computational complexity and the quality of approximation for a neural network may differ drastically from one activation function to another. A general novel methodology, the scaled polynomial constant unit activation function "SPOCU," is introduced and shown to work satisfactorily on a variety of problems. Moreover, we show that SPOCU can outperform activation functions with already established good properties, e.g., SELU and ReLU, on generic problems. In order to explain the good properties of SPOCU, we provide several theoretical and practical motivations, including a tissue growth model and memristive cellular nonlinear networks. We also provide an estimation strategy for the SPOCU parameters and its relation to the generation of a random-type Sierpinski carpet, related to the [pppq] model. One of the attractive properties of SPOCU is its genuine normalization of the output of layers. We illustrate the SPOCU methodology on cancer discrimination, including mammary and prostate cancer and data from the Wisconsin Diagnostic Breast Cancer dataset. Moreover, we compared SPOCU with SELU and ReLU on the large MNIST dataset, where its very good performance further justifies its usefulness.

67 citations


Journal ArticleDOI
TL;DR: Overall, the findings identify potential causes for issues in the training procedure of deep learning such as no guaranteed convergence, explosion of parameters, and slow convergence.
Abstract: We analyze the topological properties of the set of functions that can be implemented by neural networks of a fixed size. Surprisingly, this set has many undesirable properties. It is highly non-convex, except possibly for a few exotic activation functions. Moreover, the set is not closed with respect to $L^p$-norms, $0 < p < \infty$, for all practically used activation functions, and also not closed with respect to the $L^\infty$-norm for all practically used activation functions except for the ReLU and the parametric ReLU. Finally, the function that maps a family of weights to the function computed by the associated network is not inverse stable for every practically used activation function. In other words, if $f_1, f_2$ are two functions realized by neural networks and if $f_1, f_2$ are close in the sense that $\Vert f_1 - f_2\Vert_{L^\infty} \le \varepsilon$ for $\varepsilon > 0$, it is, regardless of the size of $\varepsilon$, usually not possible to find weights $w_1, w_2$ close together such that each $f_i$ is realized by a neural network with weights $w_i$. Overall, our findings identify potential causes for issues in the training procedure of deep learning, such as no guaranteed convergence, explosion of parameters, and slow convergence.

66 citations


Journal ArticleDOI
TL;DR: This paper extends a method, called the bilinear neural network method (BNNM), to obtain exact solutions of nonlinear partial differential equations, and new test functions are constructed by using this method.
Abstract: This paper extends a method, called the bilinear neural network method (BNNM), to obtain exact solutions of nonlinear partial differential equations. New test functions are constructed by using this method. These test functions are composed of specific activation functions of the single-layer model, specific activation functions of the "2-2" model, and arbitrary functions of the "2-2-3" model. By means of the BNNM, nineteen sets of exact analytical solutions and twenty-four arbitrary-function solutions of the dimensionally reduced p-gBKP equation are obtained via symbolic computation with the help of Maple. Fractal soliton waves are obtained by choosing appropriate values, and the self-similar characteristics of these waves are observed by reducing the observation range and amplifying part of the picture. By giving a specific activation function in the single-layer neural network model, exact periodic waves and breathers are obtained. Via various three-dimensional plots, contour plots and density plots, the evolution characteristics of these waves are exhibited.

62 citations


Journal ArticleDOI
TL;DR: It is established that allowing the networks to have certain types of “skip connections” does not change the resulting approximation spaces, and some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.
Abstract: We study the expressivity of deep neural networks. Measuring a network's complexity by its number of connections or by its number of neurons, we consider the class of functions for which the error of best approximation with networks of a given complexity decays at a certain rate when increasing the complexity budget. Using results from classical approximation theory, we show that this class can be endowed with a (quasi)-norm that makes it a linear function space, called approximation space. We establish that allowing the networks to have certain types of "skip connections" does not change the resulting approximation spaces. We also discuss the role of the network's nonlinearity (also known as activation function) on the resulting spaces, as well as the role of depth. For the popular ReLU nonlinearity and its powers, we relate the newly constructed spaces to classical Besov spaces. The established embeddings highlight that some functions of very low Besov smoothness can nevertheless be well approximated by neural networks, if these networks are sufficiently deep.

58 citations


Journal ArticleDOI
TL;DR: The goal of this work is to propose an ensemble of Convolutional Neural Networks trained using several different activation functions; moreover, a novel activation function is proposed here for the first time to improve the performance of Convolutional Neural Networks on small/medium-sized biomedical datasets.
Abstract: Activation functions play a vital role in the training of Convolutional Neural Networks. For this reason, developing efficient and well-performing functions is a crucial problem in the deep learning community. The idea of these approaches is to allow reliable parameter learning while avoiding vanishing gradient problems. The goal of this work is to propose an ensemble of Convolutional Neural Networks trained using several different activation functions. Moreover, a novel activation function is here proposed for the first time. Our aim is to improve the performance of Convolutional Neural Networks on small/medium-sized biomedical datasets. Our results clearly show that the proposed ensemble outperforms Convolutional Neural Networks trained with the standard ReLU activation function. The proposed ensemble outperforms each tested stand-alone activation function with a p-value of 0.01; for a reliable performance comparison we tested our approach on more than 10 datasets, using two well-known Convolutional Neural Networks: Vgg16 and ResNet50.
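A minimal sketch of the ensembling step, assuming the class probabilities produced by networks trained with different activation functions are fused by simple averaging (a sum rule); the abstract does not state the paper's actual fusion rule, and the values below are placeholders.

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-model class probabilities.

    prob_list: list of arrays of shape (n_samples, n_classes), one per
    network trained with a different activation function.
    """
    return np.mean(np.stack(prob_list, axis=0), axis=0)

# Placeholder probabilities from three networks trained with different activations.
p_relu  = np.array([[0.7, 0.3], [0.2, 0.8]])
p_swish = np.array([[0.6, 0.4], [0.1, 0.9]])
p_new   = np.array([[0.8, 0.2], [0.3, 0.7]])
print(ensemble_predict([p_relu, p_swish, p_new]).argmax(axis=1))
```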

Journal ArticleDOI
TL;DR: A novel, scalable, and efficient technique based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks.
Abstract: Deep neural networks have emerged as a widely used and effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present a novel, scalable, and efficient technique for verifying properties of deep neural networks (or providing counter-examples). The technique is based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. We evaluated our technique on a prototype deep neural network implementation of the next-generation airborne collision avoidance system for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks that could be verified previously.
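The key ingredient is the case split on each ReLU: the unit is either inactive (output 0, pre-activation at most 0) or active (output equal to a non-negative pre-activation), and within each case the network is linear. The toy sketch below makes this concrete by brute-force enumerating the activation patterns of a two-neuron hidden layer and solving one linear program per pattern with SciPy; the actual verifier performs these splits lazily inside a modified simplex procedure and scales far beyond such naive enumeration. The network weights and the property checked here are arbitrary placeholders.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Tiny ReLU network: y = w2 @ relu(W1 @ x + b1) + b2, with x in [-1, 1]^2.
W1 = np.array([[1.0, -1.0], [0.5, 2.0]])
b1 = np.array([0.0, -0.5])
w2 = np.array([1.0, 1.0])
b2 = 0.0
threshold = 2.5          # property to check: y <= threshold for all admissible x

violated = False
for pattern in itertools.product([False, True], repeat=2):   # ReLU inactive/active
    active = np.array(pattern)
    # Linear constraints that enforce the chosen phase of each ReLU.
    A_ub = np.vstack([-W1[active], W1[~active]])
    b_ub = np.concatenate([b1[active], -b1[~active]])
    # With the pattern fixed, y is linear in x; maximise it (minimise -y).
    c = -(w2[active] @ W1[active]) if active.any() else np.zeros(2)
    offset = (w2[active] @ b1[active] + b2) if active.any() else b2
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(-1, 1), (-1, 1)])
    if res.success and -res.fun + offset > threshold:
        violated = True
print("property violated" if violated else "property holds")
```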

Journal ArticleDOI
TL;DR: The results indicate that, for the same input dimension, the BRNN divides the input space into a greater number of linear regions than the ReLU network, which explains to a certain extent why the BRNN has superior approximation ability.

Journal ArticleDOI
TL;DR: The proposed RSigELU activation functions can overcome the vanishing gradient and negative region problems and can be effective in the positive, negative, and linear activation regions.
Abstract: In deep learning models, the inputs to the network are processed using activation functions to generate the output corresponding to these inputs. Deep learning models are of particular importance in analyzing big data with numerous parameters and forecasting and are useful for image processing, natural language processing, object recognition, and financial forecasting. Sigmoid and tangent activation functions, which are traditional activation functions, are widely used in deep learning models. However, the sigmoid and tangent activation functions face the vanishing gradient problem. In order to overcome this problem, the ReLU activation function and its derivatives were proposed in the literature. However, there is a negative region problem in these activation functions. In this study, novel RSigELU activation functions, such as single-parameter RSigELU (RSigELUS) and double-parameter (RSigELUD), which are a combination of ReLU, sigmoid, and ELU activation functions, were proposed. The proposed RSigELUS and RSigELUD activation functions can overcome the vanishing gradient and negative region problems and can be effective in the positive, negative, and linear activation regions. Performance evaluation of the proposed RSigELU activation functions was performed on the MNIST, Fashion MNIST, CIFAR-10, and IMDb Movie benchmark datasets. Experimental evaluations showed that the proposed activation functions perform better than other activation functions.

Journal ArticleDOI
TL;DR: In this paper, it is shown that deep neural networks with the ReLU activation function can efficiently approximate the solutions of various types of parametric linear transport equations; for non-smooth initial conditions these solutions are high-dimensional and non-smooth, so their approximation would ordinarily suffer from a curse of dimension.
Abstract: We demonstrate that deep neural networks with the ReLU activation function can efficiently approximate the solutions of various types of parametric linear transport equations. For non-smooth initial conditions, the solutions of these PDEs are high-dimensional and non-smooth; therefore, approximation of these functions suffers from a curse of dimension. We demonstrate that, through their inherent compositionality, deep neural networks can resolve the characteristic flow underlying the transport equations and thereby allow approximation rates independent of the parameter dimension.

Journal ArticleDOI
TL;DR: In this article, an approach for the generation of an adaptive sigmoid-like and PReLU nonlinear activation function of an all-optical perceptron, exploiting the bistability of an injection-locked Fabry-Perot semiconductor laser, is presented.
Abstract: We present an approach for the generation of an adaptive sigmoid-like and PReLU nonlinear activation function of an all-optical perceptron, exploiting the bistability of an injection-locked Fabry–Perot semiconductor laser. The profile of the activation function can be tailored by adjusting the injection-locked side-mode order, frequency detuning of the input optical signal, Henry factor, or bias current. The universal fitting function for both families of the activation functions is presented.

Journal ArticleDOI
TL;DR: New theoretical results on multistability and complete stability of recurrent neural networks with a sinusoidal activation function are presented, and criteria for complete stability and instability of equilibria are derived for recurrent neural networks without time delay.
Abstract: This article presents new theoretical results on multistability and complete stability of recurrent neural networks with a sinusoidal activation function. Sufficient criteria are provided for ascertaining the stability of recurrent neural networks with various numbers of equilibria, such as a unique equilibrium, finite, and countably infinite numbers of equilibria. Multiple exponential stability criteria of equilibria are derived, and the attraction basins of equilibria are estimated. Furthermore, criteria for complete stability and instability of equilibria are derived for recurrent neural networks without time delay. In contrast to the existing stability results with a finite number of equilibria, the new criteria, herein, are applicable for both finite and countably infinite numbers of equilibria. Two illustrative examples with finite and countably infinite numbers of equilibria are elaborated to substantiate the results.

Journal ArticleDOI
TL;DR: A characterization, a representation, a construction method, and an existence result are presented, each of which applies to any universal approximator on most function spaces of practical interest; together, these improve the known capabilities of the feed-forward architecture.
Abstract: The universal approximation property of various machine learning models is currently only understood on a case-by-case basis, limiting the rapid development of new theoretically justified neural network architectures and blurring our understanding of our current models’ potential. This paper works towards overcoming these challenges by presenting a characterization, a representation, a construction method, and an existence result, each of which applies to any universal approximator on most function spaces of practical interest. Our characterization result is used to describe which activation functions allow the feed-forward architecture to maintain its universal approximation capabilities when multiple constraints are imposed on its final layers and its remaining layers are only sparsely connected. These include a rescaled and shifted Leaky ReLU activation function but not the ReLU activation function. Our construction and representation result is used to exhibit a simple modification of the feed-forward architecture, which can approximate any continuous function with non-pathological growth, uniformly on the entire Euclidean input space. This improves the known capabilities of the feed-forward architecture.

Journal ArticleDOI
TL;DR: In this article, the dendritic neuron model (DNM) is extended from the real-valued domain to the complex-valued domain, and the resulting complex-valued DNM (CDNM) is evaluated on a complex XOR problem, a non-minimum-phase equalization problem, and a real-world wind prediction task.
Abstract: A single dendritic neuron model (DNM) that owns the nonlinear information processing ability of dendrites has been widely used for classification and prediction. Complex-valued neural networks that consist of a number of multiple/deep-layer McCulloch-Pitts neurons have achieved great successes so far since neural computing was utilized for signal processing. Yet no complex-valued representations appear in single-neuron architectures. In this article, we first extend the DNM from the real-valued domain to a complex-valued one. The performance of the complex-valued DNM (CDNM) is evaluated through a complex XOR problem, a non-minimum phase equalization problem, and a real-world wind prediction task. Also, a comparative analysis of a set of elementary transcendental functions as activation functions is implemented, and preparatory experiments are carried out to determine the hyperparameters. The experimental results indicate that the proposed CDNM significantly outperforms the real-valued DNM, a complex-valued multi-layer perceptron, and other complex-valued neuron models.

Journal Article
TL;DR: This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance, and discovers both general activation functions and specialized functions for different architectures, consistently improving accuracy over ReLU and other activation functions by significant margins.
Abstract: Recent studies have shown that the choice of activation function can significantly affect the performance of deep learning networks. However, the benefits of novel activation functions have been inconsistent and task-dependent, and therefore the rectified linear unit (ReLU) is still the most commonly used. This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance. Evolutionary search is used to discover the general form of the function, and gradient descent to optimize its parameters for different parts of the network and over the learning process. Experiments with three different neural network architectures on the CIFAR-100 image classification dataset show that this approach is effective. It discovers different activation functions for different architectures, and consistently improves accuracy over ReLU and other recently proposed activation functions by significant margins. The approach can therefore be used as an automated optimization step in applying deep learning to new tasks.

Journal ArticleDOI
TL;DR: The purpose of implementing a CMOS-based design for a hyperbolic tangent activation function (Tanh) to be used in memristive-based neuromorphic architectures is to decrease power dissipation and area usage and increase the overall speed of computation in ANNs.
Abstract: Recently, enormous datasets have made power dissipation and area usage lie at the heart of designs for artificial neural networks (ANNs). Considering the significant role of activation functions in neurons and the growth of hardware-based neural networks like memristive neural networks, this work proposes a novel design for a hyperbolic tangent activation function (Tanh) to be used in memristive-based neuromorphic architectures. The purpose of implementing a CMOS-based design for Tanh is to decrease power dissipation and area usage. This design also increases the overall speed of computation in ANNs, while keeping the accuracy in an acceptable range. The proposed design is one of the first analog designs for the hyperbolic tangent, and its performance is analyzed using two well-known datasets, the Modified National Institute of Standards and Technology (MNIST) dataset and Fashion-MNIST. The direct implementation of the proposed Tanh design is investigated via software and hardware modeling.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a novel activation function named the Tanh Exponential Activation Function (TanhExp), which can significantly improve the performance of lightweight neural networks on image classification tasks.
Abstract: Lightweight or mobile neural networks used for real-time computer vision tasks contain fewer parameters than normal networks, which leads to constrained performance. In this work, we propose a novel activation function named the Tanh Exponential Activation Function (TanhExp), which can significantly improve the performance of these networks on image classification tasks. The definition of TanhExp is f(x) = x · tanh(e^x). We demonstrate the simplicity, efficiency, and robustness of TanhExp on various datasets and network models, and TanhExp outperforms its counterparts in both convergence speed and accuracy. Its behaviour also remains stable even when noise is added or the dataset is altered. We show that, without increasing the size of the network, the capacity of lightweight neural networks can be enhanced by TanhExp with only a few training epochs and no extra parameters added.
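Since the abstract gives the closed form, a one-line NumPy version is easy to check:

```python
import numpy as np

def tanhexp(x):
    # TanhExp as defined in the abstract: f(x) = x * tanh(exp(x)).
    return x * np.tanh(np.exp(x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(np.round(tanhexp(x), 4))
# Close to the identity for large positive x (tanh(e^x) -> 1) and
# decaying towards 0 for large negative x.
```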

Journal ArticleDOI
TL;DR: In this paper, two nonlinear-function-activated zeroing neural networks are employed to solve the Jacobian estimation problem and the trajectory tracking problem, respectively; theoretical analysis proves that the proposed control scheme achieves finite-time convergence when employing nonlinear activation functions and that the tracking error will not exceed the upper bound under bounded noise interference.

Journal ArticleDOI
TL;DR: A special kind of transfer learning based on the electromagnetic property from the attributed scattering center model is applied in networks to modulate the first convolutional layer and shows a better performance in terms of classification accuracy compared to random weight initialization.
Abstract: Considering that synthetic aperture radar (SAR) images obtained directly after signal processing are in the form of complex matrices, we propose a complex convolutional network for SAR target recognition. In this article, we give a brief introduction to complex convolutional networks and compare them with the real counterpart. A complex activation function is applied to analyze the influence of phase information in complex neural networks. Inspired by the theory of network visualization, a special kind of transfer learning based on the electromagnetic property from the attributed scattering center model is applied in our networks to modulate the first convolutional layer. The experiment shows a better performance in terms of classification accuracy compared to random weight initialization.

Journal ArticleDOI
TL;DR: The results of the experimental study indicate that it is possible to create self-consistent cell-signalling compendia based on AKT protein data that have been computationally simulated to provide valuable insights for cell survival/death regulation.

Journal ArticleDOI
TL;DR: This paper considers the self-synchronization and tracking synchronization issues for a class of nonidentically coupled neural networks model with unknown parameters and diffusion effects using the special structure of neural networks with global Lipschitz activation function.
Abstract: This paper considers the self-synchronization and tracking synchronization issues for a class of nonidentically coupled neural networks with unknown parameters and diffusion effects. Using the special structure of neural networks with a globally Lipschitz activation function, nonidentical terms are treated as external disturbances, which can then be compensated via robust adaptive control techniques. For the case where no common reference trajectory is given in advance, a distributed adaptive controller is proposed to drive the synchronization error to an adjustable bounded area. For the case where a reference trajectory is predesigned, two distributed adaptive controllers are proposed, respectively, to address the tracking synchronization problem with bounded and unbounded reference trajectories; different decomposition methods are given to extract the heterogeneous characteristics. To avoid the appearance of global information, such as the spectrum of the coupling matrix, corresponding adaptive designs on the coupling strengths are also provided for both cases. Moreover, the upper bounds of the final synchronization errors can be gradually adjusted according to the parameters of the adaptive designs. Finally, numerical examples are given to test the effectiveness of the control algorithms.

Journal ArticleDOI
TL;DR: The authors derived bounds on the error in high-order Sobolev norms incurred by neural networks with the hyperbolic tangent activation function, and provided explicit estimates on the approximation error with respect to the size of the neural networks.

Journal ArticleDOI
05 Nov 2021-Energies
TL;DR: In this paper, an optimized neural network (NN) model was proposed to predict battery average Nusselt number (Nuavg) data using four activation functions, including Sigmoidal, Gaussian, Tanh, and Linear functions.
Abstract: The focus of this work is to computationally obtain an optimized neural network (NN) model to predict battery average Nusselt number (Nuavg) data using four activation functions. The battery Nuavg is highly nonlinear, as reported in the literature, and depends mainly on flow velocity, coolant type, heat generation, thermal conductivity, battery length-to-width ratio, and the space between the parallel battery packs. Nuavg is modeled at first using only one hidden layer in the network (NN1). The number of neurons in NN1 is varied from 1 to 10 with the activation functions Sigmoidal, Gaussian, Tanh, and Linear to obtain the optimized NN1. Similarly, a deep NN (NND) was also analyzed with varying neurons and activation functions to find the optimized number of hidden layers to predict Nuavg. RMSE (root mean square error) and R-squared (R²) are assessed to select the optimized NN model. From this computational experiment, it is found that both NN1 and NND accurately predict the battery data. Six neurons in the hidden layer give the best predictions for NN1, with the Sigmoidal and Gaussian functions providing the best results and outperforming the Tanh and Linear functions; the Linear function, on the other hand, was unable to forecast the battery data adequately. In the NND model, the optimized configuration is obtained at different hidden layers and neurons for each activation function, and the Gaussian and Linear functions outperformed the other two functions. Overall, the deep NN (NND) model predicted better than the single-layered NN (NN1) model for each activation function.
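For context, a small sketch of the two selection metrics reported above, RMSE and R-squared, as they might be computed when comparing candidate activation functions; the Nuavg values below are placeholders, not data from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Placeholder Nu_avg predictions from one candidate model.
nu_true = np.array([12.1, 14.8, 18.3, 21.7])
nu_pred = np.array([12.4, 14.5, 18.9, 21.2])
print("RMSE:", rmse(nu_true, nu_pred), "R^2:", r_squared(nu_true, nu_pred))
```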

Journal ArticleDOI
TL;DR: A parameter-dependent reciprocally convex inequality (PDRCI) is presented, which encompasses some existing results as special cases; by overcoming the restrictions on slack matrices, it directly leads to performance improvement and reduced conservativeness in the estimator solution.

Journal ArticleDOI
01 Oct 2021
TL;DR: A novel network circuit based on memristor synapses is proposed for bidirectional associative memory with in-situ learning method and an analog neuron circuit is designed to emulate the cubic activation function of neural networks.
Abstract: The memristor is considered a promising synaptic device for neural networks because of its tunable and non-volatile resistance states, which are similar to biological synapses. In this article, a novel network circuit based on memristor synapses is proposed for bidirectional associative memory with an in-situ learning method. An analog neuron circuit is designed to emulate the cubic activation function of neural networks. A memristive synapse circuit is constructed to map both positive and negative weights onto a single memristor. Moreover, an in-situ learning circuit fitting the memristor's nonlinear characteristic is proposed. A feedback control strategy is incorporated in this learning circuit to adjust the resistance of the memristor and avoid encoding errors in the memristor's write voltage. The performance of the proposed network circuit is verified by training and recall simulations, and a comparison with related works demonstrates the advantage of the proposed circuit design.