Open accessJournal ArticleDOI: 10.1016/J.NEUNET.2021.02.023

SPLASH: Learnable activation functions for improving accuracy and adversarial robustness.

04 Mar 2021 · Neural Networks (Pergamon) · Vol. 140, pp. 1-12
Abstract: We introduce SPLASH units, a class of learnable activation functions shown to simultaneously improve the accuracy of deep neural networks and their robustness to adversarial attacks. SPLASH units combine a simple parameterization with the ability to approximate a wide range of non-linear functions. SPLASH units are: (1) continuous; (2) grounded (f(0) = 0); (3) use symmetric hinges; and (4) their hinges are placed at fixed locations which are derived from the data (i.e. no learning required). Compared to nine other learned and fixed activation functions, including ReLU and its variants, SPLASH units show superior performance across three datasets (MNIST, CIFAR-10, and CIFAR-100) and four architectures (LeNet5, All-CNN, ResNet-20, and Network-in-Network). Furthermore, we show that SPLASH units significantly increase the robustness of deep neural networks to adversarial attacks. Our experiments on both black-box and white-box adversarial attacks show that commonly-used architectures, namely LeNet5, All-CNN, Network-in-Network, and ResNet-20, can be up to 31% more robust to adversarial attacks by simply using SPLASH units instead of ReLUs. Finally, we show the benefits of using SPLASH activation functions in bigger architectures designed for non-trivial datasets such as ImageNet.
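A minimal sketch of the shape described in the abstract: a sum of hinge units with fixed, symmetric hinge locations and learnable slopes (the exact parameterization is in the paper; the function and parameter names here are illustrative). Because every hinge term vanishes at x = 0, the output is grounded at f(0) = 0.

```python
def splash(x, hinges, slopes_pos, slopes_neg):
    # SPLASH-style activation sketch: hinge locations are fixed and
    # symmetric about zero; the slopes are the learnable parameters.
    pos = sum(a * max(0.0, x - b) for a, b in zip(slopes_pos, hinges))
    neg = sum(a * max(0.0, -x - b) for a, b in zip(slopes_neg, hinges))
    return pos + neg  # every term is 0 at x = 0, so f(0) = 0
```

With a single hinge at 0, a positive slope of 1, and a negative slope of 0, this reduces to ReLU, which is why the family contains ReLU and its variants as special cases.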



7 results found

Open accessJournal Article
Garrett Bingham, Risto Miikkulainen · Institutions (1)
04 May 2021 · arXiv: Learning
Abstract: Recent studies have shown that the choice of activation function can significantly affect the performance of deep learning networks. However, the benefits of novel activation functions have been inconsistent and task-dependent, and therefore the rectified linear unit (ReLU) is still the most commonly used. This paper proposes a technique for customizing activation functions automatically, resulting in reliable improvements in performance. Evolutionary search is used to discover the general form of the function, and gradient descent to optimize its parameters for different parts of the network and over the learning process. Experiments with three different neural network architectures on the CIFAR-100 image classification dataset show that this approach is effective. It discovers different activation functions for different architectures, and consistently improves accuracy over ReLU and other recently proposed activation functions by significant margins. The approach can therefore be used as an automated optimization step in applying deep learning to new tasks.
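The abstract describes a two-level search: an evolutionary outer loop over the general form of the activation, with per-network fine-tuning inside. Here is a hedged, toy-scale sketch of the outer loop, assuming a tiny grammar of two composed unary operations and a caller-supplied fitness that stands in for trained-network validation accuracy; all names are illustrative, not the paper's.

```python
import math
import random

# Toy grammar: an activation is u2(u1(x)) for unary ops u1, u2.
UNARY = {"identity": lambda x: x, "tanh": math.tanh,
         "relu": lambda x: max(0.0, x), "negate": lambda x: -x}

def make_fn(cand):
    u1, u2 = UNARY[cand[0]], UNARY[cand[1]]
    return lambda x: u2(u1(x))

def evolve(fitness, generations=15, pop_size=8, seed=0):
    rng = random.Random(seed)
    ops = list(UNARY)
    pop = [(rng.choice(ops), rng.choice(ops)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(make_fn(c)), reverse=True)
        parents = pop[: pop_size // 2]       # truncation selection
        children = []
        for p in parents:                    # point mutation of one op
            child = list(p)
            child[rng.randrange(2)] = rng.choice(ops)
            children.append(tuple(child))
        pop = parents + children
    return max(pop, key=lambda c: fitness(make_fn(c)))
```

In the paper the fitness of a candidate is the performance of a network trained with it; here any callable scoring function can be plugged in.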


Topics: Activation function (57%), Artificial neural network (57%), Rectifier (neural networks) (57%)

6 Citations

Open accessPosted Content
09 Jun 2021 · arXiv: Learning
Abstract: The open-world deployment of Machine Learning (ML) algorithms in safety-critical applications such as autonomous vehicles needs to address a variety of ML vulnerabilities such as interpretability, verifiability, and performance limitations. Research explores different approaches to improve ML dependability by proposing new models and training techniques to reduce generalization error, achieve domain adaptation, and detect outlier examples and adversarial attacks. In this paper, we review and organize practical ML techniques that can improve the safety and dependability of ML algorithms and therefore ML-based software. Our organization maps state-of-the-art ML techniques to safety strategies to enhance the dependability of ML algorithms from different aspects, and we discuss research gaps as well as promising solutions.


Topics: Dependability (60%)

2 Citations

Journal ArticleDOI: 10.1016/J.JVCIR.2021.103294
Zhengze Li, Xiaoyuan Yang, Kangqing Shen, Fazhen Jiang +3 more · Institutions (1)
Abstract: Activation functions are of great importance for the performance and training of deep neural networks. A high-performance activation function is expected to effectively prevent the gradient from vanishing and help the network converge. This paper provides a novel smooth activation function, called the Parameterized Self-circulating Gating Unit (PSGU), aiming to train an adaptive activation function that improves the performance of deep networks. Compared with other works, we propose and study the self-circulating gating property of the activation function, and analyze its influence on signal transmission in the network by controlling the flow of information. Specifically, we theoretically analyze and propose an initialization based on PSGU, which adequately explores the function's properties in the neighborhood of the origin. Finally, the proposed activation function and initialization are compared with other methods on commonly-used network architectures; the performance achieved by using PSGU alone, or in combination with our proposed initialization, is on par with or better than the state of the art.
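The abstract does not spell out PSGU's closed form, so as a hedged stand-in here is a generic smooth, parameterized self-gating unit f(x) = x · sigmoid(a·x + b) with learnable a and b (Swish/SiLU is the special case a = 1, b = 0). It illustrates the gating idea the abstract describes, letting parameters control how much signal flows through, but is not the paper's actual function.

```python
import math

def gated_unit(x, a=1.0, b=0.0):
    # Generic self-gating sketch (NOT the published PSGU formula):
    # the input gates itself through a parameterized sigmoid.
    return x / (1.0 + math.exp(-(a * x + b)))
```

For large positive inputs the gate saturates open (output ≈ x); for large negative inputs it closes (output ≈ 0), giving a smooth, ReLU-like shape whose transition is tunable.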


Journal ArticleDOI: 10.1007/S11265-020-01627-X
Abstract: Quantum computers pose an imminent threat to secure signal processing because they can break contemporary public-key cryptography schemes in polynomial time. Ring learning with errors (RLWE) lattice-based cryptography (LBC) is considered the most versatile and efficient family of post-quantum cryptography (PQC). Polynomial multiplication is the most compute-intensive routine in RLWE schemes. Convolution and the Number Theoretic Transform (NTT) are the two common methods for performing polynomial multiplication. In this paper, we explore the energy efficiency of different polynomial multipliers, NTT-based and convolution-based, on GPU and FPGA. When synthesized on a Zynq UltraScale+ FPGA, our NTT-based and convolution-based designs achieve on average 5.1x and 22.5x speedup over the state of the art. Our convolution-based design, on a Zynq UltraScale+ FPGA, can more than double the number of CRYSTALS-Dilithium signatures generated per second. The designed NTT-based multiplier on an NVIDIA Jetson TX2 is 1.2x and 2x faster than our baseline NTT-based multiplier on FPGA for polynomial degrees of 512 and 1024, respectively. Our explorations and guidelines can help designers choose proper implementations to realize quantum-resistant signal processing.
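As a reference point for the convolution-based approach, here is a minimal sketch of the polynomial multiplication at the core of RLWE schemes: schoolbook negacyclic convolution in Z_q[x]/(x^n + 1), where the wrap-around term x^n ≡ -1 flips the sign. The parameters q and n below are illustrative, not those of any specific scheme, and a hardware design would replace this O(n²) loop with an NTT for large n.

```python
def negacyclic_mul(a, b, q):
    # Multiply polynomials a, b (coefficient lists of equal length n)
    # in Z_q[x]/(x^n + 1) by schoolbook convolution with sign-flipping
    # wrap-around: x^n = -1.
    n = len(a)
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                res[k] = (res[k] + ai * bj) % q
            else:  # degree overflow wraps with negated sign
                res[k - n] = (res[k - n] - ai * bj) % q
    return res
```

The NTT-based alternative computes the same product in O(n log n) by pointwise multiplication in the transform domain, which is where the FPGA/GPU trade-offs the paper measures come from.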


Topics: Lattice-based cryptography (57%), Polynomial (53%), Speedup (50%)

Open accessPosted Content
Abstract: There is a lack of scalable quantitative measures of reactivity for functional groups in organic chemistry. Measuring reactivity experimentally is costly and time-consuming and does not scale to the astronomical size of chemical space. In previous quantum chemistry studies, we have introduced Methyl Cation Affinities (MCA*) and Methyl Anion Affinities (MAA*), using a solvation model, as quantitative measures of reactivity for organic functional groups over the broadest range. Although MCA* and MAA* offer good estimates of reactivity parameters, their calculation through Density Functional Theory (DFT) simulations is time-consuming. To circumvent this problem, we first use DFT to calculate MCA* and MAA* for more than 2,400 organic molecules, thereby establishing a large dataset of chemical reactivity scores. We then design deep learning methods to predict the reactivity of molecular structures and train them using this curated dataset in combination with different representations of molecular structures. Using ten-fold cross-validation, we show that graph attention neural networks applied to informative input fingerprints produce the most accurate estimates of reactivity, achieving over 91% test accuracy for predicting MCA* ±30 or MAA* ±30, over 50 orders of magnitude. Finally, we demonstrate the application of these reactivity scores to two tasks: (1) chemical reaction prediction; (2) combinatorial generation of reaction mechanisms. The curated dataset of MCA* and MAA* scores is available through the ChemDB chemoinformatics web portal at this http URL.



53 results found

Open accessProceedings ArticleDOI: 10.1109/CVPR.2016.90
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun · Institutions (1)
27 Jun 2016
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
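The residual reformulation above can be sketched in a few lines: a block learns a residual function F(x) and outputs F(x) + x, so when F collapses to zero the block reduces to the identity map, which is what makes very deep stacks easier to optimize. The toy two-layer F below (plain lists as weight matrices) stands in for the paper's conv-BN-ReLU stack.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def residual_block(x, w1, w2):
    # F(x) = W2 @ relu(W1 @ x), with square weight matrices as nested lists.
    h = relu([sum(w * xi for w, xi in zip(row, x)) for row in w1])
    fx = [sum(w * hi for w, hi in zip(row, h)) for row in w2]
    return [f + xi for f, xi in zip(fx, x)]  # identity shortcut: F(x) + x
```

With all-zero weights the block passes its input through unchanged, illustrating why adding residual blocks cannot make the representable function set smaller.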


Topics: Deep learning (53%), Residual (53%), Convolutional neural network (53%)

93,356 Citations

Journal ArticleDOI: 10.1109/5.726791
Yann LeCun, Léon Bottou, Yoshua Bengio +3 more · Institutions (5)
01 Jan 1998
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
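At the heart of the convolutional networks described above is a simple operation: sliding a small learned kernel over the input and taking weighted sums, which is what gives the architecture its tolerance to shifts of 2D shapes. A minimal single-channel, "valid" (no padding, stride 1) version in pure Python:

```python
def conv2d_valid(img, kernel):
    # img and kernel are 2D lists; output shrinks by kernel size - 1
    # in each dimension ("valid" convolution, really cross-correlation
    # as used in most deep-learning frameworks).
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[u][v] * img[i + u][j + v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)] for i in range(oh)]
```

A full LeNet-style layer applies many such kernels in parallel, adds a bias, and passes the result through a non-linearity and subsampling; this sketch shows only the core sliding-window sum.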


Topics: Neocognitron (64%), Intelligent character recognition (64%), Artificial neural network (60%)

34,930 Citations

Proceedings ArticleDOI: 10.1109/CVPR.2009.5206848
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li +2 more · Institutions (1)
20 Jun 2009
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.


Topics: WordNet (57%), Image retrieval (54%)

31,274 Citations

Open accessJournal Article
Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
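The training-time behavior described above can be sketched in a few lines. This uses the common "inverted dropout" variant, which scales surviving units by 1/(1-p) during training so that no rescaling is needed at test time; it is equivalent in expectation to the paper's formulation of using smaller weights at test time.

```python
import random

def dropout(x, p, training, rng=random):
    # Zero each unit with probability p during training; scale survivors
    # by 1/(1-p) so the expected activation matches the test-time pass.
    if not training or p == 0.0:
        return list(x)  # test time: the full, unthinned network is used
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]
```

Each training step thus samples one "thinned" network from an exponential family of subnetworks, while inference runs the single unthinned network.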


Topics: Overfitting (66%), Deep learning (62%), Convolutional neural network (61%)

27,534 Citations

Open accessJournal Article
Abstract: We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large datasets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of datasets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the datasets.
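The crowding reduction mentioned above comes from t-SNE's key ingredient: low-dimensional similarities q_ij computed with a heavy-tailed Student-t kernel (one degree of freedom) instead of the Gaussian used in the high-dimensional space. A minimal sketch of that computation, assuming points are given as lists of coordinates:

```python
def student_t_similarities(y):
    # y: list of low-dimensional points; returns the normalized pairwise
    # similarity matrix q_ij = (1 + ||y_i - y_j||^2)^-1 / sum_kl (...),
    # i.e. a Student-t kernel with one degree of freedom.
    n = len(y)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(y[i], y[j]))
                w[i][j] = 1.0 / (1.0 + d2)  # heavy tail vs. exp(-d2)
    total = sum(map(sum, w))
    return [[wij / total for wij in row] for row in w]
```

Because the tail decays polynomially rather than exponentially, moderately distant points keep non-trivial similarity, so the optimizer can push dissimilar clusters apart instead of piling everything into the center of the map.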


Topics: Sammon mapping (58%), t-distributed stochastic neighbor embedding (57%), Isomap (57%)

22,120 Citations
