Author

Forest Agostinelli

Bio: Forest Agostinelli is an academic researcher at the University of South Carolina. He has contributed to research on the topics Computer science and Cube (algebra), has an h-index of 11, and has co-authored 18 publications receiving 788 citations. His previous affiliations include the University of Michigan and the University of California, Irvine.

Papers
Posted Content
TL;DR: A novel form of piecewise linear activation function, learned independently for each neuron using gradient descent, achieves state-of-the-art performance on CIFAR-10, CIFAR-100, and a benchmark from high-energy physics involving Higgs boson decay modes.
Abstract: Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.

404 citations
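
As a concrete illustration, here is a minimal PyTorch sketch of such an adaptive piecewise linear unit, using the hinge-sum form h(x) = max(0, x) + Σ_s a_s · max(0, −x + b_s) from the paper; the class name, the default number of hinges, and the initialization scale are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class AdaptivePiecewiseLinear(nn.Module):
    """Per-neuron piecewise linear activation learned by gradient descent.

    h(x) = max(0, x) + sum_s a_s * max(0, -x + b_s), where the slopes
    a_s and hinge locations b_s are trainable parameters for every
    neuron, so the activation shape itself is fit to the data.
    """

    def __init__(self, num_neurons: int, num_hinges: int = 2):
        super().__init__()
        # One (a, b) pair per hinge per neuron (assumed small random init).
        self.a = nn.Parameter(0.01 * torch.randn(num_hinges, num_neurons))
        self.b = nn.Parameter(0.01 * torch.randn(num_hinges, num_neurons))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_neurons); the ReLU term is the fixed part.
        out = torch.clamp(x, min=0)
        for a_s, b_s in zip(self.a, self.b):  # iterate over hinges
            out = out + a_s * torch.clamp(-x + b_s, min=0)
        return out
```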

Proceedings Article
05 Dec 2013
TL;DR: In this article, the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA) is proposed, which combines multiple SSDAs by computing optimal column weights via solving a nonlinear optimization program and training a separate network to predict the optimal weights.
Abstract: Stacked sparse denoising autoencoders (SSDAs) have recently been shown to be successful at removing noise from corrupted images. However, like most denoising techniques, the SSDA is not robust to variation in noise types beyond what it has seen during training. To address this limitation, we present the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA), a novel technique of combining multiple SSDAs by (1) computing optimal column weights via solving a nonlinear optimization program and (2) training a separate network to predict the optimal weights. We eliminate the need to determine the type of noise, let alone its statistics, at test time and even show that the system can be robust to noise not seen in the training set. We show that state-of-the-art denoising performance can be achieved with a single system on a variety of different noise types. Additionally, we demonstrate the efficacy of AMC-SSDA as a preprocessing (denoising) algorithm by achieving strong classification performance on corrupted MNIST digits.

165 citations
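
A rough PyTorch sketch of the combination step: frozen denoising columns whose outputs are mixed with weights predicted from the noisy input. The predictor architecture and the softmax normalization are simplifying assumptions; the paper instead solves an optimization problem per image for target weights and trains a predictor to reproduce them.

```python
import torch
import torch.nn as nn

class AdaptiveMultiColumn(nn.Module):
    """Weighted combination of several pre-trained denoising columns.

    Each column maps a flattened noisy image to a denoised estimate; a
    small predictor network assigns one weight per column based on the
    noisy input, and the output is the weighted sum of the columns.
    """

    def __init__(self, columns, input_dim: int):
        super().__init__()
        self.columns = nn.ModuleList(columns)
        for p in self.columns.parameters():
            p.requires_grad_(False)  # the pre-trained columns stay frozen
        self.weight_net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, len(columns)),
            nn.Softmax(dim=-1),  # simplification: convex column weights
        )

    def forward(self, noisy: torch.Tensor) -> torch.Tensor:
        outs = torch.stack([c(noisy) for c in self.columns], dim=-1)  # (B, D, C)
        w = self.weight_net(noisy).unsqueeze(1)                       # (B, 1, C)
        return (outs * w).sum(dim=-1)                                 # (B, D)
```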

Journal ArticleDOI
TL;DR: A new deep-learning-based search heuristic performs well on the iconic Rubik’s cube and can also generalize to puzzles for which optimal solvers are intractable.
Abstract: The Rubik’s cube is a prototypical combinatorial puzzle that has a large state space with a single goal state. The goal state is unlikely to be accessed using sequences of randomly generated moves, posing unique challenges for machine learning. We solve the Rubik’s cube with DeepCubeA, a deep reinforcement learning approach that learns how to solve increasingly difficult states in reverse from the goal state without any specific domain knowledge. DeepCubeA solves 100% of all test configurations, finding a shortest path to the goal state 60.3% of the time. DeepCubeA generalizes to other combinatorial puzzles and is able to solve the 15 puzzle, 24 puzzle, 35 puzzle, 48 puzzle, Lights Out and Sokoban, finding a shortest path in the majority of verifiable cases. For some combinatorial puzzles, solutions can be verified to be optimal; for others, the state space is too large to be certain that a solution is optimal. A new deep-learning-based search heuristic performs well on the iconic Rubik’s cube and can also generalize to puzzles for which optimal solvers are intractable.

126 citations
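
A small Python sketch of the data-generation idea in the abstract: training states are produced by scrambling backward from the goal, so larger scramble counts yield harder states, with no domain knowledge beyond the move set. Here apply_move and moves are puzzle-specific placeholders, not the paper's code.

```python
import random

def scramble_from_goal(goal_state, apply_move, moves, max_scrambles, n_samples):
    """Generate training states by random walks backward from the goal.

    Each sample applies k random moves to the goal state; states with
    larger k are, on average, farther from the goal, giving the
    curriculum of increasingly difficult states the abstract describes.
    """
    samples = []
    for _ in range(n_samples):
        k = random.randint(1, max_scrambles)  # difficulty of this sample
        state = goal_state
        for _ in range(k):
            state = apply_move(state, random.choice(moves))
        samples.append((state, k))  # k serves as a rough difficulty label
    return samples
```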

Proceedings Article
21 Dec 2014
TL;DR: In this paper, a piecewise linear activation function is learned independently for each neuron using gradient descent, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
Abstract: Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.

74 citations
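
Since this is the proceedings version of the activation-function paper above, a short usage sketch makes the "learned using gradient descent" point concrete: because the adaptive activation's parameters are registered on the module (this reuses the AdaptivePiecewiseLinear sketch above; the layer sizes and learning rate are arbitrary), the same optimizer step that updates the weights also reshapes every neuron's activation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(784, 256),
    AdaptivePiecewiseLinear(num_neurons=256),  # from the earlier sketch
    nn.Linear(256, 10),
)

# model.parameters() includes the activation's a and b tensors, so a
# single optimizer learns weights and activation shapes jointly.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
opt.zero_grad()
loss = F.cross_entropy(model(x), y)
loss.backward()
opt.step()
```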

Journal ArticleDOI
TL;DR: BIO_CYCLE is a system to robustly estimate which signals are periodic in high-throughput circadian experiments, producing estimates of amplitudes, periods, phases, as well as several statistical significance measures.
Abstract: Motivation: Circadian rhythms date back to the origins of life, are found in virtually every species and every cell, and play fundamental roles in functions ranging from metabolism to cognition. Modern high-throughput technologies allow the measurement of concentrations of transcripts, metabolites and other species along the circadian cycle, creating novel computational challenges and opportunities, including the problems of inferring whether a given species oscillates in a circadian fashion or not, and inferring the time at which a set of measurements was taken. Results: We first curate several large synthetic and biological time series datasets containing labels for both periodic and aperiodic signals. We then use deep learning methods to develop and train BIO_CYCLE, a system to robustly estimate which signals are periodic in high-throughput circadian experiments, producing estimates of amplitudes, periods, phases, as well as several statistical significance measures. Using the curated data, BIO_CYCLE is compared to other approaches and shown to achieve state-of-the-art performance across multiple metrics. We then use deep learning methods to develop and train BIO_CLOCK to robustly estimate the time at which a particular single-time-point transcriptomic experiment was carried out. In most cases, BIO_CLOCK can reliably predict time, within approximately 1 h, using the expression levels of only a small number of core clock genes. BIO_CLOCK is shown to work reasonably well across tissue types, and often with only small degradation across conditions. BIO_CLOCK is used to annotate most mouse experiments found in the GEO database with an inferred time stamp. Availability and implementation: All data and software are publicly available on the CircadiOmics web portal: circadiomics.igb.uci.edu/. Contact: fagostin@uci.edu or pfbaldi@uci.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

69 citations
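
BIO_CYCLE itself is a trained deep network, but the quantities it estimates (mesor, amplitude, period, phase) can be illustrated with classical cosinor regression, a standard non-deep baseline for periodicity analysis. A minimal NumPy sketch of that baseline, not the paper's method:

```python
import numpy as np

def cosinor_fit(t, y, period):
    """Least-squares fit of y ~ m + A*cos(2*pi*t/period - phi).

    Linearized as m + b1*cos(w t) + b2*sin(w t); for a fixed trial
    period this is ordinary linear regression. Scanning candidate
    periods (e.g., around 24 h) and keeping the best fit gives a
    crude period estimate.
    """
    w = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    (m, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)
    amplitude = np.hypot(b1, b2)   # A = sqrt(b1^2 + b2^2)
    phase = np.arctan2(b2, b1)     # phi, in radians
    return m, amplitude, phase
```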


Cited by
Posted Content
TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.

11,866 citations
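
A minimal PyTorch sketch of the PReLU activation (PyTorch also ships a built-in nn.PReLU; the expanded version below just makes the learned slope explicit, with the per-channel slope initialized to 0.25 as in the paper):

```python
import torch
import torch.nn as nn

class PReLU(nn.Module):
    """Parametric ReLU: f(x) = x for x > 0, a * x otherwise.

    Unlike ReLU's fixed zero slope, the negative-half slope a is a
    trainable parameter (one per channel), learned jointly with the
    network weights at almost no extra cost.
    """

    def __init__(self, num_channels: int, init: float = 0.25):
        super().__init__()
        self.a = nn.Parameter(torch.full((num_channels,), init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-channel slope over (B, C, H, W) or (B, C).
        a = self.a.view(1, -1, 1, 1) if x.dim() == 4 else self.a
        return torch.where(x > 0, x, a * x)
```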

Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) is proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, achieving 4.94% top-5 test error on the ImageNet 2012 classification dataset.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.

11,732 citations
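
The initialization the abstract refers to keeps activation variance stable through deep rectified layers: weights are drawn with std = sqrt(2 / ((1 + a²) · fan_in)), where a is the rectifier's negative-half slope (0 for plain ReLU). A sketch for convolutional layers follows; torch.nn.init.kaiming_normal_ implements the same rule.

```python
import math
import torch
import torch.nn as nn

def he_init_(conv: nn.Conv2d, a: float = 0.0) -> None:
    """Initialize a conv layer for rectifier networks.

    std = sqrt(2 / ((1 + a^2) * fan_in)); with a = 0 this reduces to
    the plain ReLU case, and a > 0 accounts for PReLU's learned
    negative-half slope.
    """
    fan_in = conv.in_channels * conv.kernel_size[0] * conv.kernel_size[1]
    std = math.sqrt(2.0 / ((1.0 + a ** 2) * fan_in))
    with torch.no_grad():
        conv.weight.normal_(0.0, std)
        if conv.bias is not None:
            conv.bias.zero_()
```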

Journal ArticleDOI
TL;DR: This work was supported in part by the Royal Society of the UK, the National Natural Science Foundation of China, and the Alexander von Humboldt Foundation of Germany.

2,404 citations

Journal ArticleDOI
TL;DR: This review, which focuses on the application of CNNs to image classification tasks, covers their development, from their predecessors up to recent state-of-the-art deep learning systems.
Abstract: Convolutional neural networks (CNNs) have been applied to visual tasks since the late 1980s. However, despite a few scattered applications, they were dormant until the mid-2000s, when developments in computing power and the advent of large amounts of labeled data, supplemented by improved algorithms, contributed to their advancement and brought them to the forefront of a neural network renaissance that has seen rapid progression since 2012. In this review, which focuses on the application of CNNs to image classification tasks, we cover their development, from their predecessors up to recent state-of-the-art deep learning systems. Along the way, we analyze (1) their early successes, (2) their role in the deep learning renaissance, (3) selected symbolic works that have contributed to their recent popularity, and (4) several improvement attempts, by reviewing contributions and challenges of over 300 publications. We also introduce some of their current trends and remaining challenges.

2,366 citations

Journal ArticleDOI
TL;DR: FFDNet is a fast and flexible denoising convolutional neural network that takes a tunable noise level map as input, allowing a single network to handle a wide range of noise levels effectively.
Abstract: Due to the fast inference and good performance, discriminative learning methods have been widely studied in image denoising. However, these methods mostly learn a specific model for each noise level, and require multiple models for denoising images with different noise levels. They also lack flexibility to deal with spatially variant noise, limiting their applications in practical denoising. To address these issues, we present a fast and flexible denoising convolutional neural network, namely FFDNet, with a tunable noise level map as the input. The proposed FFDNet works on downsampled sub-images, achieving a good trade-off between inference speed and denoising performance. In contrast to the existing discriminative denoisers, FFDNet enjoys several desirable properties, including: 1) the ability to handle a wide range of noise levels (i.e., [0, 75]) effectively with a single network; 2) the ability to remove spatially variant noise by specifying a non-uniform noise level map; and 3) faster speed than benchmark BM3D even on CPU without sacrificing denoising performance. Extensive experiments on synthetic and real noisy images are conducted to evaluate FFDNet in comparison with state-of-the-art denoisers. The results show that FFDNet is effective and efficient, making it highly attractive for practical denoising applications.

1,430 citations
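
A small PyTorch sketch of FFDNet's input construction as the abstract describes it: the image is rearranged into four half-resolution sub-images and a noise-level map is appended as an extra channel. The reversible rearrangement here uses pixel_unshuffle; the denoising network itself is omitted.

```python
import torch
import torch.nn.functional as F

def ffdnet_input(noisy: torch.Tensor, sigma: float) -> torch.Tensor:
    """Build an FFDNet-style network input from a noisy image.

    The (B, C, H, W) image is rearranged into four half-resolution
    sub-images, and a constant noise-level map with value sigma is
    concatenated as one more channel. Passing a spatially varying
    sigma map instead would handle non-uniform noise the same way.
    """
    sub = F.pixel_unshuffle(noisy, downscale_factor=2)  # (B, 4C, H/2, W/2)
    b, _, h, w = sub.shape
    noise_map = torch.full((b, 1, h, w), sigma,
                           dtype=sub.dtype, device=sub.device)
    return torch.cat([sub, noise_map], dim=1)
```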