Journal ArticleDOI

Deep learning

28 May 2015-Nature (Nature Research)-Vol. 521, Iss: 7553, pp 436-444
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
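
To make the layer-by-layer picture concrete, here is a minimal sketch (not from the paper): a tiny two-layer network trained with backpropagation on a toy XOR problem. Each layer computes its representation from the previous layer's output, and the gradients indicate how to change the internal weights; the network size, learning rate, and toy data are illustrative choices only.

```python
# Minimal sketch (illustrative, not from the paper): a two-layer network trained
# with backpropagation on a toy XOR task.  Each layer computes its representation
# from the previous layer's output; gradients indicate how to adjust the weights.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets

W1 = rng.normal(scale=0.5, size=(2, 8))   # input  -> hidden representation
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output representation
lr = 0.5

for step in range(5000):
    # forward pass: each layer's representation from the previous layer's
    h = np.tanh(X @ W1)                    # hidden representation
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))    # output (sigmoid)

    # backward pass: propagate the error signal layer by layer
    grad_out = (p - y) / len(X)            # d(cross-entropy)/d(pre-sigmoid output)
    grad_W2 = h.T @ grad_out
    grad_h = grad_out @ W2.T * (1.0 - h ** 2)   # through the tanh nonlinearity
    grad_W1 = X.T @ grad_h

    # gradient-descent update of the internal parameters
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

print(np.round(p, 2))   # predictions should approach [0, 1, 1, 0]
```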
Citations
Journal ArticleDOI
TL;DR: In this paper, a large-scale remote sensing image retrieval dataset called PatternNet is presented for evaluating the performance of different deep learning-based retrieval approaches.
Abstract: Benchmark datasets are critical for developing, evaluating, and comparing remote sensing image retrieval (RSIR) approaches. However, current benchmark datasets are deficient in that (1) they were originally collected for land use/land cover classification instead of RSIR; (2) they are relatively small in terms of the number of classes as well as the number of images per class which makes them unsuitable for developing deep learning based approaches; and (3) they are not appropriate for RSIR due to the large amount of background present in the images. These limitations restrict the development of novel approaches for RSIR, particularly those based on deep learning which require large amounts of training data. We therefore present a new large-scale remote sensing dataset termed “PatternNet” that was collected specifically for RSIR. PatternNet was collected from high-resolution imagery and contains 38 classes with 800 images per class. Significantly, PatternNet’s large scale makes it suitable for developing novel, deep learning based approaches for RSIR. We use PatternNet to evaluate the performance of over 35 RSIR methods ranging from traditional handcrafted feature based methods to recent, deep learning based ones. These results serve as a baseline for future research on RSIR.
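
As a rough illustration of how such a benchmark is typically used, the sketch below ranks images by distance in a feature space and reports precision at k. The feature extractor, the toy data, and the choice of metric are assumptions made here for illustration; the paper evaluates many methods and metrics on the actual PatternNet images.

```python
# Illustrative sketch (assumptions: features have already been extracted, e.g. by a
# pretrained CNN; precision@k is one of several metrics commonly reported for
# retrieval benchmarks such as PatternNet).
import numpy as np

def precision_at_k(features, labels, k=10):
    """Rank all images by Euclidean distance to each query and report the
    average fraction of the top-k results that share the query's class."""
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # never retrieve the query itself
    topk = np.argsort(dists, axis=1)[:, :k]    # indices of the k nearest images
    hits = labels[topk] == labels[:, None]
    return hits.mean()

# toy stand-in: 4 classes x 20 images with 128-D features
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 20)
features = rng.normal(size=(80, 128)) + labels[:, None]  # class-dependent shift
print(f"precision@10 = {precision_at_k(features, labels):.3f}")
```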

287 citations

Journal ArticleDOI
TL;DR: Five research areas in seismology are surveyed in which ML classification, regression, and clustering algorithms show promise: earthquake detection and phase picking, earthquake early warning, ground‐motion prediction, seismic tomography, and earthquake geodesy.
Abstract: This article provides an overview of current applications of machine learning (ML) in seismology. ML techniques are becoming increasingly widespread in seismology, with applications ranging from identifying unseen signals and patterns to extracting features that might improve our physical understanding. The survey of the applications in seismology presented here serves as a catalyst for further use of ML. Five research areas in seismology are surveyed in which ML classification, regression, and clustering algorithms show promise: earthquake detection and phase picking, earthquake early warning (EEW), ground‐motion prediction, seismic tomography, and earthquake geodesy. We conclude by discussing the need for a hybrid approach combining data‐driven ML with traditional physical modeling.

287 citations


Cites background or methods from "Deep learning"

  • ...“Deep Learning” (Goodfellow et al., 2016) provides a practical introduction to deep neural networks (NNs)....

  • ...…of ML algorithms such as semisupervised learning and reinforcement learning, for which we refer readers to more advanced texts (e.g., Murphy, 2012; Goodfellow et al., 2016)....

  • ...There are also many excellent free online courses, such as Ng’s “Machine Learning,” Hinton’s “Neural Networks for Machine Learning,” Tibshirani and Hastie’s “Statistical Learning,” and Li et al.’s “Convolutional Neural Networks for Visual Recognition.”...

Book ChapterDOI
10 Sep 2019
TL;DR: This introductory paper presents recent developments and applications in deep learning, and makes a plea for a wider use of explainable learning algorithms in practice.
Abstract: In recent years, machine learning (ML) has become a key enabling technology for the sciences and industry. Especially through improvements in methodology, the availability of large databases and increased computational power, today’s ML algorithms are able to achieve excellent performance (at times even exceeding the human level) on an increasing number of complex tasks. Deep learning models are at the forefront of this development. However, due to their nested non-linear structure, these powerful models have been generally considered “black boxes”, not providing any information about what exactly makes them arrive at their predictions. Since in many applications, e.g., in the medical domain, such lack of transparency may be not acceptable, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This introductory paper presents recent developments and applications in this field and makes a plea for a wider use of explainable learning algorithms in practice.
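
One of the simplest members of the family of explanation methods discussed here is a gradient (saliency) map, which attributes a prediction to input features via the gradient of the class score with respect to the input. The sketch below shows the idea for a toy linear softmax model; the model, data, and function names are placeholders rather than anything taken from the chapter.

```python
# Minimal sketch (assumption: gradient saliency as one simple example of an
# explanation method; the tiny linear-softmax "model" is a placeholder).
import numpy as np

def saliency(x, W, b, target):
    """Gradient of the target class score w.r.t. the input: large entries mark
    the input features that most influence the prediction."""
    logits = W @ x + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # For a linear model the gradient of the target score w.r.t. x is simply
    # the target row of W; for a deep net it would be obtained by backpropagation.
    grad = W[target]
    return probs, np.abs(grad)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 5)), np.zeros(3)   # 5 input features, 3 classes
x = rng.normal(size=5)
probs, sal = saliency(x, W, b, target=int(np.argmax(W @ x + b)))
print("predicted class probs:", np.round(probs, 2))
print("feature saliency     :", np.round(sal, 2))
```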

287 citations


Cites background from "Deep learning"

  • ...These immense successes of AI systems mainly became possible through improvements in deep learning methodology [48,47], the availability of large databases [17,34] and computational gains obtained with powerful GPU cards [52]....

Journal ArticleDOI
TL;DR: The constructed CNN system for detecting esophageal cancer can analyze stored endoscopic images in a short time with high sensitivity; however, more training would lead to higher diagnostic accuracy.

286 citations

Journal ArticleDOI
TL;DR: An overview of the theory behind ML is provided, the common ML algorithms used in medicine, including their pitfalls, are explored, and the potential future of ML in medicine is discussed.
Abstract: Machine learning (ML) is a burgeoning field of medicine, with huge resources being applied to bring computer science and statistics to bear on medical problems. Proponents of ML extol its ability to deal with the large, complex and disparate data often found within medicine, and believe that ML is the future for biomedical research, personalized medicine and computer-aided diagnosis, with the potential to significantly advance global health care. However, the concepts of ML are unfamiliar to many medical professionals and there is untapped potential in the use of ML as a research tool. In this article, we provide an overview of the theory behind ML, explore the common ML algorithms used in medicine, including their pitfalls, and discuss the potential future of ML in medicine.

285 citations

References
Journal ArticleDOI
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
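
A single LSTM step can be written down compactly. The sketch below is a reconstruction under common modern conventions (the forget gate shown is a later, now-standard addition and is not part of the original 1997 formulation); the additive cell-state update plays the role of the constant error carousel, and weight shapes and initialization are illustrative.

```python
# Minimal sketch of one LSTM step (assumed shapes and initialization).  The
# additive cell-state update c_t = f*c + i*g is the "constant error carousel"
# through which error can flow over long time lags, while the multiplicative
# gates learn to open and close access to it.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """x: input, h: previous hidden state, c: previous cell state.
    W has shape (4*H, X+H): rows for input, forget, output gates and candidate."""
    z = W @ np.concatenate([x, h]) + b
    H = h.size
    i = sigmoid(z[0 * H:1 * H])    # input gate
    f = sigmoid(z[1 * H:2 * H])    # forget gate (later addition to the 1997 model)
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell update
    c_new = f * c + i * g          # additive, carousel-style cell-state update
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
X_DIM, H_DIM = 3, 4
W = rng.normal(scale=0.1, size=(4 * H_DIM, X_DIM + H_DIM))
b = np.zeros(4 * H_DIM)
h = c = np.zeros(H_DIM)
for t in range(5):                 # run a short toy sequence
    h, c = lstm_step(rng.normal(size=X_DIM), h, c, W, b)
print(h)
```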

72,897 citations

Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, gradient-based learning is reviewed for handwritten character recognition, convolutional neural networks are shown to outperform all other techniques on a standard digit recognition task, and graph transformer networks (GTNs) are proposed for globally training multi-module document recognition systems.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
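
The two operations at the core of such convolutional networks are local, weight-shared convolution and spatial subsampling. The sketch below is an illustrative reconstruction of those two operations only; the kernel, pooling size, and the ReLU nonlinearity are modern placeholder choices (the original system used sigmoid-style units and stacked several trained layers).

```python
# Minimal sketch of the convolution + subsampling operations at the heart of a
# convolutional network (illustrative kernel and pool sizes; real systems stack
# several such layers and train them end to end by gradient-based learning).
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: each output unit sees a small local patch, and the
    same kernel (shared weights) is applied at every position."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """2x2 subsampling: builds tolerance to small shifts and distortions."""
    H, W = fmap.shape
    fmap = fmap[:H - H % size, :W - W % size]
    return fmap.reshape(H // size, size, W // size, size).max(axis=(1, 3))

image = np.random.default_rng(0).normal(size=(8, 8))               # toy "digit"
kernel = np.array([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]])   # edge detector
feature_map = np.maximum(conv2d(image, kernel), 0)                  # conv + ReLU
print(max_pool(feature_map).shape)                                  # (3, 3)
```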

42,067 citations

Journal ArticleDOI
01 Jan 1988-Nature
TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector; as a result, internal 'hidden' units come to represent important features of the task domain.
Abstract: We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure.
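
In standard notation (a reconstruction, not the paper's own symbols), the procedure performs gradient descent on an output error measure E, with the error signal propagated backwards from output units to hidden units:

```latex
% Standard back-propagation equations (a reconstruction in common notation).
\[
  E = \tfrac{1}{2} \sum_{j} \left( y_j - d_j \right)^2 ,
  \qquad
  \Delta w_{ji} = -\eta \, \frac{\partial E}{\partial w_{ji}}
                = -\eta \, \delta_j \, x_i ,
\]
\[
  \delta_j =
  \begin{cases}
    (y_j - d_j)\, f'(z_j) & \text{for output units},\\[4pt]
    \Bigl( \sum_{k} \delta_k \, w_{kj} \Bigr) f'(z_j) & \text{for hidden units},
  \end{cases}
\]
```

Here y_j and d_j are the actual and desired outputs, z_j is a unit's net input, f' is the derivative of its nonlinearity, x_i is the activity feeding weight w_ji, and eta is the learning rate.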

23,814 citations

Journal ArticleDOI
26 Feb 2015-Nature
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Abstract: The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
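
The learning rule at the heart of the deep Q-network is the one-step Q-learning target. The sketch below shows it in tabular form on a toy chain environment; in the paper the table is replaced by a deep convolutional network over raw pixels, trained with experience replay and a separate target network, and the environment and hyperparameters here are placeholders.

```python
# Minimal sketch of the Q-learning update used (in function-approximation form)
# by the deep Q-network agent.  Tabular version on a toy chain environment;
# hyperparameters are illustrative.
import numpy as np

n_states, n_actions = 5, 2          # toy chain; the last state is the rewarding goal
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99
rng = np.random.default_rng(0)

def step_env(s, a):
    """Toy chain environment (a placeholder, not an Atari game): action 1 moves
    right, action 0 moves left; reaching the last state pays reward 1."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, reward, done

s = 0
for t in range(2000):
    a = int(rng.integers(n_actions))          # random behaviour policy (off-policy)
    s_next, reward, done = step_env(s, a)
    # Q-learning target: reward plus discounted value of the best next action
    target = reward + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += alpha * (target - Q[s, a])     # move the estimate toward the target
    s = 0 if done else s_next

print(np.round(Q, 2))   # with enough steps, "move right" ends up scored highest
```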

23,074 citations

Journal ArticleDOI
28 Jul 2006-Science
TL;DR: In this article, an effective way of initializing the weights is described that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool for reducing the dimensionality of data.
Abstract: High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors. Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution. We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.
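
Structurally, an autoencoder is an encoder-decoder pair with a small central code layer trained to reconstruct its input. The sketch below shows a single linear encoder and decoder fine-tuned by gradient descent; the paper's key contribution, greedy layer-wise pretraining of the initial weights, is omitted here, and small random initialization with toy data is used purely for illustration.

```python
# Minimal sketch of an autoencoder with a small central code layer, fine-tuned by
# gradient descent on the reconstruction error.  A single linear encoder/decoder
# is shown for brevity; the paper uses deep nonlinear layers and pretrains the
# initial weights layer by layer.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 2:] = X[:, :2] @ rng.normal(size=(2, 8))    # data lies near a 2-D subspace

W_enc = rng.normal(scale=0.1, size=(10, 2))      # encoder: 10-D input -> 2-D code
W_dec = rng.normal(scale=0.1, size=(2, 10))      # decoder: 2-D code  -> 10-D output
lr = 0.01

for step in range(3000):
    code = X @ W_enc                  # low-dimensional code (small central layer)
    recon = code @ W_dec              # reconstruction of the input
    err = recon - X
    # gradients of the mean squared reconstruction error
    grad_dec = code.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print("reconstruction MSE:", round(float((err ** 2).mean()), 4))
```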

16,717 citations