scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?

TL;DR: It is hoped that the functional description of the deep hierarchies realized in the primate visual system provides valuable insights for the design of computer vision algorithms, fostering increasingly productive interaction between biological and computer vision research.
Abstract: Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Organized for a computer vision audience, we present functional principles of the processing hierarchies present in the primate visual system considering recent discoveries in neurophysiology. The hierarchical processing in the primate visual system is characterized by a sequence of different levels of processing (on the order of 10) that constitute a deep hierarchy in contrast to the flat vision architectures predominantly used in today's mainstream computer vision. We hope that the functional description of the deep hierarchies realized in the primate visual system provides valuable insights for the design of computer vision algorithms, fostering increasingly productive interaction between biological and computer vision research.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, review deep supervised learning, unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

14,635 citations


Additional excerpts

  • ...…Hung, Kreiman, Poggio, & DiCarlo, 2005; Kobatake & Tanaka, 1994; Kriegeskorte et al., 2008; Lennie & Movshon, 2005; Logothetis, Pauls, & Poggio, 1995; Perrett, Hietanen, Oram, Benson, & Rolls, 1992; Perrett, Rolls, & Caan, 1982); compare a computer vision-oriented survey (Kruger et al., 2013)....

    [...]

  • ..., 2008; Lennie & Movshon, 2005; Logothetis, Pauls, & Poggio, 1995; Perrett, Hietanen, Oram, Benson, & Rolls, 1992; Perrett, Rolls, & Caan, 1982); compare a computer vision-oriented survey (Kruger et al., 2013)....

    [...]

Journal ArticleDOI
06 Jun 1986-JAMA
TL;DR: The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or her own research.
Abstract: I have developed "tennis elbow" from lugging this book around the past four weeks, but it is worth the pain, the effort, and the aspirin. It is also worth the (relatively speaking) bargain price. Including appendixes, this book contains 894 pages of text. The entire panorama of the neural sciences is surveyed and examined, and it is comprehensive in its scope, from genomes to social behaviors. The editors explicitly state that the book is designed as "an introductory text for students of biology, behavior, and medicine," but it is hard to imagine any audience, interested in any fragment of neuroscience at any level of sophistication, that would not enjoy this book. The editors have done a masterful job of weaving together the biologic, the behavioral, and the clinical sciences into a single tapestry in which everyone from the molecular biologist to the practicing psychiatrist can find and appreciate his or

7,563 citations

Journal ArticleDOI
TL;DR: The concept of deep learning is introduced into hyperspectral data classification for the first time, and a new way of classifying with spatial-dominated information is proposed, which is a hybrid of principle component analysis (PCA), deep learning architecture, and logistic regression.
Abstract: Classification is one of the most popular topics in hyperspectral remote sensing. In the last two decades, a huge number of methods were proposed to deal with the hyperspectral data classification problem. However, most of them do not hierarchically extract deep features. In this paper, the concept of deep learning is introduced into hyperspectral data classification for the first time. First, we verify the eligibility of stacked autoencoders by following classical spectral information-based classification. Second, a new way of classifying with spatial-dominated information is proposed. We then propose a novel deep learning framework to merge the two features, from which we can get the highest classification accuracy. The framework is a hybrid of principle component analysis (PCA), deep learning architecture, and logistic regression. Specifically, as a deep learning architecture, stacked autoencoders are aimed to get useful high-level features. Experimental results with widely-used hyperspectral data indicate that classifiers built in this deep learning-based framework provide competitive performance. In addition, the proposed joint spectral-spatial deep neural network opens a new window for future research, showcasing the deep learning-based methods' huge potential for accurate hyperspectral data classification.

2,071 citations

Journal ArticleDOI
TL;DR: This paper proposes a 3-D CNN-based FE model with combined regularization to extract effective spectral-spatial features of hyperspectral imagery and reveals that the proposed models with sparse constraints provide competitive results to state-of-the-art methods.
Abstract: Due to the advantages of deep learning, in this paper, a regularized deep feature extraction (FE) method is presented for hyperspectral image (HSI) classification using a convolutional neural network (CNN). The proposed approach employs several convolutional and pooling layers to extract deep features from HSIs, which are nonlinear, discriminant, and invariant. These features are useful for image classification and target detection. Furthermore, in order to address the common issue of imbalance between high dimensionality and limited availability of training samples for the classification of HSI, a few strategies such as L2 regularization and dropout are investigated to avoid overfitting in class data modeling. More importantly, we propose a 3-D CNN-based FE model with combined regularization to extract effective spectral-spatial features of hyperspectral imagery. Finally, in order to further improve the performance, a virtual sample enhanced method is proposed. The proposed approaches are carried out on three widely used hyperspectral data sets: Indian Pines, University of Pavia, and Kennedy Space Center. The obtained results reveal that the proposed models with sparse constraints provide competitive results to state-of-the-art methods. In addition, the proposed deep FE opens a new window for further research.

2,059 citations

Journal ArticleDOI
TL;DR: A new feature extraction (FE) and image classification framework are proposed for hyperspectral data analysis based on deep belief network (DBN) and a novel deep architecture is proposed, which combines the spectral-spatial FE and classification together to get high classification accuracy.
Abstract: Hyperspectral data classification is a hot topic in remote sensing community. In recent years, significant effort has been focused on this issue. However, most of the methods extract the features of original data in a shallow manner. In this paper, we introduce a deep learning approach into hyperspectral image classification. A new feature extraction (FE) and image classification framework are proposed for hyperspectral data analysis based on deep belief network (DBN). First, we verify the eligibility of restricted Boltzmann machine (RBM) and DBN by the following spectral information-based classification. Then, we propose a novel deep architecture, which combines the spectral–spatial FE and classification together to get high classification accuracy. The framework is a hybrid of principal component analysis (PCA), hierarchical learning-based FE, and logistic regression (LR). Experimental results with hyperspectral data indicate that the classifier provide competitive solution with the state-of-the-art methods. In addition, this paper reveals that deep learning system has huge potential for hyperspectral data classification.

1,028 citations


Cites background from "Deep Hierarchies in the Primate Vis..."

  • ...Based on neuroscience, human brains process information with multiple stages from retina to cortex, which leads to high performance on object recognition [31]....

    [...]

References
More filters
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations

Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

42,067 citations

Journal ArticleDOI
TL;DR: This method is used to examine receptive fields of a more complex type and to make additional observations on binocular interaction and this approach is necessary in order to understand the behaviour of individual cells, but it fails to deal with the problem of the relationship of one cell to its neighbours.
Abstract: What chiefly distinguishes cerebral cortex from other parts of the central nervous system is the great diversity of its cell types and interconnexions. It would be astonishing if such a structure did not profoundly modify the response patterns of fibres coming into it. In the cat's visual cortex, the receptive field arrangements of single cells suggest that there is indeed a degree of complexity far exceeding anything yet seen at lower levels in the visual system. In a previous paper we described receptive fields of single cortical cells, observing responses to spots of light shone on one or both retinas (Hubel & Wiesel, 1959). In the present work this method is used to examine receptive fields of a more complex type (Part I) and to make additional observations on binocular interaction (Part II). This approach is necessary in order to understand the behaviour of individual cells, but it fails to deal with the problem of the relationship of one cell to its neighbours. In the past, the technique of recording evoked slow waves has been used with great success in studies of functional anatomy. It was employed by Talbot & Marshall (1941) and by Thompson, Woolsey & Talbot (1950) for mapping out the visual cortex in the rabbit, cat, and monkey. Daniel & Whitteiidge (1959) have recently extended this work in the primate. Most of our present knowledge of retinotopic projections, binocular overlap, and the second visual area is based on these investigations. Yet the method of evoked potentials is valuable mainly for detecting behaviour common to large populations of neighbouring cells; it cannot differentiate functionally between areas of cortex smaller than about 1 mm2. To overcome this difficulty a method has in recent years been developed for studying cells separately or in small groups during long micro-electrode penetrations through nervous tissue. Responses are correlated with cell location by reconstructing the electrode tracks from histological material. These techniques have been applied to

12,923 citations


"Deep Hierarchies in the Primate Vis..." refers background in this paper

  • ...The hierarchical processing in the primate visual system is characterized by a sequence of different levels of processing (on the order of 10) that constitute a deep hierarchy in contrast to the flat vision architectures predominantly used in today’s mainstream computer vision....

    [...]

Journal ArticleDOI
TL;DR: A summary of the layout of cortical areas associated with vision and with other modalities, a computerized database for storing and representing large amounts of information on connectivity patterns, and the application of these data to the analysis of hierarchical organization of the cerebral cortex are reported on.
Abstract: In recent years, many new cortical areas have been identified in the macaque monkey. The number of identified connections between areas has increased even more dramatically. We report here on (1) a summary of the layout of cortical areas associated with vision and with other modalities, (2) a computerized database for storing and representing large amounts of information on connectivity patterns, and (3) the application of these data to the analysis of hierarchical organization of the cerebral cortex. Our analysis concentrates on the visual system, which includes 25 neocortical areas that are predominantly or exclusively visual in function, plus an additional 7 areas that we regard as visual-association areas on the basis of their extensive visual inputs. A total of 305 connections among these 32 visual and visual-association areas have been reported. This represents 31% of the possible number of pathways if each area were connected with all others. The actual degree of connectivity is likely to be closer to 40%. The great majority of pathways involve reciprocal connections between areas. There are also extensive connections with cortical areas outside the visual system proper, including the somatosensory cortex, as well as neocortical, transitional, and archicortical regions in the temporal and frontal lobes. In the somatosensory/motor system, there are 62 identified pathways linking 13 cortical areas, suggesting an overall connectivity of about 40%. Based on the laminar patterns of connections between areas, we propose a hierarchy of visual areas and of somatosensory/motor areas that is more comprehensive than those suggested in other recent studies. The current version of the visual hierarchy includes 10 levels of cortical processing. Altogether, it contains 14 levels if one includes the retina and lateral geniculate nucleus at the bottom as well as the entorhinal cortex and hippocampus at the top. Within this hierarchy, there are multiple, intertwined processing streams, which, at a low level, are related to the compartmental organization of areas V1 and V2 and, at a high level, are related to the distinction between processing centers in the temporal and parietal lobes. However, there are some pathways and relationships (about 10% of the total) whose descriptions do not fit cleanly into this hierarchical scheme for one reason or another. In most instances, though, it is unclear whether these represent genuine exceptions to a strict hierarchy rather than inaccuracies or uncertainities in the reported assignment.

7,796 citations


"Deep Hierarchies in the Primate Vis..." refers background or methods in this paper

  • ...The ties with the biological vision faded, and if there were some references to biological-related mechanisms, they were most commonly limited to individual functional modules or feature choices such as Gabor wavelets....

    [...]

  • ...We begin with the retinal photoreceptors as the first stage of visual processing (Section 3.1), and follow the visual signal from the eye through the Lateral Geniculate Nucleus (LGN) (Section 3.2)....

    [...]

  • ...At present, detailed knowledge about 2....

    [...]

  • ...Index Terms—Computer vision, deep hierarchies, biological modeling Ç...

    [...]

Book
01 Jan 2009
TL;DR: The motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer modelssuch as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks are discussed.
Abstract: Can machine learning deliver AI? Theoretical results, inspiration from the brain and cognition, as well as machine learning experiments suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one would need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers, graphical models with many levels of latent variables, or in complicated propositional formulae re-using many sub-formulae. Each level of the architecture represents features at a different level of abstraction, defined as a composition of lower-level features. Searching the parameter space of deep architectures is a difficult task, but new algorithms have been discovered and a new sub-area has emerged in the machine learning community since 2006, following these discoveries. Learning algorithms such as those for Deep Belief Networks and other related unsupervised learning algorithms have recently been proposed to train deep architectures, yielding exciting results and beating the state-of-the-art in certain areas. Learning Deep Architectures for AI discusses the motivations for and principles of learning algorithms for deep architectures. By analyzing and comparing recent results with different learning algorithms for deep architectures, explanations for their success are proposed and discussed, highlighting challenges and suggesting avenues for future explorations in this area.

7,767 citations


Additional excerpts

  • ...Index Terms—Computer vision, deep hierarchies, biological modeling Ç...

    [...]