
Showing papers on "Convolutional neural network published in 2012"


Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieves state-of-the-art ImageNet classification performance, as discussed by the authors.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
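Dropout, the regularization method named in the abstract, randomly silences hidden units during training so they cannot co-adapt. Below is a minimal numpy sketch of the inverted variant, which rescales at training time (the paper instead halves the outputs at test time; the two forms are equivalent in expectation). Function name and shapes are illustrative:

```python
import numpy as np

def dropout(h, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero each hidden unit with probability p_drop
    and rescale the survivors so the expected activation is unchanged."""
    if not training or p_drop == 0.0:
        return h                      # full network used at test time
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

h = np.ones((4, 8))                   # a batch of 4 samples, 8 hidden units
print(dropout(h, p_drop=0.5))         # roughly half the units are zeroed
```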

73,978 citations


Proceedings Article
03 Dec 2012
TL;DR: This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
Abstract: The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
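As a rough illustration of the Bayesian optimization loop described above, the sketch below models a toy objective with a Gaussian process (a Matérn 5/2 kernel, one of the kernel choices the paper examines) and picks the next trial by expected improvement. The random candidate pool, the toy objective, and all constants are simplifications, not the paper's procedure:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """EI acquisition for minimization: expected amount by which each
    candidate improves on the best observed value."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu - xi) / sigma
    return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Toy 1-D objective standing in for a validation-error surface.
f = lambda x: np.sin(3 * x[:, 0]) + 0.1 * x[:, 0] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(3, 1))               # initial random trials
y = f(X)
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(10):                               # sequential BO loop
    gp.fit(X, y)
    cand = rng.uniform(-2, 2, size=(256, 1))      # random candidate pool
    x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
    X, y = np.vstack([X, x_next]), np.append(y, f(x_next[None, :]))
print("best hyperparameter found:", X[np.argmin(y)], "loss:", y.min())
```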

5,654 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: In this paper, biologically plausible, wide and deep artificial neural network architectures are proposed that match human performance on tasks such as the recognition of handwritten digits or traffic signs.
Abstract: Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
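The committee idea in this abstract, several deep "columns" trained on differently preprocessed inputs with averaged predictions, reduces to the sketch below. The stand-in columns are random linear-softmax maps and the "preprocessings" are toy rescalings; in the paper each column is a deep CNN and the preprocessing is, e.g., different normalizations of the input image:

```python
import numpy as np

def committee_predict(columns, preprocessors, image):
    """Average the softmax outputs of several independently trained
    'columns', each fed a differently preprocessed copy of the input."""
    probs = [net(prep(image)) for net, prep in zip(columns, preprocessors)]
    return np.mean(probs, axis=0)

rng = np.random.default_rng(1)
def make_column():
    W = rng.normal(size=(64, 10))
    def net(x):
        logits = x.ravel() @ W
        e = np.exp(logits - logits.max())
        return e / e.sum()
    return net

columns = [make_column() for _ in range(5)]
preps = [lambda im, s=s: s * (im - im.mean()) / (im.std() + 1e-8)
         for s in (0.8, 0.9, 1.0, 1.1, 1.2)]      # toy "preprocessings"
image = rng.normal(size=(8, 8))
print(committee_predict(columns, preps, image))   # averaged class posterior
```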

3,717 citations


Journal ArticleDOI
TL;DR: A publicly available traffic sign dataset with more than 50,000 images of German road signs in 43 classes is presented; convolutional neural networks showed particularly high classification accuracies in the accompanying competition, outperforming the human test persons.

1,138 citations


Posted Content
TL;DR: In this paper, a learning algorithm's generalization performance is modeled as a sample from a Gaussian process and the tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next.
Abstract: Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.

1,110 citations



Proceedings ArticleDOI
25 Mar 2012
TL;DR: CNNs are applied to speech recognition within the framework of a hybrid NN-HMM model, using local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance.
Abstract: Convolutional Neural Networks (CNN) have shown success in achieving translation invariance for many image processing tasks. The success is largely attributed to the use of local filtering and max-pooling in the CNN architecture. In this paper, we propose to apply CNN to speech recognition within the framework of a hybrid NN-HMM model. We propose to use local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance. In our method, a pair of local filtering and max-pooling layers is added at the lowest end of the neural network (NN) to normalize spectral variations of speech signals. In our experiments, the proposed CNN architecture is evaluated in a speaker-independent speech recognition task using the standard TIMIT data sets. Experimental results show that the proposed CNN method can achieve over 10% relative error reduction in the core TIMIT test sets compared with a regular NN using the same number of hidden layers and weights. Our results also show that the best result of the proposed CNN model is better than previously published results on the same TIMIT test sets that use a pre-trained deep NN model.
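A hedged sketch of the frequency-domain filtering-plus-pooling idea described above: filters span a few adjacent mel bands, and max-pooling along the frequency axis gives tolerance to small spectral shifts such as speaker variation. Filter sizes and shapes are illustrative, not the paper's:

```python
import numpy as np

def freq_conv_maxpool(spectrogram, filters, pool=3):
    """Local filtering along the frequency axis followed by max-pooling
    in frequency. spectrogram: (n_bands, n_frames); filters:
    (n_filters, filter_len). Pooling discards small shifts along
    frequency, e.g. formant shifts between speakers."""
    n_bands, n_frames = spectrogram.shape
    n_filters, flen = filters.shape
    out_bands = n_bands - flen + 1
    conv = np.empty((n_filters, out_bands, n_frames))
    for k in range(n_filters):
        for b in range(out_bands):
            # each unit sees only a small band of adjacent mel channels
            conv[k, b] = filters[k] @ spectrogram[b:b + flen]
    conv = np.maximum(conv, 0.0)                       # rectification
    trimmed = conv[:, :out_bands // pool * pool]
    return trimmed.reshape(n_filters, -1, pool, n_frames).max(axis=2)

rng = np.random.default_rng(0)
mel = rng.random((40, 100))                # 40 mel bands, 100 frames
filt = rng.normal(size=(8, 8))             # 8 filters spanning 8 bands each
print(freq_conv_maxpool(mel, filt).shape)  # (8, 11, 100)
```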

901 citations


Proceedings Article
01 Nov 2012
TL;DR: This paper combines the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows them to use a common framework to train highly-accurate text detector and character recognizer modules.
Abstract: Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows us to use a common framework to train highly-accurate text detector and character recognizer modules. Then, using only simple off-the-shelf methods, we integrate these two modules into a full end-to-end, lexicon-driven, scene text recognition system that achieves state-of-the-art performance on standard benchmarks, namely Street View Text and ICDAR 2003.

900 citations


Journal ArticleDOI
TL;DR: A hybrid model is presented that integrates the synergy of two superior classifiers, Convolutional Neural Network (CNN) and Support Vector Machine (SVM), both of which have proven results in recognizing different types of patterns.

585 citations


Proceedings ArticleDOI
10 Jun 2012
TL;DR: Not only does the proposed Max-Pooling Convolutional Neural Network approach obtain much better results, it also works directly on raw pixel intensities of detected and segmented steel defects, avoiding further time-consuming and hard-to-optimize ad-hoc preprocessing.
Abstract: We present a Max-Pooling Convolutional Neural Network approach for supervised steel defect classification. On a classification task with 7 defects, collected from a real production line, an error rate of 7% is obtained. Compared to SVM classifiers trained on commonly used feature descriptors our best net performs at least two times better. Not only do we obtain much better results, but the proposed method also works directly on raw pixel intensities of detected and segmented steel defects, avoiding further time-consuming and hard-to-optimize ad-hoc preprocessing.
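Max-pooling, the core operation in the title, keeps only the strongest response in each local region of a feature map. A minimal sketch in plain numpy, with illustrative sizes:

```python
import numpy as np

def max_pool(fmap, k=2):
    """Non-overlapping k x k max-pooling: keep only the strongest
    response in each region of a feature map."""
    h, w = fmap.shape
    fmap = fmap[:h // k * k, :w // k * k]      # trim to multiples of k
    return fmap.reshape(h // k, k, w // k, k).max(axis=(1, 3))

patch = np.arange(36, dtype=float).reshape(6, 6)   # toy "defect patch"
print(max_pool(patch))                             # 3x3 map of local maxima
```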

262 citations


Book ChapterDOI
07 Oct 2012
TL;DR: It is concluded that convolutional neural networks are suitable for learning 3D scene layout from noisy labels, providing a relative improvement of 7% over the baseline, and that combining color planes yields a statistical description of road areas that exhibits maximal uniformity.
Abstract: Road scene segmentation is important in computer vision for different applications such as autonomous driving and pedestrian detection. Recovering the 3D structure of road scenes provides relevant contextual information to improve their understanding. In this paper, we use a convolutional neural network based algorithm to learn features from noisy labels to recover the 3D scene layout of a road image. The novelty of the algorithm lies in generating training labels by applying an algorithm trained on a general image dataset to classify on-board images. Further, we propose a novel texture descriptor based on a learned color plane fusion to obtain maximal uniformity in road areas. Finally, acquired (off-line) and current (on-line) information are combined to detect road areas in single images. From quantitative and qualitative experiments, conducted on publicly available datasets, it is concluded that convolutional neural networks are suitable for learning 3D scene layout from noisy labels and provide a relative improvement of 7% compared to the baseline. Furthermore, combining color planes provides a statistical description of road areas that exhibits maximal uniformity and provides a relative improvement of 8% compared to the baseline. Finally, the improvement is even larger when acquired and current information from a single image are combined.
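The "learned color plane fusion" can be approximated as finding the linear combination of color planes whose response is most uniform over road pixels. The sketch below takes the smallest-eigenvalue direction of the road-pixel covariance as that combination; the paper's actual descriptor is richer, so treat this as an assumption-laden illustration of the uniformity-maximizing idea:

```python
import numpy as np

def fuse_color_planes(planes, road_mask):
    """Find the linear combination of color planes with the least
    variance (most uniformity) over road pixels: the eigenvector of
    the road-pixel covariance with the smallest eigenvalue.
    planes: (n_planes, h, w); road_mask: boolean (h, w)."""
    X = planes[:, road_mask]                  # (n_planes, n_road_pixels)
    vals, vecs = np.linalg.eigh(np.cov(X))
    w = vecs[:, 0]                            # least-variance direction
    return np.tensordot(w, planes, axes=1), w

rng = np.random.default_rng(0)
planes = rng.random((3, 48, 64))              # stand-in R, G, B planes
mask = np.zeros((48, 64), bool)
mask[24:, :] = True                           # lower half = "road"
fused, w = fuse_color_planes(planes, mask)
print("weights:", w)
print("fused variance:", fused[mask].var(), "vs R:", planes[0][mask].var())
```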

Proceedings Article
18 Apr 2012
TL;DR: The traditional ConvNet architecture is augmented by learning multi-stage features and by using Lp pooling, establishing a new state-of-the-art of 95.10% accuracy on the SVHN dataset (48% error improvement).
Abstract: We classify digits of real-world house numbers using convolutional neural networks (ConvNets). ConvNets are hierarchical feature learning neural networks whose structure is biologically inspired. Unlike many popular vision approaches that are hand-designed, ConvNets can automatically learn a unique set of features optimized for a given task. We augmented the traditional ConvNet architecture by learning multi-stage features and by using Lp pooling, establishing a new state-of-the-art of 95.10% accuracy on the SVHN dataset (48% error improvement). Furthermore, we analyze the benefits of different pooling methods and multi-stage features in ConvNets. The source code and a tutorial are available at eblearn.sf.net.
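Lp pooling, mentioned in the abstract, interpolates between average pooling (p = 1) and max pooling (p → ∞). The paper weights each pooling region with a Gaussian window; this uniform-weight numpy sketch keeps only the core operation:

```python
import numpy as np

def lp_pool(fmap, k=2, p=4.0):
    """Lp pooling over non-overlapping k x k regions:
    (mean of |x|^p) ** (1/p). p = 1 recovers average pooling and
    p -> infinity approaches max pooling."""
    h, w = fmap.shape
    x = np.abs(fmap[:h // k * k, :w // k * k]) ** p
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3)) ** (1.0 / p)

x = np.array([[1., 2.], [3., 4.]])
for p in (1.0, 4.0, 64.0):        # sweeps from the mean toward the max (4)
    print(p, lp_pool(x, k=2, p=p))
```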

Posted Content
TL;DR: In this paper, a new state-of-the-art of 94.85% accuracy on the SVHN dataset (45.2% error improvement) was achieved by using Lp pooling.
Abstract: We classify digits of real-world house numbers using convolutional neural networks (ConvNets). ConvNets are hierarchical feature learning neural networks whose structure is biologically inspired. Unlike many popular vision approaches that are hand-designed, ConvNets can automatically learn a unique set of features optimized for a given task. We augmented the traditional ConvNet architecture by learning multi-stage features and by using Lp pooling, establishing a new state-of-the-art of 94.85% accuracy on the SVHN dataset (45.2% error improvement). Furthermore, we analyze the benefits of different pooling methods and multi-stage features in ConvNets. The source code and a tutorial are available at eblearn.sf.net.

Book ChapterDOI
11 Sep 2012
TL;DR: This paper proposes different strategies for simplifying the filters, used as feature extractors, learnt in convolutional neural networks (ConvNets), in order to modify the hypothesis space and to speed up learning and processing times.
Abstract: In this paper, we propose different strategies for simplifying filters, used as feature extractors, to be learnt in convolutional neural networks (ConvNets) in order to modify the hypothesis space and to speed up learning and processing times. We study two kinds of filters that are known to be computationally efficient in feed-forward processing: fused convolution/sub-sampling filters, and separable filters. We compare the complexity of the back-propagation algorithm on ConvNets based on these different kinds of filters. We show that using these filters allows us to reach the same level of recognition performance as with classical ConvNets for handwritten digit recognition, up to 3.3 times faster.
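To see why separable filters cut cost: a 2D convolution with a rank-1 kernel factors into two 1D passes, so k*k multiplies per output pixel become 2*k. A short check with hypothetical kernels:

```python
import numpy as np
from scipy.signal import convolve2d

# A separable 2D kernel is the outer product of two 1D kernels, so the
# 2D convolution factorizes into a row pass followed by a column pass.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
row = np.array([1.0, 2.0, 1.0])           # hypothetical 1D kernels
col = np.array([1.0, 0.0, -1.0])
kernel2d = np.outer(col, row)             # the equivalent full 3x3 filter

full = convolve2d(image, kernel2d, mode='valid')
separable = convolve2d(convolve2d(image, row[None, :], mode='valid'),
                       col[:, None], mode='valid')
print(np.allclose(full, separable))       # True: identical output
# Cost per output pixel: k*k multiplies for the full kernel, 2*k for
# the separable form -- the source of the reported speed-ups.
```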

Proceedings ArticleDOI
12 Dec 2012
TL;DR: Abandoning typical conventions, this work adopts a different perspective of the problem, in which several seconds of pitch spectra are classified directly by a convolutional neural network, achieving state-of-the-art chord recognition performance in this initial effort.
Abstract: Despite early success in automatic chord recognition, recent efforts are yielding diminishing returns while basically iterating over the same fundamental approach. Here, we abandon typical conventions and adopt a different perspective of the problem, where several seconds of pitch spectra are classified directly by a convolutional neural network. Using labeled data to train the system in a supervised manner, we achieve state-of-the-art performance through this initial effort in an otherwise unexplored area. Subsequent error analysis provides insight into potential areas of improvement, and this approach to chord recognition shows promise for future harmonic analysis systems.

Journal ArticleDOI
TL;DR: An Event-Driven Convolution Module for computing 2D convolutions on such event streams is presented; it has multi-kernel capability, meaning it selects the convolution kernel depending on the origin of the event.
Abstract: Event-Driven vision sensing is a new way of sensing visual reality in a frame-free manner. That is, the vision sensor (camera) is not capturing a sequence of still frames, as in conventional video and computer vision systems. In Event-Driven sensors each pixel autonomously and asynchronously decides when to send its address out. This way, the sensor output is a continuous stream of address events representing reality dynamically and continuously, without being constrained to frames. In this paper we present an Event-Driven Convolution Module for computing 2D convolutions on such event streams. The Convolution Module has been designed so that many of them can be assembled to build modular and hierarchical Convolutional Neural Networks for robust shape and pose invariant object recognition. The Convolution Module has multi-kernel capability. That is, it will select the convolution kernel depending on the origin of the event. A proof-of-concept test prototype has been fabricated in a 0.35 μm CMOS process and extensive experimental results are provided. The Convolution Processor has also been combined with an Event-Driven Dynamic Vision Sensor (DVS) for high-speed recognition examples. The chip can discriminate propellers rotating at 2,000 revolutions per second, detect symbols on a 52-card deck when browsing all cards in 410 ms, or detect and follow the center of a phosphor oscilloscope trace rotating at 5 kHz.
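A software caricature of the event-driven convolution the module implements in hardware: each incoming address event stamps the kernel onto an array of integrators centered on its address, and any integrator that crosses threshold emits an output event and resets. Threshold, sizes, and the reset rule here are assumptions for illustration, not the chip's parameters:

```python
import numpy as np

def process_events(events, kernel, threshold=10.0, shape=(64, 64)):
    """Event-driven 2D convolution: each address event adds the kernel
    to the integrator array around its (x, y) address; integrators
    crossing threshold emit an output event and reset."""
    state = np.zeros(shape)
    kh, kw = kernel.shape
    out = []
    for t, x, y in events:                       # events arrive one by one
        x0, y0 = x - kh // 2, y - kw // 2
        xs = slice(max(x0, 0), min(x0 + kh, shape[0]))
        ys = slice(max(y0, 0), min(y0 + kw, shape[1]))
        state[xs, ys] += kernel[xs.start - x0:xs.stop - x0,
                                ys.start - y0:ys.stop - y0]
        for fx, fy in np.argwhere(state > threshold):
            out.append((t, fx, fy))              # forward an output event
            state[fx, fy] = 0.0                  # reset the fired pixel
    return out

rng = np.random.default_rng(0)
events = [(t, rng.integers(0, 64), rng.integers(0, 64)) for t in range(500)]
kernel = np.ones((5, 5))                         # hypothetical blob kernel
print(len(process_events(events, kernel)), "output events")
```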

Proceedings ArticleDOI
27 Mar 2012
TL;DR: This paper applies Convolutional Neural Networks to offline handwritten English character recognition using a modified LeNet-5 CNN model, with special settings for the number of neurons in each layer and the connection scheme between certain layers.
Abstract: This paper applies Convolutional Neural Networks (CNNs) to offline handwritten English character recognition. We use a modified LeNet-5 CNN model, with special settings for the number of neurons in each layer and the connection scheme between certain layers. Outputs of the CNN are encoded with error-correcting codes, giving the CNN the ability to reject recognition results. For training of the CNN, an error-samples-based reinforcement learning strategy is developed. Experiments are evaluated on UNIPEN lowercase and uppercase datasets, with recognition rates of 93.7% for uppercase and 90.2% for lowercase, respectively.
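The error-correcting output scheme can be sketched as follows: each class gets a binary codeword, the thresholded network outputs are matched to the nearest codeword by Hamming distance, and an output too far from every codeword is rejected rather than forced into a class. The codebook below is hypothetical:

```python
import numpy as np

# Hypothetical 6-bit codewords for three classes; real codebooks are
# chosen so codewords are far apart in Hamming distance.
codebook = np.array([[0, 0, 1, 1, 0, 1],
                     [1, 0, 0, 1, 1, 0],
                     [0, 1, 0, 0, 1, 1]])

def decode(outputs, max_hamming=1):
    """Match thresholded network outputs to the nearest codeword;
    reject (return None) if every codeword is too far away."""
    bits = (outputs > 0.5).astype(int)
    dists = np.abs(codebook - bits).sum(axis=1)      # Hamming distances
    best = int(np.argmin(dists))
    return best if dists[best] <= max_hamming else None

print(decode(np.array([0.1, 0.2, 0.9, 0.8, 0.1, 0.7])))  # -> 0
print(decode(np.array([0.9, 0.9, 0.9, 0.9, 0.9, 0.9])))  # -> None (reject)
```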

Journal ArticleDOI
TL;DR: A comparison study of Frame-Based and Frame-Free Spiking ConvNet convolution processors, two neuro-inspired solutions for real-time visual processing.
Abstract: Most scene segmentation and categorization architectures for the extraction of features in images and patches make exhaustive use of 2D convolution operations for template matching, template search and denoising. Convolutional Neural Networks (ConvNets) are one example of such architectures that can implement general-purpose bio-inspired vision systems. In standard digital computers 2D convolutions are usually expensive in terms of resource consumption and impose severe limitations for efficient real-time applications. Nevertheless, neuro-cortex inspired solutions, like dedicated Frame-Based or Frame-Free Spiking ConvNet Convolution Processors, are advancing real-time visual processing. These two approaches share the neural inspiration, but each of them solves the problem in different ways. Frame-Based ConvNets process video information frame by frame in a very robust and fast way that requires using and sharing the available hardware resources (such as multipliers and adders). Hardware resources are fixed and time-multiplexed by fetching data in and out. Thus memory bandwidth and size are important for good performance. On the other hand, spike-based convolution processors are a frame-free alternative that is able to perform convolution of a spike-based source of visual information with very low latency, which makes them ideal for very high speed applications. However, hardware resources need to be available all the time and cannot be time-multiplexed. Thus, hardware should be modular, reconfigurable and expandable. Hardware implementations in both VLSI custom integrated circuits (digital and analog) and FPGAs have already been used to demonstrate the performance of these systems. In this paper we present a comparison study of these two neuro-inspired solutions. A brief description of both systems is presented, along with a discussion of their differences, pros and cons.

Book ChapterDOI
07 Oct 2012
TL;DR: An algorithm based on convolutional neural networks is proposed to learn local features from training data at different scales and resolutions, combined using a weighted linear combination; its performance is similar to state-of-the-art methods that use other sources of information such as depth, motion or stereo.
Abstract: Semantic segmentation refers to the process of assigning an object label (e.g., building, road, sidewalk, car, pedestrian) to every pixel in an image. Common approaches formulate the task as a random field labeling problem modeling the interactions between labels by combining local and contextual features such as color, depth, edges, SIFT or HoG. These models are trained to maximize the likelihood of the correct classification given a training set. However, these approaches rely on hand-designed features (e.g., texture, SIFT or HoG) and require higher computational time in the inference process. Therefore, in this paper, we focus on estimating the unary potentials of a conditional random field via ensembles of learned features. We propose an algorithm based on convolutional neural networks to learn local features from training data at different scales and resolutions. Then, diversification between these features is exploited using a weighted linear combination. Experiments on a publicly available database show the effectiveness of the proposed method to perform semantic road scene segmentation in still images. The algorithm outperforms appearance based methods and its performance is comparable to state-of-the-art methods using other sources of information such as depth, motion or stereo.

Proceedings ArticleDOI
12 Dec 2012
TL;DR: Experimental results indicate that the CNSVM can be successfully applied to visual learning and recognition of hand gestures as well as to measure learning progress.
Abstract: We introduce Convolutional Neural Support Vector Machines (CNSVMs), a combination of two heterogeneous supervised classification techniques, Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs). CNSVMs are trained using a Stochastic Gradient Descent approach, that provides the computational capability of online incremental learning and is robust for typical learning scenarios in which training samples arrive in mini-batches. This is the case for visual learning and recognition in multi-robot systems, where each robot acquires a different image of the same sample. The experimental results indicate that the CNSVM can be successfully applied to visual learning and recognition of hand gestures as well as to measure learning progress.
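The SVM head of a CNSVM can be trained with plain stochastic subgradient descent on the hinge loss, one mini-batch at a time, which is what makes online incremental learning natural. A sketch with stand-in CNN features; all names and constants are illustrative, not the paper's implementation:

```python
import numpy as np

def svm_sgd_step(W, b, feats, labels, lr=0.01, C=1.0):
    """One mini-batch subgradient step on the L2-regularized hinge loss,
    treating `feats` as the CNN's top-layer features; labels in {-1,+1}."""
    margins = labels * (feats @ W + b)
    active = margins < 1.0                 # samples violating the margin
    if active.any():
        grad_W = W - C * (labels[active][:, None] * feats[active]).mean(axis=0)
        grad_b = -C * labels[active].mean()
    else:
        grad_W, grad_b = W, 0.0
    return W - lr * grad_W, b - lr * grad_b

rng = np.random.default_rng(0)
W, b = np.zeros(16), 0.0
for _ in range(200):                       # mini-batches arriving online
    feats = rng.normal(size=(8, 16))       # stand-in CNN features
    labels = np.where(feats[:, 0] + 0.1 * rng.normal(size=8) > 0, 1.0, -1.0)
    W, b = svm_sgd_step(W, b, feats, labels)
print("weight on the informative feature:", W[0])
```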

Book ChapterDOI
05 Nov 2012
TL;DR: The proposed approach gives better classification rates than classical state-of-the-art methods, allowing safer Computer-Aided Diagnosis of pleural cancer.
Abstract: We present a Multiscale Convolutional Neural Network (MCNN) approach for vision-based classification of cells. Based on several deep Convolutional Neural Networks (CNN) acting at different resolutions, the proposed architecture avoids the classical handcrafted feature extraction step, by performing feature extraction and classification as a whole. The proposed approach gives better classification rates than classical state-of-the-art methods, allowing safer Computer-Aided Diagnosis of pleural cancer.

Proceedings Article
25 Apr 2012
TL;DR: A convolutional network architecture that includes innovative elements, such as multiple output maps, suitable loss functions, supervised pretraining, multiscale inputs, reused outputs, and pairwise class location filters is proposed.
Abstract: After successes at image classification, segmentation is the next step towards image understanding for neural networks. We propose a convolutional network architecture that includes innovative elements, such as multiple output maps, suitable loss functions, supervised pretraining, multiscale inputs, reused outputs, and pairwise class location filters. Experiments on three data sets show that our method performs on par with current computer vision methods with regard to accuracy and exceeds them in speed.

Journal Article
TL;DR: In this paper, an offline signature verification scheme based on Convolutional Neural Network (CNN) is proposed and the simulation results reveal the efficiency of the suggested algorithm.
Abstract: The style of people’s handwritten signature is a biometric feature used in person authentication. In this paper, an offline signature verification scheme based on a Convolutional Neural Network (CNN) is proposed. The CNN focuses on the problem of feature extraction without prior knowledge of the data. The classification task is performed by a Multilayer Perceptron (MLP) network. This method is not only capable of extracting features relevant to a given signature, but is also robust with regard to signature location changes and scale variations when compared to classical methods. The proposed method is evaluated on a dataset of Persian signatures originally gathered from 22 people. The simulation results reveal the efficiency of the suggested algorithm.

Proceedings ArticleDOI
Brian Cheung
12 Dec 2012
TL;DR: A convolutional neural network was trained to distinguish images of human faces from computer-generated avatars as part of the ICMLA 2012 Face Recognition Challenge, achieving a classification accuracy of 99% on the Avatar CAPTCHA dataset.
Abstract: Convolutional neural network models have covered a broad scope of computer vision applications, achieving competitive performance with minimal domain knowledge. In this work, we apply such a model to a task designed to deter automated systems. We trained a convolutional neural network to distinguish images of human faces from computer-generated avatars as part of the ICMLA 2012 Face Recognition Challenge. The network achieved a classification accuracy of 99% on the Avatar CAPTCHA dataset. Furthermore, we demonstrated the potential of utilizing support vector machines on the same problem and achieved equally competitive performance.

Proceedings ArticleDOI
27 Mar 2012
TL;DR: A novel method to recognize scene text that avoids the conventional character segmentation step, applying a robust recognition model based on a neural classification approach to every window in order to recognize valid characters and identify invalid ones.
Abstract: Understanding text captured in real-world scenes is a challenging problem in the field of visual pattern recognition and continues to generate significant interest in the OCR (Optical Character Recognition) community. This paper proposes a novel method to recognize scene text that avoids the conventional character segmentation step. The idea is to scan the text image with multi-scale windows and apply a robust recognition model, relying on a neural classification approach, to every window in order to recognize valid characters and identify invalid ones. Recognition results are represented as a graph model in order to determine the best sequence of characters. Some linguistic knowledge is also incorporated to remove errors due to recognition confusions. The designed method is evaluated on the ICDAR 2003 database of scene text images and outperforms state-of-the-art approaches.

Proceedings ArticleDOI
09 Sep 2012
TL;DR: This paper focuses on appropriate feature extraction and proper classification by integrating the features using a convolutional neural network, calculating gray level co-occurrence matrix (GLCM) descriptors with different offsets from a short-term mel spectrogram.
Abstract: A map-based approach, which treats 2-dimensional acoustic features like image processing, has recently attracted attention in music genre classification. While this is successful at extracting local music patterns compared with other timbral-feature-based methods, the extracted features are not sufficient for music genre classification. In this paper, we focus on appropriate feature extraction and proper classification by integrating the features. For the musical feature extraction, we calculate gray level co-occurrence matrix (GLCM) descriptors with different offsets from a short-term mel spectrogram. These feature maps are then integratively classified using a convolutional neural network (CNN). In our experiments, we obtained a large improvement of more than 10 percentage points in classification accuracy, compared with conventional map-based methods. Index Terms: music genre classification, music information retrieval, music feature extraction, convolutional neural network
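A GLCM counts how often pairs of quantized gray levels co-occur at a fixed offset; computed over a mel spectrogram with several offsets, it yields the texture-like feature maps the abstract describes. A small numpy sketch, with quantization levels and offsets chosen for illustration:

```python
import numpy as np

def glcm(img, dx, dy, levels=8):
    """Gray level co-occurrence matrix: how often gray level i occurs
    at offset (dx, dy) from gray level j, after quantizing the input
    to `levels` gray levels."""
    q = np.floor(img / img.max() * (levels - 1e-9)).astype(int)
    h, w = q.shape
    a = q[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = q[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)   # accumulate co-occurrences
    return m / m.sum()

rng = np.random.default_rng(0)
mel_spec = rng.random((40, 128))              # stand-in mel spectrogram
for offset in [(1, 0), (0, 1), (2, 2)]:       # one feature map per offset
    print(offset, glcm(mel_spec, *offset).shape)
```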

Proceedings ArticleDOI
Aiquan Yuan, Gang Bai, Po Yang, Yanni Guo, Xinting Zhao
18 Sep 2012
TL;DR: A novel segmentation-based and lexicon-driven handwritten English recognition system that applies a modified rule-based online segmentation method and uses convolutional neural networks for offline character recognition.
Abstract: This paper presents a novel segmentation-based and lexicon-driven handwritten English recognition system. For the segmentation, a modified rule-based online segmentation method is applied. Then, convolutional neural networks are introduced for offline character recognition. Experiments are evaluated on UNIPEN lowercase data sets, with a word recognition rate of 92.20%.

Journal ArticleDOI
TL;DR: This paper introduces a new sensitivity-based approach capable of picking the right image features from a pre-trained SOM-like feature detector for handwritten digit recognition, and shows that pruned network architectures yield a transparent representation of the features actually present in the data while improving network robustness.

Proceedings ArticleDOI
24 May 2012
TL;DR: A new algorithm that applies first the GSA and then the BP, in order to ensure performance improvements by avoiding traps in local minima, is presented for a six-layer CNN dedicated to OCR applications.
Abstract: This paper presents aspects concerning embedding Gravitational Search Algorithms (GSAs) in Convolutional Neural Networks (CNNs) for Optical Character Recognition (OCR) systems. The GSAs are used in combination with the Back Propagation (BP) algorithm as optimization algorithms in the training process of a specific CNN architecture for OCR applications. The new algorithm consists of applying first the GSA and then the BP in order to ensure performance improvements by avoiding traps in local minima. A performance analysis for a given benchmark application shows the advantages of our algorithm over the classical BP algorithm for a six-layer CNN dedicated to OCR applications.
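The paper runs a GSA phase before refining with back-propagation. The sketch below covers a simplified single GSA iteration (it omits refinements such as the elite Kbest subset), applied to a toy quadratic loss standing in for the CNN training error; all constants are assumptions:

```python
import numpy as np

def gsa_step(pos, vel, fitness, t, G0=1.0, decay=0.1, rng=None):
    """One GSA iteration (minimization): fitter agents get larger mass
    and gravitationally attract the others; gravity decays over time.
    In the paper's setting, an agent's position encodes CNN weights."""
    rng = rng or np.random.default_rng()
    worst, best = fitness.max(), fitness.min()
    m = (worst - fitness + 1e-12) / (worst - best + 1e-12)
    M = m / m.sum()                              # normalized masses
    G = G0 * np.exp(-decay * t)                  # fading gravity
    acc = np.zeros_like(pos)
    for i in range(len(pos)):
        diff = pos - pos[i]
        dist = np.linalg.norm(diff, axis=1) + 1e-12
        acc[i] = (G * M[:, None] * rng.random((len(pos), 1)) *
                  diff / dist[:, None]).sum(axis=0)
    vel = rng.random(pos.shape) * vel + acc      # stochastic inertia
    return pos + vel, vel

rng = np.random.default_rng(0)
pos, vel = rng.normal(size=(20, 4)), np.zeros((20, 4))
for t in range(50):                              # toy quadratic "loss"
    pos, vel = gsa_step(pos, vel, (pos ** 2).sum(axis=1), t, rng=rng)
print("best loss after GSA phase:", (pos ** 2).sum(axis=1).min())
```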

01 Jan 2012
TL;DR: This work proposes a method that exploits pose information in order to improve object classification, investigating both Multi-layer Perceptron and Convolutional Neural Network architectures and achieving state-of-the-art results on the challenging NORB dataset.
Abstract: We propose a method that exploits pose information in order to improve object classification. A lot of research has focused on other strategies, such as engineering feature extractors, trying different classifiers and even using transfer learning. Here, we use neural network architectures in a multi-task setup, whose outputs predict both the class and the camera azimuth. We investigate both Multi-layer Perceptron and Convolutional Neural Network architectures, and achieve state-of-the-art results on the challenging NORB dataset.
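The multi-task setup amounts to one shared feature vector feeding two heads, with a loss that adds a classification term and an azimuth term. A minimal sketch; the weighting alpha and all shapes are assumptions, not the paper's configuration:

```python
import numpy as np

def multitask_loss(feats, W_cls, w_az, y_class, y_azimuth, alpha=0.5):
    """Joint objective over one shared feature vector: cross-entropy on
    the object class plus squared error on the camera azimuth."""
    logits = feats @ W_cls
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    ce = -np.log(probs[y_class] + 1e-12)     # classification term
    se = (feats @ w_az - y_azimuth) ** 2     # pose-regression term
    return ce + alpha * se

rng = np.random.default_rng(0)
feats = rng.normal(size=32)                  # stand-in shared features
W_cls, w_az = rng.normal(size=(32, 5)), rng.normal(size=32)
print(multitask_loss(feats, W_cls, w_az, y_class=2, y_azimuth=0.7))
```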