
Showing papers on "Convolutional neural network published in 2008"


Proceedings ArticleDOI
05 Jul 2008
TL;DR: This work describes a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense using a language model.
Abstract: We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art-performance.

5,759 citations
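
As a toy illustration of the weight-sharing idea described above, the sketch below shares one embedding table across several task-specific output layers. It is a hypothetical miniature, not the paper's actual architecture: the vocabulary size, embedding dimension, task names, and random weights are all invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared word-embedding lookup table (the layer reused across all tasks).
vocab_size, embed_dim = 100, 16
shared_embeddings = rng.normal(size=(vocab_size, embed_dim))

# Task-specific output layers: each task reuses the same embeddings but
# has its own classifier weights (an instance of multitask learning).
heads = {
    "pos": rng.normal(size=(embed_dim, 45)),    # part-of-speech tags
    "ner": rng.normal(size=(embed_dim, 9)),     # named-entity tags
    "chunk": rng.normal(size=(embed_dim, 23)),  # chunk labels
}

def predict(word_ids, task):
    """Embed a sentence with the shared table, average over positions,
    then score it with the task-specific head."""
    features = shared_embeddings[word_ids].mean(axis=0)  # shared representation
    scores = features @ heads[task]
    return int(np.argmax(scores))

sentence = [3, 17, 42, 8]
labels = {task: predict(sentence, task) for task in heads}
```

Training would backpropagate each task's loss through its own head and into the shared table, which is what lets the unlabeled language-model task improve the supervised ones.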


Book ChapterDOI
Amr Ahmed1, Kai Yu, Wei Xu, Yihong Gong, Eric P. Xing1 
12 Oct 2008
TL;DR: This paper presents a framework for training hierarchical feed-forward models for visual recognition, using transfer learning from pseudo tasks, and shows that these pseudo tasks induce an informative inverse-Wishart prior on the functional behavior of the network, offering an effective way to incorporate useful prior knowledge into the network training.
Abstract: Building visual recognition models that adapt across different domains is a challenging task for computer vision. While feature-learning machines in the form of hierarchical feed-forward models (e.g., convolutional neural networks) showed promise in this direction, they are still difficult to train especially when few training examples are available. In this paper, we present a framework for training hierarchical feed-forward models for visual recognition, using transfer learning from pseudo tasks. These pseudo tasks are automatically constructed from data without supervision and comprise a set of simple pattern-matching operations. We show that these pseudo tasks induce an informative inverse-Wishart prior on the functional behavior of the network, offering an effective way to incorporate useful prior knowledge into the network training. In addition to being extremely simple to implement, and adaptable across different domains with little or no extra tuning, our approach achieves promising results on challenging visual recognition tasks, including object recognition, gender recognition, and ethnicity recognition.

163 citations


Patent
17 Nov 2008
TL;DR: In this paper, a computer-automated method of selectively identifying a user-specified behavior of a crowd is presented. The method receives video data and can also incorporate audio and sensor data.
Abstract: The present invention is directed to a computer-automated method of selectively identifying a user-specified behavior of a crowd. The method comprises receiving video data, but can also include audio data and sensor data. The video data contains images of a crowd. The video data is processed to extract hierarchical human and crowd features. The detected crowd features are processed to detect a selectable crowd behavior. The selected crowd behavior detected is specified by a configurable behavior rule. Human detection is provided by a hybrid human detector algorithm, which can include an Adaboost or a convolutional neural network classifier. Crowd features are detected using texture analysis techniques. The configurable crowd behavior for detection can be defined by a crowd behavioral language.

143 citations


Proceedings ArticleDOI
01 Dec 2008
TL;DR: A new technique for the classification of electroencephalographic (EEG) steady-state visual evoked potential (SSVEP) activity for non-invasive BCI is proposed based on a convolutional neural network that includes a Fourier transform between hidden layers in order to switch from the time domain to the frequency domain analysis in the network.
Abstract: In BCI (brain-computer interface) systems, brain signals must be processed to identify distinct activities that convey different mental states. We propose a new technique for the classification of electroencephalographic (EEG) steady-state visual evoked potential (SSVEP) activity for non-invasive BCI. The proposed method is based on a convolutional neural network that includes a Fourier transform between hidden layers in order to switch from time-domain to frequency-domain analysis within the network. The first step creates the different channels. The second step transforms the signal into the frequency domain. The last step is the classification. It uses a hybrid rejection strategy combining a junk class for the mental transition states with thresholds on the confidence values. The presented results with offline processing are obtained with 6 electrodes on 2 subjects with a time segment of 1 s. With the rejection criterion applied, the system is over 95% reliable for both subjects.

100 citations
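
The three steps in the abstract (channel-creating convolutions, a Fourier transform between layers, then classification on frequency features) can be sketched as follows. This is a hedged stand-in with random, untrained filters and a synthetic 13 Hz signal, not the authors' network:

```python
import numpy as np

rng = np.random.default_rng(1)

fs, seconds = 256, 1            # 1 s EEG segment, as in the paper's setup
t = np.arange(fs * seconds) / fs
# Toy SSVEP-like signal: a 13 Hz oscillation buried in noise.
signal = np.sin(2 * np.pi * 13 * t) + 0.5 * rng.normal(size=t.size)

# Step 1 (first hidden layer): temporal convolutions create channels.
kernels = rng.normal(size=(4, 9))  # 4 filters (random here, learned in the paper)
channels = np.stack([np.convolve(signal, k, mode="same") for k in kernels])

# Step 2: a Fourier transform between hidden layers switches the
# representation from the time domain to the frequency domain.
spectra = np.abs(np.fft.rfft(channels, axis=1))
spectra[:, 0] = 0.0                # ignore the DC component
freqs = np.fft.rfftfreq(channels.shape[1], d=1 / fs)

# Step 3: classification on frequency features; here a simple peak-pick
# over the channel-averaged spectrum stands in for the classifier.
dominant = freqs[np.argmax(spectra.mean(axis=0))]
```

In the real network the convolution kernels and the classifier are learned; here the "classifier" is just a peak-pick, which recovers the dominant stimulus frequency from the toy signal.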


Proceedings Article
01 Jan 2008
TL;DR: This work presents an approach based on convolutional neural networks to detect and localize horizontal text lines from raw color pixels and demonstrates that it can outperform other methods on the real-world test set of ICDAR’03.
Abstract: Text detection is an important preliminary step before text can be recognized in unconstrained image environments. We present an approach based on convolutional neural networks to detect and localize horizontal text lines from raw color pixels. The network learns to extract and combine its own set of features through learning instead of using hand-crafted ones. Learning was also used in order to precisely localize the text lines by simply training the network to reject badly-cut text and without any use of tedious knowledge-based postprocessing. Although the network was trained with synthetic examples, experimental results demonstrated that it can outperform other methods on the real-world test set of ICDAR’03.

74 citations


Proceedings ArticleDOI
28 Oct 2008
TL;DR: This paper shows how convolutional networks can be combined with appropriate image analysis to achieve high accuracies on three very different tasks in breast and gastric cancer grading, despite the challenge of limited training data.
Abstract: Histological analysis on stained biopsy samples requires recognizing many kinds of local and structural details, with some awareness of context. Machine learning algorithms such as convolutional networks can be powerful tools for such problems, but often there may not be enough training data to exploit them to their full potential. In this paper, we show how convolutional networks can be combined with appropriate image analysis to achieve high accuracies on three very different tasks in breast and gastric cancer grading, despite the challenge of limited training data. The three problems are to count mitotic figures in the breast, to recognize epithelial layers in the stomach, and to detect signet ring cells.

38 citations


Book ChapterDOI
03 Sep 2008
TL;DR: The main result is that the ensemble learns to predict 36.9% of the moves made in test expert Go games, improving upon the state of the art, and that the best single convolutional neural network of the ensemble achieves 34% accuracy.
Abstract: Building a strong computer Go player is a longstanding open problem. In this paper we consider the related problem of predicting the moves made by Go experts in professional games. The ability to predict experts' moves is useful, because it can, in principle, be used to narrow the search done by a computer Go player. We applied an ensemble of convolutional neural networks to this problem. Our main result is that the ensemble learns to predict 36.9% of the moves made in test expert Go games, improving upon the state of the art, and that the best single convolutional neural network of the ensemble achieves 34% accuracy. This network has fewer than 10^4 parameters.

35 citations
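
The ensemble step can be sketched as averaging each member's probability distribution over the 19x19 board before picking a move. The number of networks and the random scores below are invented stand-ins for the trained CNN outputs:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    """Turn raw scores into a probability distribution over board points."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Each ensemble member outputs a distribution over the 19x19 = 361 points.
n_networks, board = 5, 19 * 19
member_scores = rng.normal(size=(n_networks, board))  # stand-in for CNN outputs
member_probs = np.array([softmax(s) for s in member_scores])

ensemble_probs = member_probs.mean(axis=0)  # average the members
move = int(np.argmax(ensemble_probs))       # predicted expert move
row, col = divmod(move, 19)
```

Averaging the members' distributions, rather than voting on their argmax moves, preserves each network's uncertainty and is a common way such ensembles improve on their best single member.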


Proceedings ArticleDOI
07 Jun 2008
TL;DR: A pattern recognition model for dynamic hand gesture recognition that combines a convolutional neural network with a weighted fuzzy min-max neural network and a feature analysis technique utilizing the WFMM algorithm is proposed.
Abstract: In this paper, a pattern recognition model for dynamic hand gesture recognition is proposed. The proposed model combines a convolutional neural network (CNN) with a weighted fuzzy min-max (WFMM) neural network; the two modules perform feature extraction and feature analysis, respectively. The data representation proposed in this research is a spatiotemporal template based on the motion information of the target object. To process the data, we develop a modified CNN model by extending the receptive field to a three-dimensional structure. To increase the efficiency of the pattern classifier, we use a feature analysis technique utilizing the WFMM algorithm. The experimental results show that the proposed method can minimize the influence of spatial and temporal variation of the feature points. The recognition performance using only the selected features for classification is also evaluated.

27 citations


Proceedings ArticleDOI
26 Sep 2008
TL;DR: The proposed solution uses convolutional neural networks to implement two classifiers, one for digit recognition and one for numeral strings composed from two digits partially overlapped, which are comparable with the best results from literature.
Abstract: The objective of the present work is to provide an efficient technique for off-line recognition of handwritten numeral strings. It can be used in various applications, such as postal code recognition or information extraction from fields of different forms. The proposed solution uses convolutional neural networks (CNNs) to implement two classifiers: one for digit recognition and one for numeral strings composed of two partially overlapping digits. Both classifiers are trained without negative examples. By comparing the results of the two classifiers, the method decides whether the image contains one digit or two partially overlapping digits. The use of the two-digit-string classifier completely removes the need for segmentation. The method is evaluated on a well-known numeral strings database - NIST Special Database 19 - and the results are comparable with the best results from the literature, even though those use elaborate segmentation and training with negative examples.

18 citations
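
The segmentation-free decision rule described above can be sketched by comparing the two classifiers' best confidences. The confidence values below are invented; the 100-class pair classifier is assumed to index the pair "ab" as the number 10*a + b:

```python
import numpy as np

def decide(single_digit_scores, digit_pair_scores):
    """Compare the best confidence of the 10-class single-digit CNN with
    that of the 100-class two-digit CNN, and keep the interpretation
    (one digit vs. two overlapped digits) that wins."""
    best_single = max(single_digit_scores)
    best_pair = max(digit_pair_scores)
    if best_single >= best_pair:
        return ("single", int(np.argmax(single_digit_scores)))
    pair = int(np.argmax(digit_pair_scores))
    return ("pair", divmod(pair, 10))  # decode "ab" back into (a, b)

# Example: the pair classifier is most confident, reading "27".
single = np.full(10, 0.05)
single[2] = 0.40
pair = np.full(100, 0.002)
pair[27] = 0.75
label = decide(single, pair)  # → ("pair", (2, 7))
```

This is why no segmentation step is needed: the image is never split, and the two-digit classifier handles the overlapped case whole.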


Proceedings ArticleDOI
19 Dec 2008
TL;DR: A convolutional neural network architecture designed to recognize license plates directly from pixel images with no preprocessing is proposed, along with the image transformations applied to the original license plates to enlarge the training database.
Abstract: In this paper, a new method is introduced for Chinese license plate recognition. We propose a convolutional neural network architecture designed to recognize license plates directly from pixel images with no preprocessing. We present the image transformations applied to the original license plates to enlarge the training database. We also provide experimental results demonstrating the robustness of our approach and the recognition rate on the license plate and non-license-plate test sets.

18 citations


Proceedings Article
01 Feb 2008
TL;DR: The paper presents a face detection approach that combines a cascade of Haar-like features for the face-candidate search with a convolutional neural network for the final verification.
Abstract: The paper presents a face detection approach that combines a cascade of Haar-like features for the face-candidate search with a convolutional neural network for the final verification. The approach works in near real-time mode with an extremely low false-alarm rate.

Book ChapterDOI
01 Jan 2008
TL;DR: This chapter presents a description of the convolutional neural network architecture, and reports some of the work applying CNNs to theoretical and real-world image processing problems.
Abstract: Convolutional neural networks (CNNs) represent an interesting method for adaptive image processing, and form a link between general feed-forward neural networks and adaptive filters. Two-dimensional CNNs are formed by one or more layers of two-dimensional filters, with possible non-linear activation functions and/or down-sampling. CNNs impose constraints on the weights and connectivity of the network, providing a framework well suited to the processing of spatially or temporally distributed data. CNNs possess the key properties of translation invariance and spatially local connections (receptive fields). The so-called "weight-sharing" property of CNNs limits the number of free parameters. Although CNNs have been applied to face and character recognition, it is fair to say that their full potential has not yet been realised. This chapter presents a description of the convolutional neural network architecture, and reports some of our work applying CNNs to theoretical and real-world image processing problems.
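
The chapter's core ingredients (local receptive fields with shared weights, a non-linear activation, and down-sampling) can be sketched in a few lines of NumPy. This is a generic illustration, not code from the chapter; the image, kernel, and pooling factor are invented:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """One shared-weight feature map: the same kernel slides over every
    image location (weight sharing + local receptive fields)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def downsample(fmap, factor=2):
    """2x2 average pooling: the down-sampling step mentioned in the chapter."""
    h = (fmap.shape[0] // factor) * factor
    w = (fmap.shape[1] // factor) * factor
    return fmap[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # a vertical-edge detector
fmap = np.tanh(conv2d_valid(image, kernel))    # non-linear activation
pooled = downsample(fmap)
```

Because the single 2x2 kernel is reused at every location, this layer has only 4 free parameters regardless of image size, which is the weight-sharing property the chapter highlights.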

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A new courtesy amount recognition module of CENPARMI's check reading system (CRS) is proposed in this paper, and the experimental results show that the recognition rate of the courtesy amount has improved from 41.2% to 74.3%.
Abstract: A new courtesy amount recognition module of CENPARMI's check reading system (CRS) is proposed in this paper. The module consists of 3 main segments: pre-processing, segmentation and recognition, and post-processing. A new feedback-based segmentation algorithm is adopted for the segmentation task. Besides one individual numeral recognizer for the numerals '0' to '9', one convolutional neural network (CNN) recognizer for "00" and "000" numeral strings is also integrated into our module for the recognition task. The experimental results on the Quebec Bell Check database show that the recognition rate of the courtesy amount has improved from 41.2% to 74.3%.

Journal ArticleDOI
TL;DR: Evaluated on the benchmark CMU rotated face database, the proposed face detection system outperforms some of the existing rotation invariant face detectors; it has fewer false positives and higher detection accuracy.

Proceedings ArticleDOI
13 Jul 2008
TL;DR: HCNN networks combine the original idea of LeCun's convolutional networks with the benefits of RBF-like neurons in all layers and with a winner-takes-all mechanism applied during recall, and proved capable of considerably speeding up the training process while maintaining roughly the same performance as the original convolutional networks.
Abstract: Convolutional neural networks are known to outperform all other neural network models when classifying a wide variety of 2D shapes. This type of network supports a massively parallel extraction of low-level features in the processed images. This characteristic in particular is assumed to benefit the performance of convolutional networks in character recognition tasks - especially when considering scaled, rotated, translated or otherwise deformed patterns. Yet training convolutional networks is rather time-consuming due to the relatively high complexity of the entire model. To speed up the training process, we propose a new variant of convolutional networks - the so-called hybrid convolutional neural network (HCNN). HCNN networks combine the original idea of LeCun's convolutional networks with the benefits of RBF-like neurons in all layers and with a winner-takes-all mechanism applied during recall. In the tests done so far on hand-written digit recognition, the HCNN proved capable of considerably speeding up the training process while maintaining roughly the same performance as the original convolutional networks.
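
A minimal sketch of the two ingredients the HCNN adds, under the assumption that "RBF-like" means distance-based Gaussian activations (the prototypes, input, and width parameter below are invented for illustration):

```python
import numpy as np

def rbf_layer(x, prototypes, beta=1.0):
    """RBF-like neurons: activation depends on the squared distance
    between the input and each neuron's prototype, not on a dot product."""
    d2 = ((prototypes - x) ** 2).sum(axis=1)
    return np.exp(-beta * d2)

def winner_takes_all(activations):
    """Winner-takes-all recall: only the strongest unit fires."""
    out = np.zeros_like(activations)
    out[np.argmax(activations)] = 1.0
    return out

prototypes = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
x = np.array([0.9, 1.1])
act = rbf_layer(x, prototypes)
recall = winner_takes_all(act)  # the unit whose prototype is nearest to x wins
```

Distance-based units like these can be initialized directly from training examples, which is one plausible reason such a hybrid trains faster than a fully backpropagated convolutional network.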

Book ChapterDOI
01 Jan 2008
TL;DR: A new framework is presented that can automatically generate a relevance map from sensory data, represent knowledge regarding objects, and infer new knowledge about novel objects, based on an understanding of the visual "what" pathway in the brain.
Abstract: Knowledge-based clustering and autonomous mental development remain high-priority research topics, in which the learning techniques of neural networks are used to achieve optimal performance. In this paper, we present a new framework that can automatically generate a relevance map from sensory data, represent knowledge regarding objects, and infer new knowledge about novel objects. The proposed model is based on an understanding of the visual "what" pathway in our brain. A bottom-up attention model can selectively decide salient object areas. Color and form features for a selected object are generated by a sparse coding mechanism in a convolutional neural network (CNN). Using the features extracted by the CNN as inputs, the incremental knowledge representation model, called the growing fuzzy topology adaptive resonance theory (TART) network, forms clusters for the construction of an ontology map in the color and form domains. The clustered information is relevant for describing specific objects, and the proposed model can automatically infer an unknown object using the learned information. Experimental results with real data have demonstrated the validity of this approach.

Journal ArticleDOI
TL;DR: An off-line cursive word recognition system based completely on neural networks: reading models and models of early visual processing.
Abstract: We present an off-line cursive word recognition system based completely on neural networks: reading models and models of early visual processing. The first stage (normalization) preprocesses the input image in order to reduce letter position uncertainty; the second stage (feature extraction) is based on the feedforward model of orientation selectivity; the third stage (letter pre-recognition) is based on a convolutional neural network, and the last stage (word recognition) is based on the interactive activation model.

Proceedings ArticleDOI
10 Oct 2008
TL;DR: The proposed solution uses convolutional neural networks (CNNs) and relies on very light preprocessing, avoiding segmentation, for off-line recognition of handwritten numerals composed of two partially overlapping digits.
Abstract: The objective of the present work is to provide an efficient and reliable technique for off-line recognition of handwritten numerals composed of two partially overlapping digits. It can be used in various applications, such as postal code recognition or information extraction from fields of different forms. The proposed solution uses convolutional neural networks (CNNs) and relies on very light preprocessing, avoiding segmentation. Test results on a comprehensive, well-known character database - NIST SD 19 - show a high degree of recognition accuracy.

Proceedings ArticleDOI
01 Jun 2008
TL;DR: The proposed pedestrian detection system is evaluated on the DaimlerChrysler pedestrian classification benchmark database and its performance is compared to the performance of support vector machines and Adaboost classifiers.
Abstract: In this paper, we present a biologically inspired method for detecting pedestrians in images. The method is based on a convolutional neural network architecture, which combines feature extraction and classification. The proposed network architecture is much simpler and easier to train than earlier versions. It differs from its predecessors in that the first processing layer consists of a set of pre-defined nonlinear derivative filters for computing gradient information. The subsequent processing layer has trainable shunting inhibitory feature detectors, which are used as inputs to a pattern classifier. The proposed pedestrian detection system is evaluated on the DaimlerChrysler pedestrian classification benchmark database and its performance is compared to the performance of support vector machines and Adaboost classifiers.
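
The first two layers described above can be sketched as follows, assuming Sobel-style kernels for the pre-defined derivative filters and a divisive form for the shunting inhibition (both are illustrative guesses, not the authors' exact operators):

```python
import numpy as np

# First layer: fixed (non-trainable) derivative filters for gradients.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

def conv2d_valid(image, kernel):
    """Plain valid-mode 2D convolution."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def shunting_inhibition(excite, inhibit, a=1.0):
    """Shunting inhibitory unit: the excitatory response is divided by a
    bias plus a non-negative inhibitory drive, so inhibition scales
    (rather than subtracts from) the output."""
    return np.tanh(excite) / (a + np.abs(inhibit))

image = np.random.default_rng(3).normal(size=(8, 8))
gx = conv2d_valid(image, sobel_x)  # pre-defined gradient features
gy = conv2d_valid(image, sobel_y)
features = shunting_inhibition(gx, gy)
```

Fixing the first layer to known derivative filters removes those weights from training, which is consistent with the abstract's claim that this version is simpler and easier to train than its predecessors.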

Dissertation
16 Dec 2008
TL;DR: A new method for binarization of text images, a new method for segmentation of text images, a study of a convolutional neural network for character recognition in images, a discussion of the relevance of the binarization step in machine-learning-based recognition of text in images, and a new method of text recognition in images based on graph theory are proposed.
Abstract: Thanks to increasingly powerful storage media, multimedia resources have nowadays become essential, in the fields of information and broadcasting (news agencies, INA), culture (museums), transport (monitoring), environment (satellite images), and medical imaging (medical records in hospitals). The challenge is thus how to quickly find relevant information, so research in multimedia is increasingly focused on indexing and retrieval techniques. To accomplish this task, the text within images and videos can be a relevant key. The challenges of recognizing text in images and videos are many: poor resolution, characters of different sizes, compression artifacts and anti-aliasing effects, and very complex and variable backgrounds. There are four steps in recognizing text: (1) detecting the presence of the text, (2) localizing the text, (3) extracting and enhancing the text area, and finally (4) recognizing the content of the text. In this work we focus on this last step and assume that the text box has been detected, localized and retrieved correctly. This recognition module can itself be divided into several sub-modules, such as a binarization module, a text segmentation module, and a character recognition module. We focus on a particular machine learning algorithm called convolutional neural networks (CNNs). These are networks of neurons whose topology is similar to that of the mammalian visual cortex. CNNs were initially used for recognition of handwritten digits and were then applied successfully to many pattern recognition problems. We propose in this thesis a new method for binarization of text images, a new method for segmentation of text images, a study of a convolutional neural network for character recognition in images, a discussion of the relevance of the binarization step in machine-learning-based recognition of text in images, and a new method of text recognition in images based on graph theory.

Journal ArticleDOI
TL;DR: The design of the proposed convolutional neural network architecture for face image recognition takes the constraints on the bandwidth of the communication between memory and processor into account, and two segmentation methods are used to buffer the image data required for these parallel-to-sequential calculations from the image RAM to multi-port RAMs.
Abstract: The design of the proposed Convolutional Neural Network (CNN) architecture for face image recognition takes the constraints on the bandwidth of the communication between memory and processor into account. The coarse-grained parallelism of the bottom-layer node calculations is progressively reduced until the calculation of a single node in the upper layer is performed sequentially. Two segmentation methods are used to buffer the image data required for these parallel-to-sequential calculations from the image RAM to multi-port RAMs. The two methods are compared with respect to the total number of RAM accesses required to generate the system's recognition code. A speedup of 44 is achieved when the hardware system is implemented using the first segmentation method, compared to a software implementation on a sequential Pentium 4, 2.4 GHz computer, while a speedup of 88 is achieved when the same hardware system is implemented using the second segmentation method.

Proceedings ArticleDOI
18 Jun 2008
TL;DR: The proposed modular system architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
Abstract: A system for classification of visual information based on a topology synthesizing approach is presented. The topology synthesizing approach automatically creates a relevance map from essential regions of visual information. It also derives a set of well-organized representations from low-level description to drive the final classification. The backbone of the topology synthesizing approach is a mapping strategy involving two basic modules: structured low-level feature extraction using a convolutional neural network and a topology representation module based on a self-organizing tree algorithm. Classification is achieved by simulating high-level top-down visual information perception and classifying using an incremental Bayesian parameter estimation method. The proposed modular system architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.

01 Jan 2008
TL;DR: The proposed CNNs make the system robust to variation in illumination and in the shape and scale of pedestrians, and the proposed methods of setting ROIs and tracking pedestrians allow the system to detect a dangerous situation and quickly warn the driver.
Abstract: This paper proposes a method for fast detection of pedestrians with a single camera, while maintaining good performance regardless of variation in illumination and in the shape and scale of pedestrians. Regions of interest (ROIs) are acquired from optical flow fields computed with the Lucas-Kanade algorithm, and classified as pedestrian or non-pedestrian by convolutional neural networks (CNNs). Detected pedestrians are tracked using a particle filter based on adaptive fusion frameworks. The CNNs make the proposed system robust to variation in illumination and in the shape and scale of pedestrians, and the proposed methods of setting ROIs and tracking pedestrians allow the system to detect a dangerous situation and quickly warn the driver. Only a single camera is needed, so the proposed system is also economical.

Proceedings ArticleDOI
03 Mar 2008
TL;DR: A compact, low cost, real-time CMOS hardware architecture for face detection based on a VLSI-friendly implementation of Shunting Inhibitory Convolutional Neural Networks (SICoNN).
Abstract: In this paper, we present a compact, low cost, real-time CMOS hardware architecture for face detection. The proposed architecture is based on a VLSI-friendly implementation of Shunting Inhibitory Convolutional Neural Networks (SICoNN). Reported experimental results show that the proposed architecture can detect faces with 93% detection accuracy at 5% false alarm rate. A VLSI Systolic architecture was considered to further optimize the design in terms of execution speed, power dissipation and area. Potential applications of the proposed face detection hardware include consumer electronics, security, monitoring and head-counting.

Proceedings ArticleDOI
25 Aug 2008
TL;DR: A kind of topology creation strategy for image analysis and classification that automatically generates a relevance map from essential regions of natural images and derives a set of well-structured representations from low-level description to drive the final classification.
Abstract: A kind of topology creation strategy for image analysis and classification is presented. The topology creation strategy automatically generates a relevance map from essential regions of natural images. It also derives a set of well-structured representations from low-level description to drive the final classification. The backbone of the topology creation strategy is a distribution mapping rule involving two basic modules: structured low-level feature extraction using a convolutional neural network and a topology creation module based on a hypersphere neural network. Classification is achieved by simulating high-level top-down visual information perception and classifying using an incremental Bayesian parameter estimation method. The proposed modular system architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.

Proceedings Article
22 Jul 2008
TL;DR: This work presents the application results of a new architecture based on convolutional neural networks, named the Convolutional Downward Spiral Architecture (CDSA), which generates digital filters automatically and can be applied in a wide range of inspection systems.
Abstract: Adaptive learning is an important characteristic of neural networks: they learn to handle difficult tasks through illustrative samples of the problem to be solved. Since neural networks can learn to distinguish among many patterns through samples and training, there is no need to build an a priori model or to specify probability distribution functions. This work presents the application results of a new architecture based on convolutional neural networks, named the Convolutional Downward Spiral Architecture (CDSA), which generates digital filters automatically and can be applied in a wide range of inspection systems.