Showing papers on "Convolutional neural network" published in 2006


Proceedings Article
23 Oct 2006
TL;DR: Three novel approaches to speeding up CNNs are presented: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphic processing units).
Abstract: Convolutional neural networks (CNNs) are well known for producing state-of-the-art recognizers for document processing [1]. However, they can be difficult to implement and are usually slower than traditional multi-layer perceptrons (MLPs). We present three novel approaches to speeding up CNNs: a) unrolling convolution, b) using BLAS (basic linear algebra subroutines), and c) using GPUs (graphic processing units). Unrolled convolution converts the processing in each convolutional layer (both forward-propagation and back-propagation) into a matrix-matrix product. The matrix-matrix product representation of CNNs makes their implementation as easy as MLPs. BLAS is used to efficiently compute matrix products on the CPU. We also present a pixel shader based GPU implementation of CNNs. Results on character recognition problems indicate that unrolled convolution with BLAS produces a dramatic 2.4X−3.0X speedup. The GPU implementation is even faster and produces a 3.1X−4.1X speedup.
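The idea of unrolling a convolution into a matrix-matrix product can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`unroll`, `matmul`, `conv_unrolled`, `conv_direct`) are hypothetical, it covers only the forward pass of a single-channel "valid" convolution without kernel flipping, and a plain-Python `matmul` stands in for the BLAS call the abstract mentions.

```python
def unroll(image, k):
    """Return one row per output position; each row is a flattened k x k patch."""
    h, w = len(image), len(image[0])
    return [
        [image[y + dy][x + dx] for dy in range(k) for dx in range(k)]
        for y in range(h - k + 1)
        for x in range(w - k + 1)
    ]


def matmul(a, b):
    """Naive matrix product; a BLAS routine would replace this in practice."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def conv_unrolled(image, kernels):
    """Apply all kernels at once: (positions x k*k) times (k*k x n_kernels)."""
    k = len(kernels[0])
    patches = unroll(image, k)
    # Flatten each kernel in the same order as the patches, one column per kernel.
    flat = [[kern[dy][dx] for dy in range(k) for dx in range(k)] for kern in kernels]
    weights = [list(col) for col in zip(*flat)]
    return matmul(patches, weights)


def conv_direct(image, kernel):
    """Reference sliding-window computation for a single kernel."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        sum(image[y + dy][x + dx] * kernel[dy][dx] for dy in range(k) for dx in range(k))
        for y in range(h - k + 1)
        for x in range(w - k + 1)
    ]
```

Each column of the result of `conv_unrolled` holds one feature map in row-major order, so it can be checked against `conv_direct` kernel by kernel.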

562 citations


22 Nov 2006
TL;DR: The derivation and implementation of convolutional neural networks are discussed, followed by an extension which allows one to learn sparse combinations of feature maps, and small snippets of MATLAB code are given to accompany the equations.
Abstract: We discuss the derivation and implementation of convolutional neural networks, followed by an extension which allows one to learn sparse combinations of feature maps. The derivation we present is specific to two-dimensional data and convolutions, but can be extended without much additional effort to an arbitrary number of dimensions. Throughout the discussion, we emphasize efficiency of the implementation, and give small snippets of MATLAB code to accompany the equations.

352 citations


Journal ArticleDOI
TL;DR: The proposed approach solves some problems inherent to objective metrics that should predict subjective quality score obtained using the single stimulus continuous quality evaluation (SSCQE) method and relies on the use of a convolutional neural network that allows a continuous time scoring of the video.
Abstract: This paper describes an application of neural networks in the field of objective measurement method designed to automatically assess the perceived quality of digital videos. This challenging issue aims to emulate human judgment and to replace very complex and time consuming subjective quality assessment. Several metrics have been proposed in literature to tackle this issue. They are based on a general framework that combines different stages, each of them addressing complex problems. The ambition of this paper is not to present a global perfect quality metric but rather to focus on an original way to use neural networks in such a framework in the context of reduced reference (RR) quality metric. Especially, we point out the interest of such a tool for combining features and pooling them in order to compute quality scores. The proposed approach solves some problems inherent to objective metrics that should predict subjective quality score obtained using the single stimulus continuous quality evaluation (SSCQE) method. This latter has been adopted by video quality expert group (VQEG) in its recently finalized reduced referenced and no reference (RRNR-TV) test plan. The originality of such approach compared to previous attempts to use neural networks for quality assessment, relies on the use of a convolutional neural network (CNN) that allows a continuous time scoring of the video. Objective features are extracted on a frame-by-frame basis on both the reference and the distorted sequences; they are derived from a perceptual-based representation and integrated along the temporal axis using a time-delay neural network (TDNN). Experiments conducted on different MPEG-2 videos, with bit rates ranging 2-6 Mb/s, show the effectiveness of the proposed approach to get a plausible model of temporal pooling from the human vision system (HVS) point of view. 
More specifically, a linear correlation criterion, between objective and subjective scoring, of up to 0.92 has been obtained on a set of typical TV videos.
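A linear correlation between objective and subjective scores is commonly measured with the Pearson coefficient. The sketch below assumes that this is the criterion meant here, which the abstract does not state explicitly; the function name `pearson` is hypothetical and the scores used to exercise it would be illustrative, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 means the objective metric tracks the subjective scores almost linearly.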

197 citations


Book ChapterDOI
07 May 2006
TL;DR: The hyperfeatures model is formulated and its performance under several different image coding methods including clustering based Vector Quantization, Gaussian Mixtures, and combinations of these with Latent Dirichlet Allocation is studied.
Abstract: Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant and have good resistance to local occlusions and to geometric and photometric variations, but they are not able to exploit spatial co-occurrence statistics at scales larger than their local input patches. We present a new multilevel visual representation, 'hyperfeatures', that is designed to remedy this. The starting point is the familiar notion that to detect object parts, in practice it often suffices to detect co-occurrences of more local object fragments – a process that can be formalized as comparison (e.g. vector quantization) of image patches against a codebook of known fragments, followed by local aggregation of the resulting codebook membership vectors to detect co-occurrences. This process converts local collections of image descriptor vectors into somewhat less local histogram vectors – higher-level but spatially coarser descriptors. We observe that as the output is again a local descriptor vector, the process can be iterated, and that doing so captures and codes ever larger assemblies of object parts and increasingly abstract or 'semantic' image properties. We formulate the hyperfeatures model and study its performance under several different image coding methods including clustering based Vector Quantization, Gaussian Mixtures, and combinations of these with Latent Dirichlet Allocation. We find that the resulting high-level features provide improved performance in several object image and texture image classification tasks.

166 citations


Proceedings ArticleDOI
M. Szarvas1, U. Sakai1, J. Ogata1
11 Sep 2006
TL;DR: In this article, a real-time pedestrian detection system utilizing a LIDAR-based object detector and convolutional neural network (CNN)-based image classifier is presented.
Abstract: This paper presents a novel real-time pedestrian detection system utilizing a LIDAR-based object detector and convolutional neural network (CNN)-based image classifier. Our method achieves over 10 frames/second processing speed by constraining the search space using the range information from the LIDAR. The image region candidates detected by the LIDAR are confirmed for the presence of pedestrians by a convolutional neural network classifier. Our CNN classifier achieves high accuracy at a low computational cost thanks to its ability to automatically learn a small number of highly discriminating features. The focus of this paper is the evaluation of the effect of region of interest (ROI) detection on system accuracy and processing speed. The evaluation results indicate that the use of the LIDAR-based ROI detector can reduce the number of false positives by a factor of 2 and reduce the processing time by a factor of 4. The single frame detection accuracy of the system is above 90% when there is 1 false positive per second.

98 citations


Proceedings ArticleDOI
20 Aug 2006
TL;DR: Two detectors, one for face and the other for license plates, are proposed, both based on a modified convolutional neural network (CNN) verifier, and Pyramid-based localization techniques were applied to fuse the candidates and to identify the regions of faces or license plates.
Abstract: In this paper, two detectors, one for face and the other for license plates, are proposed, both based on a modified convolutional neural network (CNN) verifier. In our proposed verifier, a single feature map and a fully connected MLP were trained by examples to classify the possible candidates. Pyramid-based localization techniques were applied to fuse the candidates and to identify the regions of faces or license plates. In addition, geometrical rules filtered out false alarms in license plate detection. Some experimental results are given to show the effectiveness of the approach. Keywords: Face detection, license plate detection, convolution neural network, feature map.

84 citations


Patent
Puri Siddhartha1
17 Aug 2006
TL;DR: A convolutional neural network is implemented on a graphics processing unit (GPU), taking advantage of the parallel processing capabilities of its pixel shader units, and a set of start-to-finish formulas is used to program the computations on the pixel shaders.
Abstract: A convolutional neural network is implemented on a graphics processing unit. The network is then trained through a series of forward and backward passes, with convolutional kernels and bias matrices modified on each backward pass according to a gradient of an error function. The implementation takes advantage of parallel processing capabilities of pixel shader units on a GPU, and utilizes a set of start-to-finish formulas to program the computations on the pixel shaders. Input and output to the program is done through textures, and a multi-pass summation process is used when sums are needed across pixel shader unit registers.

82 citations


Proceedings ArticleDOI
30 Oct 2006
TL;DR: This paper employs shunting inhibitory convolutional neural networks to develop an automatic gender recognition system that achieves a recognition rate of 85.7% when tested on a large set of digital images collected from the Web and BioID face databases.
Abstract: In this paper, we employ shunting inhibitory convolutional neural networks to develop an automatic gender recognition system. The system comprises two modules: a face detector and a gender classifier. The human faces are first detected and localized in the input image. Each detected face is then passed to the gender classifier to determine whether it is a male or female. Both the face detection and gender classification modules employ the same neural network architecture; however, the two modules are trained separately to extract different features for face detection and gender classification. Tested on two different databases, Web and BioID database, the face detector has an average detection accuracy of 97.9%. The gender classifier, on the other hand, achieves 97.2% classification accuracy on the FERET database. The combined system achieves a recognition rate of 85.7% when tested on a large set of digital images collected from the Web and BioID face databases.

68 citations



Proceedings ArticleDOI
20 Aug 2006
TL;DR: By extending the weight-sharing properties of convolutional neural networks to orientations, this paper obtains a neural network that is inherently robust to object rotations, while still being capable of learning optimally discriminant features from training data.
Abstract: Approaches based on local features and descriptors are increasingly used for the task of object recognition due to their robustness with regard to occlusions and geometrical deformations of objects. In this paper we present a local feature based, rotation-invariant Neoperceptron. By extending the weight-sharing properties of convolutional neural networks to orientations, we obtain a neural network that is inherently robust to object rotations, while still being capable of learning optimally discriminant features from training data. The performance of the network is evaluated on a facial expression database and compared to a standard Neoperceptron as well as to the Scale Invariant Feature Transform (SIFT), a state-of-the-art local descriptor. The results confirm the validity of our approach.

47 citations


Proceedings ArticleDOI
01 Nov 2006
TL;DR: The proposed convolutional neural network has the ability to perform feature extraction and classification within the same architecture, whilst preserving the two-dimensional spatial structure of the input image.
Abstract: In this paper, we propose a convolutional neural network (CoNN) for texture classification. This network has the ability to perform feature extraction and classification within the same architecture, whilst preserving the two-dimensional spatial structure of the input image. Feature extraction is performed using shunting inhibitory neurons, whereas the final classification decision is performed using sigmoid neurons. Tested on images from the Brodatz texture database, the proposed network achieves classification performance similar to or better than that of some of the most popular texture classification approaches, namely Gabor filters, wavelets, quadrature mirror filters (QMF) and co-occurrence matrix methods. Furthermore, the CoNN classifier outperforms these techniques when its output is postprocessed with median filtering.

Proceedings ArticleDOI
20 Aug 2006
TL;DR: This paper proposes to use a class of convolutional neural networks built upon the concepts of local receptive field processing and weight sharing, which makes them more tolerant to distortions and variations in two dimensional shapes for gender classification.
Abstract: Demographic features, such as gender, are very important for human recognition and can be used to enhance social and biometric applications. In this paper, we propose to use a class of convolutional neural networks for gender classification. These networks are built upon the concepts of local receptive field processing and weight sharing, which makes them more tolerant to distortions and variations in two dimensional shapes. Tested on two separate data sets, the proposed networks achieve better classification accuracy than the conventional feedforward multilayer perceptron networks. On the Feret benchmark dataset, the proposed convolutional neural networks achieve a classification rate of 97.1%.

Proceedings Article
23 Oct 2006
TL;DR: A new radical based approach for scaling neural network (NN) recognizers to thousands of East-Asian characters scales well and achieves a low error rate.
Abstract: East-Asian characters possess a rich hierarchical structure with each character comprising a unique spatial arrangement of radicals (sub-characters). In this paper, we present a new radical based approach for scaling neural network (NN) recognizers to thousands of East-Asian characters. The proposed off-line character recognizer comprises neural networks arranged in a graph. Each NN is one of three types: a radical-at-location (RAL) recognizer, a gater, or a combiner. Each radical-at-location NN is a convolutional neural network that is designed to process the whole character image and recognize radicals at a specific location in the character. Example locations include left-half, right-half, top-half, bottom-half, left-top quadrant, bottom-right quadrant, etc. Segmentation is completely avoided by allowing each RAL classifier to process the whole character image. Gater-NNs reduce the number of NNs that need to be evaluated at runtime and combiner-NNs combine RAL classifier outputs for final recognition. The proposed approach is tested on a real-world dataset containing 13.4 million handwritten Chinese character samples from 3665 classes. Experimental results indicate that the proposed approach scales well and achieves a low error rate.

Book ChapterDOI
03 Oct 2006
TL;DR: A modified fuzzy min-max (FMM) neural network model for pattern classification is introduced, together with a real-time face detection method that uses the proposed model for the pattern classification stage.
Abstract: In this paper, we introduce a modified fuzzy min-max (FMM) neural network model for pattern classification, and present a real-time face detection method using the proposed model. The learning process of the FMM model consists of three sub-processes: hyperbox creation, expansion and contraction processes. During the learning process, the feature distribution and frequency data are utilized to compensate the hyperbox distortion which may be caused by eliminating the overlapping area of hyperboxes in the contraction process. We present a multi-stage face detection method which is composed of two stages: feature extraction stage and classification stage. The feature extraction module employs a convolutional neural network (CNN) with a Gabor transform layer to extract successively larger features in a hierarchical set of layers. The proposed FMM model is used for the pattern classification stage. Moreover, the model is utilized to select effective feature sets for the skin-color filter of the system.

Proceedings ArticleDOI
30 Oct 2006
TL;DR: Networks trained as suggested are highly robust against random changes of synaptic weights occurring on the hardware substrate, and work well even with only three distinct weight values (-1, 0, +1), reducing computational complexity to mere counting.
Abstract: Convolutional neural networks are known to be powerful image classifiers. In this work, a method is proposed for training convolutional networks for implementation on an existing mixed digital-analog VLSI hardware architecture. The binary threshold neurons provided by this architecture cannot be trained using gradient-based methods. The convolutional layers are trained with a clustering method, locally in each layer. The output layer is trained using the Perceptron learning rule. Competitive results are obtained on hand-written digits (MNIST) and traffic signs. The analog hardware enables high integration and low power consumption, but inherent error sources affect the computation accuracy. Networks trained as suggested are highly robust against random changes of synaptic weights occurring on the hardware substrate, and work well even with only three distinct weight values (-1, 0, +1), reducing computational complexity to mere counting.
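Why three weight values reduce the computation to counting can be seen in a small sketch. It assumes binary (0/1) inputs, as produced by binary threshold neurons, and a simple "fire if the sum reaches the threshold" rule; both function names are hypothetical and the code does not model the hardware itself.

```python
def ternary_neuron(inputs, weights, threshold):
    """Threshold neuron with weights in {-1, 0, +1}, evaluated by counting only."""
    # Count active inputs on excitatory (+1) and inhibitory (-1) connections.
    excitatory = sum(1 for x, w in zip(inputs, weights) if x and w == 1)
    inhibitory = sum(1 for x, w in zip(inputs, weights) if x and w == -1)
    return 1 if excitatory - inhibitory >= threshold else 0


def weighted_sum_neuron(inputs, weights, threshold):
    """Reference version using an explicit multiply-accumulate."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0
```

For binary inputs the two functions agree on every input pattern, since each product `x * w` is either 0, +1, or -1.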

Book ChapterDOI
31 Aug 2006
TL;DR: A chip-in-the-loop version of the iterative Perceptron rule is introduced for training the output layer, and influences of various types of errors are thoroughly investigated for all network layers, using the MNIST database of hand-written digits as a benchmark.
Abstract: Recently, the authors described a training method for a convolutional neural network of threshold neurons. Hidden layers are trained by clustering, in a feed-forward manner, while the output layer is trained using the supervised Perceptron rule. The system is designed for implementation on an existing low-power analog hardware architecture, exhibiting inherent error sources affecting the computation accuracy in unspecified ways. One key technique is to train the network on-chip, taking possible errors into account without any need to quantify them. For the hidden layers, an on-chip approach has been applied previously. In the present work, a chip-in-the-loop version of the iterative Perceptron rule is introduced for training the output layer. Influences of various types of errors are thoroughly investigated (noisy, deleted, and clamped weights) for all network layers, using the MNIST database of hand-written digits as a benchmark.

Book ChapterDOI
31 Aug 2006
TL;DR: It is demonstrated how SCNNs can be implemented by successive convolutions of the input image: scanning an image for objects at all possible locations is shown to be possible in real-time using this technique.
Abstract: A new convolutional neural network model termed sparse convolutional neural network (SCNN) is presented and its usefulness for real-time object detection in gray-valued, monocular video sequences is demonstrated. SCNNs are trained on "raw" gray values and are intended to perform feature selection as a part of regular neural network training. For this purpose, the learning rule is extended by an unsupervised component which performs a local nonlinear principal components analysis: in this way, meaningful and diverse properties can be computed from local image patches. The SCNN model can be used to train classifiers for different object classes which share a common first layer, i.e., a common preprocessing. This is of advantage since the information needs only to be calculated once for all classifiers. It is further demonstrated how SCNNs can be implemented by successive convolutions of the input image: scanning an image for objects at all possible locations is shown to be possible in real-time using this technique.

Journal Article
TL;DR: The proposed networks have fewer free parameters and better generalization ability than the feedforward neural networks, and outperform the conventional convolutional neural networks.
Abstract: This article addresses the problem of rotation invariant face detection using convolutional neural networks. Recently, we developed a new class of convolutional neural networks for visual pattern recognition. These networks have a simple network architecture and use shunting inhibitory neurons as the basic computing elements for feature extraction. Three networks with different connection schemes have been developed for in-plane rotation invariant face detection: fully-connected, toeplitz-connected, and binary-connected networks. The three networks are trained using a variant of Levenberg-Marquardt algorithm and tested on a set of 40,000 rotated face patterns. As a face/non-face classifier, these networks achieve 97.3% classification accuracy for a rotation angle in the range ±90° and 95.9% for full in-plane rotation. The proposed networks have fewer free parameters and better generalization ability than the feedforward neural networks, and outperform the conventional convolutional neural networks.

Book ChapterDOI
03 Oct 2006
TL;DR: In this article, three networks with different connection schemes have been developed for in-plane rotation invariant face detection: fully-connected, toeplitz-connected and binary-connected networks.
Abstract: This article addresses the problem of rotation invariant face detection using convolutional neural networks. Recently, we developed a new class of convolutional neural networks for visual pattern recognition. These networks have a simple network architecture and use shunting inhibitory neurons as the basic computing elements for feature extraction. Three networks with different connection schemes have been developed for in-plane rotation invariant face detection: fully-connected, toeplitz-connected, and binary-connected networks. The three networks are trained using a variant of Levenberg-Marquardt algorithm and tested on a set of 40,000 rotated face patterns. As a face/non-face classifier, these networks achieve 97.3% classification accuracy for a rotation angle in the range ±90° and 95.9% for full in-plane rotation. The proposed networks have fewer free parameters and better generalization ability than the feedforward neural networks, and outperform the conventional convolutional neural networks.

Journal Article
TL;DR: In this paper, a modified fuzzy min-max (FMM) neural network model was proposed for pattern classification, and a real-time face detection method using the proposed model was presented.
Abstract: In this paper, we introduce a modified fuzzy min-max (FMM) neural network model for pattern classification, and present a real-time face detection method using the proposed model. The learning process of the FMM model consists of three sub-processes: hyperbox creation, expansion and contraction processes. During the learning process, the feature distribution and frequency data are utilized to compensate the hyperbox distortion which may be caused by eliminating the overlapping area of hyperboxes in the contraction process. We present a multi-stage face detection method which is composed of two stages: feature extraction stage and classification stage. The feature extraction module employs a convolutional neural network (CNN) with a Gabor transform layer to extract successively larger features in a hierarchical set of layers. The proposed FMM model is used for the pattern classification stage. Moreover, the model is utilized to select effective feature sets for the skin-color filter of the system.

DissertationDOI
01 Jan 2006
TL;DR: This thesis evaluates the feasibility of implementing a convolutional neural network for image classification on a massively parallel low-power hardware system and develops and tests appropriate, gradient-free, training algorithms, combining self-organization and supervised learning.
Abstract: Computing with analog microelectronics can offer several advantages over standard digital technology, most notably low space and power consumption and massive parallelization. On the other hand, analog computation lacks the exactness of digital calculations due to inevitable device variations introduced during the chip production, but also due to electric noise in the analog signals. Artificial neural networks are well suited for parallel analog implementations, first, because of their inherent parallelism and second, because they can adapt to device imperfections by training. This thesis evaluates the feasibility of implementing a convolutional neural network for image classification on a massively parallel low-power hardware system. A particular, mixed analog-digital, hardware model is considered, featuring simple threshold neurons. Appropriate, gradient-free, training algorithms, combining self-organization and supervised learning are developed and tested with two benchmark problems (MNIST hand-written digits and traffic signs). Software simulations evaluate the methods under various defined computation faults. A model-free closed-loop technique is shown to compensate for rather serious computation errors without the need for explicit error quantification. Last but not least, the developed networks and the training techniques are verified on a real prototype chip.

Journal ArticleDOI
TL;DR: A new neural network model, namely shunting inhibitory convolutional neural networks, or SICoNNets for short, is applied to the problem of handwritten digit recognition, where the processing is based on the physiologically plausible mechanism of shunting inhibition.
Abstract: In this paper, we apply a new neural network model, namely shunting inhibitory convolutional neural networks, or SICoNNets for short, to the problem of handwritten digit recognition. This type of network has a generic and flexible architecture, where the processing is based on the physiologically plausible mechanism of shunting inhibition. A hybrid first-order training method, called QRProp, is developed based on the three training algorithms Rprop, Quickprop, and SuperSAB. The MNIST database is used to train and evaluate the performance of SICoNNets in handwritten digit recognition. A network with 24 feature maps and 2722 free parameters achieves a recognition accuracy of 97.3%.

Proceedings Article
01 Jan 2006
TL;DR: Working on unprocessed image data only, it is demonstrated that classification accuracies can be improved by the proposed method compared to purely MSE-trained SCNNs and fully-connected multilayer perceptron architectures.
Abstract: A convolutional network architecture termed sparse convolutional neural network (SCNN) is proposed and tested on a real-world classification task (car classification). In addition to the error function based on the mean squared error (MSE), approximate decorrelation between hidden layer neurons is enforced by a weight orthogonalization mechanism. The aim is to obtain a sparse coding of the objects' visual appearance, thus removing the need for a dedicated feature selection stage. Working on unprocessed image data only, it is demonstrated that classification accuracies can be improved by the proposed method compared to purely MSE-trained SCNNs and fully-connected multilayer perceptron architectures.

Proceedings ArticleDOI
30 Oct 2006
TL;DR: It is found that a single MSN is sufficient for the applications that require a number of neurons in different hidden layers of a conventional neural network.
Abstract: In this paper, a learning algorithm for a single multiplicative spiking neuron (MSN) is proposed and tested for various applications where a multilayer perceptron (MLP) neural network is conventionally used. It is found that a single MSN is sufficient for the applications that require a number of neurons in different hidden layers of a conventional neural network. Several benchmark and real-life problems of classification and function-approximation are illustrated. It has been observed that the inclusion of a few more biological phenomena in artificial neural networks can make them more prevailing.

Proceedings ArticleDOI
07 Jun 2006
TL;DR: Experimental results from the application of digital filters for defects detection in paper pulp production are shown, automatically generated by means of a convolutional neural architecture that uses a modified back-propagation algorithm.
Abstract: Automatic inspection in today's manufacturing is critical to be competitive. In this paper, experimental results from the application of digital filters for defects detection in paper pulp production are shown. These filters have been automatically generated by means of a convolutional neural architecture that uses a modified back-propagation algorithm. The main subjects discussed are: convolutional top-down spiral architecture, a tool used to automatically generate digital filters, a simple but effective modification to the back-propagation algorithm for this application, and experimental results.

Book ChapterDOI
03 Oct 2006
TL;DR: In this article, a pyramid neural network was proposed for classification of visual patterns, which has a hierarchical structure with two types of processing layers, namely pyramidal layers and 1-D layers.
Abstract: We propose a novel neural network for classification of visual patterns. The new network, called pyramidal neural network or PyraNet, has a hierarchical structure with two types of processing layers, namely pyramidal layers and 1-D layers. The PyraNet is motivated by two concepts: the image pyramids and local receptive fields. In the new network, nonlinear 2-D are trained to perform both 2-D analysis and data reduction. In this paper, we present a fast training method for the PyraNet that is based on resilient back-propagation and weight decay, and apply the new network to classify gender from facial images.

Patent
17 Aug 2006
TL;DR: In this paper, the authors propose a method for training a convolutional neural network (CNN) that consists of the following steps: receiving graphics data representing a state of the CNN and comprising one or more textures representing one or multiple neural network variables, wherein said textures comprise a texture with two-dimensional addressing and at least one of the textures represents a neural network variable with addressing of more than two dimensions.
Abstract: FIELD: information technology. SUBSTANCE: method comprises the following steps: receiving graphics data representing a state of the convolutional neural network and comprising one or more textures representing one or more neural network variables, wherein said textures comprise a texture with two-dimensional addressing, and at least one of the textures represents a neural network variable with addressing of more than two dimensions which has been flattened into two dimensional addressing, the convolutional neural network comprising at least one layer comprising a plurality of patches; executing one or more programs on the graphics processing unit (GPU) in order to perform a forward pass in the convolutional neural network, executing one or more programs to perform a backward pass in the convolutional neural network, the executing including performing convolution operations on the patches; executing one or more programs in order to modify the patches in the convolutional neural network by changing the graphics data based on results of the backward pass; and repeating execution of one or more programs to perform forward passes, backward passes, and to modify the graphics data until the convolutional neural network is trained. EFFECT: lower computational complexity. 17 cl, 9 dwg
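Flattening a variable with more than two dimensions of addressing into two-dimensional addressing can be sketched as an index mapping. The layout below is one assumed possibility, not the layout claimed in the patent: a bank of kernels indexed by (output map, input map, kernel row, kernel column) is tiled into a 2-D grid, and both function names are hypothetical.

```python
def to_texture_coords(out_map, in_map, ky, kx, kernel_h, kernel_w):
    """Map a 4-D kernel index to a (row, col) position in a 2-D grid.

    Assumed layout: kernels are tiled so that each output map occupies a
    band of kernel_h rows and each input map a band of kernel_w columns.
    """
    return out_map * kernel_h + ky, in_map * kernel_w + kx


def from_texture_coords(row, col, kernel_h, kernel_w):
    """Inverse mapping: recover the 4-D index from a 2-D position."""
    out_map, ky = divmod(row, kernel_h)
    in_map, kx = divmod(col, kernel_w)
    return out_map, in_map, ky, kx
```

Because the mapping is invertible, a program that can only address two dimensions can still read and update every element of the 4-D variable.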

Journal ArticleDOI
01 Jun 2006
TL;DR: An image-filtering processor VLSI designed using a 0.35 μm CMOS process that performs 6-bit precision convolutions for an image of 80 × 80 pixels with a receptive-field size of up to 51 × 51 pixels within 8.2 ms.
Abstract: Image filtering with large receptive-field area is essential for brain-like vision systems. The typical processing model using such filtering is convolutional neural networks (CoNNs). The CoNNs are a well-known robust image-recognition processing model, which imitates the vision nerve system in the brain. To realize such image processing, we have developed an image-filtering processor VLSI. The VLSI designed using a 0.35 μm CMOS process performs 6-bit precision convolutions for an image of 80 × 80 pixels with a receptive-field size of up to 51 × 51 pixels within 8.2 ms. Because the VLSI is based on a hybrid approach using pulse-width modulation (PWM) and digital circuits, low power-consumption of 220 mW has been achieved. Face position detection can be performed within 66 ms by using the developed VLSI.

Book ChapterDOI
01 Jan 2006
TL;DR: Experimental results show that pattern recognition by the proposed method improves the recognition rate considerably and has been compared to other network structures in terms of speed and accuracy and has shown better performance in simulations.
Abstract: This article describes an approach to automatically recognize patterns such as 3D objects and handwritten digits based on a database. The designed system can be used for both 3D object recognition from 2D poses of the object and handwritten digit recognition applications. The system does not require any feature extraction stage before the recognition. Probabilistic Neural Network (PNN) is used for the recognition of the patterns. Experimental results show that pattern recognition by the proposed method improves the recognition rate considerably. The system has been compared to other network structures in terms of speed and accuracy and has shown better performance in simulations.

Book ChapterDOI
07 Aug 2006
TL;DR: A modified version of fuzzy min-max (FMM) neural network for feature analysis and face classification and a relevance factor between features and pattern classes is defined to analyze the saliency of features.
Abstract: In this paper, we present a real-time face detection method based on hybrid neural networks. We propose a modified version of fuzzy min-max (FMM) neural network for feature analysis and face classification. A relevance factor between features and pattern classes is defined to analyze the saliency of features. The measure can be utilized for the feature selection to construct an adaptive skin-color filter. The feature extraction module employs a convolutional neural network (CNN) with a Gabor transform layer to extract successively larger features in a hierarchical set of layers. In this paper we first describe the behavior of the proposed FMM model, and then introduce the feature analysis technique for skin-color filter and pattern classifier.