Showing papers in "arXiv: Computer Vision and Pattern Recognition in 2011"


Posted Content
TL;DR: This paper presents a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR), which integrates object detection and background learning into a single process of optimization, which can be solved by an alternating algorithm efficiently.
Abstract: Object detection is a fundamental step for automated video analysis in many vision applications. Object detection in a video is usually performed by object detectors or background subtraction techniques. Often, an object detector requires manually labeled examples to train a binary classifier, while background subtraction needs a training sequence that contains no objects to build a background model. To automate the analysis, object detection without a separate training phase becomes a critical task. People have tried to tackle this task by using motion information. But existing motion-based methods are usually limited when coping with complex scenarios such as nonrigid motion and dynamic background. In this paper, we show that the above challenges can be addressed in a unified framework named DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR). This formulation integrates object detection and background learning into a single process of optimization, which can be solved efficiently by an alternating algorithm. We explain the relations between DECOLOR and other sparsity-based methods. Experiments on both simulated data and real sequences demonstrate that DECOLOR outperforms the state-of-the-art approaches and that it can work effectively on a wide range of complex scenarios.

509 citations


Posted Content
TL;DR: In this article, a multi-scale measure of the point cloud dimensionality around each point is defined, which characterizes the local 3D organization, and a probabilistic confidence is given at each point, allowing the user to remove the points for which the classification is uncertain.
Abstract: 3D point clouds of natural environments relevant to problems in geomorphology often require classification of the data into elementary relevant classes. A typical example is the separation of riparian vegetation from ground in fluvial environments, the distinction between fresh surfaces and rockfall in cliff environments, or more generally the classification of surfaces according to their morphology. Natural surfaces are heterogeneous and their distinctive properties are seldom defined at a unique scale, prompting the use of multi-scale criteria to achieve a high degree of classification success. We have thus defined a multi-scale measure of the point cloud dimensionality around each point, which characterizes the local 3D organization. We can thus monitor how the local cloud geometry behaves across scales. We present the technique and illustrate its efficiency in separating riparian vegetation from ground and classifying a mountain stream as vegetation, rock, gravel or water surface. In these two cases, separating the vegetation from ground or other classes achieves an accuracy greater than 98%. Comparison with a single-scale approach shows the superiority of the multi-scale analysis in enhancing class separability and spatial resolution. The technique is robust to missing data, shadow zones and changes in point density within the scene. The classification is fast and accurate and can account for some degree of intra-class morphological variability such as different vegetation types. A probabilistic confidence in the classification result is given at each point, allowing the user to remove the points for which the classification is uncertain. The process can be either fully automated or fully customized by the user, including a graphical definition of the classifiers. Although developed for fully 3D data, the method can be readily applied to 2.5D airborne lidar data.

386 citations


Posted Content
TL;DR: By successfully applying diffusion to LSE, the RD-LSE model is stable by means of the simple finite difference method, which is very easy to implement.
Abstract: This paper presents a novel reaction-diffusion (RD) method for implicit active contours, which is completely free of the costly re-initialization procedure in level set evolution (LSE). A diffusion term is introduced into LSE, resulting in an RD-LSE equation, to which a piecewise constant solution can be derived. In order to have a stable numerical solution of the RD-based LSE, we propose a two-step splitting method (TSSM) to iteratively solve the RD-LSE equation: first iterating the LSE equation, and then solving the diffusion equation. The second step regularizes the level set function obtained in the first step to ensure stability, and thus the complex and costly re-initialization procedure is completely eliminated from LSE. By successfully applying diffusion to LSE, the RD-LSE model is stable by means of the simple finite difference method, which is very easy to implement. The proposed RD method can be generalized to solve the LSE for both the variational level set method and the PDE-based level set method. The RD-LSE method shows very good performance on boundary anti-leakage, and it can be readily extended to high-dimensional level set methods. The extensive and promising experimental results on synthetic and real images validate the effectiveness of the proposed RD-LSE approach.

174 citations


Posted Content
TL;DR: This paper presents an introduction to BoF image representations, describes critical design choices, and surveys the BoF literature, placing emphasis on recent techniques that mitigate quantization errors, improve feature detection, and speed up image retrieval.
Abstract: The past decade has seen the growing popularity of Bag of Features (BoF) approaches to many computer vision tasks, including image classification, video search, robot localization, and texture recognition. Part of the appeal is simplicity. BoF methods are based on orderless collections of quantized local image descriptors; they discard spatial information and are therefore conceptually and computationally simpler than many alternative methods. Despite this, or perhaps because of this, BoF-based systems have set new performance standards on popular image classification benchmarks and have achieved scalability breakthroughs in image retrieval. This paper presents an introduction to BoF image representations, describes critical design choices, and surveys the BoF literature. Emphasis is placed on recent techniques that mitigate quantization errors, improve feature detection, and speed up image retrieval. At the same time, unresolved issues and fundamental challenges are raised. Among the unresolved issues are determining the best techniques for sampling images, describing local image features, and evaluating system performance. Among the more fundamental challenges are how and whether BoF methods can contribute to localizing objects in complex images, or to associating high-level semantics with natural images. This survey should be useful both for introducing new investigators to the field and for providing existing researchers with a consolidated reference to related work.

169 citations
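
The "orderless collection of quantized local image descriptors" described in the abstract above can be illustrated with a minimal sketch. It assumes a codebook is already available and uses nearest-codeword (hard) assignment; the function names, the 2-D toy descriptors, and the codebook values are all hypothetical and are not taken from the survey.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bof_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest codeword and return the
    normalized histogram of codeword counts (spatial layout is discarded)."""
    if not descriptors:
        return [0.0] * len(codebook)
    counts = Counter(
        min(range(len(codebook)), key=lambda k: squared_distance(d, codebook[k]))
        for d in descriptors
    )
    total = len(descriptors)
    return [counts[k] / total for k in range(len(codebook))]


# Toy data (hypothetical): three codewords and four local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.2), (0.9, 1.2), (4.8, 5.1), (1.1, 0.8)]
histogram = bof_histogram(descriptors, codebook)
print(histogram)
```

Hard assignment is the simplest choice and is one source of the quantization errors the survey mentions; soft assignment would spread each descriptor over several codewords instead.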


Journal Article
TL;DR: In this article, curved Gabor filters, which locally adapt their shape to the direction of flow, are introduced for enhancing curved structures in noisy images and are applied to the curved ridge and valley structure of low-quality fingerprint images.
Abstract: Gabor filters play an important role in many application areas for the enhancement of various types of images and the extraction of Gabor features. For the purpose of enhancing curved structures in noisy images, we introduce curved Gabor filters which locally adapt their shape to the direction of flow. These curved Gabor filters enable the choice of filter parameters which increase the smoothing power without creating artifacts in the enhanced image. In this paper, curved Gabor filters are applied to the curved ridge and valley structure of low-quality fingerprint images. First, we combine two orientation field estimation methods in order to obtain a more robust estimation for very noisy images. Next, curved regions are constructed by following the respective local orientation and they are used for estimating the local ridge frequency. Lastly, curved Gabor filters are defined based on curved regions and they are applied for the enhancement of low-quality fingerprint images. Experimental results on the FVC2004 databases show improvements of this approach in comparison to state-of-the-art enhancement methods.

116 citations


Posted Content
TL;DR: This paper presents a review of a large number of techniques present in the literature for extracting fingerprint minutiae, broadly classified as those working on binarized images and those that work on gray scale images directly.
Abstract: Fingerprints are the oldest and most widely used form of biometric identification. Everyone is known to have unique, immutable fingerprints. As most Automatic Fingerprint Recognition Systems are based on local ridge features known as minutiae, marking minutiae accurately and rejecting false ones is very important. However, fingerprint images get degraded and corrupted due to variations in skin and impression conditions. Thus, image enhancement techniques are employed prior to minutiae extraction. A critical step in automatic fingerprint matching is to reliably extract minutiae from the input fingerprint images. This paper presents a review of a large number of techniques present in the literature for extracting fingerprint minutiae. The techniques are broadly classified as those working on binarized images and those that work on gray scale images directly.

102 citations


Posted Content
TL;DR: In this article, a feedback enabled cascaded classification model (FE-CCM) is proposed to jointly optimize all the sub-tasks, while requiring only a black-box interface to the original classifier for each sub-task.
Abstract: Scene understanding includes many related sub-tasks, such as scene categorization, depth estimation, object detection, etc. Each of these sub-tasks is often notoriously hard, and state-of-the-art classifiers already exist for many of them. These classifiers operate on the same raw image and provide correlated outputs. It is desirable to have an algorithm that can capture such correlation without requiring any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which jointly optimize all the sub-tasks while requiring only a 'black-box' interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers with information about which error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in the domain of scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection. Our method also improves performance in two robotic applications: an object-grasping robot and an object-finding robot.

94 citations


Journal Article
TL;DR: This article applies Total Variation (TV) regularization to fMRI data for the first time, and shows that it is well suited to the purpose of brain mapping while being a powerful tool for brain decoding.
Abstract: While medical imaging typically provides massive amounts of data, the extraction of relevant information for predictive diagnosis remains a difficult challenge. Functional MRI (fMRI) data, which provide an indirect measure of task-related or spontaneous neuronal activity, are classically analyzed in a mass-univariate procedure yielding statistical parametric maps. This analysis framework disregards some important principles of brain organization: population coding, distributed and overlapping representations. Multivariate pattern analysis, i.e., the prediction of behavioural variables from brain activation patterns, better captures this structure. To cope with the high dimensionality of the data, the learning method has to be regularized. However, the spatial structure of the image is not taken into account in standard regularization methods, so that the extracted features are often hard to interpret. More informative and interpretable results can be obtained with the l_1 norm of the image gradient, a.k.a. its Total Variation (TV), as regularization. We apply this method to fMRI data for the first time, and show that TV regularization is well suited to the purpose of brain mapping while being a powerful tool for brain decoding. Moreover, this article presents the first use of TV regularization for classification.

90 citations
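
As a rough illustration of the penalty named in the abstract above, the sketch below computes a total variation value for a small 2-D array. It assumes the anisotropic discretization (the sum of absolute forward differences along each axis); the abstract does not state which discretization the article uses, and the function name and toy arrays are hypothetical.

```python
def total_variation(image):
    """Anisotropic total variation of a 2-D array given as a list of rows:
    the sum of absolute differences between vertically and horizontally
    adjacent entries (an assumed discretization, for illustration only)."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(image[i + 1][j] - image[i][j])
            if j + 1 < cols:
                tv += abs(image[i][j + 1] - image[i][j])
    return tv


# A piecewise-constant map has low TV; a checkerboard of the same size has high TV.
smooth = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 0, 1, 1]]
checkerboard = [[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1]]
print(total_variation(smooth), total_variation(checkerboard))
```

Penalizing this quantity favors weight maps made of a few spatially coherent regions, which is consistent with the interpretability argument made in the abstract.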


Posted Content
TL;DR: A new axis-based shape representation scheme, along with a matching framework, is presented to address the problem of generic shape recognition; it captures the perceptual qualities of shapes well, which makes it easier to find the similarities and differences among shapes.
Abstract: This paper presents a new axis-based shape representation scheme along with a matching framework to address the problem of generic shape recognition. The main idea is to define the relative spatial arrangement of local symmetry axes and their metric properties in a shape centered coordinate frame. The resulting descriptions are invariant to scale, rotation, small changes in viewpoint and articulations. Symmetry points are extracted from a surface whose level curves roughly mimic the motion by curvature. By increasing the amount of smoothing on the evolving curve, only those symmetry axes that correspond to the most prominent parts of a shape are extracted. The representation does not suffer from the common instability problems of the traditional connected skeletons. It captures the perceptual qualities of shapes well. Therefore finding the similarities and the differences among shapes becomes easier. The matching process gives highly successful results on a diverse database of 2D shapes.

89 citations


Posted Content
TL;DR: BoostMetric uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process to learn a valid Mahalanobis distance metric.
Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed BoostMetric, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive definite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. BoostMetric is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. BoostMetric thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that the weak learner is a positive semidefinite matrix with trace and rank being one rather than a classifier or regressor. Experiments on various datasets demonstrate that the proposed algorithms compare favorably to state-of-the-art methods in terms of classification accuracy and running time.

73 citations
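
The observation the abstract above builds on can be illustrated with a short sketch: a nonnegative combination of trace-one rank-one matrices u u^T (with unit-norm u) is positive semidefinite, so it defines a valid squared Mahalanobis distance. The weights and directions below are hand-picked for illustration; in BoostMetric they would be selected by the boosting procedure, which is not reproduced here. All function names are hypothetical.

```python
import math


def unit(v):
    """Scale v to unit Euclidean norm, so that u u^T has trace one."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_one(u):
    """Outer product u u^T as a list of rows."""
    return [[a * b for b in u] for a in u]


def combine(weights, directions):
    """Build M = sum_i w_i * u_i u_i^T with w_i >= 0 and unit-norm u_i."""
    n = len(directions[0])
    m = [[0.0] * n for _ in range(n)]
    for w, direction in zip(weights, directions):
        if w < 0:
            raise ValueError("weights must be nonnegative to keep M positive semidefinite")
        z = rank_one(unit(direction))
        for i in range(n):
            for j in range(n):
                m[i][j] += w * z[i][j]
    return m


def mahalanobis_sq(m, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * m[i][j] * d[j] for i in range(n) for j in range(n))


# Hand-picked (not learned) weak learners and weights.
M = combine([2.0, 1.0], [[1.0, 0.0], [1.0, 1.0]])
distance_sq = mahalanobis_sq(M, [1.0, 2.0], [0.0, 0.0])
print(M, distance_sq)
```

Because each term contributes w_i (u_i · (x - y))^2, the distance is nonnegative by construction, with no explicit semidefinite constraint needed.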


Posted Content
TL;DR: An overview of DOCR systems is presented and the available DOCR techniques are reviewed in this article, where the current status of the DOCR is discussed and directions for future research are suggested.
Abstract: English Character Recognition (CR) has been extensively studied in the last half century and has progressed to a level sufficient to produce technology-driven applications. But the same is not the case for Indian languages, which are complicated in terms of structure and computations. Rapidly growing computational power may enable the implementation of Indic CR methodologies. Digital document processing is gaining popularity for application to office and library automation, bank and postal services, publishing houses and communication technology. Devnagari being the national language of India, spoken by more than 500 million people, should be given special attention so that document retrieval and analysis of rich ancient and modern Indian literature can be effectively done. This article is intended to serve as a guide and update for readers working in the Devnagari Optical Character Recognition (DOCR) area. An overview of DOCR systems is presented and the available DOCR techniques are reviewed. The current status of DOCR is discussed and directions for future research are suggested.

Posted Content
TL;DR: It is shown that the "actionable information gap" between the two can be reduced by exercising control on the sensing process, and therefore sensing, control and information are inextricably tied.
Abstract: This manuscript describes the elements of a theory of information tailored to control and decision tasks and specifically to visual data. The concept of Actionable Information is described, that relates to a notion of information championed by J. Gibson, and a notion of "complete information" that relates to the minimal sufficient statistics of a complete representation. It is shown that the "actionable information gap" between the two can be reduced by exercising control on the sensing process. Thus, sensing, control and information are inextricably tied. This has consequences in the so-called "signal-to-symbol barrier" problem, as well as in the analysis and design of active sensing systems. It has ramifications in vision-based control, navigation, 3-D reconstruction and rendering, as well as detection, localization, recognition and categorization of objects and scenes in live video. This manuscript has been developed from a set of lecture notes for a summer course at the First International Computer Vision Summer School (ICVSS) in Scicli, Italy, in July of 2008. They were later expanded and amended for subsequent lectures in the same School in July 2009. Starting on November 1, 2009, they were further expanded for a special topics course, CS269, taught at UCLA in the Spring term of 2010.

Posted Content
TL;DR: A complete Optical Character Recognition system for camera-captured image/graphics-embedded textual documents that is computationally efficient and consumes little memory, making it applicable on handheld devices.
Abstract: This paper presents a complete Optical Character Recognition (OCR) system for camera-captured image/graphics-embedded textual documents for handheld devices. At first, text regions are extracted and skew corrected. Then, these regions are binarized and segmented into lines and characters. Characters are passed into the recognition module. Experimenting with a set of 100 business card images, captured by cell phone camera, we have achieved a maximum recognition accuracy of 92.74%. Compared with Tesseract, a powerful open-source desktop-based OCR engine, the present recognition accuracy is a worthwhile contribution. Moreover, the developed technique is computationally efficient and consumes little memory, so it is applicable on handheld devices.

Posted Content
TL;DR: A theoretic analysis provides a probabilistic generative interpretation for the Correlation Clustering functional, and justifies its intrinsic "model-selection" capability, and suggests several new optimization algorithms which can cope with large scale problems (>100K variables) that are infeasible using existing methods.
Abstract: Clustering is a fundamental task in unsupervised learning. The focus of this paper is the Correlation Clustering functional, which combines positive and negative affinities between the data points. The contribution of this paper is twofold: (i) a theoretic analysis of the functional, and (ii) new optimization algorithms that can cope with large-scale problems (>100K variables) that are infeasible using existing methods. Our theoretic analysis provides a probabilistic generative interpretation for the functional, and justifies its intrinsic "model-selection" capability. Furthermore, we draw an analogy between optimizing this functional and the well-known Potts energy minimization. This analogy allows us to suggest several new optimization algorithms, which exploit the intrinsic "model-selection" capability of the functional to automatically recover the underlying number of clusters. We compare our algorithms to existing methods on both synthetic and real data. In addition, we suggest two new applications that are made possible by our algorithms: unsupervised face identification and interactive multi-object segmentation by rough boundary delineation.

Posted Content
TL;DR: In this article, the authors present the first method to handle curvature regularity in region-based image segmentation and inpainting that is independent of initialization, which is based on a cell complex and considers basic regions and boundary elements.
Abstract: We present the first method to handle curvature regularity in region-based image segmentation and inpainting that is independent of initialization. To this end we start from a new formulation of length-based optimization schemes, based on surface continuation constraints, and discuss the connections to existing schemes. The formulation is based on a cell complex and considers basic regions and boundary elements. The corresponding optimization problem is cast as an integer linear program. We then show how the method can be extended to include curvature regularity, again cast as an integer linear program. Here, we are considering pairs of boundary elements to reflect curvature. Moreover, a constraint set is derived to ensure that the boundary variables indeed reflect the boundary of the regions described by the region variables. We show that by solving the linear programming relaxation one gets quite close to the global optimum, and that curvature regularity is indeed much better suited in the presence of long and thin objects compared to standard length regularity.

Posted Content
TL;DR: The face recognition technology developed in house at face.com is applied to a well-accepted benchmark, and it is shown that without any tuning the system is able to considerably surpass state-of-the-art results.
Abstract: We apply the face recognition technology developed in house at face.com to a well-accepted benchmark and show that, without any tuning, we are able to considerably surpass state-of-the-art results. Much of the improvement is concentrated in the high-valued performance point of zero false positive matches, where the obtained recall rate almost doubles the best reported result to date. We discuss the various components and innovations of our system that enable this significant performance gap. These components include extensive utilization of an accurate 3D reconstructed shape model dealing with challenges arising from pose and illumination. In addition, discriminative models based on billions of faces are used in order to overcome aging and facial expression as well as low light and overexposure. Finally, we identify a challenging set of identification queries that might provide useful focus for future research.

Posted Content
Siming Wei, Zhouchen Lin
TL;DR: It is shown that LRR can be approximated as a factorization method that combines noise removal by column sparse robust PCA, and an improved version of LRR, called Robust Shape Interaction (RSI), is proposed, which uses the corrected data as the dictionary instead of the noisy data.
Abstract: We analyze and improve low rank representation (LRR), the state-of-the-art algorithm for subspace segmentation of data. We prove that for the noiseless case, the optimization model of LRR has a unique solution, which is the shape interaction matrix (SIM) of the data matrix. So in essence LRR is equivalent to factorization methods. We also prove that the minimum value of the optimization model of LRR is equal to the rank of the data matrix. For the noisy case, we show that LRR can be approximated as a factorization method that combines noise removal by column sparse robust PCA. We further propose an improved version of LRR, called Robust Shape Interaction (RSI), which uses the corrected data as the dictionary instead of the noisy data. RSI is more robust than LRR when the corruption in data is heavy. Experiments on both synthetic and real data testify to the improved robustness of RSI.
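
The noiseless results stated in the abstract above can be written out compactly. The block below is a sketch in assumed notation (X for the data matrix, Z for the representation, r for the rank of X, and the skinny SVD factors U_r, Σ_r, V_r); the symbols are chosen here for illustration and may differ from those used in the paper.

```latex
% Noiseless LRR with the data matrix itself as the dictionary (assumed notation):
\min_{Z} \; \|Z\|_{*} \quad \text{subject to} \quad X = XZ .

% With the skinny SVD X = U_r \Sigma_r V_r^{\top}, where r = \operatorname{rank}(X),
% the unique minimizer is the shape interaction matrix
Z^{\star} = V_r V_r^{\top} .

% Z^{\star} is an orthogonal projection of rank r, so its r nonzero singular
% values all equal 1 and the optimal value equals the rank of the data matrix:
\|Z^{\star}\|_{*} = \operatorname{tr}\!\left(V_r^{\top} V_r\right) = r .
```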

Journal Article
TL;DR: This paper introduces a technique that extrapolates the statistically inferred shape to fit the measurement data using non-linear optimization and ensures that the generated shape is both human-like and satisfies the measurement conditions.
Abstract: The recent advances in 3-D imaging technologies give rise to databases of human shapes, from which statistical shape models can be built. These statistical models represent prior knowledge of the human shape and enable us to solve shape reconstruction problems from partial information. Generating human shape from traditional anthropometric measurements is such a problem, since these 1-D measurements encode 3-D shape information. Combined with a statistical shape model, these easy-to-obtain measurements can be leveraged to create 3D human shapes. However, existing methods limit the creation of the shapes to the space spanned by the database and thus require a large amount of training data. In this paper, we introduce a technique that extrapolates the statistically inferred shape to fit the measurement data using nonlinear optimization. This method ensures that the generated shape is both human-like and satisfies the measurement conditions. We demonstrate the effectiveness of the method and compare it to existing approaches through extensive experiments, using both synthetic data and real human measurements.

Posted Content
TL;DR: In this paper, the authors presented a framework for facilitating virtual aortic stenting from a contrast computer tomography (CT) scan, which may be employed in determining both the appropriateness of intervention as well as the selection and localization of the device.
Abstract: Simulation of arterial stenting procedures prior to intervention allows for appropriate device selection as well as highlights potential complications. To this end, we present a framework for facilitating virtual aortic stenting from a contrast computer tomography (CT) scan. More specifically, we present a method for both lumen and outer wall segmentation that may be employed in determining both the appropriateness of intervention as well as the selection and localization of the device. The more challenging recovery of the outer wall is based on a novel minimal closure tracking algorithm. Our aortic segmentation method has been validated on over 3000 multiplanar reformatting (MPR) planes from 50 CT angiography data sets yielding a Dice Similarity Coefficient (DSC) of 90.67%.
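
For readers unfamiliar with the evaluation metric quoted in the abstract above, the Dice Similarity Coefficient between two segmentations A and B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for masks represented as sets of pixel coordinates; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(predicted, reference):
    """Dice Similarity Coefficient 2|A n B| / (|A| + |B|) for two masks
    given as collections of pixel coordinates."""
    a, b = set(predicted), set(reference)
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Toy masks (hypothetical): four predicted pixels, three reference pixels, two shared.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
reference = {(0, 1), (1, 1), (1, 2)}
dsc = dice_coefficient(predicted, reference)
print(round(dsc, 4))
```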

Journal Article
TL;DR: In this research, a comparative experiment on four methods for identifying plants using shape features was carried out; Polar Fourier Transform gave the best performance, with 64% accuracy, and outperformed the other methods.
Abstract: Shape is an important aspect in recognizing plants. Several approaches have been introduced to identify objects, including plants. Combinations of geometric features such as aspect ratio, compactness, and dispersion, or moments such as moment invariants, were usually used to identify plants. In this research, a comparative experiment on four methods for identifying plants using shape features was carried out. Two approaches that had not yet been used in plant identification, Zernike moments and Polar Fourier Transform (PFT), were incorporated. The experimental comparison was done on 52 kinds of plants with various shapes. As a result, PFT gave the best performance, with 64% accuracy, and outperformed the other methods.

Posted Content
TL;DR: This paper is mainly focused on conducting an experiment using chain codes technique to perform recognition for different types of fonts used in Malaysian car plates.
Abstract: Various applications of car plate recognition systems have been developed using various kinds of methods and techniques by researchers all over the world. The applications developed were only suitable for a specific country because of the standard specification endorsed by the transport department of that country. The Road Transport Department of Malaysia has also endorsed a specification for car plates that includes the font and size of characters that must be followed by car owners. However, there are cases where this specification is not followed. Several applications have been developed in Malaysia to overcome this problem. However, there is still a problem in achieving 100% recognition accuracy. This paper mainly focuses on conducting an experiment using the chain code technique to recognize the different types of fonts used in Malaysian car plates.
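
As background for the chain code technique mentioned in the abstract above, the sketch below encodes an ordered character boundary as a Freeman 8-direction chain code. It assumes a coordinate system in which y increases upward and the common numbering 0 = east, proceeding counter-clockwise; the abstract does not describe the paper's own feature extraction or matching steps, and the function name and toy boundary are hypothetical.

```python
# Freeman 8-direction codes keyed by the (dx, dy) step between consecutive
# boundary points (assumed convention: 0 = east, counter-clockwise, y up).
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(boundary):
    """Return the list of direction codes for an ordered sequence of
    8-connected boundary points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {(x0, y0)} and {(x1, y1)} are not 8-connected neighbours")
        codes.append(DIRECTIONS[step])
    return codes


# Toy boundary (hypothetical): a 2x2 square traced counter-clockwise from the origin.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (0, 0)]
codes = chain_code(square)
print(codes)
```

A recognizer could then compare such code sequences, or histograms of them, between an unknown character and font templates; that comparison step is not shown here.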

Posted Content
TL;DR: An adaptive regularization approach based on the fact that the regularization parameter should be a linear function of noise variance is proposed and the obtained results demonstrate the superiority of the approach compared with existing methods.
Abstract: Crucial information barely visible to the human eye is often embedded in a series of low-resolution images taken of the same scene. Super-resolution enables the extraction of this information by reconstructing a single image at a higher resolution than is present in any of the individual images. This is particularly useful in forensic imaging, where the extraction of minute details in an image can help to solve a crime. Super-resolution image restoration has been one of the most important research areas in recent years; its goal is to obtain a high-resolution (HR) image from several low-resolution (LR) blurred, noisy, undersampled and displaced images. The relation between the HR image and the LR images can be modeled by a linear system using a transformation matrix and additive noise. However, a unique solution may not be available because of the singularity of the transformation matrix. To overcome this problem, the POCS method has been used. However, its performance is not good because the effect of noise energy has been ignored. In this paper, we propose an adaptive regularization approach based on the fact that the regularization parameter should be a linear function of noise variance. The performance of the proposed approach has been tested on several images, and the obtained results demonstrate the superiority of our approach compared with existing methods.

Book Chapter
TL;DR: Visual speech recognition (VSR) deals with the visual domain of speech and involves image processing, artificial intelligence, object detection, pattern recognition, statistical modelling, etc and has received a great deal of attention in the last decade.
Abstract: Lip reading is used to understand or interpret speech without hearing it, a technique especially mastered by people with hearing difficulties. The ability to lip read enables a person with a hearing impairment to communicate with others and to engage in social activities, which otherwise would be difficult. Recent advances in the fields of computer vision, pattern recognition, and signal processing have led to a growing interest in automating this challenging task of lip reading. Indeed, automating the human ability to lip read, a process referred to as visual speech recognition (VSR) (or sometimes speech reading), could open the door for other novel related applications. VSR has received a great deal of attention in the last decade for its potential use in applications such as human-computer interaction (HCI), audio-visual speech recognition (AVSR), speaker recognition, talking heads, sign language recognition and video surveillance. Its main aim is to recognise spoken word(s) by using only the visual signal that is produced during speech. Hence, VSR deals with the visual domain of speech and involves image processing, artificial intelligence, object detection, pattern recognition, statistical modelling, etc.

Journal Article
TL;DR: A method that combines Polar Fourier Transform, color moments, and vein features to retrieve leaf images based on a leaf image is proposed and shows that the method gave better performance than PNN, SVM, and Fourier transform.
Abstract: This paper proposes a method that combines Polar Fourier Transform, color moments, and vein features to retrieve leaf images based on a leaf image. The method is very useful in helping people recognize foliage plants. Foliage plants are plants that have various colors and unique patterns in the leaf. Therefore, the colors and their patterns are information that should be taken into account in plant identification. To compare the performance of the retrieval system with other results, the experiments used the Flavia dataset, which is very popular in plant recognition. The results show that the method gave better performance than PNN, SVM, and Fourier Transform. The method was also tested using foliage plants with various colors. The accuracy was 90.80% for 50 kinds of plants.

Posted Content
TL;DR: In this article, a novel approach using Canny, Principal Component Analysis (PCA) and Artificial Neural Network (ANN) was proposed for facial expression classification using the JAFFE database.
Abstract: Facial Expression Classification has been an interesting research problem in recent years. There are a lot of methods to solve this problem. In this research, we propose a novel approach using Canny, Principal Component Analysis (PCA) and Artificial Neural Network. Firstly, in the preprocessing phase, we use Canny for local region detection of facial images. Then the features of each local region are represented using Principal Component Analysis (PCA). Finally, an Artificial Neural Network (ANN) is applied for Facial Expression Classification. We apply our proposed method (Canny_PCA_ANN) to the recognition of six basic facial expressions on the JAFFE database, consisting of 213 images posed by 10 Japanese female models. The experimental results show the feasibility of our proposed method.

Posted Content
TL;DR: This method combines 2D Principal Component Analysis (2DPCA), one of the prominent methods for extracting feature vectors, and Support Vector Machine (SVM), the most powerful discriminative method for classification.
Abstract: The paper presents a novel approach for solving the face recognition problem. Our method combines 2D Principal Component Analysis (2DPCA), one of the prominent methods for extracting feature vectors, and Support Vector Machine (SVM), the most powerful discriminative method for classification. Experiments based on the proposed method have been conducted on two public data sets, FERET and ATT; the results show that the proposed method could improve the classification rates.
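
The feature-extraction half of this pipeline can be sketched as follows. This is a minimal illustration of the standard 2DPCA formulation (eigenvectors of the image covariance matrix), assuming images stacked in an `(M, h, w)` array; the function names are hypothetical, and the SVM stage and the paper's actual settings are not reproduced.

```python
import numpy as np

def fit_2dpca(images, n_components):
    """Return a (w, n_components) projection matrix from images of shape (M, h, w)."""
    images = np.asarray(images, dtype=float)
    centered = images - images.mean(axis=0)
    # Image covariance matrix G = mean over images of (A - mean)^T (A - mean), size w x w.
    G = np.einsum('mhi,mhj->ij', centered, centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(G)  # eigenvalues in ascending order
    # Keep the eigenvectors belonging to the largest eigenvalues.
    return eigvecs[:, ::-1][:, :n_components]

def project_2dpca(image, axes):
    """Project one (h, w) image onto the 2DPCA axes, giving an (h, n_components) feature matrix."""
    return np.asarray(image, dtype=float) @ axes

rng = np.random.default_rng(0)
faces = rng.random((10, 8, 6))        # synthetic stand-in for face images
axes = fit_2dpca(faces, n_components=3)
features = project_2dpca(faces[0], axes)
```

The resulting feature matrices would typically be flattened and passed to a classifier such as an SVM.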

Posted Content
TL;DR: In this paper, two types of combining strategies were evaluated, namely combining skin features and combining skin classifiers, where the outputs of the skin classifier are combined using binary operators such as the AND and the OR operators, "Voting", "Sum of Weights" and a new neural network.
Abstract: Two types of combining strategies were evaluated, namely combining skin features and combining skin classifiers. Several combining rules were applied where the outputs of the skin classifiers are combined using binary operators such as the AND and the OR operators, "Voting", "Sum of Weights" and a new neural network. Three chrominance components from the YCbCr colour space that gave the highest correct detection on their single feature MLP were selected as the combining parameters. A major issue in designing an MLP neural network is to determine the optimal number of hidden units given a set of training patterns. Therefore, a "coarse to fine search" method to find the number of neurons in the hidden layer is proposed. The strategy of combining Cb/Cr and Cr features improved the correct detection by 3.01% compared to the best single feature MLP given by Cb-Cr. The strategy of combining the outputs of three skin classifiers using the "Sum of Weights" rule further improved the correct detection by 4.38% compared to the best single feature MLP.
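
The combining rules named above can be sketched on per-pixel classifier outputs. The abstract does not define "Sum of Weights" precisely, so the normalized weighted average with a threshold below is one assumed reading; all function names, weights and the threshold are hypothetical.

```python
import numpy as np

def combine_and(masks):
    """A pixel is skin only if every classifier says skin."""
    return np.logical_and.reduce(np.asarray(masks, dtype=bool))

def combine_or(masks):
    """A pixel is skin if any classifier says skin."""
    return np.logical_or.reduce(np.asarray(masks, dtype=bool))

def combine_vote(masks):
    """A pixel is skin if a strict majority of classifiers says skin."""
    masks = np.asarray(masks, dtype=bool)
    return masks.sum(axis=0) * 2 > masks.shape[0]

def combine_weighted_sum(scores, weights, threshold=0.5):
    """Assumed 'Sum of Weights' rule: threshold the weight-normalized sum of scores."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    fused = np.tensordot(weights, scores, axes=1) / weights.sum()
    return fused >= threshold

# Three classifiers, four pixels.
masks = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
scores = [[0.9, 0.6, 0.2, 0.1], [0.8, 0.7, 0.6, 0.2], [0.7, 0.3, 0.1, 0.1]]
fused_mask = combine_weighted_sum(scores, weights=[0.5, 0.3, 0.2])
```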

Posted Content
TL;DR: A simple histogram-based approach to segment Devnagari documents is proposed in this paper and various challenges in segmentation of Devnagari script are discussed.
Abstract: Document segmentation is one of the critical phases in machine recognition of any language. Correct segmentation of individual symbols determines the accuracy of the character recognition technique. It is used to decompose an image of a sequence of characters into sub-images of individual symbols by segmenting lines and words. Devnagari is the most popular script in India. It is used for writing Hindi, Marathi, Sanskrit and Nepali languages. Moreover, Hindi is the third most popular language in the world. Devnagari documents consist of vowels, consonants and various modifiers. Hence proper segmentation of Devnagari words is challenging. A simple histogram-based approach to segment Devnagari documents is proposed in this paper. Various challenges in segmentation of Devnagari script are also discussed.
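
A histogram (projection-profile) segmentation of this kind can be sketched as below. It assumes a binarized page where ink pixels are 1 and background is 0, and it splits wherever the profile drops to zero; the function names are hypothetical, and the sketch does not handle the modifiers that the abstract identifies as a challenge.

```python
import numpy as np

def segment_runs(profile):
    """Return (start, end) pairs, end exclusive, for each run of nonzero profile entries."""
    occupied = np.asarray(profile) > 0
    padded = np.concatenate(([False], occupied, [False]))
    # Positions where the profile switches between empty and occupied.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[::2].tolist(), changes[1::2].tolist()))

def segment_lines(binary_page):
    """Split a page into text lines using the horizontal histogram (row sums)."""
    return segment_runs(binary_page.sum(axis=1))

def segment_columns(binary_line):
    """Split a text line into blocks using the vertical histogram (column sums)."""
    return segment_runs(binary_line.sum(axis=0))

# Toy page: one text line with two blocks, then a second text line.
page = np.zeros((6, 8), dtype=int)
page[1:3, 1:4] = 1
page[1:3, 5:7] = 1
page[4, 2:6] = 1
line_spans = segment_lines(page)
block_spans = segment_columns(page[1:3])
```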

Journal Article
TL;DR: This letter identifies a central property of trigonometric functions, called shiftability, that allows us to exploit the redundancy inherent in the filtering operations and shows how certain complex filtering can be reduced to simply that of computing the moving sum of a stack of images.
Abstract: It was recently demonstrated in [5] that the non-linear bilateral filter [14] can be efficiently implemented using a constant-time or O(1) algorithm. At the heart of this algorithm was the idea of approximating the Gaussian range kernel of the bilateral filter using trigonometric functions. In this letter, we explain how the idea in [5] can be extended to a few other linear and non-linear filters [14, 17, 2]. While some of these filters have received a lot of attention in recent years, they are known to be computationally intensive. To extend the idea in [5], we identify a central property of trigonometric functions, called shiftability, that allows us to exploit the redundancy inherent in the filtering operations. In particular, using shiftable kernels, we show how certain complex filtering can be reduced to simply that of computing the moving sum of a stack of images. Each image in the stack is obtained through an elementary pointwise transform of the input image. This has a two-fold advantage. First, we can use fast recursive algorithms for computing the moving sum [15, 6], and, second, we can use parallel computation to further speed up the computation. We also show how shiftable kernels can be used to approximate the (non-shiftable) Gaussian kernel that is ubiquitously used in image filtering.
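
The shiftability idea can be illustrated in one dimension. The identity cos(g(a - b)) = cos(ga)cos(gb) + sin(ga)sin(gb) lets a filter with a cosine range kernel be written as moving sums of pointwise-transformed signals. The sketch below assumes a single cosine term, a box spatial window and a signal in [0, 1] with `gamma` small enough that all weights stay positive; it is not the letter's full Gaussian approximation, and the function names are hypothetical.

```python
import numpy as np

def moving_sum(signal, radius):
    """Sum over [i - radius, i + radius], zero-padded at the borders, via one cumulative sum."""
    padded = np.pad(signal, radius)
    csum = np.concatenate(([0.0], np.cumsum(padded)))
    width = 2 * radius + 1
    return csum[width:] - csum[:-width]

def cosine_range_filter(f, radius, gamma):
    """Range filter with kernel cos(gamma * (f[y] - f[x])), computed with four moving sums."""
    c, s = np.cos(gamma * f), np.sin(gamma * f)
    # Shiftability: the kernel separates into products of functions of f[x] and f[y].
    num = c * moving_sum(c * f, radius) + s * moving_sum(s * f, radius)
    den = c * moving_sum(c, radius) + s * moving_sum(s, radius)
    return num / den

def brute_force_filter(f, radius, gamma):
    """Direct evaluation of the same filter, with cost proportional to the window size."""
    out = np.empty_like(f)
    for i in range(len(f)):
        lo, hi = max(0, i - radius), min(len(f), i + radius + 1)
        w = np.cos(gamma * (f[lo:hi] - f[i]))
        out[i] = (w * f[lo:hi]).sum() / w.sum()
    return out

f_demo = np.random.default_rng(0).random(64)
fast = cosine_range_filter(f_demo, radius=3, gamma=1.0)
slow = brute_force_filter(f_demo, radius=3, gamma=1.0)
```

The fast version does a fixed amount of work per sample regardless of `radius`, which is the constant-time property the abstract refers to.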

Posted Content
TL;DR: It is argued that in order to be optimal for a specific task, the descriptor should take into account the statistics of the corpus of shapes to which it is applied and those of the class of transformations to which it is made insensitive (the "noise").
Abstract: Informative and discriminative feature descriptors play a fundamental role in deformable shape analysis. For example, they have been successfully employed in correspondence, registration, and retrieval tasks. In recent years, significant attention has been devoted to descriptors obtained from the spectral decomposition of the Laplace-Beltrami operator associated with the shape. Notable examples in this family are the heat kernel signature (HKS) and the wave kernel signature (WKS). Laplacian-based descriptors achieve state-of-the-art performance in numerous shape analysis tasks; they are computationally efficient, isometry-invariant by construction, and can gracefully cope with a variety of transformations. In this paper, we formulate a generic family of parametric spectral descriptors. We argue that in order to be optimal for a specific task, the descriptor should take into account the statistics of the corpus of shapes to which it is applied (the "signal") and those of the class of transformations to which it is made insensitive (the "noise"). While such statistics are hard to model axiomatically, they can be learned from examples. Following the spirit of the Wiener filter in signal processing, we show a learning scheme for the construction of optimal spectral descriptors and relate it to Mahalanobis metric learning. The superiority of the proposed approach is demonstrated on the SHREC'10 benchmark.
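
The family of descriptors discussed here can be written compactly. The notation below is an assumption, since the abstract gives no formulas: \lambda_k and \phi_k denote the eigenvalues and eigenfunctions of the Laplace-Beltrami operator, and each descriptor component applies a transfer function to the eigenvalues. HKS and WKS are shown in their commonly stated forms, with the WKS normalization constant omitted.

```latex
% Generic spectral descriptor: component q applies a transfer function f_q to the spectrum
p_q(x) = \sum_{k \ge 1} f_q(\lambda_k)\, \phi_k^2(x)

% Heat kernel signature: low-pass transfer functions indexed by time t
f_t(\lambda) = e^{-t\lambda}

% Wave kernel signature: band-pass transfer functions indexed by log-energy e
f_e(\lambda) = \exp\!\left( -\frac{(e - \log \lambda)^2}{2\sigma^2} \right)
```

In this reading, learning an optimal descriptor amounts to choosing the transfer functions f_q from data rather than fixing them in advance.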