
Showing papers on "Object-class detection published in 2014"


Proceedings ArticleDOI
23 Jun 2014
TL;DR: A novel deformable part-based model is proposed, which exploits both local context around each candidate detection and global context at the level of the scene, and significantly helps in detecting objects at all scales.
Abstract: In this paper we study the role of context in existing state-of-the-art detection and segmentation approaches. Towards this goal, we label every pixel of the PASCAL VOC 2010 detection challenge with a semantic category. We believe this data will provide plenty of challenges to the community, as it contains 520 additional classes for semantic segmentation and object detection. Our analysis shows that nearest neighbor based approaches perform poorly on semantic segmentation of contextual classes, showing the variability of PASCAL imagery. Furthermore, improvements of existing contextual models for detection are rather modest. In order to push forward the performance in this difficult scenario, we propose a novel deformable part-based model, which exploits both local context around each candidate detection and global context at the level of the scene. We show that this contextual reasoning significantly helps in detecting objects at all scales.

1,327 citations


Proceedings ArticleDOI
TL;DR: In this paper, a multi-view face detector using aggregate channel features is proposed, which extends the image channel to diverse types like gradient magnitude and oriented gradient histograms and therefore encodes rich information in a simple form.
Abstract: Face detection has drawn much attention in recent decades since the seminal work by Viola and Jones. While many subsequent works have improved on it with more powerful learning algorithms, the feature representation used for face detection still cannot meet the demand for effectively and efficiently handling faces with large appearance variance in the wild. To address this bottleneck, we bring the concept of channel features to the face detection domain, which extends the image channel to diverse types like gradient magnitude and oriented gradient histograms and therefore encodes rich information in a simple form. We adopt a novel variant called aggregate channel features, make a full exploration of feature design, and discover a multiscale version of the features with better performance. To deal with poses of faces in the wild, we propose a multi-view detection approach featuring score re-ranking and detection adjustment. Following the learning pipelines of the Viola-Jones framework, the multi-view face detector using aggregate channel features surpasses current state-of-the-art detectors on the AFW and FDDB test sets, while running at 42 FPS.

288 citations


Journal ArticleDOI
TL;DR: A complete algorithmic description, a learning code and a learned face detector that can be applied to any color image are provided, along with a post-processing step that reduces detection redundancy using a robustness argument.
Abstract: In this article, we decipher the Viola-Jones algorithm, the first ever real-time face detection system. There are three ingredients working in concert to enable a fast and accurate detection: the integral image for feature computation, AdaBoost for feature selection and an attentional cascade for efficient computational resource allocation. Here we propose a complete algorithmic description, a learning code and a learned face detector that can be applied to any color image. Since the Viola-Jones algorithm typically gives multiple detections, a post-processing step is also proposed to reduce detection redundancy using a robustness argument. The source code and an online demo are accessible at the IPOL web page of this article.

259 citations
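The first of the three ingredients is easy to make concrete. Below is a minimal NumPy sketch of the integral image and of evaluating a two-rectangle Haar-like feature with four array lookups per box; the window size and feature geometry are illustrative assumptions, not values from the paper.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns; ii[y, x] = sum of img[:y, :x]."""
    # A leading row/column of zeros means box sums need no bounds checks.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y, x, h, w):
    """Sum of pixels in the h-by-w box at (y, x), in four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_haar(ii, y, x, h, w):
    """A vertical two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return box_sum(ii, y, x, h, half) - box_sum(ii, y, x + half, h, w - half)

img = np.random.rand(24, 24)          # one 24x24 detection window
ii = integral_image(img)
print(two_rect_haar(ii, 0, 0, 24, 24))
```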


Journal ArticleDOI
TL;DR: It is shown that the proposed approach boosts the likelihood of correctly identifying the person of interest through the use of different fusion schemes, 3-D face models, and incorporation of quality measures for fusion and video frame selection.
Abstract: As face recognition applications progress from constrained sensing and cooperative subjects scenarios (e.g., driver’s license and passport photos) to unconstrained scenarios with uncooperative subjects (e.g., video surveillance), new challenges are encountered. These challenges are due to variations in ambient illumination, image resolution, background clutter, facial pose, expression, and occlusion. In forensic investigations where the goal is to identify a person of interest, often based on low quality face images and videos, we need to utilize whatever source of information is available about the person. This could include one or more video tracks, multiple still images captured by bystanders (using, for example, their mobile phones), 3-D face models constructed from image(s) and video(s), and verbal descriptions of the subject provided by witnesses. These verbal descriptions can be used to generate a face sketch and provide ancillary information about the person of interest (e.g., gender, race, and age). While traditional face matching methods generally take a single medium (i.e., a still face image, video track, or face sketch) as input, this paper considers using the entire gamut of media as a probe to generate a single candidate list for the person of interest. We show that the proposed approach boosts the likelihood of correctly identifying the person of interest through the use of different fusion schemes, 3-D face models, and incorporation of quality measures for fusion and video frame selection.

231 citations
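As a rough illustration of quality-weighted score-level fusion across media (the concrete fusion schemes and quality measures are the paper's own; the z-score normalization, weights, and scores below are assumptions):

```python
import numpy as np

def zscore_norm(scores):
    """Normalize one medium's gallery scores to zero mean, unit variance."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / (s.std() + 1e-9)

def fuse(score_lists, quality_weights):
    """Quality-weighted sum fusion across media (video, stills, sketch, ...)."""
    fused = sum(q * zscore_norm(s) for s, q in zip(score_lists, quality_weights))
    return fused / sum(quality_weights)

# Hypothetical match scores of three media types against a 5-subject gallery.
video  = [0.9, 0.2, 0.1, 0.3, 0.2]
stills = [0.7, 0.4, 0.2, 0.1, 0.3]
sketch = [0.5, 0.5, 0.4, 0.2, 0.1]
ranked = np.argsort(-fuse([video, stills, sketch], [1.0, 0.8, 0.3]))
print("candidate list:", ranked)  # best-first indices into the gallery
```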


Proceedings ArticleDOI
29 Sep 2014
TL;DR: This work presents a novel approach for detecting objects and estimating their 3D pose in single images of cluttered scenes using a deformable parts-based model and demonstrates successful grasps using the detection and pose estimate with a PR2 robot.
Abstract: We present a novel approach for detecting objects and estimating their 3D pose in single images of cluttered scenes. Objects are given in terms of 3D models without accompanying texture cues. A deformable parts-based model is trained on clusters of silhouettes of similar poses and produces hypotheses about possible object locations at test time. Objects are simultaneously segmented and verified inside each hypothesis bounding region by selecting the set of superpixels whose collective shape matches the model silhouette. A final iteration on the 6-DOF object pose minimizes the distance between the selected image contours and the actual projection of the 3D model. We demonstrate successful grasps using our detection and pose estimate with a PR2 robot. Extensive evaluation with a novel ground truth dataset shows the considerable benefit of using shape-driven cues for detecting objects in heavily cluttered scenes.

222 citations


Proceedings ArticleDOI
Cha Zhang, Zhengyou Zhang
24 Mar 2014
TL;DR: A deep convolutional neural network is built that can simultaneously learn the face/nonface decision, the face pose estimation problem, and the facial landmark localization problem and it is shown that such a multi-task learning scheme can further improve the classifier's accuracy.
Abstract: Multiview face detection is a challenging problem due to dramatic appearance changes under various pose, illumination and expression conditions. In this paper, we present a multi-task deep learning scheme to enhance the detection performance. More specifically, we build a deep convolutional neural network that can simultaneously learn the face/nonface decision, the face pose estimation problem, and the facial landmark localization problem. We show that such a multi-task learning scheme can further improve the classifier's accuracy. On the challenging FDDB data set, our detector achieves over 3% improvement in detection rate at the same false positive rate compared with other state-of-the-art methods.

215 citations
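A minimal PyTorch sketch of such a multi-task arrangement: a shared trunk feeds three heads (face/non-face, pose, landmarks) whose losses are summed into one joint objective. The layer sizes, pose binning, and landmark count are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskFaceNet(nn.Module):
    """Shared conv trunk with three heads: face/non-face, pose, landmarks."""
    def __init__(self, n_poses=5, n_landmarks=5):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, 128), nn.ReLU(),
        )
        self.face_head = nn.Linear(128, 1)               # face vs. non-face
        self.pose_head = nn.Linear(128, n_poses)         # discrete yaw bins
        self.lmk_head = nn.Linear(128, n_landmarks * 2)  # (x, y) per landmark

    def forward(self, x):
        z = self.trunk(x)
        return self.face_head(z), self.pose_head(z), self.lmk_head(z)

net = MultiTaskFaceNet()
x = torch.randn(8, 1, 32, 32)                  # a batch of candidate windows
face_logit, pose_logit, lmk = net(x)
# Joint objective: sum of the three task losses on a labeled batch.
loss = (nn.BCEWithLogitsLoss()(face_logit.squeeze(1), torch.ones(8))
        + nn.CrossEntropyLoss()(pose_logit, torch.zeros(8, dtype=torch.long))
        + nn.MSELoss()(lmk, torch.zeros(8, 10)))
loss.backward()
```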


Journal ArticleDOI
TL;DR: This paper reduces the uncertainty of the face representation by synthesizing virtual training samples and devises a representation approach based on the selected useful training samples to perform face recognition, which not only obtains high face recognition accuracy but also has lower computational complexity than other state-of-the-art approaches.
Abstract: The image of a face varies with the illumination, pose, and facial expression; thus we say that a single face image is of high uncertainty for representing the face. In this sense, a face image is just an observation and it should not be considered as the absolutely accurate representation of the face. As more face images from the same person provide more observations of the face, more face images may be useful for reducing the uncertainty of the representation of the face and improving the accuracy of face recognition. However, in a real-world face recognition system, a subject usually has only a limited number of available face images and thus there is high uncertainty. In this paper, we attempt to improve the face recognition accuracy by reducing the uncertainty. First, we reduce the uncertainty of the face representation by synthesizing virtual training samples. Then, we select useful training samples that are similar to the test sample from the set of all the original and synthesized virtual training samples. Moreover, we state a theorem that determines the upper bound of the number of useful training samples. Finally, we devise a representation approach based on the selected useful training samples to perform face recognition. Experimental results on five widely used face databases demonstrate that our proposed approach can not only obtain a high face recognition accuracy, but also has lower computational complexity than the other state-of-the-art approaches.

163 citations
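A toy NumPy sketch of the overall recipe, using horizontal mirroring as one possible way to synthesize virtual samples, a Euclidean nearest-sample rule for selecting useful ones, and a least-squares coding step as a stand-in for the representation-based classifier. The image size, k, and all data are illustrative assumptions.

```python
import numpy as np

def synthesize_mirrors(train, h, w):
    """Add a horizontally mirrored (virtual) copy of every training face."""
    imgs = train.reshape(-1, h, w)
    mirrored = imgs[:, :, ::-1].reshape(len(train), -1)
    return np.vstack([train, mirrored])

def select_useful(train, labels, test, k):
    """Keep the k training samples nearest to the test face in Euclidean distance."""
    idx = np.argsort(np.linalg.norm(train - test, axis=1))[:k]
    return train[idx], labels[idx]

# Hypothetical 8x8 face vectors: 6 originals become 12 after mirroring.
X, y = np.random.rand(6, 64), np.array([0, 0, 1, 1, 2, 2])
test = np.random.rand(64)
Xv, yv = synthesize_mirrors(X, 8, 8), np.concatenate([y, y])
useful, useful_y = select_useful(Xv, yv, test, k=5)
# Representation step: code the test sample over the selected samples and
# assign the class whose samples reconstruct it with the smallest residual.
coef, *_ = np.linalg.lstsq(useful.T, test, rcond=None)
residuals = {c: np.linalg.norm(test - useful[useful_y == c].T
                               @ coef[useful_y == c]) for c in set(useful_y)}
print(min(residuals, key=residuals.get))
```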


01 Jan 2014
TL;DR: A method based on features extracted from a Convolutional Neural Network and latent SVM that can represent and exploit the presence of multiple object instances in an image that outperforms the state-of-the-art in weakly-supervised object detection and object classification on the Pascal VOC 2007 dataset.
Abstract: This paper focuses on the problem of object detection when the annotation at training time is restricted to presence or absence of object instances at image level. We present a method based on features extracted from a Convolutional Neural Network and latent SVM that can represent and exploit the presence of multiple object instances in an image. Moreover, the detection of the object instances in the image is improved by incorporating in the learning procedure additional constraints that represent domain-specific knowledge such as symmetry and mutual exclusion. We show that the proposed method outperforms the state-of-the-art in weakly-supervised object detection and object classification on the Pascal VOC 2007 dataset.

158 citations


Journal Article
TL;DR: A brief survey of different object detection, object classification and object tracking algorithms available in the literature including analysis and comparative study of different techniques used for various stages of tracking is presented.
Abstract: The goal of object tracking is to segment a region of interest from a video scene and keep track of its motion, position and occlusion. Object detection and object classification are preceding steps for tracking an object in a sequence of images. Object detection is performed to check the existence of objects in the video and to precisely locate them. Detected objects can then be classified into various categories such as humans, vehicles, birds, floating clouds, swaying trees and other moving objects. Object tracking is performed by monitoring an object's spatial and temporal changes during a video sequence, including its presence, position, size, shape, etc. Object tracking is used in several applications such as video surveillance, robot vision, traffic monitoring, video inpainting and animation. This paper presents a brief survey of the different object detection, object classification and object tracking algorithms available in the literature, including analysis and a comparative study of the techniques used for the various stages of tracking.

156 citations


Book ChapterDOI
01 Nov 2014
TL;DR: The Eigen-PEP model is presented, built upon the recent success of the probabilistic elastic part (PEP) model, which produces an intermediate high dimensional, part-based, and pose-invariant representation of a face subject.
Abstract: To effectively solve the problem of large scale video face recognition, we argue for a comprehensive, compact, and yet flexible representation of a face subject. It shall comprehensively integrate the visual information from all relevant video frames of the subject in a compact form. It shall also be flexible to be incrementally updated, incorporating new or retiring obsolete observations. In search for such a representation, we present the Eigen-PEP that is built upon the recent success of the probabilistic elastic part (PEP) model. It first integrates the information from relevant video sources by a part-based average pooling through the PEP model, which produces an intermediate high dimensional, part-based, and pose-invariant representation. We then compress the intermediate representation through principal component analysis, and only a number of principal eigen dimensions are kept (as small as 100). We evaluate the Eigen-PEP representation both for video-based face verification and identification on the YouTube Faces Dataset and a new Celebrity-1000 video face dataset, respectively. On YouTube Faces, we further improve the state-of-the-art recognition accuracy. On Celebrity-1000, we lead the competing baselines by a significant margin while offering a scalable solution that is linear with respect to the number of subjects.

134 citations
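The compression stage reduces to average pooling followed by PCA. A small NumPy sketch under assumed dimensions (4096-dim pooled PEP descriptors, 100 kept components); the PEP model itself is not reproduced here.

```python
import numpy as np

def eigen_pep(frame_descriptors, basis):
    """Average-pool part descriptors over frames, then keep ~100 principal dims."""
    pooled = frame_descriptors.mean(axis=0)  # part-based average pooling
    return basis.T @ pooled                  # compact Eigen-PEP representation

# Fit the PCA basis once on pooled descriptors of many training subjects.
train_pooled = np.random.rand(500, 4096)           # hypothetical PEP outputs
mean = train_pooled.mean(axis=0)
U, S, Vt = np.linalg.svd(train_pooled - mean, full_matrices=False)
basis = Vt[:100].T                                 # 4096 x 100 eigenbasis

frames = np.random.rand(30, 4096)                  # descriptors of one video
compact = eigen_pep(frames - mean, basis)          # 100-dim representation
# New observations can be folded in by updating the running average pool.
```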


Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work constructs an efficient boosted exemplar-based face detector which overcomes the defect of the previous work by being faster, more memory efficient, and more accurate.
Abstract: Despite the fact that face detection has been studied intensively over the past several decades, the problem is still not completely solved. Challenging conditions, such as extreme pose, lighting, and occlusion, have historically hampered traditional, model-based methods. In contrast, exemplar-based face detection has been shown to be effective, even under these challenging conditions, primarily because a large exemplar database is leveraged to cover all possible visual variations. However, relying heavily on a large exemplar database to deal with the face appearance variations makes the detector impractical due to the high space and time complexity. We construct an efficient boosted exemplar-based face detector which overcomes the defect of the previous work by being faster, more memory efficient, and more accurate. In our method, exemplars as weak detectors are discriminatively trained and selectively assembled in the boosting framework which largely reduces the number of required exemplars. Notably, we propose to include non-face images as negative exemplars to actively suppress false detections to further improve the detection accuracy. We verify our approach over two public face detection benchmarks and one personal photo album, and achieve significant improvement over the state-of-the-art algorithms in terms of both accuracy and efficiency.

Journal ArticleDOI
TL;DR: This paper proposes a novel method for object detection based on structural feature description and query expansion that is evaluated on high-resolution satellite images and demonstrates its clear advantages over several other object detection methods.
Abstract: Object detection is an important task in very high-resolution remote sensing image analysis. Traditional detection approaches are often not sufficiently robust in dealing with the variations of targets and sometimes suffer from limited training samples. In this paper, we tackle these two problems by proposing a novel method for object detection based on structural feature description and query expansion. The feature description combines both local and global information of objects. After initial feature extraction from a query image and representative samples, these descriptors are updated through an augmentation process to better describe the object of interest. The object detection step is implemented using a ranking support vector machine (SVM), which converts the detection task to a ranking query task. The ranking SVM is first trained on a small subset of training data with samples automatically ranked based on similarities to the query image. Then, a novel query expansion method is introduced to update the initial object model by active learning with human inputs on ranking of image pairs. Once the query expansion process is completed, which is determined by measuring entropy changes, the model is then applied to the whole target data set in which objects in different classes shall be detected. We evaluate the proposed method on high-resolution satellite images and demonstrate its clear advantages over several other object detection methods.

Proceedings ArticleDOI
23 Jun 2014
TL;DR: A dataset of 7,413 airplanes annotated in detail with parts and their attributes is introduced, leveraging images donated by airplane spotters and crowdsourcing both the design and collection of the detailed annotations, to provide insights that should help researchers interested in designing fine-grained datasets for other basic-level categories.
Abstract: We study the problem of understanding objects in detail, intended as recognizing a wide array of fine-grained object attributes. To this end, we introduce a dataset of 7,413 airplanes annotated in detail with parts and their attributes, leveraging images donated by airplane spotters and crowdsourcing both the design and collection of the detailed annotations. We provide a number of insights that should help researchers interested in designing fine-grained datasets for other basic level categories. We show that the collected data can be used to study the relation between part detection and attribute prediction by diagnosing the performance of classifiers that pool information from different parts of an object. We note that the prediction of certain attributes can benefit substantially from accurate part detection. We also show that, differently from previous results in object detection, employing a large number of part templates can improve detection accuracy at the expense of detection speed. We finally propose a coarse-to-fine approach to speed up detection through a hierarchical cascade algorithm.

Journal ArticleDOI
TL;DR: A rotation invariant parts-based model to detect objects with complex shape in high-resolution remote sensing images is proposed and the experimental results demonstrate the robustness and precision of the proposed detection model.
Abstract: In this letter, we propose a rotation-invariant parts-based model to detect objects with complex shape in high-resolution remote sensing images. Specifically, the geospatial objects with complex shape are first divided into several main parts, and the structure information among parts is described and regulated in polar coordinates to achieve rotation invariance of the configuration. Meanwhile, the pose variance of each part relative to the object is also defined in our model. In encoding the features of the rotated parts and objects, a new rotation-invariant feature is proposed by extending histograms of oriented gradients. During the final detection step, a clustering method is introduced to locate the parts in objects, and that method can also be used to fuse the detection results. In this way, an efficient detection model is constructed, and the experimental results demonstrate the robustness and precision of the proposed detection model.

Patent
03 Nov 2014
TL;DR: In this paper, a 3D-aligned face image can be generated from a 2D face image, which can then be used to align face images, classify face images and verify face images using a deep neural network.
Abstract: Systems, methods, and non-transitory computer readable media can align face images, classify face images, and verify face images by employing a deep neural network (DNN). A 3D-aligned face image can be generated from a 2D face image. An identity of the 2D face image can be classified based on provision of the 3D-aligned face image to the DNN. The identity of the 2D face image can comprise a feature vector.

Patent
13 Mar 2014
TL;DR: In this paper, the first representation of a digital image captured by a mobile device and using a processor of the mobile device is used to generate a first feature vector, which is then compared to a plurality of reference feature matrices.
Abstract: In one embodiment, a method includes receiving a digital image captured by a mobile device; and using a processor of the mobile device: generating a first representation of the digital image, the first representation being characterized by a reduced resolution; generating a first feature vector based on the first representation; comparing the first feature vector to a plurality of reference feature matrices; and classifying an object depicted in the digital image as a member of a particular object class based at least in part on the comparing.
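A toy sketch of the claimed pipeline, with naive subsampling standing in for the reduced-resolution representation and a nearest-exemplar rule standing in for the comparison against reference feature matrices; all names, sizes, and the distance rule are hypothetical, not taken from the patent.

```python
import numpy as np

def classify(image, reference_classes, size=(32, 32)):
    """Downscale, flatten to a feature vector, pick the nearest reference class."""
    h, w = image.shape
    ys = np.arange(size[0]) * h // size[0]
    xs = np.arange(size[1]) * w // size[1]
    reduced = image[np.ix_(ys, xs)]                 # reduced-resolution copy
    feat = reduced.ravel().astype(float)
    feat /= np.linalg.norm(feat) + 1e-9             # first feature vector
    # One reference feature matrix (rows of exemplar vectors) per object class.
    best = min(reference_classes.items(),
               key=lambda kv: np.linalg.norm(kv[1] - feat, axis=1).min())
    return best[0]

refs = {"document": np.random.rand(10, 1024), "receipt": np.random.rand(10, 1024)}
print(classify(np.random.rand(480, 640), refs))
```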

Journal ArticleDOI
TL;DR: This paper proposes a scheme to produce the mirror image of a face and to integrate the original face image and its mirror image for representation-based face recognition, and shows that the proposed scheme can greatly improve the accuracy of representation-based classification methods.

Journal ArticleDOI
TL;DR: This work proposes a novel fast algorithm for visually salient object detection, robust to real-world illumination conditions, and uses it to extract salient objects which can be efficiently used for training the machine learning-based object detection and recognition unit of the proposed system.
Abstract: Existing object recognition techniques often rely on human-labeled data, leading to severe limitations in designing a fully autonomous machine vision system. In this work, we present an intelligent machine vision system able to autonomously learn individual objects present in a real environment. This system relies on salient object detection. In its design, we were inspired by the early processing stages of the human visual system. In this context we suggest a novel fast algorithm for visually salient object detection, robust to real-world illumination conditions. We then use it to extract salient objects which can be efficiently used for training the machine learning-based object detection and recognition unit of the proposed system. We provide results of our salient object detection algorithm on the MSRA Salient Object Database benchmark, comparing its quality with other state-of-the-art approaches. The proposed system has been implemented on a humanoid robot, increasing its autonomy in learning and interaction with humans. We report and discuss the obtained results, validating the proposed concepts.

Proceedings ArticleDOI
01 Aug 2014
TL;DR: An up-to-date review of face detection methods including feature-based, appearance-based, knowledge-based and template matching, and the effect of applying Haar-like features along with neural networks, are presented.
Abstract: Face detection is an active research area in computer vision and pattern recognition, especially during the past several years. It also plays a vital role in surveillance systems, where it is the first step of face recognition systems. The high degree of variation in the appearance of human faces makes face detection a complex problem in computer vision. Face detection systems aim to decrease the false positive rate and increase the accuracy of detecting faces, especially in images with complex backgrounds. The main aim of this paper is to present an up-to-date review of face detection methods, including feature-based, appearance-based, knowledge-based and template matching approaches. The study also presents the effect of applying Haar-like features along with neural networks. We conclude the paper with some discussion on how the work can be taken further.

Journal ArticleDOI
TL;DR: This paper develops a distributed object detection framework (DOD) by making the best use of spatial-temporal correlation, and develops CHOG-DOD as an instance of DOD framework, a cell-based HOG (CHOG) algorithm, where the features in one cell are not shared with overlapping blocks.
Abstract: In vision and learning, low computational complexity and high generalization are two important goals for video object detection. Low computational complexity here means not only fast speed but also low energy consumption. The sliding-window object detection method with linear support vector machines (SVMs) is a general object detection framework. The computational cost is herein mainly paid in complex feature extraction and inner-product-based classification. This paper first develops a distributed object detection framework (DOD) by making the best use of spatial-temporal correlation, where the process of feature extraction and classification is distributed over the current frame and several previous frames. In each frame, only sub-feature vectors are extracted and the response of a partial linear classifier (i.e., a sub-decision value) is computed. To reduce the dimension of the traditional block-based histograms of oriented gradients (BHOG) feature vector, this paper proposes a cell-based HOG (CHOG) algorithm, where the features in one cell are not shared with overlapping blocks. Using CHOG as the feature descriptor, we develop CHOG-DOD as an instance of the DOD framework. Experimental results on detection of hands, faces, and pedestrians in video show the superiority of the proposed method.
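The distribution idea itself is just a split inner product: because the classifier is linear, partial responses computed on sub-feature vectors in successive frames sum to the full decision value. A NumPy sketch (the feature dimension and the four-way split are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(3600)                # linear SVM weights for one window
chunks = np.array_split(np.arange(3600), 4)  # sub-feature index sets

def partial_decision(frame_features, t):
    """At frame t, extract only one sub-feature vector and return its
    partial inner product with the matching slice of the SVM weights."""
    idx = chunks[t % len(chunks)]
    return frame_features[idx] @ w[idx]

# Accumulate sub-decision values over 4 consecutive, correlated frames;
# the full decision w.x emerges only after the last chunk arrives.
x = rng.standard_normal(3600)                # (near-)static window across frames
score = sum(partial_decision(x, t) for t in range(4))
print(np.isclose(score, w @ x))              # True: distributed == monolithic
```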

Proceedings ArticleDOI
12 Jul 2014
TL;DR: A robust liveness detection scheme based on challenge and response method is proposed which is able to detect the liveness when subjected to all attacks except the eye & mouth imposter attack.
Abstract: The recent literature on face recognition technology discusses the issue of face spoofing, in which the authentication system can be bypassed by placing a photo/video/mask of the enrolled person in front of the camera. This problem can be minimized by detecting the liveness of the person. Therefore, in this paper, we propose a robust liveness detection scheme based on a challenge-and-response method. The liveness module is added as an extra layer of security before the face recognition module. It utilizes face macro-features, especially eye and mouth movements, to generate random challenges and observe the user's response to them. The reliability of the liveness module is tested by staging different types of spoofing attacks using various means, such as photographs and videos. In all, five types of attacks have been considered and prevented by our system. Experimental results show that the system is able to detect liveness when subjected to all these attacks except the eye-and-mouth imposter attack. This attack can bypass the liveness test, but it creates massive changes in face structure and is therefore left unrecognized or misclassified by the face recognition module. An experimental test conducted on 65 persons from the University of Essex face database confirms that removal of the eye and nose components results in 75% misclassification.

Journal ArticleDOI
TL;DR: A novel local metric learning algorithm called exemplar metric learning (EML) is designed and an exemplar-based object detection algorithm based on EML is implemented and evaluated.
Abstract: Object detection has been widely studied in the computer vision community and it has many real applications, despite variations such as scale, pose, lighting, and background. Most classical object detection methods heavily rely on category-based training to handle intra-class variations. In contrast to classical methods that use a rigid category-based representation, exemplar-based methods try to model variations among positives by learning from specific positive samples. However, currently existing exemplar-based methods either fail to use any training information or suffer from a significant performance drop when few exemplars are available. In this paper, we design a novel local metric learning approach to handle the exemplar-based object detection task. The main contributions are two-fold: 1) a novel local metric learning algorithm called exemplar metric learning (EML) is designed, and 2) an exemplar-based object detection algorithm based on EML is implemented. We evaluate our method on two generic object detection data sets: UIUC-Car and UMass FDDB. Experiments show that, compared with other exemplar-based methods, our approach can effectively enhance object detection performance when few exemplars are available.

Proceedings ArticleDOI
03 Jun 2014
TL;DR: It is proposed to develop a unique algorithm for vehicle data recognition and tracking using a Gaussian mixture model and blob detection methods to address the issue of detecting vehicle/traffic data from video frames.
Abstract: Vehicle detection and tracking plays an effective and significant role in traffic surveillance systems, where efficient traffic management and safety are the main concerns. In this paper, we discuss and address the issue of detecting vehicle/traffic data from video frames. Although various studies have been done in this area and many methods have been implemented, this area still has room for improvement. With a view to making such improvements, we propose a unique algorithm for vehicle data recognition and tracking using a Gaussian mixture model and blob detection methods. First, we differentiate the foreground from the background in frames by learning the background. Here, a foreground detector detects the objects and a binary computation is done to define rectangular regions around every detected object. To detect the moving objects correctly and to remove noise, some morphological operations are applied. The final counting is then done by tracking the detected objects and their regions. The results are encouraging, with more than 91% average accuracy in detection and tracking using the Gaussian mixture model and blob detection methods.
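A minimal OpenCV sketch of this kind of pipeline, using the library's MOG2 Gaussian-mixture background subtractor, morphological cleanup, and connected components as the blob detector; the file name, thresholds, and minimum blob area are assumptions, not the paper's settings.

```python
import cv2

# MOG2 is OpenCV's per-pixel Gaussian-mixture background model.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

cap = cv2.VideoCapture("traffic.mp4")  # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                             # learn background
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadows
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)       # remove noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)      # fill holes
    # Blob detection: connected components large enough to be vehicles.
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    vehicles = [tuple(s[:4]) for s in stats[1:] if s[4] > 500]  # (x, y, w, h)
cap.release()
```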

Journal ArticleDOI
Won Jun Kim, Changick Kim
TL;DR: The proposed scheme outperforms other previously developed methods in detecting salient regions of the static and dynamic scenes and can be easily extended to various applications, such as image retargeting, object segmentation, and video surveillance.
Abstract: Saliency detection has been extensively studied due to its promising contributions for various computer vision applications. However, most existing methods are easily biased toward edges or corners, which are statistically significant, but not necessarily relevant. Moreover, they often fail to find salient regions in complex scenes due to ambiguities between salient regions and highly textured backgrounds. In this paper, we present a novel unified framework for spatiotemporal saliency detection based on textural contrast. Our method is simple and robust, yet biologically plausible; thus, it can be easily extended to various applications, such as image retargeting, object segmentation, and video surveillance. Based on various datasets, we conduct comparative evaluations of 12 representative saliency detection models presented in the literature, and the results show that the proposed scheme outperforms other previously developed methods in detecting salient regions of the static and dynamic scenes.

Proceedings ArticleDOI
23 Jun 2014
TL;DR: A pedestrian detection approach that uses the same classifier for all pedestrian scales based on image features computed for a single scale, and goes beyond the low level pixel-wise gradient orientation bins and use higher level visual words organized into Word Channels.
Abstract: Most pedestrian detection approaches that achieve high accuracy and precision rate and that can be used for real-time applications are based on histograms of gradient orientations. Usually multiscale detection is attained by resizing the image several times and by recomputing the image features or using multiple classifiers for different scales. In this paper we present a pedestrian detection approach that uses the same classifier for all pedestrian scales based on image features computed for a single scale. We go beyond the low level pixel-wise gradient orientation bins and use higher level visual words organized into Word Channels. Boosting is used to learn classification features from the integral Word Channels. The proposed approach is evaluated on multiple datasets and achieves outstanding results on the INRIA and Caltech-USA benchmarks. By using a GPU implementation we achieve a classification rate of over 10 million bounding boxes per second and a 16 FPS rate for multiscale detection in a 640×480 image.

Posted Content
TL;DR: In this paper, an active search strategy is proposed that sequentially chooses the next window to evaluate based on all the information gathered before. The strategy is guided by two forces, context and the score of the classifier, which attract the search to promising areas surrounding highly scored windows and keep it away from areas near low-scored ones.
Abstract: Object class detectors typically apply a window classifier to all the windows in a large set, either in a sliding window manner or using object proposals. In this paper, we develop an active search strategy that sequentially chooses the next window to evaluate based on all the information gathered before. This results in a substantial reduction in the number of classifier evaluations and in a more elegant approach in general. Our search strategy is guided by two forces. First, we exploit context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. This enables to jump across distant regions in the image (e.g. observing a sky region suggests that cars might be far below) and is done efficiently in a Random Forest framework. Second, we exploit the score of the classifier to attract the search to promising areas surrounding a highly scored window, and to keep away from areas near low scored ones. Our search strategy can be applied on top of any classifier as it treats it as a black-box. In experiments with R-CNN on the challenging SUN2012 dataset, our method matches the detection accuracy of evaluating all windows independently, while evaluating 9x fewer windows.
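A simplified sketch of such a sequential strategy: each evaluated window re-weights the remaining candidates, attracting the search toward neighbors of high-scoring windows and repelling it from neighbors of low-scoring ones. The utility update, kernel width, and budget below are illustrative assumptions, not the paper's Random Forest formulation.

```python
import numpy as np

def active_search(windows, context_prior, classify, budget):
    """Sequentially evaluate windows; each observation re-weights the rest."""
    remaining = set(range(len(windows)))
    scores, utility = {}, context_prior.copy()
    for _ in range(budget):
        i = max(remaining, key=lambda j: utility[j])  # most promising window
        remaining.discard(i)
        scores[i] = classify(windows[i])              # black-box classifier
        # Attraction near high scores, repulsion near low (negative) scores.
        for j in remaining:
            d = np.linalg.norm(windows[i][:2] - windows[j][:2])
            utility[j] += scores[i] * np.exp(-d / 50.0)
    return max(scores, key=scores.get)                # best window found

wins = np.random.rand(1000, 4) * 640                  # [x, y, w, h] candidates
prior = np.random.rand(1000)                          # context-based prior
best = active_search(wins, prior, lambda w: np.random.randn(), budget=100)
```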

Proceedings ArticleDOI
08 Sep 2014
TL;DR: An Automatic Facial Expression Recognition System (AFERS) is proposed which gives a recognition rate of around 100%, which is acceptable compared to other methods.
Abstract: Human-computer interaction systems for automatic face recognition or facial expression recognition have attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. In this paper, an Automatic Facial Expression Recognition System (AFERS) is proposed. The proposed method has three stages: (a) face detection, (b) feature extraction and (c) facial expression recognition. The first phase, face detection, involves skin color detection using the YCbCr color model, lighting compensation for achieving uniformity on the face, and morphological operations for retaining the required face portion. The output of the first phase is used for extracting facial features like eyes, nose, and mouth using the AAM (Active Appearance Model) method. The third stage, automatic facial expression recognition, involves a simple Euclidean distance method: the Euclidean distance between the feature points of the training images and those of the query image is compared, and the output expression is decided based on the minimum Euclidean distance. The true recognition rate of this method is around 90%-95%. A further modification of this method using an Artificial Neuro-Fuzzy Inference System (ANFIS) yields a non-linear recognition system with a recognition rate of around 100%, which is acceptable compared to other methods.
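Two of the stages are straightforward to sketch in NumPy: a YCbCr skin mask (the thresholds below are typical literature values, not necessarily this paper's) and minimum-Euclidean-distance matching of AAM feature points (the landmark count and labels are assumptions).

```python
import numpy as np

def skin_mask(ycbcr):
    """Skin pixels by thresholding the Cb/Cr channels of a YCbCr image."""
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

def classify_expression(query_points, training_points, labels):
    """Minimum Euclidean distance between AAM feature-point vectors."""
    d = np.linalg.norm(training_points - query_points, axis=1)
    return labels[int(np.argmin(d))]

mask = skin_mask(np.random.randint(0, 256, (480, 640, 3)))   # face detection cue
train = np.random.rand(6, 68 * 2)        # hypothetical AAM landmarks per image
labels = np.array(["happy", "sad", "angry", "fear", "surprise", "neutral"])
print(classify_expression(np.random.rand(68 * 2), train, labels))
```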

Journal ArticleDOI
TL;DR: This letter presents a novel rotation-invariant method for object detection from terrestrial 3-D laser scanning point clouds acquired in complex urban environments, utilizing the Implicit Shape Model to describe object categories and extending the Hough Forest framework for object detection in 3-D point clouds.
Abstract: This letter presents a novel rotation-invariant method for object detection from terrestrial 3-D laser scanning point clouds acquired in complex urban environments. We utilize the Implicit Shape Model to describe object categories, and extend the Hough Forest framework for object detection in 3-D point clouds. A 3-D local patch is described by structure and reflectance features and then mapped to the probabilistic vote about the possible location of the object center. Objects are detected at the peak points in the 3-D Hough voting space. To deal with the arbitrary azimuths of objects in real world, circular voting strategy is introduced by rotating the offset vector. To deal with the interference of adjacent objects, distance weighted voting is proposed. Large-scale real-world point cloud data collected by terrestrial mobile laser scanning systems are used to evaluate the performance. Experimental results demonstrate that the proposed method outperforms the state-of-the-art 3-D object detection methods.
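The circular voting strategy amounts to rotating each learned center offset about the vertical axis before casting votes, so detection no longer depends on an object's azimuth. A NumPy sketch with an assumed azimuth discretization and voxel size (the distance weighting is omitted for brevity):

```python
import numpy as np

def circular_votes(patch_xyz, offset, n_azimuths=36):
    """Rotate a learned center offset about the vertical (z) axis and cast
    one vote per discretized azimuth, making the vote rotation invariant."""
    votes = []
    for a in np.linspace(0.0, 2 * np.pi, n_azimuths, endpoint=False):
        c, s = np.cos(a), np.sin(a)
        R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        votes.append(patch_xyz + R @ offset)
    return np.array(votes)

# Accumulate votes in a coarse 3-D grid; peaks are candidate object centers.
accum = {}
for v in circular_votes(np.array([2.0, 3.0, 1.0]), np.array([0.5, 0.0, 0.4])):
    cell = tuple((v // 0.25).astype(int))             # 0.25 m voxel grid
    accum[cell] = accum.get(cell, 0.0) + 1.0
peak = max(accum, key=accum.get)
```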

Proceedings ArticleDOI
29 Sep 2014
TL;DR: This work takes object outline information from change detection to build 3-D models of rigid objects and represent the scene as static and dynamic components, and can integrate segmentation information from sources other than change detection.
Abstract: We build on recent fast and accurate 3-D reconstruction techniques to segment objects during scene reconstruction. We take object outline information from change detection to build 3-D models of rigid objects and represent the scene as static and dynamic components. Object models are updated online during mapping, and can integrate segmentation information from sources other than change detection.

Proceedings ArticleDOI
01 Nov 2014
TL;DR: The major challenge faced in developing this image processing algorithm was that, upon making the test subjects compliant with the classifier parameters, resizing of the images resulted in the loss of pixel data.
Abstract: Controlling a robotic arm for applications such as object sorting with the use of vision sensors requires a robust image processing algorithm to recognize and detect the target object. This paper is directed towards the development of the image processing algorithm, which is a prerequisite for the full operation of a pick-and-place robotic arm intended for object sorting tasks. For this type of task, the objects are first detected, which is accomplished by a feature extraction algorithm. Next, the extracted image (with parameters in compliance with the classifier) is sent to the classifier to recognize what object it is; once this is finalized, the output is the type of the object along with its coordinates, ready for the robotic arm to execute the pick-and-place task. The major challenge faced in developing this image processing algorithm was that, upon making the test subjects compliant with the classifier parameters, resizing of the images resulted in the loss of pixel data. Therefore, a centered image approach was taken. The accuracy of the classifier developed in this paper was 99.33%, and the accuracy of the feature extraction algorithm was 83.6443%. Finally, the overall system performance of the image processing algorithm after experimentation was 82.7162%.
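The "centered image approach" plausibly amounts to pasting each detected crop onto a fixed-size canvas rather than resizing it, so the classifier sees a constant input size without interpolation losses. A NumPy sketch under that assumption (the canvas size and fill value are illustrative):

```python
import numpy as np

def center_on_canvas(crop, canvas_h, canvas_w, fill=0):
    """Place an object crop at the center of a fixed-size canvas instead of
    resizing it, preserving the original pixels; crops larger than the
    canvas are center-trimmed."""
    h = min(crop.shape[0], canvas_h)
    w = min(crop.shape[1], canvas_w)
    y0 = (crop.shape[0] - h) // 2
    x0 = (crop.shape[1] - w) // 2
    canvas = np.full((canvas_h, canvas_w), fill, dtype=crop.dtype)
    ty = (canvas_h - h) // 2
    tx = (canvas_w - w) // 2
    canvas[ty:ty + h, tx:tx + w] = crop[y0:y0 + h, x0:x0 + w]
    return canvas

patch = center_on_canvas(np.random.rand(40, 25), 64, 64)  # fixed 64x64 input
```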