
Showing papers on "3D single-object recognition" published in 2005


Proceedings ArticleDOI
15 Oct 2005
TL;DR: It is shown that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and an alternative is proposed, and a recognition algorithm based on spatio-temporally windowed data is devised.
Abstract: A common trend in object recognition is to detect and leverage the use of sparse, informative feature points. The use of such features makes the problem more manageable while providing increased robustness to noise and pose variation. In this work we develop an extension of these ideas to the spatio-temporal case. For this purpose, we show that the direct 3D counterparts to commonly used 2D interest point detectors are inadequate, and we propose an alternative. Anchoring off of these interest points, we devise a recognition algorithm based on spatio-temporally windowed data. We present recognition results on a variety of datasets including both human and rodent behavior.

2,699 citations


Journal ArticleDOI
TL;DR: A computationally efficient framework for part-based modeling and recognition of objects, motivated by the pictorial structure models introduced by Fischler and Elschlager, that allows for qualitative descriptions of visual appearance and is suitable for generic recognition problems.
Abstract: In this paper we present a computationally efficient framework for part-based modeling and recognition of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to represent an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We address the problem of using pictorial structure models to find instances of an object in an image as well as the problem of learning an object model from training examples, presenting efficient algorithms in both cases. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
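The part-plus-spring idea in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a star-shaped model (every part connected only to a root part), 1D candidate locations, a quadratic spring cost, and brute-force search. The names `best_configuration`, `match_cost`, `offsets` and `stiffness` are hypothetical.

```python
def best_configuration(match_cost, offsets, stiffness):
    """Minimise appearance cost plus spring cost for a star-shaped part model.

    match_cost[i][l] is the (assumed) appearance cost of placing part i at
    location l; part 0 is the root. offsets[i] is the ideal displacement of
    part i from the root (offsets[0] is unused). Returns (cost, locations).
    """
    n_loc = len(match_cost[0])
    best = None
    for root in range(n_loc):
        total = match_cost[0][root]
        config = [root]
        for i in range(1, len(match_cost)):
            # Given the root location, each child part can be placed
            # independently because children are only connected to the root.
            cost, loc = min(
                (match_cost[i][l] + stiffness * (l - root - offsets[i]) ** 2, l)
                for l in range(n_loc)
            )
            total += cost
            config.append(loc)
        if best is None or total < best[0]:
            best = (total, config)
    return best
```

The exhaustive search here is quadratic in the number of locations per part; it is meant only to make the energy being minimised concrete, not to reproduce the efficient algorithms the abstract refers to.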

2,514 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: The performance of the approach constitutes a suggestive plausibility proof for a class of feedforward models of object recognition in cortex and exhibits excellent recognition performance and outperforms several state-of-the-art systems on a variety of image datasets including many different object categories.
Abstract: We introduce a novel set of features for robust object recognition. Each element of this set is a complex feature obtained by combining position- and scale-tolerant edge-detectors over neighboring positions and multiple orientations. Our system's architecture is motivated by a quantitative model of visual cortex. We show that our approach exhibits excellent recognition performance and outperforms several state-of-the-art systems on a variety of image datasets including many different object categories. We also demonstrate that our system is able to learn from very few examples. The performance of the approach constitutes a suggestive plausibility proof for a class of feedforward models of object recognition in cortex.

969 citations


Book
24 Nov 2005
TL;DR: This 2005 book provides a needed review of signal processing theory, the pattern recognition metrics, and the practical application know-how from basic premises and shows both digital and optical implementations.
Abstract: Correlation is a robust and general technique for pattern recognition and is used in many applications, such as automatic target recognition, biometric recognition and optical character recognition. The design, analysis and use of correlation pattern recognition algorithms requires background information, including linear systems theory, random variables and processes, matrix/vector methods, detection and estimation theory, digital signal processing and optical processing. This 2005 book provides a needed review of this diverse background material and develops the signal processing theory, the pattern recognition metrics, and the practical application know-how from basic premises. It shows both digital and optical implementations. It also contains technology presented by the team that developed it and includes case studies of significant interest, such as face and fingerprint recognition. Suitable for graduate students taking courses in pattern recognition theory, whilst reaching technical levels of interest to the professional practitioner.

366 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: A flexible, semi-parametric model for learning probability densities confined to highly non-linear but intrinsically low-dimensional manifolds is proposed, which leads to a statistical formulation of the recognition problem in terms of minimizing the divergence between densities estimated on these manifolds.
Abstract: In many automatic face recognition applications, a set of a person's face images is available rather than a single image. In this paper, we describe a novel method for face recognition using image sets. We propose a flexible, semi-parametric model for learning probability densities confined to highly non-linear but intrinsically low-dimensional manifolds. The model leads to a statistical formulation of the recognition problem in terms of minimizing the divergence between densities estimated on these manifolds. The proposed method is evaluated on a large data set, acquired in realistic imaging conditions with severe illumination variation. Our algorithm is shown to match the best and outperform other state-of-the-art algorithms in the literature, achieving 94% recognition rate on average.

350 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is found that for object classes that have substantial geometric structure, such as airplanes, faces and motorbikes, a relatively small amount of spatial structure in the model can provide statistically indistinguishable recognition performance from more powerful models, and at a substantially lower computational cost.
Abstract: We present a class of statistical models for part-based object recognition that are explicitly parameterized according to the degree of spatial structure they can represent. These models provide a way of relating different spatial priors that have been used for recognizing generic classes of objects, including joint Gaussian models and tree-structured models. By providing explicit control over the degree of spatial structure, our models make it possible to study the extent to which additional spatial constraints among parts are actually helpful in detection and localization, and to consider the tradeoff in representational power and computational cost. We consider these questions for object classes that have substantial geometric structure, such as airplanes, faces and motorbikes, using datasets employed by other researchers to facilitate evaluation. We find that for these classes of objects, a relatively small amount of spatial structure in the model can provide statistically indistinguishable recognition performance from more powerful models, and at a substantially lower computational cost.

338 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: A "parts and structure" model for object category recognition that can be learnt efficiently and in a semi-supervised manner is presented, learnt from example images containing category instances, without requiring segmentation from background clutter.
Abstract: We present a "parts and structure" model for object category recognition that can be learnt efficiently and in a semi-supervised manner: the model is learnt from example images containing category instances, without requiring segmentation from background clutter. The model is a sparse representation of the object, and consists of a star topology configuration of parts modeling the output of a variety of feature detectors. The optimal choice of feature types (whose repertoire includes interest points, curves and regions) is made automatically. In recognition, the model may be applied efficiently in an exhaustive manner, bypassing the need for feature detectors, to give the globally optimal match within a query image. The approach is demonstrated on a wide variety of categories, and delivers both successful classification and localization of the object within the image.

333 citations


Proceedings ArticleDOI
13 Jun 2005
TL;DR: This paper presents a system for fully automatic recognition and reconstruction of 3D objects in image databases, using invariant local features to find matches between all images, and the RANSAC algorithm to find those that are consistent with the fundamental matrix.
Abstract: This paper presents a system for fully automatic recognition and reconstruction of 3D objects in image databases. We pose the object recognition problem as one of finding consistent matches between all images, subject to the constraint that the images were taken from a perspective camera. We assume that the objects or scenes are rigid. For each image, we associate a camera matrix, which is parameterised by rotation, translation and focal length. We use invariant local features to find matches between all images, and the RANSAC algorithm to find those that are consistent with the fundamental matrix. Objects are recognised as subsets of matching images. We then solve for the structure and motion of each object, using a sparse bundle adjustment algorithm. Our results demonstrate that it is possible to recognise and reconstruct 3D objects from an unordered image database with no user input at all.
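The RANSAC step mentioned above can be sketched on a much simpler model. The example below fits a 2D line rather than a fundamental matrix, which is a deliberate substitution to keep the sketch short; the sample-fit-count loop is the same idea. The function names, the threshold and the iteration count are illustrative assumptions.

```python
import random


def fit_line(p, q):
    """Return (a, b, c) with a*x + b*y + c = 0 through p and q, a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample: both points coincide
    c = -(a * x1 + b * y1)
    return a / norm, b / norm, c / norm


def ransac_line(points, threshold=0.1, iterations=200, seed=0):
    """Return the largest set of points consistent with a single line."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    best_inliers = []
    for _ in range(iterations):
        # 1. Draw a minimal sample (two points define a line).
        p, q = rng.sample(points, 2)
        line = fit_line(p, q)
        if line is None:
            continue
        a, b, c = line
        # 2. Count points whose distance to the hypothesised line is small.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        # 3. Keep the hypothesis with the most support.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```

In the setting of the abstract, the minimal sample would be a set of feature matches, the model a fundamental matrix, and the inlier test a distance to the epipolar constraint.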

304 citations


Proceedings ArticleDOI
17 Oct 2005
TL;DR: A method is designed, based on intersecting epipolar constraints, for providing ground truth correspondence automatically and it is found that the combination of Hessian-affine feature finder and SIFT features is most robust to viewpoint change.
Abstract: We explore the performance of a number of popular feature detectors and descriptors in matching 3D object features across viewpoints and lighting conditions. To this end we design a method, based on intersecting epipolar constraints, for providing ground truth correspondence automatically. We collect a database of 100 objects viewed from 144 calibrated viewpoints under three different lighting conditions. We find that the combination of Hessian-affine feature finder and SIFT features is most robust to viewpoint change. Harris-affine combined with SIFT and Hessian-affine combined with shape context descriptors were best respectively for lighting changes and scale changes. We also find that no detector-descriptor combination performs well with viewpoint changes of more than 25-30°.

273 citations


Proceedings ArticleDOI
17 Oct 2005
TL;DR: A probabilistic part-based approach for texture and object recognition using a discriminative maximum entropy framework to learn the posterior distribution of the class label given the occurrences of parts from the dictionary in the training set.
Abstract: This paper presents a probabilistic part-based approach for texture and object recognition. Textures are represented using a part dictionary found by quantizing the appearance of scale- or affine- invariant keypoints. Object classes are represented using a dictionary of composite semi-local parts, or groups of neighboring keypoints with stable and distinctive appearance and geometric layout. A discriminative maximum entropy framework is used to learn the posterior distribution of the class label given the occurrences of parts from the dictionary in the training set. Experiments on two texture and two object databases demonstrate the effectiveness of this framework for visual classification.

199 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: The results support the assertion that neither a generative nor a discriminative approach alone will be sufficient for large scale object recognition, and techniques for combining them are discussed.
Abstract: Many approaches to object recognition are founded on probability theory, and can be broadly characterized as either generative or discriminative according to whether or not the distribution of the image features is modelled. Generative and discriminative methods have very different characteristics, as well as complementary strengths and weaknesses. In this paper we introduce new generative and discriminative models for object detection and classification based on weakly labelled training data. We use these models to illustrate the relative merits of the two approaches in the context of a data set of widely varying images of non-rigid objects (animals). Our results support the assertion that neither approach alone will be sufficient for large scale object recognition, and we discuss techniques for combining them.

Book ChapterDOI
TL;DR: The main purpose of this overview is to describe the recent 3D face recognition algorithms, which hold more information of the face, like surface information, that can be used for face recognition or subject discrimination.
Abstract: Much research in face recognition has dealt with the challenge of the great variability in head pose, lighting intensity and direction, facial expression, and aging. The main purpose of this overview is to describe the recent 3D face recognition algorithms. In the last few years, more and more 2D face recognition algorithms have been improved and tested on less than perfect images. However, 3D models hold more information of the face, like surface information, that can be used for face recognition or subject discrimination. Another major advantage is that 3D face recognition is pose invariant. A disadvantage of most presented 3D face recognition methods is that they still treat the human face as a rigid object. This means that the methods aren't capable of handling facial expressions. Although 2D face recognition still seems to outperform the 3D face recognition methods, it is expected that this will change in the near future.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This paper uses the face recognition grand challenge dataset to evaluate hierarchical graph matching (HGM), a universal approach to 2D and 3D face recognition, and shows that HGM yields the best results presented at the recent FRGC workshop, that 2D face recognition is significantly more accurate than 3D face recognition, and that the fusion of both modalities leads to a further improvement of the 2D results.
Abstract: The extension of 2D image-based face recognition methods with respect to 3D shape information and the fusion of both modalities is one of the main topics in the recent development of facial recognition. In this paper we discuss different strategies and their expected benefit for the fusion of 2D and 3D face recognition. The face recognition grand challenge (FRGC) provides for the first time ever a public benchmark dataset of a suitable size to evaluate the accuracy of both 2D and 3D face recognition. We use this benchmark to evaluate hierarchical graph matching (HGM), a universal approach to 2D and 3D face recognition, and demonstrate the benefit of different fusion strategies. The results show that HGM yields the best results presented at the recent FRGC workshop, that 2D face recognition is significantly more accurate than 3D face recognition and that the fusion of both modalities leads to a further improvement of the 2D results.

01 Feb 2005
TL;DR: A scale-invariant feature selection method that learns to recognize and detect object classes from images of natural scenes that uses local regions to realize robust and sparse part and texture selection invariant to changes in scale, orientation and affine deformation.
Abstract: In this paper, we introduce a scale-invariant feature selection method that learns to recognize and detect object classes from images of natural scenes. The first step of our method consists of clustering local scale-invariant descriptors to characterize object class appearance. Next, we train on the groups, and perform feature selection to determine the most discriminative parts. We use local regions to realize robust and sparse part and texture selection invariant to changes in scale, orientation and affine deformation and, as a result, we avoid image normalization in both training and prediction phases. We train our object models without requiring image parts to be labeled or objects to be separated from the background. Moreover, our method continues to work well when images have cluttered background and occluded objects. We evaluate our method on seven recently proposed datasets, and quantitatively compare the effect of different types of local regions and feature selection criteria on object recognition. Our experiments show that local invariant descriptors are an appropriate representation for many different object classes. Our results also confirm the importance of appearance-based discriminative feature selection.

01 Jan 2005
TL;DR: This paper presents a large-scale evaluation of an approach that represents images as distributions of features extracted from a sparse set of keypoint locations and learns a Support Vector Machine classifier with kernels based on two effective measures for comparing distributions, the Earth Mover's Distance and the chi-square distance.
Abstract: Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations and learns a Support Vector Machine classifier with kernels based on two effective measures for comparing distributions, the Earth Mover's Distance and the chi-square distance. We first evaluate the performance of our approach with different keypoint detectors and descriptors, as well as different kernels and classifiers. We then conduct a comparative evaluation with several state-of-the-art recognition methods on four texture and five object databases. On most of these databases, our implementation exceeds the best reported results and achieves comparable performance on the rest. Finally, we investigate the influence of background correlations on recognition performance via extensive tests on the PASCAL database, for which ground-truth object localization information is available. Our experiments demonstrate that image representations based on distributions of local features are surprisingly effective for classification of texture and object images under challenging real-world conditions, including significant intra-class variations and substantial background clutter.
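As a rough illustration of comparing feature histograms with the chi-square distance, the sketch below computes the distance and turns it into a kernel value. The factor of 0.5, the exponential form of the kernel and the `scale` parameter are common conventions assumed here, not details taken from the abstract; the function names are hypothetical.

```python
import math


def chi_square_distance(h1, h2):
    """Chi-square distance between two histograms with the same bins."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def chi_square_kernel(h1, h2, scale=1.0):
    """Map the distance to a similarity in (0, 1]; `scale` is an assumed knob."""
    return math.exp(-chi_square_distance(h1, h2) / scale)
```

Identical histograms give a distance of 0 and a kernel value of 1; normalised histograms with no overlapping mass give a distance of 1 under this convention. A kernel of this shape could be passed to a Support Vector Machine as a precomputed Gram matrix.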

Proceedings ArticleDOI
20 Jun 2005
TL;DR: Using a non-parametric density estimation method over a joint domain-range representation of image pixels, multi-modal spatial uncertainties and complex dependencies between the domain and range are directly modeled and temporal persistence is proposed as a detection criterion.
Abstract: Detecting moving objects using stationary cameras is an important precursor to many activity recognition, object recognition and tracking algorithms. In this paper, three innovations are presented over existing approaches. First, the model of the intensities of image pixels as independently distributed random variables is challenged and it is asserted that useful correlation exists in the intensities of spatially proximal pixels. This correlation is exploited to sustain high levels of detection accuracy in the presence of nominal camera motion and dynamic textures. By using a non-parametric density estimation method over a joint domain-range representation of image pixels, multi-modal spatial uncertainties and complex dependencies between the domain (location) and range (color) are directly modeled. Second, temporal persistence is proposed as a detection criterion. Unlike previous approaches to object detection, which detect objects by building adaptive models of only the background, the foreground is also modeled to augment the detection of objects (without explicit tracking), since objects detected in a preceding frame contain substantial evidence for detection in a current frame. Third, the background and foreground models are used competitively in a MAP-MRF decision framework, stressing spatial context as a condition of pixel-wise labeling, and the posterior function is maximized efficiently using graph cuts. Experimental validation of the proposed method is presented on a diverse set of dynamic scenes.
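The joint domain-range density estimate can be sketched as a kernel density estimate over (x, y, intensity) samples. This is a minimal sketch under strong assumptions: a Gaussian product kernel, fixed bandwidths `sigma_xy` and `sigma_c`, and grayscale intensity instead of color. None of these choices, nor the function name, come from the abstract.

```python
import math


def background_likelihood(pixel, samples, sigma_xy=2.0, sigma_c=10.0):
    """Kernel density estimate of a pixel under a set of background samples.

    `pixel` and each entry of `samples` are (x, y, intensity) tuples, so the
    estimate depends on where a value was seen as well as what the value was.
    """
    if not samples:
        raise ValueError("at least one background sample is required")
    x, y, c = pixel
    # Normalising constant of a 3D Gaussian with diagonal bandwidths.
    norm = (2.0 * math.pi) ** 1.5 * sigma_xy ** 2 * sigma_c
    total = 0.0
    for sx, sy, sc in samples:
        d_xy = ((x - sx) ** 2 + (y - sy) ** 2) / sigma_xy ** 2
        d_c = (c - sc) ** 2 / sigma_c ** 2
        total += math.exp(-0.5 * (d_xy + d_c))
    return total / (len(samples) * norm)
```

Because location is part of the sample, a background value that shifts by a pixel or two (for example under slight camera motion) still receives a high likelihood, whereas a per-pixel independent model would treat it as new.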

Journal ArticleDOI
TL;DR: A vision system for robotic object manipulation tasks in natural, domestic environments and one important property is that the step from object recognition to pose estimation is completely automatic combining both appearance and geometric models.

Proceedings ArticleDOI
17 Oct 2005
TL;DR: This work explores a hybrid generative/discriminative approach using 'Fisher kernels' by Jaakkola and Haussler (1999) which retains most of the desirable properties of generative methods, while increasing the classification performance through a discriminative setting.
Abstract: Learning models for detecting and classifying object categories is a challenging problem in machine vision. While discriminative approaches to learning and classification have, in principle, superior performance, generative approaches provide many useful features, one of which is the ability to naturally establish explicit correspondence between model components and scene features - this, in turn, allows for the handling of missing data and unsupervised learning in clutter. We explore a hybrid generative/discriminative approach using 'Fisher kernels' by Jaakkola and Haussler (1999) which retains most of the desirable properties of generative methods, while increasing the classification performance through a discriminative setting. Furthermore, we demonstrate how this kernel framework can be used to combine different types of features and models into a single classifier. Our experiments, conducted on a number of popular benchmarks, show strong performance improvements over the corresponding generative approach and are competitive with the best results reported in the literature.

Journal ArticleDOI
TL;DR: A flexible recognition system that can compute the good features for high classification of 3-D real objects is investigated and the recognition performance of classifiers in conjunction with moment-based feature sets is introduced.
Abstract: Moments and functions of moments have been extensively employed as invariant global features of images in pattern recognition. In this study, a flexible recognition system that can compute the good features for high classification of 3-D real objects is investigated. For object recognition, regardless of orientation, size and position, feature vectors are computed with the help of nonlinear moment invariant functions. Representations of objects using two-dimensional images that are taken from different angles of view are the main features leading us to our objective. After efficient feature extraction, the main focus of this study, the recognition performance of classifiers in conjunction with moment-based feature sets, is introduced.
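To make "moment invariant functions" concrete, the sketch below computes the first of Hu's classical moment invariants for a small grayscale image. Whether the paper uses Hu's invariants specifically is an assumption; the function names are illustrative.

```python
def raw_moment(img, p, q):
    """Raw image moment m_pq, with x as the column index and y as the row index."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def hu_first_invariant(img):
    """First Hu invariant: eta_20 + eta_02 (normalised central moments)."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image has no mass")
    # Centroid; subtracting it makes the moments translation invariant.
    cx = raw_moment(img, 1, 0) / m00
    cy = raw_moment(img, 0, 1) / m00
    mu20 = sum(((x - cx) ** 2) * v
               for y, row in enumerate(img) for x, v in enumerate(row))
    mu02 = sum(((y - cy) ** 2) * v
               for y, row in enumerate(img) for x, v in enumerate(row))
    # For p + q = 2 the scale normalisation is m00 ** (1 + (p + q) / 2) = m00 ** 2.
    return (mu20 + mu02) / m00 ** 2
```

A feature vector for a classifier would typically stack several such invariants computed from each 2D view of the object.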

Proceedings ArticleDOI
17 Oct 2005
TL;DR: This paper presents a Bayesian framework to perform multimodal (such as variations in viewpoint and illumination) face image super-resolution for recognition in tensor space, and integrates the tasks of super-resolution and recognition by directly computing a maximum likelihood identity parameter vector in high-resolution tensor space for recognition.
Abstract: Face images of non-frontal views under poor illumination resolution reduce dramatically face recognition accuracy. This is evident most compellingly by the very low recognition rate of all existing face recognition systems when applied to live CCTV camera input. In this paper, we present a Bayesian framework to perform multimodal (such as variations in viewpoint and illumination) face image super-resolution for recognition in tensor space. Given a single modal low-resolution face image, we benefit from the multiple factor interactions of training sensor and super-resolve its high-resolution reconstructions across different modalities for face recognition. Instead of performing pixel-domain super-resolution and recognition independently as two separate sequential processes, we integrate the tasks of super-resolution and recognition by directly computing a maximum likelihood identity parameter vector in high-resolution tensor space for recognition. We show results from multi-modal super-resolution and face recognition experiments across different imaging modalities, using low-resolution images as testing inputs and demonstrate improved recognition rates over standard tensorface and eigenface representations.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: A geometry assisted probabilistic approach to improve face recognition under pose variation by approximating a human head with a 3D ellipsoid model, which enables the recognition to be conducted by comparing the texture maps instead of the original images, as done in traditional face recognition.
Abstract: Researchers have been working on human face recognition for decades. Face recognition is hard due to different types of variations in face images, such as pose, illumination and expression, among which pose variation is the hardest one to deal with. To improve face recognition under pose variation, this paper presents a geometry assisted probabilistic approach. We approximate a human head with a 3D ellipsoid model, so that any face image is a 2D projection of such a 3D ellipsoid at a certain pose. In this approach, both training and test images are back projected to the surface of the 3D ellipsoid, according to their estimated poses, to form the texture maps. Thus the recognition can be conducted by comparing the texture maps instead of the original images, as done in traditional face recognition. In addition, we represent the texture map as an array of local patches, which enables us to train a probabilistic model for comparing corresponding patches. By conducting experiments on the CMU PIE database, we show that the proposed algorithm provides better performance than the existing algorithms.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: A method capable of recognising one of N objects in log(N) time, which preserves all the strengths of local affine region methods: robustness to background clutter, occlusion, and large changes of viewpoints.
Abstract: Realistic approaches to large scale object recognition, i.e. for detection and localisation of hundreds or more objects, must support sub-linear time indexing. In the paper, we propose a method capable of recognising one of N objects in log(N) time. The "visual memory" is organised as a binary decision tree that is built to minimise average time to decision. Leaves of the tree represent a few local image areas, and each non-terminal node is associated with a "weak classifier". In the recognition phase, a single invariant measurement decides in which subtree a corresponding image area is sought. The method preserves all the strengths of local affine region methods: robustness to background clutter, occlusion, and large changes of viewpoints. Experimentally we show that it supports near real-time recognition of hundreds of objects with state-of-the-art recognition rates. After the test image is processed (in a second on a current PC), the recognition via indexing into the visual memory requires milliseconds.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: A discriminative approach to learning object categories which maintains the representational power of generative learning, but trains the generative models in a discriminative manner to realize gains in classification performance is studied.
Abstract: Here we explore a discriminative learning method on underlying generative models for the purpose of discriminating between object categories. Visual recognition algorithms learn models from a set of training examples. Generative models learn their representations by considering data from a single class. Generative models are popular in computer vision for many reasons, including their ability to elegantly incorporate prior knowledge and to handle correspondences between object parts and detected features. However, generative models are often inferior to discriminative models during classification tasks. We study a discriminative approach to learning object categories which maintains the representational power of generative learning, but trains the generative models in a discriminative manner. The discriminatively trained models perform better during classification tasks as a result of selecting discriminative sets of features. We conclude by proposing a multi-class object recognition system which initially trains object classes in a generative manner, identifies subsets of similar classes with high confusion, and finally trains models for these subsets in a discriminative manner to realize gains in classification performance.

Proceedings ArticleDOI
07 Aug 2005
TL;DR: This work provides a framework for learning sequential attention in real-world visual object recognition, using an architecture of three processing stages that integrates local information via shifts of attention, resulting in chains of descriptor-action pairs that characterize object discrimination.
Abstract: This work provides a framework for learning sequential attention in real-world visual object recognition, using an architecture of three processing stages. The first stage rejects irrelevant local descriptors based on an information theoretic saliency measure, providing candidates for foci of interest (FOI). The second stage investigates the information in the FOI using a codebook matcher and providing weak object hypotheses. The third stage integrates local information via shifts of attention, resulting in chains of descriptor-action pairs that characterize object discrimination. A Q-learner adapts then from explorative search and evaluative feedback from entropy decreases on the attention sequences, eventually prioritizing shifts that lead to a geometry of descriptor-action scanpaths that is highly discriminative with respect to object recognition. The methodology is successfully evaluated on indoors (COIL-20 database) and outdoors (TSG-20 database) imagery, demonstrating significant impact by learning, outperforming standard local descriptor based methods both in recognition accuracy and processing time.

Proceedings ArticleDOI
06 Nov 2005
TL;DR: This paper presents a novel object-based video coding framework for videos obtained from a static camera that does not require explicit 2D or 3D models of objects and hence is general enough to cater for varying types of objects in the scene.
Abstract: This paper presents a novel object-based video coding framework for videos obtained from a static camera. As opposed to most existing methods, the proposed method does not require explicit 2D or 3D models of objects and hence is general enough to cater for varying types of objects in the scene. The proposed system detects and tracks objects in the scene and learns the appearance model of each object online using incremental principal component analysis (IPCA). Each object is then coded using the coefficients of the most significant principal components of its learned appearance space. Due to smooth transitions between limited number of poses of an object, usually a limited number of significant principal components contribute to most of the variance in the object's appearance space and therefore only a small number of coefficients are required to code the object. The rigid component of the object's motion is coded in terms of its affine parameters. The framework is applied to compressing videos in surveillance and video phone domains. The proposed method is evaluated on videos containing a variety of scenarios such as multiple objects undergoing occlusion, splitting, merging, entering and exiting, as well as a changing background. Results on standard MPEG-7 videos are also presented. For all the videos, the proposed method displays higher Peak Signal to Noise Ratio (PSNR) compared to MPEG-2 and MPEG-4 methods, and provides comparable or better compression.

Journal ArticleDOI
Aishy Amer
TL;DR: The contributions in this paper are the real-time two-stage voting strategy, the monitoring of object changes to handle occlusion and object split, and the spatiotemporal adaptation of the tracking parameters.
Abstract: This paper proposes an automatic object tracking method based on both object segmentation and motion estimation for real-time content-oriented video applications. The method focuses on the issues of speed of execution and reliability in the presence of noise, coding artifacts, shadows, occlusion, and object split. Objects are tracked based on the similarity of their features in successive frames. This is done in three steps: feature extraction, object matching, and feature monitoring. In the first step, objects are segmented and their spatial and temporal features are computed. In the second step, using a nonlinear two-stage voting strategy, each object of the previous frame is matched with an object of the current frame, creating a unique correspondence. In the third step, object changes, such as object occlusion or split, are monitored and object features are corrected. These new features are then used to update the results of the previous steps, creating module interaction. The contributions in this paper are the real-time two-stage voting strategy, the monitoring of object changes to handle occlusion and object split, and the spatiotemporal adaptation of the tracking parameters. Experiments on indoor and outdoor video shots containing over 6000 frames, including deformable objects, multi-object occlusion, noise, and coding and object segmentation artifacts, have demonstrated the reliability and real-time response of the proposed method.

Patent
Jan Erik Solem, Fredrik Kahl
11 Aug 2005
TL;DR: A statistical shape model is used to recover 3D shapes from a 2D representation of a 3D object, and the recovered 3D shape is compared with known 3D to 2D representations of at least one object of the object class.
Abstract: A method, device, system, and computer program for object recognition of a 3D object of a certain object class using a statistical shape model for recovering 3D shapes from a 2D representation of the 3D object and comparing the recovered 3D shape with known 3D to 2D representations of at least one object of the object class.

Journal ArticleDOI
TL;DR: Major contributions of the proposed system are its ability to automatically initiate an object tracking process, its robustness and invariance to scaling and translation, and its computational efficiency, since both recognition and pose estimation rely on the same representation of the object.

Proceedings ArticleDOI
15 Oct 2005
TL;DR: This paper presents a novel feature-based object representation, the attributed relational graph (ARG), for reliable object tracking, and adopts a competitive and efficient dynamic model to adaptively update the object model by adding new stable features as well as deleting inactive features.
Abstract: Two major problems for model-based object tracking are: 1) how to represent an object so that it can effectively be discriminated from the background and other objects; 2) how to dynamically update the model to accommodate changes in the object's appearance and structure. Traditional appearance-based representations (such as color histograms) fail when the object has rich texture. In this paper, we present a novel feature-based object representation, the attributed relational graph (ARG), for reliable object tracking. The object is modeled with invariant features (SIFT), and their relationship is encoded in the form of an ARG that can effectively distinguish the object from the background and other objects. We adopt a competitive and efficient dynamic model to adaptively update the object model by adding new stable features as well as deleting inactive features. A relaxation labeling method is used to match the model graph with the observation to get the best object position. Experiments show that our method tracks reliably even under dramatic appearance changes, occlusions, etc.

Patent
28 Mar 2005
TL;DR: The image object is recognized using pose-specific object recognizers that use the outputs from the pose-specific object detectors and the fused output of the pose-specific object detectors.
Abstract: Methods for image processing for detecting and recognizing an image object include detecting an image object using pose-specific object detectors, and performing fusion of the outputs from the pose-specific object detectors. The image object is recognized using pose-specific object recognizers that use outputs from the pose-specific object detectors and the fused output of the pose-specific object detectors; and by performing fusion of the outputs of the pose-specific object recognizers to recognize the image object.
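
The abstract does not say how the outputs are fused. One simple possibility, shown here purely as an assumption, is score-level fusion by weighted averaging: each pose-specific recognizer returns a confidence per candidate identity, and the identity with the highest fused score wins. The pose names, identities, scores, and function names below are hypothetical.

```python
def fuse_scores(scores, weights=None):
    """Weighted average of pose-specific scores; equal weights when none are given.

    scores:  {pose_name: score}
    weights: {pose_name: weight}, optional
    """
    if not scores:
        raise ValueError("at least one pose-specific score is required")
    if weights is None:
        weights = {pose: 1.0 for pose in scores}
    total = sum(weights[pose] for pose in scores)
    return sum(weights[pose] * s for pose, s in scores.items()) / total


def recognize(recognizer_scores, weights=None):
    """Fuse per-pose recognizer scores for each identity and pick the best one.

    recognizer_scores: {identity: {pose_name: score}}
    Returns (best_identity, {identity: fused_score}).
    """
    fused = {
        identity: fuse_scores(per_pose, weights)
        for identity, per_pose in recognizer_scores.items()
    }
    return max(fused, key=fused.get), fused


# Hypothetical outputs of a frontal and a profile recognizer for two identities.
example_scores = {
    "alice": {"frontal": 0.9, "profile": 0.5},
    "bob": {"frontal": 0.4, "profile": 0.6},
}
winner, fused = recognize(example_scores)
```

Other fusion rules, such as taking the maximum score or weighting each pose by the confidence of its detector, would fit the same interface by changing `fuse_scores` or passing `weights`.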