
Showing papers on "3D single-object recognition published in 2002"


Journal ArticleDOI
TL;DR: This research demonstrates that the LEM, together with the proposed generic line-segment Hausdorff distance measure, provides a new method for face coding and recognition.
Abstract: The automatic recognition of human faces presents a significant challenge to the pattern recognition research community. Typically, human faces are very similar in structure with minor differences from person to person. They are actually within one class of "human face". Furthermore, lighting conditions change, while facial expressions and pose variations further complicate the face recognition task as one of the difficult problems in pattern analysis. This paper proposes a novel concept: namely, that faces can be recognized using a line edge map (LEM). The LEM, a compact face feature, is generated for face coding and recognition. A thorough investigation of the proposed concept is conducted which covers all aspects of human face recognition, i.e. face recognition under (1) controlled/ideal conditions and size variations, (2) varying lighting conditions, (3) varying facial expressions, and (4) varying pose. The system performance is also compared with the eigenface method, one of the best face recognition techniques, and with reported experimental results of other methods. A face pre-filtering technique is proposed to speed up the search process. It is very encouraging to find that the proposed face recognition technique has performed better than the eigenface method in most of the comparison experiments. This research demonstrates that the LEM, together with the proposed generic line-segment Hausdorff distance measure, provides a new method for face coding and recognition.

505 citations
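
As an illustration of the kind of distance underlying this approach, the sketch below computes the classic symmetric Hausdorff distance between two sets of edge-pixel coordinates. It is a simplified point-set stand-in, not the paper's generic line-segment Hausdorff distance, which compares line segments and adds orientation and displacement terms.

    import numpy as np

    def directed_hausdorff(a, b):
        # a, b: (N, 2) and (M, 2) arrays of edge-pixel coordinates.
        # For each point in a, find the distance to its nearest
        # neighbour in b; the directed distance is the worst case.
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
        return d.min(axis=1).max()

    def hausdorff(a, b):
        # Symmetric Hausdorff distance between two edge maps.
        return max(directed_hausdorff(a, b), directed_hausdorff(b, a))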


Proceedings ArticleDOI
11 Aug 2002
TL;DR: An interactive vision system for a robot that finds an object specified by a user and brings it to the user; the user may provide additional information via speech, such as pointing out mistakes, choosing the correct object from multiple candidates, or giving the relative position of the object.
Abstract: This paper describes an interactive vision system for a robot that finds an object specified by a user and brings it to the user. The system first registers object models automatically. When the user specifies an object, the system tries to recognize the object automatically. When the recognition result is shown to the user, the user may provide additional information via speech such as pointing out mistakes, choosing the correct object from multiple candidates, or giving the relative position of the object. Based on the advice, the system tries again to recognize the object. Experiments are described using real-world refrigerator scenes.

241 citations


01 Jan 2002
TL;DR: In this paper, Principal Component Analysis (PCA) is applied to a set of training similarity plots, mapping them to a lower dimensional space that contains less unwanted variation and offers better separability of the data.
Abstract: We present a novel technique for motion-based recognition of individual gaits in monocular sequences. Recent work has suggested that the image self-similarity plot of a moving person/object is a projection of its planar dynamics. Hence we expect that these plots encode much information about gait motion patterns, and that they can serve as good discriminants between gaits of different people. We propose a method for gait recognition that uses similarity plots the same way that face images are used in eigenface-based face recognition techniques. Specifically, we first apply Principal Component Analysis (PCA) to a set of training similarity plots, mapping them to a lower dimensional space that contains less unwanted variation and offers better separability of the data. Recognition of a new gait is then done via standard pattern classification of its corresponding similarity plot within this simpler space. We use the k-nearest neighbor rule and the Euclidean distance. We test this method on a data set of 40 sequences of six different walking subjects, at 30 FPS each. We use the leave-one-out cross-validation technique to obtain an unbiased estimate of the recognition rate of 93%.

232 citations
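
The pipeline the abstract describes maps directly onto standard tools. The hedged sketch below uses scikit-learn's PCA and a 1-nearest-neighbour classifier with Euclidean distance; the file names and the choice of 20 components are illustrative assumptions, not values from the paper.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    # train_plots: (n_sequences, H, W) stack of self-similarity plots,
    # train_labels: subject identity per sequence (hypothetical files).
    train_plots = np.load("train_plots.npy")
    train_labels = np.load("train_labels.npy")
    X = train_plots.reshape(len(train_plots), -1)   # flatten each plot

    pca = PCA(n_components=20)                      # assumed dimensionality
    X_low = pca.fit_transform(X)

    knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
    knn.fit(X_low, train_labels)

    def recognize(plot):
        # Classify a new gait by projecting its similarity plot
        # into the same low-dimensional space.
        return knn.predict(pca.transform(plot.reshape(1, -1)))[0]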


Proceedings ArticleDOI
01 Jan 2002
TL;DR: A novel approach to appearance-based object recognition, based on matching of local image features, that reliably recognises objects under very different viewing conditions; the matching scheme is invariant to piecewise-affine image deformations yet remains very discriminative.
Abstract: A novel approach to appearance-based object recognition is introduced. The proposed method, based on matching of local image features, reliably recognises objects under very different viewing conditions. First, distinguished regions of data-dependent shape are robustly detected. On these regions, local affine frames are established using several affine invariant constructions. Direct comparison of photometrically normalised colour intensities in local, geometrically aligned frames results in a matching scheme that is invariant to piecewise-affine image deformations, but still remains very discriminative. The potential of the approach is experimentally verified on COIL-100 and SOIL-47, publicly available image databases. On SOIL-47, a 100% recognition rate is achieved for a single training view per object. On COIL-100, a 99.9% recognition rate is obtained for 18 training views per object. Robustness to severe occlusions is demonstrated by only a moderate decrease of recognition performance in an experiment where half of each test image is erased.

231 citations
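
A rough sketch of the matching step only: once regions have been geometrically aligned into local frames (the distinguished-region detection and frame construction are omitted here), patches are photometrically normalised and compared directly. The 0.8 threshold is an arbitrary placeholder, not a value from the paper.

    import numpy as np

    def normalise(patch):
        # Photometric normalisation: zero mean, unit variance, so the
        # comparison tolerates affine changes of intensity.
        p = patch.astype(float)
        return (p - p.mean()) / (p.std() + 1e-8)

    def match_score(patch_a, patch_b):
        # Direct comparison of normalised intensities in aligned
        # frames; equivalent to normalised cross-correlation.
        a, b = normalise(patch_a), normalise(patch_b)
        return (a * b).mean()

    def best_match(query, model_patches, threshold=0.8):
        # model_patches: patches sampled in the local affine frames of
        # the stored object views.
        scores = [match_score(query, m) for m in model_patches]
        i = int(np.argmax(scores))
        return (i, scores[i]) if scores[i] >= threshold else (None, scores[i])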


Proceedings ArticleDOI
20 May 2002
TL;DR: This work proposes a face recognition algorithm which can use any number of gallery images per subject, captured at arbitrary poses, andAny number of probe images, again captured at arbitrarily poses, to solve face recognition problems.
Abstract: In many face recognition tasks, the pose of the probe and gallery images are different. In other cases, multiple gallery or probe images may be available, each captured from a different pose. We propose a face recognition algorithm which can use any number of gallery images per subject, captured at arbitrary poses, and any number of probe images, again captured at arbitrary poses. The algorithm operates by estimating the eigen light-field of the subject's head from the input gallery or probe images. Matching between the probe and gallery is then performed using the eigen light-fields. We present results on the CMU (Carnegie Mellon University) PIE (Pose, Illumination and Expression) and the FERET (FacE REcognition Technology) face databases.

122 citations


Book
14 Jun 2002
TL;DR: Contents: introduction; linear filters for pattern recognition; nonlinear filtering for image recognition; distortion-invariant pattern recognition systems; image recognition based on statistical detection theory; neural-network-based automatic target recognition; hyperspectral automatic object recognition; laser radar automatic target recognition; radar signature recognition; wavelets.
Abstract: Introduction; linear filters for pattern recognition; nonlinear filtering for image recognition; distortion-invariant pattern recognition systems; image recognition based on statistical detection theory; neural-network-based automatic target recognition; hyperspectral automatic object recognition; laser radar automatic target recognition; radar signature recognition; wavelets for image recognition; pattern recognition for anticounterfeiting and security systems; applications of pattern recognition techniques to road sign recognition and tracking; optical and optoelectronic implementation of linear and nonlinear filters.

115 citations


Proceedings Article
01 Jan 2002
TL;DR: It is demonstrated that the resulting object segmentation eliminates false positives for the part detection, while overcoming occlusion and weak contours for the low-level edge detection.
Abstract: Segmentation and recognition have long been treated as two separate processes. We propose a mechanism based on spectral graph partitioning that readily combines the two processes into one. A part-based recognition system detects object patches, supplies their partial segmentations, as well as knowledge about the spatial configurations of the object. The goal of patch grouping is to find a set of patches that conform best to the object configuration, while the goal of pixel grouping is to find a set of pixels that have the best low-level feature similarity. Through pixel-patch interactions and between-patch competition encoded in the solution space, these two processes are realized in one joint optimization problem. The globally optimal partition is obtained by solving a constrained eigenvalue problem. We demonstrate that the resulting object segmentation eliminates false positives for the part detection, while overcoming occlusion and weak contours for the low-level edge detection.

105 citations
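
For reference, the unconstrained two-way version of such a spectral partition fits in a few lines: relax the normalised cut to the generalised eigenvalue problem (D - W)y = lambda * Dy and split on the second eigenvector. The paper's actual contribution, the constraints encoding pixel-patch interactions and between-patch competition, is not reproduced here.

    import numpy as np
    from scipy.linalg import eigh

    def normalized_cut(W):
        # W: symmetric non-negative affinity matrix over pixels/patches.
        D = np.diag(W.sum(axis=1))
        # Generalised symmetric eigenproblem (D - W) y = lambda D y;
        # eigh returns eigenvalues in ascending order.
        vals, vecs = eigh(D - W, D)
        y = vecs[:, 1]                # second smallest eigenvector
        return y >= np.median(y)      # boolean two-way partition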


Journal ArticleDOI
TL;DR: A new method for 3D object recognition which uses segment-based stereo vision: an object is identified in a cluttered environment and its position and orientation are determined accurately, enabling a robot to pick up the object and manipulate it.
Abstract: We propose a new method for 3D object recognition which uses segment-based stereo vision. An object is identified in a cluttered environment and its position and orientation (6 dof) are determined accurately enabling a robot to pick up the object and manipulate it. The object can be of any shape (planar figures, polyhedra, free-form objects) and partially occluded by other objects. Segment-based stereo vision is employed for 3D sensing. Both CAD-based and sensor-based object modeling subsystems are available. Matching is performed by calculating candidates for the object position and orientation using local features, verifying each candidate, and improving the accuracy of the position and orientation by an iteration method. Several experimental results are presented to demonstrate the usefulness of the proposed method.

95 citations


Patent
12 Oct 2002
TL;DR: In this paper, a vision-based pointer tracking and object recognition system is presented, which includes a tracking system and a recognition system which are configured to accurately and efficiently track the fingertip of a user in order to allow the user to perform coarse segmentation of an object in preparation for object classification.
Abstract: A vision-based pointer tracking and object recognizing system is presented. The system comprises a computer system including a processor, a memory coupled with the processor, an input coupled with the processor for receiving a series of image frames from a camera, and an output coupled with the processor for outputting a path of a pointer. The computer system includes a tracking system and a recognition system which are configured to accurately and efficiently track the fingertip of a user in order to allow the user to perform coarse segmentation of an object in preparation for object classification. The tracking system includes a Robust Tracking Filter that accommodates uncertainties. The object recognition system incorporates an invariant object recognition technique that allows objects to be recognized despite changes in user viewpoint.

84 citations


Proceedings ArticleDOI
M. Boukraa, S. Ando
08 Jul 2002
TL;DR: A machine vision system that uses a Radio Frequency (RF) Tag device to identify objects prior to locating them visually and how the concept of integrated tag based systems can provide new insights in image processing and machine vision is described.
Abstract: In image sensing and processing, ambiguities arise when only one source of information is used. Thus, 3D object recognition and localization is a difficult task when using an intensity image as the single input. This paper presents a machine vision system that uses a Radio Frequency (RF) Tag device to identify objects prior to locating them visually. The tag system consists of a tag reader that can interrogate, and receive radio signals from, tags attached to objects that characterize them. Laying the basis of an object model database shared on a network, we perform a knowledge-based recognition task where the information retrieved from the database query serves as prior knowledge. The recognition algorithm used is a matching with projective invariants. We describe how this system can be used for efficient object registration and how the concept of integrated tag-based systems can provide new insights in image processing and machine vision.

75 citations
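
A minimal example of a projective invariant of the kind such matching relies on: the cross-ratio of four collinear points, which is preserved by any perspective image of the line. The abstract does not specify which invariants the paper uses, so this is purely illustrative.

    import numpy as np

    def cross_ratio(p1, p2, p3, p4):
        # Cross-ratio of four collinear points: the classic projective
        # invariant, unchanged under any perspective view of the line.
        d = lambda a, b: np.linalg.norm(np.asarray(a) - np.asarray(b))
        return (d(p1, p3) * d(p2, p4)) / (d(p1, p4) * d(p2, p3))

    # The same value is measured in the image and in the stored model,
    # so candidates retrieved from the RF-tag database query can be
    # verified by comparing invariants.
    model_cr = cross_ratio((0, 0), (1, 0), (3, 0), (7, 0))
    image_cr = cross_ratio((10, 5), (12, 6), (16, 8), (24, 12))  # projected line
    print(abs(model_cr - image_cr) < 1e-6)   # True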


Book ChapterDOI
22 Nov 2002
TL;DR: This work investigates the performance of the HMAX model of object recognition in cortex on the task of face detection using natural images and shows how visual features of intermediate complexity can be learned in HMAX using a simple learning rule.
Abstract: Models of object recognition in cortex have so far been mostly applied to tasks involving the recognition of isolated objects presented on blank backgrounds. However, ultimately models of the visual system have to prove themselves in real world object recognition tasks. Here we took a first step in this direction: We investigated the performance of the HMAX model of object recognition in cortex recently presented by Riesenhuber & Poggio [1,2] on the task of face detection using natural images. We found that the standard version of HMAX performs rather poorly on this task, due to the low specificity of the hardwired feature set of C2 units in the model (corresponding to neurons in intermediate visual area V4) that do not show any particular tuning for faces vs. background. We show how visual features of intermediate complexity can be learned in HMAX using a simple learning rule. Using this rule, HMAX outperforms a classical machine vision face detection system presented in the literature. This suggests an important role for the set of features in intermediate visual areas in object recognition.
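
A heavily simplified sketch of the C2 stage discussed above: correlate the image with a bank of templates and globally max-pool each response map. Real HMAX interposes Gabor S1 and pooled C1 layers, and the paper's point is that the templates should be learned from natural images rather than hardwired; treat this as a schematic, not the model.

    import numpy as np
    from scipy.signal import correlate2d

    def c2_features(image, templates):
        # Correlate with each template (stand-in for S2 tuning) and
        # take the global maximum of the response map (C2 pooling).
        feats = []
        for t in templates:
            r = correlate2d(image, t, mode="valid")
            feats.append(r.max())
        return np.array(feats)

    # Templates can be "learned" with a simple rule of the kind the
    # paper suggests: sample patches from responses to training faces.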

Patent
Young-Su Moon, Chang-Yeong Kim
27 Dec 2002
TL;DR: In this article, a method and apparatus for tracking a color-based object in video sequences are presented: an initial object area to be tracked is assigned in one frame, along with an initial object effective window containing that area.
Abstract: A method and apparatus for tracking a color-based object in video sequences are provided. According to the method, an initial object area to be tracked is assigned in one frame of the video sequence, and an initial object effective window containing the initial object area is assigned. The frame following the one containing the assigned initial object area is received as a newly input image, and an object search window containing the initial object area and the initial object effective window is assigned in the newly input image. Then, the model histogram of the initial object area, for a predetermined bin resolution value, and the input histogram of the image in the object search window are calculated, and an object probability image is derived from them. From the calculated object probability image, using a predetermined method, the new object area to which the object has moved is obtained in the next frame, with the previous (tracked) object area as the starting point. By doing so, the object in the video sequence is tracked. Using the continuously extracted video object region information, an object-based interactive additional-information service for movies, TV programs, and CFs can be implemented effectively.
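
The classic OpenCV histogram back-projection plus mean-shift loop follows the same recipe: model histogram of the initial area, per-frame object probability image, window update. The video file name, initial window, and 16-bin hue histogram below are illustrative assumptions; the patent does not describe its method in these exact terms.

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("sequence.avi")     # hypothetical input
    ok, frame = cap.read()
    x, y, w, h = 200, 150, 60, 80              # assumed initial object area

    # Model histogram of the initial object area (hue channel; 16 bins
    # standing in for the patent's "bin resolution value").
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    roi = hsv[y:y+h, x:x+w]
    hist = cv2.calcHist([roi], [0], None, [16], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # Object probability image, then shift the window to the
        # new object area in this frame.
        prob = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        _, (x, y, w, h) = cv2.meanShift(prob, (x, y, w, h), term)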

Patent
30 Aug 2002
TL;DR: In this article, the authors proposed a hierarchical model for the recognition of objects in an image, where the objects may consist of an arbitrary number of parts that are allowed to move with respect to each other.
Abstract: The present invention provides a method for the recognition of objects in an image, where the objects may consist of an arbitrary number of parts that are allowed to move with respect to each other. In the offline phase the invention automatically learns the relative movements of the single object parts from a sequence of example images and builds a hierarchical model that incorporates a description of the single object parts, the relations between the parts, and an efficient search strategy. This is done by analyzing the pose variations (e.g., variations in position, orientation, and scale) of the single object parts in the example images. The poses can be obtained by an arbitrary similarity measure for object recognition, e.g., normalized cross correlation, Hausdorff distance, generalized Hough transform, the modification of the generalized Hough transform, or the similarity measure. In the online phase the invention uses the hierarchical model to efficiently find the entire object in the search image. During the online phase only valid instances of the object are found, i.e., the object parts are not searched for in the entire image but only in a restricted portion of parameter space that is defined by the relations between the object parts within the hierarchical model, which facilitates an efficient search and makes a subsequent validation step unnecessary.

Journal ArticleDOI
TL;DR: This paper describes and analyzes techniques which have been developed for object representation and recognition, and proposes a set of specifications that all object recognition systems should strive to meet.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: A statistical transition graph model is used to enhance recognition of domain-specific characters, such as ball counts and game scores in baseball videos; real-time speed and significantly improved recognition accuracy are achieved.
Abstract: We have developed generic and domain-specific video algorithms for caption text extraction and recognition in digital video. Our system includes several unique features: for caption box location, we combine the compressed-domain features derived from DCT coefficients and motion vectors. Long-term temporal consistency is employed to enhance localization performance. For character segmentation, we use a single-pass threshold free approach combining classification and projection to address noisy segmentation, text intensity variation, and algorithm complexity. In recognition, we use Zernike moments to achieve more accurate recognition performance. Finally, domain knowledge is explored and a statistical transition graph model is used to enhance recognition of domain-specific characters, such as ball counts and game score of baseball videos. The algorithms achieved real-time speed and significantly improved recognition accuracy. Furthermore, although the experiments were conducted in baseball videos only, these algorithms (except the transition model) are general and can be used in other applications, such as news and films.
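
As a pointer to how the Zernike-moment recognition step might look, the sketch below uses the mahotas library to compute rotation-invariant moment magnitudes for a segmented character and classifies by nearest prototype. The radius heuristic, degree 8, and the prototype dictionary are assumptions for illustration, not the paper's settings.

    import mahotas
    import numpy as np

    def character_features(glyph):
        # glyph: binary image of one segmented caption character.
        # Zernike moment magnitudes are rotation-invariant, making them
        # robust descriptors for noisy video-caption glyphs.
        radius = max(glyph.shape) // 2
        return mahotas.features.zernike_moments(glyph, radius, degree=8)

    def classify(glyph, prototypes):
        # prototypes: dict mapping a character label to its feature
        # vector, built from clean training glyphs (hypothetical data).
        f = character_features(glyph)
        return min(prototypes, key=lambda c: np.linalg.norm(f - prototypes[c]))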

Journal ArticleDOI
TL;DR: In this paper, in the context of Principal Component Analysis for face recognition, the concept of self-eigenfaces is introduced and the color information is also incorporated in the face recognition stage.

Proceedings ArticleDOI
13 May 2002
TL;DR: Improvements are proposed which increase the face detection and recognition rate and the overall performance of the system.
Abstract: The objective of this work is the integration and optimization of an automatic face detection and recognition system for video indexing applications. The system is composed of a face detection stage presented previously which provides good results maintaining a low computational cost. The recognition stage is based on the Principal Components Analysis (PCA) approach which has been modified to cope with the video indexing application. After the integration of the two stages, several improvements are proposed which increase the face detection and recognition rate and the overall performance of the system. Good results have been obtained using the MPEG-7 video content set used in the MPEG-7 evaluation group.

Patent
27 Feb 2002
TL;DR: In this paper, a control system for an automotive vehicle coupled to a countermeasure system having a countermeasure includes an object sensor system (18) generating an object signal, an object distance signal, an object azimuth position signal, and an object relative velocity signal.
Abstract: A control system (10) for an automotive vehicle (50) coupled to a countermeasure system having a countermeasure includes an object sensor system (18) generating an object signal, an object distance signal, an object azimuth position signal, and an object relative velocity signal. The control system (10) further includes an object classifier coupled to the object sensor system (18), generating an object classification signal in response to the object signal, and a controller coupled to the object classifier for activating the countermeasure (42) in response to the object distance, object azimuth position, relative velocity, and the object classification signal.

Journal ArticleDOI
TL;DR: Investigation of the role of extraretinal information on real-world object recognition found observers performed better in an old/new object recognition task when view changes were causing by viewer movement than when they were caused by object rotation.
Abstract: Many previous studies of object recognition have found view-dependent recognition performance when view changes are produced by rotating objects relative to a stationary viewing position. However, the assumption that an object rotation is equivalent to an observer viewpoint change ignores the potential contribution of extraretinal information that accompanies observer movement. In four experiments, we investigated the role of extraretinal information on real-world object recognition. As in previous studies focusing on the recognition of spatial layouts across view changes, observers performed better in an old/new object recognition task when view changes were caused by viewer movement than when they were caused by object rotation. This difference between viewpoint and orientation changes was due not to the visual background, but to the extraretinal information available during real observer movements. Models of object recognition need to consider other information available to an observer in addition to the retinal projection in order to fully understand object recognition in the real world.


Proceedings ArticleDOI
01 Feb 2002
TL;DR: A memory assistance tool is developed that uses this approach to help people with slight to moderate memory loss keep track of important objects around the house and is currently deployed in a prototype smart home.
Abstract: Tracking is frequently considered a frame-to-frame operation. As such, object recognition techniques are generally too slow to be used for tracking. There are domains, however, where the objects of interest do not move most of the time. In these domains, it is possible to watch for activity in the scene and then apply object recognition techniques to find the object's new location. This makes tracking a discrete process of watching for object disappearances and reappearances. We have developed a memory assistance tool that uses this approach to help people with slight to moderate memory loss keep track of important objects around the house. The system is currently deployed in a prototype smart home.

Proceedings ArticleDOI
01 Jan 2002
TL;DR: This work frames the problem of object recognition from edge cues in terms of determining whether individual edge pixels belong to the target object or to clutter, based on the configuration of edges in their vicinity, and applies a cascade of classifiers to the image to save computation and solve the aperture problem.
Abstract: We frame the problem of object recognition from edge cues in terms of determining whether individual edge pixels belong to the target object or to clutter, based on the configuration of edges in their vicinity. A classifier solves this problem by computing sparse, localized edge features at image locations determined at training time. In order to save computation and solve the aperture problem, we apply a cascade of these classifiers to the image, each of which computes edge features over larger image regions than its predecessors. Experiments apply this approach to the recognition of real objects with holes and wiry components in cluttered scenes under arbitrary out-of-image-plane rotation.
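
The cascade structure the abstract describes can be expressed generically: each stage is a classifier over a progressively larger context window, and candidates rejected early never reach the expensive later stages. The sketch below assumes stages are supplied as callables; the paper's actual edge features are not reproduced.

    import numpy as np

    def cascade_classify(pixels, stages):
        # stages: classifiers ordered from small to large context
        # windows; each is a function (pixels, candidates) -> boolean
        # keep-mask over the current candidates. Early, cheap stages
        # discard most clutter edges, so the expensive large-context
        # stages run on only a few survivors.
        candidates = np.arange(len(pixels))
        for stage in stages:
            keep = stage(pixels, candidates)
            candidates = candidates[keep]
            if candidates.size == 0:
                break
        return candidates   # edge pixels attributed to the target object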

01 Jan 2002
TL;DR: A face detection algorithm for color images in the presence of various lighting conditions as well as complex backgrounds is developed and successful detection of faces with different sizes, color, position, scale, orientation, 3D pose, and expression in several photo collections is demonstrated.
Abstract: Face recognition has received substantial attention from researchers in biometrics, computer vision, pattern recognition, and cognitive psychology communities because of the increased attention being devoted to security, man-machine communication, content-based image retrieval, and image/video coding. We have proposed two automated recognition paradigms to advance face recognition technology. Three major tasks involved in face recognition systems are: (i) face detection, (ii) face modeling, and (iii) face matching. We have developed a face detection algorithm for color images in the presence of various lighting conditions as well as complex backgrounds. Our detection method first corrects the color bias by a lighting compensation technique that automatically estimates the parameters of reference white for color correction. We overcame the difficulty of detecting the low-luma and high-luma skin tones by applying a nonlinear transformation to the YCbCr color space. Our method generates face candidates based on the spatial arrangement of detected skin patches. We constructed eye, mouth, and face boundary maps to verify each face candidate. Experimental results demonstrate successful detection of faces with different sizes, color, position, scale, orientation, 3D pose, and expression in several photo collections. 3D human face models augment the appearance-based face recognition approaches to assist face recognition under the illumination and head pose variations. For the two proposed recognition paradigms, we have designed two methods for modeling human faces based on (i) a generic 3D face model and an individual's facial measurements of shape and texture captured in the frontal view, and (ii) alignment of a semantic face graph, derived from a generic 3D face model, onto a frontal face image.
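
For orientation, here is a crude YCrCb skin detector of the kind this pipeline builds on. The fixed Cr/Cb box is a common rule of thumb, not the paper's method; the paper's actual contributions (lighting compensation, the nonlinear transform for low- and high-luma skin, and the eye/mouth/boundary verification maps) are deliberately omitted.

    import cv2
    import numpy as np

    def skin_mask(bgr):
        # Convert to YCrCb and keep pixels inside a fixed Cr/Cb box;
        # a widely used heuristic range, assumed here for illustration.
        ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
        lower = np.array([0, 133, 77], dtype=np.uint8)
        upper = np.array([255, 173, 127], dtype=np.uint8)
        mask = cv2.inRange(ycrcb, lower, upper)
        # Clean up speckle; connected components of the result become
        # face candidates for downstream verification.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)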

Proceedings ArticleDOI
28 Oct 2002
TL;DR: The result shows that Gabor filter responses give good feature representation in a car recognition system using a camera as sensor to recognize a moving car.
Abstract: This paper describes a car recognition system using a camera as sensor to recognize a moving car. There are four main stages in this process: object detection, object segmentation, feature extraction using Gabor filters and Gabor jet matching to the car database. The experiment was conducted for various types of car with various illuminations (daylight and night). The result shows that Gabor filter responses give good feature representation. The system achieved an average recognition rate of 93.88%.
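
A hedged sketch of a Gabor feature stage: filter the grayscale image at several orientations and collect response statistics into a "jet" for matching against the car database. Kernel size, sigma, wavelength, and the mean-magnitude pooling are illustrative choices, not the paper's parameters.

    import cv2
    import numpy as np

    def gabor_jet(gray, n_orientations=8):
        # Response of a small Gabor filter bank; the magnitudes form a
        # feature vector that can be matched against stored jets.
        feats = []
        for i in range(n_orientations):
            theta = i * np.pi / n_orientations
            kern = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                      lambd=10.0, gamma=0.5, psi=0)
            resp = cv2.filter2D(gray, cv2.CV_32F, kern)
            feats.append(np.abs(resp).mean())
        return np.array(feats)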

Journal ArticleDOI
TL;DR: The ability to use FSTs constructed from images rendered from computer-aided design models to recognize real objects in real images is demonstrated and test results for a set of metal machined parts are presented.
Abstract: We advance new active object recognition algorithms that classify rigid objects and estimate their pose from intensity images. Our algorithms automatically detect if the class or pose of an object is ambiguous in a given image, reposition the sensor as needed, and incorporate data from multiple object views in determining the final object class and pose estimate. A probabilistic feature space trajectory (FST) in a global eigenspace is used to represent 3D distorted views of an object and to estimate the class and pose of an input object. Confidence measures for the class and pose estimates, derived using the probabilistic FST object representation, determine when additional observations are required as well as where the sensor should be positioned to provide the most useful information. We demonstrate the ability to use FSTs constructed from images rendered from computer-aided design models to recognize real objects in real images and present test results for a set of metal machined parts.

Journal ArticleDOI
TL;DR: A new algorithm for object detection in a single static image from an image sequence, using prediction procedures based on the Kalman filter; the result is much more compact and accurate than with the 2D algorithm alone.
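
Since the TL;DR names Kalman prediction, a minimal constant-velocity predict/update pair is sketched below. The state layout and the noise covariances Q and R are assumptions; the paper's actual measurement model is not described in the TL;DR.

    import numpy as np

    # Constant-velocity Kalman filter for an object's image position.
    # State x = [px, py, vx, vy]; only position is measured.
    F = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)   # state transition (dt = 1)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # measurement model
    Q = np.eye(4) * 1e-2                        # assumed process noise
    R = np.eye(2) * 1.0                         # assumed measurement noise

    def predict(x, P):
        # Predict where the object will be in the next frame.
        return F @ x, F @ P @ F.T + Q

    def update(x, P, z):
        # Correct the prediction with a measured position z = [px, py].
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        return x, P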

Proceedings ArticleDOI
11 Aug 2002
TL;DR: An algebraic constraint for planar shape recognition across multiple views is derived, based on the rank of a matrix of Fourier-domain descriptor coefficients of the shape in different views; the phase of the same measure is used to compute correspondence between boundary points.
Abstract: Multiview studies in computer vision have concentrated on the constraints satisfied by individual primitives such as points and lines. Not much attention has been paid to the properties of a collection of primitives in multiple views, which could be studied in the spatial domain or in an appropriate transform domain. We derive an algebraic constraint for planar shape recognition across multiple views based on the rank of a matrix of Fourier domain descriptor coefficients of the shape in different views. We also show how correspondence between points on the boundary can be computed for matching shapes using the phase of a measure for recognition.
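
A small sketch of the rank test the abstract describes: compute Fourier descriptors of the boundary in each view, stack them into a matrix, and check its numerical rank, which stays low for views of the same planar shape (the paper derives the exact bound) and grows otherwise. The number of coefficients and the tolerance are illustrative assumptions.

    import numpy as np

    def fourier_descriptors(boundary, k=10):
        # boundary: (N, 2) contour points of the planar shape, with all
        # views resampled to the same number of points.
        z = boundary[:, 0] + 1j * boundary[:, 1]   # complex representation
        Z = np.fft.fft(z)
        return Z[1:k+1]                            # low-order coefficients

    def multiview_rank(views, k=10, tol=1e-6):
        # One row of descriptors per view; low rank signals that the
        # views are consistent with a single planar shape.
        M = np.array([fourier_descriptors(v, k) for v in views])
        return np.linalg.matrix_rank(M, tol=tol)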

Journal ArticleDOI
TL;DR: A new Bayesian framework for partially occluded object recognition based on matching extracted local features on a one-to-one basis with object features to compute the generalized likelihoods under two different statistical models for occlusion.
Abstract: In this paper, we present a new Bayesian framework for partially occluded object recognition based on matching extracted local features on a one-to-one basis with object features. We introduce two different statistical models for occlusion: one model assumes that each feature in the model can be occluded independent of whether any other features are occluded, whereas the second model uses spatially correlated occlusion to represent the extent of occlusion. Using these models, the object recognition problem reduces to finding the object hypothesis with largest generalized likelihood. We develop fast algorithms for finding the optimal one-to-one correspondence between scene features and object features to compute the generalized likelihoods under both models. We conduct experiments illustrating the differences between the two occlusion models using different quantitative metrics. We also evaluate the recognition performance of our algorithms using examples extracted from object silhouettes and synthetic aperture radar imagery, and illustrate the performance advantages of our approach over alternative algorithms.
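
The first (independent) occlusion model admits a compact generalized log-likelihood, sketched below with normalization constants dropped. Each model feature is either occluded with probability p_occ or matched to a scene feature with Gaussian position error; the values of p_occ and sigma are placeholders, and the spatially correlated model is not reproduced.

    import numpy as np

    def generalized_log_likelihood(matches, p_occ=0.3, sigma=1.0):
        # matches: for each model feature, the distance to its best
        # one-to-one scene match, or None if the feature is unmatched
        # (treated as occluded under the independence assumption).
        ll = 0.0
        for d in matches:
            if d is None:
                ll += np.log(p_occ)
            else:
                ll += np.log(1 - p_occ) - 0.5 * (d / sigma) ** 2
        return ll

    # The recognized object is the hypothesis maximizing this value:
    # best = max(hypotheses, key=lambda h: generalized_log_likelihood(h.matches))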

Journal ArticleDOI
J. Park
TL;DR: An adaptive handwritten word recognition method based on interaction between flexible character classification and deductive decision making is presented and the experimental result shows that the proposed method has advantages in producing valid answers using the same number of features as conventional methods.
Abstract: An adaptive handwritten word recognition method is presented. A recursive architecture based on interaction between flexible character classification and deductive decision making is developed. The recognition process starts from the initial coarse level using a minimum number of features, then increases the discrimination power by adding other features adaptively and recursively until the result is accepted by the decision maker. For the computational aspect of a feasible solution, a unified decision metric, recognition confidence, is derived from two measurements: pattern confidence, an evaluation of absolute confidence using shape features, and lexical confidence, an evaluation of the relative string dissimilarity in the lexicon. Practical implementation and experimental results in reading the handwritten words of the address components of US mail pieces are provided. Up to a 4 percent improvement in recognition performance is achieved compared to a nonadaptive method. The experimental result shows that the proposed method has advantages in producing valid answers using the same number of features as conventional methods.

01 Jan 2002
TL;DR: This paper presents a hybrid, knowledge-based approach for object recognition in video sequences, synopsised as follows: first, moving regions are extracted using an active contour technique; second, visual descriptions of the moving regions are extracted and matched with those defined for recognizable objects.
Abstract: Intelligent video analysis is a problem of great importance for applications such as surveillance and automatic annotation. We present, in this paper, a hybrid, knowledge-based approach for object recognition in video sequences. Objects are modelled, at the signal level, through the visual descriptors defined by MPEG-7, the ISO standard for description of audiovisual content, and at the semantic level, through the semantic relations defined by MPEG-7. The method of video analysis is synopsised as follows: first, moving regions are extracted using an active contour technique. Second, visual descriptions of the moving regions are extracted and matched with the ones defined for recognizable objects.