
Showing papers on "3D single-object recognition" published in 2003


Proceedings ArticleDOI
13 Oct 2003
TL;DR: A low-dimensional global image representation is presented that provides relevant information for place recognition and categorization, and it is shown how such contextual information introduces strong priors that simplify object recognition.
Abstract: While navigating in an environment, a vision system has to be able to recognize where it is and what the main objects in the scene are. We present a context-based vision system for place and object recognition. The goal is to identify familiar locations (e.g., office 610, conference room 941, main street), to categorize new environments (office, corridor, street) and to use that information to provide contextual priors for object recognition (e.g., tables are more likely in an office than a street). We present a low-dimensional global image representation that provides relevant information for place recognition and categorization, and show how such contextual information introduces strong priors that simplify object recognition. We have trained the system to recognize over 60 locations (indoors and outdoors) and to suggest the presence and locations of more than 20 different object types. The algorithm has been integrated into a mobile system that provides real-time feedback to the user.

1,028 citations
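The low-dimensional global representation described above is, in spirit, a scene "gist": oriented energy pooled over a coarse spatial grid. The sketch below is a minimal illustration of that general idea, not the authors' exact features; the parameter choices (4 orientations, a 4x4 grid) are our own assumptions.

```python
import numpy as np

def global_descriptor(gray, n_orient=4, grid=4):
    """Coarse gist-style descriptor: oriented gradient energy pooled
    over a grid x grid spatial layout (illustrative only)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_orient).astype(int), n_orient - 1)
    h, w = gray.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            sl = (slice(i * h // grid, (i + 1) * h // grid),
                  slice(j * w // grid, (j + 1) * w // grid))
            feats.extend(mag[sl][bins[sl] == k].sum() for k in range(n_orient))
    v = np.asarray(feats)
    return v / (np.linalg.norm(v) + 1e-8)          # 4*4*4 = 64-D signature
```

Comparing such signatures against stored per-location exemplars by nearest neighbor is one simple way to realize place recognition, with the retrieved place then supplying the contextual object priors.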


Journal ArticleDOI
TL;DR: A simple framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.
Abstract: There is general consensus that context can be a rich source of information about an object's identity, location and scale. In fact, the structure of many real-world scenes is governed by strong configurational rules akin to those that apply to a single object. Here we introduce a simple framework for modeling the relationship between context and object properties based on the correlation between the statistics of low-level features across the entire scene and the objects that it contains. The resulting scheme serves as an effective procedure for object priming, context driven focus of attention and automatic scale-selection on real-world scenes.

931 citations


Proceedings ArticleDOI
13 Oct 2003
TL;DR: The evaluation shows that local invariant descriptors are an appropriate representation for object classes such as cars, and it underlines the importance of feature selection.
Abstract: We introduce a novel method for constructing and selecting scale-invariant object parts. Scale-invariant local descriptors are first grouped into basic parts. A classifier is then learned for each of these parts, and feature selection is used to determine the most discriminative ones. This approach allows robust part detection, and it is invariant under scale changes; that is, neither the training images nor the test images have to be normalized. The proposed method is evaluated in car detection tasks with significant variations in viewing conditions, and promising results are demonstrated. Different local regions, classifiers and feature selection methods are quantitatively compared. Our evaluation shows that local invariant descriptors are an appropriate representation for object classes such as cars, and it underlines the importance of feature selection.

330 citations
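To make the construct-then-select pipeline concrete, here is a minimal sketch under our own assumptions: k-means stands in for the grouping of local descriptors into basic parts, and a simple firing-rate ratio stands in for the discriminability score (the paper compares several classifiers and feature selection methods).

```python
import numpy as np
from sklearn.cluster import KMeans

def build_parts(descriptors, n_parts=50):
    """Group scale-invariant local descriptors (n x d) into candidate parts."""
    return KMeans(n_clusters=n_parts, n_init=10, random_state=0).fit(descriptors)

def part_scores(km, pos_desc, neg_desc):
    """Score each part by how much more often it fires on object images
    than on background images (our stand-in for feature selection)."""
    pos = np.bincount(km.predict(pos_desc), minlength=km.n_clusters)
    neg = np.bincount(km.predict(neg_desc), minlength=km.n_clusters)
    pos = pos / max(pos.sum(), 1)
    neg = neg / max(neg.sum(), 1)
    return pos / (pos + neg + 1e-8)   # close to 1: fires mostly on the object
```

Keeping only the top-scoring parts is the selection step; detection then checks whether enough selected parts respond in a test image, with no normalization of training or test images required.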


Patent
22 Aug 2003
TL;DR: In this article, a video detection and monitoring method and apparatus utilizes an application-specific object based segmentation and recognition system for locating and tracking an object of interest within a number of sequential frames of data collected by a video camera or similar device.
Abstract: A video detection and monitoring method and apparatus utilizes an application-specific object based segmentation and recognition system for locating and tracking an object of interest within a number of sequential frames of data collected by a video camera or similar device. One embodiment includes a background modeling and object segmentation module to isolate from a current frame at least one segment of the current frame containing a possible object of interest, and a classification module adapted to determine whether or not any segment of the output from the background modeling apparatus includes an object of interest and to characterize any such segment as an object segment. An object segment tracking apparatus is adapted to track the location within a current frame of any object segment and to determine a projected location of the object segment in a subsequent frame.

235 citations


Proceedings ArticleDOI
08 Sep 2003
TL;DR: An approach to recognizing poorly textured objects, which may contain holes and tubular parts, in cluttered scenes under arbitrary viewing conditions is described, and a new edge-based local feature detector that is invariant to similarity transformations is introduced.
Abstract: In this paper we describe an approach to recognizing poorly textured objects, which may contain holes and tubular parts, in cluttered scenes under arbitrary viewing conditions. To this end we develop a number of novel components. First, we introduce a new edge-based local feature detector that is invariant to similarity transformations. The features are localized on edges and a neighbourhood is estimated in a scale invariant manner. Second, the neighbourhood descriptor computed for foreground features is not affected by background clutter, even if the feature is on an object boundary. Third, the descriptor generalizes Lowe's SIFT method to edges. An object model is learnt from a single training image. The object is then recognized in new images in a series of steps which apply progressively tighter geometric restrictions. A final contribution of this work is to allow sufficient flexibility in the geometric representation that objects in the same visual class can be recognized. Results are demonstrated for various object classes including bikes and rackets.

234 citations


Book ChapterDOI
01 Jan 2003
TL;DR: A novel method for the categorization of unfamiliar objects in difficult real-world scenes is presented, which uses a probabilistic formulation to incorporate knowledge about the recognized category as well as the supporting information in the image to segment the object from the background.
Abstract: Historically, figure-ground segmentation has been seen as an important and even necessary precursor for object recognition. In that context, segmentation is mostly defined as a data-driven, that is, bottom-up process. As object recognition and segmentation are heavily intertwined processes in humans, it has been argued that top-down knowledge from object recognition can and should be used for guiding the segmentation process. In this paper, we present a method for the categorization of unfamiliar objects in difficult real-world scenes. The method generates object hypotheses without prior segmentation that can be used to obtain a category-specific figure-ground segmentation. In particular, the proposed approach uses a probabilistic formulation to incorporate knowledge about the recognized category as well as the supporting information in the image to segment the object from the background. This segmentation can then be used for hypothesis verification, to further improve recognition performance. Experimental results show the capacity of the approach to categorize and segment object categories as diverse as cars and cows.

232 citations


Journal ArticleDOI
Heiko Wersing1, Edgar Körner1
TL;DR: This work proposes a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network.
Abstract: There is an ongoing debate over the capabilities of hierarchical neural feedforward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods are still an issue of intense research. We propose a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network. We show that principles of sparse coding, which were previously mostly applied to the initial feature detection stages, can also be employed to obtain optimized intermediate complex features. We suggest a new approach to optimize the learning of sparse features under the constraints of a weight-sharing or convolutional architecture that uses pooling operations to achieve gradual invariance in the feature hierarchy. The approach explicitly enforces symmetry constraints like translation invariance on the feature set. This leads to a dimension reduction in the search space of optimal features and allows determining more efficiently the basis representatives, which achieve a sparse decomposition of the input. We analyze the quality of the learned feature representation by investigating the recognition performance of the resulting hierarchical network on object and face databases. We show that a hierarchy with features learned on a single object data set can also be applied to face recognition without parameter changes and is competitive with other recent machine learning recognition approaches. To investigate the effect of the interplay between sparse coding and processing nonlinearities, we also consider alternative feedforward pooling nonlinearities such as presynaptic maximum selection and sum-of-squares integration. The comparison shows that a combination of strong competitive nonlinearities with sparse coding offers the best recognition performance in the difficult scenario of segmentation-free recognition in cluttered surround. We demonstrate that for both learning and recognition, a precise segmentation of the objects is not necessary.

198 citations
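Schematically, the intermediate-feature learning described above can be written as a sparse-coding objective under the weight-sharing constraint; the formula below is our paraphrase in generic notation, not the paper's exact formulation.

```latex
% Reconstruct the input x as a sparse combination of basis features w_i,
% each replicated across positions by the convolutional sharing operator T;
% lambda trades reconstruction accuracy against sparsity of the activities s_i.
\min_{\{w_i\},\,\{s_i\}} \;\Big\| x - \sum_i s_i\, T(w_i) \Big\|_2^2 \;+\; \lambda \sum_i |s_i|
```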


Proceedings ArticleDOI
18 Jun 2003
TL;DR: Multi-view constraints associated with groups of patches are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true three-dimensional affine and Euclidean models from multiple images and their recognition in a single photograph taken from an arbitrary viewpoint.
Abstract: This paper presents a representation for three-dimensional objects in terms of affine-invariant image patches and their spatial relationships. Multi-view constraints associated with groups of patches are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true three-dimensional affine and Euclidean models from multiple images and their recognition in a single photograph taken from an arbitrary viewpoint. The proposed approach does not require a separate segmentation stage and is applicable to cluttered scenes. Preliminary modeling and recognition results are presented.

168 citations


Journal ArticleDOI
TL;DR: Evidence indicates that the task demands and learning that arise from different forms of feedback determine which computational routines are recruited automatically in object recognition.

145 citations


Patent
15 Jan 2003
TL;DR: In this article, a system and method for iris recognition that incorporates stereoscopic face recognition to recognize an authenticatee is presented.
Abstract: Disclosed herein is a system and method for iris recognition that incorporates stereoscopic face recognition in order to recognize an authenticatee. The system includes two or more face recognition cameras for photographing two or more face images of an authenticatee; a recognition system for receiving the face images photographed by the face recognition cameras and creating stereoscopic face information on the basis of the face images; and one or more iris recognition cameras controlled by the recognition system to photograph focused irises of the authenticatee using the created stereoscopic face information.

130 citations


Proceedings ArticleDOI
16 Jul 2003
TL;DR: By using a large facial image database called CMU PIE database, a probabilistic model of how facial features change as the pose changes is developed, which achieves a better recognition rate than conventional face recognition methods over a much larger range of pose.
Abstract: Current automatic facial recognition systems are not robust against changes in illumination, pose, facial expression and occlusion. In this paper, we propose a face recognition algorithm that addresses the problem of pose change with a probabilistic approach that takes into account the pose difference between probe and gallery images. By using a large facial image database called the CMU PIE database, which contains images of the same set of people taken from many different angles, we have developed a probabilistic model of how facial features change as the pose changes. This model makes our face recognition system more robust to pose changes in the probe image. The experimental results show that this approach achieves a better recognition rate than conventional face recognition methods over a much larger range of pose. For example, when the gallery contains only frontal face images and the probe image varies in pose orientation, the recognition rate remains within a 10% difference until the probe pose differs by more than 45 degrees, whereas the recognition rate of a PCA-based method begins to drop at a difference as small as 10 degrees, and that of a representative commercial system at 30 degrees.
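The model can be summarized schematically as pose-conditioned matching (our notation, not the paper's): the gallery identity is chosen to best explain the probe features under a feature-change density, learned from CMU PIE image pairs, that is conditioned on the pose difference.

```latex
% x_p: probe features at pose theta_p; x_g: gallery features at pose theta_g.
% The density P(x_p | x_g, theta_g -> theta_p) models how features drift
% across that pose change and is estimated from many-angle PIE image pairs.
\hat{g} \;=\; \arg\max_{g}\; P(g)\; P\!\left(x_p \mid x_g,\ \theta_g \to \theta_p\right)
```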

Proceedings ArticleDOI
03 Feb 2003
TL;DR: The object recognition and tracking method as well as the correlative generation process for the needed data are developed within the AR-PDA project.
Abstract: In this paper we describe an image-based object recognition and tracking method for mobile AR devices and the correlative process to generate the required data. The object recognition and tracking are based on the 3D geometries of the related objects. Correspondences between live camera images and 3D models are generated and used to determine the location and orientation of objects in the current scene. The required data for the object recognition is generated from common 3D CAD files using a dedicated process model. The object recognition and tracking method, as well as the correlative generation process for the needed data, are developed within the AR-PDA project. The AR-PDA is a personal digital assistant (e.g. PDA or 3rd generation mobile phone with an integrated camera), which uses AR technology to efficiently support consumers and service forces during their daily tasks.

Proceedings ArticleDOI
27 Oct 2003
TL;DR: The segmentation algorithm is shown to produce results consistent enough to support autonomous collection of datasets for object recognition, which enables often-encountered objects to be segmented without the need for further poking.
Abstract: How a robot should grasp an object depends on its size and shape. Such parameters can be estimated visually, but this is fallible, particularly for unrecognized, unfamiliar objects. Failure will result in a clumsy grasp or glancing blow against the object. If the robot does not learn something from the encounter, then it will be apt to repeat the same mistake again and again. This paper shows how to recover information about an object's extent by poking it, either accidentally or deliberately. Poking an object makes it move, and motion is a powerful cue for visual segmentation. The periods immediately before and after the moment of impact turn out to be particularly informative, and give visual evidence for the boundary of the object that is well suited to segmentation using graph cuts. The segmentation algorithm is shown to produce results consistent enough to support autonomous collection of datasets for object recognition, which enables often-encountered objects to be segmented without the need for further poking.

01 Jan 2003
TL;DR: In this paper, a unified framework for view-based 3D object recognition using shock graphs is proposed, which is based on an improved spectral characterization of the shock graph structure that not only drives a powerful indexing mechanism, but also drives a matching algorithm that can accommodate noise and occlusion.
Abstract: The shock graph is an emerging shape representation for object recognition, in which a 2-D silhouette is decomposed into a set of qualitative parts, captured in a directed acyclic graph. Although a number of approaches have been proposed for shock graph matching, these approaches do not address the equally important indexing problem. We extend our previous work in both shock graph matching and hierarchical structure indexing to propose the first unified framework for view-based 3-D object recognition using shock graphs. The heart of the framework is an improved spectral characterization of shock graph structure that not only drives a powerful indexing mechanism (to retrieve similar candidates from a large database), but also drives a matching algorithm that can accommodate noise and occlusion. We describe the components of our system and evaluate its performance using both unoccluded and occluded queries. The large set of recognition trials (over 25,000) from a large database (over 1400 views) represents one of the most ambitious shock graph-based recognition experiments conducted to date.
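As a flavor of how a spectral characterization can drive indexing, the sketch below (our own simplification, not the paper's encoding) reduces a shock graph's adjacency structure to a fixed-length vector suitable for a nearest-neighbor index:

```python
import numpy as np

def spectral_signature(adj, k=10):
    """Fixed-length index key for a DAG: magnitudes of the k largest
    eigenvalues of the symmetrized adjacency matrix, zero-padded."""
    sym = 0.5 * (adj + adj.T)                 # symmetrize the directed graph
    eig = np.linalg.eigvalsh(sym)             # real spectrum of a symmetric matrix
    mags = np.sort(np.abs(eig))[::-1][:k]     # largest magnitudes first
    return np.pad(mags, (0, max(0, k - mags.size)))
```

Structurally similar graphs yield nearby signatures, so a query's signature retrieves a shortlist of candidate database views, which the more expensive noise- and occlusion-tolerant matcher then verifies.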

Journal ArticleDOI
TL;DR: A shape-space-based approach for invariant object representation and recognition is presented; it is invariant to similarity transformations, is relatively insensitive to noise and occlusion, and can potentially be used for 3D object recognition.

Patent
01 Apr 2003
TL;DR: In this article, the authors describe methods and systems for high-speed observation and recognition of an object which is within or passes through a designated area using 3D image data and tomography, stereo-photogrammetry, range finding and/or structured illumination.
Abstract: Methods and systems are described herein for high-speed observation and recognition of an object which is within or passes through a designated area using 3D image data and tomography, stereo-photogrammetry, range finding and/or structured illumination. Such a security system may comprise multiple pairs of 3D sensors surrounding the designated area such as a portal or doorway. Alternatively, it may comprise a 3D sensor which is capable of acquiring 3D image data of an object that is situated in front of it. Using at least one 3D sensor and a 3D data collection technique such as structured illumination and/or stereo-photogrammetry, 3D image data of the object is acquired. 3D image data representing the object's surface may be determined based on the acquired data and may be compared to data representing known objects to identify the object. Methods for fast processing of 3D data and recognition of objects are also described.

Patent
Mihoko Kunii1, Kenji Nagao1
05 Sep 2003
TL;DR: In this article, a method is disclosed in which an image including an object to be learned is entered and divided into partial images; a feature-extraction matrix is calculated for each class of partial images, and the object is recognized by comparing stored features with the features extracted from the image of the object to be recognized.
Abstract: According to the disclosed method, an image is learned beforehand, and an image of an object to be recognized is then entered and the object recognized. An image including an object to be learned is entered and divided into partial images. The partial images are further classified into plural classes, and a matrix for feature extraction is calculated for each class. A feature is calculated using this matrix and stored. Subsequently, an image including an object to be recognized is entered and divided into partial images. From these partial images, the features of the object to be recognized are calculated using the obtained feature-extraction matrices; the similarity between the stored features and the features of the object to be recognized is then calculated, and the object is recognized.
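One plausible reading of the per-class "matrix for feature extraction" is a subspace projection learned from that class's partial images; the PCA-style sketch below rests entirely on that assumption.

```python
import numpy as np

def class_feature_matrix(patches, k=16):
    """Learn a feature-extraction matrix for one class from its partial
    images (rows of `patches`): the top-k principal directions."""
    mu = patches.mean(axis=0)
    _, _, vt = np.linalg.svd(patches - mu, full_matrices=False)
    return mu, vt[:k]                         # class mean and k x d projection

def extract_feature(mu, W, patch):
    """Project one partial image into the class's feature space."""
    return W @ (patch - mu)

def similarity(f_stored, f_query):
    """Cosine similarity between stored and query features."""
    denom = np.linalg.norm(f_stored) * np.linalg.norm(f_query) + 1e-8
    return float(f_stored @ f_query / denom)
```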

Patent
11 Jan 2003
TL;DR: In this paper, an object in a video sequence of frames is tracked by object masks generated for frames in the sequence, and frames are skipped at larger intervals, but more frequent frames are processed when high motion occurs.
Abstract: An object in a video sequence of frames is tracked by object masks generated for frames in the sequence. Macroblocks are motion compensated. Blocks matching entirely within a prior-frame object mask are used to generate an average object motion. When the average motion is below a motion threshold, frames are skipped at larger intervals, but more frequent frames are processed when high motion occurs. When the macroblock best matches a prior-frame block that has the object's boundary passing through the block, the macroblock is uncertain and is sub-divided into smaller sub-blocks that are again motion compensated. Sub-blocks matching blocks within the object mask in the base frame are added to the new object mask for the current frame while sub-blocks matching a block containing the object boundary are uncertain and can again be sub-divided to further refine the object boundary. Frame skipping and adaptive-size blocks on the object boundary reduce computational load.
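The two primitives the patent combines are motion-compensated block matching against the previous frame and classification of each matched block against the prior object mask, with boundary blocks subdivided and re-matched. A rough sketch of both, using SAD matching and coverage thresholds of our own choosing:

```python
import numpy as np

def best_match(block, prev, y, x, search=8):
    """Exhaustive SAD block matching around (y, x) in the previous frame."""
    h, w = block.shape
    best, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy and yy + h <= prev.shape[0] and 0 <= xx and xx + w <= prev.shape[1]:
                sad = np.abs(prev[yy:yy + h, xx:xx + w].astype(int)
                             - block.astype(int)).sum()
                if sad < best:
                    best, best_dy, best_dx = sad, dy, dx
    return best_dy, best_dx

def classify_block(mask_prev, y, x, h, w):
    """Inside / outside / uncertain, by prior-mask coverage of the match."""
    frac = mask_prev[y:y + h, x:x + w].mean()
    if frac > 0.99: return "inside"     # joins the new object mask
    if frac < 0.01: return "outside"    # background
    return "uncertain"                  # subdivide into sub-blocks and recurse
```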

Proceedings ArticleDOI
18 Jun 2003
TL;DR: Experiments using the Yale Face Database B confirm that a combination of photometric alignment and RANSAC provides a simple but effective method for object recognition under varying illumination conditions.
Abstract: For object recognition under varying illumination conditions, we propose a method based on photometric alignment. The photometric alignment is known as a technique that models both diffuse reflection components and attached shadows under a distant point light source by using three basis images. However, in order to reliably reproduce these components in a test image, we have to take into account outliers such as specular reflection components and shadows in the test image. Accordingly, our proposed method utilizes Random Sample Consensus (RANSAC), which has been used successfully for estimating basis images. In the present study, we have conducted experiments using the Yale Face Database B and confirmed that a combination of the photometric alignment and RANSAC provides a simple but effective method for object recognition under varying illumination conditions.
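The core computation is fitting the three-basis-image Lambertian model to a test image while ignoring outlier pixels (specular highlights, cast shadows). A minimal sketch, assuming the basis images are flattened into the columns of an n_pix x 3 matrix and intensities are normalized to [0, 1]; the iteration count and threshold are our assumptions:

```python
import numpy as np

def photometric_coeffs_ransac(basis, test, iters=200, tol=0.05, seed=0):
    """Fit test ~ basis @ a with pixel-level RANSAC so that specularities
    and shadows in the test image do not bias the coefficients."""
    rng = np.random.default_rng(seed)
    n = basis.shape[0]
    best_count, best_mask = -1, None
    for _ in range(iters):
        idx = rng.choice(n, size=3, replace=False)    # minimal pixel sample
        try:
            a = np.linalg.solve(basis[idx], test[idx])
        except np.linalg.LinAlgError:
            continue                                  # degenerate sample
        inliers = np.abs(basis @ a - test) < tol
        if inliers.sum() > best_count:
            best_count, best_mask = inliers.sum(), inliers
    # refit on all inliers with ordinary least squares
    a, *_ = np.linalg.lstsq(basis[best_mask], test[best_mask], rcond=None)
    return a, best_mask                               # coefficients, inlier map
```

The inlier mask simultaneously flags the outlier pixels, and the reconstruction basis @ a is the relit model image that recognition then compares against the test image.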

Patent
Jian Wang1, Jian-Lai Zhou1, Jiang Wu1, Hongyun Yang1, Xianfang Wang1, Wenli Zhu1 
15 Dec 2003
TL;DR: In this paper, a list of machine-generated objects based on the electronic ink input is generated and the list including the first machine generated object and alternative machine generated objects is used as a dictionary for converting the speech input.
Abstract: Systems, methods, and computer-readable media for processing electronic ink receive an electronic ink input; convert the electronic ink input to a first machine-generated object using handwriting recognition; display the first machine-generated object on a display; receive speech input; convert the speech input to a second machine-generated object using speech recognition; generate a list of machine-generated objects based on the electronic ink input, the list including the first machine-generated object and alternative machine-generated objects and functioning as a dictionary for converting the speech input; and replace the first machine-generated object with the second machine-generated object. The machine-generated objects may correspond to words, lines, and/or other groupings of machine-generated text. A user may confirm that the second machine-generated object should replace the first machine-generated object and the system will perform the replacement. The systems and methods may generate a list of alternative machine-generated object candidates to the first machine-generated object based on handwriting recognition of the electronic ink input alone or in combination with a statistical language model.

Journal ArticleDOI
TL;DR: It is demonstrated that dynamic information can play a role in visual object recognition and suggest that object representations can encode spatiotemporal information.
Abstract: Although both the object and the observer often move in natural environments, the effect of motion on visual object recognition has not been well documented. The authors examined the effect of a reversal in the direction of rotation on both explicit and implicit memory for novel, 3-dimensional objects. Participants viewed a series of continuously rotating objects and later made either an old-new recognition judgment or a symmetric-asymmetric decision. For both tasks, memory for rotating objects was impaired when the direction of rotation was reversed at test. These results demonstrate that dynamic information can play a role in visual object recognition and suggest that object representations can encode spatiotemporal information. An enduring problem in visual cognition addresses how objects are represented internally for purposes of recognition. The apparent ease with which people recognize objects in everyday situations conceals the immense complexity of the underlying mechanisms. Despite numerous studies of object recognition, researchers are still far from a complete understanding of the recognition processes and the underlying representations (for a recent review, see Tarr & Bülthoff, 1998). Behavioral studies of object recognition have for the most part used static visual stimuli. This tendency is echoed by noting that theories of object recognition are generally accounts of how static objects are recognized. In the real world, however, both the object and the observer often move about in the environment. Once this movement is noted, it then becomes natural to ask, "What are the effects of motion on object recognition?" The lack of investigation of this question is perhaps caused by a belief that shape is an intrinsic attribute of an object, but motion is an external attribute. This line of reasoning suggests that object recognition should not be affected by an object's motion. This belief is also supported by physiological evidence that demonstrates functional specialization in the visual system. For example, under one interpretation, the ventral stream processes shape information, whereas the dorsal stream processes spatial information, including motion (i.e., the what vs. where distinction; Ungerleider & Mishkin, 1982). Nonetheless, the perceptual literature has provided some examples in which motion is important for the perception of shape. One such example is the kinetic depth effect (Wallach & O'Connell, 1953), in which motion makes the structure of the object (e.g., a wireframe cube) available to observers. A similar effect is found for biological motion, in which motion makes the perception of biological forms possible, even without any shape information (Johansson, 1973). Perhaps the most forceful argument for the importance of motion in perception comes from Gibson (1979). The theory of ecological optics proposes that visual information resides in the optic array and that invariants are extracted from dynamic optic arrays. According to Gibson, motion is paramount, for "invariants of structure do not exist except in relation to variants" (p. 87); that is, invariants are meaningful only when there is change in the optic array, and hence motion. Aside from its role in the perception of structure, motion has been shown to play a role in other visual functions. For example, Kozlowski and Cutting (1977) filmed a walking person in the dark with point-lights attached to major joints.
When shown such a film, observers can reliably identify the gender of the walker on the basis of pure motion signals in the display. The direction of motion can also bias the identification of ambiguous figures (Bernstein & Cooper, 1997). Finally, research in representational momentum has clearly demonstrated that motion can bias memory for an object's spatial location (Freyd, 1987).


Proceedings ArticleDOI
03 Dec 2003
TL;DR: The proposed recognition scheme is based on CCHs used in a classical learning framework that facilitates a "winner-takes-all" strategy across different scales, and is robust and invariant to scaling and translation.
Abstract: Robust techniques for object recognition, image segmentation and pose estimation are essential for robotic manipulation and grasping. We present a novel approach for object recognition and pose estimation based on color cooccurrence histograms (CCHs). The two problems addressed in this paper are: i) robust recognition and segmentation of the object in the scene, and ii) estimation of the object's pose using an appearance-based approach. The proposed recognition scheme is based on the CCHs used in a classical learning framework that facilitates a "winner-takes-all" strategy across different scales. The detected "window of attention" is compared with training images of the object for which the pose is known. The orientation of the object is estimated as the weighted average among competitive poses, in which the weight increases in proportion to the degree of matching between the training and segmented image histograms. The major advantages of the proposed two-step appearance-based method are its robustness and invariance to scaling and translation. The method is also computationally efficient, since both recognition and pose estimation rely on the same representation of the object.
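A color cooccurrence histogram counts how often pairs of quantized colors occur within a given distance of each other, which makes it more geometry-aware than a plain color histogram while remaining translation invariant. A minimal sampled version, assuming color quantization has already been done (the sampling is our shortcut for speed):

```python
import numpy as np

def color_cooccurrence_hist(img_q, n_colors, max_dist=5, samples=20000, seed=0):
    """CCH over an (h, w) image of quantized color indices in [0, n_colors):
    counts of color pairs at pixel pairs no farther apart than max_dist."""
    rng = np.random.default_rng(seed)
    h, w = img_q.shape
    cch = np.zeros((n_colors, n_colors), dtype=np.int64)
    ys = rng.integers(0, h, samples)
    xs = rng.integers(0, w, samples)
    y2 = np.clip(ys + rng.integers(-max_dist, max_dist + 1, samples), 0, h - 1)
    x2 = np.clip(xs + rng.integers(-max_dist, max_dist + 1, samples), 0, w - 1)
    np.add.at(cch, (img_q[ys, xs], img_q[y2, x2]), 1)
    return cch / cch.sum()
```

Comparing a search window's CCH against trained model histograms with histogram intersection, np.minimum(h1, h2).sum(), is one standard way to drive the winner-takes-all search across scales.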

Patent
07 May 2003
TL;DR: In this article, a system using three-dimensional information as a front end for a two-dimensional image comparison system is presented, where the three-dimensional information obtained is indicative of a known user's face.
Abstract: A system using three-dimensional information as a front end for a two-dimensional image comparison system. Three-dimensional information indicative of a known user's face is obtained. This three-dimensional information is used to generate two-dimensional views from different perspectives, including different poses and/or different lighting effects, which populate the database of a two-dimensional recognition system. Images are then recognized using conventional two-dimensional recognition techniques, but the two-dimensional recognition is carried out on an improved database.

Proceedings ArticleDOI
Aishy Amer1
07 May 2003
TL;DR: A stable object tracking method based on both object segmentation and motion estimation, using a non-linear voting strategy and the monitoring and correction of segmentation errors, is proposed for content-oriented applications.
Abstract: In the context of content-oriented applications such as video surveillance and video retrieval this paper proposes a stable object tracking method based on both object segmentation and motion estimation. The method focuses on the issues of speed of execution and reliability in the presence of noise, coding artifacts, shadows, occlusion, and object split. Objects are tracked based on the similarity of their features in successive images. This is done in three steps: object segmentation and motion estimation, object matching, and feature monitoring and correction. In the first step, objects are segmented and their spatial and temporal features are computed. In the second step, using a non-linear voting strategy, each object of the previous image is matched with an object of the current image creating a unique correspondence. In the third step, object segmentation errors, such as when objects occlude or split, are detected and corrected. These new data are then used to update the results of previous steps, i.e., object segmentation and motion estimation. The contributions in this paper are the multi-voting strategy and the monitoring and correction of segmentation errors. Extensive experiments on indoor and outdoor video shots containing over 6000 images, including images with multi-object occlusion, noise, and coding artifacts have demonstrated the reliability and real-time response of the proposed method.

Proceedings ArticleDOI
15 Dec 2003
TL;DR: A framework for pose-invariant face recognition using a pose alignment method is described, which performs better and statistically increases the recognition rate by 17.75% for poses rotated within 30 degrees.
Abstract: A framework for pose-invariant face recognition using a pose alignment method is described in this paper. The main idea is to normalize the face view in depth to a frontal view as the input to the face recognition framework. Concretely, an input face image is first normalized using the iris information, and then a pose subspace algorithm is employed to perform pose estimation. To model pose invariance, the face region is divided into three rectangles with different mapping parameters in the pose alignment algorithm, so the affine transformation parameters associated with the different poses can be used to align the input pose image to the frontal view. To evaluate this algorithm objectively, the views after pose alignment are fed into a frontal face recognition system. Experimental results show that the method performs better, statistically increasing the recognition rate by 17.75% for poses rotated within 30 degrees.

Book ChapterDOI
TL;DR: This paper studies face recognition performance at different image resolutions, finds a low-resolution bound for automatic face recognition, and uses an eigentransformation-based hallucination method to improve image resolution.
Abstract: In video surveillance, the faces of interest are often of small size. Image resolution is an important factor affecting face recognition by humans and computers. In this paper, we study face recognition performance at different image resolutions. For automatic face recognition, a low-resolution bound is found through experiments. We use an eigentransformation-based hallucination method to improve the image resolution. The hallucinated face images are not only helpful for recognition by humans, but also make the automatic recognition procedure easier, since they emphasize facial differences by adding high-frequency details.

Patent
20 Mar 2003
TL;DR: In this article, a tracking system for a robot device is presented, comprising hierarchical recognition parts 15 with different image recognition levels: a skin-color recognition part 11, a face recognition part 12, an individual recognition part 13, and a voice-direction recognition part 14 for recognizing the direction of voice.
Abstract: PROBLEM TO BE SOLVED: To enable a robot to perform robust tracking of an object despite unstable recognition after the tracking is suspended, whether because the robot performs an interrupting action or because the external environment changes. SOLUTION: The tracking system of the robot device comprises: hierarchical recognition parts 15 with different image recognition levels, consisting of a skin-color recognition part 11, a face recognition part 12, an individual recognition part 13, and a voice-direction recognition part 14 for recognizing the direction of voice; a recognition integrating part 21 having a control part that starts tracking a person's face based on the recognition result of the individual recognition part 13 and continues the tracking based on the results of the other recognition parts, such as the face recognition part 12, when the individual recognition part 13 fails; and a predicting part 31 that continues tracking toward the predicted direction of the object, based on the results obtained until just before, if no results can be obtained from the other recognition parts.
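The control logic amounts to a fallback cascade across recognizers ordered from most to least specific, with a motion predictor as the last resort; the toy loop below illustrates that structure (the interfaces are our invention, not the patent's):

```python
def track_step(recognizers, predictor):
    """One tracking update. `recognizers` is ordered from most specific
    (individual ID) to least (face, skin color, sound direction); each
    returns a target direction or None on failure."""
    for recognize in recognizers:
        target = recognize()
        if target is not None:
            predictor.update(target)   # remember the latest reliable result
            return target
    return predictor.predict()         # all failed: follow predicted direction
```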

Journal ArticleDOI
TL;DR: A component-based approach to visual object recognition rooted in supervised learning allows for a vision system that is more robust against changes in an object's pose or illumination.
Abstract: A component-based approach to visual object recognition rooted in supervised learning allows for a vision system that is more robust against changes in an object's pose or illumination. Learning figures prominently in the study of visual systems from the viewpoints of visual neuroscience and computer vision. Whereas visual neuroscience concentrates on mechanisms that let the cortex adapt its circuitry and learn a new task, computer vision aims at devising effectively trainable systems. Vision systems that learn and adapt are one of the most important trends in computer vision research. They might offer the only solution to developing robust, reusable vision systems.

Proceedings Article
01 Jan 2003
TL;DR: In this study, fundamental properties of Gabor features, construction of the simple feature space, and invariant search operations in the feature space are discussed in more detail.
Abstract: Invariant object recognition is one of the most challenging problems in computer vision. The authors propose a simple Gabor feature space, which has been successfully applied, for example, to invariant face detection for extracting facial features in demanding environments. In the proposed feature space, illumination, rotation, scale, and translation invariant recognition of objects can be realized within a reasonable amount of computation. In this study, fundamental properties of Gabor features, construction of the simple feature space, and invariant search operations in the feature space are discussed in more detail.
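A Gabor feature space of this kind is built by sampling a filter bank over frequencies and orientations; taking response magnitudes discards the carrier phase, which is what buys tolerance to small translations. A minimal sketch with illustrative parameters:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(freq, theta, sigma=3.0, size=21):
    """Complex Gabor kernel: Gaussian envelope times a complex carrier
    at spatial frequency `freq` and orientation `theta` (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotated coordinate
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.exp(2j * np.pi * freq * xr)

def gabor_features(gray, freqs=(0.1, 0.2, 0.3), n_thetas=4):
    """Per-pixel magnitude responses over the bank: an H x W x 12 feature
    volume whose channels index (frequency, orientation) pairs."""
    responses = [np.abs(fftconvolve(gray, gabor_kernel(f, t * np.pi / n_thetas),
                                    mode='same'))
                 for f in freqs for t in range(n_thetas)]
    return np.stack(responses, axis=-1)
```

Approximate rotation and scale invariance can then be obtained by searching over shifts of the (frequency, orientation) sampling grid, which is the kind of invariant search operation such a feature space is designed to support.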