
Showing papers on "3D single-object recognition published in 2006"


Journal ArticleDOI
TL;DR: A novel representation for three-dimensional objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches is introduced, allowing the acquisition of true 3D affine and Euclidean models from multiple unregistered images, as well as their recognition in photographs taken from arbitrary viewpoints.
Abstract: This article introduces a novel representation for three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true 3D affine and Euclidean models from multiple unregistered images, as well as their recognition in photographs taken from arbitrary viewpoints. The proposed approach does not require a separate segmentation stage, and it is applicable to highly cluttered scenes. Modeling and recognition results are presented.

458 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper investigates the application of the SIFT approach in the context of face authentication, and proposes and tests different matching schemes using the BANCA database and protocol, showing promising results.
Abstract: Several pattern recognition and classification techniques have been applied to the biometrics domain. Among them, an interesting technique is the Scale Invariant Feature Transform (SIFT), originally devised for object recognition. Although SIFT features have emerged as very powerful image descriptors, their use in the face analysis context has never been systematically investigated. This paper investigates the application of the SIFT approach in the context of face authentication. In order to determine the real potential and applicability of the method, different matching schemes are proposed and tested using the BANCA database and protocol, showing promising results.

386 citations
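A core step in SIFT-based matching schemes like the one above is Lowe's nearest/second-nearest-neighbour ratio test, which discards ambiguous correspondences. A minimal pure-Python sketch (the function name and toy 2-D descriptors are ours for illustration; real SIFT descriptors are 128-dimensional):

```python
import math

def match_ratio_test(desc_a, desc_b, ratio=0.8):
    """Match two descriptor sets with Lowe's ratio test.

    desc_a, desc_b: lists of equal-length feature vectors (toy tuples here).
    A match (i, j) is kept only if the nearest neighbour of desc_a[i] is
    significantly closer than its second-nearest neighbour.
    """
    matches = []
    for i, a in enumerate(desc_a):
        # Distances from descriptor a to every descriptor in desc_b
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```

The ratio test rejects a match when a second candidate is nearly as close as the best one, which is exactly the situation where repetitive texture or clutter makes the correspondence unreliable.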


Proceedings ArticleDOI
17 Jun 2006
TL;DR: The Implicit Shape Model for object class detection is combined with the multi-view specific object recognition system of Ferrari et al. to detect object instances from arbitrary viewpoints.
Abstract: We present a novel system for generic object class detection. In contrast to most existing systems which focus on a single viewpoint or aspect, our approach can detect object instances from arbitrary viewpoints. This is achieved by combining the Implicit Shape Model for object class detection proposed by Leibe and Schiele with the multi-view specific object recognition system of Ferrari et al. After learning single-view codebooks, these are interconnected by so-called activation links, obtained through multi-view region tracks across different training views of individual object instances. During recognition, these integrated codebooks work together to determine the location and pose of the object. Experimental results demonstrate the viability of the approach and compare it to a bank of independent single-view detectors.

268 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: The performance of the proposed multi-object class detection approach is competitive with state-of-the-art approaches dedicated to a single object class recognition problem.
Abstract: In this paper we propose an approach capable of simultaneous recognition and localization of multiple object classes using a generative model. A novel hierarchical representation allows us to represent individual images as well as various object classes in a single, scale- and rotation-invariant model. The recognition method is based on a codebook representation where appearance clusters built from edge-based features are shared among several object classes. A probabilistic model allows for reliable detection of various objects in the same image. The approach is highly efficient due to fast clustering and matching methods capable of dealing with millions of high-dimensional features. The system shows excellent performance on several object categories over a wide range of scales, in-plane rotations, background clutter, and partial occlusions. The performance of the proposed multi-object class detection approach is competitive with state-of-the-art approaches dedicated to a single object class recognition problem.

266 citations


Book ChapterDOI
TL;DR: This chapter describes a system for constructing 3D metric models from multiple images taken with an uncalibrated handheld camera, recognizing these models in new images, and precisely solving for object pose.
Abstract: Many applications of 3D object recognition, such as augmented reality or robotic manipulation, require an accurate solution for the 3D pose of the recognized objects. This is best accomplished by building a metrically accurate 3D model of the object and all its feature locations, and then fitting this model to features detected in new images. In this chapter, we describe a system for constructing 3D metric models from multiple images taken with an uncalibrated handheld camera, recognizing these models in new images, and precisely solving for object pose. This is demonstrated in an augmented reality application where objects must be recognized, tracked, and superimposed on new images taken from arbitrary viewpoints without perceptible jitter. This approach not only provides for accurate pose, but also allows for integration of features from multiple training images into a single model that provides for more reliable recognition.

196 citations


Journal ArticleDOI
TL;DR: In this paper, a novel object recognition approach based on affine invariant regions is presented, which actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions.
Abstract: We present a novel object recognition approach based on affine invariant regions. It actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions. After producing an initial set of matches, the method gradually explores the surrounding image areas, recursively constructing more and more matching regions, increasingly farther from the initial ones. This process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. The approach includes a mechanism for capturing the relationships between multiple model views and exploiting these for integrating the contributions of the views at recognition time. This is based on an efficient algorithm for partitioning a set of region matches into groups lying on smooth surfaces. Integration is achieved by measuring the consistency of configurations of groups arising from different model views. Experimental results demonstrate the strength of the approach in dealing with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. All presented techniques can extend any viewpoint-invariant feature extractor.

186 citations


Journal ArticleDOI
TL;DR: A method for automatically obtaining object representations suitable for retrieval from generic video shots that includes associating regions within a single shot to represent a deforming object and an affine factorization method that copes with motion degeneracy.
Abstract: We describe a method for automatically obtaining object representations suitable for retrieval from generic video shots. The object representation consists of an association of frame regions. These regions provide exemplars of the object's possible visual appearances. Two ideas are developed: (i) associating regions within a single shot to represent a deforming object; (ii) associating regions from the multiple visual aspects of a 3D object, thereby implicitly representing 3D structure. For the association we exploit temporal continuity (tracking) and wide baseline matching of affine covariant regions. In the implementation there are three areas of novelty: First, we describe a method to repair short gaps in tracks. Second, we show how to join tracks across occlusions (where many tracks terminate simultaneously). Third, we develop an affine factorization method that copes with motion degeneracy. We obtain tracks that last throughout the shot, without requiring a 3D reconstruction. The factorization method is used to associate tracks into object-level groups, with common motion. The outcome is that separate parts of an object that are not simultaneously visible (such as the front and back of a car, or the front and side of a face) are associated together. In turn this enables object-level matching and recognition throughout a video. We illustrate the method on the feature film "Groundhog Day." Examples are given for the retrieval of deforming objects (heads, walking people) and rigid objects (vehicles, locations).

162 citations


Journal ArticleDOI
TL;DR: A recognition framework based on the concept of the so-called generic learning is introduced as an attempt to boost the performance of traditional appearance-based recognition solutions in the one training sample application scenario.

108 citations


Book ChapterDOI
13 May 2006
TL;DR: An instance based machine learning algorithm and system for real-time object classification and human action recognition which can help to build intelligent surveillance systems are presented.
Abstract: In this paper we present an instance-based machine learning algorithm and system for real-time object classification and human action recognition which can help to build intelligent surveillance systems. The proposed method makes use of object silhouettes to classify objects and the actions of humans present in a scene monitored by a stationary camera. An adaptive background subtraction model is used for object segmentation. A template-matching-based supervised learning method is adopted to classify objects into classes such as human, human group, and vehicle, and human actions into predefined classes such as walking, boxing, and kicking, by making use of object silhouettes.

104 citations
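The adaptive background subtraction step described above can be sketched as a running-average model over grayscale frames. This is a simplified stand-in for whatever model the authors actually use; the function names, the learning rate, and the threshold are ours:

```python
def update_background(bg, frame, alpha=0.05):
    """Running-average background model: bg <- (1 - alpha)*bg + alpha*frame.

    bg and frame are 2-D lists of grayscale values; alpha controls how
    quickly the background adapts to scene changes.
    """
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]

def silhouette(bg, frame, thresh=30):
    """Binary foreground mask: 1 where the frame differs from the
    background by more than the threshold, else 0."""
    return [[1 if abs(f - b) > thresh else 0 for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]
```

The resulting binary silhouettes are what a template-matching classifier would then compare against stored class exemplars.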


Journal ArticleDOI
TL;DR: A new scheme that merges color- and shape-invariant information for object recognition and is robust against changing illumination, camera viewpoint, object pose, and noise is proposed.
Abstract: In this paper, we propose a new scheme that merges color- and shape-invariant information for object recognition. To obtain robustness against photometric changes, color-invariant derivatives are computed first. Color invariance is an important aspect of any object recognition scheme, as color changes considerably with the variation in illumination, object pose, and camera viewpoint. These color invariant derivatives are then used to obtain similarity invariant shape descriptors. Shape invariance is equally important as, under a change in camera viewpoint and object pose, the shape of a rigid object undergoes a perspective projection on the image plane. Then, the color and shape invariants are combined in a multidimensional color-shape context which is subsequently used as an index. As the indexing scheme makes use of a color-shape invariant context, it provides a highly discriminative information cue robust against varying imaging conditions. The matching function of the color-shape context allows for fast recognition, even in the presence of object occlusion and clutter. Experimental results show that the method recognizes rigid objects with high accuracy in 3-D complex scenes and is robust against changing illumination, camera viewpoint, object pose, and noise.

97 citations
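As a toy illustration of the kind of photometric invariance such schemes build on (the paper itself uses color-invariant derivatives; this sketch only shows the simpler normalized-rgb chromaticity invariant):

```python
def normalized_rgb(pixel):
    """Map an (R, G, B) pixel to intensity-normalized chromaticity (r, g, b).

    Under a uniform scaling of illumination intensity,
    (R, G, B) -> (s*R, s*G, s*B), the chromaticities are unchanged --
    a basic photometric invariant.
    """
    r, g, b = pixel
    s = r + g + b
    if s == 0:
        return (0.0, 0.0, 0.0)
    return (r / s, g / s, b / s)
```

Doubling the brightness of a pixel leaves its chromaticity untouched, which is why descriptors built on such quantities survive illumination changes.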


Proceedings ArticleDOI
20 Aug 2006
TL;DR: A new DBN model structure with state duration to model human interacting activities based on dynamic Bayesian network is proposed, which combines the global features with local ones harmoniously.
Abstract: Activity recognition is significant in intelligent surveillance. In this paper, we present a novel approach to the recognition of interacting activities based on dynamic Bayesian network (DBN). In this approach the features representing the object motion are divided into two classes: global features and local features, which are at two different spatial scales. Global features describe object motion at a large spatial scale and relations between objects or between the object and environment, and local ones represent the motion details of objects of interest. We propose a new DBN model structure with state duration to model human interacting activities. This DBN model structure combines the global features with local ones harmoniously. The effectiveness of this novel approach is demonstrated by experiment.

Patent
29 Aug 2006
TL;DR: In this article, a mobile device is used to electronically capture image data of a real-world object; the image data is used to identify information related to the real-world object, and that information is used to interact with software to control an aspect of an electronic game or a second device local to the mobile device.
Abstract: Systems and methods of interacting with a virtual space, in which a mobile device is used to electronically capture image data of a real-world object, the image data is used to identify information related to the real-world object, and the information is used to interact with software to control at least one of: (a) an aspect of an electronic game; and (b) a second device local to the mobile device. Contemplated systems and methods can be used for gaming, in which the image data can be used to identify a name of the real-world object, to classify it, to identify it as a player in the game, or to identify it as a goal object or as otherwise having some other value in the game.

Proceedings ArticleDOI
07 Jun 2006
TL;DR: By studying face geometry, this work is able to determine which type of facial expression has been carried out, thus building an expression classifier which is capable of recognizing faces with different expressions.
Abstract: Face recognition is one of the most intensively studied topics in computer vision and pattern recognition. Facial expression, which changes face geometry, usually has an adverse effect on the performance of a face recognition system. On the other hand, face geometry is a useful cue for recognition. Taking these into account, we utilize the idea of separating geometry and texture information in a face image and model the two types of information by projecting them into separate PCA spaces which are specially designed to capture the distinctive features among different individuals. Subsequently, the texture and geometry attributes are re-combined to form a classifier which is capable of recognizing faces with different expressions. Finally, by studying face geometry, we are able to determine which type of facial expression has been carried out, thus building an expression classifier. Numerical validations of the proposed method are given.
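Projecting into separate PCA spaces, as described above, rests on extracting principal components. A minimal pure-Python sketch of finding the leading component of 2-D samples by power iteration (illustrative only; the real system would operate on high-dimensional texture and geometry vectors):

```python
def leading_component(samples, iters=100):
    """Leading principal component of 2-D samples via power iteration
    on the 2x2 covariance matrix (a pure-Python PCA sketch)."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    centered = [(x - mx, y - my) for x, y in samples]
    # Entries of the 2x2 covariance matrix
    cxx = sum(x * x for x, _ in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(iters):
        # Multiply by the covariance matrix, then renormalize
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
        v = (w[0] / norm, w[1] / norm)
    return v
```

Projecting each mean-centered sample onto the components found this way gives the low-dimensional coordinates that a PCA-space classifier compares.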

Journal ArticleDOI
TL;DR: The application of this particular visual pattern recognition (ViPR) technology to a variety of robotics applications: object recognition, navigation, manipulation, and human-machine interaction is described.
Abstract: Recent advances in computer vision have given rise to a robust and invariant visual pattern recognition technology that is based on extracting a set of characteristic features from an image. Such features are obtained with the scale invariant feature transform (SIFT), which represents the variations in brightness of the image around the point of interest. Recognition performed with these features has been shown to be quite robust in realistic settings. This paper describes the application of this particular visual pattern recognition (ViPR) technology to a variety of robotics applications: object recognition, navigation, manipulation, and human-machine interaction. The paper also describes the technology in more detail and presents a business case for visual pattern recognition in the field of robotics and automation.

Book ChapterDOI
Stan Z. Li, Rufeng Chu, Meng Ao, Lun Zhang, Ran He
05 Jan 2006
TL;DR: A highly accurate, real-time face recognition system for cooperative user applications is presented; it is based on a local feature representation, and statistical learning is applied to learn the most effective features and classifiers for building the face detection and recognition engines.
Abstract: In this paper, we present a highly accurate, real-time face recognition system for cooperative user applications. The novelties are: (1) a novel design of the camera hardware, and (2) a learning-based procedure for effective face and eye detection and recognition with the resulting imagery. The hardware minimizes the influence of environmental lighting and delivers face images with frontal lighting, which avoids many problems in subsequent face processing. The face detection and recognition algorithms are based on a local feature representation. Statistical learning is applied to learn the most effective features and classifiers for building the face detection and recognition engines. The novel imaging system and the detection and recognition engines are integrated into a powerful face recognition system. Evaluated in a real-world user scenario, a condition harder than a technology evaluation such as the Face Recognition Vendor Tests (FRVT), the system has demonstrated excellent accuracy, speed, and usability.


Proceedings ArticleDOI
01 Jan 2006
TL;DR: A new method for providing insensitivity to expression variation in range images based on Log-Gabor Templates is presented; by decomposing a single image of a subject into 147 observations, it allows high accuracy even in the presence of occlusions, distortions, and facial expressions.
Abstract: The use of Three-Dimensional (3D) data allows new facial recognition algorithms to overcome factors such as pose and illumination variations which have plagued traditional 2D face recognition. In this paper a new method for providing insensitivity to expression variation in range images based on Log-Gabor Templates is presented. By decomposing a single image of a subject into 147 observations, the reliance of the algorithm upon any particular part of the face is relaxed, allowing high accuracy even in the presence of occlusions, distortions, and facial expressions. Using the 3D database collected by the University of Notre Dame for the Face Recognition Grand Challenge (FRGC), benchmarking results are presented showing superior performance of the proposed method. Comparisons showing the relative strength of the algorithm against two commercial and two academic 3D face recognition algorithms are also presented.

Journal ArticleDOI
TL;DR: The formulation of a probabilistic appearance-based face recognition approach is extended to work with multiple images and video sequences and it is shown that regardless of the algorithm used, the recognition results improve considerably when one uses a video sequence rather than a single still.

Book ChapterDOI
07 May 2006
TL;DR: A fully automatic recognition system based on the proposed method and an extensive evaluation on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation that consistently demonstrated a nearly perfect recognition rate is described.
Abstract: In spite of over two decades of intense research, illumination and pose invariance remain prohibitively challenging aspects of face recognition for most practical applications. The objective of this work is to recognize faces using video sequences both for training and recognition input, in a realistic, unconstrained setup in which lighting, pose and user motion pattern have a wide variability and face images are of low resolution. In particular there are three areas of novelty: (i) we show how a photometric model of image formation can be combined with a statistical model of generic face appearance variation, learnt offline, to generalize in the presence of extreme illumination changes; (ii) we use the smoothness of geodesically local appearance manifold structure and a robust same-identity likelihood to achieve invariance to unseen head poses; and (iii) we introduce an accurate video sequence “reillumination” algorithm to achieve robustness to face motion patterns in video. We describe a fully automatic recognition system based on the proposed method and an extensive evaluation on 171 individuals and over 1300 video sequences with extreme illumination, pose and head motion variation. On this challenging data set our system consistently demonstrated a nearly perfect recognition rate (over 99.7%), significantly outperforming state-of-the-art commercial software and methods from the literature.

Patent
John Winn, Jamie Shotton
21 Sep 2006
TL;DR: In this article, a conditional random field is used to force a global part labeling which is substantially layout-consistent and a part label map is inferred from this, which can be used to estimate belief distributions over parts for each image element of a test image.
Abstract: During a training phase we learn parts of images which assist in the object detection and recognition task. A part is a densely represented area of an image of an object to which we assign a unique label. Parts contiguously cover an image of an object to give a part label map for that object. The parts do not necessarily correspond to semantic object parts. During the training phase a classifier is learnt which can be used to estimate belief distributions over parts for each image element of a test image. A conditional random field is used to force a global part labeling which is substantially layout-consistent and a part label map is inferred from this. By recognizing parts we enable object detection and recognition even for partially occluded objects, for multiple-objects of different classes in the same scene, for unstructured and structured objects and allowing for object deformation.

Proceedings ArticleDOI
22 Nov 2006
TL;DR: A video surveillance system aimed at the automatic identification of events of interest, especially of abandoned and stolen objects in a guarded indoor environment, which combines three phases of data processing: object extraction, object recognition and tracking, and decision about actions.
Abstract: This paper describes a video surveillance system aimed at the automatic identification of events of interest, especially abandoned and stolen objects in a guarded indoor environment. The implemented system combines three phases of data processing: object extraction, object recognition and tracking, and decisions about actions. Extracted objects are classified as "human" or "non-human" and as static or dynamic; an event of interest follows from a split between a "human" and a static "non-human" object, and the static "non-human" object is then analyzed to discriminate between abandoned and stolen objects.

Proceedings ArticleDOI
Masahiro Tomono
01 Oct 2006
TL;DR: This paper proposes a framework to integrate dense shape and recognition features into an object model, and shows that an object map of a room was built successfully using the proposed object models.
Abstract: This paper presents a method of object map building using object models created from image sequences captured by a single camera. An object map is a highly structured map, built by placing 3-D object models on the floor plane according to object recognition results. To increase the efficiency of object map building, we propose a framework that integrates dense shape and recognition features into an object model. Experimental results show that an object map of a room was built successfully using the proposed object models.

Proceedings ArticleDOI
14 Jun 2006
TL;DR: A complete object recognition system, based on a 3D laser scanner, reliable contour extraction with floor interpretation, feature extraction using a new, fast eigen-CSS method, and a supervised learning algorithm is proposed.
Abstract: This paper presents a novel object recognition approach based on range images. Due to its insensitivity to illumination, range data is well suited for reliable silhouette extraction. Silhouette or contour descriptions are good sources of information for object recognition. We propose a complete object recognition system, based on a 3D laser scanner, reliable contour extraction with floor interpretation, feature extraction using a new, fast eigen-CSS method, and a supervised learning algorithm. The recognition system was successfully tested on range images acquired with a mobile robot, and the results are compared to standard techniques, i.e., geometric features, Hu and Zernike moments, the border signature method and the angular radial transformation. An evaluation using the receiver operating characteristic analysis completes this paper. The eigen-CSS method has proved to be comparable in detection performance to the top competitors, yet faster than the best one by an order of magnitude in feature extraction time.

Proceedings ArticleDOI
01 Oct 2006
TL;DR: A SLAM scheme based on visual object recognition, rather than mere scene matching, is proposed for home environments without artificial landmarks; experiments show that the final pose error remained bounded after 50 minutes of battery-run-out autonomous navigation.
Abstract: Reliable data association is crucial to localization and map building for mobile robot applications. For that reason, many mobile robots tend to choose vision-based SLAM solutions. In this paper, a SLAM scheme based on visual object recognition, not just scene matching, in a home environment is proposed without using artificial landmarks. For the object-based SLAM, the following algorithms are suggested: 1) a novel local invariant feature extraction combining the advantages of the multi-scale Harris corner as a detector with its SIFT descriptor for natural object recognition, 2) RANSAC clustering for robust object recognition in the presence of outliers, and 3) calculation of accurate metric information for the SLAM update. The proposed algorithms increase robustness through correct data association and accurate observation. Moreover, the scheme can easily be implemented in real time by reducing the number of representative landmarks, i.e., objects. The performance of the proposed algorithm was verified by experiments using EKF-SLAM with a stereo camera in home-like environments, which showed that the final pose error remained bounded after 50 minutes of battery-run-out autonomous navigation.
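The RANSAC step used for robust recognition follows the standard hypothesize-and-verify loop: sample a minimal set, fit a model, count inliers, keep the best. A generic sketch on the classic 2-D line-fitting problem (the paper applies the same idea to clustering feature matches; the function and parameters here are ours):

```python
import random

def ransac_line(points, iters=200, tol=0.5, seed=0):
    """Generic RANSAC sketch: fit a 2-D line y = m*x + c to points
    containing outliers by repeatedly sampling minimal (2-point) sets
    and keeping the hypothesis with the most inliers."""
    rng = random.Random(seed)
    best_inliers, best_model = [], None
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical sample pair, skip this hypothesis
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        inliers = [p for p in points if abs(p[1] - (m * p[0] + c)) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers, best_model = inliers, (m, c)
    return best_model, best_inliers
```

Because each hypothesis needs only a minimal sample, a handful of gross outliers cannot drag the final model away from the consensus set.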

Patent
25 Jan 2006
TL;DR: An image recognition apparatus that can increase the recognition rate for a target even when recognition would otherwise deteriorate because good image information on the target cannot be obtained from the captured image alone.
Abstract: An image recognition apparatus is provided which can increase the recognition rate for a recognition target even when the recognition rate would otherwise deteriorate because good image information on the target cannot be obtained from the captured image alone. The apparatus includes an image information obtaining section 3, an imaging position obtaining section 7, a land object information storing section 8, a land object information obtaining section 9 for obtaining, from the storing section 8, the land object information on one or more land objects included within the imaging area of the image information, a determining section 15 for determining, based on the obtained land object information, whether a plurality of recognition target land objects are included within the imaging area, and an image recognizing section 10 for recognizing an image of one recognition target land object, based on the result of image recognition of another recognition target land object and on the positional relationship between the two given by the position information included in the land object information, when the determining section has determined that a plurality of recognition target land objects are included.

Proceedings ArticleDOI
02 Feb 2006
TL;DR: This work takes a very practical look at the automated shape recognition for common industrial tasks and presents a very fast novel approach for the detection of deformed shapes which are in the broadest sense elliptic.
Abstract: The detection of varying 2D shapes is a recurrent task for computer vision applications, and camera-based object recognition has become a standard procedure. Due to the discrete nature of digital images and aliasing effects, shape recognition can be complicated. There are many existing algorithms for identifying circles and ellipses, but they are very often limited in flexibility or speed, or require high-quality input data. Our work considers the application of shape recognition to processes in industrial environments; automation, especially, requires algorithms that are both reliable and fast. We take a very practical look at automated shape recognition for common industrial tasks and present a very fast novel approach for the detection of deformed shapes which are, in the broadest sense, elliptic. Furthermore, we consider the automated recognition of bacteria colonies and of coded markers for both 3D object tracking and an automated camera calibration procedure.

Book ChapterDOI
12 Sep 2006
TL;DR: A novel model for object recognition and detection that follows the widely adopted assumption that objects in images can be represented as a set of loosely coupled parts is presented and yields very competitive results for the commonly used Caltech object detection tasks.
Abstract: We present a novel model for object recognition and detection that follows the widely adopted assumption that objects in images can be represented as a set of loosely coupled parts. In contrast to previous models, the presented method can cope with an arbitrary number of object parts. Here, the object parts are modelled by image patches that are extracted at each position and then efficiently stored in a histogram. In addition to the patch appearance, the positions of the extracted patches are considered and provide a significant increase in the recognition performance. Additionally, a new and efficient histogram comparison method taking into account inter-bin similarities is proposed. The presented method is evaluated for the task of radiograph recognition, where it achieves the best result published so far. Furthermore, it yields very competitive results for the commonly used Caltech object detection tasks.
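The abstract does not give the details of the proposed comparison method, but the quadratic-form distance is one standard way to make a histogram distance respect inter-bin similarities: d(h, g)^2 = (h - g)^T A (h - g), where A[i][j] encodes how similar bins i and j are. The sketch below illustrates the effect with made-up bins and similarity values.

```python
def quadratic_form_distance(h, g, sim):
    """Histogram distance honouring inter-bin similarities:
    d(h, g)^2 = (h - g)^T A (h - g), where A[i][j] is the similarity
    between bins i and j (A = identity recovers squared Euclidean)."""
    d = [hi - gi for hi, gi in zip(h, g)]
    return sum(d[i] * sim[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))

# Two histograms whose mass sits in adjacent bins: under an identity
# similarity they look maximally different, while a similarity matrix
# linking neighbouring bins recognises them as close.
h, g = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
neighbour = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.9], [0.0, 0.9, 1.0]]
print(quadratic_form_distance(h, g, identity))             # 2.0
print(round(quadratic_form_distance(h, g, neighbour), 6))  # 0.2
```

For patch histograms this matters because two visually similar patches can fall into neighbouring appearance bins; a bin-wise distance would penalise that heavily, while a similarity-aware one does not.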

Journal Article
TL;DR: In this article, a class-specific edge classification method is proposed to prune edges which are not relevant to the object class and thereby improve the performance of subsequent processing; learning of class-specific edges is demonstrated for a number of object classes under challenging scale and illumination variation.
Abstract: Recent research into recognizing object classes (such as humans, cows and hands) has made use of edge features to hypothesize and localize class instances. However, for the most part, these edge-based methods operate solely on the geometric shape of edges, treating them equally and ignoring the fact that for certain object classes, the appearance of the object on the inside of the edge may provide valuable recognition cues. We show how, for such object classes, small regions around edges can be used to classify the edge into object or non-object. This classifier may then be used to prune edges which are not relevant to the object class, and thereby improve the performance of subsequent processing. We demonstrate learning class specific edges for a number of object classes - oranges, bananas and bottles - under challenging scale and illumination variation. Because class-specific edge classification provides a low-level analysis of the image it may be integrated into any edge-based recognition strategy without significant change in the high-level algorithms. We illustrate its application to two algorithms: (i) chamfer matching for object detection, and (ii) modulating contrast terms in MRF based object-specific segmentation. We show that performance of both algorithms (matching and segmentation) is considerably improved by the class-specific edge labelling.

Book ChapterDOI
13 Dec 2006
TL;DR: It is shown how, for certain object classes, small regions around edges can be used to classify the edge into object or non-object, and performance of both algorithms (matching and segmentation) is considerably improved by the class-specific edge labelling.
Abstract: Recent research into recognizing object classes (such as humans, cows and hands) has made use of edge features to hypothesize and localize class instances. However, for the most part, these edge-based methods operate solely on the geometric shape of edges, treating them equally and ignoring the fact that for certain object classes, the appearance of the object on the “inside” of the edge may provide valuable recognition cues. We show how, for such object classes, small regions around edges can be used to classify the edge into object or non-object. This classifier may then be used to prune edges which are not relevant to the object class, and thereby improve the performance of subsequent processing. We demonstrate learning class specific edges for a number of object classes — oranges, bananas and bottles — under challenging scale and illumination variation. Because class-specific edge classification provides a low-level analysis of the image it may be integrated into any edge-based recognition strategy without significant change in the high-level algorithms. We illustrate its application to two algorithms: (i) chamfer matching for object detection, and (ii) modulating contrast terms in MRF based object-specific segmentation. We show that performance of both algorithms (matching and segmentation) is considerably improved by the class-specific edge labelling.
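The pruning step described above can be illustrated with a toy stand-in for the learned classifier: each edge point carries the mean colour of a small patch beside it, and a nearest-centroid rule keeps only edges whose patch looks like the object class. The centroids, colours, and coordinates below are invented for illustration; the paper learns a proper classifier from training data.

```python
# Toy sketch of class-specific edge pruning: a nearest-centroid rule over
# patch colours stands in for the paper's learned object/non-object
# edge classifier. All values are illustrative.

def prune_edges(edge_patches, object_centroid, background_centroid):
    """Keep edge points whose patch colour is nearer the object centroid."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return [pt for pt, colour in edge_patches
            if dist2(colour, object_centroid) < dist2(colour, background_centroid)]

# (x, y) edge points with RGB patch means; "orange" object vs grey clutter.
edges = [((10, 12), (240, 140, 30)),   # orange-ish -> keep
         ((40, 55), (120, 120, 120)),  # grey clutter -> prune
         ((41, 56), (230, 150, 40))]   # orange-ish -> keep
kept = prune_edges(edges, object_centroid=(245, 145, 35),
                   background_centroid=(128, 128, 128))
print(kept)  # [(10, 12), (41, 56)]
```

Because the surviving edge set is just a filtered version of the original, any downstream edge-based algorithm (e.g. chamfer matching) can consume it unchanged, which is the integration property the abstract emphasizes.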

Proceedings ArticleDOI
01 Oct 2006
TL;DR: The effectiveness of the method, which represents and recognizes visual events using attribute grammars, is demonstrated on the tasks of recognizing vehicle casing in parking lots and events occurring on an airport tarmac.
Abstract: We present a method for representing and recognizing visual events using attribute grammars. In contrast to conventional grammars, attribute grammars are capable of describing features that are not easily represented by finite symbols. Our approach handles multiple concurrent events involving multiple entities by associating unique object identification labels with multiple event threads. Probabilistic parsing and probabilistic conditions on the attributes are used to achieve a robust recognition system. We demonstrate the effectiveness of our method for the task of recognizing vehicle casing in parking lots and events occurring on an airport tarmac.
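A minimal flavour of the attribute-grammar idea is that a production fires only when both the symbol sequence matches and a predicate over the symbols' attributes holds, here a shared object identification label tying primitive events into one thread. The event names, attribute layout, and threshold below are illustrative, not the paper's grammar.

```python
# Hedged sketch: flag "casing"-like behaviour when the same object id
# (the attribute carried by each primitive event) repeatedly generates a
# pass_zone event. Event vocabulary and threshold are made up.

def detect_casing(events, min_passes=3):
    """Return object ids whose pass_zone count reaches min_passes."""
    counts = {}
    for name, attrs in events:
        if name == "pass_zone":
            oid = attrs["id"]
            counts[oid] = counts.get(oid, 0) + 1
    return sorted(oid for oid, c in counts.items() if c >= min_passes)

stream = [("pass_zone", {"id": 7}), ("enter", {"id": 3}),
          ("pass_zone", {"id": 7}), ("pass_zone", {"id": 3}),
          ("pass_zone", {"id": 7})]
print(detect_casing(stream))  # [7]
```

Keying the count on the object id is what lets concurrent events from multiple entities coexist: each id effectively carries its own event thread, as the abstract describes.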