
Showing papers on "Object model published in 2012"


Journal ArticleDOI
TL;DR: The CCFV removes the constraint of static camera array settings for view capturing and can be applied to any view-based 3-D object database and experimental results show that the proposed scheme can achieve better performance than state-of-the-art methods.
Abstract: Recently, extensive research efforts have been dedicated to view-based methods for 3-D object retrieval due to the highly discriminative property of multiviews for 3-D object representation. However, most state-of-the-art approaches depend heavily on their own camera array settings for capturing views of 3-D objects. In order to move toward a general framework for 3-D object retrieval without the limitation of camera array restriction, a camera constraint-free view-based (CCFV) 3-D object retrieval algorithm is proposed in this paper. In this framework, each object is represented by a free set of views, which means that these views can be captured from any direction without camera constraint. For each query object, we first cluster all query views to generate the view clusters, which are then used to build the query models. For a more accurate 3-D object comparison, a positive matching model and a negative matching model are individually trained using positive and negative matched samples, respectively. The CCFV model is generated on the basis of the query Gaussian models by combining the positive matching model and the negative matching model. The CCFV removes the constraint of static camera array settings for view capturing and can be applied to any view-based 3-D object database. We conduct experiments on the National Taiwan University 3-D model database and the ETH 3-D object database. Experimental results show that the proposed scheme can achieve better performance than state-of-the-art methods.
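
The pipeline lends itself to a compact illustration. The sketch below is not the authors' implementation; it assumes each view is already encoded as a feature vector, uses k-means for view clustering, and stands in simple Gaussian likelihoods and two caller-supplied functions for the trained positive/negative matching models.

```python
# Hedged sketch of a CCFV-style query pipeline: cluster free-viewpoint query views,
# then score a candidate object by combining positive and negative matching models.
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import multivariate_normal

def build_query_clusters(view_descriptors, n_clusters=5):
    """Group query views (one feature vector per captured view) into view clusters."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(view_descriptors)
    return [view_descriptors[labels == k] for k in range(n_clusters)]

def fit_gaussian(views):
    """Fit a simple Gaussian query model to one view cluster (regularised covariance)."""
    mu = views.mean(axis=0)
    cov = np.cov(views, rowvar=False) + 1e-3 * np.eye(views.shape[1])
    return multivariate_normal(mean=mu, cov=cov)

def ccfv_score(candidate_views, query_clusters, pos_model, neg_model):
    """pos_model / neg_model are placeholders for the trained matching models;
    here they are any callables mapping a log-likelihood to a score contribution."""
    score = 0.0
    for cluster in query_clusters:
        g = fit_gaussian(cluster)
        lik = max(g.logpdf(v) for v in candidate_views)  # best-matching candidate view
        score += pos_model(lik) - neg_model(lik)
    return score
```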

223 citations


Journal ArticleDOI
TL;DR: The system described in this article was constructed specifically for the generation of model data for object recognition, localization and manipulation tasks and it allows 2D image and 3D geometric data of everyday objects to be obtained semi-automatically.
Abstract: For the execution of object recognition, localization and manipulation tasks, most algorithms use object models. Most models are derived from, or consist of, two-dimensional (2D) images and/or three-dimensional (3D) geometric data. The system described in this article was constructed specifically for the generation of such model data. It allows 2D image and 3D geometric data of everyday objects to be obtained semi-automatically. The calibration provided allows 2D data to be related to 3D data. Through the use of high-quality sensors, high-accuracy data is achieved. So far over 100 objects have been digitized using this system and the data has been successfully used in several international research projects. All of the models are freely available on the web via a front-end that allows preview and filtering of the data.

220 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories and can be applied to scene understanding tasks that local detectors alone cannot solve.
Abstract: There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. A context model can rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit of context models has been limited because most of the previous methods were tested on data sets with only a few object categories, in which most images contain one or two object categories. In this paper, we introduce a new data set with images that contain many instances of different object categories, and propose an efficient model that captures the contextual information among more than a hundred object categories using a tree structure. Our model incorporates global image features, dependencies between object categories, and outputs of local detectors into one probabilistic framework. We demonstrate that our context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories. In addition, our model can be applied to scene understanding tasks that local detectors alone cannot solve, such as detecting objects out of context or querying for the most typical and the least typical scenes in a data set.

166 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel method for multivehicle detection and tracking using a vehicle-mounted monocular camera that combines both global and local features of the vehicle as a deformable object model through the combination of a latent support vector machine and histograms of oriented gradients.
Abstract: This paper proposes a novel method for multivehicle detection and tracking using a vehicle-mounted monocular camera. In the proposed method, the features of vehicles are learned as a deformable object model through the combination of a latent support vector machine (LSVM) and histograms of oriented gradients (HOGs). The detection algorithm combines both global and local features of the vehicle as a deformable object model. Detected vehicles are tracked through a particle filter, which estimates the particles' likelihood by using a detection scores map and template compatibility for both root and parts of the vehicle while considering the deformation cost caused by the movement of vehicle parts. Tracking likelihoods are iteratively used as a priori probability to generate vehicle hypothesis regions and update the detection threshold to reduce false negatives of the algorithm presented before. Extensive experiments in urban scenarios showed that the proposed method can achieve an average vehicle detection rate of 97% and an average vehicle-tracking rate of 86% with a false positive rate of less than 0.26%.
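
As a point of reference for the learned appearance part, here is a minimal sketch of the underlying HOG-plus-linear-SVM ingredient; the latent SVM with deformable parts and the particle-filter tracker described above are not reproduced, and the parameter values are placeholders.

```python
# Minimal HOG + linear SVM sketch (root-filter style), not the paper's LSVM part model.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(patch):
    # patch: grayscale image patch resized to a fixed size, e.g. 64x64
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def train_vehicle_classifier(pos_patches, neg_patches):
    X = np.array([hog_descriptor(p) for p in pos_patches + neg_patches])
    y = np.array([1] * len(pos_patches) + [0] * len(neg_patches))
    return LinearSVC(C=0.01).fit(X, y)

def detection_score(clf, patch):
    # Signed distance to the hyperplane, usable as an entry of a detection-score map.
    return clf.decision_function([hog_descriptor(patch)])[0]
```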

159 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: Experiments on challenging video sequences show that the algorithm significantly improves over state-of-the-art descriptor matching techniques using a range of descriptors, as well as recent online learning based approaches.
Abstract: Efficient keypoint-based object detection methods are used in many real-time computer vision applications. These approaches often model an object as a collection of keypoints and associated descriptors, and detection then involves first constructing a set of correspondences between object and image keypoints via descriptor matching, and subsequently using these correspondences as input to a robust geometric estimation algorithm such as RANSAC to find the transformation of the object in the image. In such approaches, the object model is generally constructed offline, and does not adapt to a given environment at runtime. Furthermore, the feature matching and transformation estimation stages are treated entirely separately. In this paper, we introduce a new approach to address these problems by combining the overall pipeline of correspondence generation and transformation estimation into a single structured output learning framework. Following the recent trend of using efficient binary descriptors for feature matching, we also introduce an approach to approximate the learned object model as a collection of binary basis functions which can be evaluated very efficiently at runtime. Experiments on challenging video sequences show that our algorithm significantly improves over state-of-the-art descriptor matching techniques using a range of descriptors, as well as recent online learning based approaches.
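
For context, the conventional two-stage pipeline that this paper folds into a single learning framework looks roughly like the following sketch (binary ORB descriptors, brute-force Hamming matching, then RANSAC homography estimation with OpenCV); the thresholds are illustrative.

```python
# Baseline keypoint-detection pipeline: descriptor matching followed by RANSAC
# estimation of the object-to-image transformation (the stages the paper unifies).
import cv2
import numpy as np

def detect_object(model_img, scene_img, min_matches=10):
    orb = cv2.ORB_create(nfeatures=1000)            # efficient binary descriptors
    kp1, des1 = orb.detectAndCompute(model_img, None)
    kp2, des2 = orb.detectAndCompute(scene_img, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    if len(matches) < min_matches:
        return None

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # homography mapping the object model into the scene
```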

139 citations


Book ChapterDOI
07 Oct 2012
TL;DR: This paper constructs a functional object description with the aim to recognize objects by the way people interact with them, and describes scene objects (sofas, tables, chairs) by associated human poses and object appearance.
Abstract: Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim to recognize objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube providing a rich source of common human-object interactions and minimizing the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.

129 citations


Book ChapterDOI
Manu Sridharan1, Julian Dolby1, Satish Chandra1, Max Schäfer1, Frank Tip1 
11 Jun 2012
TL;DR: In an experimental evaluation, it is found that correlation tracking often dramatically improved analysis scalability and precision on popular JavaScript frameworks, though in some cases scalability challenges remain.
Abstract: JavaScript poses significant challenges for points-to analysis, particularly due to its flexible object model in which object properties can be created and deleted at run-time and accessed via first-class names. These features cause an increase in the worst-case running time of field-sensitive Andersen-style analysis, which becomes O(N^4), where N is the program size, in contrast to the O(N^3) bound for languages like Java. In practice, we found that a standard implementation of the analysis was unable to analyze popular JavaScript frameworks. We identify correlated dynamic property accesses as a common code pattern that is analyzed very imprecisely by the standard analysis, and show how a novel correlation tracking technique enables us to handle this pattern more precisely, thereby making the analysis more scalable. In an experimental evaluation, we found that correlation tracking often dramatically improved analysis scalability and precision on popular JavaScript frameworks, though in some cases scalability challenges remain.

126 citations


Proceedings ArticleDOI
24 Dec 2012
TL;DR: An object pose estimation algorithm exploiting both depth and color information is presented, and it is argued that exploiting color information significantly enhances the performance of the voting process in terms of both time and accuracy.
Abstract: In this paper, we present an object pose estimation algorithm exploiting both depth and color information. While many approaches assume that a target region is cleanly segmented from background, our approach does not rely on that assumption, and thus it can estimate the pose of a target object in heavy clutter. Recently, an oriented point pair feature was introduced as a low dimensional description of object surfaces. The feature has been employed in a voting scheme to find a set of possible 3D rigid transformations between object model and test scene features. While several approaches using the pair features require an accurate 3D CAD model as training data, our approach relies only on several scanned views of a target object, and hence it is straightforward to learn new objects. In addition, we argue that exploiting color information significantly enhances the performance of the voting process in terms of both time and accuracy. To exploit the color information, we define a color point pair feature, which is employed in a voting scheme for more effective pose estimation. We show extensive quantitative results of comparative experiments between our approach and a state-of-the-art method.
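
To make the feature concrete, here is a rough numpy sketch of an oriented point pair feature extended with a colour term, quantised so it can key a voting hash table. The exact colour encoding and step sizes are assumptions, not the paper's definition.

```python
# Rough sketch of an oriented point pair feature with a colour term; quantised tuples
# would serve as keys into a model hash table used for pose voting.
import numpy as np

def angle(u, v):
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def color_point_pair_feature(p1, n1, c1, p2, n2, c2):
    """p: 3D point, n: unit normal, c: RGB colour in [0, 1] (names are illustrative)."""
    d = p2 - p1
    geom = (np.linalg.norm(d), angle(n1, d), angle(n2, d), angle(n1, n2))
    return geom + tuple(np.abs(c1 - c2))   # geometric PPF plus a simple colour difference

def quantize(feature, dist_step=0.01, angle_step=np.deg2rad(12), color_step=0.1):
    f = feature
    return (int(f[0] / dist_step),
            int(f[1] / angle_step), int(f[2] / angle_step), int(f[3] / angle_step),
            int(f[4] / color_step), int(f[5] / color_step), int(f[6] / color_step))
```

Scene point pairs whose quantised feature matches a model entry then vote for the stored model pose, which is the voting process the abstract refers to.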

109 citations


Book ChapterDOI
07 Oct 2012
TL;DR: The 3D Deformable Part Model (3D2PM) leverages CAD data of the object class as a 3D geometry proxy, and can be jointly optimized in a discriminative fashion for object detection and viewpoint estimation.
Abstract: As objects are inherently 3-dimensional, they have been modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 2D feature-based models are the predominant paradigm in object recognition today. While such models have shown competitive bounding box (BB) detection performance, they are clearly limited in their capability of fine-grained reasoning in 3D or continuous viewpoint estimation as required for advanced tasks such as 3D scene understanding. This work extends the deformable part model [1] to a 3D object model. It consists of multiple parts modeled in 3D and a continuous appearance model. As a result, the model generalizes beyond BB oriented object detection and can be jointly optimized in a discriminative fashion for object detection and viewpoint estimation. Our 3D Deformable Part Model (3D2PM) leverages CAD data of the object class as a 3D geometry proxy.

103 citations


Proceedings Article
28 May 2012
TL;DR: A model to specify workflow-centric research objects is proposed, and how the model can be grounded using semantic technologies and existing vocabularies, in particular the Object Reuse and Exchange model and the Annotation Ontology (AO).
Abstract: A workflow-centric research object bundles a workflow, the provenance of the results obtained by its enactment, other digital objects that are relevant for the experiment (papers, datasets, etc.), and annotations that semantically describe all these objects. In this paper, we propose a model to specify workflow-centric research objects, and show how the model can be grounded using semantic technologies and existing vocabularies, in particular the Object Reuse and Exchange (ORE) model and the Annotation Ontology (AO). We describe the life-cycle of a research object, which resembles the life-cycle of a scientific experiment.
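
Grounding in ORE can be pictured with a few triples. The rdflib sketch below only illustrates an ore:Aggregation linking a bundle to its parts; the resource URIs are invented, and the authors' actual research-object vocabulary (including the annotation layer) is richer than this.

```python
# Illustrative ORE aggregation of a workflow bundle with rdflib (not the authors' model).
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
ro = URIRef("http://example.org/research-object/1")          # hypothetical bundle URI
resources = [URIRef("http://example.org/workflow.t2flow"),    # hypothetical parts
             URIRef("http://example.org/dataset.csv"),
             URIRef("http://example.org/paper.pdf")]

g.add((ro, RDF.type, ORE.Aggregation))
for r in resources:
    g.add((ro, ORE.aggregates, r))    # ore:aggregates links the bundle to its parts

print(g.serialize(format="turtle"))
```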

102 citations


Proceedings ArticleDOI
24 Dec 2012
TL;DR: This paper presents an approach to textureless object detection and tracking of the 3D pose that is coherently integrated in a particle filtering framework on the special Euclidean group, SE(3), in which the visual tracking problem is tackled by maintaining multiple hypotheses of the object pose.
Abstract: This paper presents an approach to textureless object detection and tracking of the 3D pose. Our detection and tracking schemes are coherently integrated in a particle filtering framework on the special Euclidean group, SE(3), in which the visual tracking problem is tackled by maintaining multiple hypotheses of the object pose. For textureless object detection, an efficient chamfer matching is employed so that a set of coarse pose hypotheses is estimated from the matching between 2D edge templates of an object and a query image. Particles are then initialized from the coarse pose hypotheses by randomly drawing based on costs of the matching. To ensure the initialized particles are at or close to the global optimum, an annealing process is performed after the initialization. While a standard edge-based tracking is employed after the annealed initialization, we employ a refinement process to establish improved correspondences between projected edge points from the object model and edge points from an input image. Comparative results for several image sequences with clutter are shown to validate the effectiveness of our approach.
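
A few of the moving parts can be sketched in isolation: particles as 4x4 poses perturbed on SE(3), a chamfer-style cost from an edge distance transform, and annealing-style weighting. This is a simplified illustration, not the paper's implementation, and the noise parameters are placeholders.

```python
# Condensed particle-filter ingredients: SE(3) perturbation, chamfer cost, annealed weights.
import numpy as np
from scipy.spatial.transform import Rotation
from scipy.ndimage import distance_transform_edt

def perturb_pose(T, rot_sigma=0.05, trans_sigma=0.01):
    """Random-walk a 4x4 pose on SE(3): rotation via a random rotation vector."""
    dR = Rotation.from_rotvec(np.random.normal(0, rot_sigma, 3)).as_matrix()
    dt = np.random.normal(0, trans_sigma, 3)
    Tn = T.copy()
    Tn[:3, :3] = dR @ T[:3, :3]
    Tn[:3, 3] += dt
    return Tn

def chamfer_cost(projected_edge_pixels, scene_edge_mask):
    """Mean distance from projected model edge pixels ((x, y) ints) to the nearest scene edge."""
    dist = distance_transform_edt(~scene_edge_mask)   # 0 on edges, grows away from them
    rows, cols = projected_edge_pixels[:, 1], projected_edge_pixels[:, 0]
    return dist[rows, cols].mean()

def particle_weights(costs, temperature=1.0):
    """Annealing-style weighting: lower temperature sharpens the distribution."""
    w = np.exp(-np.asarray(costs) / temperature)
    return w / w.sum()
```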

Journal ArticleDOI
TL;DR: A novel foreground object detection scheme that integrates the top-down information based on the expectation maximization (EM) framework and uses the detection result of the moving object to incorporate the domain knowledge of the object shapes into the construction of top-down information.
Abstract: In this paper, we present a novel foreground object detection scheme that integrates the top-down information based on the expectation maximization (EM) framework. In this generalized EM framework, the top-down information is incorporated in an object model. Based on the object model and the state of each target, a foreground model is constructed. This foreground model can augment the foreground detection for the camouflage problem. Thus, an object's state-specific Markov random field (MRF) model is constructed for detection based on the foreground model and the background model. This MRF model depends on the latent variables that describe each object's state. The maximization of the MRF model is the M-step in the EM framework. Besides fusing spatial information, this MRF model can also adjust the contribution of the top-down information for detection. To obtain detection result using this MRF model, sampling importance resampling is used to sample the latent variable and the EM framework refines the detection iteratively. Besides the proposed generalized EM framework, our method does not need any prior information of the moving object, because we use the detection result of moving object to incorporate the domain knowledge of the object shapes into the construction of top-down information. Moreover, in our method, a kernel density estimation (KDE)—Gaussian mixture model (GMM) hybrid model is proposed to construct the probability density function of background and moving object model. For the background model, it has some advantages over GMM- and KDE-based methods. Experimental results demonstrate the capability of our method, particularly in handling the camouflage problem.
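
The KDE-GMM hybrid can be illustrated compactly. The sketch below models one pixel's background history with a small GMM and the moving object's colours with a KDE, then combines them into a per-pixel foreground posterior; the MRF spatial term and the EM iterations described above are omitted, and the parameters are placeholders.

```python
# Simplified KDE/GMM ingredient of the hybrid model (no MRF, no EM refinement).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KernelDensity

def fit_background(pixel_history):
    """pixel_history: (T, 3) colour samples of one pixel over time."""
    return GaussianMixture(n_components=3, covariance_type='diag').fit(pixel_history)

def fit_object_model(object_pixels, bandwidth=0.05):
    """object_pixels: (N, 3) colours sampled from the detected moving object."""
    return KernelDensity(bandwidth=bandwidth).fit(object_pixels)

def foreground_posterior(color, bg_gmm, obj_kde, prior_fg=0.3):
    """Bayes-combine background and object likelihoods for a single pixel colour."""
    c = np.asarray(color).reshape(1, -1)
    p_bg = np.exp(bg_gmm.score_samples(c))[0]
    p_fg = np.exp(obj_kde.score_samples(c))[0]
    return prior_fg * p_fg / (prior_fg * p_fg + (1 - prior_fg) * p_bg + 1e-12)
```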

Proceedings ArticleDOI
24 Dec 2012
TL;DR: This work describes and experimentally verifies a semantic querying system aboard a mobile robot equipped with a Microsoft Kinect RGB-D sensor, which allows the system to operate in large, dynamic, and unconstrained environments, where modeling every object that occurs or might occur is impractical.
Abstract: Recent years have seen rising interest in robotic mapping algorithms that operate at the level of objects, rather than two- or three-dimensional occupancy. Such “semantic maps” permit higher-level reasoning than occupancy maps, and are useful for any application that involves dealing with objects, including grasping, change detection, and object search. We describe and experimentally verify such a system aboard a mobile robot equipped with a Microsoft Kinect RGB-D sensor. Our representation is object-based, and makes uniquely weak assumptions about the quality of the perceptual data available; in particular, we perform no explicit object recognition. This allows our system to operate in large, dynamic, and unconstrained environments, where modeling every object that occurs (or might occur) is impractical. Our dataset, which is publicly available, consists of 67 autonomous runs of our robot over a six-week period in a roughly 1600 m2 office environment. We demonstrate two applications built on our system: semantic querying and change detection.

Journal ArticleDOI
TL;DR: This paper presents an innovative approach for detecting and localizing duplicate objects in pick-and-place applications under extreme conditions of occlusion, where standard appearance-based approaches are likely to be ineffective.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: This work presents a system that is able to recognize complex, fine-grained human actions involving the manipulation of objects in realistic action sequences; by combining pose trajectories and object manipulations in a single model, it outperforms existing state-of-the-art techniques on a novel Cooking Action Dataset.
Abstract: Understanding natural human activity involves not only identifying the action being performed, but also locating the semantic elements of the scene and describing the person's interaction with them. We present a system that is able to recognize complex, fine-grained human actions involving the manipulation of objects in realistic action sequences. Our method takes advantage of recent advances in sensors and pose trackers in learning an action model that draws on successful discriminative techniques while explicitly modeling both pose trajectories and object manipulations. By combining these elements in a single model, we are able to simultaneously recognize actions and track the location and manipulation of objects. To showcase this ability, we introduce a novel Cooking Action Dataset that contains video, depth readings, and pose tracks from a Kinect sensor. We show that our model outperforms existing state of the art techniques on this dataset as well as the VISINT dataset with only video sequences.

Journal ArticleDOI
TL;DR: The means by which object structure constrains the distribution of spatial attention were examined, resulting in a "grouped array" representation of attention, implicating a relatively early locus for the grouped array representation.
Abstract: Attention operates to select both spatial locations and perceptual objects. However, the specific mechanism by which attention is oriented to objects is not well understood. We examined the means by which object structure constrains the distribution of spatial attention (i.e., a “grouped array”). Using a modified version of the Egly et al. object cuing task, we systematically manipulated within-object distance and object boundaries. Four major findings are reported: 1) spatial attention forms a gradient across the attended object; 2) object boundaries limit the distribution of this gradient, with the spread of attention constrained by a boundary; 3) boundaries within an object operate similarly to across-object boundaries: we observed object-based effects across a discontinuity within a single object, without the demand to divide or switch attention between discrete object representations; and 4) the gradient of spatial attention across an object directly modulates perceptual sensitivity, implicating a relatively early locus for the grouped array representation.

Patent
05 Sep 2012
TL;DR: Zhang et al. propose a method and a system for providing related search to address the problem that prior search engines cannot support broader queries: an object model of web page information is built, object properties are defined in the model, object property information is identified and extracted from web pages, and pertinent recommendations are made for objects whose incidence relations meet preset conditions.

Abstract: The invention discloses a method and a system for providing related search, so as to solve the problem that prior search engines cannot provide broader queries. The method comprises: building an object model of web page information and defining object properties in the object model; identifying and extracting object property information from the web page information according to the definition of the object model; measuring incidence relations between objects; and performing pertinent recommendation on objects whose incidence relations accord with preset conditions. The method provides a novel search mode: it not only provides more precise search within the search band but also provides wider search, and offers a pertinent recommendation function during both query and browsing. Based on this search mode, after entering query words once in query mode and moving to browse mode, a user can keep clicking on query recommendations in browse mode, achieving the effect of query browsing.

Journal ArticleDOI
TL;DR: The results show that the system is able to autonomously build a 3D model of an object in situ in an unknown environment, repeating the view-planning cycle until the object modeling is complete or the planner deems that no further progress can be made and the system stops.
Abstract: We present an integrated and fully autonomous eye-in-hand system for 3D object modeling. The system hardware consists of a laser range scanner mounted on a six-DOF manipulator arm and the task is to autonomously build a 3D model of an object in situ where the object may not be moved and must be scanned in its original location. Our system assumes no knowledge of object shape or geometry other than that it is within a bounding box whose location and size are known a priori, and, furthermore, the environment is unknown. The overall planner integrates the three main algorithms in the system: one that finds the next best view (NBV) for modeling the object; one that finds the NBV for exploration, i.e. exploring the environment, so the arm can move to the modeling view pose; and finally a sensor-based path planner that is able to find a collision-free path to the view configuration determined by either of the two view planners. Our modeling NBV algorithm efficiently searches the five-dimensional view space to determine the best modeling viewpoint, while considering key constraints such as field of view (FOV), overlap, and occlusion. If the determined viewpoint is reachable, the sensor-based path planner determines a collision-free path to move the manipulator to the desired view configuration, and a scan of the object is taken. Since the workspace is initially unknown, in some phases the exploration view planner is used to increase information about the reachability and also the status of the modeling view configurations, since the view configuration may lie in an unknown workspace. This is repeated until the object modeling is complete or the planner deems that no further progress can be made, and the system stops. We have implemented the system with a six-DOF PowerCube arm and a wrist-mounted Hokuyo URG-04LX laser scanner. Our results show that the system is able to autonomously build a 3D model of an object in situ in an unknown environment.
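
The core view-selection idea can be caricatured in a few lines. The sketch below scores candidate viewpoints by how much unknown surface they would reveal, subject to a crude overlap requirement; the five-dimensional view parametrisation, the real FOV/occlusion tests, and the exploration planner are all abstracted into an assumed visible_fn helper.

```python
# Toy next-best-view scoring: information gain under a minimal-overlap constraint.
import numpy as np

def score_view(candidate, occupancy, visible_fn, min_overlap=0.2):
    """candidate: a view pose; occupancy: dict voxel -> 'known' | 'unknown'.
    visible_fn(candidate) returns the voxels inside the FOV and unoccluded (assumed)."""
    visible = visible_fn(candidate)
    if not visible:
        return -np.inf
    known = sum(occupancy[v] == 'known' for v in visible)
    unknown = sum(occupancy[v] == 'unknown' for v in visible)
    if known / len(visible) < min_overlap:
        return -np.inf            # reject views that cannot be registered to the model
    return unknown                # information gain: newly observed surface

def next_best_view(candidates, occupancy, visible_fn):
    scores = [score_view(c, occupancy, visible_fn) for c in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```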

Patent
17 Feb 2012
TL;DR: In this paper, an object model transformer, a region comparator, and a model parameter determiner are configured to determine an updated set of model parameters on the basis of the region-related similarity measure and an optimization scheme.
Abstract: Apparatus for determining model parameters, the apparatus comprising an object model transformer, a region comparator, and a model parameter determiner. The object model transformer is configured to receive an object model of a known object and to transform the object model based on a set of model parameters from a first frame of reference to a second frame of reference, and is further configured to determine as result of this transformation a transformed object model comprising at least one region that is associated to an object region of the object. The region comparator is configured to receive the transformed object model and an image depicting the object, to determine for a selected region of the transformed object model a region-related similarity measure. The model parameter determiner is configured to determine an updated set of model parameters on the basis of the region-related similarity measure and an optimization scheme.

Journal ArticleDOI
27 Feb 2012-PLOS ONE
TL;DR: The proposed model extended a hierarchical model, which is motivated by biology, for different object recognition tasks, and used an evolutionary algorithm approach to select a set of informative patches, which are more informative than usual random patches.
Abstract: Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models. Most of these models try to emulate the behavior of this remarkable system. The human visual system hierarchically recognizes objects in several processing stages. Along these stages a set of features with increasing complexity is extracted by different parts of the visual system. Elementary features like bars and edges are processed in earlier levels of the visual pathway, and as one moves higher along this pathway, more complex features are detected. It is an important question in the field of visual processing which features of an object are selected and represented by the visual cortex. To address this issue, we extended a hierarchical model, which is motivated by biology, for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the training procedure of the model and have an important role in object recognition. These patches are selected indiscriminately from different positions of an image, and this can lead to the extraction of non-discriminating patches which eventually may reduce the performance. In the proposed model we used an evolutionary algorithm approach to select a set of informative patches. Our reported results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks. The proposed model outperforms the original model in diverse object recognition tasks. It can be seen from the experiments that the selected features are generally particular parts of the target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.

Patent
27 Jan 2012
TL;DR: In this paper, an identification request is received from different objects of a network, and attributes and values of each object are ascertained, and at least one attribute-value pair from each object is filtered out.
Abstract: Methods and arrangements for object identification. An identification request is received from different objects of a network. Attributes and values of each object are ascertained, and at least one attribute-value pair from each object is filtered out. An ID is generated for each object based on at least one remaining attribute-value pair from the filtering.
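
One plausible reading of the claimed flow, drop volatile attribute-value pairs and derive a stable ID from what remains, can be sketched in a few lines; the attribute names and the hash-based ID scheme below are invented for illustration and are not taken from the patent.

```python
# Illustrative object-identification flow: filter attribute-value pairs, then hash the rest.
import hashlib

VOLATILE = {"ip_address", "last_seen", "session_id"}   # assumed filter criteria

def generate_object_id(attributes: dict) -> str:
    stable = {k: v for k, v in attributes.items() if k not in VOLATILE}
    canonical = "|".join(f"{k}={stable[k]}" for k in sorted(stable))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

print(generate_object_id({"vendor": "acme", "model": "X1",
                          "serial": "42", "ip_address": "10.0.0.7"}))
```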

Book ChapterDOI
07 Oct 2012
TL;DR: The co-detector designed in this paper obtains more accurate detection results than if objects were to be detected from each image individually, and the relevance of the scheme to other recognition problems such as single instance object recognition, wide-baseline matching, and image query is demonstrated.
Abstract: In this paper we introduce a new problem which we call object co-detection. Given a set of images in which the same objects are observed in two or more images, the goal of co-detection is to detect the objects, establish the identity of individual object instances, as well as estimate the viewpoint transformation of corresponding object instances. In designing a co-detector, we follow the intuition that an object has consistent appearance when observed from the same or different viewpoints. By modeling an object using state-of-the-art part-based representations such as [1,2], we measure appearance consistency between objects by comparing part appearance and geometry across images. This allows us to effectively account for object self-occlusions and viewpoint transformations. Extensive experimental evaluation indicates that our co-detector obtains more accurate detection results than if objects were to be detected from each image individually. Moreover, we demonstrate the relevance of our co-detection scheme to other recognition problems such as single instance object recognition, wide-baseline matching, and image query.

Proceedings Article
01 Nov 2012
TL;DR: The transductive SVM (T-SVM) learning algorithm is explored in order to adapt virtual and real worlds for pedestrian detection and the use of unsupervised domain adaptation techniques that avoid human intervention during the adaptation process is proposed.
Abstract: Vision-based object detectors are crucial for different applications. They rely on learnt object models. Ideally, we would like to deploy our vision system in the scenario where it must operate. Then, the system should self-learn how to distinguish the objects of interest, i.e., without human intervention. However, the learning of each object model requires labelled samples collected through a tiresome manual process. For instance, we are interested in exploring the self-training of a pedestrian detector for driver assistance systems. Our first approach to avoid manual labelling consisted in the use of samples coming from realistic computer graphics, so that their labels are automatically available [12]. This would make possible the desired self-training of our pedestrian detector. However, as we showed in [14], there may be a dataset shift between virtual and real worlds. In order to overcome it, we propose the use of unsupervised domain adaptation techniques that avoid human intervention during the adaptation process. In particular, this paper explores the use of the transductive SVM (T-SVM) learning algorithm in order to adapt virtual and real worlds for pedestrian detection (Fig. 1).
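
Common libraries such as scikit-learn do not ship a transductive SVM, so the sketch below substitutes a simplified self-training loop to convey the idea: a classifier trained on labelled virtual-world samples pseudo-labels its most confident real-world samples and is retrained on both. Feature extraction and the actual T-SVM objective are out of scope here.

```python
# Simplified self-training stand-in for virtual-to-real adaptation (not a true T-SVM).
import numpy as np
from sklearn.svm import LinearSVC

def adapt_virtual_to_real(X_virtual, y_virtual, X_real, n_rounds=5, top_frac=0.1):
    """y_virtual holds {0, 1} labels from the virtual world; X_real is unlabelled."""
    X, y = X_virtual, y_virtual
    clf = LinearSVC(C=1.0).fit(X, y)
    for _ in range(n_rounds):
        scores = clf.decision_function(X_real)
        order = np.argsort(-np.abs(scores))          # most confident real samples first
        take = order[: max(1, int(top_frac * len(X_real)))]
        pseudo = (scores[take] > 0).astype(int)      # pseudo-labels for those samples
        X = np.vstack([X_virtual, X_real[take]])
        y = np.concatenate([y_virtual, pseudo])
        clf = LinearSVC(C=1.0).fit(X, y)
    return clf
```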

Patent
29 Jun 2012
TL;DR: In this article, a system for allowing a virtual object to interact with other virtual objects across different spaces within an augmented reality (AR) environment and to transition between the different spaces is described.
Abstract: A system for allowing a virtual object to interact with other virtual objects across different spaces within an augmented reality (AR) environment and to transition between the different spaces is described. An AR environment may include a plurality of spaces, each comprising a bounded area or volume within the AR environment. In one example, an AR environment may be associated with a three-dimensional world space and a two-dimensional object space corresponding with a page of a book within the AR environment. A virtual object within the AR environment may be assigned to the object space and transition from the two-dimensional object space to the three-dimensional world space upon the detection of a space transition event. In some cases, a dual representation of the virtual object may be used to detect interactions between the virtual object and other virtual objects in both the world space and the object space.

Journal ArticleDOI
TL;DR: This paper investigates the visual extent of an object on the Pascal VOC dataset using a Bag-of-Words implementation with (colour) SIFT descriptors and confirms an early observation from human psychology: in the ideal situation with known object locations, recognition is no longer improved by considering surroundings; in contrast, in the normal situation with unknown object locations, the surroundings significantly contribute to the recognition of most classes.
Abstract: The visual extent of an object reaches beyond the object itself. This is a long-standing fact in psychology and is reflected in image retrieval techniques which aggregate statistics from the whole image in order to identify the object within. However, it is unclear to what degree and how the visual extent of an object affects classification performance. In this paper we investigate the visual extent of an object on the Pascal VOC dataset using a Bag-of-Words implementation with (colour) SIFT descriptors. Our analysis is performed from two angles. (a) Not knowing the object location, we determine where in the image the support for object classification resides. We call this the normal situation. (b) Assuming that the object location is known, we evaluate the relative potential of the object and its surround, and of the object border and object interior. We call this the ideal situation. Our most important discoveries are: (i) Surroundings can adequately distinguish between groups of classes: furniture, animals, and land-vehicles. For distinguishing categories within one group the surroundings become a source of confusion. (ii) The physically rigid plane, bike, bus, car, and train classes are recognised by interior boundaries and shape, not by texture. The non-rigid animals dog, cat, cow, and sheep are recognised primarily by texture, i.e. fur, as their projected shape varies greatly. (iii) We confirm an early observation from human psychology (Biederman in Perceptual Organization, pp. 213-263, 1981): in the ideal situation with known object locations, recognition is no longer improved by considering surroundings. In contrast, in the normal situation with unknown object locations, the surroundings significantly contribute to the recognition of most classes.
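
For readers unfamiliar with the representation, a bare-bones Bag-of-Words pipeline of the kind analysed here looks roughly as follows: SIFT descriptors, a k-means codebook, and per-image visual-word histograms. The colour SIFT variants and the object/surround masking used in the paper are left out, and the codebook size is an arbitrary placeholder.

```python
# Minimal Bag-of-Words sketch: SIFT descriptors -> k-means codebook -> word histograms.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_sift(images):
    sift = cv2.SIFT_create()
    all_desc = []
    for img in images:                       # img: grayscale uint8 array
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            all_desc.append(desc)
    return all_desc

def build_codebook(descriptor_sets, k=1000):
    data = np.vstack(descriptor_sets)
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(data)

def bow_histogram(descriptors, codebook):
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-12)       # normalised visual-word histogram
```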

Proceedings Article
01 Nov 2012
TL;DR: This paper proposes a new approach to the problem of Car Make and Model recognition that combines global and local information and utilizes discriminative information labeled by a human expert; the approach is validated through experiments on recognizing the make and model of sedan cars from single-view images.
Abstract: This paper addresses the problem of Car Make and Model recognition as an example of within-category object class recognition. In this problem, it is assumed that the general category of the object is given and the goal is to recognize the object class within the same category. As compared to general object recognition, this problem is more challenging because the variations among classes within the same category are subtle, mostly dominated by the category overall characteristics, and easily missed due to pose and illumination variations. Therefore, this specific problem may not be effectively addressed using generic object recognition approaches. In this paper, we propose a new approach to address this specific problem by combining global and local information and utilizing discriminative information labeled by a human expert. We validate our approach through experiments on recognizing the make and model of sedan cars from single view images.

Journal ArticleDOI
TL;DR: A system for efficient geotag propagation based on a combination of object duplicate detection and user trust modeling is presented, which reduces the risk of propagating wrong tags caused by spamming or faulty annotation.
Abstract: In the past few years sharing photos within social networks has become very popular. In order to make these huge collections easier to explore, images are usually tagged with representative keywords such as persons, events, objects, and locations. In order to speed up the time consuming tag annotation process, tags can be propagated based on the similarity between image content and context. In this paper, we present a system for efficient geotag propagation based on a combination of object duplicate detection and user trust modeling. The geotags are propagated by training a graph based object model for each of the landmarks on a small tagged image set and finding its duplicates within a large untagged image set. Based on the established correspondences between these two image sets and the reliability of the user, tags are propagated from the tagged to the untagged images. The user trust modeling reduces the risk of propagating wrong tags caused by spamming or faulty annotation. The effectiveness of the proposed method is demonstrated through a set of experiments on an image database containing various landmarks.

Book ChapterDOI
07 Oct 2012
TL;DR: The approach groups fragmented object regions using the Multiple Instance Learning (MIL) framework to obtain a meaningful representation of object shape which, at the same time, crops away distracting background clutter to improve the appearance representation.
Abstract: Visual recognition requires learning object models from training data. Commonly, training samples are annotated by marking only the bounding-box of objects, since this appears to be the best trade-off between labeling information and effectiveness. However, objects are typically not box-shaped. Thus, the usual parametrization of object hypotheses by only their location, scale and aspect ratio seems inappropriate since the box contains a significant amount of background clutter. Most important, however, is that object shape becomes explicit only once objects are segregated from the background. Segmentation is an ill-posed problem and so we propose an approach for learning object models for detection while, simultaneously, learning to segregate objects from clutter and extracting their overall shape. For this purpose, we exclusively use bounding-box annotated training data. The approach groups fragmented object regions using the Multiple Instance Learning (MIL) framework to obtain a meaningful representation of object shape which, at the same time, crops away distracting background clutter to improve the appearance representation.

Journal ArticleDOI
TL;DR: The DRLTracker can effectively alleviate the distraction problem, and its superior ability over several representative and state-of-the-art trackers is demonstrated through extensive experiments.
Abstract: In this paper, we propose a novel tracking algorithm, i.e., the discriminative ranking list-based tracker (DRLTracker). The DRLTracker models the target object and its local background by using ranking lists of patches of different scales within object bounding boxes. The ranking list of each of such patches is its K nearest neighbors. Patches of the same scale with ranking lists of high purity values (meaning high probabilities to be on the target object) and some confusable background patches constitute the object model under that scale. A pair of object models of two different scales collaborate to determine which patches may belong to the target object in the next frame. The DRLTracker can effectively alleviate the distraction problem, and its superior ability over several representative and state-of-the-art trackers is demonstrated through extensive experiments.
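
The ranking-list notion has a simple nearest-neighbour reading. The sketch below computes, for each patch descriptor, the fraction of its K nearest neighbours labelled as object (its purity) and keeps high-purity patches; the multi-scale model pairs and the tracking loop itself are not reproduced, and the threshold is a placeholder.

```python
# Sketch of the ranking-list idea: purity = fraction of a patch's K nearest neighbours
# that come from the object rather than the local background.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def ranking_list_purity(patch_features, is_object, K=10):
    """patch_features: (N, D) descriptors; is_object: (N,) boolean labels."""
    is_object = np.asarray(is_object)
    nn = NearestNeighbors(n_neighbors=K + 1).fit(patch_features)
    _, idx = nn.kneighbors(patch_features)
    neighbours = idx[:, 1:]                        # drop each patch itself
    return is_object[neighbours].mean(axis=1)      # high purity -> likely on the object

def select_object_patches(patch_features, is_object, threshold=0.8):
    purity = ranking_list_purity(patch_features, is_object)
    return np.where(purity >= threshold)[0]
```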

Journal ArticleDOI
TL;DR: A heavy-weighted ontology-based construction method for interoperation models is proposed to support the reuse of subsystems in various collaborative contexts; it is more flexible, efficient and reliable than existing interoperation modeling methods.