
Showing papers on "Object (computer science)" published in 2012


Journal ArticleDOI
TL;DR: A novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection, together with a novel learning method (P-N learning) that estimates detector errors with a pair of “experts”: the P-expert estimates missed detections, and the N-expert estimates false alarms.
Abstract: This paper investigates long-term tracking of unknown objects in a video stream. The object is defined by its location and extent in a single frame. In every frame that follows, the task is to determine the object's location and extent or indicate that the object is not present. We propose a novel tracking framework (TLD) that explicitly decomposes the long-term tracking task into tracking, learning, and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning estimates the detector's errors and updates it to avoid these errors in the future. We study how to identify the detector's errors and learn from them. We develop a novel learning method (P-N learning) which estimates the errors by a pair of “experts”: (1) P-expert estimates missed detections, and (2) N-expert estimates false alarms. The learning process is modeled as a discrete dynamical system and the conditions under which the learning guarantees improvement are found. We describe our real-time implementation of the TLD framework and the P-N learning. We carry out an extensive quantitative evaluation which shows a significant improvement over state-of-the-art approaches.

3,137 citations
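The P-N update described in the abstract can be sketched as a simple set computation. This is a minimal illustration only: the function and variable names are hypothetical, and the real method operates on detector responses within frames rather than whole frame indices.

```python
def pn_update(training_set, tracker_path, detections):
    """One P-N learning iteration (illustrative sketch).

    tracker_path: set of frame indices where the tracker says the object is.
    detections:   set of frame indices where the detector fired.
    Returns (positives, negatives) added to the detector's training set.
    """
    # P-expert: frames covered by the tracker but missed by the detector
    # are treated as missed detections -> new positive examples.
    positives = tracker_path - detections
    # N-expert: detections far from the tracked path are treated
    # as false alarms -> new negative examples.
    negatives = detections - tracker_path
    training_set["pos"] |= positives
    training_set["neg"] |= negatives
    return positives, negatives
```

The paper's key result is identifying conditions on these two experts under which iterating this update is guaranteed to improve the detector.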


Patent
14 Sep 2012
TL;DR: In this article, the authors propose a method and system for placing graphical objects on a page to optimize the occurrence of an event associated with such objects, such as advertisements on a webpage.
Abstract: A method and system for placement of graphical objects on a page to optimize the occurrence of an event associated with such objects. The graphical objects might include, for instance, advertisements on a webpage, and the event would include a user clicking on that ad. The page includes positions for receipt of the object material. Data regarding the past performance of the objects is stored and updated as new data is received. A user requests a page from a server associated with the system. The server uses the performance data to derive a prioritized arrangement of the objects on the page. The server performs a calculation regarding the likelihood that an event will occur for a given object, as displayed to a particular user. The objects are arranged according to this calculation and returned to the user on the requested page. The likelihood can also be multiplied by a weighting factor and the objects arranged according to this product.

984 citations
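The prioritization step the patent describes — score each object by its event likelihood, optionally multiplied by a weighting factor, then sort — can be sketched as follows (the names are illustrative, not from the patent):

```python
def arrange_objects(objects, click_prob, weight=None):
    """Order graphical objects by expected value of the target event.

    objects:    list of object ids
    click_prob: dict id -> estimated probability that the event
                (e.g. a click) occurs for this object and user
    weight:     optional dict id -> weighting factor; the patent notes
                the likelihood may be multiplied by such a factor
                before arranging the objects.
    """
    weight = weight or {}
    score = lambda o: click_prob[o] * weight.get(o, 1.0)
    return sorted(objects, key=score, reverse=True)
```

In practice the probabilities would come from the stored past-performance data the abstract mentions; here they are supplied directly.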


Book ChapterDOI
07 Oct 2012
TL;DR: This paper proposes to automatically populate ImageNet with pixelwise segmentations, by leveraging existing manual annotations in the form of class labels and bounding-boxes, and effectively exploits the hierarchical structure of ImageNet.
Abstract: ImageNet is a large-scale hierarchical database of object classes. We propose to automatically populate it with pixelwise segmentations, by leveraging existing manual annotations in the form of class labels and bounding-boxes. The key idea is to recursively exploit images segmented so far to guide the segmentation of new images. At each stage this propagation process expands into the images which are easiest to segment at that point in time, e.g. by moving to the semantically most related classes to those segmented so far. The propagation of segmentation occurs both (a) at the image level, by transferring existing segmentations to estimate the probability of a pixel to be foreground, and (b) at the class level, by jointly segmenting images of the same class and by importing the appearance models of classes that are already segmented. Through an experiment on 577 classes and 500k images we show that our technique (i) annotates a wide range of classes with accurate segmentations; (ii) effectively exploits the hierarchical structure of ImageNet; (iii) scales efficiently; (iv) outperforms a baseline GrabCut [1] initialized on the image center, as well as our recent segmentation transfer technique [2] on which this paper is based. Moreover, our method also delivers state-of-the-art results on the recent iCoseg dataset for co-segmentation.

648 citations


Journal ArticleDOI
TL;DR: A hypergraph analysis approach to view-based 3-D object retrieval and recognition that avoids estimating the distance between objects by constructing multiple hypergraphs from their 2-D views.
Abstract: View-based 3-D object retrieval and recognition has become popular in practice, e.g., in computer aided design. It is difficult to precisely estimate the distance between two objects represented by multiple views. Thus, current view-based 3-D object retrieval and recognition methods may not perform well. In this paper, we propose a hypergraph analysis approach to address this problem by avoiding the estimation of the distance between objects. In particular, we construct multiple hypergraphs for a set of 3-D objects based on their 2-D views. In these hypergraphs, each vertex is an object, and each edge is a cluster of views. Therefore, an edge connects multiple vertices. We define the weight of each edge based on the similarities between any two views within the cluster. Retrieval and recognition are performed based on the hypergraphs. Therefore, our method can explore the higher order relationship among objects and does not use the distance between objects. We conduct experiments on the National Taiwan University 3-D model dataset and the ETH 3-D object collection. Experimental results demonstrate the effectiveness of the proposed method by comparing with the state-of-the-art methods.

573 citations
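The hypergraph construction described above — vertices are objects, each hyperedge is a cluster of views, and the edge weight comes from pairwise view similarities — can be sketched as below. Using the mean pairwise similarity as the weight is an assumption for illustration; the abstract only says the weight is based on the similarities between any two views within the cluster.

```python
from itertools import combinations

def edge_weight(cluster_views, similarity):
    """Weight of a hyperedge: mean pairwise similarity of the views it
    groups. `similarity` is any symmetric function of two views."""
    pairs = list(combinations(cluster_views, 2))
    if not pairs:
        return 0.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

def build_hypergraph(clusters, similarity):
    """Each hyperedge connects every object that contributed a view to
    the cluster; returns a list of (vertex_set, weight) pairs.

    clusters: list of clusters, each a list of (object_id, view) pairs.
    """
    edges = []
    for cluster in clusters:
        vertices = {obj for obj, _ in cluster}
        w = edge_weight([view for _, view in cluster], similarity)
        edges.append((vertices, w))
    return edges
```

Because a hyperedge can connect many vertices at once, retrieval over this structure can exploit higher-order relationships among objects without ever computing a direct object-to-object distance.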


Journal ArticleDOI
TL;DR: The Weizmann Interactive Supernova data REPository (WISeREP) is presented: an SQL-based database of supernova spectra with an interactive web-based graphical interface.
Abstract: We have entered an era of massive data sets in astronomy. In particular, the number of supernova (SN) discoveries and classifications has substantially increased over the years from few tens to thousands per year. It is no longer the case that observations of a few prototypical events encapsulate most spectroscopic information about SNe, motivating the development of modern tools to collect, archive, organize and distribute spectra in general, and SN spectra in particular. For this reason we have developed the Weizmann Interactive Supernova data REPository - WISeREP - an SQL-based database (DB) with an interactive web-based graphical interface. The system serves as an archive of high quality SN spectra, including both historical (legacy) data as well as data that is accumulated by ongoing modern programs. The archive provides information about objects, their spectra, and related meta-data. Utilizing interactive plots, we provide a graphical interface to visualize data, perform line identification of the major relevant species, determine object redshifts, classify SNe and measure expansion velocities. Guest users may view and download spectra or other data that have been placed in the public domain. Registered users may also view and download data that are proprietary to specific programs with which they are associated. The DB currently holds >8000 spectra, of which >5000 are public; the latter include published spectra from the Palomar Transient Factory, all of the SUSPECT archive, the Caltech-Core-Collapse Program, the CfA SN spectra archive and published spectra from the UC Berkeley SNDB repository. It offers an efficient and convenient way to archive data and share it with colleagues, and we expect that data stored in this way will be easy to access, increasing its visibility, usefulness and scientific impact.

477 citations


Journal ArticleDOI
TL;DR: An overview of the main issues in change detection is presented, followed by the motivations for using OBCD as compared to pixel-based approaches, and a conceptual overview of solutions is provided.
Abstract: Characterizations of land-cover dynamics are among the most important applications of Earth observation data, providing insights into management, policy and science. Recent progress in remote sensing and associated digital image processing offers unprecedented opportunities to detect changes in land cover more accurately over increasingly large areas, with diminishing costs and processing time. The advent of high-spatial-resolution remote-sensing imagery further provides opportunities to apply change detection with object-based image analysis (OBIA), that is, object-based change detection (OBCD). When compared with the traditional pixel-based change paradigm, OBCD has the ability to improve the identification of changes for the geographic entities found over a given landscape. In this article, we present an overview of the main issues in change detection, followed by the motivations for using OBCD as compared to pixel-based approaches. We also discuss the challenges caused by the use of objects in change ...

450 citations


Patent
30 Jan 2012
TL;DR: In this paper, a system for collecting data is proposed, comprising a mobile terminal with a first and a second imaging assembly for capturing frames of image data representing a first and a second object, the system being operative to associate first frame information with second frame information.
Abstract: A system for collecting data comprising a mobile terminal for capturing a plurality of frames of image data, the mobile terminal having a first imaging assembly and a second imaging assembly, the first imaging assembly for capturing a first frame of image data representing a first object and the second imaging assembly for capturing a second frame of image data representing a second object, wherein the system for use in collecting data is operative for associating first frame information and second frame information, the first frame information including one or more of image data of the first frame of image data and information derived utilizing the image data of the first frame of image data, the second frame information including one or more of image data of the second frame of image data and information derived utilizing the image data of the second frame of image data.

372 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper proposes the use of color attributes as an explicit color representation for object detection and shows that this method improves over state-of-the-art techniques despite its simplicity.
Abstract: State-of-the-art object detectors typically use shape information as a low level feature representation to capture the local structure of an object. This paper shows that early fusion of shape and color, as is popular in image classification, leads to a significant drop in performance for object detection. Moreover, such approaches also yield suboptimal results for object categories with varying importance of color and shape. In this paper we propose the use of color attributes as an explicit color representation for object detection. Color attributes are compact, computationally efficient, and when combined with traditional shape features provide state-of-the-art results for object detection. Our method is tested on the PASCAL VOC 2007 and 2009 datasets and results clearly show that our method improves over state-of-the-art techniques despite its simplicity. We also introduce a new dataset consisting of cartoon character images in which color plays a pivotal role. On this dataset, our approach yields a significant gain of 14% in mean AP over conventional state-of-the-art methods.

303 citations


Patent
07 Mar 2012
TL;DR: In this article, a system and method are presented for gathering and analyzing data from device operators aiming their image-capture devices, and thereby creating a line of sight to an object of interest, for example while obtaining photographs, videos or other digital images of an event or geographical location. The real-time or embedded location, compass heading, and time data from each of a plurality of image-capture devices are communicated to one or more servers for statistical analysis of the proportion of providers focusing on each image target or sub-target at the event or location.
Abstract: The invention relates to a system and method of gathering and analyzing data from device operators aiming their image capture devices and thereby creating a line of sight to an object of interest, for example through the process of obtaining photographs, videos or other digital images of an event or geographical location, where the real-time or embedded location, compass heading, and time data from each of a plurality of image providers are communicated from the plurality of image-capture devices to one or more servers for statistical analysis of the proportionate amount of providers focusing on each image target or sub-target at the event or location.

288 citations


Patent
14 Dec 2012
TL;DR: In this article, a method and apparatus for interactive TV camera-based games in which position or orientation of points on a player or of an object held by a player are determined and used to control a video display is presented.
Abstract: A method and apparatus for interactive TV camera-based games in which the position or orientation of points on a player, or of an object held by a player, are determined and used to control a video display. Both single-camera and stereo-camera-pair embodiments are disclosed, preferably using stereo photogrammetry where multi-degree-of-freedom information is desired. Large video displays, preferably life-size, may be used where utmost realism of the game experience is desired.

288 citations


Patent
10 Apr 2012
TL;DR: In this paper, the authors describe a system for providing realistic 3D spatial occlusion between a virtual object displayed by a head mounted, augmented reality display system and a real object visible to the user through the display.
Abstract: Technology is described for providing realistic occlusion between a virtual object displayed by a head mounted, augmented reality display system and a real object visible to the user's eyes through the display. A spatial occlusion in a user field of view of the display is typically a three dimensional occlusion determined based on a three dimensional space mapping of real and virtual objects. An occlusion interface between a real object and a virtual object can be modeled at a level of detail determined based on criteria such as distance within the field of view, display size or position with respect to a point of gaze. Technology is also described for providing three dimensional audio occlusion based on an occlusion between a real object and a virtual object in the user environment.

Journal ArticleDOI
01 Mar 2012
TL;DR: The architecture introduces the use of the Smart Object framework to encapsulate radio-frequency identification, sensor technologies, embedded object logic, object ad-hoc networking, and Internet-based information infrastructure and outperforms existing industry standards in metrics such as network throughput, delivery ratio, or routing distance.
Abstract: The Internet of Things (IoT) concept is being widely presented as the next revolution toward massively distributed information, where any real-world object can automatically participate in the Internet and thus be globally discovered and queried. Despite the consensus on the great potential of the concept and the significant progress in a number of enabling technologies, there is a general lack of an integrated vision on how to realize it. This paper examines the technologies that will be fundamental for realizing the IoT and proposes an architecture that integrates them into a single platform. The architecture introduces the use of the Smart Object framework to encapsulate radio-frequency identification (RFID), sensor technologies, embedded object logic, object ad-hoc networking, and Internet-based information infrastructure. We evaluate the architecture against a number of energy-based performance measures, and also show that it outperforms existing industry standards in metrics such as network throughput, delivery ratio, or routing distance. Finally, we demonstrate the feasibility and flexibility of the architecture by detailing an implementation using Wireless Sensor Networks and Web Services, and describe a prototype for the real-time monitoring of goods flowing through a supply chain.

Patent
05 Nov 2012
TL;DR: In this paper, the authors describe a method that processes and traces an object in a virtualization environment, instantiates an emulation environment when suspicious behavior is detected, detects divergence between the object's traced operations in the two environments, replays the emulation environment's recorded responses to the object in a re-instantiated virtualization environment, monitors the object's operations there, and generates a report on the untrusted actions identified.
Abstract: Systems and methods for virtualization and emulation malware enabled detection are described. In some embodiments, a method comprises intercepting an object, instantiating and processing the object in a virtualization environment, tracing operations of the object while processing within the virtualization environment, detecting suspicious behavior associated with the object, instantiating an emulation environment in response to the detected suspicious behavior, processing, recording responses to, and tracing operations of the object within the emulation environment, detecting a divergence between the traced operations of the object within the virtualization environment to the traced operations of the object within the emulation environment, re-instantiating the virtualization environment, providing the recorded response from the emulation environment to the object in the virtualization environment, monitoring the operations of the object within the re-instantiation of the virtualization environment, identifying untrusted actions from the monitored operations, and generating a report regarding the identified untrusted actions of the object.

Proceedings Article
03 Dec 2012
TL;DR: This paper proposes a novel approach that extends the well-acclaimed deformable part-based model to reason in 3D, and represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box.
Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

Journal ArticleDOI
TL;DR: The CCFV removes the constraint of static camera array settings for view capturing and can be applied to any view-based 3-D object database and experimental results show that the proposed scheme can achieve better performance than state-of-the-art methods.
Abstract: Recently, extensive research efforts have been dedicated to view-based methods for 3-D object retrieval due to the highly discriminative property of multiviews for 3-D object representation. However, most of state-of-the-art approaches highly depend on their own camera array settings for capturing views of 3-D objects. In order to move toward a general framework for 3-D object retrieval without the limitation of camera array restriction, a camera constraint-free view-based (CCFV) 3-D object retrieval algorithm is proposed in this paper. In this framework, each object is represented by a free set of views, which means that these views can be captured from any direction without camera constraint. For each query object, we first cluster all query views to generate the view clusters, which are then used to build the query models. For a more accurate 3-D object comparison, a positive matching model and a negative matching model are individually trained using positive and negative matched samples, respectively. The CCFV model is generated on the basis of the query Gaussian models by combining the positive matching model and the negative matching model. The CCFV removes the constraint of static camera array settings for view capturing and can be applied to any view-based 3-D object database. We conduct experiments on the National Taiwan University 3-D model database and the ETH 3-D object database. Experimental results show that the proposed scheme can achieve better performance than state-of-the-art methods.

Journal ArticleDOI
TL;DR: The system described in this article was constructed specifically for the generation of model data for object recognition, localization and manipulation tasks and it allows 2D image and 3D geometric data of everyday objects to be obtained semi-automatically.
Abstract: For the execution of object recognition, localization and manipulation tasks, most algorithms use object models. Most models are derived from, or consist of, two-dimensional (2D) images and/or three-dimensional (3D) geometric data. The system described in this article was constructed specifically for the generation of such model data. It allows 2D image and 3D geometric data of everyday objects to be obtained semi-automatically. The calibration provided allows 2D data to be related to 3D data. Through the use of high-quality sensors, high-accuracy data is achieved. So far over 100 objects have been digitized using this system and the data has been successfully used in several international research projects. All of the models are freely available on the web via a front-end that allows preview and filtering of the data.

Journal ArticleDOI
TL;DR: In this article, the authors used multivoxel pattern analysis to test whether activity patterns in ATLs carry information about conceptual object properties, such as where and how an object is used.
Abstract: Interaction with everyday objects requires the representation of conceptual object properties, such as where and how an object is used. What are the neural mechanisms that support this knowledge? While research on semantic dementia has provided evidence for a critical role of the anterior temporal lobes (ATLs) in object knowledge, fMRI studies using univariate analysis have primarily implicated regions outside the ATL. In the present human fMRI study we used multivoxel pattern analysis to test whether activity patterns in ATLs carry information about conceptual object properties. Participants viewed objects that differed on two dimensions: where the object is typically found (in the kitchen or the garage) and how the object is commonly used (with a rotate or a squeeze movement). Anatomical region-of-interest analyses covering the ventral visual stream revealed that information about the location and action dimensions increased from posterior to anterior ventral temporal cortex, peaking in the temporal pole. Whole-brain multivoxel searchlight analysis confirmed these results, revealing highly significant and regionally specific information about the location and action dimensions in the anterior temporal lobes bilaterally. In contrast to conceptual object properties, perceptual and low-level visual properties of the objects were reflected in activity patterns in posterior lateral occipitotemporal cortex and occipital cortex, respectively. These results provide fMRI evidence that object representations in the anterior temporal lobes are abstracted away from perceptual properties, categorizing objects in semantically meaningful groups to support conceptual object knowledge.

Proceedings ArticleDOI
13 Aug 2012
TL;DR: The key insight behind HyperDex is the concept of hyperspace hashing in which objects with multiple attributes are mapped into a multidimensional hyperspace, which leads to efficient implementations not only for retrieval by primary key, but also for partially-specified secondary attribute searches and range queries.
Abstract: Distributed key-value stores are now a standard component of high-performance web services and cloud computing applications. While key-value stores offer significant performance and scalability advantages compared to traditional databases, they achieve these properties through a restricted API that limits object retrieval---an object can only be retrieved by the (primary and only) key under which it was inserted. This paper presents HyperDex, a novel distributed key-value store that provides a unique search primitive that enables queries on secondary attributes. The key insight behind HyperDex is the concept of hyperspace hashing in which objects with multiple attributes are mapped into a multidimensional hyperspace. This mapping leads to efficient implementations not only for retrieval by primary key, but also for partially-specified secondary attribute searches and range queries. A novel chaining protocol enables the system to achieve strong consistency, maintain availability and guarantee fault tolerance. An evaluation of the full system shows that HyperDex is 12-13x faster than Cassandra and MongoDB for finding partially specified objects. Additionally, HyperDex achieves 2-4x higher throughput for get/put operations.
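Hyperspace hashing as described — one hashed coordinate per attribute, with a partially specified search fixing only some coordinates — might be sketched as follows. This is an illustration only; HyperDex's actual mapping, server assignment, and chaining protocol are more involved, and the names here are assumptions.

```python
import hashlib

def hyperspace_coord(obj, attributes, buckets=16):
    """Map an object with multiple attributes to a point in a
    multidimensional hyperspace: one hashed coordinate per attribute."""
    def h(value):
        digest = hashlib.sha256(str(value).encode()).hexdigest()
        return int(digest, 16) % buckets

    return tuple(h(obj[a]) for a in attributes)

def search_mask(query, attributes, buckets=16):
    """A partially specified search fixes coordinates for the attributes
    it names and leaves the others free (None), so only the matching
    slice of the hyperspace needs to be contacted."""
    return [hyperspace_coord({a: query[a]}, [a], buckets)[0]
            if a in query else None
            for a in attributes]
```

The payoff is that a secondary-attribute query no longer requires scanning every server: the fixed coordinates restrict the search to one hyperplane of the space.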

Patent
27 Mar 2012
TL;DR: In this paper, a system and method for capturing objects and balancing systems resources in a capture system is described, where an object is captured, metadata associated with the object is generated, and the object and metadata stored.
Abstract: A system and method for capturing objects and balancing systems resources in a capture system are described. An object is captured, metadata associated with the objected generated, and the object and metadata stored.

Patent
14 Feb 2012
TL;DR: In this article, a method of classifying a computer object as malware includes receiving at a base computer data about a computer object from each of plural remote computers on which the object or similar objects are stored.
Abstract: In one aspect, a method of classifying a computer object as malware includes receiving at a base computer data about a computer object from each of plural remote computers on which the object or similar objects are stored. The data about the computer object received from the plural computers is compared in the base computer. The computer object is classified as malware on the basis of said comparison. In one embodiment, the data about the computer object includes one or more of : executable instructions contained within or constituted by the object; the size of the object; the name of the object; the logical storage location or path of the object on the respective remote computers; the vendor of the object; the software product and version associated with the object; and, events initiated by or involving the object when the object is created, configured or runs on the respective remote computers.
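A toy version of the comparison step — pooling per-machine reports about one object and flagging inconsistencies — could look like this. The specific heuristic and thresholds are assumptions for illustration, not taken from the patent; real systems combine many more of the listed signals.

```python
def classify_object(reports, min_reports=3, threshold=0.5):
    """Classify an object by comparing data gathered from many remote
    computers (illustrative sketch; heuristic and names are assumed).

    reports: list of dicts with keys such as 'size', 'name', 'path',
             drawn from the kinds of data enumerated in the abstract.
    Rationale: an identical binary masquerading under many different
    names across machines is a common malware pattern.
    """
    if len(reports) < min_reports:
        return "unknown"          # too little data to compare
    sizes = {r["size"] for r in reports}
    names = {r["name"] for r in reports}
    # One binary (single size) seen under many distinct names is suspicious.
    if len(sizes) == 1 and len(names) / len(reports) > threshold:
        return "malware"
    return "benign"
```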

Patent
21 Aug 2012
TL;DR: In this paper, a method for locating a conductive object at a touch-sensing surface may include detecting a first resolved location for the conductive object based on a first scan of the touch-sensing surface and determining a second resolved location by performing a second scan of a subset of sensor elements.
Abstract: A method for locating a conductive object at a touch-sensing surface may include detecting a first resolved location for the conductive object at the touch-sensing surface based on a first scan of the touch-sensing surface, predicting a location for the conductive object, and determining a second resolved location for the conductive object by performing a second scan of a subset of sensor elements of the touch-sensing surface, wherein the subset of sensor elements is selected based on the predicted location of the conductive object.
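The predict-then-rescan idea above can be sketched with a constant-velocity prediction and a small window of sensor elements around the predicted location. Both choices are assumptions for illustration; the patent leaves the prediction method and subset selection open.

```python
def predict_location(prev, curr):
    """Linear prediction of the next touch location from the last two
    resolved locations (constant-velocity assumption)."""
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])

def sensor_subset(predicted, n_rows, n_cols, radius=1):
    """Select the subset of sensor elements around the predicted
    location for the second, localized scan (clipped to the panel)."""
    r, c = predicted
    return [(i, j)
            for i in range(max(0, r - radius), min(n_rows, r + radius + 1))
            for j in range(max(0, c - radius), min(n_cols, c + radius + 1))]
```

Scanning only this window instead of the full panel is what makes the second resolved location cheaper to obtain.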

Proceedings ArticleDOI
14 May 2012
TL;DR: A new, learned, local feature descriptor for RGB-D images, the convolutional k-means descriptor, which automatically learns feature responses in the neighborhood of detected interest points and is able to combine all available information, such as color and depth into one, concise representation.
Abstract: In this work we address the problem of feature extraction for object recognition in the context of cameras providing RGB and depth information (RGB-D data). We consider this problem in a bag of features like setting and propose a new, learned, local feature descriptor for RGB-D images, the convolutional k-means descriptor. The descriptor is based on recent results from the machine learning community. It automatically learns feature responses in the neighborhood of detected interest points and is able to combine all available information, such as color and depth into one, concise representation. To demonstrate the strength of this approach we show its applicability to different recognition problems. We evaluate the quality of the descriptor on the RGB-D Object Dataset where it is competitive with previously published results and propose an embedding into an image processing pipeline for object recognition and pose estimation.

Patent
25 Jul 2012
TL;DR: In this article, the authors divide the basic time interval into multiple durations, "class regions", for different message classes, and use different wireless bandwidth allocation algorithms for the class regions.
Abstract: Device, system and method, in a vehicle communication system, to transmit wirelessly a message comprising the position, heading and speed of a vehicle or other moving object, wherein the transmission is repeated at regular intervals in a temporarily fixed time slot within a predetermined basic time interval. In a key embodiment the message duration is equal to or less than a predetermined time slot duration. Embodiments use generally the same time slot in a contiguous sequence of basic time intervals. Algorithms are described to resolve wireless interference within a time slot. Embodiments divide the basic time interval into multiple durations, “class regions,” for different message classes. Embodiments use different wireless bandwidth allocation algorithms for the class regions.
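Dividing the basic time interval into per-class regions of fixed-duration slots might look like the following sketch. The equal-sized slots and the share parameters are assumptions for illustration; the patent only states that the interval is divided into class regions.

```python
def slot_schedule(basic_interval_ms, slot_ms, class_shares):
    """Partition the basic time interval into class regions of slots.

    class_shares: dict message_class -> fraction of the basic interval.
    Returns dict message_class -> list of (slot_start_ms, slot_end_ms).
    """
    schedule, cursor = {}, 0
    for cls, share in class_shares.items():
        region = int(basic_interval_ms * share)  # length of this class region
        n_slots = region // slot_ms              # whole slots that fit
        schedule[cls] = [(cursor + k * slot_ms, cursor + (k + 1) * slot_ms)
                         for k in range(n_slots)]
        cursor += region
    return schedule
```

A transmitter would then reuse the same slot index within its class region across a contiguous sequence of basic intervals, as the abstract describes.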

Patent
07 Aug 2012
TL;DR: In this article, an apparatus and a method for recognizing a location of an object using radio frequency identification (RFID) was presented, which includes activating tags located around RFID tags by transmitting power to the tags in a magnetic resonance type and transmitting a call signal for identifying a mobile tag attached to an object.
Abstract: Disclosed are an apparatus and a method for recognizing a location of an object using radio frequency identification (RFID). The method for recognizing a location of an object includes: activating tags located around radio frequency identification (RFID) tags by transmitting power to the RFID tags in a magnetic resonance manner; transmitting a call signal for identifying a mobile tag attached to an object; receiving a tag identifier of the mobile tag corresponding to the call signal; receiving location information from fixed tags around the mobile tag; and recognizing the location of the mobile tag using the location information.
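The last step, recognizing the mobile tag's location from the fixed tags' location information, could be done in several ways; the patent does not specify the estimator. One simple option is an RSSI-weighted centroid of the nearby fixed tags, sketched below with made-up tag positions and signal strengths.

```python
# Hypothetical fixed tags: known (x, y) position plus a signal strength (dBm)
# observed for each when reading near the mobile tag.
fixed_tags = [
    {"pos": (0.0, 0.0), "rssi": -40.0},  # strongest, i.e. closest
    {"pos": (4.0, 0.0), "rssi": -60.0},
    {"pos": (0.0, 4.0), "rssi": -60.0},
]

def estimate_position(readings):
    """Weighted centroid of the fixed tags, weighting stronger RSSI higher."""
    weights = [10 ** (r["rssi"] / 20.0) for r in readings]  # dB -> linear scale
    total = sum(weights)
    x = sum(w * r["pos"][0] for w, r in zip(weights, readings)) / total
    y = sum(w * r["pos"][1] for w, r in zip(weights, readings)) / total
    return (x, y)

x, y = estimate_position(fixed_tags)
print(round(x, 2), round(y, 2))  # estimate pulled toward the strongest tag at (0, 0)
```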

Patent
25 Apr 2012
TL;DR: In this paper, a method of dynamically calibrating a given camera relative to a reference camera of a vehicle is proposed, where an overlapping region between an image frame from the given camera and a reference image frame is identified, and a portion of an object within that region is selected.
Abstract: A method of dynamically calibrating a given camera relative to a reference camera of a vehicle includes identifying an overlapping region in an image frame provided by the given camera and an image frame provided by the reference camera and selecting at least a portion of an object in the overlapped region of the reference image frame. Expected pixel positions of the selected object portion in the given image frame are determined based on the location of the selected object portion in the reference image frame, and pixel positions of the selected object portion are located as detected in the given image frame. An alignment of the given camera is determined based on a comparison of the pixel positions of the selected object portion in the given image frame to the expected pixel positions of the selected object portion in the given image frame.
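The comparison step reduces, in its simplest form, to measuring the offset between expected and detected pixel positions. The sketch below estimates only a mean translation offset from hypothetical correspondences; a real calibration would solve for rotation and full extrinsics, which the patent covers and this does not.

```python
import numpy as np

def estimate_misalignment(expected, detected):
    """Mean pixel offset between where overlap features should appear (predicted
    from the reference camera) and where they are actually detected."""
    expected = np.asarray(expected, dtype=float)
    detected = np.asarray(detected, dtype=float)
    return (detected - expected).mean(axis=0)

# Hypothetical correspondences in the overlapping region (pixel coordinates).
expected = [(100, 50), (180, 60), (140, 120)]
detected = [(103, 48), (183, 58), (143, 118)]  # consistently shifted by (+3, -2)
dx, dy = estimate_misalignment(expected, detected)
print(dx, dy)  # 3.0 -2.0
```

A consistent nonzero offset like this would indicate the given camera has drifted out of alignment.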

Journal ArticleDOI
TL;DR: It is suggested that pre-experimental exposure (familiarization) to objects, habituation to treatment procedures, and the use of relative discrimination measures be taken into consideration when using the ORT.

Book ChapterDOI
07 Oct 2012
TL;DR: Peculiar to this approach is the inherent ability to detect significantly occluded objects without increasing the amount of false positives, so that the operating point of the object recognition algorithm can nicely move toward a higher recall without sacrificing precision.
Abstract: We propose a novel approach for verifying model hypotheses in cluttered and heavily occluded 3D scenes. Instead of verifying one hypothesis at a time, as done by most state-of-the-art 3D object recognition methods, we determine object and pose instances according to a global optimization stage based on a cost function which encompasses geometrical cues. Peculiar to our approach is the inherent ability to detect significantly occluded objects without increasing the amount of false positives, so that the operating point of the object recognition algorithm can nicely move toward a higher recall without sacrificing precision. Our approach outperforms state-of-the-art on a challenging dataset including 35 household models obtained with the Kinect sensor, as well as on the standard 3D object recognition benchmark dataset.
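The key idea, verifying all hypotheses jointly by minimizing a global cost rather than accepting each one independently, can be illustrated with a toy cost function: unexplained scene points plus a penalty for model points with no scene support. The cost terms, weights, and exhaustive subset search below are simplifications for illustration, not the paper's actual geometric cues or optimizer.

```python
from itertools import combinations

def cost(selected, scene_points, lam=2.0):
    """Toy global cost: unexplained scene points, plus a penalty (lam) for every
    model point a selected hypothesis places where the scene has no support."""
    explained = set()
    outliers = 0
    for h in selected:
        explained |= h["explains"]
        outliers += h["outliers"]
    return (len(scene_points) - len(explained)) + lam * outliers

def verify(hypotheses, scene_points):
    """Exhaustive search over hypothesis subsets; feasible for small candidate sets."""
    best, best_c = (), cost((), scene_points)
    for r in range(1, len(hypotheses) + 1):
        for subset in combinations(hypotheses, r):
            c = cost(subset, scene_points)
            if c < best_c:
                best, best_c = subset, c
    return [h["name"] for h in best]

scene = set(range(100))
hyps = [
    {"name": "mug",   "explains": set(range(0, 40)),  "outliers": 1},
    {"name": "bowl",  "explains": set(range(40, 90)), "outliers": 2},
    {"name": "ghost", "explains": set(range(80, 90)), "outliers": 20},  # spurious match
]
print(verify(hyps, scene))  # ['mug', 'bowl'] -- the spurious hypothesis is rejected
```

Because hypotheses compete globally, a heavily occluded object that explains even a modest set of points can still be accepted, while a spurious hypothesis that mostly contradicts the scene is rejected.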

Proceedings ArticleDOI
24 Dec 2012
TL;DR: This work presents a framework for segmenting unknown objects in RGB-D images suitable for robotics tasks such as object search, grasping and manipulation and shows evaluation of the relations and results on a database of different test sets, demonstrating that the approach can segment objects of various shapes in cluttered table top scenes.
Abstract: We present a framework for segmenting unknown objects in RGB-D images suitable for robotics tasks such as object search, grasping and manipulation. While handling single objects on a table is solved, handling complex scenes poses considerable problems due to clutter and occlusion. After pre-segmentation of the input image based on surface normals, surface patches are estimated using a mixture of planes and NURBS (non-uniform rational B-splines) and model selection is employed to find the best representation for the given data. We then construct a graph from surface patches and relations between pairs of patches and perform graph cut to arrive at object hypotheses segmented from the scene. The energy terms for patch relations are learned from user annotated training data, where support vector machines (SVM) are trained to classify a relation as being indicative of two patches belonging to the same object. We show evaluation of the relations and results on a database of different test sets, demonstrating that the approach can segment objects of various shapes in cluttered table top scenes.
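The grouping stage, merging surface patches whose learned pairwise relation says "same object", can be sketched with union-find over thresholded pair scores. This is a deliberate simplification of the paper's graph-cut energy minimization, and the patch indices and scores below are made up.

```python
def group_patches(n_patches, pair_scores, thresh=0.5):
    """Union-find grouping: merge two surface patches whenever the learned
    same-object score for the pair exceeds `thresh`."""
    parent = list(range(n_patches))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for (a, b), score in pair_scores.items():
        if score > thresh:
            parent[find(a)] = find(b)
    groups = {}
    for i in range(n_patches):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical pairwise scores (e.g. SVM decision values squashed to [0, 1]).
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
print(group_patches(5, scores))  # [[0, 1, 2], [3, 4]] -> two object hypotheses
```

In the paper the SVM-learned energies feed a graph cut instead, which lets weak and conflicting relations trade off globally rather than through a single hard threshold.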

Journal ArticleDOI
TL;DR: It is demonstrated that the context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories and can be applied to scene understanding tasks that local detectors alone cannot solve.
Abstract: There has been a growing interest in exploiting contextual information in addition to local features to detect and localize multiple object categories in an image. A context model can rule out some unlikely combinations or locations of objects and guide detectors to produce a semantically coherent interpretation of a scene. However, the performance benefit of context models has been limited because most of the previous methods were tested on data sets with only a few object categories, in which most images contain one or two object categories. In this paper, we introduce a new data set with images that contain many instances of different object categories, and propose an efficient model that captures the contextual information among more than a hundred object categories using a tree structure. Our model incorporates global image features, dependencies between object categories, and outputs of local detectors into one probabilistic framework. We demonstrate that our context model improves object recognition performance and provides a coherent interpretation of a scene, which enables a reliable image querying system by multiple object categories. In addition, our model can be applied to scene understanding tasks that local detectors alone cannot solve, such as detecting objects out of context or querying for the most typical and the least typical scenes in a data set.
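A tree-structured context model can be caricatured as rescoring each local detector output by a conditional co-occurrence prior from the category's parent in the tree. The categories, tree, and probabilities below are invented for illustration, and real inference in the paper's probabilistic framework is message passing over the tree, not this single top-down pass.

```python
# Hypothetical category tree: child -> (parent, P(child | parent present),
# P(child | parent absent)); the root has no parent.
TREE = {
    "monitor":  ("desk", 0.7, 0.05),
    "keyboard": ("desk", 0.8, 0.05),
    "desk":     (None, None, None),
}

def rescore(detections, prior_root=0.5):
    """Multiply each local detector score by a context prior read off the tree."""
    out = {}
    for cat, local in detections.items():
        parent, p_on, p_off = TREE[cat]
        if parent is None:
            context = prior_root
        else:
            parent_score = detections.get(parent, 0.0)
            context = parent_score * p_on + (1 - parent_score) * p_off
        out[cat] = local * context
    return out

dets = {"desk": 0.9, "monitor": 0.6, "keyboard": 0.1}
scored = rescore(dets)
print(scored)  # contextually plausible detections keep most of their score
```

The same machinery supports the out-of-context queries the abstract mentions: a detection whose context prior is far below its local score is a candidate for "object out of context".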

Journal ArticleDOI
TL;DR: This tutorial provides a selective review of research on object-based deployment of attention, focusing primarily on behavioral studies with human observers and a review of a variety of manifestations of object effects and the factors that influence object segmentation.
Abstract: This tutorial provides a selective review of research on object-based deployment of attention. It focuses primarily on behavioral studies with human observers. The tutorial is divided into five sections. It starts with an introduction to object-based attention and a description of the three commonly used experimental paradigms in object-based attention research. These are followed by a review of a variety of manifestations of object effects and the factors that influence object segmentation. The final two sections are devoted to two key issues in object-based research: the mechanisms that give rise to the object effects and the role of space in object-based selection.