
Showing papers on "Object model published in 2015"


Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper proposes an efficient discriminative object model that identifies potentially distracting regions in advance and adapts the object representation beforehand, so that distractors are suppressed and the risk of drifting is significantly reduced, while allowing an efficient implementation for real-time online object tracking.
Abstract: In this paper, we address the problem of model-free online object tracking based on color representations. According to the findings of recent benchmark evaluations, such trackers often tend to drift towards regions which exhibit a similar appearance compared to the object of interest. To overcome this limitation, we propose an efficient discriminative object model which allows us to identify potentially distracting regions in advance. Furthermore, we exploit this knowledge to adapt the object representation beforehand so that distractors are suppressed and the risk of drifting is significantly reduced. We evaluate our approach on recent online tracking benchmark datasets demonstrating state-of-the-art results. In particular, our approach performs favorably both in terms of accuracy and robustness compared to recent tracking algorithms. Moreover, the proposed approach allows for an efficient implementation to enable online object tracking in real-time.
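A minimal sketch of the kind of discriminative color model described above: object and surrounding regions are summarized by color histograms, the per-bin likelihood ratio scores pixels in new frames, and bins that also respond strongly inside previously identified distractor regions are down-weighted. The function names, bin layout and suppression weighting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def color_hist(pixels, bins=16):
    """RGB pixels (N, 3) with values in [0, 255] -> normalized joint histogram (bins**3,)."""
    idx = (pixels // (256 // bins)).astype(int)
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    h = np.bincount(flat, minlength=bins ** 3).astype(float)
    return h / max(h.sum(), 1e-9)

def build_model(obj_pixels, surround_pixels, distractor_pixels, lam=0.5):
    """Object-vs-surround likelihood ratio per bin, with distractor-like bins suppressed."""
    h_obj = color_hist(obj_pixels)
    h_sur = color_hist(surround_pixels)
    h_dis = color_hist(distractor_pixels)
    ratio = h_obj / (h_obj + h_sur + 1e-9)        # discriminative object likelihood
    suppress = h_dis / (h_obj + h_dis + 1e-9)     # how "distractor-like" each bin is
    return ratio * (1.0 - lam * suppress)         # adapted object representation

def score_pixels(pixels, model, bins=16):
    """Per-pixel object score used to localize the target in a new frame."""
    idx = (pixels // (256 // bins)).astype(int)
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    return model[flat]
```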

366 citations


Proceedings ArticleDOI
07 Jun 2015
TL;DR: A new framework is presented - task-oriented modeling, learning and recognition - which aims at understanding the underlying functions, physics and causality in using objects as “tools”; from this perspective, any object can be viewed as a hammer or a shovel.
Abstract: In this paper, we present a new framework - task-oriented modeling, learning and recognition - which aims at understanding the underlying functions, physics and causality in using objects as “tools”. Given a task, such as cracking a nut or painting a wall, we represent each object, e.g. a hammer or brush, in a generative spatio-temporal representation consisting of four components: i) an affordance basis to be grasped by hand; ii) a functional basis to act on a target object (the nut); iii) the imagined actions with typical motion trajectories; and iv) the underlying physical concepts, e.g. force, pressure, etc. In a learning phase, our algorithm observes only one RGB-D video, in which a rational human picks up one object (i.e. tool) among a number of candidates to accomplish the task. From this example, our algorithm learns the essential physical concepts in the task (e.g. forces in cracking nuts). In an inference phase, our algorithm is given a new set of objects (daily objects or stones), and picks the best choice available together with the inferred affordance basis, functional basis, imagined human actions (sequence of poses), and the expected physical quantity that it will produce. From this new perspective, any object can be viewed as a hammer or a shovel, and object recognition is not merely memorizing typical appearance examples for each category but reasoning about the physical mechanisms in various tasks to achieve generalization.

163 citations


Proceedings ArticleDOI
Suha Kwak, Minsu Cho, Ivan Laptev, Jean Ponce, Cordelia Schmid
07 Dec 2015
TL;DR: In this article, the authors address the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision, and formulate the problem as a combination of two complementary processes: discovery and tracking.
Abstract: This paper addresses the problem of automatically localizing dominant objects as spatio-temporal tubes in a noisy collection of videos with minimal or even no supervision. We formulate the problem as a combination of two complementary processes: discovery and tracking. The first one establishes correspondences between prominent regions across videos, and the second one associates similar object regions within the same video. Interestingly, our algorithm also discovers the implicit topology of frames associated with instances of the same object class across different videos, a role normally left to supervisory information in the form of class labels in conventional image and video understanding methods. Indeed, as demonstrated by our experiments, our method can handle video collections featuring multiple object classes, and substantially outperforms the state of the art in colocalization, even though it tackles a broader problem with much less supervision.

126 citations


Patent
27 Apr 2015
TL;DR: In this paper, a 3D printing system comprises a coarse 3D printing interface to form a 3D object core and a fine 3D printing interface to form a 3D object shell around at least some of the core.
Abstract: In at least some examples, a three-dimensional (3D) printing system comprises a coarse 3D printing interface to form a 3D object core. The 3D printing system also comprises a fine 3D printing interface to form a 3D object shell around at least some of the 3D object core. The 3D printing system also comprises a controller to receive a dataset corresponding to a 3D object model and to direct the coarse 3D printing interface to form the 3D object core based on the dataset.

115 citations


Journal ArticleDOI
TL;DR: This work gradually extends the successful deformable part model to include viewpoint information and part-level 3D geometry information, resulting in several different models with different levels of expressiveness, which provide consistently better joint object localization and viewpoint estimation than the state-of-the-art multi-view and 3D object detectors on various benchmarks.
Abstract: As objects are inherently 3D, they were modeled in 3D in the early days of computer vision. Due to the ambiguities arising from mapping 2D features to 3D models, 3D object representations have since been neglected and 2D feature-based models are the predominant paradigm in object detection nowadays. While such models have achieved outstanding bounding box detection performance, they come with limited expressiveness, as they are clearly limited in their capability of reasoning about 3D shape or viewpoints. In this work, we bring the worlds of 3D and 2D object representations closer by building an object detector which leverages the expressive power of 3D object representations while at the same time being robustly matchable to image evidence. To that end, we gradually extend the successful deformable part model [1] to include viewpoint information and part-level 3D geometry information, resulting in several different models with different levels of expressiveness. We end up with a 3D object model, consisting of multiple object parts represented in 3D and a continuous appearance model. We experimentally verify that our models, while providing richer object hypotheses than the 2D object models, provide consistently better joint object localization and viewpoint estimation than the state-of-the-art multi-view and 3D object detectors on various benchmarks (KITTI [2], 3D object classes [3], Pascal3D+ [4], Pascal VOC 2007 [5], EPFL multi-view cars [6]).

90 citations


Journal ArticleDOI
TL;DR: A feature fusion method via multi-modal graph learning for view-based 3D object retrieval is proposed; experiments demonstrate its superior performance compared to state-of-the-art approaches.

81 citations


Patent
17 Sep 2015
TL;DR: In this paper, a method for converting 2D video to 3D video using 3D object models is presented, where the object models are obtained from 3D scanner data; planes, polygons, or surfaces may be fit to this data to generate a 3D model.
Abstract: Method for converting 2D video to 3D video using 3D object models. Embodiments of the invention obtain a 3D object model for one or more objects in a 2D video scene, such as a character. Object models may for example be derived from 3D scanner data; planes, polygons, or surfaces may be fit to this data to generate a 3D model. In each frame in which a modeled object appears, the location and orientation of the 3D model may be determined in the frame, and a depth map for the object may be generated from the model. 3D video may be generated using the depth map. Embodiments may use feature tracking to automatically determine object location and orientation. Embodiments may use rigged 3D models with degrees of freedom to model objects with parts that move relative to one another.
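The patent leaves the final step, generating 3D video from the per-object depth map, at a high level. One common way to do it is depth-image-based rendering, where each pixel is shifted horizontally by a disparity proportional to its inverse depth; the rough sketch below assumes arbitrary camera units and leaves disocclusion holes unfilled.

```python
import numpy as np

def render_stereo_view(image, depth, baseline_px=20.0):
    """Shift each pixel by a disparity inversely proportional to its depth.

    image: (H, W, 3) uint8, depth: (H, W) positive depths (arbitrary units).
    Returns a crude right-eye view; holes are left black (real systems inpaint them).
    """
    h, w = depth.shape
    disparity = (baseline_px / np.maximum(depth, 1e-6)).astype(int)
    out = np.zeros_like(image)
    xs = np.arange(w)
    for y in range(h):
        new_x = np.clip(xs - disparity[y], 0, w - 1)
        out[y, new_x] = image[y, xs]
    return out
```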

67 citations


Patent
12 Feb 2015
TL;DR: In this article, a system and a method for real-time monitoring and identification of defects occurring in a 3D object built via an additive manufacturing process are presented, where a plurality of functional tool heads possessing freedom of motion in arbitrary planes and approaches are automatically and independently controlled based on a feedback analysis from the printing process.
Abstract: The present invention provides a system and a method for real-time monitoring and identification of defects occurring in a three-dimensional object built via an additive manufacturing process. Further, the present invention provides in-situ correction of such defects by a plurality of functional tool heads possessing freedom of motion in arbitrary planes and approaches, where the functional tool heads are automatically and independently controlled based on a feedback analysis from the printing process, implementing analysis techniques. Furthermore, the present invention provides a mechanism for analyzing defect data collected from detection devices and correcting tool path instructions and the object model in-situ during construction of a 3D object. A build report is also generated that displays, in 3D space, the structural geometry and inherent properties of the final built object along with the features of corrected and uncorrected defects. Advantageously, the build report helps in improving the 3D printing process for subsequent objects.

65 citations


Patent
27 Apr 2015
TL;DR: In this paper, a 3D printing head is used to fabricate a first portion of the 3D object by forming a plurality of successive layers of a first material, and a delivery head is also used for fabricating a second portion by dispensing onto the first part a continuous-fiber reinforced second material.
Abstract: An apparatus for forming a three-dimensional (3D) object includes a 3D printing head for fabricating a first portion of the 3D object by forming a plurality of successive layers of a first material. The apparatus also includes a delivery head for fabricating a second portion of the 3D object by dispensing onto the first portion of the 3D object a plurality of layers of a continuous-fiber reinforced second material. Further, the delivery head comprises a roller for pressing the continuous-fiber reinforced second material into place during the dispensing thereof. A controller controls the 3D printing head and the delivery head to cooperatively form the 3D object, based on a dataset corresponding to a 3D object model.

64 citations


Posted Content
TL;DR: This work presents an object tracker that is not limited to a local search window and can efficiently probe the entire frame, providing improved robustness for fast moving objects as well as for ultra low-frame-rate videos.
Abstract: Most tracking-by-detection methods employ a local search window around the predicted object location in the current frame, assuming the previous location is accurate, the trajectory is smooth, and the computational capacity permits a search radius that can accommodate the maximum speed yet is small enough to reduce mismatches. These assumptions, however, are not always valid, in particular for fast and irregularly moving objects. Here, we present an object tracker that is not limited to a local search window and can efficiently probe the entire frame. Our method generates a small number of "high-quality" proposals by a novel instance-specific objectness measure and evaluates them against the object model, which can be adopted from an existing tracking-by-detection approach as a core tracker. During the tracking process, we update the object model concentrating on hard false positives supplied by the proposals, which help suppress distractors caused by difficult background clutter, and learn how to re-rank proposals according to the object model. Since we significantly reduce the number of hypotheses the core tracker evaluates, we can use richer object descriptors and a stronger detector. Our method outperforms most recent state-of-the-art trackers on popular tracking benchmarks, and provides improved robustness for fast moving objects as well as for ultra low-frame-rate videos.
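A hedged sketch of the re-ranking and hard-negative update loop the abstract outlines: proposals are scored against a running appearance template, the best-scoring one becomes the new target location, and high-scoring non-overlapping proposals are treated as hard false positives that push the model away from distractors. The cosine-similarity features, thresholds and learning rates are assumptions for illustration, not the paper's actual objectness measure or core tracker.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / max(area, 1e-9)

def track_frame(proposal_feats, proposal_boxes, obj_model, lr=0.1):
    """Rank proposals against the object model, pick the best, and mine hard negatives.

    proposal_feats: (N, D) features of the "high-quality" proposals for this frame.
    obj_model: (D,) running template of the target appearance.
    """
    scores = np.array([cosine(f, obj_model) for f in proposal_feats])
    best = int(np.argmax(scores))

    # Hard false positives: proposals scoring almost as high as the winner
    # but not overlapping it; they drive the update that suppresses distractors.
    hard_neg = [i for i in range(len(scores))
                if i != best and scores[i] > 0.8 * scores[best]
                and iou(proposal_boxes[i], proposal_boxes[best]) < 0.3]

    new_model = obj_model + lr * (proposal_feats[best] - obj_model)
    for i in hard_neg:
        new_model -= lr * 0.5 * proposal_feats[i]
    return proposal_boxes[best], new_model
```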

64 citations


Journal ArticleDOI
TL;DR: The proposed SPCNN-RBOR method overcomes the drawback of feature-based methods, which inevitably include background information in local invariant feature descriptors when keypoints are located near object boundaries, and is robust to diverse complex variations, even under partial occlusion and in highly cluttered environments.
Abstract: In this paper, we propose a region-based object recognition (RBOR) method to identify objects in complex real-world scenes. First, the proposed method performs color image segmentation by a simplified pulse-coupled neural network (SPCNN) for the object model image and the test image, and then conducts a region-based matching between them. Hence, we name it RBOR with SPCNN (SPCNN-RBOR). The values of the SPCNN parameters are set automatically for each object model by our previously proposed method. In order to reduce the effects of varying light intensity and to take advantage of the high resolution of the SPCNN at low intensities for optimized color segmentation, a transformation integrating normalized Red Green Blue (RGB) with opponent color spaces is introduced. A novel image segmentation strategy is suggested to group the pixels firing synchronously throughout all the transformed channels of an image. Based on the segmentation results, a series of adaptive thresholds, which are adjustable according to the specific object model, is employed to remove outlier region blobs, form potential clusters, and refine the clusters in test images. The proposed SPCNN-RBOR method overcomes the drawback of feature-based methods, which inevitably include background information in local invariant feature descriptors when keypoints are located near object boundaries. A large number of experiments have shown that the proposed SPCNN-RBOR method is robust to diverse complex variations, even under partial occlusion and in highly cluttered environments. In addition, the SPCNN-RBOR method works well in identifying not only textured objects but also less-textured ones, and significantly outperforms current feature-based methods.
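The color transformation step can be illustrated with the standard normalized-RGB and opponent color definitions; the paper's exact transform and channel weighting may differ, so treat this as an assumption-laden sketch.

```python
import numpy as np

def color_channels(img):
    """Stack normalized-RGB and opponent channels for an (H, W, 3) RGB image.

    Textbook definitions are used here; the combination actually used in the
    SPCNN-RBOR pipeline may differ.
    """
    img = img.astype(float)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    s = r + g + b + 1e-9
    nr, ng, nb = r / s, g / s, b / s                 # normalized RGB (intensity invariant)
    o1 = (r - g) / np.sqrt(2.0)                      # opponent channels
    o2 = (r + g - 2.0 * b) / np.sqrt(6.0)
    o3 = (r + g + b) / np.sqrt(3.0)
    return np.stack([nr, ng, nb, o1, o2, o3], axis=-1)
```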

Proceedings ArticleDOI
01 Jan 2015
TL;DR: A novel approach that can track human hands in interaction with unknown objects, with accuracy close to that of [2], although the latter assumes that the object model is known a priori.
Abstract: The analysis and the understanding of object manipulation scenarios based on computer vision techniques can be greatly facilitated if we can gain access to the full articulation of the manipulating hands and the 3D pose of the manipulated objects. Currently, there exist methods for tracking hands in interaction with objects whose 3D models are known [2]. There are also methods that can reconstruct 3D models of objects that are partially observable in each frame of a sequence [3]. However, no method can track hands in interaction with unknown objects, i.e., objects whose 3D model is not known a priori. In this paper we propose a novel approach that can track human hands in interaction with unknown objects. As illustrated in Fig. 1, the input to the method is a sequence of RGBD frames showing the interaction of one or two hands with an unknown object. Starting with the raw depth map (left) we perform a pre-processing step and compute the scene point cloud. We employ an appropriately modified model-based hand tracker [4] and temporal information to track the hand 3D positions and posture (middle bottom). In this process, a progressively built object model is also taken into account to cope with hand-object occlusions. We use the estimated fingertip positions of the hand to segment the manipulated object from the rest of the scene (middle top). The segmented object points are used to update the object position and orientation in the current frame and are integrated into the object 3D representation (right). More specifically, the work flow of the proposed approach consists of five main components linked together as shown in Fig. 2. At a first, preprocessing stage, the raw depth information from the sensor is prepared to enter the pipeline. A point cloud is computed along with the normals for each vertex. Then, the user's hands are tracked in the scene. An articulated model for the left and right hands, with 26 degrees of freedom each, is fit to the pre-processed depth input. The current, possibly incomplete (or even empty, for the first frame) object model is incorporated into hand tracking to assist in handling hand/object occlusions. Using the computed 3D location of the user's hands as well as the last position of the (possibly incomplete) object model, the region of the object is segmented in the input depth map. The hands are masked out from the observation by comparing it to the rendered hand models. Object tracking is achieved using a multi-scale ICP [1]. The segmented object depth is used for a coarse-to-fine alignment with the (partially reconstructed) object model. Finally, the segmented and aligned depth data of the object are merged with the current, partial 3D model. The object's 3D model is maintained in a voxel grid with a Truncated Signed Distance Function (TSDF) [3] representation.

Table 1: Hand tracking accuracy (in cm, mean/median error) measured on the synthetic datasets.

Experiment         | Proposed    | [2], GT model | [2], Scanned model
Single hand, cat   | 0.42 / 0.39 | 0.47 / 0.43   | 0.45 / 0.43
Single hand, spray | 0.65 / 0.63 | 0.70 / 0.53   | 0.63 / 0.47
Two hands, cat     | 0.38 / 0.34 | 0.33 / 0.31   | 0.44 / 0.39
Two hands, spray   | 0.59 / 0.44 | 0.51 / 0.38   | 0.62 / 0.41

The accuracy of the method is close to that of [2], although the latter assumes that the object model is known a priori.
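The object-tracking component relies on ICP alignment of the segmented object points against the partial model. Below is a single-scale point-to-point ICP sketch (the paper uses a multi-scale variant [1]); correspondence selection, weighting and convergence checks are simplified.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, iters=30):
    """Rigid alignment of src (N, 3) onto dst (M, 3) with point-to-point ICP.

    Returns (R, t) such that src @ R.T + t approximates its matches in dst.
    Single-scale and unweighted; a sketch, not the paper's multi-scale version.
    """
    tree = cKDTree(dst)
    R, t = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)                 # nearest-neighbor correspondences
        matched = dst[idx]
        mu_s, mu_d = cur.mean(axis=0), matched.mean(axis=0)
        H = (cur - mu_s).T @ (matched - mu_d)    # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:            # avoid reflections
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_d - R_step @ mu_s
        cur = cur @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step   # accumulate the total transform
    return R, t
```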

Journal ArticleDOI
TL;DR: By accurately modeling the object geometry using polygonal lines instead of a 3-D box and by separating position and speed tracking from geometry tracking at the estimator level, the proposed solution combines the efficiency of the rigid model with the benefits of a flexible object model.
Abstract: In this paper we present a stereovision-based approach for tracking multiple objects in crowded environments where, typically, the road lane markings are not visible and the surrounding infrastructure is not known. The proposed technique relies on measurement data provided by an intermediate occupancy grid derived from processing a stereovision-based elevation map and on free-form object delimiters extracted from this grid. Unlike other existing methods that track rigid objects using equally rigid representations, we present a particle filter-based solution for tracking visual appearance-based free-form obstacle representations. At each step, the particle state is described by two components, i.e., the object's dynamic parameters and its estimated geometry. In order to solve the high-dimensional state-space problem, a Rao-Blackwellized particle filter is used. By accurately modeling the object geometry using polygonal lines instead of a 3-D box and, at the same time, separating the position and speed tracking from the geometry tracking at the estimator level, the proposed solution combines the efficiency of the rigid model with the benefits of a flexible object model.
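For readers unfamiliar with the estimator, the sketch below shows a bare-bones particle filter predict/weight/resample cycle for a position/velocity state. It deliberately omits the Rao-Blackwellized part, in which each particle additionally carries an analytically updated geometry estimate, and all noise parameters are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, measurement, dt=0.1,
                         proc_std=0.5, meas_std=1.0):
    """One predict/weight/resample cycle for a state [x, y, vx, vy].

    measurement: observed (x, y) position, e.g. the centroid of an object delimiter.
    """
    # Predict: constant-velocity motion plus process noise.
    particles[:, 0:2] += dt * particles[:, 2:4]
    particles += rng.normal(0.0, proc_std, particles.shape)

    # Weight: Gaussian likelihood of the measurement under each particle.
    d2 = np.sum((particles[:, 0:2] - measurement) ** 2, axis=1)
    weights = weights * np.exp(-0.5 * d2 / meas_std ** 2) + 1e-12
    weights /= weights.sum()

    # Resample when the effective sample size degenerates.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```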

Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this article, the authors propose a probabilistic framework for learning category-specific object size distributions from available annotations and leverage these in conjunction with amodal completions to infer veridical sizes of objects in novel images.
Abstract: We consider the problem of enriching current object detection systems with veridical object sizes and relative depth estimates from a single image. There are several technical challenges to this, such as occlusions, lack of calibration data and the scale ambiguity between object size and distance. These have not been addressed in full generality in previous work. Here we propose to tackle these issues by building upon advances in object recognition and using recently created large-scale datasets. We first introduce the task of amodal bounding box completion, which aims to infer the full extent of the object instances in the image. We then propose a probabilistic framework for learning category-specific object size distributions from available annotations and leverage these in conjunction with amodal completions to infer veridical sizes of objects in novel images. Finally, we introduce a focal length prediction approach that exploits scene recognition to overcome inherent scale ambiguities and demonstrate qualitative results on challenging real-world scenes.
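The scale ambiguity mentioned above follows from the pinhole relation pixel_height ≈ focal_length · real_height / depth: a category-level size prior plus a predicted focal length pins down depth. A tiny illustration (units and numbers are assumed, not from the paper):

```python
def infer_depth(pixel_height, real_height_m, focal_px):
    """Pinhole relation: pixel_height ≈ focal_px * real_height / depth.

    With a category-level size prior (real_height_m) and an estimated focal
    length in pixels, the object's depth follows; without either, size and
    depth are only determined up to scale.
    """
    return focal_px * real_height_m / pixel_height

# Example: an amodally completed person box 300 px tall, assumed 1.7 m tall,
# with a predicted focal length of 1000 px, sits roughly 5.7 m from the camera.
print(infer_depth(300.0, 1.7, 1000.0))
```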

Proceedings ArticleDOI
17 Dec 2015
TL;DR: This work presents a flexible system to reconstruct 3D models of objects captured with an RGB-D sensor; it allows the user to acquire a full 3D model of the object, and the resulting models can be directly used by state-of-the-art object instance recognition and object tracking modules.
Abstract: This work presents a flexible system to reconstruct 3D models of objects captured with an RGB-D sensor. A major advantage of the method is that, unlike other modelling tools, our reconstruction pipeline allows the user to acquire a full 3D model of the object. This is achieved by acquiring several partial 3D models in different sessions, each presenting the object of interest in a different configuration that reveals occluded parts of the object; the partial models are then automatically merged together to reconstruct a full 3D model. In addition, the 3D models acquired by our system can be directly used by state-of-the-art object instance recognition and object tracking modules, providing object-perception capabilities to complex applications requiring these functionalities (e.g. human-object interaction analysis, robot grasping, etc.). The system does not impose constraints on the appearance of objects (textured, untextured) nor on the modelling setup (moving camera with static object or turn-table setups with static camera). The proposed reconstruction system has been used to model a large number of objects, resulting in metrically accurate and visually appealing 3D models.

Posted Content
TL;DR: In this paper, the authors describe a method by which a robot can acquire an object model by capturing depth imagery of the object as a human moves it through its range of motion.
Abstract: Many functional elements of human homes and workplaces consist of rigid components which are connected through one or more sliding or rotating linkages. Examples include doors and drawers of cabinets and appliances; laptops; and swivel office chairs. A robotic mobile manipulator would benefit from the ability to acquire kinematic models of such objects from observation. This paper describes a method by which a robot can acquire an object model by capturing depth imagery of the object as a human moves it through its range of motion. We envision that in future, a machine newly introduced to an environment could be shown by its human user the articulated objects particular to that environment, inferring from these "visual demonstrations" enough information to actuate each object independently of the user. Our method employs sparse (markerless) feature tracking, motion segmentation, component pose estimation, and articulation learning; it does not require prior object models. Using the method, a robot can observe an object being exercised, infer a kinematic model incorporating rigid, prismatic and revolute joints, then use the model to predict the object's motion from a novel vantage point. We evaluate the method's performance, and compare it to that of a previously published technique, for a variety of household objects.
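One building block of such articulation learning is deciding, from the estimated relative poses of a moving part, whether the joint is revolute or prismatic and along which axis it acts. The sketch below is a simplified stand-in for that step; the threshold and the axis-averaging scheme are assumptions, not the authors' estimator.

```python
import numpy as np

def rotation_angle_axis(R):
    """Angle and unit axis of a 3x3 rotation matrix."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    n = np.linalg.norm(axis)
    return angle, axis / n if n > 1e-9 else np.zeros(3)

def classify_joint(rel_rotations, rel_translations, angle_thresh=0.1):
    """Label a part's motion as 'revolute' or 'prismatic' and return its axis.

    rel_rotations: list of 3x3 rotations of the part relative to its first pose.
    rel_translations: list of 3-vectors of the corresponding translations.
    """
    angles_axes = [rotation_angle_axis(R) for R in rel_rotations]
    max_angle = max(a for a, _ in angles_axes)
    if max_angle > angle_thresh:
        # Revolute: average the observed rotation axes, weighted by rotation angle.
        axis = sum(a * ax for a, ax in angles_axes)
        return "revolute", axis / (np.linalg.norm(axis) + 1e-9)
    # Prismatic: dominant direction of the translations via PCA (SVD).
    T = np.asarray(rel_translations, float)
    T = T - T.mean(axis=0)
    _, _, Vt = np.linalg.svd(T)
    return "prismatic", Vt[0]
```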

Journal ArticleDOI
TL;DR: This is the first work to jointly explore view-based and model-based relevance among 3D objects in a graph-based framework; experiments demonstrate the effectiveness of the proposed 3D object retrieval method in terms of retrieval accuracy.
Abstract: 3D object retrieval has attracted extensive research efforts and become an important task in recent years. However, how to measure the relevance between 3D objects is still a difficult issue. Most of the existing methods employ just the model-based or view-based approaches, which may lead to incomplete information for 3D object representation. In this paper, we propose to jointly learn the view-model relevance among 3D objects for retrieval, in which the 3D objects are formulated in different graph structures. With the view information, the multiple views of 3D objects are employed to formulate the 3D object relationship in an object hypergraph structure. With the model data, the model-based features are extracted to construct an object graph to describe the relationship among the 3D objects. Learning on the two graphs is conducted to estimate the relevance among the 3D objects, in which the view/model graph weights can also be optimized in the learning process. This is the first work to jointly explore the view-based and model-based relevance among 3D objects in a graph-based framework. The proposed method has been evaluated on three datasets. The experimental results and the comparison with state-of-the-art methods demonstrate the effectiveness of the proposed 3D object retrieval method in terms of retrieval accuracy.

Journal ArticleDOI
TL;DR: An efficient approach capable of learning and recognizing object categories in an interactive and open-ended manner is presented; the system is able to interact with human users and learns new object categories continuously over time.
Abstract: 3D object detection and recognition is increasingly used for manipulation and navigation tasks in service robots. It involves segmenting the objects present in a scene, estimating a feature descriptor for the object view and, finally, recognizing the object view by comparing it to the known object categories. This paper presents an efficient approach capable of learning and recognizing object categories in an interactive and open-ended manner. In this paper, “open-ended” implies that the set of object categories to be learned is not known in advance. The training instances are extracted from on-line experiences of a robot, and thus become gradually available over time, rather than at the beginning of the learning process. This paper focuses on two state-of-the-art questions: (1) How to automatically detect, conceptualize and recognize objects in 3D scenes in an open-ended manner? (2) How to acquire and use high-level knowledge obtained from the interaction with human users, namely when they provide category labels, in order to improve the system performance? This approach starts with a pre-processing step to remove irrelevant data and prepare a suitable point cloud for the subsequent processing. Clustering is then applied to detect object candidates, and object views are described based on a 3D shape descriptor called spin-image. Finally, a nearest-neighbor classification rule is used to predict the categories of the detected objects. A leave-one-out cross validation algorithm is used to compute precision and recall, in a classical off-line evaluation setting, for different system parameters. Also, an on-line evaluation protocol is used to assess the performance of the system in an open-ended setting. Results show that the proposed system is able to interact with human users, learning new object categories continuously over time.
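The instance-based, open-ended flavor of the classifier can be sketched as a category memory that grows as a user teaches new labeled views and answers queries with a nearest-neighbor rule. The class name, descriptor length and distance metric here are illustrative assumptions; the paper uses spin-image descriptors and its own evaluation protocol.

```python
import numpy as np

class OpenEndedMemory:
    """Instance-based category memory: categories are added and extended online."""

    def __init__(self):
        self.instances = {}          # category label -> list of stored descriptors

    def teach(self, label, descriptor):
        """Store a new labeled object view (e.g., provided by a human user)."""
        self.instances.setdefault(label, []).append(np.asarray(descriptor, float))

    def classify(self, descriptor):
        """Nearest-neighbor rule over all stored views; (None, inf) if nothing is known."""
        descriptor = np.asarray(descriptor, float)
        best_label, best_dist = None, np.inf
        for label, views in self.instances.items():
            d = min(np.linalg.norm(descriptor - v) for v in views)
            if d < best_dist:
                best_label, best_dist = label, d
        return best_label, best_dist

memory = OpenEndedMemory()
memory.teach("mug", np.random.rand(153))      # 153-D spin-image-sized descriptor
memory.teach("book", np.random.rand(153))
print(memory.classify(np.random.rand(153)))
```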

Proceedings ArticleDOI
17 Dec 2015
TL;DR: The technique combines the benefits of simple, adaptive robot grippers (which can grasp successfully without prior knowledge of the hand or the object model) with an advanced machine learning technique (Random Forests) to discriminate between different object classes.
Abstract: In this paper we present a methodology for discriminating between different objects using only a single force closure grasp with an underactuated robot hand equipped with force sensors. The technique combines the benefits of simple, adaptive robot grippers (which can grasp successfully without prior knowledge of the hand or the object model) with an advanced machine learning technique (Random Forests). Unlike prior work in the literature, the proposed methodology does not require object exploration, release or re-grasping and works for arbitrary object positions and orientations within the reach of a grasp. A two-fingered compliant, underactuated robot hand is controlled in an open-loop fashion to grasp objects with various shapes, sizes and stiffness. The Random Forests classification technique is used to discriminate between different object classes. The feature space used consists only of the actuator positions and the force sensor measurements at two specific time instances of the grasping process. A feature-variable importance calculation procedure facilitates the identification of the most crucial features, yielding the minimum number of sensors required. The efficiency of the proposed method is validated with two experimental paradigms involving two sets of fabricated model objects with different shapes, sizes and stiffness, and a set of everyday objects.
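A hedged sketch of the classification step using scikit-learn's Random Forests on a grasp feature vector (actuator positions plus force readings at two time instants). The synthetic data generator and feature layout are invented for illustration; only the classifier choice follows the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Toy stand-in for the paper's features: 2 actuator positions + 2 force readings,
# each sampled at two instants of the grasp -> 8 numbers per grasp.
def fake_grasp(object_class):
    base = np.array([0.2, 0.2, 1.0, 1.0, 0.5, 0.5, 2.0, 2.0]) * (object_class + 1)
    return base + rng.normal(0.0, 0.1, 8)

X = np.array([fake_grasp(c) for c in range(3) for _ in range(50)])
y = np.array([c for c in range(3) for _ in range(50)])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([fake_grasp(1)]))          # -> class 1
print(clf.feature_importances_)              # which sensor readings matter most
```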

Posted Content
TL;DR: This paper proposes a joint solution that tackles semantic object and part segmentation simultaneously, in which higher object-level context is provided to guide part segmentation, and more detailed part-level localization is utilized to refine object segmentation.
Abstract: Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision. In this paper, we propose a joint solution that tackles semantic object and part segmentation simultaneously, in which higher object-level context is provided to guide part segmentation, and more detailed part-level localization is utilized to refine object segmentation. Specifically, we first introduce the concept of semantic compositional parts (SCP) in which similar semantic parts are grouped and shared among different objects. A two-channel fully convolutional network (FCN) is then trained to provide the SCP and object potentials at each pixel. At the same time, a compact set of segments can also be obtained from the SCP predictions of the network. Given the potentials and the generated segments, in order to explore long-range context, we finally construct an efficient fully connected conditional random field (FCRF) to jointly predict the final object and part labels. Extensive evaluation on three different datasets shows that our approach can mutually enhance the performance of object and part segmentation, and outperforms the current state-of-the-art on both tasks.

Proceedings ArticleDOI
07 Jun 2015
TL;DR: The 3D object class detection method consists of several stages gradually enriching the object detection output with object viewpoint, keypoints and 3D shape estimates, which achieves state-of-the-art performance in simultaneous 2D bounding box and viewpoint estimation on the challenging Pascal3D+ dataset.
Abstract: Object class detection has been a synonym for 2D bounding box localization for the longest time, fueled by the success of powerful statistical learning techniques, combined with robust image representations. Only recently has there been a growing interest in revisiting the promise of computer vision from the early days: to precisely delineate the contents of a visual scene, object by object, in 3D. In this paper, we draw from recent advances in object detection and 2D-3D object lifting in order to design an object class detector that is particularly tailored towards 3D object class detection. Our 3D object class detection method consists of several stages, gradually enriching the object detection output with object viewpoint, keypoints and 3D shape estimates. Following careful design, each stage consistently improves performance, and the full method achieves state-of-the-art performance in simultaneous 2D bounding box and viewpoint estimation on the challenging Pascal3D+ [50] dataset.

Proceedings ArticleDOI
17 Dec 2015
TL;DR: A system for depalletizing and a complete pipeline for detecting and localizing objects, as well as verifying that the found object does not deviate from the known object model, e.g., because it is not the object to be picked.
Abstract: Depalletizing is a challenging task for manipulation robots. Key to successful application are not only the robustness of the approach, but also achievable cycle times in order to keep up with the rest of the process. In this paper, we propose a system for depalletizing and a complete pipeline for detecting and localizing objects, as well as verifying that the found object does not deviate from the known object model, e.g., because it is not the object to be picked. In order to achieve high robustness (e.g., with respect to different lighting conditions) and generality with respect to the objects to pick, our approach is based on multi-resolution surfel models. All components (both software and hardware) allow operation at high frame rates and, thus, allow for low cycle times. In experiments, we demonstrate depalletizing of automotive and other prefabricated parts with both high reliability (w.r.t. success rates) and efficiency (w.r.t. low cycle times).

Patent
16 Feb 2015
TL;DR: In this article, a system and a method for optimizing printing parameters, such as slicing parameters and tool path instructions, for additive manufacturing is presented, which includes a property analysis module that predicts and analyses properties of a filament object model, representing a constructed 3D object.
Abstract: The present invention relates to a system and a method for optimizing printing parameters, such as slicing parameters and tool path instructions, for additive manufacturing. The present invention comprises a property analysis module that predicts and analyses properties of a filament object model, representing a constructed 3D object. The filament object model is generated based on the tool path instructions and user specified object properties. Analysis includes comparing the predicted filament object model properties with the user specified property requirements; and further modifying the printing parameters in order to meet the user specified property requirements.

Journal ArticleDOI
Guofeng Wang, Bin Wang, Fan Zhong, Xueying Qin, Baoquan Chen
TL;DR: A new method based on global optimization for searching 3D–2D correspondence between a known 3D object model and 2D scene edges in an image is proposed, which performs favorably compared to the state-of-the-art methods in highly cluttered backgrounds.
Abstract: Tracking the position and orientation of a textureless 3D object is a considerably challenging problem, for which a 3D model is commonly used. The 3D–2D correspondence between a known 3D object model and 2D scene edges in an image is standardly used to locate the 3D object, and establishing it is one of the most important problems in model-based 3D object tracking. State-of-the-art methods solve this problem by searching correspondences independently. However, this often fails in highly cluttered backgrounds, owing to the presence of numerous local minima. To overcome this problem, we propose a new method based on global optimization for searching these correspondences. With our search mechanism, a graph model based on an energy function is used to establish the relationship of the candidate correspondences. Then, the optimal correspondences can be efficiently searched with dynamic programming. Qualitative and quantitative experimental results demonstrate that the proposed method performs favorably compared to the state-of-the-art methods in highly cluttered backgrounds.
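The dynamic-programming idea can be illustrated on a simplified chain-structured version of the problem: each model point has several candidate image-edge correspondences, unary costs score each candidate, and a pairwise term discourages large jumps between consecutive choices; the globally optimal assignment is then found by a Viterbi-style recursion. The energy terms below are placeholders, not the paper's graph model.

```python
import numpy as np

def dp_correspondences(unary, candidates, smooth_weight=1.0):
    """Choose one candidate correspondence per model point along a chain.

    unary: (N, K) matching cost of each of K candidates for each of N points.
    candidates: (N, K, 2) image positions of the candidates.
    The pairwise cost penalizes distance jumps between consecutive choices.
    Returns the index of the chosen candidate for each point.
    """
    n, k = unary.shape
    cost = unary[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        # pair[j_prev, j] = distance between consecutive candidate positions
        diff = candidates[i - 1][:, None, :] - candidates[i][None, :, :]
        pair = smooth_weight * np.linalg.norm(diff, axis=2)
        total = cost[:, None] + pair                  # (K_prev, K)
        back[i] = np.argmin(total, axis=0)
        cost = unary[i] + np.min(total, axis=0)
    # Backtrack the optimal assignment.
    choice = np.empty(n, dtype=int)
    choice[-1] = int(np.argmin(cost))
    for i in range(n - 1, 0, -1):
        choice[i - 1] = back[i, choice[i]]
    return choice
```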

Patent
19 Feb 2015
TL;DR: In this paper, a method of interacting with a virtual object in an augmented reality space is described, which involves identifying a physical location of a device in at least one image of the AR space and generating for display a control coincident with a surface of the device.
Abstract: The technology disclosed relates to a method of interacting with a virtual object. In particular, it relates to referencing a virtual object in an augmented reality space, identifying a physical location of a device in at least one image of the augmented reality space, generating for display a control coincident with a surface of the device, sensing interactions between at least one control object and the control coincident with the surface of the device, and generating data signaling manipulations of the control coincident with the surface of the device.

Patent
16 Feb 2015
TL;DR: In this paper, an object recognition ingestion system is presented, which combines the canonical shape object with the image data to create a model of the object and then generates recognition descriptors from each of the model PoVs, which are combined into key frame bundles having sufficient information to allow other computing devices to recognize the object at a later time.
Abstract: An object recognition ingestion system is presented. The object ingestion system captures image data of objects, possibly in an uncontrolled setting. The image data is analyzed to determine if one or more a priori known canonical shape objects match the object represented in the image data. The canonical shape object also includes one or more reference PoVs indicating perspectives from which to analyze objects having the corresponding shape. An object ingestion engine combines the canonical shape object with the image data to create a model of the object. The engine generates a desirable set of model PoVs from the reference PoVs, and then generates recognition descriptors from each of the model PoVs. The descriptors, image data, model PoVs, or other contextually relevant information are combined into key frame bundles having sufficient information to allow other computing devices to recognize the object at a later time.

Journal ArticleDOI
TL;DR: A stand-alone convolution surface-based modeling approach to model complex heterogeneous objects with multi-functional heterogeneities, entailing stratified sub-analytic boundary-representation, convolution material primitives, membership functions and material-potential functions is presented.
Abstract: The possibility to attain diverse applications from heterogeneous objects calls for a generic and systematic modeling approach for design, analysis and rapid manufacturing of heterogeneous objects. The available heterogeneous object modeling techniques model simple material-distributions only, and just a few of them are capable of modeling heterogeneous objects with complex geometries. Even these approaches have, at times, shown some glitches while modeling complex objects with compound and irregular material variations. This paper unfolds the development of a stand-alone convolution surface-based modeling approach to model complex heterogeneous objects with multi-functional heterogeneities, entailing stratified sub-analytic boundary-representation, convolution material primitives, membership functions and material-potential functions. One-dimensional (associative and non-associative) and compound two- and three-dimensional material-distribution schemas are formulated and outlined to model simple, compound and irregular material-distributions in simple/complex geometry objects. The paper also illustrates a few examples of modeling complex heterogeneous objects by implementing the approach using specialized languages and software tools. In summary, a material convolution surface-based approach is presented for modeling complex heterogeneous objects; complex one-dimensional material-distributions are modeled with material primitives and field functions; a schema for compound and irregular heterogeneities in two and three dimensions is formulated and outlined; and a few examples of complex heterogeneous object modeling are reported to validate the proposed approach.

Patent
Jianfeng Ren, Feng Guo, Ruiduo Yang
18 Aug 2015
TL;DR: In this paper, a method performed by an electronic device is described, which includes obtaining a first frame of a scene and performing object recognition of at least one object within a first bounding region of the first frame.
Abstract: A method performed by an electronic device is described. The method includes obtaining a first frame of a scene. The method also includes performing object recognition of at least one object within a first bounding region of the first frame. The method further includes performing object tracking of the at least one object within the first bounding region of the first frame. The method additionally includes determining a second bounding region of a second frame based on the object tracking. The second frame is subsequent to the first frame. The method also includes determining whether the second bounding region is valid based on a predetermined object model.

Book ChapterDOI
01 Jan 2015
TL;DR: Windows Communication Foundation is the name of the API designed specifically for the process of building distributed systems, providing a single, unified, and extendable programming object model that you can use to interact with a number of previously diverse distributed technologies.
Abstract: Windows Communication Foundation (WCF) is the name of the API designed specifically for the process of building distributed systems. Unlike other specific distributed APIs you might have used in the past (e.g., DCOM, .NET remoting, XML web services, message queuing), WCF provides a single, unified, and extendable programming object model that you can use to interact with a number of previously diverse distributed technologies.

Proceedings ArticleDOI
17 Dec 2015
TL;DR: Results show that the proposed system with concurrent learning of object categories and codebooks is capable of learning more categories, requiring fewer examples, and with similar accuracy, when compared to the classical Bag of Words approach using codebooks constructed offline.
Abstract: In open-ended domains, robots must continuously learn new object categories. When the training sets are created offline, it is not possible to ensure their representativeness with respect to the object categories and features the system will find when operating online. In the Bag of Words model, visual codebooks are usually constructed from training sets created offline. This might lead to non-discriminative visual words and, as a consequence, to poor recognition performance. This paper proposes a visual object recognition system which concurrently learns, in an incremental and online fashion, both the visual object category representations and the codebook words used to encode them. The codebook is defined using Gaussian Mixture Models which are updated using new object views. The approach bears similarities to the human visual object recognition system: evidence suggests that the development of recognition capabilities occurs on multiple levels and is sustained over large periods of time. Results show that the proposed system with concurrent learning of object categories and codebooks is capable of learning more categories, requiring fewer examples, and with similar accuracy, when compared to the classical Bag of Words approach using codebooks constructed offline.
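A much-simplified illustration of an incrementally learned codebook: each visual word keeps a running mean and variance (hard nearest-word assignment, diagonal covariance), and new words are spawned for features far from all existing ones. This is a stand-in for the paper's Gaussian-mixture update, with the distance threshold and update rule as assumptions.

```python
import numpy as np

class OnlineCodebook:
    """Visual words as running Gaussians, updated incrementally from new object views."""

    def __init__(self, dim, new_word_dist=2.0):
        self.means, self.vars, self.counts = [], [], []
        self.dim, self.new_word_dist = dim, new_word_dist

    def update(self, features):
        """Add a batch of local descriptors (N, dim) from a newly seen object view."""
        for f in np.asarray(features, float):
            if not self.means:
                self._add(f)
                continue
            d = [np.linalg.norm(f - m) for m in self.means]
            j = int(np.argmin(d))
            if d[j] > self.new_word_dist:
                self._add(f)                      # far from all words: spawn a new word
            else:                                 # Welford-style running mean/variance
                self.counts[j] += 1
                delta = f - self.means[j]
                self.means[j] += delta / self.counts[j]
                self.vars[j] += (delta * (f - self.means[j]) - self.vars[j]) / self.counts[j]

    def _add(self, f):
        self.means.append(f.copy())
        self.vars.append(np.ones(self.dim))
        self.counts.append(1)

    def encode(self, features):
        """Bag-of-words histogram of a view against the current codebook."""
        hist = np.zeros(len(self.means))
        for f in np.asarray(features, float):
            j = int(np.argmin([np.linalg.norm(f - m) for m in self.means]))
            hist[j] += 1
        return hist / max(hist.sum(), 1e-9)
```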