Author

Manuel Martinez

Other affiliations: Carnegie Mellon University
Bio: Manuel Martinez is an academic researcher from Karlsruhe Institute of Technology. The author has contributed to research in topics: Codec & Pose. The author has an h-index of 12 and has co-authored 34 publications receiving 796 citations. Previous affiliations of Manuel Martinez include Carnegie Mellon University.

Papers
Journal ArticleDOI
TL;DR: This paper presents MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework.
Abstract: We present MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework. We address two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We achieve robust performance with Iterative Clustering Estimation (ICE), a novel algorithm that iteratively combines feature clustering with robust pose estimation. Feature clustering quickly partitions the scene and produces object hypotheses. The hypotheses are used to further refine the feature clusters, and the two steps iterate until convergence. ICE is easy to parallelize, and easily integrates single- and multi-camera object recognition and pose estimation. We also introduce a novel object hypothesis scoring function based on M-estimator theory, and a novel pose clustering algorithm that robustly handles recognition outliers. We achieve scalability and low latency with an improved feature matching algorithm for large databases, a GPU/CPU hybrid architecture that exploits parallelism at all levels, and an optimized resource scheduler. We provide extensive experimental results demonstrating state-of-the-art performance in terms of recognition, scalability, and latency in real-world robotic applications.
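The core of ICE is an EM-style alternation between partitioning features and fitting a model per partition. The toy sketch below is a hedged illustration rather than MOPED's actual code: it uses a centroid fit where MOPED performs robust 6-DoF pose estimation, but the loop structure is the same: cluster, fit hypotheses, reassign features to the hypothesis that explains them best, and repeat until stable.

```python
import numpy as np

def ice(points, k=3, iters=20, seed=0):
    """Toy Iterative Clustering Estimation loop (illustrative, not MOPED's code).

    Alternates (1) assigning features to the nearest hypothesis and
    (2) refitting each hypothesis from its cluster. Here a hypothesis is a
    2D centroid; in MOPED it is a robustly estimated 6-DoF object pose.
    """
    rng = np.random.default_rng(seed)
    models = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Clustering step: partition the scene by the current hypotheses.
        dists = np.linalg.norm(points[:, None, :] - models[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Estimation step: refit each hypothesis from its assigned features.
        new_models = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else models[j]
            for j in range(k)
        ])
        if np.allclose(new_models, models):  # converged
            break
        models = new_models
    return models, labels

# Example: three noisy 2D feature clumps standing in for three objects.
rng = np.random.default_rng(1)
pts = np.concatenate([rng.normal(c, 0.2, (40, 2)) for c in [(0, 0), (3, 0), (0, 3)]])
hypotheses, assignment = ice(pts)
print(hypotheses)
```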

455 citations

Proceedings ArticleDOI
03 May 2010
TL;DR: MOPED builds on POSESEQ, a state-of-the-art object recognition algorithm, demonstrating a massive improvement in scalability and latency without sacrificing robustness, with both algorithmic and architectural improvements.
Abstract: The latency of a perception system is crucial for a robot performing interactive tasks in dynamic human environments. We present MOPED, a fast and scalable perception system for object recognition and pose estimation. MOPED builds on POSESEQ, a state-of-the-art object recognition algorithm, demonstrating a massive improvement in scalability and latency without sacrificing robustness. We achieve this with both algorithmic and architectural improvements: a novel feature matching algorithm, a hybrid GPU/CPU architecture that exploits parallelism at all levels, and an optimized resource scheduler. Using the same standard hardware, we achieve up to a 30x improvement on real-world scenes.
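The scalable matching idea can be illustrated with an off-the-shelf approximate nearest-neighbour index. The sketch below uses OpenCV's FLANN randomized kd-trees plus Lowe's ratio test; the database size, tree count, and `checks` budget are assumptions chosen to show the accuracy/latency knob, not MOPED's actual matcher.

```python
import numpy as np
import cv2

# Stand-in data: a large model database of SIFT-like 128-D descriptors and
# one image's worth of query descriptors.
db_descriptors = np.random.rand(100_000, 128).astype(np.float32)
query_descriptors = np.random.rand(500, 128).astype(np.float32)

# Randomized kd-trees give approximate nearest neighbours in sub-linear time;
# 'checks' bounds the search effort, trading accuracy for latency.
flann = cv2.FlannBasedMatcher(
    dict(algorithm=1, trees=8),   # algorithm=1: randomized kd-trees
    dict(checks=64),
)
matches = flann.knnMatch(query_descriptors, db_descriptors, k=2)

# Lowe's ratio test keeps only matches clearly better than the runner-up.
good = [m for m, n in matches if m.distance < 0.7 * n.distance]
print(f"{len(good)} confident matches out of {len(matches)}")
```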

100 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This work introduces DriveAHead, a novel dataset designed to develop and evaluate head pose monitoring algorithms in real driving conditions, and presents the Head Pose Network, a deep learning model that achieves better performance than current state-of-the-art algorithms.
Abstract: Head pose monitoring is an important task for driver assistance systems, since it is a key indicator for human attention and behavior. However, current head pose datasets either lack complexity or do not adequately represent the conditions that occur while driving. Therefore, we introduce DriveAHead, a novel dataset designed to develop and evaluate head pose monitoring algorithms in real driving conditions. We provide frame-by-frame head pose labels obtained from a motion-capture system, as well as annotations about occlusions of the driver's face. To the best of our knowledge, DriveAHead is the largest publicly available driver head pose dataset, and also the only one that provides 2D and 3D data aligned at the pixel level using the Kinect v2. Existing performance metrics are based on the mean error without any consideration of the bias towards one position or another. Here, we suggest a new performance metric, named Balanced Mean Angular Error, that addresses the bias towards the forward looking position existing in driving datasets. Finally, we present the Head Pose Network, a deep learning model that achieves better performance than current state-of-the-art algorithms, and we analyze its performance when using our dataset.
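The abstract does not spell out the metric's formula; one plausible reading of Balanced Mean Angular Error is to average the per-bin mean absolute error over fixed-width ground-truth angle bins, so abundant forward-looking frames cannot swamp rare large head turns. The sketch below implements that reading; the 5-degree bin width is an assumption, not the paper's specification.

```python
import numpy as np

def balanced_mean_angular_error(y_true, y_pred, bin_width=5.0):
    """Hedged sketch of a balanced angular error metric.

    Mean absolute error is computed per ground-truth angle bin, and the bin
    means are averaged, weighting sparse extreme poses equally with the
    dominant forward-looking pose. Binning details are assumptions.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = np.abs(y_pred - y_true)
    bins = np.floor(y_true / bin_width)
    return float(np.mean([errors[bins == b].mean() for b in np.unique(bins)]))

# Four near-frontal frames and two large head turns: the plain mean error is
# 6.5 degrees, while the balanced metric (9.25) exposes the large-pose errors.
gt = [0.0, 1.0, -2.0, 0.0, 45.0, 60.0]
pred = [1.0, 0.0, -1.0, 1.0, 30.0, 40.0]
print(balanced_mean_angular_error(gt, pred))
```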

58 citations

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This work suggests a non-intrusive and cost-efficient approach to detect sleep position based on a single depth camera, which outperforms current state-of-the-art algorithms and even the contact sensor from the sleep laboratory.
Abstract: Sleep position is an important feature used to assess the quality and quantity of an individual's sleep. Furthermore, it is related to sleep disorders like sleep apnoea and snoring, and needs to be tracked in nursing homes to avoid pressure ulcers. Therefore, a gravity sensor attached to the chest is generally used to register body position during sleep studies. We suggest a non-intrusive and cost-efficient approach to detect sleep position based on a single depth camera. Compared to alternative state-of-the-art approaches, ours requires no calibration, and it has been evaluated in a real setting comprising 78 patients from a sleep laboratory. We use Bed Aligned Maps to extract a low-resolution descriptor from a depth map aligned to the bed position. We perform classification using Convolutional Neural Networks, achieving an accuracy of 94.0%, thus outperforming current state-of-the-art algorithms and even the contact sensor from the sleep laboratory, which achieves an accuracy of 91.9%.
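As a sketch of the classification stage described above, the small PyTorch network below maps a low-resolution, bed-aligned depth descriptor to a sleep-position class. The 8x16 grid, layer sizes, and four classes (e.g. supine, prone, left, right) are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SleepPositionNet(nn.Module):
    """Minimal CNN over a Bed Aligned Map; all sizes are illustrative."""

    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                               # 8x16 -> 4x8
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(32 * 4 * 8, n_classes)

    def forward(self, x):                                  # x: (batch, 1, 8, 16)
        return self.classifier(self.features(x).flatten(1))

# A batch of two stand-in Bed Aligned Maps yields two 4-way class scores.
bam = torch.randn(2, 1, 8, 16)
print(SleepPositionNet()(bam).shape)                       # torch.Size([2, 4])
```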

52 citations

Proceedings Article
01 Nov 2012
TL;DR: This work presents a vision-based method to estimate the respiration rate of subjects from their chest movements that is fully automated, non-invasive, robust to occlusions, and only depends on off-the-shelf hardware.
Abstract: We present a vision-based method to estimate the respiration rate of subjects from their chest movements. In contrast to alternative approaches, our method is fully automated, non-invasive, robust to occlusions, and only depends on off-the-shelf hardware. We project a fixed infrared (IR) dot pattern. The dots are detected using a camera with a matching IR filter. We estimate the dots' barycenters with sub-pixel precision and track them over a 30-second sliding window. We merge all trajectories using Principal Component Analysis (PCA) and use Autoregressive (AR) Spectral Analysis to estimate the respiratory rate. The system was evaluated on 9 subjects and on a range of simulated scenarios using an artificial chest.
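The pipeline compresses into a short numerical sketch: merge the dot trajectories with PCA, fit an autoregressive model via the Yule-Walker equations, and read the respiration rate off the AR spectrum's peak inside a plausible breathing band. The trajectories below are synthetic stand-ins, and the AR order and frequency band are assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

fs, win_s = 30.0, 30.0                      # assumed 30 Hz camera, 30 s window
t = np.arange(int(fs * win_s)) / fs

# Stand-in trajectories: 50 tracked dots moving with a 0.25 Hz breathing
# component (15 breaths/min) plus per-dot noise.
rng = np.random.default_rng(0)
traj = np.sin(2 * np.pi * 0.25 * t) * rng.uniform(0.5, 1.5, (50, 1)) \
    + 0.1 * rng.standard_normal((50, len(t)))

# PCA merge: the first principal component is the common breathing motion.
centered = traj - traj.mean(axis=1, keepdims=True)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
signal = vt[0]

# Yule-Walker AR fit: solve the Toeplitz autocorrelation system for the
# AR coefficients, then evaluate the model's power spectrum on a fine grid.
order = 8
acf = np.correlate(signal, signal, "full")[len(signal) - 1:][: order + 1]
ar = solve_toeplitz((acf[:order], acf[:order]), acf[1 : order + 1])

freqs = np.linspace(0.05, 1.0, 500)         # assumed breathing band, in Hz
z = np.exp(-2j * np.pi * freqs / fs)
denom = 1 - np.sum(ar[:, None] * z ** np.arange(1, order + 1)[:, None], axis=0)
psd = 1.0 / np.abs(denom) ** 2
print(f"estimated rate: {freqs[psd.argmax()] * 60:.1f} breaths/min")
```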

52 citations


Cited by
Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper proposes a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise, and demonstrates through experiments that ORB is two orders of magnitude faster than SIFT while performing as well in many situations.
Abstract: Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments that ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smartphone.
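Since ORB ships with OpenCV, the recipe can be shown directly: detect keypoints, compute binary descriptors, and match with Hamming distance, which is what makes ORB matching so much cheaper than SIFT's floating-point L2 comparisons. Random stand-in images are used below; real grayscale frames apply in practice.

```python
import numpy as np
import cv2

# Stand-in grayscale frames; noise contains plenty of FAST corners.
img1 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
img2 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors are matched with Hamming distance; cross-checking keeps
# only mutual best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} cross-checked matches")
```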

8,702 citations

Proceedings ArticleDOI
20 Mar 2017
TL;DR: This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator, and achieves the first successful transfer of a deep neural network trained only on simulated RGB images to the real world for the purpose of robotic control.
Abstract: Bridging the ‘reality gap’ that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear to the model as just another variation. We focus on the task of object localization, which is a stepping stone to general robotic manipulation skills. We find that it is possible to train a real-world object detector that is accurate to 1.5 cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures. To demonstrate the capabilities of our detectors, we show they can be used to perform grasping in a cluttered environment. To our knowledge, this is the first successful transfer of a deep neural network trained only on simulated RGB images (without pre-training on real images) to the real world for the purpose of robotic control.
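The randomization loop is simple to sketch: each synthetic training image is rendered under freshly sampled nuisance parameters, so the real world looks to the model like just another variation. The parameter list below mirrors the factors the paper randomizes (textures, lights, camera, distractors), but the ranges and the `render()` hook are illustrative assumptions supplied by whatever simulator is used.

```python
import random

def sample_scene_params():
    """Draw one random rendering configuration (ranges are illustrative)."""
    return {
        "object_texture": random.choice(["noise", "checker", "flat", "gradient"]),
        "n_distractors": random.randint(0, 10),
        "light_intensity": random.uniform(0.2, 2.0),
        "light_position": [random.uniform(-1.0, 1.0) for _ in range(3)],
        "camera_jitter_deg": random.uniform(-5.0, 5.0),
    }

def make_training_set(n_images, render):
    """render(params) -> (image, object_location) is the simulator hook."""
    return [render(sample_scene_params()) for _ in range(n_images)]

# Example with a stub renderer that just echoes the sampled texture.
dataset = make_training_set(3, render=lambda p: (f"img<{p['object_texture']}>", (0, 0)))
print(dataset)
```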

2,079 citations

Journal ArticleDOI
TL;DR: This work presents a two-step cascaded system with two deep networks, where the top detections from the first are re-evaluated by the second, and shows that this method improves performance on an RGBD robotic grasping dataset, and can be used to successfully execute grasps on two different robotic platforms.
Abstract: We consider the problem of detecting robotic grasps in an RGB-D view of a scene containing objects. In this work, we apply a deep learning approach to solve this problem, which avoids time-consuming hand-design of features. This presents two main challenges. First, we need to evaluate a huge number of candidate grasps. In order to make detection fast and robust, we present a two-step cascaded system with two deep networks, where the top detections from the first are re-evaluated by the second. The first network has fewer features, is faster to run, and can effectively prune out unlikely candidate grasps. The second, with more features, is slower but has to run only on the top few detections. Second, we need to handle multimodal inputs effectively, for which we present a method that applies structured regularization on the weights based on multimodal group regularization. We show that our method improves performance on an RGBD robotic grasping dataset, and can be used to successfully execute grasps on two different robotic platforms.
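The cascade structure is straightforward to sketch: a small, fast network scores every candidate grasp cheaply, and only the top-k survivors are re-scored by a larger network. The PyTorch toy below assumes 64-D candidate features and k=10; the sizes are illustrative, not the paper's networks.

```python
import torch
import torch.nn as nn

# Stage 1 is deliberately small (cheap to run on every candidate);
# stage 2 is larger but only sees the shortlist.
small = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
large = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1))

def cascade(candidates, k=10):
    """candidates: (N, 64) grasp features; returns the index of the best grasp."""
    with torch.no_grad():
        coarse = small(candidates).squeeze(1)           # cheap pass over all N
        shortlist = coarse.topk(min(k, len(candidates))).indices
        fine = large(candidates[shortlist]).squeeze(1)  # expensive pass over k
    return shortlist[fine.argmax()].item()

grasps = torch.randn(500, 64)                           # stand-in candidate features
print("best grasp candidate:", cascade(grasps))
```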

1,144 citations

Journal ArticleDOI
TL;DR: This survey reviews the work on data-driven grasp synthesis and the methodologies for sampling and ranking candidate grasps, provides an overview of the different approaches, and draws a parallel to the classical approaches that rely on analytic formulations.
Abstract: We review the work on data-driven grasp synthesis and the methodologies for sampling and ranking candidate grasps. We divide the approaches into three groups based on whether they synthesize grasps for known, familiar, or unknown objects. This structure allows us to identify common object representations and perceptual processes that facilitate the employed data-driven grasp synthesis technique. In the case of known objects, we concentrate on the approaches that are based on object recognition and pose estimation. In the case of familiar objects, the techniques use some form of similarity matching to a set of previously encountered objects. Finally, for the approaches dealing with unknown objects, the core part is the extraction of specific features that are indicative of good grasps. Our survey provides an overview of the different methodologies and discusses open problems in the area of robot grasping. We also draw a parallel to the classical approaches that rely on analytic formulations.

859 citations