Showing papers by "Larry Matthies" published in 2017


Proceedings ArticleDOI
01 Sep 2017
TL;DR: A key element of this work is to use a deep network to integrate contextual task cues, and defer the structured-output problem of gripper pose computation to an explicit (learned) geometric model.
Abstract: We present a task-oriented grasp model that encodes grasps that are configurationally compatible with a given task. For instance, if the task is to pour liquid from a container, the model encodes grasps that leave the opening of the container unobstructed. The model consists of two independent agents: First, a geometric grasp model that computes, from a depth image, a distribution of 6D grasp poses for which the shape of the gripper matches the shape of the underlying surface. The model relies on a dictionary of geometric object parts annotated with workable gripper poses and preshape parameters. It is learned from experience via kinesthetic teaching. The second agent is a CNN-based semantic model that identifies grasp-suitable regions in a depth image: regions where a grasp will not impede the execution of the task. The semantic model allows us to encode relationships such as “grasp from the handle.” A key element of this work is to use a deep network to integrate contextual task cues, and defer the structured-output problem of gripper pose computation to an explicit (learned) geometric model. Jointly, these two models generate grasps that are mechanically fit, and that grip the object in a way that enables the intended task.

51 citations


Proceedings ArticleDOI
01 May 2017
TL;DR: This work presents the first unified formulation of depth perception with stereo and polarization by extending previous energy minimization formulations to include surface orientation constraints computed from the polarization channels, and applies an existing quadratic pseudo-boolean optimization method to approximate the optimal depth map.
Abstract: 3-D perception of scenes with specular surfaces is still challenging for robotics applications in urban areas, for both active and passive range sensors; there is a need for improved solutions that work without artificial illumination over a wide range of distances. The advent of cameras with microgrid polarization filter arrays, which allow acquiring four orientations of linearly polarized images simultaneously, has potential to make the use of polarization information in 3-D perception more practical. It is well-known that polarization can provide information about the orientation of specular surfaces; however, prior work with polarization for 3-D perception has had several limitations. We present the first unified formulation of depth perception with stereo and polarization by extending previous energy minimization formulations to include surface orientation constraints computed from the polarization channels. We apply an existing quadratic pseudo-boolean optimization (QPBO) method to approximate the optimal depth map. We use synthetic and real indoor/outdoor images to demonstrate that the new method achieves better results than prior methods, with fewer assumptions and limitations.
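
The first step in using a microgrid polarization camera, turning the four simultaneously acquired polarizer orientations into a per-pixel polarization estimate, can be sketched with the standard linear Stokes relations. This is a minimal illustration and not the paper's formulation: it assumes the four orientations are 0°, 45°, 90°, and 135° (the abstract does not state them), and the function names are hypothetical. How these quantities become surface orientation constraints in the energy function is not described in the abstract and is not shown here.

```python
import math


def linear_stokes(i0, i45, i90, i135):
    """Linear Stokes parameters for one pixel.

    Assumes intensities measured behind ideal linear polarizers at
    0, 45, 90 and 135 degrees (an assumption, not from the abstract).
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical
    s2 = i45 - i135                     # +45 vs. -45 degrees
    return s0, s1, s2


def dolp_aolp(i0, i45, i90, i135):
    """Degree and angle (radians) of linear polarization for one pixel."""
    s0, s1, s2 = linear_stokes(i0, i45, i90, i135)
    if s0 <= 0.0:
        # No light at this pixel: polarization is undefined, report zeros.
        return 0.0, 0.0
    dolp = math.hypot(s1, s2) / s0
    aolp = 0.5 * math.atan2(s2, s1)
    return dolp, aolp
```

For ideal, fully linearly polarized light at 0°, the four intensities 1, 0.5, 0, 0.5 give a degree of polarization of 1 and an angle of 0; four equal intensities give a degree of polarization of 0.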

38 citations


Proceedings ArticleDOI
01 Mar 2017
TL;DR: This paper presents an efficient Gaussian Mixture Models-based depth map fusion approach, introducing an online update scheme for dense representations, and achieves better accuracy than alternative image space depth map fusion techniques at lower computational cost.
Abstract: Sensing the 3D environment of a moving robot is essential for collision avoidance. Most 3D sensors produce dense depth maps, which are subject to imperfections due to various environmental factors. Temporal fusion of depth maps is crucial to overcome those. Temporal fusion is traditionally done in 3D space with voxel data structures, but it can be approached by temporal fusion in image space, with potential benefits in reduced memory and computational cost for applications like reactive collision avoidance for micro air vehicles. In this paper, we present an efficient Gaussian Mixture Models based depth map fusion approach, introducing an online update scheme for dense representations. The environment is modeled from an ego-centric point of view, where each pixel is represented by a mixture of Gaussian inverse-depth models. Consecutive frames are related to each other by transformations obtained from visual odometry. This approach achieves better accuracy than alternative image space depth map fusion techniques at lower computational cost.
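
The idea of representing each pixel by a mixture of Gaussian inverse-depth models and updating it online can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' update scheme: the gating test, the Kalman-style merge, the weight decay, and all names (`Mode`, `update_pixel`, `best_estimate`, `gate`, `decay`, `max_modes`) are hypothetical, and the warping of the previous frame by visual odometry is omitted.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    mean: float    # inverse depth (1/m)
    var: float     # variance of the inverse-depth estimate
    weight: float  # how consistently this mode has been observed


def update_pixel(modes, z, z_var, max_modes=3, gate=3.0, decay=0.9):
    """Fuse one inverse-depth measurement z into a pixel's list of modes."""
    # Age all existing modes so that stale hypotheses fade out.
    for m in modes:
        m.weight *= decay

    # Find the closest mode within the gate (normalized squared distance).
    best, best_d2 = None, gate * gate
    for m in modes:
        d2 = (z - m.mean) ** 2 / (m.var + z_var)
        if d2 <= best_d2:
            best, best_d2 = m, d2

    if best is not None:
        # Measurement agrees with a mode: merge with a Kalman-style update.
        k = best.var / (best.var + z_var)
        best.mean += k * (z - best.mean)
        best.var *= 1.0 - k
        best.weight += 1.0 - decay
    else:
        # No agreement: start a new mode, replacing the weakest if full.
        new = Mode(mean=z, var=z_var, weight=1.0 - decay)
        if len(modes) < max_modes:
            modes.append(new)
        else:
            weakest = min(range(len(modes)), key=lambda i: modes[i].weight)
            modes[weakest] = new
    return modes


def best_estimate(modes):
    """Return the highest-weight mode, or None for an unobserved pixel."""
    return max(modes, key=lambda m: m.weight) if modes else None
```

In this sketch a single outlier measurement opens a low-weight mode instead of corrupting the dominant estimate, which is one reason a mixture can be more robust than a single Gaussian per pixel.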

18 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A new Gaussian Mixture Models-based disparity image fusion algorithm, with an extension to handle independently moving objects (IMO), which improves scene models in case of moving objects, where standard temporal fusion approaches cannot detect movers and introduce errors in world models due to the common static scene assumption.
Abstract: We present a novel on-board perception system for collision avoidance by micro air vehicles (MAV). An egocentric cylindrical representation is utilized to model the world using forward-looking stereo vision. This efficient representation enables a 360° field of regard, as the vehicle moves around and disparity maps are fused temporally on the cylindrical map. For this purpose, we developed a new Gaussian Mixture Models-based disparity image fusion algorithm, with an extension to handle independently moving objects (IMO). The extension improves scene models in case of moving objects, where standard temporal fusion approaches cannot detect movers and introduce errors in world models due to the common static scene assumption. The on-board implementation of the vision pipeline provides disparity maps on a 360° egocentric cylindrical surface at 10 Hz. The perception output is used in our system by real-time motion planning with collision avoidance on the MAV.

9 citations


13 Feb 2017
TL;DR: This paper proposes the Titan Aerial Daughtercraft (TAD), a small rotorcraft deployed from a lander or balloon to perform high-resolution imaging and mapping of the surface of Saturn's giant moon Titan.
Abstract: Saturn's giant moon Titan has become one of the most fascinating bodies in the Solar System. Even though it is a billion miles from Earth, data from the Cassini mission reveals that Titan has a very diverse, Earth-like surface, with mountains, fluvial channels, lakes, evaporite basins, plains, dunes, and seas [Lopes 2010] (Figure 1). But unlike Earth, Titan's surface likely is composed of organic chemistry products derived from complex atmospheric photochemistry [Lorenz 2008]. In addition, Titan has an active meteorological system with observed storms and precipitation-induced surface darkening suggesting a hydrocarbon cycle analogous to Earth's water cycle [Turtle 2011]. Titan is the richest laboratory in the solar system for studying prebiotic chemistry, which makes studying its chemistry from the surface and in the atmosphere one of the most important objectives in planetary science [Decadal 2011]. The diversity of surface features on Titan related to organic solids and liquids makes long-range mobility with surface access important [Decadal 2011]. This has not been possible to date, because mission concepts have had either no mobility (landers), no surface access (balloons and airplanes), or low maturity, high risk, and/or high development costs for this environment (e.g., large, self-sufficient, long-duration helicopters). Enabling in situ mobility could revolutionize Titan exploration, similarly to the way rovers revolutionized Mars exploration. Recent progress on several fronts has suggested that small-scale rotorcraft deployed as daughtercraft from a lander or balloon mothercraft may be an effective, affordable approach to expanding Titan surface access. This includes rapid progress on autonomous navigation capabilities of such aircraft for terrestrial applications and on miniaturization, driven by the consumer mobile electronics market, of high-performance sensors, processors, and other avionics components needed for such aircraft.
Chemical analysis, for example with a mass spectrometer, will be important to any Titan surface mission. Anticipating that it may be more practical to host chemical analysis instruments on a mothership than a daughtercraft, we defined system and mission concepts that deploy a small rotorcraft, termed a Titan Aerial Daughtercraft (TAD), from a lander or balloon to perform high-resolution imaging and mapping, potentially land to acquire microscopic images or other in situ measurements, and acquire samples to return to analytical instruments on the mothership. In principle, the ability to recharge batteries in TAD from a radioisotope or other long-lived power source on the mothership could enable multiple sorties. For a lander-based mission, a variety of landing sites is conceivable, including near lake margins, in dry lake beds, or in regions of plains, dunes, or putative cryovolcanic or impact melt features. Such missions may require landing with greater precision than in previous missions (Huygens) and mission studies; this could also enhance the ability of TAD to reach interesting terrain from the landing site. Precision descent may also benefit balloon missions, with or without a daughtercraft, by increasing the probability that the balloon will drift over desired terrain early in its mission. Given these potential benefits, the overall concept studied here includes brief consideration of precision descent for landing or balloon deployment, followed by one or more sorties by a rotorcraft deployed from the mothership, with the ability to return to the mothership.

5 citations


Proceedings ArticleDOI
01 May 2017
TL;DR: This work introduces an efficient sense-and-avoid pipeline that compactly represents range measurements from multiple sensors, trajectory generation, and motion planning in a 2.5-dimensional projective data structure called an egospace representation.
Abstract: Navigation of micro air vehicles (MAVs) in unknown environments is a complex sensing and trajectory generation task, particularly at high velocities. In this work, we introduce an efficient sense-and-avoid pipeline that compactly represents range measurements from multiple sensors, trajectory generation, and motion planning in a 2.5-dimensional projective data structure called an egospace representation. Egospace coordinates generalize depth image obstacle representations and are a particularly convenient choice for configuration flat mobile robots, which are differentially flat in their configuration variables and include a number of commonly used MAV plant models. After characterizing egospace obstacle avoidance for robots with trivial dynamics and establishing limits on applicability and performance, we generalize to motion planning over full configuration flat dynamics using motion primitives expressed directly in egospace coordinates. In comparison to approaches based on world coordinates, egospace uses the natural sensor geometry to combine the benefits of a multiresolution and multi-sensor representation architecture into a single simple and efficient layer.
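
The depth-image special case that egospace coordinates are said to generalize can be sketched as a collision check done directly in image coordinates, without building a world-frame map. This is a minimal sketch under assumptions that are not from the abstract: a single pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, a depth image stored as a list of rows in metres, waypoints already expressed in the camera frame, and a fixed safety `margin`. The paper's multi-sensor structure and motion primitives are not shown.

```python
def project(point, fx, fy, cx, cy):
    """Project a camera-frame point (x, y, z) to (column, row, depth)."""
    x, y, z = point
    if z <= 0.0:
        return None  # behind the camera: not observable
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    return u, v, z


def waypoint_is_free(point, depth, fx, fy, cx, cy, margin=0.5):
    """True only if the point lies in observed free space.

    A point is accepted when it projects inside the image and sits at
    least `margin` metres in front of the surface measured at that pixel.
    Unobserved or occluded points are conservatively rejected.
    """
    proj = project(point, fx, fy, cx, cy)
    if proj is None:
        return False
    u, v, z = proj
    if not (0 <= v < len(depth) and 0 <= u < len(depth[0])):
        return False
    return z + margin < depth[v][u]


def path_is_free(points, depth, fx, fy, cx, cy, margin=0.5):
    """Check every waypoint of a candidate trajectory."""
    return all(
        waypoint_is_free(p, depth, fx, fy, cx, cy, margin) for p in points
    )
```

Each check costs one projection and one array lookup, which illustrates why working in the sensor's own projective geometry can be cheap compared with querying a world-coordinate structure.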

3 citations


Proceedings ArticleDOI
01 Aug 2017
TL;DR: This work designs a new unified descriptor, called Relation History Image (RHI), which can be extracted from all the activity types of interest, and formulates an optimization procedure to detect and recognize activities of different types.
Abstract: The computer vision literature is rich in works that analyze different types of activities – single actions, two-person interactions or ego-centric activities, to name a few. However, traditional methods treat such types of activities separately, while in real settings detecting and recognizing different types of activities simultaneously is necessary. We first design a new unified descriptor, called Relation History Image (RHI), which can be extracted from all the activity types we are interested in. We then formulate an optimization procedure to detect and recognize activities of different types. We assess our approach on a new dataset recorded from a robot-centric perspective as well as on publicly available datasets, and evaluate its quality compared to multiple baselines.

2 citations