scispace - formally typeset
Search or ask a question
Author

Daniel DeMenthon

Bio: Daniel DeMenthon is an academic researcher from Johns Hopkins University Applied Physics Laboratory. The author has contributed to research in topics: Pose & 3D pose estimation. The author has an hindex of 33, co-authored 86 publications receiving 5387 citations. Previous affiliations of Daniel DeMenthon include Johns Hopkins University & University of Maryland, College Park.


Papers
More filters
Journal ArticleDOI
TL;DR: Compared to classic approaches making use of Newton's method, POSIT does not require starting from an initial guess, and computes the pose using an order of magnitude fewer floating point operations; it may therefore be a useful alternative for real-time operation.
Abstract: In this paper, we describe a method for finding the pose of an object from a single image. We assume that we can detect and match in the image four or more noncoplanar feature points of the object, and that we know their relative geometry on the object. The method combines two algorithms; the first algorithm,POS (Pose from Orthography and Scaling) approximates the perspective projection with a scaled orthographic projection and finds the rotation matrix and the translation vector of the object by solving a linear system; the second algorithm,POSIT (POS with ITerations), uses in its iteration loop the approximate pose found by POS in order to compute better scaled orthographic projections of the feature points, then applies POS to these projections instead of the original image projections. POSIT converges to accurate pose measurements in a few iterations. POSIT can be used with many feature points at once for added insensitivity to measurement errors and image noise. Compared to classic approaches making use of Newton's method, POSIT does not require starting from an initial guess, and computes the pose using an order of magnitude fewer floating point operations; it may therefore be a useful alternative for real-time operation. When speed is not an issue, POSIT can be written in 25 lines or less in Mathematica; the code is provided in an Appendix.

1,195 citations

Proceedings ArticleDOI
01 Sep 1998
TL;DR: A simple video player is described that displays the keyframes sequentially and lets the user change the summarization level on the fly with a slider, and an approach to automatically selecting a summarizationlevel that provides a concise and representative set of keyframes is described.
Abstract: : A video sequence can be represented as a trajectory curve in a high dimensional feature space. This video curve can be analyzed by tools similar to those developed for planar curves. In particular, the classic binary curve splitting algorithm has been found to be a useful tool for video analysis. With a splitting condition that checks the dimensionality of the curve segment being split, the video curve can be recursively simplified and represented as a tree structure, and the frames that are found to be junctions between curve segments at different levels of the tree can be used as keyframes to summarize the video sequences at different levels of detail. These keyframes can be combined in various spatial and temporal configurations for browsing purposes. We describe a simple video player that displays the keyframes sequentially and lets the user change the summarization level on the fly with a slider. We also describe an approach to automatically selecting a summarization level that provides a concise and representative set of keyframes.

318 citations

Patent
19 Aug 1991
TL;DR: In this paper, a sensing system for monitoring the position and orientation of a rigid object is presented, where at least four point light sources are mounted on the surface of the object in a non-coplanar arrangement.
Abstract: A sensing system for monitoring the position and orientation of a rigid object (20). At least 4 point light sources (24) are mounted on the surface of the object (20) in a noncoplanar arrangement. A single electronic camera (26) captures images (59) of the point light sources (24). Locations of the images (59) of the light sources (24) are detected in each video image, and a computer runs a task using these locations to obtain close approximations of the rotation matrix and translation vector (33) of the object (20) in a camera coordinate system (74) at video rate. The object is held by an operator (90) for three-dimensional cursor (94) control and interaction with virtual reality scenes (96) on computer displays (88), and for remote interactive control of teleoperated mechanisms.

291 citations

Journal ArticleDOI
TL;DR: A new algorithm, called SoftPOSIT, for determining the pose of a 3D object from a single 2D image when correspondences between object points and image points are not known, which has an asymptotic run-time complexity that is better than previous methods by a factor of the number of image points.
Abstract: The problem of pose estimation arises in many areas of computer vision, including object recognition, object tracking, site inspection and updating, and autonomous navigation when scene models are available. We present a new algorithm, called SoftPOSIT, for determining the pose of a 3D object from a single 2D image when correspondences between object points and image points are not known. The algorithm combines the iterative softassign algorithm (Gold and Rangarajan, 1996; Gold et al., 1998) for computing correspondences and the iterative POSIT algorithm (DeMenthon and Davis, 1995) for computing object pose under a full-perspective camera model. Our algorithm, unlike most previous algorithms for pose determination, does not have to hypothesize small sets of matches and then verify the remaining image points. Instead, all possible matches are treated identically throughout the search for an optimal pose. The performance of the algorithm is extensively evaluated in Monte Carlo simulations on synthetic data under a variety of levels of clutter, occlusion, and image noise. These tests show that the algorithm performs well in a variety of difficult scenarios, and empirical evidence suggests that the algorithm has an asymptotic run-time complexity that is better than previous methods by a factor of the number of image points. The algorithm is being applied to a number of practical autonomous vehicle navigation problems including the registration of 3D architectural models of a city to images, and the docking of small robots onto larger robots.

253 citations

Journal ArticleDOI
TL;DR: This method iteratively refines up to two different pose estimates, and provides an associated quality measure for each pose, when the camera distance is large compared with the object depth, or when the accuracy of feature point extraction is low because of image noise.

249 citations


Cited by
More filters
Book
30 Sep 2010
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/. Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.

4,146 citations

Journal ArticleDOI
01 Oct 1996
TL;DR: This article provides a tutorial introduction to visual servo control of robotic manipulators by reviewing the prerequisite topics from robotics and computer vision, including a brief review of coordinate transformations, velocity representation, and a description of the geometric aspects of the image formation process.
Abstract: This article provides a tutorial introduction to visual servo control of robotic manipulators. Since the topic spans many disciplines our goal is limited to providing a basic conceptual framework. We begin by reviewing the prerequisite topics from robotics and computer vision, including a brief review of coordinate transformations, velocity representation, and a description of the geometric aspects of the image formation process. We then present a taxonomy of visual servo control systems. The two major classes of systems, position-based and image-based systems, are then discussed in detail. Since any visual servo system must be capable of tracking image features in a sequence of images, we also include an overview of feature-based and correlation-based methods for tracking. We conclude the tutorial with a number of observations on the current directions of the research field of visual servo control.

3,619 citations

Journal ArticleDOI
TL;DR: This survey reviews recent trends in video-based human capture and analysis, as well as discussing open problems for future research to achieve automatic visual analysis of human movement.

2,738 citations

Journal ArticleDOI
TL;DR: A non-iterative solution to the PnP problem—the estimation of the pose of a calibrated camera from n 3D-to-2D point correspondences—whose computational complexity grows linearly with n, which can be done in O(n) time by expressing these coordinates as weighted sum of the eigenvectors of a 12×12 matrix.
Abstract: We propose a non-iterative solution to the PnP problem--the estimation of the pose of a calibrated camera from n 3D-to-2D point correspondences--whose computational complexity grows linearly with n This is in contrast to state-of-the-art methods that are O(n 5) or even O(n 8), without being more accurate Our method is applicable for all n?4 and handles properly both planar and non-planar configurations Our central idea is to express the n 3D points as a weighted sum of four virtual control points The problem then reduces to estimating the coordinates of these control points in the camera referential, which can be done in O(n) time by expressing these coordinates as weighted sum of the eigenvectors of a 12×12 matrix and solving a small constant number of quadratic equations to pick the right weights Furthermore, if maximal precision is required, the output of the closed-form solution can be used to initialize a Gauss-Newton scheme, which improves accuracy with negligible amount of additional time The advantages of our method are demonstrated by thorough testing on both synthetic and real-data

2,598 citations

Patent
09 May 2008
TL;DR: In this article, the authors described a system for processing touch inputs with respect to a multipoint sensing device and identifying at least one multipoint gesture based on the data from the multi-point sensing device.
Abstract: Methods and systems for processing touch inputs are disclosed. The invention in one respect includes reading data from a multipoint sensing device such as a multipoint touch screen where the data pertains to touch input with respect to the multipoint sensing device, and identifying at least one multipoint gesture based on the data from the multipoint sensing device.

2,584 citations