Journal Article

New Methods for Matching 3-D Objects with Single Perspective Views

01 May 1987-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 9, Iss: 3, pp 401-412
TL;DR: A new approach based on model-based interpretation of a single perspective image is presented; it is valid over a wide range of perspective images and does not require perfect low-level image segmentation.
Abstract: In this paper we analyze the ability of a computer vision system to derive properties of the three-dimensional (3-D) physical world from viewing two-dimensional (2-D) images. We present a new approach which consists of a model-based interpretation of a single perspective image. Image linear features and linear feature sets are backprojected onto the 3-D space and geometric models are then used for selecting possible solutions. The paper treats two situations: 1) interpretation of scenes resulting from a simple geometric structure (orthogonality) in which case we seek to determine the orientation of this structure relatively to the viewer (three rotations) and 2) recognition of moderately complex objects whose shapes (geometrical and topological properties) are provided in advance. The recognition technique is limited to objects containing, among others, straight edges and planar faces. In the first case the computation can be carried out by a parallel algorithm which selects the solution that has received the largest number of votes (accumulation space). In the second case an object is uniquely assigned to a set of image features through a search strategy. As a by-product, the spatial position and orientation (six degrees of freedom) of each recognized object is determined as well. The method is valid over a wide range of perspective images and it does not require perfect low-level image segmentation. It has been successfully implemented for recognizing a class of industrial parts.
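
The voting idea used in the first case can be illustrated with a small sketch. It assumes a pinhole camera with focal length f and image coordinates centered on the optical axis. Each image segment is backprojected to the plane through the optical center that contains it, and every candidate 3-D direction lying in that plane receives one vote. The function names, the hemisphere sampling, and the tolerance are illustrative assumptions; the paper's own accumulation scheme recovers three rotations and is not reproduced here.

```python
import math

def interpretation_plane_normal(p1, p2, f):
    """Unit normal of the plane through the optical center and an image segment."""
    a, b = (p1[0], p1[1], f), (p2[0], p2[1], f)
    n = (a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0])
    norm = math.sqrt(sum(c * c for c in n))
    return tuple(c / norm for c in n)

def candidate_directions(step_deg):
    """Unit vectors sampled on a hemisphere (d and -d describe the same line direction)."""
    dirs = []
    for theta in range(0, 91, step_deg):        # angle from the optical axis
        for phi in range(0, 360, step_deg):     # angle around the optical axis
            t, p = math.radians(theta), math.radians(phi)
            dirs.append((math.sin(t) * math.cos(p),
                         math.sin(t) * math.sin(p),
                         math.cos(t)))
    return dirs

def vote_for_direction(segments, f, tol=0.01, step_deg=5):
    """Return the sampled direction compatible with the most image segments."""
    normals = [interpretation_plane_normal(p1, p2, f) for p1, p2 in segments]
    best, best_votes = None, -1
    for d in candidate_directions(step_deg):
        # A 3-D line can project onto a segment only if it lies in that
        # segment's backprojected plane, i.e. is perpendicular to the normal.
        votes = sum(1 for n in normals
                    if abs(sum(ni * di for ni, di in zip(n, d))) < tol)
        if votes > best_votes:
            best, best_votes = d, votes
    return best, best_votes

def project(P, f):
    """Perspective projection of a 3-D point onto the image plane."""
    return (f * P[0] / P[2], f * P[1] / P[2])

# Hypothetical scene: four 3-D edges parallel to (1, 0, 1), plus one unrelated segment.
f = 1.0
starts = [(0, 1, 4), (1, -1, 5), (-1, 2, 6), (2, -3, 5)]
segments = [(project(P, f), project((P[0] + 1, P[1], P[2] + 1), f)) for P in starts]
segments.append(((0.5, 0.5), (-0.3, 0.1)))
direction, votes = vote_for_direction(segments, f)
```

In this toy scene the four parallel edges agree on a single cell of the accumulator, while the unrelated segment does not contribute to it, which is why imperfect segmentation degrades the peak gradually rather than breaking the estimate.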


HAL Id: inria-00589985
https://hal.inria.fr/inria-00589985
Submitted on 16 Jun 2011
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
New Methods for Matching 3D Objects with Single
Perspective Views
Radu Horaud
To cite this version:
Radu Horaud. New Methods for Matching 3D Objects with Single Perspective Views. IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 1987, 9 (3), pp.401–412. ⟨10.1109/TPAMI.1987.4767922⟩. ⟨inria-00589985⟩





Citations
Patent
29 Aug 2006
TL;DR: In this paper, a set top box for interacting with broadband media streams, with an adaptive user interface, content-based media processing and/or media metadata processing, and telecommunications integration, is presented.
Abstract: An intelligent electronic appliance preferably includes a user interface, data input and/or output port, and an intelligent processor. A preferred embodiment comprises a set top box for interacting with broadband media streams, with an adaptive user interface, content-based media processing and/or media metadata processing, and telecommunications integration. An adaptive user interface models the user, by observation, feedback, and/or explicit input, and presents a user interface and/or executes functions based on the user model. A content-based media processing system analyzes media content, for example audio and video, to understand the content, for example to generate content-descriptive metadata. A media metadata processing system operates on locally or remotely generated metadata to process the media in accordance with the metadata, which may be, for example, an electronic program guide, MPEG 7 data, and/or automatically generated format. A set top box preferably includes digital trick play effects, and incorporated digital rights management features.

2,644 citations

Patent
01 Feb 1999
TL;DR: An adaptive interface for a programmable system predicts a desired user function based on user history as well as machine internal status and context; the predicted input is presented for confirmation by the user, and the predictive mechanism is updated based on this feedback.
Abstract: An adaptive interface for a programmable system, for predicting a desired user function, based on user history, as well as machine internal status and context. The apparatus receives an input from the user and other data. A predicted input is presented for confirmation by the user, and the predictive mechanism is updated based on this feedback. Also provided is a pattern recognition system for a multimedia device, wherein a user input is matched to a video stream on a conceptual basis, allowing inexact programming of a multimedia device. The system analyzes a data stream for correspondence with a data pattern for processing and storage. The data stream is subjected to adaptive pattern recognition to extract features of interest to provide a highly compressed representation that may be efficiently processed to determine correspondence. Applications of the interface and system include a video cassette recorder (VCR), medical device, vehicle control system, audio device, environmental control system, securities trading terminal, and smart house. The system optionally includes an actuator for effecting the environment of operation, allowing closed-loop feedback operation and automated learning.

1,182 citations

Journal Article
TL;DR: The major direct solutions to the three point perspective pose estimation problems are reviewed from a unified perspective beginning with the first solution published in 1841 by a German mathematician and continuing through the solutions published in the German and then American photogrammetry literature, and most recently in the current computer vision literature.
Abstract: In this paper, the major direct solutions to the three point perspective pose estimation problems are reviewed from a unified perspective beginning with the first solution which was published in 1841 by a German mathematician, continuing through the solutions published in the German and then American photogrammetry literature, and most recently in the current computer vision literature. The numerical stability of these three point perspective solutions are also discussed. We show that even in case where the solution is not near the geometric unstable region, considerable care must be exercised in the calculation. Depending on the order of the substitutions utilized, the relative error can change over a thousand to one. This difference is due entirely to the way the calculations are performed and not due to any geometric structural instability of any problem instance. We present an analysis method which produces a numerically stable calculation.

574 citations


Additional excerpts

  • ...Page 7 (page 337 of the issue), Equation (16): $(b^2 - mc^2)u^2 + 2(c^2(\cos\beta - n)m - b^2\cos\gamma)u - c^2n^2 + 2c^2n\cos\beta + b^2 - c^2 = 0$ should be $(b^2 - m^2c^2)u^2 + 2(c^2(\cos\beta - n)m - b^2\cos\gamma)u - c^2n^2 + 2c^2n\cos\beta + b^2 - c^2 = 0$. Page 7 (page 337 of the issue), second column, line 4: $A = b^2 - mc^2$ should be $A = b^2 - m^2c^2$. Page 7 (page 337 of the issue), second column, line 6: $C = -cn^2 + 2c^2n\cos\beta + b^2 - c^2$ should be $C = -c^2n^2 + 2c^2n\cos\beta + b^2 - c^2$. References [1] Robert M. Haralick, Chung-Nan Lee, Karsten Ottenberg, and Michael Nölle....

    [...]

01 Jan 1994
TL;DR: In this paper, the major direct solutions to the three point perspective pose estimation problems are reviewed from a unified perspective beginning with the first solution which was published in 1841 by a German mathematician, continuing through the solutions published in the German and then American photogrammetry literature, and most recently in the current computer vision literature.
Abstract: In this paper, the major direct solutions to the three point perspective pose estimation problems are reviewed from a unified perspective beginning with the first solution which was published in 1841 by a German mathematician, continuing through the solutions published in the German and then American photogrammetry literature, and most recently in the current computer vision literature. The numerical stability of these three point perspective solutions are also discussed. We show that even in case where the solution is not near the geometric unstable region, considerable care must be exercised in the calculation. Depending on the order of the substitutions utilized, the relative error can change over a thousand to one. This difference is due entirely to the way the calculations are performed and not due to any geometric structural instability of any problem instance. We present an analysis method which produces a numerically stable calculation.

546 citations

Patent
29 Aug 2006
TL;DR: In this article, an enhanced interface for facilitating human input of a desired control sequence in a programmable device by employing specialized visual feedback is presented. The interface is usable for a programmable video cassette recorder.
Abstract: The need for a more readily usable interface for programmable devices is widely recognized. The present invention relates to programmable sequencing devices, or, more particularly, the remote controls for consumer electronic devices. The present invention provides an enhanced interface for facilitating human input of a desired control sequence in a programmable device by employing specialized visual feedback. The present invention also relates to a new interface and method of interfacing with a programmable device, which is usable as an interface for a programmable video cassette recorder.

494 citations

References
Journal Article
TL;DR: The theory of edge detection explains several basic psychophysical findings, and the operation of forming oriented zero-crossing segments from the output of centre-surround ∇²G filters acting on the image forms the basis for a physiological model of simple cells.
Abstract: A theory of edge detection is presented. The analysis proceeds in two parts. (1) Intensity changes, which occur in a natural image over a wide range of scales, are detected separately at different scales. An appropriate filter for this purpose at a given scale is found to be the second derivative of a Gaussian, and it is shown that, provided some simple conditions are satisfied, these primary filters need not be orientation-dependent. Thus, intensity changes at a given scale are best detected by finding the zero values of ∇²G(x,y)*I(x,y) for image I, where G(x,y) is a two-dimensional Gaussian distribution and ∇² is the Laplacian. The intensity changes thus discovered in each of the channels are then represented by oriented primitives called zero-crossing segments, and evidence is given that this representation is complete. (2) Intensity changes in images arise from surface discontinuities or from reflectance or illumination boundaries, and these all have the property that they are spatially localized. Because of this, the zero-crossing segments from the different channels are not independent, and rules are deduced for combining them into a description of the image. This description is called the raw primal sketch. The theory explains several basic psychophysical findings, and the operation of forming oriented zero-crossing segments from the output of centre-surround ∇²G filters acting on the image forms the basis for a physiological model of simple cells (see Marr & Ullman 1979).

6,893 citations
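
The zero-crossing idea in the abstract above can be sketched in one dimension, where the Laplacian of a Gaussian reduces to the second derivative of a Gaussian. This is a minimal illustration under stated assumptions, not the implementation from that paper: the kernel truncation at 4σ, the mean subtraction, the border replication, and the threshold `eps` are all choices made here for the example.

```python
import math

def second_derivative_of_gaussian(sigma):
    """Sampled second derivative of a 1-D Gaussian, truncated at 4*sigma."""
    radius = int(math.ceil(4 * sigma))
    kernel = []
    for x in range(-radius, radius + 1):
        g = math.exp(-x * x / (2 * sigma ** 2))
        kernel.append((x * x / sigma ** 4 - 1 / sigma ** 2) * g)
    # Subtract the small residual mean so flat regions give a (near) zero response.
    mean = sum(kernel) / len(kernel)
    return [k - mean for k in kernel]

def filter_replicate(signal, kernel):
    """Apply a symmetric kernel, replicating the border samples."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += k * signal[idx]
        out.append(acc)
    return out

def zero_crossings(response, eps=1e-9):
    """Indices i where the response changes sign between samples i and i + 1.

    Sign changes where either side is numerically negligible are ignored,
    so rounding noise in flat regions is not reported as an edge.
    """
    crossings = []
    for i in range(len(response) - 1):
        a, b = response[i], response[i + 1]
        if a * b < 0 and min(abs(a), abs(b)) > eps:
            crossings.append(i)
    return crossings

# A step edge between samples 19 and 20.
signal = [0.0] * 20 + [10.0] * 20
response = filter_replicate(signal, second_derivative_of_gaussian(sigma=2.0))
edges = zero_crossings(response)
```

Repeating the filtering with several values of `sigma` would give the separate scale channels that the abstract describes.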

Book
01 Jan 1970

959 citations

Book
01 Jan 1981
TL;DR: Modelling, prediction, description and interpretation proceed concurrently from coarse object subpart and class interpretations of images, to fine distinctions among object subclasses and more precise three dimensional quantification of objects.
Abstract: We describe model-based vision systems in terms of four components: models, prediction of image features, description of image features, and interpretation which relates image features to models. We describe details of modelling, prediction and interpretation in an implemented model-based vision system. Both generic object classes and specific objects are represented by volume models which are independent of viewpoint. We model complex real world object classes. Variations of size, structure and spatial relations within object classes can be modelled. New spatial reasoning techniques are described which are useful both for prediction within a vision system, and for planning within a manipulation system. We introduce new approaches to prediction and interpretation based on the propagation of symbolic constraints. Predictions are two pronged. First, prediction graphs provide a coarse filter for hypothesizing matches of objects to image feature. Second, they contain instructions on how to use measurements of image features to deduce three dimensional information about tentative object interpretations. Interpretation proceeds by merging local hypothesized matches, subject to consistent derived implications about the size, structure and spatial configuration of the hypothesized objects. Prediction, description and interpretation proceed concurrently from coarse object subpart and class interpretations of images, to fine distinctions among object subclasses and more precise three dimensional quantification of objects. We distinguish our implementations from the fundamental geometric operations required by our general image understanding scheme. We suggest directions for future research for improved algorithms and representations.

785 citations

Journal Article
TL;DR: Piecewise approximation is described as a way of feature extraction, data compaction, and noise filtering of boundaries of regions of pictures and waveforms, and a new fast algorithm is proposed which allows for a variable number of segments.
Abstract: Piecewise approximation is described as a way of feature extraction, data compaction, and noise filtering of boundaries of regions of pictures and waveforms. A new fast algorithm is proposed which allows for a variable number of segments. After an arbitrary initial choice, segments are split or merged in order to drive the error norm under a prespecified bound. Results of computer experiments with cell outlines and electrocardiograms are reported.

589 citations
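
The split-and-merge strategy described in the abstract above can be sketched for a sampled waveform. The abstract does not specify the fitting method or the error norm, so this sketch assumes that each segment is approximated by the chord joining its end samples and that the error is the maximum absolute deviation; the function names are hypothetical.

```python
def chord_error(ys, i, j):
    """Largest deviation of ys[i..j] from the chord joining its end samples.

    Returns (error, index of the worst sample).
    """
    worst, worst_k = 0.0, i
    for k in range(i + 1, j):
        t = (k - i) / (j - i)
        fit = ys[i] + t * (ys[j] - ys[i])
        err = abs(ys[k] - fit)
        if err > worst:
            worst, worst_k = err, k
    return worst, worst_k

def split_and_merge(ys, breakpoints, bound):
    """Adjust an arbitrary initial segmentation until every segment meets the bound.

    Assumes bound >= 0. The number of segments is not fixed in advance.
    """
    bps = sorted(set(breakpoints) | {0, len(ys) - 1})

    # Split phase: cut any segment that violates the bound at its worst sample.
    i = 0
    while i < len(bps) - 1:
        err, k = chord_error(ys, bps[i], bps[i + 1])
        if err > bound:
            bps.insert(i + 1, k)      # re-examine the left part next
        else:
            i += 1

    # Merge phase: drop a breakpoint when the merged segment still meets the bound.
    i = 1
    while i < len(bps) - 1:
        err, _ = chord_error(ys, bps[i - 1], bps[i + 1])
        if err <= bound:
            del bps[i]
        else:
            i += 1
    return bps

# Triangle wave with a poorly placed initial segmentation.
ys = list(range(11)) + list(range(9, -1, -1))
result = split_and_merge(ys, breakpoints=[0, 5, 15, 20], bound=0.5)
```

Here the initial breakpoints at 5 and 15 miss the peak; the split phase inserts a breakpoint at sample 10 and the merge phase then removes the two that are no longer needed.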

Journal Article
TL;DR: In this paper, the authors show that inconsistent hypotheses about pairings between sensed points and object surfaces can be discarded efficiently by using local constraints on distances between faces, angles between face normals, and angles (relative to the surface normals) of vectors between the sensed points.
Abstract: This paper discusses how local measurements of three-dimensional positions and surface normals (recorded by a set of tactile sensors, or by three-dimensional range sensors), may be used to identify and locate objects from among a set of known objects. The objects are modeled as polyhedra having up to six degrees of freedom relative to the sensors. We show that inconsistent hypotheses about pairings between sensed points and object surfaces can be discarded efficiently by using local constraints on distances between faces, angles between face normals, and angles (relative to the surface normals) of vectors between sensed points. We show by simulation and by mathematical bounds that the number of hypotheses consistent with these constraints is small. We also show how to recover the position and orientation of the object from the sensory data. The algorithm's performance on data obtained from a triangulation range sensor is illustrated.

570 citations
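
The pruning of pairings described in the abstract above can be sketched as a depth-first search over point-to-face assignments. This sketch uses only two of the three constraints mentioned (distances between faces and angles between face normals), and the model is a hypothetical slab with three labelled faces whose distance ranges were worked out by hand for the example; none of the names come from the paper.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair(a, b):
    """Unordered pair of face labels, used as a lookup key."""
    return frozenset((a, b))

def prune_pairings(sensed, model_normals, dist_range, tol=1e-6):
    """Enumerate face assignments that pass every pairwise constraint.

    sensed: list of (point, unit normal) measurements.
    model_normals: face label -> unit normal in the model frame.
    dist_range: pair(face_a, face_b) -> (min, max) distance between points on them.
    """
    faces = list(model_normals)
    survivors = []

    def compatible(k, face, j, other):
        (pk, nk), (pj, nj) = sensed[k], sensed[j]
        lo, hi = dist_range[pair(face, other)]
        if not (lo - tol <= math.dist(pk, pj) <= hi + tol):
            return False
        # The angle between two normals does not depend on the object's pose.
        return abs(dot(nk, nj) - dot(model_normals[face], model_normals[other])) <= tol

    def extend(assignment):
        k = len(assignment)
        if k == len(sensed):
            survivors.append(tuple(assignment))
            return
        for face in faces:
            # Prune as soon as the new pairing conflicts with an earlier one.
            if all(compatible(k, face, j, other)
                   for j, other in enumerate(assignment)):
                extend(assignment + [face])

    extend([])
    return survivors

# Hypothetical model: a 1 x 1 x 3 slab, with its top, bottom and one side face.
model_normals = {"top": (0, 0, 1), "bottom": (0, 0, -1), "side": (-1, 0, 0)}
dist_range = {
    pair("top", "top"): (0.0, math.sqrt(2)),
    pair("bottom", "bottom"): (0.0, math.sqrt(2)),
    pair("side", "side"): (0.0, math.sqrt(10)),
    pair("top", "bottom"): (3.0, math.sqrt(11)),
    pair("top", "side"): (0.0, math.sqrt(11)),
    pair("bottom", "side"): (0.0, math.sqrt(11)),
}
sensed = [((0.5, 0.5, 3.0), (0, 0, 1)), ((0.5, 0.5, 0.0), (0, 0, -1))]
survivors = prune_pairings(sensed, model_normals, dist_range)
```

Of the nine possible pairings for two sensed points, only the two that place the points on opposite faces survive, which mirrors the abstract's claim that few hypotheses remain consistent with the local constraints.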
