TL;DR: A novel calibration method is presented that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system, and experimental results for perspective as well as non-perspective imaging systems are included.
Abstract: Linear perspective projection has served as the dominant imaging model in computer vision. Recent developments in image sensing make the perspective model highly restrictive. This paper presents a general imaging model that can be used to represent an arbitrary imaging system. It is observed that all imaging systems perform a mapping from incoming scene rays to photo-sensitive elements on the image detector. This mapping can be conveniently described using a set of virtual sensing elements called raxels. Raxels include geometric, radiometric and optical properties. We present a novel calibration method that uses structured light patterns to extract the raxel parameters of an arbitrary imaging system. Experimental results for perspective as well as non-perspective imaging systems are included.
After describing the general imaging model and its properties, the authors present a simple method for finding the parameters of the model for any arbitrary imaging system.
It is important to note that, given the non-perspective nature of a general device, conventional calibration methods based on known scene points [25] or self-calibration techniques that use unknown scene points [5], [10], [15] cannot be directly applied.
Since the authors are interested in the mapping from rays to image points, they need a ray-based calibration method.
The authors describe a simple and effective ray-based approach that uses structured light patterns.
This method allows a user to obtain the geometric, radiometric, and optical parameters of an arbitrarily complex imaging system in a matter of minutes.
2 General Imaging Model: Geometry
If the imaging system is perspective, all the incoming light rays are projected directly onto the detector plane through a single point, namely, the effective pinhole of the perspective system.
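For reference, here is a minimal sketch of that perspective mapping (the notation and code are mine, not the paper's):

    import numpy as np

    # Perspective (pinhole) projection: every scene point X = (X, Y, Z) is mapped through a
    # single center of projection onto the detector plane at focal length f.
    def pinhole_project(X, f):
        X = np.asarray(X, dtype=float)
        return f * X[:2] / X[2]          # (x, y) = (f*X/Z, f*Y/Z)

    print(pinhole_project([1.0, 2.0, 10.0], f=0.05))   # -> [0.005 0.01]

A general imaging system, by contrast, need not have such a single center of projection, which is what motivates the raxel model below.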
The goal of this section is to present a geometrical model that can represent arbitrary imaging systems, perspective or otherwise.
2.1 Raxels
Each raxel includes a pixel that measures light energy and imaging optics (a lens) that collects the bundle of rays around an incoming ray.
The authors will focus on the geometric properties (locations and orientations) of raxels.
Each raxel can possess its own radiometric (brightness and wavelength) response as well as optical (point spread) properties.
These non-geometric properties will be discussed in subsequent sections.
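As a rough illustration of what such a virtual sensing element might carry, here is a minimal sketch (the field names are mine and only approximate the paper's notation):

    from dataclasses import dataclass
    from typing import Optional
    import numpy as np

    @dataclass
    class Raxel:
        position: np.ndarray               # 3D point p on the ray surface (geometric)
        direction: np.ndarray               # unit direction q of the incoming principal ray (geometric)
        gain: float = 1.0                    # radiometric scale, e.g. fall-off/vignetting at this pixel
        response: Optional[object] = None    # per-pixel radiometric response function, if non-linear
        fa: Optional[float] = None           # point-spread focal lengths along the PSF major/minor axes
        fb: Optional[float] = None           # (optical properties)
        psi: float = 0.0                     # orientation of the PSF major axis in the image

    # A general imaging model is then essentially a list of raxels, one per pixel.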
Figure 3: (a) The notation for a raxel used in this paper. A raxel may be placed along the line of a principal ray of light entering the imaging system. In addition to location and orientation, a raxel may have radiometric and optical parameters. (b) Multiple raxels may be located at the same point (p1 = p2 = p3), but have different directions.
The choice of reference surface with which to intersect the incoming rays is flexible; in [9] and [14], it was suggested that the plenoptic function could also be restricted to a plane. The important thing is to choose some reference surface such that each incoming ray intersects it at only one point. (Footnotes: Many of the arrays of photo-sensitive elements in the imaging devices described in Section 1 are one- or two-dimensional. Intensities usually do not change much along a ray, particularly when the medium is air, provided the displacement is small with respect to the total length of the ray.)
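As a concrete (hypothetical) example of such a reference surface, one can intersect every ray with a plane z = z0; the sketch below is mine, not the paper's:

    import numpy as np

    def intersect_plane(p, q, z0):
        """Ray x(s) = p + s*q; return its intersection with the reference plane z = z0."""
        if abs(q[2]) < 1e-12:
            return None                  # ray parallel to the plane: no unique intersection
        s = (z0 - p[2]) / q[2]
        return p + s * q

    print(intersect_plane(np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.0, 1.0]), 5.0))

A plane only satisfies the one-intersection-per-ray requirement when no incoming ray is parallel to it, which is why the choice of surface depends on the imaging system at hand.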
3 Caustics
For example, when light refracts through the shallow water of a pool, bright curves can be seen where the caustics intersect the bottom.
The caustic is a good candidate for the ray surface of an imaging system as it is closely related to the geometry of the incoming rays; the incoming ray directions are tangent to the caustic.
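To make the tangency concrete, here is a minimal 2D sketch of computing a caustic as the envelope of a one-parameter family of rays x(θ, s) = p(θ) + s·q(θ); this construction and the test ray family are mine, not the paper's 3D derivation:

    import numpy as np

    def caustic_points(p, q, thetas, h=1e-5):
        """Caustic = points where the map (theta, s) -> p(theta) + s*q(theta) becomes singular,
        i.e. det[p'(theta) + s*q'(theta), q(theta)] = 0, solved for s on each ray."""
        pts = []
        for t in thetas:
            dp = (p(t + h) - p(t - h)) / (2 * h)       # p'(theta), finite difference
            dq = (q(t + h) - q(t - h)) / (2 * h)       # q'(theta)
            qt = q(t)
            denom = dq[0] * qt[1] - dq[1] * qt[0]      # det[q', q]
            if abs(denom) < 1e-12:
                continue                               # degenerate: caustic point at infinity
            s = -(dp[0] * qt[1] - dp[1] * qt[0]) / denom
            pts.append(p(t) + s * qt)
        return np.array(pts)

    # Test family: rays leaving the unit circle with directions (cos 2t, sin 2t); their envelope
    # is the classic nephroid caustic, and the first computed point is its cusp at (0.5, 0).
    p = lambda t: np.array([np.cos(t), np.sin(t)])
    q = lambda t: np.array([np.cos(2 * t), np.sin(2 * t)])
    print(caustic_points(p, q, np.linspace(0.0, 1.0, 10)))

Each computed caustic point lies on its ray, and the ray is tangent to the caustic curve there, which is the property the paper exploits.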
4.1 Local Focal Length and Point Spread
An arbitrary imaging system cannot be expected to have a single global focal length.
Each raxel may be modeled to have its own focal length.
The authors can compute each raxel's focal length by measuring its point spread function for several depths.
A flexible approach models the point spread as an elliptical Gaussian.
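A minimal sketch of that idea (my own construction, assuming SciPy is available; it is not the paper's code): fit an elliptical Gaussian to the measured blur of a point source, and repeat at several known depths to characterize each raxel's point spread.

    import numpy as np
    from scipy.optimize import curve_fit

    def elliptical_gaussian(xy, A, x0, y0, sa, sb, psi):
        x, y = xy
        xr = (x - x0) * np.cos(psi) + (y - y0) * np.sin(psi)    # rotate into the PSF's own axes
        yr = -(x - x0) * np.sin(psi) + (y - y0) * np.cos(psi)
        return A * np.exp(-0.5 * ((xr / sa) ** 2 + (yr / sb) ** 2))

    def fit_psf(patch):
        """Fit an elliptical Gaussian to a small patch containing the blur of a point source;
        returns the widths along the two axes and the orientation of the major axis."""
        h, w = patch.shape
        ys, xs = np.mgrid[0:h, 0:w]
        p0 = [patch.max(), w / 2, h / 2, 2.0, 2.0, 0.0]
        popt, _ = curve_fit(elliptical_gaussian, (xs.ravel(), ys.ravel()), patch.ravel(), p0=p0)
        return popt[3], popt[4], popt[5]                         # sa, sb, psi

Repeating fit_psf over images taken at several known depths shows, per raxel, how the two widths vary with depth; the depth of best focus along each axis then determines that raxel's two local focal lengths.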
4.3 Complete Imaging Model
In the case of perspective projection, the essential [5] or fundamental [10] matrix provides the relationship between points in one image and lines in another image (of the same scene).
In the general imaging model, this correspondence need no longer be projective.
The point spread's major axis makes an angle ψ with the x-axis in the image, so each raxel has two focal lengths, fa and fb; the angle ψ is only defined if the major and minor axes have different lengths.
5 Finding the Model Parameters
In the preceding sections, the authors described how to compute the model for a known optical system. In contrast, their goal in this section is to estimate the model parameters of an arbitrary, unknown imaging system.
If these display positions are known, the direction of the ray q may be determined for each pixel.
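A minimal sketch of that step (mine, not the paper's code): once a pixel has been matched to a display point at each of two known display positions, its incoming ray is the line through the two points.

    import numpy as np

    def ray_from_two_points(P1, P2):
        """P1, P2: the 3D display points seen by the same pixel at the two display positions."""
        q = P2 - P1
        q = q / np.linalg.norm(q)       # unit direction of the incoming ray
        return P1, q                    # a point on the ray and its direction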
Now, the authors construct a calibration environment where the geometric and radiometric parameters can be efficiently estimated.
If a display has N locations, the authors can make each point distinct in only log2 N images using simple Gray coding or bit coding.
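A minimal sketch of such a coding scheme (my own illustration, not the paper's implementation): binary-reflected Gray code needs only ceil(log2 N) stripe patterns to give every display location a unique per-pixel bit string.

    import numpy as np

    def gray_code_bitplanes(n_locations):
        """One binary stripe pattern per bit; together they make every location distinct."""
        n_bits = int(np.ceil(np.log2(n_locations)))
        idx = np.arange(n_locations)
        gray = idx ^ (idx >> 1)                            # binary-reflected Gray code
        return [((gray >> b) & 1).astype(np.uint8) for b in range(n_bits)]

    def decode_location(bits_seen):
        """bits_seen[b] is the thresholded value a pixel observed in pattern b."""
        g = 0
        for b, bit in enumerate(bits_seen):
            g |= int(bit) << b                             # reassemble the Gray-coded index
        val, shift = g, g >> 1
        while shift:                                       # Gray -> binary conversion
            val ^= shift
            shift >>= 1
        return val

    planes = gray_code_bitplanes(1024)                     # 10 patterns suffice for 1024 locations
    print(decode_location([p[373] for p in planes]))       # -> 373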
The authors may then compute the fall-off function across all the points.
The authors compute both the radiometric response function and the falloff from seventeen uniform brightness levels.
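One simplified way to do this (an assumption-laden sketch of mine, not the paper's procedure) is to treat the response as monotonic and shared across pixels, invert it from the spatial mean of each uniform-brightness image, and read the per-pixel fall-off from the linearized measurements.

    import numpy as np

    def estimate_falloff(images, levels):
        """images: K images of a uniformly bright screen at (relative) brightness levels[k]."""
        stack = np.stack([im.astype(np.float64) for im in images])   # shape (K, H, W)
        mean_per_level = stack.mean(axis=(1, 2))                     # average measurement per level
        order = np.argsort(mean_per_level)
        # invert the (assumed monotonic) average response: measurement -> displayed brightness
        inv_response = lambda m: np.interp(m, mean_per_level[order], np.asarray(levels, float)[order])
        lin = inv_response(stack)                                    # linearized measurements
        ratios = [lin[k] / levels[k] for k in range(len(levels)) if levels[k] > 0]
        falloff = np.mean(ratios, axis=0)                            # per-pixel relative sensitivity
        return falloff / falloff.max()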
5.1 Experimental Apparatus
The laptop was oriented so as to give the maximum screen resolution along the axis of symmetry.
Figure 8 (b) shows a sample binary pattern as seen from the parabolic catadioptric system.
The perspective imaging system, consisting of just the camera itself, can be seen in Figure 11(a).
TL;DR: Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images and takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene.
Abstract: Humans perceive the three-dimensional structure of the world with apparent ease. However, despite all of the recent advances in computer vision research, the dream of having a computer interpret an image at the same level as a two-year-old remains elusive. Why is computer vision such a challenging problem and what is the current state of the art? Computer Vision: Algorithms and Applications explores the variety of techniques commonly used to analyze and interpret images. It also describes challenging real-world applications where vision is being successfully used, both for specialized applications such as medical imaging, and for fun, consumer-level tasks such as image editing and stitching, which students can apply to their own personal photos and videos. More than just a source of recipes, this exceptionally authoritative and comprehensive textbook/reference also takes a scientific approach to basic vision problems, formulating physical models of the imaging process before inverting them to produce descriptions of a scene. These problems are also analyzed using statistical models and solved using rigorous engineering techniques. Topics and features: structured to support active curricula and project-oriented courses, with tips in the Introduction for using the book in a variety of customized courses; presents exercises at the end of each chapter with a heavy emphasis on testing algorithms and containing numerous suggestions for small mid-term projects; provides additional material and more detailed mathematical topics in the Appendices, which cover linear algebra, numerical techniques, and Bayesian estimation theory; suggests additional reading at the end of each chapter, including the latest research in each sub-field, in addition to a full Bibliography at the end of the book; supplies supplementary course material for students at the associated website, http://szeliski.org/Book/. Suitable for an upper-level undergraduate or graduate-level course in computer science or engineering, this textbook focuses on basic techniques that work under real-world conditions and encourages students to push their creative boundaries. Its design and exposition also make it eminently suitable as a unique reference to the fundamental techniques and current research literature in computer vision.
TL;DR: A scheme is developed for classifying the types of motion perceived by a humanlike robot and equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.
Abstract: A scheme is developed for classifying the types of motion perceived by a humanlike robot. It is assumed that the robot receives visual images of the scene using a perspective system model. Equations, theorems, concepts, clues, etc., relating the objects, their positions, and their motion to their images on the focal plane are presented.
TL;DR: In this article, the basic motion models underlying alignment and stitching algorithms are described, along with effective direct (pixel-based) and feature-based alignment algorithms and the blending algorithms used to produce seamless mosaics.
Abstract: This tutorial reviews image alignment and image stitching algorithms. Image alignment algorithms can discover the correspondence relationships among images with varying degrees of overlap. They are ideally suited for applications such as video stabilization, summarization, and the creation of panoramic mosaics. Image stitching algorithms take the alignment estimates produced by such registration algorithms and blend the images in a seamless manner, taking care to deal with potential problems such as blurring or ghosting caused by parallax and scene movement as well as varying image exposures. This tutorial reviews the basic motion models underlying alignment and stitching algorithms, describes effective direct (pixel-based) and feature-based alignment algorithms, and describes blending algorithms used to produce seamless mosaics. It ends with a discussion of open research problems in the area.
TL;DR: MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework is presented.
Abstract: We present MOPED, a framework for Multiple Object Pose Estimation and Detection that seamlessly integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework. We address two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We achieve robust performance with Iterative Clustering Estimation (ICE), a novel algorithm that iteratively combines feature clustering with robust pose estimation. Feature clustering quickly partitions the scene and produces object hypotheses. The hypotheses are used to further refine the feature clusters, and the two steps iterate until convergence. ICE is easy to parallelize, and easily integrates single- and multi-camera object recognition and pose estimation. We also introduce a novel object hypothesis scoring function based on M-estimator theory, and a novel pose clustering algorithm that robustly handles recognition outliers. We achieve scalability and low latency with an improved feature matching algorithm for large databases, a GPU/CPU hybrid architecture that exploits parallelism at all levels, and an optimized resource scheduler. We provide extensive experimental results demonstrating state-of-the-art performance in terms of recognition, scalability, and latency in real-world robotic applications.
455 citations
Cites methods from "A general imaging model and a metho..."
...The techniques we consider are the generalized camera (Grossberg and Nayar 2001) and the pose averaging (Viksten et al....
TL;DR: The discrete structure from motion equations for generalized cameras is derived, and the corollaries to epipolar geometry are illustrated, which gives constraints on the optimal design of panoramic imaging systems constructed from multiple cameras.
Abstract: We illustrate how to consider a network of cameras as a single generalized camera in a framework proposed by Nayar (2001). We derive the discrete structure from motion equations for generalized cameras, and illustrate the corollaries to epipolar geometry. This formal mechanism allows one to use a network of cameras as if they were a single imaging device, even when they do not share a common center of projection. Furthermore, an analysis of structure from motion algorithms for this imaging model gives constraints on the optimal design of panoramic imaging systems constructed from multiple cameras.
323 citations
Cites background from "A general imaging model and a metho..."
...Many natural camera systems, including catadioptric systems made with conical mirrors and incorrectly aligned lenses in standard cameras, have a set of viewpoints well characterized by a caustic [25]....
[...]
...The main contribution of this paper is to express a multi-camera system in this framework and then to derive the structure from motion constraint equations for this model....
TL;DR: In this book, the authors discuss various topics in optics, such as geometrical theories, image forming instruments, and the optics of metals and crystals, including interference, interferometers, and diffraction.
Abstract: The book is comprised of 15 chapters that discuss various topics about optics, such as geometrical theories, image forming instruments, and optics of metals and crystals. The text covers the elements of the theories of interference, interferometers, and diffraction. The book tackles several behaviors of light, including its diffraction when exposed to ultrasonic waves.
TL;DR: In this paper, a two-stage technique for 3D camera calibration using TV cameras and lenses is described, aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters.
Abstract: A new technique for three-dimensional (3D) camera calibration for machine vision metrology using off-the-shelf TV cameras and lenses is described. The two-stage technique is aimed at efficient computation of camera external position and orientation relative to object reference coordinate system as well as the effective focal length, radial lens distortion, and image scanning parameters. The two-stage technique has advantage in terms of accuracy, speed, and versatility over existing state of the art. A critical review of the state of the art is given in the beginning. A theoretical framework is established, supported by comprehensive proof in five appendixes, and may pave the way for future research on 3D robotics vision. Test results using real data are described. Both accuracy and speed are reported. The experimental results are analyzed and compared with theoretical prediction. Recent effort indicates that with slight modification, the two-stage calibration can be done in real time.
TL;DR: This paper describes a sampled representation for light fields that allows for both efficient creation and display of inward and outward looking views, and describes a compression system that is able to compress the light fields generated by more than a factor of 100:1 with very little loss of fidelity.
Abstract: A number of techniques have been proposed for flying through scenes by redisplaying previously rendered or digitized views. Techniques have also been proposed for interpolating between views by warping input images, using depth information or correspondences between multiple images. In this paper, we describe a simple and robust method for generating new views from arbitrary camera positions without depth information or feature matching, simply by combining and resampling the available images. The key to this technique lies in interpreting the input images as 2D slices of a 4D function: the light field. This function completely characterizes the flow of light through unobstructed space in a static scene with fixed illumination. We describe a sampled representation for light fields that allows for both efficient creation and display of inward and outward looking views. We have created light fields from large arrays of both rendered and digitized images. The latter are acquired using a video camera mounted on a computer-controlled gantry. Once a light field has been created, new views may be constructed in real time by extracting slices in appropriate directions. Since the success of the method depends on having a high sample rate, we describe a compression system that is able to compress the light fields we have generated by more than a factor of 100:1 with very little loss of fidelity. We also address the issues of antialiasing during creation, and resampling during slice extraction. CR Categories: I.3.2 [Computer Graphics]: Picture/Image Generation — Digitizing and scanning, Viewing algorithms; I.4.2 [Computer Graphics]: Compression — Approximate methods. Additional keywords: image-based rendering, light field, holographic stereogram, vector quantization, epipolar analysis
TL;DR: Robot Vision as discussed by the authors is a broad overview of the field of computer vision, using a consistent notation based on a detailed understanding of the image formation process, which can provide a useful and current reference for professionals working in the fields of machine vision, image processing, and pattern recognition.
Abstract: From the Publisher:
This book presents a coherent approach to the fast-moving field of computer vision, using a consistent notation based on a detailed understanding of the image formation process. It covers even the most recent research and will provide a useful and current reference for professionals working in the fields of machine vision, image processing, and pattern recognition.
An outgrowth of the author's course at MIT, Robot Vision presents a solid framework for understanding existing work and planning future research. Its coverage includes a great deal of material that is important to engineers applying machine vision methods in the real world. The chapters on binary image processing, for example, help explain and suggest how to improve the many commercial devices now available. And the material on photometric stereo and the extended Gaussian image points the way to what may be the next thrust in commercialization of the results in this area.
Chapters in the first part of the book emphasize the development of simple symbolic descriptions from images, while the remaining chapters deal with methods that exploit these descriptions. The final chapter offers a detailed description of how to integrate a vision system into an overall robotics system, in this case one designed to pick parts out of a bin.
The many exercises complement and extend the material in the text, and an extensive bibliography will serve as a useful guide to current research.