
Showing papers on "Orientation (computer vision)" published in 1989


Journal ArticleDOI
TL;DR: In this paper, an edge operator based on two-dimensional spatial moments is presented, which can be implemented for virtually any size of window and has been shown to locate edges in digitized images to a twentieth of a pixel.
Abstract: Recent results in precision measurements using computer vision are presented. An edge operator based on two-dimensional spatial moments is given. The operator can be implemented for virtually any size of window and has been shown to locate edges in digitized images to a twentieth of a pixel. This accuracy is unaffected by additive or multiplicative changes to the data values. The precision is achieved by correcting for many of the deterministic errors caused by nonideal edge profiles using a lookup table to correct the original estimates of edge orientation and location. This table is generated using a synthesized edge which is located at various subpixel locations and various orientations. The operator is extended to accommodate nonideal edge profiles and rectangularly sampled pixels. The technique is applied to the measurement of imaged machined metal parts. Theoretical and experimental noise analyses show that the operator has relatively small bias in the presence of noise.

311 citations
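
The moment-based idea lends itself to a compact sketch. Below is a minimal Python illustration, assuming a 1-D scan across the edge and a clean step profile; it uses the classic three-moment construction (in the spirit of Tabatabai and Mitchell) rather than the paper's full 2-D operator with its lookup-table correction, and the orientation helper is likewise only a rough stand-in.

```python
import numpy as np

def edge_1d_subpixel(samples):
    # First three sample moments of the intensities along the scan.
    x = np.asarray(samples, dtype=float)
    n = x.size
    m1 = x.mean()
    m2 = (x ** 2).mean()
    m3 = (x ** 3).mean()
    sigma = np.sqrt(max(m2 - m1 ** 2, 1e-12))
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3) / sigma ** 3
    # Fraction of samples on the darker side of an ideal step edge.
    p1 = 0.5 * (1.0 + skew * np.sqrt(1.0 / (4.0 + skew ** 2)))
    return n * p1  # edge position, in samples from the window start

def edge_orientation(window):
    # For a step edge, the intensity centroid of a window shifts toward
    # the brighter side, along the edge normal.
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = window.sum()
    cx = (xs * window).sum() / total - (w - 1) / 2.0
    cy = (ys * window).sum() / total - (h - 1) / 2.0
    return np.arctan2(cy, cx)  # direction of the edge normal, radians

# A step edge with one mixed pixel: true transition near sample 6.5.
scan = np.r_[np.full(6, 10.0), 30.0, np.full(5, 50.0)]
print(edge_1d_subpixel(scan))  # ~6.5
```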


Journal ArticleDOI
01 Oct 1989
TL;DR: In this article, a vision module is used to guide an eye-in-hand robot through general servoing and tracking problems using off-the-shelf image processing equipment.
Abstract: The authors present a vision module which is able to guide an eye-in-hand robot through general servoing and tracking problems using off-the-shelf image-processing equipment. The vision module uses the location of binary image features from a camera on the robot's end-effector to control the position and one degree of orientation of the robot manipulator. A unique feature-based trajectory generator provides smooth motion between the actual image features and the desired image features even with asynchronous and discontinuous vision updates. By performing the trajectory generation in image feature space, image-processing constraints such as the feature extraction time can be accounted for when determining the appropriate segmentation and acceleration times of the trajectory. Experimental results of a PUMA robot tracking objects with vision feedback are discussed.

306 citations
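
The feature-space trajectory idea can be sketched briefly. The fragment below, a hedged illustration rather than the authors' generator, interpolates from the current to the desired image-feature vector with a trapezoidal velocity profile; regenerating the plan whenever an asynchronous vision update arrives gives the smooth-motion behavior described. All names and timing values are illustrative.

```python
import numpy as np

def trapezoid_profile(t, t_total, t_accel):
    # Normalized position s(t) in [0, 1] for a trapezoidal velocity
    # profile: accelerate for t_accel, cruise, decelerate for t_accel.
    v = 1.0 / (t_total - t_accel)  # peak (cruise) velocity
    s = np.where(
        t < t_accel,
        0.5 * v * t ** 2 / t_accel,
        np.where(t <= t_total - t_accel,
                 v * (t - 0.5 * t_accel),
                 1.0 - 0.5 * v * (t_total - t) ** 2 / t_accel),
    )
    return np.clip(s, 0.0, 1.0)

def feature_trajectory(f_now, f_goal, t_total=1.0, t_accel=0.2, dt=0.01):
    # Setpoints in image-feature space; a new call replaces the plan
    # whenever a (possibly late or discontinuous) vision update arrives.
    f_now = np.asarray(f_now, float)
    f_goal = np.asarray(f_goal, float)
    ts = np.arange(0.0, t_total + dt, dt)
    s = trapezoid_profile(ts, t_total, t_accel)
    return ts, f_now + s[:, None] * (f_goal - f_now)

ts, plan = feature_trajectory([120.0, 80.0, 0.3], [160.0, 95.0, 0.5])
print(plan[0], plan[-1])  # starts at f_now, ends at f_goal
```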


Journal ArticleDOI
TL;DR: The contribution of this work is to quantify and justify the functional relationships between image features and filter parameters so that the design process can be easily modified for different conditions of noise and scale.

284 citations


Journal ArticleDOI
W.T. Miller1
01 Jul 1989
TL;DR: A practical neural network-based learning control system is described that is applicable to complex robotic systems involving multiple feedback sensors and multiple command variables.
Abstract: A practical neural network-based learning control system is described that is applicable to complex robotic systems involving multiple feedback sensors and multiple command variables. In the controller, one network is used to learn to reproduce the nonlinear relationship between the sensor outputs and the system command variables over particular regions of the system state space. The learned information is used to predict the command signals required to produce desired changes in the sensor outputs. A second network is used to learn to reproduce the nonlinear relationship between the system command variables and the changes in the video sensor outputs. The learned information from this network is then used to predict the next set of video parameters, effectively compensating for the image processing delays. The results of learning experiments using a General Electric P-5 manipulator are presented. These experiments involved control of the position and orientation of an object in the field of view of a video camera mounted on the end of the robot arm, using moving objects with arbitrary orientation relative to the robot. No a priori knowledge of the robot kinematics or of the object speed or orientation relative to the robot was assumed. Image parameter uncertainty and control system tracking error in the video image were found to converge to low values within a few trials.

262 citations
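
A toy version of the two-network scheme can be written in a few lines. The sketch below substitutes simple linear models trained by normalized LMS updates for the paper's learned networks, and a random linear plant for the robot/camera loop; all names and the plant itself are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A_plant = rng.normal(size=(3, 3))  # unknown plant: sensor change = A @ command

inv_model = np.zeros((3, 3))  # network 1: command from desired sensor change
fwd_model = np.zeros((3, 3))  # network 2: predicted sensor change from command
lr = 0.5

for trial in range(2000):
    ds_desired = rng.normal(size=3)
    # Predict the command, with a little exploration noise.
    u = inv_model @ ds_desired + 0.1 * rng.normal(size=3)
    ds = A_plant @ u  # observed sensor change
    # Normalized LMS updates for both models from the observed pair.
    inv_model += lr * np.outer(u - inv_model @ ds, ds) / (ds @ ds + 1e-6)
    fwd_model += lr * np.outer(ds - fwd_model @ u, u) / (u @ u + 1e-6)

# After learning, the inverse model should approximately invert the plant.
print(np.round(inv_model @ A_plant, 2))  # ~ identity
```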


Journal ArticleDOI
TL;DR: The authors describe a hybrid approach to the problem of image segmentation in range data analysis, where hybrid refers to a combination of both region- and edge-based considerations.
Abstract: The authors describe a hybrid approach to the problem of image segmentation in range data analysis, where hybrid refers to a combination of both region- and edge-based considerations. The range image of 3-D objects is divided into surface primitives which are homogeneous in their intrinsic differential geometric properties and do not contain discontinuities in either depth or surface orientation. The method is based on the computation of partial derivatives, obtained by a selective local biquadratic surface fit. Then, by computing the Gaussian and mean curvatures, an initial region-based segmentation is obtained in the form of a curvature sign map. Two additional initial edge-based segmentations are also computed from the partial derivatives and depth values, namely, jump and roof-edge maps. The three image maps are then combined to produce the final segmentation. Experimental results obtained for both synthetic and real range data of polyhedral and curved objects are given.

257 citations
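
The curvature-sign computation reduces to a few lines once partial derivatives are available. This sketch is a simplification that uses finite differences in place of the paper's selective biquadratic fits; it computes mean and Gaussian curvature of a range image and thresholds their signs.

```python
import numpy as np

def curvature_signs(Z, eps=1e-4):
    # Partial derivatives of the range image z = Z[y, x] (the paper
    # instead estimates these with a selective local biquadratic fit).
    Zy, Zx = np.gradient(Z)
    Zxy, Zxx = np.gradient(Zx)
    Zyy, _ = np.gradient(Zy)
    g = 1.0 + Zx ** 2 + Zy ** 2
    K = (Zxx * Zyy - Zxy ** 2) / g ** 2                      # Gaussian
    H = ((1 + Zy ** 2) * Zxx - 2 * Zx * Zy * Zxy
         + (1 + Zx ** 2) * Zyy) / (2 * g ** 1.5)             # mean
    # Sign maps with a dead zone around zero; the (sign H, sign K)
    # pairs index the eight basic surface types (peak, pit, ridge, ...).
    sK = np.where(K > eps, 1, np.where(K < -eps, -1, 0))
    sH = np.where(H > eps, 1, np.where(H < -eps, -1, 0))
    return sH, sK

# Example: a spherical cap should give K > 0 on its interior.
y, x = np.mgrid[-20:21, -20:21].astype(float)
Z = np.sqrt(np.clip(30.0 ** 2 - x ** 2 - y ** 2, 0, None))
sH, sK = curvature_signs(Z)
print(sK[15:26, 15:26].min())  # 1 in the cap's interior
```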


Proceedings ArticleDOI
14 May 1989
TL;DR: The authors are prototyping a legged vehicle, the Ambler, for an exploratory mission on another planet, conceivably Mars, where it is to traverse uncharted areas and collect material samples and present an algorithm for constructing an elevation map from a single range image.
Abstract: The authors are prototyping a legged vehicle, the Ambler, for an exploratory mission on another planet, conceivably Mars, where it is to traverse uncharted areas and collect material samples. They describe how the rover can construct from range imagery a geometric terrain representation, i.e., an elevation map that includes uncertainty, unknown areas, and local features. First, they present an algorithm for constructing an elevation map from a single range image. By virtue of working in spherical-polar space, the algorithm is independent of the desired map resolution and the orientation of the sensor, unlike algorithms that work in Cartesian space. Secondly, the authors present a two-stage matching technique (feature matching followed by iconic matching) that identifies the transformation T corresponding to the vehicle displacement between two viewing positions. Thirdly, to support legged locomotion over rough terrain, they describe methods for evaluating regions of the constructed elevation maps as footholds.

231 citations
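
The map-building step can be illustrated compactly. A hedged sketch, assuming the range image comes on an azimuth/elevation grid and that a simple highest-point-per-cell rule suffices; the Ambler mapper also propagates uncertainty and handles resolution independence, which is only hinted at here with NaNs for unknown cells.

```python
import numpy as np

def elevation_map(r, az, el, res=0.25, half_extent=10.0):
    # Spherical-polar range samples -> Cartesian points.
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    n = int(2 * half_extent / res)
    grid = np.full((n, n), np.nan)        # NaN = unknown (never observed)
    ix = ((x + half_extent) / res).astype(int)
    iy = ((y + half_extent) / res).astype(int)
    ok = (ix >= 0) & (ix < n) & (iy >= 0) & (iy < n)
    for i, j, h in zip(ix[ok].ravel(), iy[ok].ravel(), z[ok].ravel()):
        if np.isnan(grid[j, i]) or h > grid[j, i]:
            grid[j, i] = h                # keep the highest hit per cell
    return grid

# Synthetic test: a sensor looking down at flat ground 2 m below.
az, el = np.meshgrid(np.linspace(-0.5, 0.5, 64), np.linspace(-0.4, -0.1, 64))
r = -2.0 / np.sin(el)                     # rays hit the plane z = -2
m = elevation_map(r, az, el)
print(np.nanmin(m), np.nanmax(m))         # both ~ -2.0 for flat ground
```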


Journal ArticleDOI
TL;DR: The purpose of the frameless stereotaxic operating microscope is to display computed tomography or other image data in the operating microscope in the correct scale, orientation, and position without the use of a stereOTaxic frame.
Abstract: The purpose of the frameless stereotaxic operating microscope is to display computed tomography (CT) or other image data in the operating microscope in the correct scale, orientation, and position without the use of a stereotaxic frame. A nonimaging ultrasonic rangefinder allows the position of the operating microscope and the position of the patient to be determined. Discrete fiducial points on the patient's external anatomy are located in both image space and operating room space, linking the image data and the operating room. Physician-selected image information, e.g. tumor contours or guidance to predetermined targets, is projected through the optics of the operating microscope using a miniature cathode ray tube and a beam splitter. Projected images, reconstructed from the image data to match the focal plane of the operating microscope, are superposed on the surgical field. The algorithms on which the system is based are described, and the sources and effects of errors are discussed. The system's performance is simulated, providing an estimate of accuracy. Two phantoms are used to measure accuracy experimentally. Clinical results and observations are given.

205 citations
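
The fiducial step, linking image space and operating-room space, is essentially point-based rigid registration. The sketch below uses the standard SVD (Arun/Horn style) least-squares solution, given here as a generic stand-in since the article's own algorithm is not reproduced.

```python
import numpy as np

def rigid_register(P, Q):
    # Find R, t minimizing ||(R @ P_i + t) - Q_i||^2 over matched
    # fiducials P (image space) and Q (operating-room space), n x 3.
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                   # proper rotation (no reflection)
    t = cq - R @ cp
    return R, t

# Check on synthetic fiducials with a known pose.
rng = np.random.default_rng(1)
P = rng.normal(size=(5, 3))
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([10.0, -4.0, 2.0])
R, t = rigid_register(P, Q)
print(np.allclose(R, R_true), np.round(t, 3))
```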


Journal ArticleDOI
TL;DR: A quantitative evaluation shows that the edge detector, developed to be robust enough to perform well over a wide range of signal-to-noise ratios, performs at least as well as, and in most cases much better than, other edge detectors.
Abstract: An edge detection scheme is developed robust enough to perform well over a wide range of signal-to-noise ratios. It is based upon the detection of zero crossings in the output image of a nonlinear Laplace filter. Specific characterizations of the nonlinear Laplacian are its adaptive orientation to the direction of the gradient and its inherent masks which permit the development of approximately circular (isotropic) filters. We have investigated the relation between the locally optimal filter parameters, smoothing size, and filter size, and the SNR of the image to be processed. A quantitative evaluation shows that our edge detector performs at least as well—and in most cases much better—than other edge detectors. At very low signal-to-noise ratios, our edge detector is superior to all others tested.

188 citations
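
The nonlinear Laplacian described, oriented adaptively along the gradient, is close to what is elsewhere called the second derivative in the gradient direction. A small sketch under that reading, with plain finite differences and a separable Gaussian pre-smoother; the filter sizes here are arbitrary, not the paper's locally optimized ones.

```python
import numpy as np

def smooth(img, sigma):
    r = int(3 * sigma)
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(np.convolve, 0, img, k, 'same')
    return np.apply_along_axis(np.convolve, 1, out, k, 'same')

def sdgd(img, sigma=2.0):
    # Second derivative along the local gradient direction.
    f = smooth(img.astype(float), sigma)
    fy, fx = np.gradient(f)
    fxy, fxx = np.gradient(fx)
    fyy, _ = np.gradient(fy)
    g2 = fx ** 2 + fy ** 2
    return (fx ** 2 * fxx + 2 * fx * fy * fxy + fy ** 2 * fyy) / (g2 + 1e-12)

def zero_crossings(L):
    # Mark pixels where the operator output changes sign horizontally
    # or vertically; these are the candidate edge locations.
    zc = np.zeros_like(L, dtype=bool)
    zc[:, 1:] |= (L[:, :-1] * L[:, 1:]) < 0
    zc[1:, :] |= (L[:-1, :] * L[1:, :]) < 0
    return zc

img = np.zeros((32, 32))
img[:, 16:] = 1.0
print(zero_crossings(sdgd(img))[16, 14:19])  # True near the step
```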


Journal ArticleDOI
TL;DR: The results of the experiment indicate that several assumptions of certain formal models for perception of shape from shading are not psychologically valid, and an alternative approach offered by Koenderink and van Doorn may be more psychologically accurate, as it avoids all three assumptions.
Abstract: Observers judged the slants and tilts of numerous regions within shaded images of ellipsoid surfaces that varied in shape, orientation, surface reflectance, and direction of illumination. The perceived three-dimensional structure of each surface was calculated from these judgments. Much of the error in observers' responses resulted from a tendency to perceive surfaces whose axes were aligned with the display screen. The presence of specular highlights or cast shadows, in contrast, had no effect on performance. The results of the experiment indicate that several assumptions of certain formal models for perception of shape from shading are not psychologically valid. The most notable of these assumptions are that the visual system initially assumes that all surfaces have Lambertian reflectance and that illuminant direction must be known before shape detection can proceed. These assumptions are often accompanied by a third assumption that surface orientation is detected locally, and that global shape is determined by smoothing over local surface orientation estimates. The present experiment indicates that an alternative approach offered by Koenderink and van Doorn may be more psychologically accurate, as it avoids all three assumptions.

174 citations


Proceedings ArticleDOI
14 May 1989
TL;DR: An algorithm to search a tree of interpretations efficiently to determine the solution pose(s) is developed, taking into account errors in the landmark directions extracted by image processing.
Abstract: Following and extending the approach of K. Sugihara (1988), the author assumes that a mobile robot is equipped with a single camera and a map marking the positions of landmarks in its environment. The robot moves on a flat surface, acquires one image, extracts vertical edges from it, and computes the directions to visible landmarks. The problem is to determine the robot's position and orientation (pose) by establishing the correspondence between landmark directions and points in the map. This approach capitalizes on the excellent angular resolution of standard CCD cameras, while avoiding the feature-correspondence and 3D reconstruction problems. The problem is formulated as a search in a tree of interpretations (pairings of landmark directions and landmark points), and an algorithm to search the tree efficiently to determine the solution pose(s) is developed, taking into account errors in the landmark directions extracted by image processing. Quantitative results from simulations and experiments with real imagery are presented.

140 citations
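
Once the tree search has produced a candidate pairing of bearings to map points, the pose itself can be estimated numerically. Here is a Gauss-Newton sketch for that final step only; the interpretation-tree search and the paper's error modeling are not reproduced, and the names and initial guess are illustrative.

```python
import numpy as np

def wrap(a):
    return (a + np.pi) % (2 * np.pi) - np.pi

def pose_from_bearings(landmarks, bearings, guess, iters=20):
    # Solve for (x, y, theta) such that the map direction to each
    # landmark matches its measured bearing relative to the heading.
    x, y, th = guess
    L = np.asarray(landmarks, float)
    b = np.asarray(bearings, float)
    for _ in range(iters):
        dx, dy = L[:, 0] - x, L[:, 1] - y
        d2 = dx ** 2 + dy ** 2
        r = wrap(np.arctan2(dy, dx) - th - b)        # residuals
        J = np.column_stack([dy / d2, -dx / d2, -np.ones_like(d2)])
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x, y, th = x + step[0], y + step[1], th + step[2]
    return x, y, th

# Simulated check: three known landmarks seen from pose (2, 1, 0.4).
lm = np.array([[5.0, 9.0], [-4.0, 6.0], [8.0, -3.0]])
true = np.array([2.0, 1.0, 0.4])
bear = wrap(np.arctan2(lm[:, 1] - true[1], lm[:, 0] - true[0]) - true[2])
print(np.round(pose_from_bearings(lm, bear, (0.0, 0.0, 0.0)), 3))
```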


Journal ArticleDOI
TL;DR: In this paper, it is argued that our visual knowledge of smoothly curved surfaces can also be defined in terms of local, non-metric order relations, and that relative depth judgments between any two surface regions should be dramatically influenced by monotonicity of depth change along the intervening portions of the surface by which they are separated.
Abstract: In theoretical analyses of visual form perception, it is often assumed that the 3-dimensional structures of smoothly curved surfaces are perceptually represented as point-by-point mappings of metric depth and/or orientation relative to the observer. This article describes an alternative theory in which it is argued that our visual knowledge of smoothly curved surfaces can also be defined in terms of local, nonmetric order relations. A fundamental prediction of this analysis is that relative depth judgments between any two surface regions should be dramatically influenced by monotonicity of depth change (or lack of it) along the intervening portions of the surface by which they are separated. This prediction is confirmed in a series of experiments using surfaces depicted with either shading or texture. Additional experiments are reported, moreover, that demonstrate that smooth occlusion contours are a primary source of information about the ordinal structure of a surface and that the depth extrema in between contours can be optically specified by differences in luminance at the points of occlusion.

Proceedings ArticleDOI
01 Dec 1989
TL;DR: The authors present an approach for explicitly relating the shape of image contours to models of curved three-dimensional objects, a relationship used for object recognition and positioning; the approach readily extends to parameterized models.
Abstract: The authors present an approach for explicitly relating the shape of image contours to models of curved three-dimensional objects. This relationship is used for object recognition and positioning. Object models consist of collections of parametric surface patches and their intersection curves; this includes nearly all representations used in computer-aided geometric design and computer vision. The image contours considered are the projections of surface discontinuities and occluding contours. Elimination theory provides a method for constructing the implicit equation of the image contours of an object observed under orthographic or perspective projection. This equation is parameterized by the object's position and orientation with respect to the observer. Determining these parameters is reduced to a fitting problem between the theoretical contour and the observed data points. The approach readily extends to parameterized models. It has been implemented for a simple world composed of various surfaces of revolution and successfully tested on several real images.

Patent
Richard G. Casey1, David R. Ferguson1
02 Oct 1989
TL;DR: In this paper, a computer-implemented method is proposed for extracting character data from printed forms: a blank master form is scanned to produce an image consisting only of its lines, which can then be displayed, and masks are created over it, each mask corresponding to a field where data would be located in a filled-in form.
Abstract: A computer-implemented method operable with conventional OCR scanning equipment and software, extracts character data from printed forms. A blank master form is scanned and its digital image stored. Clusters of ON bits of the master form image are first recognized as part of a line and then connected to form lines. All of the lines in the master form image are then identified by row and column start position and column end position, thereby creating a master-form-description. The resulting image, which consists only of lines in the master form, can then be displayed. Regions or masks in the displayed image of master form lines are then created, each mask corresponding to a field where data would be located in a filled-in form. Each data mask is spaced from nearby lines by a predetermined data margin, referred to as D. A filled-in or data form is then scanned and lines are also recognized and identified in a similar manner to create a data-form-description. The data-form-description is compared with the master-form-description by computing the horizontal and vertical offsets and skew of the two forms relative to one another. The created data masks, whose orientation with respect to the master form has been previously determined, are then transposed into the data form image using the computed values of horizontal and vertical offsets and skew. In this manner, the data masks are correctly located on the data form so that the actual data values in the data form reside within the corresponding data masks. Routines are then implemented for detecting extraneous data intruding into the data masks and for growing the masks, i.e. enlarging the masks to capture data which may extend beyond the perimeter of the masks. Thus, the data masks are adaptive in that they are grown if data does not lie entirely within the perimeter of the masks. During the mask growth routine, lines which are part of the background form are detected and removed by line removal algorithms. Following the removal of extraneous data from the masks, the growth of the masks to capture data, and any subsequent line removal, the remaining data from the masks is extracted and transferred to a new file. The new file then contains only data comprising characters of the data values in the desired regions, which can then be operated on by conventional OCR software to identify the specific character values.
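
The transposition of the data masks can be illustrated with a small helper. This is a sketch under the assumption that the offsets and skew have already been computed from the matched line descriptions; the patent's estimation, mask-growing, and line-removal steps are not shown.

```python
import numpy as np

def transpose_masks(masks, dx, dy, skew_rad):
    # Map master-form mask rectangles (x0, y0, x1, y1) into data-form
    # coordinates by rotating through the small skew angle and shifting
    # by the horizontal/vertical offsets, then re-taking bounding boxes.
    c, s = np.cos(skew_rad), np.sin(skew_rad)
    R = np.array([[c, -s], [s, c]])
    out = []
    for x0, y0, x1, y1 in masks:
        corners = np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], float)
        moved = corners @ R.T + np.array([dx, dy])
        out.append(tuple(np.concatenate([moved.min(0), moved.max(0)])))
    return out

print(transpose_masks([(100, 40, 220, 60)], dx=3.5, dy=-2.0, skew_rad=0.01))
```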

Proceedings ArticleDOI
14 Nov 1989
TL;DR: A novel closed-form solution for the 3D position and orientation of a circular feature and the 3D position of a spherical feature is described.
Abstract: Mathematics is presented for using monocular model-based vision to find 3D positions of circular and spherical model features, and, for the circular case, orientations as well. Monocular model-based vision here refers to the use of a single projective image of modeled objects to solve for the 3D positions and orientations of the objects in the scene. A novel closed-form solution for the 3D position and orientation of a circular feature and the 3D position of a spherical feature is described. There are two general solutions for the circular feature, but there is only one solution when the surface normal of the circular feature passes through the center of projection. There is only one solution for the spherical case. Advantages of this method are: (1) it handles spherical as well as circular features; (2) it has a closed-form solution; (3) it gives only the necessary number of solutions (no redundant solutions); (4) it uses simple mathematics involving 3D analytic geometry; and (5) it is geometrically intuitive.
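
For the spherical case the flavor of the solution is easy to convey. The sketch below is a small-angle approximation, valid near the optical axis where the silhouette is nearly circular; the paper's closed form handles the exact elliptical silhouette, which is not reproduced here.

```python
import numpy as np

def sphere_center(u, v, r_img, R, f):
    # (u, v): image center of the silhouette, in pixels from the
    # principal point; r_img: its radius in pixels; R: known sphere
    # radius; f: focal length in pixels.
    alpha = np.arctan2(r_img, f)     # angular radius of the silhouette
    dist = R / np.sin(alpha)         # range to the sphere's center
    ray = np.array([u, v, f], float)
    return dist * ray / np.linalg.norm(ray)

# A 5 cm sphere imaged as a 40 px circle at f = 800 px, on-axis:
print(np.round(sphere_center(0.0, 0.0, 40.0, 0.05, 800.0), 4))  # ~1 m away
```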

Journal ArticleDOI
TL;DR: The authors outline design criteria for the extraction of orientation and velocity information, and present a variety of tools useful in the construction of simple linear filters using a hierarchical parallel-processing scheme.
Abstract: As a step towards the early measurement of visual primitives, the authors outline design criteria for the extraction of orientation and velocity information, and present a variety of tools useful in the construction of simple linear filters. A hierarchical parallel-processing scheme is used in which nodes compute a weighted sum of inputs from within a small spatio-temporal neighborhood. The resulting scheme is easily analyzed and provides mechanisms sensitive to narrow ranges of both image velocity and orientation. The hierarchical approach in combination with separability in the first levels yields an efficient implementation.
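
A concrete instance of such a linear mechanism is a space-time Gabor-like kernel: a weighted sum over a small spatio-temporal neighborhood tuned to one orientation and one velocity. The sketch below builds such a kernel; parameter names and values are illustrative, not the authors' filters.

```python
import numpy as np

def st_kernel(size=15, frames=9, theta=0.0, sf=0.15, tf=0.1,
              sigma=3.0, tau=2.0):
    # Drifting-grating carrier under a Gaussian envelope; the kernel
    # responds best to structure at orientation `theta` moving with
    # speed ~ tf / sf pixels per frame along the edge normal.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    kern = np.empty((frames, size, size))
    for t in range(frames):
        tc = t - frames // 2
        kern[t] = (env * np.exp(-tc ** 2 / (2 * tau ** 2))
                   * np.cos(2 * np.pi * (sf * xr - tf * tc)))
    return kern

k = st_kernel()
print(k.shape)  # (9, 15, 15): one small spatio-temporal neighborhood
```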

Patent
01 Sep 1989
TL;DR: In this paper, an ultrasonic imaging subsystem is provided for producing signals representative of two-dimensional images of sections of the body, the subsystem including a scanning transducer that is moveable to determine the section of the body to be imaged; the image-representative signals are stored as arrays of digital pixel values.
Abstract: The disclosure is directed to an apparatus and method for producing a three-dimensional image representation of a body. The body can be animate or inanimate; i.e. the invention can be used to obtain 3D image representations of for example, parts of the human body or any object that one wishes to view or measure. The 3D image representations can be used to produce 2D displays of sections, contours, etc., or displays having 3D perspective, such as wire-frame type illustrations. This facilitates automatic computation of areas or volumes. In a disclosed embodiment, an ultrasonic imaging subsystem is provided for producing signals representative of two-dimensional images of sections of the body, the subsystem including a scanning transducer that is moveable to determine the section of the body to be imaged. The image representative signals are stored as arrays of digital pixel values. A three-dimensional acoustic digitizer subsystem is provided for deriving and storing information representative of the position and orientation of the transducer during the scanning of an associated section of the body. A voxel space storage is provided for storing a three-dimensional array of voxel values. The arrays of digital pixel values are projected into the voxel space storage, the voxel locations which correspond to projected pixel locations of a given pixel array being determined as a function of the stored position and orientation information associated with the section of the body from which the given pixel array was obtained.
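
The projection of pixel arrays into voxel space reduces to a pose transform plus rounding. A hedged sketch with made-up names, assuming the digitizer supplies each slice's origin and in-plane axes (already scaled to world units) for the scanned section:

```python
import numpy as np

def splat_slice(pixels, origin, row_dir, col_dir, vox, vox_size):
    # World position of pixel (r, c): origin + r*row_dir + c*col_dir.
    rows, cols = np.mgrid[0:pixels.shape[0], 0:pixels.shape[1]]
    pts = origin + rows[..., None] * row_dir + cols[..., None] * col_dir
    idx = np.round(pts / vox_size).astype(int)
    ok = np.all((idx >= 0) & (idx < np.array(vox.shape)), axis=-1)
    vox[idx[ok, 0], idx[ok, 1], idx[ok, 2]] = pixels[ok]
    return vox

vox = np.zeros((64, 64, 64))
slice_img = np.random.default_rng(0).random((100, 120))
splat_slice(slice_img,
            origin=np.array([5.0, 5.0, 20.0]),
            row_dir=np.array([0.25, 0.0, 0.0]),   # 0.25 mm per pixel row
            col_dir=np.array([0.0, 0.25, 0.0]),
            vox=vox, vox_size=1.0)
print(np.count_nonzero(vox))
```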

Proceedings ArticleDOI
14 May 1989
TL;DR: In this article, a resolved-motion-rate control scheme is used to update the robot's pose on the basis of the position of three features in the camera's image; the selection of these features depends on the condition and sensitivity of the differential relationship between the image features and the control parameters.
Abstract: The authors investigate the selection of image features to be used to control visually the position and orientation (pose) of the end-effector of an eye-in-hand robot relative to a workpiece. A resolved-motion-rate control scheme is used to update the robot's pose on the basis of the position of three features in the camera's image. The selection of these three features depends on the condition and sensitivity of the differential relationship between the image features and the control parameters. Both computer simulations and laboratory experiments on a PUMA robot arm were conducted to verify the performance of the feature selection criteria. Experimentally, the PUMA robot arm, with a CCD (charge-coupled device) camera mounted on its end effector, was able to track a randomly moving carburetor gasket with a visual feedback cycle time of 70 ms.
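
The selection criterion can be illustrated directly: stack each candidate triple's feature sensitivities into an image Jacobian and keep the best-conditioned one. The shapes below assume a hypothetical 3-DOF task; only the condition-number idea comes from the abstract.

```python
import numpy as np
from itertools import combinations

def select_features(feature_jacobians):
    # feature_jacobians[i]: 2x3 sensitivity of image feature i's (u, v)
    # position to the three controlled pose parameters.
    best, best_cond = None, np.inf
    for trio in combinations(range(len(feature_jacobians)), 3):
        J = np.vstack([feature_jacobians[i] for i in trio])  # 6x3
        c = np.linalg.cond(J)
        if c < best_cond:
            best, best_cond = trio, c
    return best, best_cond

rng = np.random.default_rng(2)
jacs = [rng.normal(size=(2, 3)) for _ in range(6)]
print(select_features(jacs))
```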

Journal ArticleDOI
TL;DR: An algorithm to recover three-dimensional shape, i.e., surface orientation and relative depth from a single segmented image, is presented and a variational formulation of line drawing and shading constraints in a common framework is developed.
Abstract: An algorithm to recover three-dimensional shape, i.e., surface orientation and relative depth from a single segmented image, is presented. It is assumed that the scene is composed of opaque regular solid objects bounded by piecewise smooth surfaces with no markings or textures. It is also assumed that the reflectance map R(n) is known. For the canonical case of Lambertian surfaces illuminated by a point light source, this implies knowing the light-source direction. A variational formulation of line drawing and shading constraints in a common framework is developed. The global constraints are partitioned into constraint sets corresponding to the faces, edges and vertices in the scene. For a face, the constraints are given by Horn's image irradiance equation. A variational formulation of the constraints at an edge, derived both from the known direction of the image curve corresponding to the edge and from shading, is developed. At a vertex, the constraints are modeled by a system of nonlinear equations. An algorithm is presented to solve this system of constraints.

Journal ArticleDOI
TL;DR: A previously developed model of texture segmentation using multiple spatially and spectrally localized filters, known as Gabor filters, is applied to the analysis of textures composed of elementary image features known as textons.

Journal ArticleDOI
TL;DR: An algorithm is described for calculating the biovolume of cells with simple shapes, such as bacteria, flagellates, and simple ciliates, from a 2-dimensional digital image; it accounts for irregularities in cell shape that conventional methods based on length, width, and geometrical formulas do not.
Abstract: This paper describes an algorithm for calculating the biovolume of cells with simple shapes, such as bacteria, flagellates, and simple ciliates, from a 2-dimensional digital image. The method can be adapted to any image analysis system which allows access to the binary cell image, i.e., the pixels, or (x, y) points, composing the cell. The cell image is rotated to a standard orientation (horizontal), and a solid of revolution is calculated by digital integration. Verification and a critical assessment of the method are presented. The algorithm accounts for irregularities in cell shape that conventional methods based on length, width, and geometrical formulas do not.
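
The digital-integration step is short enough to show. A sketch assuming the binary image has already been rotated so the cell's long axis is horizontal; each image column then contributes one disc to the solid of revolution. The pixel size is a made-up value.

```python
import numpy as np

def biovolume(binary, pixel_um=0.2):
    # Column width in microns = (ON pixels in the column) * pixel size;
    # treat each column as a disc of that diameter and sum disc volumes.
    widths_um = binary.sum(axis=0).astype(float) * pixel_um
    return float(np.sum(np.pi * (widths_um / 2.0) ** 2) * pixel_um)

# Sanity check with a digital circle (a sphere once revolved):
r_px = 40
y, x = np.mgrid[-50:51, -50:51]
cell = (x ** 2 + y ** 2 <= r_px ** 2)
true = 4.0 / 3.0 * np.pi * (r_px * 0.2) ** 3
print(biovolume(cell), true)   # close to the analytic sphere volume
```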

Patent
27 Feb 1989
TL;DR: In this paper, an edge enhancement is done to obtain a first and a second threshold level which are dilated to eliminate noise edges, and an edge correlation is then performed to determine orientation of the article.
Abstract: A method for automatically inspecting articles or work pieces uses a solid state television camera to obtain images of the article. An edge enhancement is done to obtain a first and a second threshold level which are dilated to eliminate noise edges. An edge correlation is then performed to determine orientation of the article. Gray scale or gray level imaging of the surface of the device is then performed and compared against a known device surface in order to determine acceptability of the device under inspection. The gray scale imaging techniques used are robust, tolerant of significant variations in image luminance and of image degradation, and capable of identifying and classifying image features consistently in the presence of such luminance variations. The data is then statistically characterized and the results used for real time statistical process control of the work piece process.

Proceedings ArticleDOI
15 Aug 1989
TL;DR: A model for the perception of distortions in pictures is suggested that consists of an adaptive input stage realized as a ROG (Ratio of Gaussian) pyramid, also suited for applications in image coding and computer vision, and a further decomposition by orientation-selective filters, with a saturating nonlinearity acting at each point of the filter outputs.
Abstract: A model for the perception of distortions in pictures is suggested. It consists of two main parts: an adaptive input stage realized as a ROG (Ratio of Gaussian) pyramid also suited for applications in image coding and computer vision, and a further decomposition by orientation selective filters including a saturating nonlinearity acting at each point of the filter outputs. The output values for each point of each filter are regarded as feature vector of the internal representation of the input picture. The difference between the internal representations of original and distorted picture is evaluated as norm of the difference vector. Due to local nonlinearities this operation explains periodic and aperiodic masking effects.

Journal ArticleDOI
TL;DR: Methods based on pivotal statistics are proposed for constructing bootstrap confidence regions for the mean direction of a random p-dimensional unit vector X with an arbitrary unimodal distribution on the p sphere; the resulting confidence pictures present the estimated posterior likelihood of mean orientation by shading spherical surfaces.
Abstract: Methods are proposed for constructing bootstrap confidence regions for the mean direction of a random p-dimensional unit vector X with an arbitrary unimodal distribution on the p sphere. The approach of this article differs from that of other authors in that it is based on pivotal statistics. A general pivotal method is introduced that produces a wide variety of confidence regions on general p-dimensional spheres; included are confidence cones and likelihood-based regions. It can readily be modified to incorporate extra assumptions about the underlying distribution, such as rotational symmetry. The general method leads to confidence pictures, which present information about the estimated posterior likelihood of mean orientation by shading spherical surfaces. An application is given to a sample of spherical cross-bed measurements. The methods extend to the case where X has random length, and to calculation of confidence regions for reference directions of axial bipolar or girdle distributions.
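
A plain nonparametric version of the idea fits in a few lines: resample the unit vectors, recompute mean directions, and read off a confidence cone. This sketch deliberately omits the article's pivotal construction, which is the actual contribution; all parameter values are illustrative.

```python
import numpy as np

def bootstrap_cone(X, n_boot=2000, alpha=0.05, seed=0):
    # X: n x p array of unit vectors. Returns the sample mean direction
    # and a cone half-angle containing (1 - alpha) of bootstrap means.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mu = X.mean(axis=0)
    mu /= np.linalg.norm(mu)
    ang = np.empty(n_boot)
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)]
        m = Xb.mean(axis=0)
        m /= np.linalg.norm(m)
        ang[b] = np.arccos(np.clip(m @ mu, -1.0, 1.0))
    return mu, np.quantile(ang, 1.0 - alpha)

# Vectors clustered around the z-axis on the 2-sphere:
rng = np.random.default_rng(3)
V = rng.normal(size=(100, 3)) * 0.2 + np.array([0.0, 0.0, 1.0])
V /= np.linalg.norm(V, axis=1, keepdims=True)
mu, half_angle = bootstrap_cone(V)
print(np.round(mu, 3), np.degrees(half_angle))
```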

Journal ArticleDOI
TL;DR: This paper deals with the problem of locating a rigid object and estimating its motion in three dimensions by determining the position and orientation of the object at each instant when an image is captured by a camera, and recovering the motion of theobject between consecutive frames.
Abstract: This paper deals with the problem of locating a rigid object and estimating its motion in three dimensions. This involves determining the position and orientation of the object at each instant when an image is captured by a camera, and recovering the motion of the object between consecutive frames. In the implementation scheme used here, a sequence of camera images, digitized at the sample instants, is used as the initial input data. Measurements are made of the locations of certain features (e.g., maximum curvature points of an image contour, corners, edges, etc.) on the 2-D images. To measure the feature locations a matching algorithm is used, which produces correspondences between the features in the image and the object. Using the measured feature locations on the image, an algorithm is developed to solve the location and motion problem. The algorithm is an extended Kalman filter modeled for this application.
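
The abstract names an extended Kalman filter as the estimator; one generic predict/update cycle looks as follows. The motion and measurement models f, h and their Jacobians F, H are placeholders, since the paper's specific state parameterization is not reproduced here.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    # Predict the state through the motion model...
    x_pred = f(x)
    Fk = F(x)
    P_pred = Fk @ P @ Fk.T + Q
    # ...then correct it with the measured 2-D feature locations z.
    Hk = H(x_pred)
    S = Hk @ P_pred @ Hk.T + R
    K = P_pred @ Hk.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(x.size) - K @ Hk) @ P_pred
    return x_new, P_new
```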

Journal ArticleDOI
TL;DR: Two simple methods are given for obtaining the surface shape using a projected grid, which appear to be superior to methods based on shape-from-shading because the results are comparable, yet the equipment setup is simpler and the processing is not very sensitive to object reflectance.
Abstract: Two simple methods are given for obtaining the surface shape using a projected grid. After the camera is calibrated to the 3-D workspace, the only input data needed for the computation of surface normals are grid intersect points in a single 2-D image. The first method performs nonlinear computations based on the distortion of the lengths of the grid edges and does not require a full calibration matrix. The second method requires that a full parallel projection model of the imaging is available, which enables it to compute 3-D normals using simple linear computations. The linear method performed better overall in the experiments, but both methods produced normals within 4-8 degrees of known 3-D directions. These methods appear to be superior to methods based on shape-from-shading because the results are comparable, yet the equipment setup is simpler and the processing is not very sensitive to object reflectance.
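
As a point of comparison for the grid methods, here is the standard structured-light computation: intersect the camera ray through each grid point with the calibrated plane of the corresponding projected grid line, then take normals from neighboring 3-D points. This is a generic sketch with a hypothetical plane calibration, not either of the paper's two methods.

```python
import numpy as np

def ray_plane_point(u, v, f, n, d):
    # Camera ray through pixel (u, v) at focal length f (pixels),
    # intersected with the light plane {X : n . X = d}.
    ray = np.array([u, v, f], float)
    return (d / (n @ ray)) * ray

def surface_normal(p0, p1, p2):
    # Unit normal from three nearby grid-intersection points.
    nvec = np.cross(p1 - p0, p2 - p0)
    return nvec / np.linalg.norm(nvec)

plane_n, plane_d = np.array([0.0, 1.0, -0.02]), 1.0  # hypothetical calibration
p0 = ray_plane_point(10.0, 20.0, 800.0, plane_n, plane_d)
print(np.round(p0, 4), np.isclose(plane_n @ p0, plane_d))
```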

Journal ArticleDOI
TL;DR: The template matching procedure works directly in conjunction with unscaled intensity images, independently of position and orientation, and may be adapted to the case of overlapping objects by introducing the concept of local symmetry.
Abstract: A template matching procedure based on symmetry detection of planar images is presented. The method works directly in conjunction with unscaled intensity images, independently of position and orientation, and may be adapted to the case of overlapping objects by introducing the concept of local symmetry. Moreover, owing to the moderate amount of local processing needed, the algorithm is particularly insensitive to noise.

Proceedings ArticleDOI
04 Jun 1989
TL;DR: The authors present a method for texture segmentation that does not assume any prior knowledge about either the type of textures or the number of textured regions present in the image and uses the similarity of the descriptors to determine the existence of texture regions.
Abstract: The authors present a method for texture segmentation that does not assume any prior knowledge about either the type of textures or the number of textured regions present in the image. Local orientation and spatial frequencies are used as the key parameters for classifying texture. The information is obtained by creating a local multifrequency multiorientation channel decomposition of the image, with the width of each frequency band constant on a logarithmic scale. This decomposition is implemented by applying a set of Gabor-like functions that were modified to have a decreased frequency selectivity when the filter's center frequency increases. The set of filter outputs is then used to create robust texture descriptors. The segmentation algorithm uses the similarity of the descriptors to determine the existence of texture regions and to outline their border rather than concentrating on segregating the textures. The method has been applied to images containing natural textures, resulting in a good segmentation of the texture regions.
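
The channel-decomposition stage can be sketched with a small Gabor bank and FFT filtering; for brevity the sketch then clusters the descriptors with k-means, which, unlike the paper's method, fixes the number of regions in advance. Filter parameters are illustrative.

```python
import numpy as np

def gabor(sigma, freq, theta, size=21):
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * xr)

def filt(img, k):
    # 'Same'-size linear filtering via the FFT (full convolution, cropped).
    s = [img.shape[i] + k.shape[i] - 1 for i in (0, 1)]
    out = np.fft.irfft2(np.fft.rfft2(img, s) * np.fft.rfft2(k, s), s)
    r0, c0 = k.shape[0] // 2, k.shape[1] // 2
    return out[r0:r0 + img.shape[0], c0:c0 + img.shape[1]]

def texture_labels(img, n_regions=2, iters=10):
    bank = [gabor(4.0, f, t) for f in (0.1, 0.2)
            for t in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
    feats = np.stack([np.abs(filt(img, k)) for k in bank], axis=-1)
    Xf = feats.reshape(-1, feats.shape[-1])
    centers = Xf[np.random.default_rng(0).choice(len(Xf), n_regions, False)]
    for _ in range(iters):                       # tiny k-means
        lab = np.argmin(((Xf[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.stack([Xf[lab == c].mean(0) for c in range(n_regions)])
    return lab.reshape(img.shape)

# Two synthetic textures: vertical stripes on the left, horizontal right.
yy, xx = np.mgrid[0:64, 0:64]
img = np.where(xx < 32, np.sin(0.8 * xx), np.sin(0.8 * yy))
print(np.round(texture_labels(img).mean(axis=0)[::8], 1))
```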

Book
01 Aug 1989
TL;DR: In this paper, a computational model for the 3D interpretation of a 2D view based on contour classification and contour interpretation is proposed, which combines a generic surface description well suited for visual tasks with a model of the image formation process.
Abstract: In this paper we suggest a computational model for the 3D interpretation of a 2D view based on contour classification and contour interpretation. We concentrate on those contours arising from discontinuities in surface orientation. We combine a generic surface description well suited for visual tasks with a model of the image formation process in order to derive image contour configurations that are likely to be interpreted in terms of surface contours. Next we describe a computer algorithm which attempts to interpret image contours on the following grounds. First, an image analysis process produces a description in terms of contours and relationships between them. Second, among these contours, we select those which form a desired configuration. Third, the selected contours are combined with constraints available with the image formation process in order to be interpreted in terms of discontinuities in surface orientation. As a consequence, there is a dramatic reduction in the number of possible orientations of the associated scene surfaces.

Journal ArticleDOI
TL;DR: This study investigates the ability of observers to see the 3-D shape of an object using motion cues, so called structure-from-motion (SFM), and shows that the human visual system integrates motion information spatially and temporally as part of the process for computing SFM.
Abstract: Although it is appreciated that humans can use a number of visual cues to perceive the three-dimensional (3-D) shape of an object, for example, luminance, orientation, binocular disparity, and motion, the exact mechanisms employed are not known (De Yoe and Van Essen 1988). An important approach to understanding the computations performed by the visual system is to develop algorithms (Marr 1982) or neural network models (Lehky and Sejnowski 1988; Siegel 1987) that are capable of computing shape from specific cues in the visual image. In this study we investigated the ability of observers to see the 3-D shape of an object using motion cues, so called structure-from-motion (SFM). We measured human performance in a two-alternative forced choice task using novel dynamic random-dot stimuli with limited point lifetimes. We show that the human visual system integrates motion information spatially and temporally (across several point lifetimes) as part of the process for computing SFM. We conclude that SFM algorithms must include surface interpolation to account for human performance. Our experiments also provide evidence that local velocity information, and not position information derived from discrete views of the image (as proposed by some algorithms), is used to solve the SFM problem by the human visual system.

Journal ArticleDOI
TL;DR: The components of the Sheffield Artificial Intelligence Vision Research Unit (AIVRU) three-dimensional vision system, which currently supports model-based object recognition and location, are described and its potential for robotics applications is demonstrated by its guidance of a Universal Machine Intelligence robot arm in a pick-and-place task.
Abstract: The components of the Sheffield Artificial Intelligence Vision Research Unit (AIVRU) three-dimensional (3D) vision system, which currently supports model-based object recognition and location, are described. Its potential for robotics applications is demonstrated by its guidance of a Universal Machine Intelligence robot arm in a pick-and-place task. The system comprises (1) the recovery of a sparse depth map using edge-based, passive stereo triangulation; (2) the grouping, description, and segmentation of edge segments to recover a 3D representation of the scene geometry in terms of straight lines and circular arcs; (3) the statistical combination of 3D descriptions for object model creation from multiple stereo views and the propagation of constraints for within-view refinement; and (4) the matching of 3D wireframe object models to 3D scene descriptions in order to recover an initial estimate of their position and orientation.
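
Step (1), edge-based passive stereo triangulation, rests on the textbook disparity relation. A minimal sketch for rectified cameras; the AIVRU system's matcher and calibration are not shown, and all names and numbers are generic.

```python
import numpy as np

def triangulate(xl, yl, xr, f, baseline):
    # Rectified stereo pair: a matched edge point at (xl, yl) in the
    # left image and xr in the right (pixels, relative to each principal
    # point). Returns X, Y, Z in the left camera frame; Z = f*b/disparity.
    d = xl - xr                      # disparity in pixels
    Z = f * baseline / d
    return np.array([xl * Z / f, yl * Z / f, Z])

# A point 2 m ahead with f = 700 px and a 0.12 m baseline:
print(np.round(triangulate(35.0, -14.0, -7.0, 700.0, 0.12), 3))
```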