# Papers in IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990

---

TL;DR: A new definition of scale-space is suggested, and a class of algorithms realizing a diffusion process is introduced; the diffusion coefficient is chosen to vary spatially so as to encourage intraregion smoothing rather than interregion smoothing.

Abstract: A new definition of scale-space is suggested, and a class of algorithms used to realize a diffusion process is introduced. The diffusion coefficient is chosen to vary spatially in such a way as to encourage intraregion smoothing rather than interregion smoothing. It is shown that the 'no new maxima should be generated at coarse scales' property of conventional scale-space is preserved. As the region boundaries in the approach remain sharp, a high-quality edge detector which successfully exploits global information is obtained. Experimental results are shown on a number of images. Parallel hardware implementations are made feasible because the algorithm involves elementary, local operations replicated over the image.

12,560 citations
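
The edge-preserving smoothing described above can be sketched in a few lines. This is a minimal illustrative variant, not the authors' implementation: the four-neighbour update, the exponential edge-stopping function, and all constants (`kappa`, `lam`, the iteration count) are assumptions chosen for the demo.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=20.0, lam=0.2):
    """Spatially varying diffusion: the conduction coefficient g falls off
    with gradient magnitude, so smoothing happens within regions, not
    across region boundaries."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # nearest-neighbour differences (north, south, east, west)
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        g = lambda d: np.exp(-(d / kappa) ** 2)   # edge-stopping function
        u = u + lam * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

# a noisy step edge: diffusion flattens the flat parts but keeps the step
rng = np.random.default_rng(0)
step = np.hstack([np.zeros((16, 8)), 100 * np.ones((16, 8))])
noisy = step + rng.normal(0, 2, step.shape)
smoothed = anisotropic_diffusion(noisy)
```

Within each region the differences are small, `g` is close to 1, and the update behaves like ordinary diffusion; across the step the difference is large, `g` is nearly 0, and the boundary stays sharp.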

---

TL;DR: It is shown that the remaining residual generalization error can be reduced by invoking ensembles of similar networks, which helps improve the performance and training of neural networks for classification.

Abstract: Several means for improving the performance and training of neural networks for classification are proposed. Cross-validation is used as a tool for optimizing network parameters and architecture. It is shown that the remaining residual generalization error can be reduced by invoking ensembles of similar networks.

3,891 citations
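
The error-reduction effect of ensembles can be illustrated without training real networks. In this sketch each "network" is a hypothetical threshold classifier whose decision boundary is randomly perturbed, standing in for independently trained ensemble members; the member count and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy 1-D two-class problem with true decision boundary at x = 0
x = rng.uniform(-1, 1, 500)
y = (x > 0).astype(int)

# each "network" is a threshold classifier with a perturbed boundary
thresholds = rng.normal(0.0, 0.15, size=101)

errors = np.array([np.mean((x > t).astype(int) != y) for t in thresholds])
avg_single = errors.mean()

# ensemble output: majority vote over all members
votes = np.stack([(x > t).astype(int) for t in thresholds])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
ensemble_err = np.mean(ensemble_pred != y)
```

Because the members' errors are only partly correlated, the majority vote's effective boundary concentrates near the truth, so the ensemble error falls below the average error of a single member.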

---

Brown University

TL;DR: The use of natural symmetries (mirror images) in a well-defined family of patterns (human faces) is discussed within the framework of the Karhunen-Loeve expansion, which results in an extension of the data and imposes even and odd symmetry on the eigenfunctions of the covariance matrix.

Abstract: The use of natural symmetries (mirror images) in a well-defined family of patterns (human faces) is discussed within the framework of the Karhunen-Loeve expansion. This results in an extension of the data and imposes even and odd symmetry on the eigenfunctions of the covariance matrix, without increasing the complexity of the calculation. The resulting approximation of faces projected from outside of the data set onto this optimal basis is improved on average.

2,686 citations
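
The symmetry claim is easy to check numerically. In this sketch, random vectors stand in for face images and "mirroring" is a simple coordinate reversal; after extending the data with its mirror images, every eigenvector of the covariance matrix comes out even or odd under the flip.

```python
import numpy as np

rng = np.random.default_rng(2)

# toy "face" vectors of even length; mirroring reverses the pixel order
n, d = 40, 8
faces = rng.normal(size=(n, d))
mirrored = faces[:, ::-1]                  # mirror images
data = np.vstack([faces, mirrored])        # extended data set
data = data - data.mean(axis=0)

cov = data.T @ data / len(data)
eigvals, eigvecs = np.linalg.eigh(cov)

# each eigenvector should satisfy v = flip(v) (even) or v = -flip(v) (odd)
symmetry_defect = [
    min(np.linalg.norm(v - v[::-1]), np.linalg.norm(v + v[::-1]))
    for v in eigvecs.T
]
```

The reason: the extended covariance commutes with the flip operator, so (for distinct eigenvalues) its eigenvectors must be eigenvectors of the flip as well, with eigenvalue +1 (even) or -1 (odd).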

---

TL;DR: A systematic reconstruction-based method for deciding the highest-order Zernike moments required in a classification problem is developed and the superiority of Zernike moment features over regular moments and moment invariants was experimentally verified.

Abstract: The problem of rotation-, scale-, and translation-invariant recognition of images is discussed. A set of rotation-invariant features are introduced. They are the magnitudes of a set of orthogonal complex moments of the image known as Zernike moments. Scale and translation invariance are obtained by first normalizing the image with respect to these parameters using its regular geometrical moments. A systematic reconstruction-based method for deciding the highest-order Zernike moments required in a classification problem is developed. The quality of the reconstructed image is examined through its comparison to the original one. The orthogonality property of the Zernike moments, which simplifies the process of image reconstruction, makes the suggested feature selection approach practical. Features of each order can also be weighted according to their contribution to the reconstruction process. The superiority of Zernike moment features over regular moments and moment invariants was experimentally verified.

1,971 citations
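
A Zernike moment magnitude and its rotation invariance can be sketched directly from the standard radial-polynomial formula. This is an illustrative sampled version, not the paper's normalization pipeline; the order (n, m) = (4, 2) and the image size are arbitrary choices, and a 90-degree rotation is used because it is exact on the pixel grid.

```python
import numpy as np
from math import factorial

def zernike_magnitude(img, n, m):
    """|A_nm| over the unit disk inscribed in a square image (illustrative)."""
    N = img.shape[0]
    ys, xs = np.mgrid[0:N, 0:N]
    x = (2 * xs + 1) / N - 1          # pixel centres mapped to [-1, 1]
    y = (2 * ys + 1) / N - 1
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    # Zernike radial polynomial R_n^m
    R = np.zeros_like(rho)
    for s in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** s * factorial(n - s)
             / (factorial(s)
                * factorial((n + abs(m)) // 2 - s)
                * factorial((n - abs(m)) // 2 - s)))
        R += c * rho ** (n - 2 * s)
    Vstar = R * np.exp(-1j * m * theta)          # conjugate basis function
    A = (n + 1) / np.pi * np.sum(img * Vstar * inside) * (2 / N) ** 2
    return abs(A)

rng = np.random.default_rng(3)
img = rng.random((64, 64))
a = zernike_magnitude(img, 4, 2)
b = zernike_magnitude(np.rot90(img), 4, 2)   # rotation only changes the phase
```

Rotating the image multiplies A_nm by a unit-magnitude phase factor, so the magnitude is unchanged.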

---

TL;DR: An interpretation of image texture as a region code, or carrier of region information, is emphasized and examples are given of both types of texture processing using a variety of real and synthetic textures.

Abstract: A computational approach for analyzing visible textures is described. Textures are modeled as irradiance patterns containing a limited range of spatial frequencies, where mutually distinct textures differ significantly in their dominant characterizing frequencies. By encoding images into multiple narrow spatial frequency and orientation channels, the slowly varying channel envelopes (amplitude and phase) are used to segregate textural regions of different spatial frequency, orientation, or phase characteristics. Thus, an interpretation of image texture as a region code, or carrier of region information, is emphasized. The channel filters used, known as the two-dimensional Gabor functions, are useful for these purposes in several senses: they have tunable orientation and radial frequency bandwidths and tunable center frequencies, and they optimally achieve joint resolution in space and in spatial frequency. By comparing the channel amplitude responses, one can detect boundaries between textures. Locating large variations in the channel phase responses allows discontinuities in the texture phase to be detected. Examples are given of both types of texture processing using a variety of real and synthetic textures. >

1,582 citations
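
The channel-amplitude idea can be sketched with one Gabor channel applied to two abutting synthetic textures. This is a minimal demo, not the paper's filter bank: a single horizontal channel, an assumed envelope width `sigma`, and FFT-based circular convolution.

```python
import numpy as np

def gabor_kernel(shape, f, sigma):
    """Complex 2-D Gabor: Gaussian envelope times a horizontal complex sinusoid."""
    h, w = shape
    ys, xs = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    env = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))
    return env * np.exp(2j * np.pi * f * xs)

def channel_amplitude(img, f, sigma=6.0):
    """Amplitude envelope of one spatial-frequency channel (FFT convolution)."""
    k = gabor_kernel(img.shape, f, sigma)
    K = np.fft.fft2(np.fft.ifftshift(k))
    return np.abs(np.fft.ifft2(np.fft.fft2(img) * K))

# two abutting textures that differ only in dominant spatial frequency
cols = np.arange(64)
row = np.concatenate([np.sin(2 * np.pi * 0.1 * cols),
                      np.sin(2 * np.pi * 0.3 * cols)])
img = np.tile(row, (64, 1))

amp = channel_amplitude(img, f=0.1)   # channel tuned to the left texture
left_amp = amp[:, 16:48].mean()       # interior of the left region
right_amp = amp[:, 80:112].mean()     # interior of the right region
```

The amplitude envelope is high over the texture whose frequency matches the channel and near zero over the other, so thresholding or comparing channel amplitudes localizes the texture boundary.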

---

TL;DR: A description of the transferable belief model, which is used to quantify degrees of belief based on belief functions, is given and a set of axioms justifying Dempster's rule for the combination of belief functions induced by two distinct pieces of evidence is presented.

Abstract: A description of the transferable belief model, which is used to quantify degrees of belief based on belief functions, is given. The impact of open- and closed-world assumption on conditioning is discussed. The nature of the frame of discernment on which a degree of belief will be established is discussed. A set of axioms justifying Dempster's rule for the combination of belief functions induced by two distinct pieces of evidence is presented.

1,152 citations
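
Dempster's rule itself is short: multiply the masses of every pair of focal elements, accumulate products on the intersections, and renormalize away the mass assigned to the empty set (the conflict). The frame and the numbers below are made up for illustration.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two basic belief assignments,
    given as dicts mapping frozenset focal elements to masses."""
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y                  # mass on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# two independent pieces of evidence over the frame {rain, sun}
m1 = {frozenset({'rain'}): 0.6, frozenset({'rain', 'sun'}): 0.4}
m2 = {frozenset({'sun'}): 0.3, frozenset({'rain', 'sun'}): 0.7}
m12 = dempster_combine(m1, m2)
```

Here the pair ({rain}, {sun}) contributes 0.18 of conflict, which the normalization redistributes over the surviving focal elements.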

---

TL;DR: The optimization problem is set up as a discrete multistage decision process and is solved by a time-delayed discrete dynamic programming algorithm, and a parallel procedure for decreasing computational costs is discussed.

Abstract: Dynamic programming is discussed as an approach to solving variational problems in vision. Dynamic programming ensures global optimality of the solution, is numerically stable, and allows for hard constraints to be enforced on the behavior of the solution within a natural and straightforward structure. As a specific example of the approach's efficacy, applying dynamic programming to the energy-minimizing active contours is described. The optimization problem is set up as a discrete multistage decision process and is solved by a time-delayed discrete dynamic programming algorithm. A parallel procedure for decreasing computational costs is discussed.

1,014 citations
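
The discrete multistage decision process can be sketched as a Viterbi-style recursion: one state per stage, a data cost per state, and a pairwise smoothness penalty between consecutive stages. This is a simplified first-order version (no time-delayed second-order term), with made-up costs.

```python
import numpy as np

def dp_contour(data_cost, smooth_weight):
    """Globally optimal path through a stages-by-states cost table,
    minimizing data cost plus a quadratic smoothness penalty."""
    n_stages, n_states = data_cost.shape
    states = np.arange(n_states)
    pair = smooth_weight * (states[:, None] - states[None, :]) ** 2
    cost = data_cost[0].copy()
    back = np.zeros((n_stages, n_states), dtype=int)
    for t in range(1, n_stages):
        total = cost[:, None] + pair           # prev-state x cur-state
        back[t] = np.argmin(total, axis=0)     # best predecessor per state
        cost = total[back[t], states] + data_cost[t]
    path = [int(np.argmin(cost))]
    for t in range(n_stages - 1, 0, -1):       # backtrack
        path.append(int(back[t][path[-1]]))
    return path[::-1], float(cost.min())

# toy example: the data term pulls every stage toward column 2
data = np.full((5, 5), 1.0)
data[:, 2] = 0.0
path, energy = dp_contour(data, smooth_weight=0.1)
```

Unlike gradient descent on the snake energy, this search is exhaustive per stage, so the returned path is globally optimal for the discretized problem.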

---

TL;DR: The state of the art of online handwriting recognition during a period of renewed activity in the field is described, based on an extensive review of the literature, including journal articles, conference proceedings, and patents.

Abstract: This survey describes the state of the art of online handwriting recognition during a period of renewed activity in the field. It is based on an extensive review of the literature, including journal articles, conference proceedings, and patents. Online versus offline recognition, digitizer technology, and handwriting properties and recognition problems are discussed. Shape recognition algorithms, preprocessing and postprocessing techniques, experimental systems, and commercial products are examined.

922 citations

---

TL;DR: The proper way to apply the scale-space theory to discrete signals and discrete images is by discretization of the diffusion equation, not the convolution integral.

Abstract: A basic and extensive treatment of discrete aspects of the scale-space theory is presented. A genuinely discrete scale-space theory is developed and its connection to the continuous scale-space theory is explained. Special attention is given to discretization effects, which occur when results from the continuous scale-space theory are to be implemented computationally. The 1D problem is solved completely in an axiomatic manner. For the 2D problem, the author discusses how the 2D discrete scale space should be constructed. The main results are as follows: the proper way to apply the scale-space theory to discrete signals and discrete images is by discretization of the diffusion equation, not the convolution integral; the discrete scale space obtained in this way can be described by convolution with a kernel that is the discrete analog of the Gaussian kernel; a scale-space implementation based on the sampled Gaussian kernel might lead to undesirable effects and computational problems, especially at fine levels of scale; the 1D discrete smoothing transformations can be characterized exactly, and a complete catalogue is given; all finite-support 1D discrete smoothing transformations arise from repeated averaging over two adjacent elements (the limit case of such an averaging process is described); and the symmetric 1D discrete smoothing kernels are nonnegative and unimodal, in both the spatial and the frequency domain.

687 citations
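
The "repeated averaging over two adjacent elements" building block, and the defining scale-space property that smoothing never creates new local extrema, can be checked empirically. The signal length, iteration count, and random data below are arbitrary demo choices.

```python
import numpy as np

def count_local_maxima(sig):
    """Count strict interior local maxima of a 1-D signal."""
    return int(np.sum((sig[1:-1] > sig[:-2]) & (sig[1:-1] > sig[2:])))

def two_point_average(sig):
    """Averaging over two adjacent elements -- per the paper, the primitive
    from which all finite-support discrete smoothing kernels arise."""
    return 0.5 * (sig[:-1] + sig[1:])

rng = np.random.default_rng(4)
sig = rng.normal(size=200)
counts = [count_local_maxima(sig)]
for _ in range(30):
    sig = two_point_average(sig)
    counts.append(count_local_maxima(sig))

# the number of local maxima never increases as scale increases
monotone = all(a >= b for a, b in zip(counts, counts[1:]))
```

Repeating the two-point average k times is equivalent to convolving with a binomial kernel, the discrete analog of a Gaussian of growing width.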

---

TL;DR: In this paper, a method for recovery of compact volumetric models for shape representation of single-part objects in computer vision is introduced, where the model recovery is formulated as a least-squares minimization of a cost function for all range points belonging to a single part.

Abstract: A method for recovery of compact volumetric models for shape representation of single-part objects in computer vision is introduced. The models are superquadrics with parametric deformations (bending, tapering, and cavity deformation). The input for the model recovery is three-dimensional range points. Model recovery is formulated as a least-squares minimization of a cost function for all range points belonging to a single part. During an iterative gradient descent minimization process, all model parameters are adjusted simultaneously, recovering position, orientation, size, and shape of the model, such that most of the given range points lie close to the model's surface. A specific solution among several acceptable solutions, which are all minima in the parameter space, can be reached by constraining the search to a part of the parameter space. The many shallow local minima in the parameter space are avoided as a solution by using a stochastic technique during minimization. Results using real range data show that the recovered models are stable and that the recovery procedure is fast.

596 citations

---

TL;DR: A method that combines region growing and edge detection for image segmentation is presented; it is thought that the success on the tool images is because the objects shown occupy areas of many pixels, making it easy to select parameters to separate signal information from noise.

Abstract: A method that combines region growing and edge detection for image segmentation is presented. The authors start with a split-and-merge algorithm wherein the parameters have been set up so that an over-segmented image results. Region boundaries are then eliminated or modified on the basis of criteria that integrate contrast with boundary smoothness, variation of the image gradient along the boundary, and a criterion that penalizes for the presence of artifacts reflecting the data structure used during segmentation (quadtree in this case). The algorithms were implemented in the C language on a Sun 3/160 workstation running under the Unix operating system. Simple tool images and aerial photographs were used to test the algorithms. The impression of human observers is that the method is very successful on the tool images and less so on the aerial photograph images. It is thought that the success on the tool images is because the objects shown occupy areas of many pixels, making it easy to select parameters to separate signal information from noise.

---

TL;DR: A novel kind of language model which reflects short-term patterns of word use by means of a cache component (analogous to cache memory in hardware terminology) is presented and contains a 3g-gram component of the traditional type.

Abstract: Speech-recognition systems must often decide between competing ways of breaking up the acoustic input into strings of words. Since the possible strings may be acoustically similar, a language model is required; given a word string, the model returns its linguistic probability. Several Markov language models are discussed. A novel kind of language model which reflects short-term patterns of word use by means of a cache component (analogous to cache memory in hardware terminology) is presented. The model also contains a 3g-gram component of the traditional type. The combined model and a pure 3g-gram model were tested on samples drawn from the Lancaster-Oslo/Bergen (LOB) corpus of English text. The relative performance of the two models is examined, and suggestions for future improvements are made.
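
The cache idea is simple to sketch: interpolate a sliding window of recently seen words with a static model. In this toy version a unigram table stands in for the paper's 3g-gram component, and the cache size, interpolation weight, and probabilities are all invented for the demo.

```python
from collections import deque

class CacheLM:
    """Sketch of a cache language model: a bounded window of recent words
    is interpolated with a static model (here a unigram table standing in
    for the traditional n-gram component)."""
    def __init__(self, base_probs, cache_size=100, lam=0.3):
        self.base = base_probs            # static model: word -> probability
        self.cache = deque(maxlen=cache_size)
        self.lam = lam                    # weight of the cache component

    def prob(self, word):
        cache_p = self.cache.count(word) / len(self.cache) if self.cache else 0.0
        return self.lam * cache_p + (1 - self.lam) * self.base.get(word, 1e-6)

    def observe(self, word):
        self.cache.append(word)

base = {'the': 0.05, 'cache': 0.0001, 'model': 0.001}
lm = CacheLM(base)
p_before = lm.prob('cache')
for w in ['the', 'cache', 'model', 'cache']:
    lm.observe(w)
p_after = lm.prob('cache')
```

A word that was rare under the static model becomes much more probable once it has occurred recently, which is exactly the short-term burstiness the cache component is meant to capture.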

---

TL;DR: A statistical framework is used for finding boundaries and for partitioning scenes into homogeneous regions and incorporates a measure of disparity between certain spatial features of block pairs of pixel gray levels, using the Kolmogorov-Smirnov nonparametric measures of difference between the distributions of these features.

Abstract: A statistical framework is used for finding boundaries and for partitioning scenes into homogeneous regions. The model is a joint probability distribution for the array of pixel gray levels and an array of labels. In boundary finding, the labels are binary, zero or one, representing the absence or presence of boundary elements. In partitioning, the label values are generic: two labels are the same when the corresponding scene locations are considered to belong to the same region. The distribution incorporates a measure of disparity between certain spatial features of block pairs of pixel gray levels, using the Kolmogorov-Smirnov nonparametric measures of difference between the distributions of these features. The number of model parameters is minimized by forbidding certain label configurations, which are assigned probability zero. The maximum a posteriori estimator of boundary placements and partitionings is examined. The forbidden states introduce constraints into the calculation of these configurations. Stochastic relaxation methods are extended to accommodate constrained optimization.
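
The two-sample Kolmogorov-Smirnov statistic at the heart of the disparity measure is a few lines of NumPy. The Gaussian "pixel blocks" below are stand-ins for real image data.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of the two samples."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side='right') / len(a)
    cdf_b = np.searchsorted(b, grid, side='right') / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(6)
# blocks drawn from the same region vs. blocks straddling a boundary
same = ks_statistic(rng.normal(0, 1, 200), rng.normal(0, 1, 200))
diff = ks_statistic(rng.normal(0, 1, 200), rng.normal(2, 1, 200))
```

Being a rank statistic, this disparity is invariant to monotone transformations of the gray levels, which is what makes it attractive as a nonparametric boundary cue.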

---

TL;DR: A recursive filtering structure is proposed that drastically reduces the computational effort required for smoothing, performing the first and second directional derivatives, and carrying out the Laplacian of an image.

Abstract: A recursive filtering structure is proposed that drastically reduces the computational effort required for smoothing, performing the first and second directional derivatives, and carrying out the Laplacian of an image. These operations are done with a fixed number of multiplications and additions per output point independently of the size of the neighborhood considered. The key to the approach is, first, the use of an exponentially based filter family and, second, the use of the recursive filtering. Applications to edge detection problems and multiresolution techniques are considered, and an edge detector allowing the extraction of zero-crossings of an image with only 14 operations per output element at any resolution is proposed. Various experimental results are shown.
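
The core trick, constant cost per sample regardless of smoothing width, can be sketched with a first-order exponential smoother run causally and anti-causally. This is a simplified stand-in for the paper's filter family, with an assumed normalization and parameter `alpha`.

```python
import numpy as np

def exp_smooth(signal, alpha):
    """Symmetric smoothing with kernel ~ exp(-alpha*|n|) via one causal and
    one anti-causal first-order recursive pass: a fixed number of operations
    per sample, independent of the effective neighbourhood size."""
    a = np.exp(-alpha)
    causal = np.empty(len(signal))
    acc = 0.0
    for i, v in enumerate(signal):
        acc = (1 - a) * v + a * acc
        causal[i] = acc
    anti = np.empty(len(signal))
    acc = 0.0
    for i in range(len(signal) - 1, -1, -1):
        acc = (1 - a) * signal[i] + a * acc
        anti[i] = acc
    # the current sample is counted by both passes: subtract it once,
    # then normalize to unit DC gain
    return (causal + anti - (1 - a) * signal) / (1 + a)

step = np.concatenate([np.zeros(50), np.ones(50)])
smoothed = exp_smooth(step, alpha=0.2)
```

Shrinking `alpha` widens the effective kernel arbitrarily while the per-sample work stays at a handful of multiply-adds, which is the property that a sliding-window FIR smoother cannot match.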

---

TL;DR: The method of Fourier descriptors is extended to produce a set of normalized coefficients which are invariant under any affine transformation (translation, rotation, scaling, and shearing) and allows considerable robustness when applied to images of objects which rotate in all three dimensions.

Abstract: The method of Fourier descriptors is extended to produce a set of normalized coefficients which are invariant under any affine transformation (translation, rotation, scaling, and shearing). The method is based on a parameterized boundary description which is transformed to the Fourier domain and normalized there to eliminate dependencies on the affine transformation and on the starting point. Invariance to affine transforms allows considerable robustness when applied to images of objects which rotate in all three dimensions, as is demonstrated by processing silhouettes of aircraft maneuvering in three-space.
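
The classical similarity-invariant Fourier descriptors, a simpler cousin of the full affine-invariant normalization described above, illustrate the mechanics: drop the DC term to remove translation, divide by the first harmonic's magnitude to remove scale, and take magnitudes to remove rotation and starting point.

```python
import numpy as np

def fourier_descriptors(boundary, k=8):
    """Similarity-invariant Fourier descriptors of a closed boundary."""
    z = boundary[:, 0] + 1j * boundary[:, 1]   # boundary as complex samples
    Z = np.fft.fft(z)
    mags = np.abs(Z[1:k + 1])                  # drop DC: removes translation
    return mags / mags[0]                      # first harmonic: removes scale

# a square boundary, and a rotated + scaled + translated copy of it
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
r = 1.0 / np.maximum(np.abs(np.cos(t)), np.abs(np.sin(t)))
square = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)

ang = 0.7
R = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
moved = 2.5 * square @ R.T + np.array([3.0, -1.0])

d1 = fourier_descriptors(square)
d2 = fourier_descriptors(moved)
```

A similarity transform multiplies every non-DC Fourier coefficient by the same complex constant, so the normalized magnitudes agree exactly; handling shear as well requires the fuller normalization of the paper.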

---

Philips

TL;DR: It is shown theoretically and experimentally that the outputs of the MLP approximate the probability distribution over output classes conditioned on the input, i.e. the maximum a posteriori probabilities.

Abstract: The statistical use of a particular classic form of a connectionist system, the multilayer perceptron (MLP), is described in the context of the recognition of continuous speech. A discriminant hidden Markov model (HMM) is defined, and it is shown how a particular MLP with contextual and extra feedback input units can be considered as a general form of such a Markov model. A link between these discriminant HMMs, trained along the Viterbi algorithm, and any other approach based on least mean square minimization of an error function (LMSE) is established. It is shown theoretically and experimentally that the outputs of the MLP (when trained along the LMSE or the entropy criterion) approximate the probability distribution over output classes conditioned on the input, i.e. the maximum a posteriori probabilities. Results of a series of speech recognition experiments are reported. The possibility of embedding MLP into HMM is described. Relations with other recurrent networks are also explained.

---

TL;DR: A method for the determination of camera location from two-dimensional (2-D) to three-dimensional (3-D) straight line or point correspondences is presented, and good results can be obtained in the presence of noise if more than the minimum required number of correspondences are used.

Abstract: A method for the determination of camera location from two-dimensional (2-D) to three-dimensional (3-D) straight line or point correspondences is presented. With this method, the computations of the rotation matrix and the translation vector of the camera are separable. First, the rotation matrix is found by a linear algorithm using eight or more line correspondences, or by a nonlinear algorithm using three or more line correspondences, where the line correspondences are either given or derived from point correspondences. Then, the translation vector is obtained by solving a set of linear equations based on three or more line correspondences, or two or more point correspondences. Eight 2-D to 3-D line correspondences or six 2-D to 3-D point correspondences are needed for the linear approach; three 2-D to 3-D line or point correspondences for the nonlinear approach. Good results can be obtained in the presence of noise if more than the minimum required number of correspondences are used.

---

TL;DR: It is argued that haphazardly applying generalized Hough transform methods to complex recognition tasks is risky, as the probability of false positives can be very high.

Abstract: Object recognition from sensory data involves, in part, determining the pose of a model with respect to a scene. A common method for finding an object's pose is the generalized Hough transform, which accumulates evidence for possible coordinate transformations in a parameter space whose axes are the quantized transformation parameters. Large clusters of similar transformations in that space are taken as evidence of a correct match. A theoretical analysis of the behavior of such methods is presented. The authors derive bounds on the set of transformations consistent with each pairing of data and model features, in the presence of noise and occlusion in the image. Bounds are provided on the likelihood of false peaks in the parameter space, as a function of noise, occlusion, and tessellation effects. It is argued that haphazardly applying such methods to complex recognition tasks is risky, as the probability of false positives can be very high.
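
The false-peak phenomenon is easy to reproduce. In this sketch, completely unrelated random "model" and "scene" point sets vote for candidate translations in a quantized accumulator; votes pile up in some bin purely by chance, exactly the spurious clusters the analysis warns about. Point counts and bin size are arbitrary demo choices.

```python
import numpy as np

rng = np.random.default_rng(8)

# random, unrelated point sets: any accumulated peak is spurious
model = rng.uniform(0, 1, (30, 2))
scene = rng.uniform(0, 1, (30, 2))

bins = 20
acc = np.zeros((bins, bins), dtype=int)
for p in model:
    for q in scene:
        t = q - p                      # candidate translation, in [-1, 1]^2
        i = min(int((t[0] + 1) / 2 * bins), bins - 1)
        j = min(int((t[1] + 1) / 2 * bins), bins - 1)
        acc[i, j] += 1

false_peak = acc.max()   # several coincidental votes land in one bin
```

With 900 pairings spread over 400 bins, the expected count per bin is already above 2, and the maximum bin is far higher, so a naive "large cluster = match" rule would fire on pure noise.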

---

TL;DR: By performing real-time measurements of the time durations between the keystrokes when a password is entered and using pattern-recognition algorithms, three online recognition systems were devised and tested.

Abstract: An approach to securing access to computer systems is described. By performing real-time measurements of the time durations between the keystrokes when a password is entered and using pattern-recognition algorithms, three online recognition systems were devised and tested. Two types of passwords were considered: phrases and individual names. A fixed phrase was used in the identification system. Individual names were used as a password in the verification system and in the overall recognition system. All three systems were tested and evaluated. The identification system used 10 volunteers and gave an indecision error of 1.2%. The verification system used 26 volunteers and gave an error of 8.1% in rejecting valid users and an error of 2.8% in accepting invalid users. The overall recognition system used 32 volunteers and gave an error of 3.1% in rejecting valid users and an error of 0.5% in accepting invalid users. >
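
A minimal verification sketch along these lines: build a per-user profile of inter-keystroke latencies, then accept an attempt when its latencies are statistically close to the profile. The latency values, the z-score rule, and the threshold are all invented for illustration; they are not the paper's classifiers.

```python
import numpy as np

rng = np.random.default_rng(7)

# hypothetical enrolment: inter-keystroke latencies (ms) for one user's
# password, collected over 20 typing sessions
enrolled = rng.normal(loc=[120, 95, 150, 110, 130], scale=8, size=(20, 5))
mean, std = enrolled.mean(axis=0), enrolled.std(axis=0)

def accept(attempt, threshold=3.0):
    """Accept when the mean absolute z-score of the attempt's latencies
    against the enrolled profile is below the threshold."""
    z = np.abs((attempt - mean) / std)
    return bool(z.mean() < threshold)

genuine = rng.normal([120, 95, 150, 110, 130], 8)    # the same user again
impostor = rng.normal([90, 140, 100, 160, 80], 8)    # a different rhythm
```

The threshold trades false rejections against false acceptances, which is exactly the error pair reported in the abstract's evaluation.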

---

TL;DR: A coding scheme is presented based on a single fixed binary encoded illumination pattern, which contains all the information required to identify the individual stripes visible in the camera image, and a prototype measurement system based on this coding principle is presented.

Abstract: The problem of stripe identification in range image acquisition systems based on triangulation with periodically structured illumination is discussed. A coding scheme is presented based on a single fixed binary encoded illumination pattern, which contains all the information required to identify the individual stripes visible in the camera image. Every sample point indicated by the light pattern is made identifiable by means of a binary signature, which is locally shared among its closest neighbors. The applied code is derived from pseudonoise sequences, and it is optimized so that it can make the identification fault-tolerant to the largest extent. A prototype measurement system based on this coding principle is presented. Experimental results obtained with the measurement system are also presented.

---

Yale University

TL;DR: A system using two Polaroid transducers is described that correctly discriminates between corners and planes for inclination angles within ±10 degrees of the transducer orientation, allowing the system to operate over an extended range.

Abstract: A multitransducer, pulse/echo-ranging system is described that differentiates corner and plane reflectors by exploiting the physical properties of sound propagation. The amplitudes and ranges of reflected signals for the different transmitter and receiver pairs are processed to determine whether the reflecting object is a plane or a right-angle corner. In addition, the angle of inclination of the reflector with respect to the transducer orientation can be measured. Reflected signal amplitude and range values, as functions of inclination angle, provide the motivation for the differentiation algorithm. A system using two Polaroid transducers is described that correctly discriminates between corners and planes for inclination angles within ±10 degrees of the transducer orientation. The two-transducer system is extended to a multitransducer array, allowing the system to operate over an extended range. An analysis comparing processing effort to estimation accuracy is performed.

---

TL;DR: Direct analytical methods are discussed for solving Poisson equations of the general form Δu = f on a rectangular domain, and experiments indicate that results comparable to those using multigrid can be obtained in a very small number of iterations.

Abstract: Direct analytical methods are discussed for solving Poisson equations of the general form Δu = f on a rectangular domain. Some embedding techniques that may be useful when boundary conditions (obtained from stereo and occluding boundary) are defined on arbitrary contours are described. The suggested algorithms are computationally efficient owing to the use of fast orthogonal transforms. Applications to shape from shading, lightness and optical flow problems are also discussed. A proof for the existence and convergence of the flow estimates is given. Experiments using synthetic images indicate that results comparable to those using multigrid can be obtained in a very small number of iterations.
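
The "direct solve via fast orthogonal transforms" idea can be sketched with the FFT. This demo assumes periodic boundary conditions for brevity (the paper treats more general ones): the discrete Laplacian is diagonal in the Fourier basis, so the solve is a single transform, a pointwise division, and an inverse transform.

```python
import numpy as np

def solve_poisson_periodic(f):
    """Direct spectral solve of the discrete Poisson equation Δu = f on a
    rectangle with periodic boundary conditions."""
    n, m = f.shape
    kx = 2 * np.pi * np.fft.fftfreq(n)
    ky = 2 * np.pi * np.fft.fftfreq(m)
    # eigenvalues of the 5-point discrete Laplacian
    lam = (2 * np.cos(kx)[:, None] - 2) + (2 * np.cos(ky)[None, :] - 2)
    F = np.fft.fft2(f)
    F[0, 0] = 0.0          # fix the free additive constant (zero-mean gauge)
    lam[0, 0] = 1.0        # avoid dividing by zero at the DC term
    return np.real(np.fft.ifft2(F / lam))

def discrete_laplacian(u):
    return (np.roll(u, 1, 0) + np.roll(u, -1, 0)
            + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)

# round-trip check: build f from a known u, then recover u
rng = np.random.default_rng(9)
u_true = rng.normal(size=(32, 32))
u_true -= u_true.mean()
f = discrete_laplacian(u_true)
u = solve_poisson_periodic(f)
```

Unlike iterative relaxation, this recovers the solution exactly (to floating point) in one O(N log N) pass, which is what makes transform-based solvers competitive with multigrid.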

---

TL;DR: A computer algorithm which segments gray-scale images into regions of interest (objects) has been developed that can provide the basis for scene analysis (including shape-parameter calculation) or surface-based, shaded-graphics display.

Abstract: A computer algorithm which segments gray-scale images into regions of interest (objects) has been developed. These regions can provide the basis for scene analysis (including shape-parameter calculation) or surface-based, shaded-graphics display. The algorithm creates a tree structure for image description by defining a linking relationship between pixels in successively blurred versions of the initial image. The image is described in terms of nested light and dark regions. This algorithm, successfully implemented in one, two, and three dimensions, can theoretically work with any number of dimensions. The interactive postprocessing technique developed selects regions from the descriptive tree for display in several ways: pointing to a branch of the image description tree, specifying by sliders the range of scale and/or intensity of all regions which should be displayed, and pointing (on the original image) to any pixel in the desired region. The algorithm has been applied to approximately 15 computer tomography (CT) images of the abdomen.

---

TL;DR: An approach for explicitly relating the shape of image contours to models of curved three-dimensional objects is presented and readily extends to parameterized models.

Abstract: An approach for explicitly relating the shape of image contours to models of curved three-dimensional objects is presented. This relationship is used for object recognition and positioning. Object models consist of collections of parametric surface patches and their intersection curves; this includes nearly all representations used in computer-aided geometric design and computer vision. The image contours considered are the projections of surface discontinuities and occluding contours. Elimination theory provides a method for constructing the implicit equation of these contours for an object observed under orthographic or perspective projection. This equation is parameterized by the object's position and orientation with respect to the observer. Determining these parameters is reduced to a fitting problem between the theoretical contour and the observed data points. The proposed approach readily extends to parameterized models. It has been implemented for a simple world composed of various surfaces of revolution and tested on several real images.

---

TL;DR: An algorithm has been developed to locate and separate text strings of various font sizes, styles, and orientations by applying the Hough transform to the centroids of connected components in the image.

Abstract: A system for interpretation of images of paper-based line drawings is described. Since a typical drawing contains both text strings and graphics, an algorithm has been developed to locate and separate text strings of various font sizes, styles, and orientations. This is accomplished by applying the Hough transform to the centroids of connected components in the image. The graphics in the segmented image are processed to represent thin entities by their core-lines and thick objects by their boundaries. The core-lines and boundaries are segmented into straight line segments and curved lines. The line segments and their interconnections are analyzed to locate minimum redundancy loops which are adequate to generate a succinct description of the graphics. Such a description includes the location and attributes of simple polygonal shapes, circles, and interconnecting lines, and a description of the spatial relationships and occlusions among them. Hatching and filling patterns are also identified. The performance of the system is evaluated using several test images, and the results are presented. The superiority of these algorithms in generating meaningful interpretations of graphics, compared to conventional data compression schemes, is clear from these results.

---

TL;DR: A two-stage method of image segmentation based on gray level cooccurrence matrices that robustly segments an image into homogeneous areas and generates an edge map is described and extends easily to general edge operators.

Abstract: A two-stage method of image segmentation based on gray level cooccurrence matrices is described. An analysis of the distributions within a cooccurrence matrix defines an initial pixel classification into both region and interior or boundary designations. Local consistency of pixel classification is then implemented by minimizing the entropy of local information, where region information is expressed via conditional probabilities estimated from the cooccurrence matrices, and boundary information via conditional probabilities which are determined a priori. The method robustly segments an image into homogeneous areas and generates an edge map. The technique extends easily to general edge operators. An example is given for the Canny operator. Applications to synthetic and forward-looking infrared (FLIR) images are given.
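
The underlying data structure, a gray-level cooccurrence matrix, is a small computation: count how often each pair of gray levels occurs at pixel pairs separated by a fixed displacement. The checkerboard test image is an assumption chosen so the expected counts are obvious.

```python
import numpy as np

def cooccurrence_matrix(img, dx, dy, levels):
    """Gray-level cooccurrence matrix for displacement (dx, dy): entry
    (g1, g2) counts pixel pairs with gray level g1 at (y, x) and g2 at
    (y + dy, x + dx)."""
    h, w = img.shape
    glcm = np.zeros((levels, levels), dtype=int)
    a = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    b = img[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
    np.add.at(glcm, (a.ravel(), b.ravel()), 1)
    return glcm

# checkerboard: horizontally adjacent pixels always differ
img = np.indices((8, 8)).sum(axis=0) % 2
glcm = cooccurrence_matrix(img, dx=1, dy=0, levels=2)
```

For this image all mass sits off the diagonal, the signature of a fine alternating texture; a smooth region would concentrate mass on and near the diagonal instead, and that contrast is what the distribution analysis exploits.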

---

TL;DR: In this article, a method for distinguishing metal and dielectric material surfaces from the polarization characteristics of specularly reflected light is introduced, which is completely passive and requires only the sensing of transmitted radiance of reflected light through a polarizing filter positioned in multiple orientations in front of a camera sensor.

Abstract: A computationally simple yet powerful method for distinguishing metal and dielectric material surfaces from the polarization characteristics of specularly reflected light is introduced. The method is completely passive, requiring only the sensing of transmitted radiance of reflected light through a polarizing filter positioned in multiple orientations in front of a camera sensor. Precise positioning of lighting is not required. An advantage of using a polarization-based method for material classification is its immunity to color variations, which so commonly exist on uniform material samples. A simple polarization-reflectance model, called the Fresnel reflectance model, is developed. The fundamental assumptions are that the diffuse component of reflection is completely unpolarized and that the polarization state of the specular component of reflection is dictated by the Fresnel reflection coefficients. The material classification method presented results axiomatically from the Fresnel reflectance model, by estimating the polarization Fresnel ratio. No assumptions are required about the functional form of the diffuse and specular components of reflection. The method is demonstrated on some common objects consisting of metal and dielectric parts.

---

TL;DR: An application of the syntactic method to electrocardiogram (ECG) pattern recognition and parameter measurement is presented and the performance of the resultant system has been evaluated using an annotated standard ECG library.

Abstract: An application of the syntactic method to electrocardiogram (ECG) pattern recognition and parameter measurement is presented. Solutions to the related problems of primitive pattern selection, primitive pattern extraction, linguistic representation, and pattern grammar formulation are given. Attribute grammars are used as the model for the pattern grammar because of their descriptive power, founded upon their ability to handle syntactic as well as semantic information. This approach has been implemented and the performance of the resultant system has been evaluated using an annotated standard ECG library.

---

TL;DR: A functional minimization algorithm utilizing overlapping local charts to refine surface points and curvature estimates is presented, and an implementation as an iterative constraint satisfaction procedure based on local surface smoothness properties is developed.

Abstract: Early image understanding seeks to derive analytic representations from image intensities. The authors present steps towards this goal by considering the inference of surfaces from three-dimensional images. Only smooth surfaces are considered and the focus is on the coupled problems of inferring the trace points (the points through which the surface passes) and estimating the associated differential structure given by the principal curvature and direction fields over the estimated smooth surfaces. Computation of these fields is based on determining an atlas of local charts or parameterizations at estimated surface points. Algorithm robustness and the stability of results are essential for analyzing real images; to this end, the authors present a functional minimization algorithm utilizing overlapping local charts to refine surface points and curvature estimates, and develop an implementation as an iterative constraint satisfaction procedure based on local surface smoothness properties. Examples of the recovery of local structure are presented for synthetic images degraded by noise and for clinical magnetic resonance images.

---

TL;DR: Experimental results in the area of texture segmentation and Gestalt grouping using the Wigner distribution are presented, proving the feasibility of using s/sf representations for low-level (early, preattentive) vision.

Abstract: The generic issue of clustering/grouping is addressed. Recent research, both in computer and human vision, suggests the use of joint spatial/spatial-frequency (s/sf) representations. The spectrogram, the difference of Gaussians representation, the Gabor representation, and the Wigner distribution are discussed and compared. It is noted that the Wigner distribution gives superior joint resolution. Experimental results in the area of texture segmentation and Gestalt grouping using the Wigner distribution are presented, proving the feasibility of using s/sf representations for low-level (early, preattentive) vision.