Showing papers on "Feature extraction published in 1996"


Journal ArticleDOI
TL;DR: Comparisons with other multiresolution texture features using the Brodatz texture database indicate that the Gabor features provide the best pattern retrieval accuracy.
Abstract: Image content based retrieval is emerging as an important research area with application to digital libraries and multimedia databases. The focus of this paper is on the image processing aspects and in particular using texture information for browsing and retrieval of large image data. We propose the use of Gabor wavelet features for texture analysis and provide a comprehensive experimental evaluation. Comparisons with other multiresolution texture features using the Brodatz texture database indicate that the Gabor features provide the best pattern retrieval accuracy. An application to browsing large air photos is illustrated.
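
A minimal sketch of this kind of Gabor texture descriptor: filter the image with a bank of Gabor kernels spanning several scales and orientations, and keep the mean and standard deviation of each response magnitude as the feature vector. This is not the authors' implementation; the kernel design and all parameter values are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(freq, theta, sigma):
    """Complex Gabor kernel: a Gaussian envelope modulating a complex sinusoid."""
    half = int(3 * sigma)
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * xr)

def gabor_features(img, n_scales=4, n_orient=6):
    """Mean/std of response magnitude for every scale-orientation pair."""
    feats = []
    for s in range(n_scales):
        freq = 0.4 / 2 ** s                 # octave-spaced centre frequencies
        sigma = 0.56 / freq                 # keep relative bandwidth constant
        for k in range(n_orient):
            kern = gabor_kernel(freq, np.pi * k / n_orient, sigma)
            resp = np.abs(fftconvolve(img, kern, mode='same'))
            feats += [resp.mean(), resp.std()]
    return np.array(feats)                  # 48-D for 4 scales x 6 orientations
```

Retrieval then reduces to nearest-neighbor search over these vectors, typically after per-dimension normalization across the database.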

4,017 citations


Journal ArticleDOI
TL;DR: This paper presents an overview of feature extraction methods for off-line recognition of segmented (isolated) characters, discussing their invariance properties, reconstructability, and behavior under the expected distortions and variability of the characters.

1,376 citations


Journal ArticleDOI
TL;DR: A novel observation model based on motion compensated subsampling is proposed for a video sequence and Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence.
Abstract: The human visual system appears to be capable of temporally integrating information in a video sequence in such a way that the perceived spatial resolution of a sequence appears much higher than the spatial resolution of an individual frame. While the mechanisms in the human visual system that do this are unknown, the effect is not too surprising given that temporally adjacent frames in a video sequence contain slightly different, but unique, information. This paper addresses the use of both the spatial and temporal information present in a short image sequence to create a single high-resolution video frame. A novel observation model based on motion compensated subsampling is proposed for a video sequence. Since the reconstruction problem is ill-posed, Bayesian restoration with a discontinuity-preserving prior image model is used to extract a high-resolution video still given a short low-resolution sequence. Estimates computed from a low-resolution image sequence containing a subpixel camera pan show dramatic visual and quantitative improvements over bilinear, cubic B-spline, and Bayesian single frame interpolations. Visual and quantitative improvements are also shown for an image sequence containing objects moving with independent trajectories. Finally, the video frame extraction algorithm is used for the motion-compensated scan conversion of interlaced video data, with a visual comparison to the resolution enhancement obtained from progressively scanned frames.
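
A heavily simplified sketch of the reconstruction idea, assuming the motion is a known global shift on the high-resolution grid and substituting a quadratic smoothness prior for the paper's discontinuity-preserving prior; the box-downsampling observation model and all parameters are illustrative.

```python
import numpy as np

def downsample(x, f):
    """Box-average f x f blocks (the subsampling part of the observation model)."""
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def upsample(y, f):
    return np.repeat(np.repeat(y, f, axis=0), f, axis=1)

def super_resolve(frames, shifts, f, n_iter=100, lam=0.05, step=0.2):
    """MAP-style gradient descent for frames[k] ~ downsample(shift(x, shifts[k]), f).
    shifts are integer displacements on the HR grid, i.e. subpixel LR motion."""
    x = upsample(frames[0], f)                      # initial HR estimate
    for _ in range(n_iter):
        grad = np.zeros_like(x)
        for y, (dy, dx) in zip(frames, shifts):
            r = downsample(np.roll(x, (dy, dx), axis=(0, 1)), f) - y
            grad += np.roll(upsample(r, f) / f ** 2, (-dy, -dx), axis=(0, 1))
        # quadratic smoothness prior; the paper's discontinuity-preserving
        # prior would saturate this penalty at strong edges instead
        lap = (np.roll(x, 1, 0) + np.roll(x, -1, 0) +
               np.roll(x, 1, 1) + np.roll(x, -1, 1) - 4 * x)
        x -= step * (grad - lam * lap)
    return x
```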

1,058 citations


Proceedings ArticleDOI
TL;DR: The Virage engine provides an open framework for developers to 'plug-in' primitives to solve specific image management problems and can be utilized to address high-level problems as well, such as automatic, unsupervised keyword assignment, or image classification.
Abstract: Until recently, the management of large image databases has relied exclusively on manually entered alphanumeric annotations. Systems are beginning to emerge in both the research and commercial sectors based on 'content-based' image retrieval, a technique which explicitly manages image assets by directly representing their visual attributes. The Virage image search engine provides an open framework for building such systems. The Virage engine expresses visual features as image 'primitives.' Primitives can be very general (such as color, shape, or texture) or quite domain specific (face recognition, cancer cell detection, etc.). The basic philosophy underlying this architecture is a transformation from the data-rich representation of explicit image pixels to a compact, semantic-rich representation of visually salient characteristics. In practice, the design of such primitives is non-trivial, and is driven by a number of conflicting real-world constraints (e.g. computation time vs. accuracy). The Virage engine provides an open framework for developers to 'plug-in' primitives to solve specific image management problems. The architecture has been designed to support both static images and video in a unified paradigm. The infrastructure provided by the Virage engine can be utilized to address high-level problems as well, such as automatic, unsupervised keyword assignment, or image classification.
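
A toy sketch of the plug-in idea (not Virage's actual API): a primitive bundles a feature extractor with a comparison function, and the engine scores image pairs by a weighted combination of all registered primitives.

```python
from typing import Callable, Dict
import numpy as np

class Primitive:
    """A pluggable visual primitive: maps pixels to a compact feature vector
    and scores the similarity of two such vectors."""
    def __init__(self, name: str,
                 extract: Callable[[np.ndarray], np.ndarray],
                 compare: Callable[[np.ndarray, np.ndarray], float]):
        self.name, self.extract, self.compare = name, extract, compare

REGISTRY: Dict[str, Primitive] = {}

def register(p: Primitive) -> None:
    REGISTRY[p.name] = p

# a general-purpose primitive: global color histogram with intersection score
register(Primitive(
    "color",
    extract=lambda img: np.histogramdd(
        img.reshape(-1, 3), bins=(8, 8, 8))[0].ravel() / img[..., 0].size,
    compare=lambda a, b: float(np.minimum(a, b).sum()),
))

def score(img_a: np.ndarray, img_b: np.ndarray, weights: Dict[str, float]) -> float:
    """Weighted similarity over the registered primitives chosen by the caller."""
    return sum(w * REGISTRY[n].compare(REGISTRY[n].extract(img_a),
                                       REGISTRY[n].extract(img_b))
               for n, w in weights.items())
```

A domain-specific primitive (face similarity, cell detection) would be registered the same way, which is the sense in which the framework is "open."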

921 citations


Journal ArticleDOI
TL;DR: Holistic approaches that avoid segmentation by recognizing entire character strings as units are described, alongside classical methods that partition the input image into subimages, which are then classified.
Abstract: Character segmentation has long been a critical area of the OCR process. The higher recognition rates for isolated characters vs. those obtained for words and connected character strings well illustrate this fact. A good part of recent progress in reading unconstrained printed and written text may be ascribed to more insightful handling of segmentation. This paper provides a review of these advances. The aim is to provide an appreciation for the range of techniques that have been developed, rather than to simply list sources. Segmentation methods are listed under four main headings. What may be termed the "classical" approach consists of methods that partition the input image into subimages, which are then classified. The operation of attempting to decompose the image into classifiable units is called "dissection." The second class of methods avoids dissection, and segments the image either explicitly, by classification of prespecified windows, or implicitly by classification of subsets of spatial features collected from the image as a whole. The third strategy is a hybrid of the first two, employing dissection together with recombination rules to define potential segments, but using classification to select from the range of admissible segmentation possibilities offered by these subimages. Finally, holistic approaches that avoid segmentation by recognizing entire character strings as units are described.

880 citations


Proceedings ArticleDOI
18 Jun 1996
TL;DR: A new space-sweep approach to true multi-image matching is presented that simultaneously determines 2D feature correspondences and the 3D positions of feature points in the scene.
Abstract: The problem of determining feature correspondences across multiple views is considered. The term "true multi-image" matching is introduced to describe techniques that make full and efficient use of the geometric relationships between multiple images and the scene. A true multi-image technique must generalize to any number of images, be of linear algorithmic complexity in the number of images, and use all the images in an equal manner. A new space-sweep approach to true multi-image matching is presented that simultaneously determines 2D feature correspondences and the 3D positions of feature points in the scene. The method is illustrated on a seven-image matching example from the aerial image domain.
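
A toy sketch of the sweep under strong simplifying assumptions (1D pinhole cameras translated along a common axis, ideal feature coordinates): for each swept depth plane, every image feature is back-projected onto the plane and votes into a grid; cells hit by enough cameras yield 2D correspondences and 3D positions at the same time. Names and thresholds are illustrative.

```python
import numpy as np

def space_sweep(features_u, cam_x, f, depths, x0, cell, n_cells, min_views=3):
    """features_u: per-image arrays of horizontal feature coordinates u.
    cam_x: camera centre positions; projection model u = f * (X - cam_x) / z,
    so a feature back-projects onto plane z at X = u * z / f + cam_x."""
    found = []
    for z in depths:                              # sweep the plane in depth
        seen = [set() for _ in range(n_cells)]    # which cameras hit each cell
        for i, (us, cx) in enumerate(zip(features_u, cam_x)):
            X = us * z / f + cx                   # back-project onto plane z
            idx = np.round((X - x0) / cell).astype(int)
            for j in idx[(idx >= 0) & (idx < n_cells)]:
                seen[j].add(i)
        for j, s in enumerate(seen):
            if len(s) >= min_views:               # consistent across views
                found.append((x0 + j * cell, z, len(s)))
    return found                                  # (position, depth, #views)
```

The cost is linear in the number of images, and all images contribute symmetrically, matching the paper's criteria for a true multi-image method.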

653 citations


Journal ArticleDOI
TL;DR: Several variants of the algorithm are developed that avoid classical regularization while imposing several global cohesiveness constraints; this novel approach has the advantage of guaranteeing that solutions minimize the original cost function and preserve discontinuities.

533 citations


Journal ArticleDOI
TL;DR: A methodological framework is developed, along with algorithms that employ two types of feature-based compact representations, that is, representations that combine feature extraction with a relatively simple approximation architecture.
Abstract: We develop a methodological framework and present a few different ways in which dynamic programming and compact representations can be combined to solve large scale stochastic control problems. In particular, we develop algorithms that employ two types of feature-based compact representations; that is, representations that involve feature extraction and a relatively simple approximation architecture. We prove the convergence of these algorithms and provide bounds on the approximation error. As an example, one of these algorithms is used to generate a strategy for the game of Tetris. Furthermore, we provide a counter-example illustrating the difficulties of integrating compact representations with dynamic programming, which exemplifies the shortcomings of certain simple approaches.
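
An illustrative sketch of one scheme in this family: approximate value iteration with a linear feature-based architecture, where each Bellman backup is projected onto the span of the features by least squares. A random toy MDP stands in for a real problem; as the paper's counter-example warns, such iterations are not guaranteed to converge in general.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_feats, gamma = 50, 4, 8, 0.95

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> s'
R = rng.normal(size=(n_states, n_actions))                        # rewards
Phi = rng.normal(size=(n_states, n_feats))  # feature matrix (hand-crafted in practice)

w = np.zeros(n_feats)
for _ in range(200):
    V = Phi @ w                                 # compact value estimate
    Q = R + gamma * P @ V                       # one-step lookahead
    target = Q.max(axis=1)                      # Bellman backup
    w, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # project onto features
policy = (R + gamma * P @ (Phi @ w)).argmax(axis=1)
```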

527 citations


Proceedings ArticleDOI
03 Oct 1996
TL;DR: A new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour, is presented, which obtains classification performance that is close to human performance on the task.
Abstract: The paper explores several statistical pattern recognition techniques to classify utterances according to their emotional content. The authors have recorded a corpus containing emotional speech with over 1,000 utterances from different speakers. They present a new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour. To make maximal use of the limited amount of training data available, they introduce a novel pattern recognition technique: majority voting of subspace specialists. Using this technique, they obtain classification performance that is close to human performance on the task.
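
A minimal sketch of the prosodic front end described: fit a smoothing spline to the pitch contour, then summarize the smoothed contour and its derivative. The feature set and smoothing factor are illustrative, not the authors' exact choices.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def prosodic_features(t, f0, smooth=1.0):
    """t: frame times (s, strictly increasing); f0: pitch (Hz) of voiced frames."""
    spline = UnivariateSpline(t, f0, s=smooth * len(t))  # smoothing spline fit
    f0_smooth = spline(t)
    slope = spline.derivative()(t)
    return np.array([
        f0_smooth.mean(), f0_smooth.std(),      # pitch level and spread
        f0_smooth.max() - f0_smooth.min(),      # pitch range
        slope.mean(), slope.std(),              # contour dynamics
        np.abs(slope).max(),                    # steepest pitch movement
    ])
```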

521 citations


Proceedings ArticleDOI
14 Oct 1996
TL;DR: In this paper, a parametrized image motion of planar patches is constrained to enforce articulated motion and is solved for directly using a robust estimation technique, which provides a rich and concise description of the activity that can be used for recognition.
Abstract: We extend the work of Black and Yacoob (1995) on the tracking and recognition of human facial expressions using parametrized models of optical flow to deal with the articulated motion of human limbs. We define a "cardboard person model" in which a person's limbs are represented by a set of connected planar patches. The parametrized image motion of these patches is constrained to enforce articulated motion and is solved for directly using a robust estimation technique. The recovered motion parameters provide a rich and concise description of the activity that can be used for recognition. We propose a method for performing view-based recognition of human activities from the optical flow parameters that extends previous methods to cope with the cyclical nature of human motion. We illustrate the method with examples of tracking human legs over long image sequences.

505 citations


Journal ArticleDOI
TL;DR: The corner detection scheme introduced in this paper provides accurate information about the corners, accurately locates the templates in relation to the eye images, and greatly reduces the processing time for the templates.

Journal ArticleDOI
TL;DR: Linear predictive (LP) analysis, the first step of feature extraction, is discussed, and various robust cepstral features derived from LP coefficients are described, including the affine transform, which is a feature transformation approach that integrates mismatch to simultaneously combat both channel and noise distortion.
Abstract: The future commercialization of speaker- and speech-recognition technology is impeded by the large degradation in system performance due to environmental differences between training and testing conditions. This is known as the "mismatched condition." Studies have shown [1] that most contemporary systems achieve good recognition performance if the conditions during training are similar to those during operation (matched conditions). Frequently, mismatched conditions are present in which the performance is dramatically degraded as compared to the ideal matched conditions. A common example of this mismatch is when training is done on clean speech and testing is performed on noise- or channel-corrupted speech. Robust speech techniques [2] attempt to maintain the performance of a speech processing system under such diverse conditions of operation. This article presents an overview of current speaker-recognition systems and the problems encountered in operation, and it focuses on the front-end feature extraction process of robust speech techniques as a method of improvement. Linear predictive (LP) analysis, the first step of feature extraction, is discussed, and various robust cepstral features derived from LP coefficients are described. Also described is the affine transform, which is a feature transformation approach that integrates mismatch to simultaneously combat both channel and noise distortion.
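
A compact sketch of the front end discussed: autocorrelation LP analysis via the Levinson-Durbin recursion, followed by the standard recursion converting LP coefficients to cepstral coefficients. The windowing, order, and sizes are illustrative.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LP analysis via Levinson-Durbin.
    Returns coefficients of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p."""
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:][:order + 1]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[1:i][::-1]) / err    # reflection coefficient
        a[1:i] += k * a[1:i][::-1]
        a[i] = k
        err *= 1.0 - k * k                           # residual energy
    return a, err

def lp_cepstrum(a, n_ceps):
    """Cepstral coefficients of the all-pole model 1/A(z) (standard recursion)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        c[n] = -(a[n] if n <= p else 0.0)
        for k in range(1, n):
            if n - k <= p:
                c[n] -= (k / n) * c[k] * a[n - k]
    return c[1:]
```

Robust variants then post-process these cepstra (for example, by mean subtraction or the affine transform described above) to reduce the train/test mismatch.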

Journal ArticleDOI
TL;DR: A more sophisticated handwriting recognition system that achieves a writer independent recognition rate of 94.5% on 3,823 unconstrained handwritten word samples from 18 writers covering a 32 word vocabulary is built.
Abstract: Hidden Markov model (HMM) based recognition of handwriting is now quite common, but the incorporation of HMM's into a complex stochastic language model for handwriting recognition is still in its infancy. We have taken advantage of developments in the speech processing field to build a more sophisticated handwriting recognition system. The pattern elements of the handwriting model are subcharacter stroke types modeled by HMMs. These HMMs are concatenated to form letter models, which are further embedded in a stochastic language model. In addition to better language modeling, we introduce new handwriting recognition features of various kinds. Some of these features have invariance properties, and some are segmental, covering a larger region of the input pattern. We have achieved a writer independent recognition rate of 94.5% on 3,823 unconstrained handwritten word samples from 18 writers covering a 32 word vocabulary.

Journal ArticleDOI
TL;DR: The authors show that very good diagnostic rates can be obtained using unconventional classifiers trained on actual patient data, with the classifiers designed for minimum classification error, ease of implementation and learning, and flexibility for future modification.
Abstract: Visual criteria for diagnosing diffused liver diseases from ultrasound images can be assisted by computerized tissue classification. Feature extraction algorithms are proposed in this paper to extract the tissue characterization parameters from liver images. The resulting parameter set is further processed to obtain the minimum number of parameters which represent the most discriminating pattern space for classification. This preprocessing step has been applied to over 120 distinct pathology-investigated cases to obtain the learning data for classification. The extracted features are divided into independent training and test sets, and are used to develop and compare both statistical and neural classifiers. The optimal criteria for these classifiers are set to have minimum classification error, ease of implementation and learning, and the flexibility for future modifications. Various algorithms of classification based on statistical and neural network methods are presented and tested. The authors show that very good diagnostic rates can be obtained using unconventional classifiers trained on actual patient data.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: A Gabor feature representation for textured images is proposed, and its performance in pattern retrieval is evaluated on a large texture image database, and these features compare favorably with other existing texture representations.
Abstract: This paper addresses two important issues related to texture pattern retrieval: feature extraction and similarity search. A Gabor feature representation for textured images is proposed, and its performance in pattern retrieval is evaluated on a large texture image database. These features compare favorably with other existing texture representations. A simple hybrid neural network algorithm is used to learn similarity by simple clustering in the texture feature space. With similarity learning, the performance of similar-pattern retrieval improves significantly. An important aspect of this work is its application to real image data. Texture feature extraction with similarity learning is used to search through large aerial photographs. Feature clustering enables efficient search of the database, as our experimental results indicate.

Proceedings ArticleDOI
18 Jun 1996
TL;DR: The results demonstrate that even in the absence of multiple training examples for each class, it is sometimes possible to infer from a statistical model of training data a significantly improved distance function for use in pattern recognition.
Abstract: We consider the problem of feature-based face recognition in the setting where only a single example of each face is available for training. The mixture-distance technique we introduce achieves a recognition rate of 95% on a database of 685 people in which each face is represented by 30 measured distances. This is currently the best recorded recognition rate for a feature-based system applied to a database of this size. By comparison, nearest neighbor search using Euclidean distance yields 84%. In our work a novel distance function is constructed based on local second order statistics as estimated by modeling the training data as a mixture of normal densities. We report on the results from mixtures of several sizes. We demonstrate that a flat mixture of mixtures performs as well as the best model and therefore represents an effective solution to the model selection problem. A mixture perspective is also taken for individual Gaussians to choose between first order (variance) and second order (covariance) models. Here an approximation to flat combination is proposed and seen to perform well in practice. Our results demonstrate that even in the absence of multiple training examples for each class, it is sometimes possible to infer from a statistical model of training data a significantly improved distance function for use in pattern recognition.
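
One plausible reading of the mixture-distance idea as a sketch (not the authors' exact construction): fit a mixture of Gaussians to the training vectors and measure query-to-gallery distances with the local second-order statistics of the component most responsible for the stored vector.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X_train = rng.normal(size=(685, 30))      # stand-in for the 30 measured distances
gmm = GaussianMixture(n_components=5, covariance_type='full',
                      random_state=0).fit(X_train)

def mixture_distance(x, y):
    """Mahalanobis distance under the mixture component that best explains y."""
    k = gmm.predict(y.reshape(1, -1))[0]
    P = gmm.precisions_[k]                # inverse covariance of component k
    d = x - y
    return float(np.sqrt(d @ P @ d))
```

Recognition is then nearest-neighbor search with mixture_distance in place of the Euclidean metric.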

Patent
Katsumasa Onda
09 Apr 1996
TL;DR: In this patent, a matching operation is performed by comparing the image features within a one-dimensional window on the left image with the corresponding image features on the right image; a histogram for each block is then created from the disparities obtained by the matching operation, and the disparity corresponding to the peak of the histogram is identified as a valid disparity representing that block.
Abstract: In the image pickup phase (A), right and left images are taken in through two image-pickup devices (S101, S102). Then, in the next feature extraction phase (B), the right and left images are respectively subjected to feature extraction (S103, S104). Thereafter, in the succeeding matching phase (C), the extracted features of the right and left images are compared to check how they match with each other (step S105). More specifically, in the matching phase (C), a one-dimensional window is set; this one-dimensional window is shifted along the left image in accordance with a predetermined scanning rule so as to successively set overlapping one-dimensional windows, and a matching operation is performed by comparing the image features within one window with the corresponding image features on the right image. Subsequently, in the disparity determination phase (D), the left image is dissected or divided into plural blocks each having a predetermined size, a histogram for each block is created from the disparities obtained by the matching operation based on one-dimensional windows involving the pixels of the block concerned, and the disparity corresponding to the peak of the histogram thus obtained is identified as a valid disparity representing that block (S106).
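
A slow but direct sketch of the matching and disparity-determination phases: per-pixel disparities from 1D-window SAD matching along each scan line, then one histogram peak per block. Window, block, and search-range sizes are illustrative.

```python
import numpy as np

def block_disparities(left, right, win=11, max_d=32, block=16):
    """Per-block disparity by histogram voting over 1D-window matches."""
    h, w = left.shape
    half = win // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(half, w - half):
            patch = left[y, x - half:x + half + 1]
            best, best_cost = 0, np.inf
            for d in range(min(max_d, x - half) + 1):
                cand = right[y, x - d - half:x - d + half + 1]
                cost = np.abs(patch - cand).sum()    # SAD over the 1D window
                if cost < best_cost:
                    best, best_cost = d, cost
            disp[y, x] = best
    out = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            blk = disp[by * block:(by + 1) * block, bx * block:(bx + 1) * block]
            hist = np.bincount(blk.ravel(), minlength=max_d + 1)
            out[by, bx] = hist.argmax()              # peak = valid disparity
    return out
```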

Proceedings ArticleDOI
07 May 1996
TL;DR: It is demonstrated that the binary texture features provide excellent performance in image query response time while providing highly effective texture discriminability, accuracy in spatial localization and capability for extraction from compressed data representations.
Abstract: Digital image and video libraries require new algorithms for the automated extraction and indexing of salient image features. Texture features provide one important cue for the visual perception and discrimination of image content. We propose a new approach for automated content extraction that allows for efficient database searching using texture features. The algorithm automatically extracts texture regions from image spatial-frequency data which are represented by binary texture feature vectors. We demonstrate that the binary texture features provide excellent performance in image query response time while providing highly effective texture discriminability, accuracy in spatial localization and capability for extraction from compressed data representations. We present the binary texture feature extraction and indexing technique and examine searching by texture on a database of 500 images.
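
A sketch of one way to realize binary texture vectors from spatial-frequency data (the subband layout is illustrative, not the authors'): threshold ring-and-wedge subband energies of a region's spectrum into bits, then rank database entries by Hamming distance.

```python
import numpy as np

def binary_texture_vector(region, n_rings=4, n_wedges=8):
    """Threshold spatial-frequency subband energies into a bit vector."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(region))) ** 2
    h, w = F.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    rad = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    ang = np.arctan2(yy, xx) % np.pi                 # fold symmetric halves
    energies = []
    for i in range(n_rings):
        for j in range(n_wedges):
            m = ((rad >= i / n_rings) & (rad < (i + 1) / n_rings) &
                 (ang >= j * np.pi / n_wedges) & (ang < (j + 1) * np.pi / n_wedges))
            energies.append(F[m].mean() if m.any() else 0.0)
    energies = np.array(energies)
    return (energies > energies.mean()).astype(np.uint8)  # 1 = active subband

def hamming_rank(query_bits, db_bits):
    """db_bits: (n, n_bits) array; returns database indices by Hamming distance."""
    return np.argsort((db_bits != query_bits).sum(axis=1))
```

Bit vectors are what make the fast query times plausible: Hamming distance over packed bits is far cheaper than floating-point metrics.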

Proceedings ArticleDOI
18 Jun 1996
TL;DR: Four measures of image organizational change are proposed which can be used to monitor construction activity based on the thesis that the progress of construction will see a change in the individual image feature attributes as well as an evolution in the relationships among these features.
Abstract: We propose four measures of image organizational change which can be used to monitor construction activity. The measures are based on the thesis that the progress of construction will see a change in the individual image feature attributes as well as an evolution in the relationships among these features. This change in the relationships is captured by the eigenvalues and eigenvectors of the relation graph embodying the organization among the image features. We demonstrate the ability of the measures to differentiate between no development, the onset of construction, and full development, on the available real test image set.
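
A minimal sketch of the spectral half of such a measure, assuming the organization at each date is encoded as a symmetric weighted adjacency matrix over a common set of image features; the normalization is an illustrative choice.

```python
import numpy as np

def spectral_change(adj_t0, adj_t1):
    """Distance between relation-graph spectra at two acquisition dates."""
    e0 = np.sort(np.linalg.eigvalsh(adj_t0))   # eigenvalues, ascending
    e1 = np.sort(np.linalg.eigvalsh(adj_t1))
    return float(np.linalg.norm(e0 - e1)) / len(e0)
```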

Proceedings ArticleDOI
TL;DR: The VAMSplit R-tree provided better overall performance than all competing structures the authors tested for main memory and secondary memory applications, and modest improvements relative to optimized k-d tree variants.
Abstract: Efficient indexing support is essential to allow content-based image and video databases using similarity-based retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in depth look at this problem. One of the major difficulties in solving this problem is the high dimension (6-100) of the feature vectors that are used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found are most useful in practice, including the use of approximate nearest neighbor algorithms. We also present a variant of the optimized k-d tree we call the VAM k-d tree, and provide algorithms to create an optimized R-tree we call the VAMSplit R-tree. We found that the VAMSplit R-tree provided better overall performance than all competing structures we tested for main memory and secondary memory applications. We observed large improvements in performance relative to the R*-tree and SS-tree in secondary memory applications, and modest improvements relative to optimized k-d tree variants.
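
The VAM k-d tree and VAMSplit R-tree construction details are beyond a short sketch, but the access pattern they accelerate looks like this, using SciPy's stock k-d tree; the eps argument gives the approximate nearest-neighbor trade-off the paper highlights.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
db = rng.normal(size=(50_000, 20))      # 50k feature vectors, 20 dimensions
tree = cKDTree(db)                      # median-split k-d tree index

query = rng.normal(size=20)
dist, idx = tree.query(query, k=10)     # exact 10-nearest-neighbor search

# approximate search: returned neighbors are within (1 + eps) of optimal,
# trading a little accuracy for visiting far fewer tree nodes
dist_a, idx_a = tree.query(query, k=10, eps=0.5)
```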

Journal ArticleDOI
TL;DR: A neural network texture classification method, introduced as a generalization of the multichannel filtering method, is proposed and successfully applied to locating barcodes in images and segmenting a printed page into text, graphics, and background.

Abstract: A neural network texture classification method is proposed in this paper. The approach is introduced as a generalization of the multichannel filtering method. Instead of using a general filter bank, a neural network is trained to find a minimal set of specific filters, so that both the feature extraction and classification tasks are performed by the same unified network. The authors compute the error rates for different network parameters, and show the convergence speed of training and node pruning algorithms. The proposed method is demonstrated in several texture classification experiments. It is successfully applied in the tasks of locating barcodes in images and segmenting a printed page into text, graphics, and background. Compared with the traditional multichannel filtering method, the neural network approach allows one to perform the same texture classification or segmentation task more efficiently. Extensions of the method, as well as its limitations, are discussed in the paper.

Proceedings ArticleDOI
14 Oct 1996
TL;DR: Ten different feature vectors are tested in a gesture recognition task which utilizes 3D data gathered in real-time from stereo video cameras, and HMMs for learning and recognition of gestures.
Abstract: Ten different feature vectors are tested in a gesture recognition task which utilizes 3D data gathered in real-time from stereo video cameras, and HMMs for learning and recognition of gestures. Results indicate velocity features are superior to positional features, and partial rotational invariance is sufficient for good performance.

Proceedings ArticleDOI
C. Podilchuk, Xiaoyu Zhang
07 May 1996
TL;DR: An automatic face recognition system which is VQ-based is described and the effects of feature selection, feature dimensionality and codebook size on recognition performance in the VQ framework are examined.
Abstract: Face recognition has many applications ranging from security access to video indexing by content. We describe an automatic face recognition system which is VQ-based and examine the effects of feature selection, feature dimensionality and codebook size on recognition performance in the VQ framework. In particular, we examine DCT-based feature vectors in such a system. DCT-based feature vectors have the additional appeal that the recognition can be performed directly on the bitstream of compressed images which are DCT-based. The system described consists of three parts: a preprocessing step to segment the face, the feature selection process and the classification. Recognition rates for a database of 500 images show promising results.
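
An illustrative sketch of a VQ-based recognizer with DCT block features; the block size, coefficient selection, per-person codebooks, and use of k-means as the vector quantizer are assumptions, not the authors' exact design.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.cluster import KMeans

def dct_features(face, block=8, keep=6):
    """Low-frequency DCT coefficients from each 8x8 block of a face image."""
    h, w = face.shape
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            C = dctn(face[y:y + block, x:x + block], norm='ortho')
            feats.append(C[:keep, :keep].ravel())    # keep low frequencies
    return np.array(feats)

def train_codebooks(faces_by_person, size=32):
    """One VQ codebook (k-means centroids) per enrolled person."""
    return {p: KMeans(n_clusters=size, n_init=4, random_state=0)
               .fit(np.vstack([dct_features(f) for f in faces]))
            for p, faces in faces_by_person.items()}

def identify(face, codebooks):
    """Assign the identity whose codebook quantizes the face with least distortion."""
    v = dct_features(face)
    def distortion(km):
        d = ((v[:, None, :] - km.cluster_centers_[None]) ** 2).sum(-1)
        return d.min(axis=1).mean()
    return min(codebooks, key=lambda p: distortion(codebooks[p]))
```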

Journal ArticleDOI
TL;DR: An algorithm that generates a vertically aligned stereo pair by warped resampling is described, which uses grey scale image matching between the components of the stereo pair but confined to feature points.
Abstract: The assumption that epipolar lines are parallel to image scan lines is made in many algorithms for stereo analysis. If valid, it enables the search for corresponding image features to be confined to one dimension and, hence, simplified. An algorithm that generates a vertically aligned stereo pair by warped resampling is described. The method uses grey scale image matching between the components of the stereo pair but confined to feature points.

Patent
30 Sep 1996
TL;DR: The Rainbow Stereo 3D Camera as mentioned in this paper exploits the projected color light with a spatially distributed wavelength spectrum on the surface of objects in the scene to provide a high speed, low-cost, multi-mode 3D surface profile measurement method.
Abstract: The target of the present invention is to provide a high-speed, low-cost, multi-mode three-dimensional (3D) surface profile measurement method. The proposed Rainbow Stereo 3D Camera exploits projected color light with a spatially distributed wavelength spectrum on the surface of objects in the scene. Multiple color imaging sensors separated by a baseline distance are used to capture stereo pair images of the scene at the camera's frame rate. The 3D depth values are calculated using the triangulation principle by finding pixels corresponding to a common color feature in both images. Unlike conventional stereo correspondence matching algorithms, which require feature extraction from a group of pixels, the proposed method utilizes a projected rainbow color pattern as unique landmarks for each pixel for correspondence registration. Essentially, the colors of pixels in a pair of stereo images are used as a "token" to perform the stereo match. Searching for corresponding points in a pair of stereo images becomes a straightforward pixel-to-pixel color matching. A simple and efficient 3D triangulation algorithm can be formulated to generate full frames of 3D images at high speed.

Journal ArticleDOI
TL;DR: The nature of ISAR imaging of ships, and single-frame and multiple-frame techniques for segmentation, feature extraction, and classification are described, and results are shown which illustrate a capability for automatic recognition of ISAR ship imagery.
Abstract: Inverse synthetic aperture radar (ISAR) produces images of ships at sea which human operators can be trained to recognize. Because ISAR uses the ship's own varying angular motions (roll, pitch, and yaw) for cross-range resolution, the viewing aspect and cross-range scale factor are continually changing on time scales of a few seconds. This and other characteristics of ISAR imaging make the problem of automatic recognition of ISAR images quite distinct from the recognition of optical images. The nature of ISAR imaging of ships, and single-frame and multiple-frame techniques for segmentation, feature extraction, and classification are described. Results are shown which illustrate a capability for automatic recognition of ISAR ship imagery.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: A real-time wide-baseline stereo person tracking system which can calibrate itself by watching a moving person and can subsequently track people's heads and hands with RMS errors of 1-2 cm in translation and 2 degrees in rotation.
Abstract: We describe a method for estimation of 3D geometry from 2D blob features. Blob features are clusters of similar pixels in the image plane and can arise from similarity of color, texture, motion and other signal-based metrics. The motivation for considering such features comes from recent successes in real-time extraction and tracking of such blob features in complex cluttered scenes in which traditional feature finders fail, e.g. scenes containing moving people. We use nonlinear modeling and a combination of iterative and recursive estimation methods to recover 3D geometry from blob correspondences across multiple images. The 3D geometry includes the 3D shapes, translations, and orientations of blobs and the relative orientation of the cameras. Using this technique, we have developed a real-time wide-baseline stereo person tracking system which can calibrate itself by watching a moving person and can subsequently track people's heads and hands with RMS errors of 1-2 cm in translation and 2 degrees in rotation. The blob formulation is efficient and reliable, running at 20-30 Hz on a pair of SGI Indy R4400 workstations with no special hardware.

Proceedings ArticleDOI
16 Sep 1996
TL;DR: This paper performs face localization based on the observation that human faces are characterized by their oval shape and skin-color, also in the case of varying light conditions, and segment faces by evaluating shape and color information.
Abstract: Recognition of human faces from still images or image sequences is a research field of rapidly increasing interest. First, facial regions and facial features like eyes and mouth have to be extracted. In the present paper we propose an approach that copes with the problems of these first two steps. We perform face localization based on the observation that human faces are characterized by their oval shape and skin color, even under varying lighting conditions. To that end, we segment faces by evaluating shape and color (HSV) information. Face hypotheses are then verified by searching for facial features inside the face-like regions. This is done by applying morphological operations and minima localization to intensity images.
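
A minimal sketch of the localization step: threshold skin-like pixels in HSV, clean the mask with morphological opening and closing, and keep connected regions with roughly oval, upright proportions. The threshold ranges are illustrative guesses, not the authors' calibrated values.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv
from scipy import ndimage

def skin_mask(rgb):
    """rgb: float image in [0, 1]; returns a boolean skin-color mask."""
    hsv = rgb_to_hsv(rgb)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    mask = ((h < 0.14) | (h > 0.95)) & (s > 0.2) & (s < 0.7) & (v > 0.3)
    mask = ndimage.binary_opening(mask, np.ones((5, 5)))   # drop speckle
    mask = ndimage.binary_closing(mask, np.ones((9, 9)))   # fill small holes
    return mask

def face_candidates(mask, min_area=500):
    """Connected skin regions whose bounding box is roughly face-shaped."""
    labels, _ = ndimage.label(mask)
    boxes = []
    for sl in ndimage.find_objects(labels):
        hgt, wid = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
        if hgt * wid >= min_area and 0.8 <= hgt / wid <= 2.2:
            boxes.append(sl)
    return boxes
```

Each candidate box would then be verified by searching for eyes and mouth inside it, as the abstract describes.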

Proceedings ArticleDOI
TL;DR: This paper considers the detection of areas of interest and edges in images compressed using the discrete cosine transform (DCT) and shows how a measure based on certain DCT coefficients of a block can provide an indication of underlying activity.
Abstract: This paper examines the issue of direct extraction of low level features from compressed images. Specifically, we consider the detection of areas of interest and edges in images compressed using the discrete cosine transform (DCT). For interest areas, we show how a measure based on certain DCT coefficients of a block can provide an indication of underlying activity. For edges, we show using an ideal edge model how the relative values of different DCT coefficients of a block can be used to estimate the strength and orientation of an edge. Our experimental results indicate that coarse edge information from compressed images can be extracted up to 20 times faster than conventional edge detectors.
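
A sketch of the flavor of this computation (the exact coefficient combinations come from the paper's ideal edge model and are not reproduced here): per-block AC energy as the activity measure, plus a crude orientation estimate from the lowest-order vertical and horizontal coefficients.

```python
import numpy as np
from scipy.fft import dctn

def block_activity(img, block=8):
    """AC-coefficient energy and rough edge orientation for each block."""
    h, w = img.shape
    act = np.zeros((h // block, w // block))
    ori = np.zeros_like(act)
    for by in range(h // block):
        for bx in range(w // block):
            C = dctn(img[by * block:(by + 1) * block,
                         bx * block:(bx + 1) * block], norm='ortho')
            act[by, bx] = (C ** 2).sum() - C[0, 0] ** 2   # energy minus DC
            # C[0, 1] responds to horizontal variation, C[1, 0] to vertical
            ori[by, bx] = np.arctan2(C[1, 0], C[0, 1])
    return act, ori
```

In a JPEG or MPEG pipeline these coefficients are already available per block, which is where a large speedup over pixel-domain edge detectors becomes possible.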

Journal ArticleDOI
TL;DR: This method provides the physician with nonsubjective numerical values for four criteria of malignancy, based on shape and size analysis of the observed cells and, more specifically, on the use of geodesy.
Abstract: Presents a new method for automatic recognition of cancerous tissues from an image of a microscopic section. Based on shape and size analysis of the observed cells, this method provides the physician with nonsubjective numerical values for four criteria of malignancy. This automatic approach is based on mathematical morphology, and more specifically on the use of geodesy. The technique is used first to remove the background noise from the image, and then to segment the cell nuclei and analyze their shape, size, and texture. From the values of the extracted criteria, an automatic classification of the image (cancerous or not) is finally performed.
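
A small sketch of the geodesic ingredient, using grey-scale morphological (geodesic) reconstruction for h-dome extraction to suppress background; it assumes bright nuclei on a darker background and illustrative thresholds, not the authors' full pipeline.

```python
import numpy as np
from skimage.morphology import reconstruction

def nuclei_mask(img, h=0.3, thresh=0.05):
    """img: grey-level image scaled to [0, 1], nuclei brighter than background."""
    seed = np.clip(img - h, 0.0, 1.0)                  # lower the image by h
    background = reconstruction(seed, img, method='dilation')  # geodesic fill
    domes = img - background                           # peaks taller than h
    return domes > thresh                              # binary nuclei mask
```

Shape, size, and texture measurements on the resulting connected components would then feed the four malignancy criteria.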