Showing papers on "Feature (computer vision)" published in 2004


Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
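
A minimal sketch of the matching stage described above, assuming OpenCV's SIFT implementation is available (image paths are placeholders, and a brute-force matcher stands in for the paper's fast approximate nearest-neighbor search; the Hough clustering and pose-verification steps are omitted):

```python
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Nearest-neighbour matching with Lowe's ratio test: keep a match only
# when it is clearly closer than the second-best candidate.
matcher = cv2.BFMatcher(cv2.NORM_L2)
pairs = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in pairs if m.distance < 0.8 * n.distance]
print(f"{len(good)} distinctive matches")
```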

46,906 citations


Journal ArticleDOI
TL;DR: A new technique coined two-dimensional principal component analysis (2DPCA) is developed for image representation that is based on 2D image matrices rather than 1D vectors so the image matrix does not need to be transformed into a vector prior to feature extraction.
Abstract: In this paper, a new technique coined two-dimensional principal component analysis (2DPCA) is developed for image representation. As opposed to PCA, 2DPCA is based on 2D image matrices rather than 1D vectors so the image matrix does not need to be transformed into a vector prior to feature extraction. Instead, an image covariance matrix is constructed directly using the original image matrices, and its eigenvectors are derived for image feature extraction. To test 2DPCA and evaluate its performance, a series of experiments were performed on three face image databases: ORL, AR, and Yale face databases. The recognition rate across all trials was higher using 2DPCA than PCA. The experimental results also indicated that the extraction of image features is computationally more efficient using 2DPCA than PCA.
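
A short NumPy sketch of the core 2DPCA computation, under the assumption that all training images share one size:

```python
import numpy as np

def two_d_pca(images, k):
    """images: iterable of (h, w) arrays; returns the (w, k) projection
    matrix whose columns are the top-k eigenvectors of the image
    covariance matrix G = mean_i (A_i - mean)^T (A_i - mean)."""
    A = np.stack(images).astype(float)
    mean = A.mean(axis=0)
    G = sum((a - mean).T @ (a - mean) for a in A) / len(A)
    vals, vecs = np.linalg.eigh(G)            # G is symmetric, so eigh
    return vecs[:, np.argsort(vals)[::-1][:k]]

# Feature matrix of an image A (no vectorisation needed): Y = A @ X.
```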

3,439 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: This paper examines (and improves upon) the local image descriptor used by SIFT, and demonstrates that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation.
Abstract: Stable local feature detection and representation is a fundamental component of many image registration and object recognition algorithms. Mikolajczyk and Schmid (June 2003) recently evaluated a variety of approaches and identified the SIFT [D. G. Lowe, 1999] algorithm as being the most resistant to common image deformations. This paper examines (and improves upon) the local image descriptor used by SIFT. Like SIFT, our descriptors encode the salient aspects of the image gradient in the feature point's neighborhood; however, instead of using SIFT's smoothed weighted histograms, we apply principal components analysis (PCA) to the normalized gradient patch. Our experiments demonstrate that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation. We also present results showing that using these descriptors in an image retrieval application results in increased accuracy and faster matching.
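
A hedged sketch of the descriptor construction, assuming a PCA basis has already been learned offline from training gradient patches (not shown here):

```python
import numpy as np

def pca_descriptor(patch, basis):
    """patch: local image patch around a keypoint (41x41 in the paper);
    basis: (2*h*w, d) matrix of PCA eigenvectors learned offline."""
    gy, gx = np.gradient(patch.astype(float))
    g = np.concatenate([gx.ravel(), gy.ravel()])
    g /= np.linalg.norm(g) + 1e-12      # normalise for illumination
    return basis.T @ g                  # project to d dims (~20-36)
```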

3,325 citations


Journal ArticleDOI
TL;DR: This paper identifies some promising techniques for image retrieval according to standard principles, examines implementation procedures for each technique, and discusses its advantages and disadvantages.

1,910 citations


Journal ArticleDOI
TL;DR: Quantitative evaluation and comparison show that the proposed Bayesian framework for foreground object detection in complex environments provides much improved results.
Abstract: This paper addresses the problem of background modeling for foreground object detection in complex environments. A Bayesian framework that incorporates spectral, spatial, and temporal features to characterize the background appearance is proposed. Under this framework, the background is represented by the most significant and frequent features, i.e., the principal features , at each pixel. A Bayes decision rule is derived for background and foreground classification based on the statistics of principal features. Principal feature representation for both the static and dynamic background pixels is investigated. A novel learning method is proposed to adapt to both gradual and sudden "once-off" background changes. The convergence of the learning process is analyzed and a formula to select a proper learning rate is derived. Under the proposed framework, a novel algorithm for detecting foreground objects from complex environments is then established. It consists of change detection, change classification, foreground segmentation, and background maintenance. Experiments were conducted on image sequences containing targets of interest in a variety of environments, e.g., offices, public buildings, subway stations, campuses, parking lots, airports, and sidewalks. Good results of foreground detection were obtained. Quantitative evaluation and comparison with the existing method show that the proposed method provides much improved results.
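
A sketch of the Bayes decision rule as I read the paper: with P(f|v) = 1 - P(b|v), comparing the background and foreground posteriors reduces to a single inequality per pixel.

```python
def classify_pixel(p_v_given_b, p_b, p_v):
    """Background iff P(b|v) > P(f|v); by Bayes' rule this is
    P(v|b)P(b)/P(v) > 1/2, i.e. 2 * P(v|b) * P(b) > P(v). The three
    probabilities are maintained per pixel from the learned
    principal-feature statistics."""
    return "background" if 2 * p_v_given_b * p_b > p_v else "foreground"
```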

1,120 citations


Journal ArticleDOI
TL;DR: In this article, a new nonlinear process monitoring technique based on kernel principal component analysis (KPCA) is developed, which can efficiently compute principal components in high-dimensional feature spaces by means of integral operators and nonlinear kernel functions.
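
A minimal monitoring sketch using scikit-learn's KernelPCA as a stand-in for the paper's implementation (the data and the control threshold are placeholders):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(200, 10))      # stand-in operating data

kpca = KernelPCA(n_components=4, kernel="rbf", gamma=0.1)
T_train = kpca.fit_transform(X_normal)     # nonlinear components
var = T_train.var(axis=0)

def t2_statistic(x_new):
    """Hotelling-style T^2 in kernel feature space: large values flag
    abnormal conditions (the control limit, set from the training
    distribution, is omitted here)."""
    t = kpca.transform(x_new.reshape(1, -1))[0]
    return np.sum(t ** 2 / var)

print(t2_statistic(rng.normal(size=10)))
```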

927 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: This work shows how to do both automatic image annotation and retrieval (using one-word queries) from images and videos using a multiple Bernoulli relevance model, which significantly outperforms previously reported results on the task of image and video annotation.
Abstract: Retrieving images in response to textual queries requires some knowledge of the semantics of the picture. Here, we show how we can do both automatic image annotation and retrieval (using one-word queries) from images and videos using a multiple Bernoulli relevance model. The model assumes that a training set of images or videos along with keyword annotations is provided. Multiple keywords are provided for an image and the specific correspondence between a keyword and an image is not provided. Each image is partitioned into a set of rectangular regions and a real-valued feature vector is computed over these regions. The relevance model is a joint probability distribution of the word annotations and the image feature vectors and is computed using the training set. The word probabilities are estimated using a multiple Bernoulli model and the image feature probabilities using a non-parametric kernel density estimate. The model is then used to annotate images in a test set. We show experiments on both images from a standard Corel data set and a set of video key frames from NIST's Video TREC. Comparative experiments show that the model performs better than a model based on estimating word probabilities using the popular multinomial distribution. The results also show that our model significantly outperforms previously reported results on the task of image and video annotation.
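
A sketch of the Bernoulli word-probability estimate, assuming the usual beta-prior smoothing form; the paper's exact constants may differ:

```python
def word_prob(delta_w, n_w, n_train, mu=1.0):
    """Smoothed Bernoulli estimate of P(word | training image J):
    delta_w  - 1 if the word annotates J, else 0
    n_w      - number of training images annotated with the word
    n_train  - total number of training images
    mu       - smoothing (beta-prior) strength, a placeholder value."""
    return (mu * delta_w + n_w) / (mu + n_train)
```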

815 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set and the unread text is typically small and distant from the viewer.
Abstract: This paper gives an algorithm for detecting and reading text in natural images. The algorithm is intended for use by blind and visually impaired subjects walking through city scenes. We first obtain a dataset of city images taken by blind and normally sighted subjects. From this dataset, we manually label and extract the text regions. Next we perform statistical analysis of the text regions to determine which image features are reliable indicators of text and have low entropy (i.e. feature response is similar for all text images). We obtain weak classifiers by using joint probabilities for feature responses on and off text. These weak classifiers are used as input to an AdaBoost machine learning algorithm to train a strong classifier. In practice, we trained a cascade with 4 strong classifiers containing 79 features. An adaptive binarization and extension algorithm is applied to those regions selected by the cascade classifier. Commercial OCR software is used to read the text or reject it as a non-text region. The overall algorithm has a success rate of over 90% (evaluated by complete detection and reading of the text) on the test set and the unread text is typically small and distant from the viewer.
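
A hedged sketch of one strong-classifier stage, with scikit-learn's default decision-stump weak learners standing in for the paper's joint-probability weak classifiers (the training data here is synthetic):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Hypothetical training data: rows are image regions, columns are the
# 79 feature responses; y is 1 for text and 0 for non-text.
rng = np.random.default_rng(0)
X = rng.random((1000, 79))
y = (X[:, 0] + 0.1 * rng.standard_normal(1000) > 0.5).astype(int)

stage = AdaBoostClassifier(n_estimators=79).fit(X, y)
# A cascade chains several such strong classifiers; a region must pass
# every stage before adaptive binarisation and OCR are attempted.
print(stage.score(X, y))
```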

686 citations


Journal ArticleDOI
TL;DR: This paper proposes the concept of feature saliency and introduces an expectation-maximization algorithm to estimate it, in the context of mixture-based clustering, and extends the criterion and algorithm to simultaneously estimate the feature saliencies and the number of clusters.
Abstract: Clustering is a common unsupervised learning technique used to discover group structure in a set of data. While there exist many algorithms for clustering, the important issue of feature selection, that is, what attributes of the data should be used by the clustering algorithms, is rarely touched upon. Feature selection for clustering is difficult because, unlike in supervised learning, there are no class labels for the data and, thus, no obvious criteria to guide the search. Another important problem in clustering is the determination of the number of clusters, which clearly impacts and is influenced by the feature selection issue. In this paper, we propose the concept of feature saliency and introduce an expectation-maximization (EM) algorithm to estimate it, in the context of mixture-based clustering. Due to the introduction of a minimum message length model selection criterion, the saliency of irrelevant features is driven toward zero, which corresponds to performing feature selection. The criterion and algorithm are then extended to simultaneously estimate the feature saliencies and the number of clusters.
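
A sketch of the per-component likelihood under the feature-saliency model, assuming Gaussian densities (the EM updates that drive irrelevant saliencies toward zero are omitted):

```python
import numpy as np
from scipy.stats import norm

def saliency_loglik(x, rho, mu_c, sd_c, mu_0, sd_0):
    """Log-likelihood of one sample under one cluster: feature j
    follows the cluster-specific density with probability rho[j]
    (salient) and a common density shared by all clusters otherwise.
    The Gaussian forms are this sketch's assumption."""
    p = rho * norm.pdf(x, mu_c, sd_c) + (1 - rho) * norm.pdf(x, mu_0, sd_0)
    return np.sum(np.log(p))
```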

655 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: A new method is proposed for the modeling and subtraction of scenes that exhibit a persistent dynamic behavior in time, beyond the static or quasi-static structures addressed by existing work; extensive experiments demonstrate the utility and performance of the proposed approach.
Abstract: Background modeling is an important component of many vision systems. Existing work in the area has mostly addressed scenes that consist of static or quasi-static structures. When the scene exhibits a persistent dynamic behavior in time, such an assumption is violated and detection performance deteriorates. In this paper, we propose a new method for the modeling and subtraction of such scenes. Towards the modeling of the dynamic characteristics, optical flow is computed and utilized as a feature in a higher dimensional space. Inherent ambiguities in the computation of features are addressed by using a data-dependent bandwidth for density estimation using kernels. Extensive experiments demonstrate the utility and performance of the proposed approach.
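
A minimal per-pixel kernel density sketch; a fixed Gaussian bandwidth stands in for the paper's data-dependent bandwidth:

```python
import numpy as np

def background_density(history, value, h):
    """Kernel density estimate of the background model at one pixel,
    evaluated for the current observation `value` (e.g. an optical-flow
    feature). Low density marks the pixel as foreground; the paper uses
    a data-dependent bandwidth instead of the fixed h used here."""
    u = (value - np.asarray(history)) / h
    return np.mean(np.exp(-0.5 * u ** 2)) / (h * np.sqrt(2 * np.pi))
```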

648 citations


Journal ArticleDOI
TL;DR: This paper reviews those techniques that preserve the underlying semantics of the data, using crisp and fuzzy rough set-based methodologies, and several approaches to feature selection based on rough set theory are experimentally compared.
Abstract: Semantics-preserving dimensionality reduction refers to the problem of selecting those input features that are most predictive of a given outcome; a problem encountered in many areas such as machine learning, pattern recognition, and signal processing. This has found successful application in tasks that involve data sets containing huge numbers of features (in the order of tens of thousands), which would be impossible to process further. Recent examples include text processing and Web content classification. One of the many successful applications of rough set theory has been to this feature selection area. This paper reviews those techniques that preserve the underlying semantics of the data, using crisp and fuzzy rough set-based methodologies. Several approaches to feature selection based on rough set theory are experimentally compared. Additionally, a new area in feature selection, feature grouping, is highlighted and a rough set-based feature grouping technique is detailed.
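
A sketch of one standard rough-set selection heuristic (greedy forward selection by dependency degree, in the spirit of QuickReduct); the fuzzy extensions the paper reviews are not shown:

```python
import pandas as pd

def dependency(df, feats, decision):
    """Rough-set dependency degree: fraction of rows whose combination
    of values on `feats` determines the decision uniquely (the size of
    the positive region divided by the number of rows)."""
    if not feats:
        return 0.0
    consistent = df.groupby(feats)[decision].transform("nunique") == 1
    return consistent.mean()

def quickreduct(df, decision):
    """Greedy forward selection toward full dependency (a standard
    rough-set heuristic; not guaranteed to find a minimal reduct)."""
    candidates = [c for c in df.columns if c != decision]
    reduct, best = [], 0.0
    target = dependency(df, candidates, decision)
    while best < target:
        scores = {f: dependency(df, reduct + [f], decision)
                  for f in candidates if f not in reduct}
        f, new_best = max(scores.items(), key=lambda kv: kv[1])
        if new_best <= best:
            break                    # no remaining feature helps
        reduct.append(f)
        best = new_best
    return reduct
```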

Journal ArticleDOI
TL;DR: It is shown that an efficient face detection system does not require any costly local preprocessing before classification of image areas, and provides very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring the use of multiple networks for handling difficult cases.
Abstract: In this paper, we present a novel face detection approach based on a convolutional neural architecture, designed to robustly detect highly variable face patterns, rotated up to /spl plusmn/20 degrees in image plane and turned up to /spl plusmn/60 degrees, in complex real world images. The proposed system automatically synthesizes simple problem-specific feature extractors from a training set of face and nonface patterns, without making any assumptions or using any hand-made design concerning the features to extract or the areas of the face pattern to analyze. The face detection procedure acts like a pipeline of simple convolution and subsampling modules that treat the raw input image as a whole. We therefore show that an efficient face detection system does not require any costly local preprocessing before classification of image areas. The proposed scheme provides very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring the use of multiple networks for handling difficult cases. We present extensive experimental results illustrating the efficiency of the proposed approach on difficult test sets and including an in-depth sensitivity analysis with respect to the degrees of variability of the face patterns.

Patent
31 Aug 2004
TL;DR: In this article, the authors present a system for obtaining information about occupancy of a compartment in a movable object, in which at least first and second optical imagers, spaced apart from one another, obtain images of a common area of the compartment.
Abstract: System and method for obtaining information about occupancy of a compartment in a movable object, in which at least first and second optical imagers, spaced apart from one another, obtain images of a common area of the compartment. Processing circuitry derives information from the images obtained by the imagers. A light source may illuminate the common area of the compartment and be interposed between the imagers. The processing circuitry can include a microprocessor with at least one pattern recognition algorithm and be arranged to determine the distance between the imagers and an object in the common area by locating a specific feature in the common area: first locating the feature in only the image obtained by one imager, then determining the location of the same feature in the image obtained by another imager, and determining the distance of the feature from the imagers by triangulation.
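
A toy version of the triangulation step under a rectified pinhole-camera model (an illustrative simplification of the patent's method):

```python
def distance_by_triangulation(focal_px, baseline_m, x_left, x_right):
    """Distance to a feature seen by two spaced-apart imagers: the
    pixel disparity between the two views shrinks as distance grows."""
    disparity = x_left - x_right
    return focal_px * baseline_m / disparity

# e.g. 800-px focal length, 0.30 m spacing, 12-px disparity -> 20.0 m
print(distance_by_triangulation(800, 0.30, 512, 500))
```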

Journal ArticleDOI
TL;DR: A view-based approach to recognizing humans from their gait by employing a hidden Markov model (HMM); the statistical nature of the HMM lends overall robustness to representation and recognition.
Abstract: We propose a view-based approach to recognize humans from their gait. Two different image features have been considered: the width of the outer contour of the binarized silhouette of the walking person and the entire binary silhouette itself. To obtain the observation vector from the image features, we employ two different methods. In the first method, referred to as the indirect approach, the high-dimensional image feature is transformed to a lower dimensional space by generating what we call the frame-to-exemplar distance (FED). The FED vector captures both structural and dynamic traits of each individual. For compact and effective gait representation and recognition, the gait information in the FED vector sequences is captured in a hidden Markov model (HMM). In the second method, referred to as the direct approach, we work with the feature vector directly (as opposed to computing the FED) and train an HMM. We estimate the HMM parameters (specifically the observation probability B) based on the distance between the exemplars and the image features. In this way, we avoid learning high-dimensional probability density functions. The statistical nature of the HMM lends overall robustness to representation and recognition. The performance of the methods is illustrated using several databases.
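
A sketch of the direct approach using the hmmlearn library as a stand-in (one Gaussian HMM per person; feature extraction from silhouettes is not shown):

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM   # hmmlearn assumed installed

def train_gait_hmm(sequences, n_states=5):
    """One HMM per person, trained on sequences of per-frame feature
    vectors (e.g. silhouette-width vectors)."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=n_states, covariance_type="diag")
    return model.fit(X, lengths)

def identify(probe, models):
    """Recognition: the person whose model scores the probe highest."""
    return max(models, key=lambda name: models[name].score(probe))
```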

Proceedings ArticleDOI
27 Jun 2004
TL;DR: It is proved that an efficient, globally optimal algorithm exists for the co-embedding problem and an important sub-family of correspondence functions can be reduced to co-embedding prototypes and segments to N-D Euclidean space.
Abstract: We present an unsupervised technique for detecting unusual activity in a large video set using many simple features. No complex activity models and no supervised feature selections are used. We divide the video into equal length segments and classify the extracted features into prototypes, from which a prototype-segment co-occurrence matrix is computed. Motivated by a similar problem in document-keyword analysis, we seek a correspondence relationship between prototypes and video segments which satisfies the transitive closure constraint. We show that an important sub-family of correspondence functions can be reduced to co-embedding prototypes and segments to N-D Euclidean space. We prove that an efficient, globally optimal algorithm exists for the co-embedding problem. Experiments on various real-life videos have validated our approach.
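
A spectral stand-in for the co-embedding idea, using a truncated SVD of the prototype-segment co-occurrence matrix rather than the authors' algorithm:

```python
import numpy as np

def co_embed(C, dim):
    """Embed prototypes (rows of the co-occurrence matrix C) and video
    segments (columns) into a shared dim-D Euclidean space via a
    truncated SVD; an illustrative substitute, not the paper's
    globally optimal method."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :dim] * s[:dim], Vt[:dim].T * s[:dim]
```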

Book ChapterDOI
23 May 2004
TL;DR: In this article, a feature-based steganalytic method for JPEG images is proposed, where the features are calculated as an L1 norm of the difference between a specific macroscopic functional calculated from the stego image and the same functional obtained from a decompressed, cropped, and recompressed stego image.
Abstract: In this paper, we introduce a new feature-based steganalytic method for JPEG images and use it as a benchmark for comparing JPEG steganographic algorithms and evaluating their embedding mechanisms. The detection method is a linear classifier trained on feature vectors corresponding to cover and stego images. In contrast to previous blind approaches, the features are calculated as an L1 norm of the difference between a specific macroscopic functional calculated from the stego image and the same functional obtained from a decompressed, cropped, and recompressed stego image. The functionals are built from marginal and joint statistics of DCT coefficients. Because the features are calculated directly from DCT coefficients, conclusions can be drawn about the impact of embedding modifications on detectability. Three different steganographic paradigms are tested and compared. Experimental results reveal new facts about current steganographic methods for JPEGs and new design principles for more secure JPEG steganography.
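
A simplified sketch of the calibration idea, assuming Pillow and SciPy are available; note the paper works on the quantized DCT coefficients stored in the JPEG file, whereas this recomputes block DCTs from pixels:

```python
import io
import numpy as np
from PIL import Image
from scipy.fft import dctn

def dct_hist(gray, bins):
    """Histogram of 8x8 block-DCT coefficients, one simple functional."""
    blocks = [dctn(gray[i:i + 8, j:j + 8], norm="ortho")
              for i in range(0, gray.shape[0] - 7, 8)
              for j in range(0, gray.shape[1] - 7, 8)]
    h, _ = np.histogram(np.stack(blocks).ravel(), bins=bins, density=True)
    return h

def calibration_feature(path, quality=75):
    img = Image.open(path).convert("L")
    a = np.asarray(img, dtype=float)
    # Decompress, crop by 4 pixels to break block alignment, recompress:
    # this approximates the cover image's statistics.
    buf = io.BytesIO()
    img.crop((4, 4, img.width, img.height)).save(buf, "JPEG", quality=quality)
    b = np.asarray(Image.open(buf).convert("L"), dtype=float)
    bins = np.linspace(-60, 60, 121)
    return np.abs(dct_hist(a, bins) - dct_hist(b, bins)).sum()  # L1 norm
```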

Journal ArticleDOI
TL;DR: The proposed feature selection scheme has been shown to provide more accurate defect classification with fewer feature inputs than using all features initially considered relevant, which confirms its utility as an effective tool for machine health assessment.
Abstract: The sensitivity of various features that are characteristic of a machine defect may vary considerably under different operating conditions. Hence it is critical to devise a systematic feature selection scheme that provides guidance on choosing the most representative features for defect classification. This paper presents a feature selection scheme based on the principal component analysis (PCA) method. The effectiveness of the scheme was verified experimentally on a bearing test bed, using both supervised and unsupervised defect classification approaches. The objective of the study was to identify the severity level of bearing defects, where no a priori knowledge on the defect conditions was available. The proposed scheme has shown to provide more accurate defect classification with fewer feature inputs than using all features initially considered relevant. The result confirms its utility as an effective tool for machine health assessment.
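
One plausible reading of a PCA-based ranking (the variance-weighted loading score below is this sketch's choice, not necessarily the paper's exact criterion):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_feature_ranking(X, n_components=2):
    """Rank candidate defect features by the magnitude of their
    loadings on the leading principal components, weighted by each
    component's explained variance. X: samples x candidate features."""
    pca = PCA(n_components=n_components).fit(X)
    scores = np.abs(pca.components_.T) @ pca.explained_variance_ratio_
    return np.argsort(scores)[::-1]   # best features first
```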

Journal ArticleDOI
TL;DR: Recursive Feature Elimination and Zero-Norm Optimization, which are based on the training of support vector machines (SVMs), can provide more accurate solutions than standard filter methods for selecting EEG channels.
Abstract: When designing a brain-computer interface (BCI) system, one can choose from a variety of features that may be useful for classifying brain activity during a mental task. For the special case of classifying electroencephalogram (EEG) signals, we propose the usage of the state-of-the-art feature selection algorithms Recursive Feature Elimination and Zero-Norm Optimization, which are based on the training of support vector machines (SVMs). These algorithms can provide more accurate solutions than standard filter methods for feature selection. We adapt the methods for the purpose of selecting EEG channels. For a motor imagery paradigm, we show that the number of used channels can be reduced significantly without increasing the classification error. The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks. Furthermore, we show how time-dependent, task-specific information can be visualized.
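
A minimal Recursive Feature Elimination sketch with scikit-learn; the paper eliminates whole channels (groups of features), whereas this toy version uses one feature per channel and synthetic data:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 39))    # trials x EEG channels (synthetic)
y = rng.integers(0, 2, size=120)      # two imagery classes (synthetic)

# RFE around a linear SVM: repeatedly drop the channel with the
# smallest weight magnitude until the desired number remains.
rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=8).fit(X, y)
kept_channels = np.flatnonzero(rfe.support_)
print(kept_channels)
```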

Book ChapterDOI
30 Aug 2004
TL;DR: In this paper, the authors propose a cardinality-based notation for feature modeling, which integrates a number of existing extensions of previous approaches, and then introduce and motivate the novel concept of staged configuration.
Abstract: Feature modeling is an important approach to capturing commonalities and variabilities in system families and product lines. In this paper, we propose a cardinality-based notation for feature modeling, which integrates a number of existing extensions of previous approaches. We then introduce and motivate the novel concept of staged configuration. Staged configuration can be achieved by the stepwise specialization of feature models. This is important because in a realistic development process, different groups and different people eliminate product variability in different stages. We also indicate how cardinality-based feature models and their specialization can be given a precise formal semantics.
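
An illustrative data structure for cardinality-based features and one specialization step (a simple reading of the notation, not the paper's formal semantics):

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    """Node in a cardinality-based feature model: the feature may
    appear between `lo` and `hi` times in a configuration."""
    name: str
    lo: int = 1
    hi: int = 1
    children: list["Feature"] = field(default_factory=list)

def specialize(f: Feature, lo: int, hi: int) -> None:
    """One staged-configuration step: narrow a cardinality interval,
    removing variability without yet fixing a full configuration."""
    assert f.lo <= lo <= hi <= f.hi, "specialization must narrow"
    f.lo, f.hi = lo, hi

engine = Feature("engine", lo=1, hi=2)   # hypothetical example
specialize(engine, 1, 1)                 # a later stage picks exactly one
```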

Journal ArticleDOI
TL;DR: This work proposes a general approach for the design of 2D feature detectors from a class of steerable functions based on the optimization of a Canny-like criterion that yields operators that have a better orientation selectivity than the classical gradient or Hessian-based detectors.
Abstract: We propose a general approach for the design of 2D feature detectors from a class of steerable functions based on the optimization of a Canny-like criterion. In contrast with previous computational designs, our approach is truly 2D and provides filters that have closed-form expressions. It also yields operators that have a better orientation selectivity than the classical gradient or Hessian-based detectors. We illustrate the method with the design of operators for edge and ridge detection. We present some experimental results that demonstrate the performance improvement of these new feature detectors. We propose computationally efficient local optimization algorithms for the estimation of feature orientation. We also introduce the notion of shape-adaptable feature detection and use it for the detection of image corners.
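
The classic first-order steerable filter (Freeman-Adelson steering), shown as a simple stand-in for the paper's optimized Canny-like operators:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def steered_edge_response(img, theta, sigma=2.0):
    """The response of a Gaussian-derivative edge detector at any angle
    theta is a linear combination of just two basis filter responses."""
    gx = gaussian_filter(img, sigma, order=(0, 1))   # d/dx basis
    gy = gaussian_filter(img, sigma, order=(1, 0))   # d/dy basis
    return np.cos(theta) * gx + np.sin(theta) * gy
```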

Journal ArticleDOI
01 Jun 2004
TL;DR: An automated system is developed to recognize facial gestures in static, frontal- and/or profile-view color face images using rule-based reasoning, and a recognition rate of 86% is achieved.
Abstract: Automatic recognition of facial gestures (i.e., facial muscle activity) is rapidly becoming an area of intense interest in the research field of machine vision. In this paper, we present an automated system that we developed to recognize facial gestures in static, frontal- and/or profile-view color face images. A multidetector approach to facial feature localization is utilized to spatially sample the profile contour and the contours of the facial components such as the eyes and the mouth. From the extracted contours of the facial features, we extract 10 profile-contour fiducial points and 19 fiducial points of the contours of the facial components. Based on these, 32 individual facial muscle actions (AUs) occurring alone or in combination are recognized using rule-based reasoning. With each scored AU, the utilized algorithm associates a factor denoting the certainty with which the pertinent AU has been scored. A recognition rate of 86% is achieved.

Journal ArticleDOI
TL;DR: This article describes the most relevant experiences devoted to the use of infrared thermography in three main fields, i.e. thermo-fluid dynamics, technology and cultural heritage, performed in the department the authors belong to.
Abstract: Infrared thermography transforms the thermal energy, emitted by objects in the infrared band of the electromagnetic spectrum, into a visible image. This feature offers great potential for exploitation in many fields, but the technique is still not widely adopted in industrial instrumentation because of a lack of adequate knowledge; at first sight, it seems too expensive and difficult to use. The aim of the present paper is to briefly review existing work and to describe the most relevant experiences devoted to the use of infrared thermography in three main fields, i.e. thermo-fluid dynamics, technology and cultural heritage, which have been performed in the department the authors belong to. Results may be regarded from two points of view, either as validating infrared thermography as a full measurement instrument, or as presenting infrared thermography as a novel technique able to deal with several requirements, which are difficult to perform with other techniques. This study is also an attempt to give indications for a synergic use of the different thermographic methods and sharing experiences in the different fields.

Proceedings ArticleDOI
18 Dec 2004
TL;DR: A novel face detection approach using improved local binary patterns (ILBP) as facial representation, which considers both local shape and texture information instead of raw grayscale information and is robust to illumination variation.
Abstract: In this paper, we present a novel face detection approach using improved local binary patterns (ILBP) as facial representation. The ILBP feature is an improvement of the LBP feature that considers both local shape and texture information instead of raw grayscale information, and it is robust to illumination variation. We model the face and non-face classes using a multivariate Gaussian model and classify them under a Bayesian framework. Extensive experiments show that the proposed method has an encouraging performance.
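
A sketch of the ILBP code for a single 3x3 patch, per my reading of the feature (all nine pixels, center included, are thresholded against the patch mean):

```python
import numpy as np

def ilbp_code(patch):
    """Improved LBP code for one 3x3 patch: unlike basic LBP, which
    compares eight neighbours to the centre pixel, all nine pixels are
    compared to the patch mean, giving 2^9 - 1 possible patterns."""
    bits = (patch.ravel() >= patch.mean()).astype(int)
    return int("".join(map(str, bits)), 2)

# Example: a bright centre on a dark background.
print(ilbp_code(np.array([[0, 0, 0], [0, 9, 0], [0, 0, 0]])))
```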

Journal ArticleDOI
TL;DR: In this paper, a simple linear interpolation technique is proposed in order to derive absorption-band position, depth and asymmetry from hyperspectral image data, which can be used to bridge the gap between field geochemistry and remote sensing.

Journal ArticleDOI
TL;DR: The main characteristic of the proposed methodology is that, by applying multivariate pattern classification methods, it can detect subtle and spatially complex patterns of morphological group differences which are often not detectable by voxel-based morphometric methods, because these methods analyze morphological measurements voxel-by-voxel and do not consider the entirety of the data simultaneously.

Proceedings ArticleDOI
02 Nov 2004
TL;DR: A method is proposed for handling multiple hypotheses for potential edge locations that is similar in speed to approaches that consider only single hypotheses and therefore much faster than conventional multiple-hypothesis ones.
Abstract: We present an effective way to combine the information provided by edges and by feature points for the purpose of robust real-time 3-D tracking. This lets our tracker handle both textured and untextured objects. As it can exploit more of the image information, it is more stable and less prone to drift than purely edge- or feature-based ones. We start with a feature-point based tracker we developed in earlier work and integrate the ability to take edge information into account. Achieving optimal performance in the presence of cluttered or textured backgrounds, however, is far from trivial because of the many spurious edges that bedevil typical edge-detectors. We overcome this difficulty by proposing a method for handling multiple hypotheses for potential edge locations that is similar in speed to approaches that consider only single hypotheses and therefore much faster than conventional multiple-hypothesis ones. This results in a real-time 3-D tracking algorithm that exploits both texture and edge information without being sensitive to misleading background information and that does not drift over time.

Proceedings ArticleDOI
17 May 2004
TL;DR: This paper presents a frequency analysis-based method for instantaneous estimation of class separability, without the need for any training, and builds detectors for the most promising candidates, their receiver operating characteristics confirming the estimates.
Abstract: Vision-based hand gesture interfaces require fast and extremely robust hand detection. Here, we study view-specific hand posture detection with an object recognition method proposed by Viola and Jones. Training with this method is computationally very expensive, prohibiting the evaluation of many hand appearances for their suitability to detection. In this paper, we present a frequency analysis-based method for instantaneous estimation of class separability, without the need for any training. We built detectors for the most promising candidates, their receiver operating characteristics confirming the estimates. Next, we found that classification accuracy increases with a more expressive feature type. Lastly, we show that further optimization of training parameters yields additional detection rate improvements. In summary, we present a systematic approach to building an extremely robust hand appearance detector, providing an important step towards easily deployable and reliable vision-based hand gesture interfaces.

Journal ArticleDOI
TL;DR: This paper presents the extension of spatiotemporal PLS (ST-PLS) to fMRI, explaining the theoretical foundation, its application to an fMRI study of auditory and visual perceptual memory, and several unique observations by ST-PLS, including enhanced statistical power.

Journal ArticleDOI
TL;DR: The proposed approach to personal verification using thermal images of palm-dorsa vein patterns is shown to be valid and effective, and a logical and reasonable method is introduced to select a trained threshold for verification.
Abstract: A novel approach to personal verification using the thermal images of palm-dorsa vein patterns is presented in this paper. The characteristics of the proposed method are that no prior knowledge about the objects is necessary and the parameters can be set automatically. In our work, an infrared (IR) camera is adopted as the input device to capture the thermal images of the palm-dorsa. In the proposed approach, two of the finger webs are automatically selected as the datum points to define the region of interest (ROI) on the thermal images. Within each ROI, feature points of the vein patterns (FPVPs) are extracted by modifying the basic tool of watershed transformation based on the properties of thermal images. According to the heat conduction law (the Fourier law), multiple features can be extracted from each FPVP for verification. Multiresolution representations of images with FPVPs are obtained using multiple multiresolution filters (MRFs) that extract the dominant points by filtering miscellaneous features for each FPVP. A hierarchical integrating function is then applied to integrate multiple features and multiresolution representations. The former is integrated by an inter-to-intra personal variation ratio and the latter is integrated by a positive Boolean function. We also introduce a logical and reasonable method to select a trained threshold for verification. Experiments were conducted using the thermal images of palm-dorsas and the results are satisfactory with an acceptable accuracy rate (FRR: 2.3% and FAR: 2.3%). The experimental results demonstrate that our proposed approach is valid and effective for vein-pattern verification.

Journal ArticleDOI
TL;DR: This paper shows that an appropriate assignment of feature weights can improve the performance of fuzzy c-means clustering; the weights are learned via a gradient-descent technique.
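
A sketch of the feature-weighted distance that such a scheme plugs into fuzzy c-means; the paper learns the weights by gradient descent, whereas here they are fixed for illustration:

```python
import numpy as np

def weighted_distances(X, centers, w):
    """Feature-weighted squared distances used inside fuzzy c-means:
    d(i, k) = sum_j w[j] * (X[i, j] - centers[k, j])**2.
    X: (n, d) data, centers: (c, d) cluster centers, w: (d,) weights."""
    diff = X[:, None, :] - centers[None, :, :]
    return np.einsum("j,nkj->nk", w, diff ** 2)
```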