Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence in 2006"

PDF

Open Access

Journal Article•DOI•

Face Description with Local Binary Patterns: Application to Face Recognition

[...]

Timo Ahonen¹, Abdenour Hadid¹, Matti Pietikäinen¹•Institutions (1)

01 Dec 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features that is assessed in the face recognition problem under different challenges.

...read moreread less

Abstract: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features. The face image is divided into several regions from which the LBP feature distributions are extracted and concatenated into an enhanced feature vector to be used as a face descriptor. The performance of the proposed method is assessed in the face recognition problem under different challenges. Other applications and several extensions are also discussed

...read moreread less

5,563 citations

Journal Article•DOI•

One-shot learning of object categories

[...]

Li Fei-Fei¹, Rob Fergus, Pietro Perona•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Apr 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is found that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.

...read moreread less

Abstract: Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned from by maximum likelihood (ML) and maximum a posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.

...read moreread less

2,976 citations

Journal Article•DOI•

Random Walks for Image Segmentation

[...]

Leo Grady¹•Institutions (1)

Princeton University¹

01 Nov 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel method is proposed for performing multilabel, interactive image segmentation using combinatorial analogues of standard operators and principles from continuous potential theory, allowing it to be applied in arbitrary dimension on arbitrary graphs.

...read moreread less

Abstract: A novel method is proposed for performing multilabel, interactive image segmentation. Given a small number of pixels with user-defined (or predefined) labels, one can analytically and quickly determine the probability that a random walker starting at each unlabeled pixel will first reach one of the prelabeled pixels. By assigning each pixel to the label for which the greatest probability is calculated, a high-quality image segmentation may be obtained. Theoretical properties of this algorithm are developed along with the corresponding connections to discrete potential theory and electrical circuits. This algorithm is formulated in discrete space (i.e., on a graph) using combinatorial analogues of standard operators and principles from continuous potential theory, allowing it to be applied in arbitrary dimension on arbitrary graphs

...read moreread less

2,610 citations

Journal Article•DOI•

Rotation Forest: A New Classifier Ensemble Method

[...]

Juan J. Rodríguez, Ludmila I. Kuncheva, Carlos J. Alonso

01 Oct 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work examined the rotation forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with bagging, AdaBoost, and random forest and prompted an investigation into diversity-accuracy landscape of the ensemble models.

...read moreread less

Abstract: We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and principal component analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage simultaneously individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest". Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Using WEKA, we examined the rotation forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with bagging, AdaBoost, and random forest. The results were favorable to rotation forest and prompted an investigation into diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that rotation forest ensembles construct individual classifiers which are more accurate than these in AdaBoost and random forest, and more diverse than these in bagging, sometimes more accurate as well

...read moreread less

1,708 citations

Journal Article•DOI•

Individual recognition using gait energy image

[...]

Ju Han¹, Bir Bhanu¹•Institutions (1)

University of California, Riverside¹

01 Feb 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Experimental results show that the proposed GEI is an effective and efficient gait representation for individual recognition, and the proposed approach achieves highly competitive performance with respect to the published gait recognition approaches.

...read moreread less

Abstract: In this paper, we propose a new spatio-temporal gait representation, called Gait Energy Image (GEI), to characterize human walking properties for individual recognition by gait. To address the problem of the lack of training templates, we also propose a novel approach for human recognition by combining statistical gait features from real and synthetic templates. We directly compute the real templates from training silhouette sequences, while we generate the synthetic templates from training sequences by simulating silhouette distortion. We use a statistical approach for learning effective features from real and synthetic templates. We compare the proposed GEI-based gait recognition approach with other gait recognition approaches on USF HumanID Database. Experimental results show that the proposed GEI is an effective and efficient gait representation for individual recognition, and the proposed approach achieves highly competitive performance with respect to the published gait recognition approaches

...read moreread less

1,670 citations

Journal Article•DOI•

A texture-based method for modeling the background and detecting moving objects

[...]

Marko Heikkilä¹, Matti Pietikäinen¹•Institutions (1)

University of Oulu¹

01 Apr 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel and efficient texture-based method for modeling the background and detecting moving objects from a video sequence that provides many advantages compared to the state-of-the-art.

...read moreread less

Abstract: This paper presents a novel and efficient texture-based method for modeling the background and detecting moving objects from a video sequence. Each pixel is modeled as a group of adaptive local binary pattern histograms that are calculated over a circular region around the pixel. The approach provides us with many advantages compared to the state-of-the-art. Experimental results clearly justify our model.

...read moreread less

1,355 citations

Journal Article•DOI•

Adaptive support-weight approach for correspondence search

[...]

Kuk-Jin Yoon¹, In So Kweon¹•Institutions (1)

KAIST¹

01 Apr 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A new window-based method for correspondence search using varying support-weights based on color similarity and geometric proximity to reduce the image ambiguity and outperforms other local methods on standard stereo benchmarks.

...read moreread less

Abstract: We present a new window-based method for correspondence search using varying support-weights. We adjust the support-weights of the pixels in a given support window based on color similarity and geometric proximity to reduce the image ambiguity. Our method outperforms other local methods on standard stereo benchmarks.

...read moreread less

1,267 citations

Journal Article•DOI•

On-road vehicle detection: a review

[...]

Zehang Sun, George Bebis, Ronald Hugh Miller¹•Institutions (1)

Ford Motor Company¹

01 May 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A review of recent vision-based on-road vehicle detection systems where the camera is mounted on the vehicle rather than being fixed such as in traffic/driveway monitoring systems is presented.

...read moreread less

Abstract: Developing on-board automotive driver assistance systems aiming to alert drivers about driving environments, and possible collision with other vehicles has attracted a lot of attention lately. In these systems, robust and reliable vehicle detection is a critical step. This paper presents a review of recent vision-based on-road vehicle detection systems. Our focus is on systems where the camera is mounted on the vehicle rather than being fixed such as in traffic/driveway monitoring systems. First, we discuss the problem of on-road vehicle detection using optical sensors followed by a brief review of intelligent vehicle research worldwide. Then, we discuss active and passive sensors to set the stage for vision-based vehicle detection. Methods aiming to quickly hypothesize the location of vehicles in an image as well as to verify the hypothesized locations are reviewed next. Integrating detection with tracking is also reviewed to illustrate the benefits of exploiting temporal continuity for vehicle detection. Finally, we present a critical overview of the methods discussed, we assess their potential for future deployment, and we present directions for future research.

...read moreread less

1,181 citations

Journal Article•DOI•

Convergent Tree-Reweighted Message Passing for Energy Minimization

[...]

Vladimir Kolmogorov¹•Institutions (1)

University College London¹

01 Oct 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The sequential tree-reweighted message passing (STE-TRW) algorithm as discussed by the authors is a modification of Tree-Reweighted Maximum Product Message Passing (TRW), which was proposed by Wainwright et al.

...read moreread less

Abstract: Algorithms for discrete energy minimization are of fundamental importance in computer vision. In this paper, we focus on the recent technique proposed by Wainwright et al. (Nov. 2005)- tree-reweighted max-product message passing (TRW). It was inspired by the problem of maximizing a lower bound on the energy. However, the algorithm is not guaranteed to increase this bound - it may actually go down. In addition, TRW does not always converge. We develop a modification of this algorithm which we call sequential tree-reweighted message passing. Its main property is that the bound is guaranteed not to decrease. We also give a weak tree agreement condition which characterizes local maxima of the bound with respect to TRW algorithms. We prove that our algorithm has a limit point that achieves weak tree agreement. Finally, we show that, our algorithm requires half as much memory as traditional message passing approaches. Experimental results demonstrate that on certain synthetic and real problems, our algorithm outperforms both the ordinary belief propagation and tree-reweighted algorithm in (M. J. Wainwright, et al., Nov. 2005). In addition, on stereo problems with Potts interactions, we obtain a lower energy than graph cuts

...read moreread less

1,116 citations

Journal Article•DOI•

Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval

[...]

Dacheng Tao¹, Xiaoou Tang, Xuelong Li, Xindong Wu•Institutions (1)

University of London¹

01 Jul 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An asymmetric bagging and random subspace SVM (ABRS-SVM) is built to solve three problems and further improve the relevance feedback performance.

...read moreread less

Abstract: Relevance feedback schemes based on support vector machines (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM-based relevance feedback is often poor when the number of labeled positive feedback samples is small. This is mainly due to three reasons: 1) an SVM classifier is unstable on a small-sized training set, 2) SVM's optimal hyperplane may be biased when the positive feedback samples are much less than the negative feedback samples, and 3) overfitting happens because the number of feature dimensions is much higher than the size of the training set. In this paper, we develop a mechanism to overcome these problems. To address the first two problems, we propose an asymmetric bagging-based SVM (AB-SVM). For the third problem, we combine the random subspace method and SVM for relevance feedback, which is named random subspace SVM (RS-SVM). Finally, by integrating AB-SVM and RS-SVM, an asymmetric bagging and random subspace SVM (ABRS-SVM) is built to solve these three problems and further improve the relevance feedback performance

...read moreread less

916 citations

Journal Article•DOI•

Recovering 3D human pose from monocular images

[...]

Ankur Agarwal¹, Bill Triggs¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

01 Jan 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A learning-based method for recovering 3D human body pose from single images and monocular image sequences, embedded in a novel regressive tracking framework, using dynamics from the previous state estimate together with a learned regression value to disambiguate the pose.

...read moreread less

Abstract: We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labeling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. We evaluate several different regression methods: ridge regression, relevance vector machine (RVM) regression, and support vector machine (SVM) regression over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. The loss of depth and limb labeling information often makes the recovery of 3D pose from single silhouettes ambiguous. To handle this, the method is embedded in a novel regressive tracking framework, using dynamics from the previous state estimate together with a learned regression value to disambiguate the pose. We show that the resulting system tracks long sequences stably. For realism and good generalization over a wide range of viewpoints, we train the regressors on images resynthesized from real human motion capture data. The method is demonstrated for several representations of full body pose, both quantitatively on independent but similar test data and qualitatively on real image sequences. Mean angular errors of 4-6/spl deg/ are obtained for a variety of walking motions.

...read moreread less

Journal Article•DOI•

Keypoint recognition using randomized trees

[...]

Vincent Lepetit¹, Pascal Fua¹•Institutions (1)

École Normale Supérieure¹

01 Sep 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A keypoint-based approach is developed that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem, which shifts much of the computational burden to a training phase, without sacrificing recognition performance.

...read moreread less

Abstract: In many 3D object-detection and pose-estimation problems, runtime performance is of critical importance. However, there usually is time to train the system, which we would show to be very useful. Assuming that several registered images of the target object are available, we developed a keypoint-based approach that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without sacrificing recognition performance. As a result, the resulting algorithm is robust, accurate, and fast-enough for frame-rate performance. This reduction in runtime computational complexity is our first contribution. Our second contribution is to show that, in this context, a simple and fast keypoint detector suffices to support detection and tracking even under large perspective and scale variations. While earlier methods require a detector that can be expected to produce very repeatable results, in general, which usually is very time-consuming, we simply find the most repeatable object keypoints for the specific target object during the training phase. We have incorporated these ideas into a real-time system that detects planar, nonplanar, and deformable objects. It then estimates the pose of the rigid ones and the deformations of the others

...read moreread less

Journal Article•DOI•

A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses

[...]

Juho Kannala¹, Sami S. Brandt•Institutions (1)

University of Oulu¹

01 Aug 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A generic camera model is proposed, which is suitable for fish-eye lens cameras as well as for conventional and wide-angle lens cameras, and a calibration method for estimating the parameters of the model is presented.

...read moreread less

Abstract: Fish-eye lenses are convenient in such applications where a very wide angle of view is needed, but their use for measurement purposes has been limited by the lack of an accurate, generic, and easy-to-use calibration procedure. We hence propose a generic camera model, which is suitable for fish-eye lens cameras as well as for conventional and wide-angle lens cameras, and a calibration method for estimating the parameters of the model. The achieved level of calibration accuracy is comparable to the previously reported state-of-the-art

...read moreread less

Journal Article•DOI•

MILES: Multiple-Instance Learning via Embedded Instance Selection

[...]

Yixin Chen, Jinbo Bi, James Z. Wang

01 Dec 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work proposes a learning method, MILES (multiple-instance learning via embedded instance selection), which converts the multiple- instance learning problem to a standard supervised learning problem that does not impose the assumption relating instance labels to bag labels.

...read moreread less

Abstract: Multiple-instance problems arise from the situations where training class labels are attached to sets of samples (named bags), instead of individual samples within each bag (called instances). Most previous multiple-instance learning (MIL) algorithms are developed based on the assumption that a bag is positive if and only if at least one of its instances is positive. Although the assumption works well in a drug activity prediction problem, it is rather restrictive for other applications, especially those in the computer vision area. We propose a learning method, MILES (multiple-instance learning via embedded instance selection), which converts the multiple-instance learning problem to a standard supervised learning problem that does not impose the assumption relating instance labels to bag labels. MILES maps each bag into a feature space defined by the instances in the training bags via an instance similarity measure. This feature mapping often provides a large number of redundant or irrelevant features. Hence, 1-norm SVM is applied to select important features as well as construct classifiers simultaneously. We have performed extensive experiments. In comparison with other methods, MILES demonstrates competitive classification accuracy, high computation efficiency, and robustness to labeling uncertainty

...read moreread less

Journal Article•DOI•

Optimal Surface Segmentation in Volumetric Images-A Graph-Theoretic Approach

[...]

Kang Li, Xiaodong Wu, Danny Z. Chen, Milan Sonka

01 Jan 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An optimal surface detection method capable of simultaneously detecting multiple interacting surfaces, in which the optimality is controlled by the cost functions designed for individual surfaces and by several geometric constraints defining the surface smoothness and interrelations is developed.

...read moreread less

Abstract: Efficient segmentation of globally optimal surfaces representing object boundaries in volumetric data sets is important and challenging in many medical image analysis applications. We have developed an optimal surface detection method capable of simultaneously detecting multiple interacting surfaces, in which the optimality is controlled by the cost functions designed for individual surfaces and by several geometric constraints defining the surface smoothness and interrelations. The method solves the surface segmentation problem by transforming it into computing a minimum s{\hbox{-}} t cut in a derived arc-weighted directed graph. The proposed algorithm has a low-order polynomial time complexity and is computationally efficient. It has been extensively validated on more than 300 computer-synthetic volumetric images, 72 CT-scanned data sets of different-sized plexiglas tubes, and tens of medical images spanning various imaging modalities. In all cases, the approach yielded highly accurate results. Our approach can be readily extended to higher-dimensional image segmentation.

...read moreread less

Journal Article•DOI•

Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization

[...]

Stephane Lafon¹, Ann B. Lee²•Institutions (2)

Google¹, Carnegie Mellon University²

01 Sep 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: It is shown that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusionspace and a precise measure of the performance of general clustering algorithms.

...read moreread less

Abstract: We provide evidence that nonlinear dimensionality reduction, clustering, and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms

...read moreread less

Journal Article•DOI•

Multisurface proximal support vector machine classification via generalized eigenvalues

[...]

Olvi L. Mangasarian¹, E.W. Wild¹•Institutions (1)

University of Wisconsin-Madison¹

01 Jan 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Tests on simple examples as well as on a number of public data sets show the advantages of the proposed approach in both computation time and test set correctness.

...read moreread less

Abstract: A new approach to support vector machine (SVM) classification is proposed wherein each of two data sets are proximal to one of two distinct planes that are not parallel to each other. Each plane is generated such that it is closest to one of the two data sets and as far as possible from the other data set. Each of the two nonparallel proximal planes is obtained by a single MATLAB command as the eigenvector corresponding to a smallest eigenvalue of a generalized eigenvalue problem. Classification by proximity to two distinct nonlinear surfaces generated by a nonlinear kernel also leads to two simple generalized eigenvalue problems. The effectiveness of the proposed method is demonstrated by tests on simple examples as well as on a number of public data sets. These examples show the advantages of the proposed approach in both computation time and test set correctness.

...read moreread less

Journal Article•DOI•

A coherent computational approach to model bottom-up visual attention

[...]

O. Le Meur, P. Le Callet¹, D. Barba¹, Dominique Thoreau•Institutions (1)

Institut de Recherche en Communications et Cybernétique de Nantes¹

01 May 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper presents a coherent computational approach to the modeling of the bottom-up visual attention, mainly based on the current understanding of the HVS behavior, which includes Contrast sensitivity functions, perceptual decomposition, visual masking, and center-surround interactions.

...read moreread less

Abstract: Visual attention is a mechanism which filters out redundant visual information and detects the most relevant parts of our visual field. Automatic determination of the most visually relevant areas would be useful in many applications such as image and video coding, watermarking, video browsing, and quality assessment. Many research groups are currently investigating computational modeling of the visual attention system. The first published computational models have been based on some basic and well-understood human visual system (HVS) properties. These models feature a single perceptual layer that simulates only one aspect of the visual system. More recent models integrate complex features of the HVS and simulate hierarchical perceptual representation of the visual input. The bottom-up mechanism is the most occurring feature found in modern models. This mechanism refers to involuntary attention (i.e., salient spatial visual features that effortlessly or involuntary attract our attention). This paper presents a coherent computational approach to the modeling of the bottom-up visual attention. This model is mainly based on the current understanding of the HVS behavior. Contrast sensitivity functions, perceptual decomposition, visual masking, and center-surround interactions are some of the features implemented in this model. The performances of this algorithm are assessed by using natural images and experimental measurements from an eye-tracking system. Two adequate well-known metrics (correlation coefficient and Kullbacl-Leibler divergence) are used to validate this model. A further metric is also defined. The results from this model are finally compared to those from a reference bottom-up model.

...read moreread less

Journal Article•DOI•

On the removal of shadows from images

[...]

Graham D. Finlayson¹, S.D. Hordley¹, Cheng Lu², Mark S. Drew•Institutions (2)

Norwich University¹, Simon Fraser University²

01 Jan 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper shows how to recover a 3D, full color shadow-free image representation by first (with the help of the 2D representation) identifying shadow edges and proposing a method to reintegrate this thresholded edge map, thus deriving the sought-after 3D shadow- free image.

...read moreread less

Abstract: This paper is concerned with the derivation of a progression of shadow-free image representations. First, we show that adopting certain assumptions about lights and cameras leads to a 1D, gray-scale image representation which is illuminant invariant at each image pixel. We show that as a consequence, images represented in this form are shadow-free. We then extend this 1D representation to an equivalent 2D, chromaticity representation. We show that in this 2D representation, it is possible to relight all the image pixels in the same way, effectively deriving a 2D image representation which is additionally shadow-free. Finally, we show how to recover a 3D, full color shadow-free image representation by first (with the help of the 2D representation) identifying shadow edges. We then remove shadow edges from the edge-map of the original image by edge in-painting and we propose a method to reintegrate this thresholded edge map, thus deriving the sought-after 3D shadow-free image.

...read moreread less

Journal Article•DOI•

An Experimental Study on Pedestrian Classification

[...]

S. Munder¹, Dariu M. Gavrila¹•Institutions (1)

Daimler AG¹

01 Nov 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An in-depth experimental study on pedestrian classification; multiple feature-classifier combinations are examined with respect to their ROC performance and efficiency and show that the novel combination of SVMs with LRF features performs best.

...read moreread less

Abstract: Detecting people in images is key for several important application domains in computer vision. This paper presents an in-depth experimental study on pedestrian classification; multiple feature-classifier combinations are examined with respect to their ROC performance and efficiency. We investigate global versus local and adaptive versus nonadaptive features, as exemplified by PCA coefficients, Haar wavelets, and local receptive fields (LRFs). In terms of classifiers, we consider the popular support vector machines (SVMs), feedforward neural networks, and k-nearest neighbor classifier. Experiments are performed on a large data set consisting of 4,000 pedestrian and more than 25,000 nonpedestrian (labeled) images captured in outdoor urban environments. Statistically meaningful results are obtained by analyzing performance variances caused by varying training and test sets. Furthermore, we investigate how classification performance and training sample size are correlated. Sample size is adjusted by increasing the number of manually labeled training data or by employing automatic bootstrapping or cascade techniques. Our experiments show that the novel combination of SVMs with LRF features performs best. A boosted cascade of Haar wavelets can, however, reach quite competitive results, at a fraction of computational cost. The data set used in this paper is made public, establishing a benchmark for this important problem

...read moreread less

Journal Article•DOI•

Full-frame video stabilization with motion inpainting

[...]

Yasuyuki Matsushita¹, Eyal Ofek¹, Weina Ge², Xiaoou Tang, Heung-Yeung Shum - Show less +1 more•Institutions (2)

Microsoft¹, Pennsylvania State University²

01 Jul 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work proposes a practical and robust approach of video stabilization that produces full-frame stabilized videos with good visual quality and develops a complete video stabilizer which can naturally keep the original image quality in the stabilized videos.

...read moreread less

Abstract: Video stabilization is an important video enhancement technology which aims at removing annoying shaky motion from videos. We propose a practical and robust approach of video stabilization that produces full-frame stabilized videos with good visual quality. While most previous methods end up with producing smaller size stabilized videos, our completion method can produce full-frame videos by naturally filling in missing image parts by locally aligning image data of neighboring frames. To achieve this, motion inpainting is proposed to enforce spatial and temporal consistency of the completion in both static and dynamic image areas. In addition, image quality in the stabilized video is enhanced with a new practical deblurring algorithm. Instead of estimating point spread functions, our method transfers and interpolates sharper image pixels of neighboring frames to increase the sharpness of the frame. The proposed video completion and deblurring methods enabled us to develop a complete video stabilizer which can naturally keep the original image quality in the stabilized videos. The effectiveness of our method is confirmed by extensive experiments over a wide variety of videos

...read moreread less

Journal Article•DOI•

A system for learning statistical motion patterns

[...]

Weiming Hu¹, Xuejuan Xiao¹, Zhouyu Fu¹, Dan Xie¹, Tieniu Tan¹, Stephen J. Maybank - Show less +2 more•Institutions (1)

Chinese Academy of Sciences¹

01 Sep 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction.

...read moreread less

Abstract: Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast accurate fuzzy k-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction

...read moreread less

Journal Article•DOI•

Three-Dimensional Model-Based Object Recognition and Segmentation in Cluttered Scenes

[...]

Ajmal Mian, Mohammed Bennamoun, Robyn Owens

01 Oct 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel 3D model-based algorithm is presented which performs viewpoint independent recognition of free-form objects and their segmentation in the presence of clutter and occlusions automatically and efficiently and is superior in terms of recognition rate and efficiency.

...read moreread less

Abstract: Viewpoint independent recognition of free-form objects and their segmentation in the presence of clutter and occlusions is a challenging task. We present a novel 3D model-based algorithm which performs this task automatically and efficiently. A 3D model of an object is automatically constructed offline from its multiple unordered range images (views). These views are converted into multidimensional table representations (which we refer to as tensors). Correspondences are automatically established between these views by simultaneously matching the tensors of a view with those of the remaining views using a hash table-based voting scheme. This results in a graph of relative transformations used to register the views before they are integrated into a seamless 3D model. These models and their tensor representations constitute the model library. During online recognition, a tensor from the scene is simultaneously matched with those in the library by casting votes. Similarity measures are calculated for the model tensors which receive the most votes. The model with the highest similarity is transformed to the scene and, if it aligns accurately with an object in the scene, that object is declared as recognized and is segmented. This process is repeated until the scene is completely segmented. Experiments were performed on real and synthetic data comprised of 55 models and 610 scenes and an overall recognition rate of 95 percent was achieved. Comparison with the spin images revealed that our algorithm is superior in terms of recognition rate and efficiency

...read moreread less

Journal Article•DOI•

Offline Arabic handwriting recognition: a survey

[...]

L.M. Lorigo¹, Venu Govindaraju¹•Institutions (1)

State University of New York System¹

01 May 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper is the first survey to focus on Arabic handwriting recognition and the first Arabic character recognition survey to provide recognition rates and descriptions of test data for the approaches discussed.

...read moreread less

Abstract: The automatic recognition of text on scanned images has enabled many applications such as searching for words in large volumes of documents, automatic sorting of postal mail, and convenient editing of previously printed documents. The domain of handwriting in the Arabic script presents unique technical challenges and has been addressed more recently than other domains. Many different methods have been proposed and applied to various types of images. This paper provides a comprehensive review of these methods. It is the first survey to focus on Arabic handwriting recognition and the first Arabic character recognition survey to provide recognition rates and descriptions of test data for the approaches discussed. It includes background on the field, discussion of the methods, and future research directions.

...read moreread less

Journal Article•DOI•

Model-based hand tracking using a hierarchical Bayesian filter

[...]

Bjorn Stenger¹, Arasanathan Thayananthan², Philip H. S. Torr, Roberto Cipolla•Institutions (2)

Toshiba¹, University of Cambridge²

01 Sep 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper sets out a tracking framework, which is applied to the recovery of three-dimensional hand motion from an image sequence, and handles the issues of initialization, tracking, and recovery in a unified way.

...read moreread less

Abstract: This paper sets out a tracking framework, which is applied to the recovery of three-dimensional hand motion from an image sequence. The method handles the issues of initialization, tracking, and recovery in a unified way. In a single input image with no prior information of the hand pose, the algorithm is equivalent to a hierarchical detection scheme, where unlikely pose candidates are rapidly discarded. In image sequences, a dynamic model is used to guide the search and approximate the optimal filtering equations. A dynamic model is given by transition probabilities between regions in parameter space and is learned from training data obtained by capturing articulated motion. The algorithm is evaluated on a number of image sequences, which include hand motion with self-occlusion in front of a cluttered background

...read moreread less

Journal Article•DOI•

Total variation models for variable lighting face recognition

[...]

Terrence Chen¹, Wotao Yin², Xiang Sean Zhou³, Dorin Comaniciu, Thomas S. Huang - Show less +1 more•Institutions (3)

University of Illinois at Urbana–Champaign¹, Columbia University², IEEE Computer Society³

01 Sep 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The logarithmic total variation (LTV) model is presented, which has the ability to factorize a single face image and obtain the illumination invariant facial structure, which is then used for face recognition.

...read moreread less

Abstract: In this paper, we present the logarithmic total variation (LTV) model for face recognition under varying illumination, including natural lighting conditions, where we rarely know the strength, direction, or number of light sources. The proposed LTV model has the ability to factorize a single face image and obtain the illumination invariant facial structure, which is then used for face recognition. Our model is inspired by the SQI model but has better edge-preserving ability and simpler parameter selection. The merit of this model is that neither does it require any lighting assumption nor does it need any training. The LTV model reaches very high recognition rates in the tests using both Yale and CMU PIE face databases as well as a face database containing 765 subjects under outdoor lighting conditions

...read moreread less

Journal Article•DOI•

Activity Recognition of Assembly Tasks Using Body-Worn Microphones and Accelerometers

[...]

Jamie A. Ward, Paul Lukowicz, Gerhard Tröster, Thad Starner

01 Oct 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work describes a method for the recognition of activities that are characterized by a hand motion and an accompanying sound using microphones and three-axis accelerometers mounted at two positions on the user's arms using on-body sensing.

...read moreread less

Abstract: In order to provide relevant information to mobile users, such as workers engaging in the manual tasks of maintenance and assembly, a wearable computer requires information about the user's specific activities. This work focuses on the recognition of activities that are characterized by a hand motion and an accompanying sound. Suitable activities can be found in assembly and maintenance work. Here, we provide an initial exploration into the problem domain of continuous activity recognition using on-body sensing. We use a mock "wood workshop" assembly task to ground our investigation. We describe a method for the continuous recognition of activities (sawing, hammering, filing, drilling, grinding, sanding, opening a drawer, tightening a vise, and turning a screwdriver) using microphones and three-axis accelerometers mounted at two positions on the user's arms. Potentially "interesting" activities are segmented from continuous streams of data using an analysis of the sound intensity detected at the two different locations. Activity classification is then performed on these detected segments using linear discriminant analysis (LDA) on the sound channel and hidden Markov models (HMMs) on the acceleration data. Four different methods at classifier fusion are compared for improving these classifications. Using user-dependent training, we obtain continuous average recall and precision rates (for positive activities) of 78 percent and 74 percent, respectively. Using user-independent training (leave-one-out across five users), we obtain recall rates of 66 percent and precision rates of 63 percent. In isolation, these activities were recognized with accuracies of 98 percent, 87 percent, and 95 percent for the user-dependent, user-independent, and user-adapted cases, respectively

...read moreread less

Journal Article•DOI•

Random Multispace Quantization as an Analytic Mechanism for BioHashing of Biometric and Random Identity Inputs

[...]

Andrew Beng Jin Teoh¹, Alwyn Goh, David Chek Ling Ngo•Institutions (1)

Multimedia University¹

01 Dec 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The previously presented biometric-hash framework prescribes the integration of external randomness with user-specific biometrics, resulting in bitstring outputs with security characteristics comparable to cryptographic ciphers or hashes, which are explained in this paper as arising from the random multispace quantization of biometric and external random inputs.

...read moreread less

Abstract: Biometric analysis for identity verification is becoming a widespread reality. Such implementations necessitate large-scale capture and storage of biometric data, which raises serious issues in terms of data privacy and (if such data is compromised) identity theft. These problems stem from the essential permanence of biometric data, which (unlike secret passwords or physical tokens) cannot be refreshed or reissued if compromised. Our previously presented biometric-hash framework prescribes the integration of external (password or token-derived) randomness with user-specific biometrics, resulting in bitstring outputs with security characteristics (i.e., noninvertibility) comparable to cryptographic ciphers or hashes. The resultant BioHashes are hence cancellable, i.e., straightforwardly revoked and reissued (via refreshed password or reissued token) if compromised. BioHashing furthermore enhances recognition effectiveness, which is explained in this paper as arising from the random multispace quantization (RMQ) of biometric and external random inputs

...read moreread less

Journal Article•DOI•

Generic object recognition with boosting

[...]

Andreas Opelt¹, Axel Pinz¹, Michael Fussenegger¹, Peter Auer²•Institutions (2)

Graz University of Technology¹, University of Leoben²

01 Mar 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper presents a complete framework that starts with the extraction of various local regions of either discontinuity or homogeneity, and uses Boosting to learn a subset of feature vectors (weak hypotheses) and to combine them into one final hypothesis for each visual category.

...read moreread less

Abstract: This paper explores the power and the limitations of weakly supervised categorization. We present a complete framework that starts with the extraction of various local regions of either discontinuity or homogeneity. A variety of local descriptors can be applied to form a set of feature vectors for each local region. Boosting is used to learn a subset of such feature vectors (weak hypotheses) and to combine them into one final hypothesis for each visual category. This combination of individual extractors and descriptors leads to recognition rates that are superior to other approaches which use only one specific extractor/descriptor setting. To explore the limitation of our system, we had to set up new, highly complex image databases that show the objects of interest at varying scales and poses, in cluttered background, and under considerable occlusion. We obtain classification results up to 81 percent ROC-equal error rate on the most complex of our databases. Our approach outperforms all comparable solutions on common databases.

...read moreread less

Journal Article•DOI•

Nonsmooth nonnegative matrix factorization (nsNMF)

[...]

Alberto Pascual-Montano¹, José María Carazo, Kieko Kochi, Dietrich Lehmann, Roberto D. Pascual-Marqui - Show less +1 more•Institutions (1)

Complutense University of Madrid¹

01 Mar 2006-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Comparisons to previously published methods show that the new nsNMF method has some advantages in keeping faithfulness to the data in the achieving a high degree of sparseness for both the estimated basis and the encoding vectors and in better interpretability of the factors.

...read moreread less

Abstract: We propose a novel nonnegative matrix factorization model that aims at finding localized, part-based, representations of nonnegative multivariate data items. Unlike the classical nonnegative matrix factorization (NMF) technique, this new model, denoted "nonsmooth nonnegative matrix factorization" (nsNMF), corresponds to the optimization of an unambiguous cost function designed to explicitly represent sparseness, in the form of nonsmoothness, which is controlled by a single parameter. In general, this method produces a set of basis and encoding vectors that are not only capable of representing the original data, but they also extract highly focalized patterns, which generally lend themselves to improved interpretability. The properties of this new method are illustrated with several data sets. Comparisons to previously published methods show that the new nsNMF method has some advantages in keeping faithfulness to the data in the achieving a high degree of sparseness for both the estimated basis and the encoding vectors and in better interpretability of the factors.

...read moreread less

Collapse