
Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence in 2008"


Journal ArticleDOI
TL;DR: This paper describes the Semi-Global Matching (SGM) stereo method, which uses a pixelwise, Mutual Information-based matching cost to compensate for radiometric differences of input images, and demonstrates tolerance against a wide range of radiometric transformations.
Abstract: This paper describes the semiglobal matching (SGM) stereo method. It uses a pixelwise, mutual information (MI)-based matching cost to compensate for radiometric differences of input images. Pixelwise matching is supported by a smoothness constraint that is usually expressed as a global cost function. SGM performs a fast approximation by pathwise optimizations from all directions. The discussion also addresses occlusion detection, subpixel refinement, and multibaseline matching. Additionally, postprocessing steps for removing outliers, recovering from specific problems of structured environments, and the interpolation of gaps are presented. Finally, strategies for processing almost arbitrarily large images and fusion of disparity images using orthographic projection are proposed. A comparison on standard stereo images shows that SGM is among the currently top-ranked algorithms and performs best when subpixel accuracy is considered. The complexity is linear in the number of pixels and the disparity range, which results in a runtime of just 1-2 seconds on typical test images. An in-depth evaluation of the MI-based matching cost demonstrates tolerance against a wide range of radiometric transformations. Finally, examples of reconstructions from huge aerial frame and pushbroom images demonstrate that the presented ideas work well on practical problems.
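As a rough illustration of the pathwise optimization, here is a minimal NumPy sketch of the aggregation along a single (left-to-right) path, assuming a precomputed cost volume; the function name and the penalty defaults P1, P2 are ours, not the paper's:

```python
import numpy as np

def aggregate_left_to_right(cost, P1=10.0, P2=120.0):
    """Aggregate an (H, W, D) matching-cost volume along one path direction,
    following the SGM recursion
      L_r(p, d) = C(p, d) + min(L_r(p-r, d),
                                L_r(p-r, d-1) + P1,
                                L_r(p-r, d+1) + P1,
                                min_k L_r(p-r, k) + P2)
                          - min_k L_r(p-r, k)."""
    H, W, D = cost.shape
    L = np.empty_like(cost, dtype=np.float64)
    L[:, 0] = cost[:, 0]
    for x in range(1, W):
        prev = L[:, x - 1]                                 # (H, D)
        min_prev = prev.min(axis=1, keepdims=True)         # (H, 1)
        up = np.roll(prev, -1, axis=1); up[:, -1] = np.inf  # L_r(p-r, d+1)
        dn = np.roll(prev, 1, axis=1);  dn[:, 0] = np.inf   # L_r(p-r, d-1)
        trans = np.minimum(np.minimum(prev, up + P1),
                           np.minimum(dn + P1, min_prev + P2))
        L[:, x] = cost[:, x] + trans - min_prev
    return L
```

A full SGM run sums such aggregations over 8 or 16 path directions and takes the per-pixel disparity with minimal total cost.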

3,302 citations


Journal ArticleDOI
TL;DR: For certain classes that are particularly prevalent in the dataset, such as people, this work is able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
Abstract: With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Internet. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 x 32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database. Hence the image database gives comprehensive coverage of all object categories and scenes. The semantic information from WordNet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels, minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
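A hedged sketch of the core non-parametric step, assuming images stored as normalized arrays (function and variable names are ours; the paper additionally uses shift- and warp-tolerant SSD variants):

```python
import numpy as np

def nearest_labels(query, dataset, labels, k=5):
    """Nearest neighbors of a 32x32 color image by sum-of-squared differences.

    query:   (32, 32, 3) array; dataset: (N, 32, 32, 3) array;
    labels:  length-N list of WordNet noun labels."""
    ssd = ((dataset - query) ** 2).sum(axis=(1, 2, 3))
    return [labels[i] for i in np.argsort(ssd)[:k]]
```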

1,871 citations


Journal ArticleDOI
TL;DR: A closed-form solution to natural image matting that allows us to find the globally optimal alpha matte by solving a sparse linear system of equations and predicts the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms.
Abstract: Interactive digital matting, the process of extracting a foreground object from an image based on limited user input, is an important task in image and video editing. From a computer vision perspective, this task is extremely challenging because it is massively ill-posed - at each pixel we must estimate the foreground and the background colors, as well as the foreground opacity ("alpha matte") from a single color measurement. Current approaches either restrict the estimation to a small part of the image, estimating foreground and background colors based on nearby pixels where they are known, or perform iterative nonlinear estimation by alternating foreground and background color estimation with alpha estimation. In this paper, we present a closed-form solution to natural image matting. We derive a cost function from local smoothness assumptions on foreground and background colors and show that in the resulting expression, it is possible to analytically eliminate the foreground and background colors to obtain a quadratic cost function in alpha. This allows us to find the globally optimal alpha matte by solving a sparse linear system of equations. Furthermore, the closed-form formula allows us to predict the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms. We show that high-quality mattes for natural images may be obtained from a small amount of user input.
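Given the matting Laplacian L (whose construction from local color statistics is the technical heart of the paper and is omitted here), the constrained solve is a single sparse linear system. A minimal sketch, with our own function name and a typical constraint weight:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def solve_alpha(L, scribble_mask, scribble_vals, lam=100.0):
    """Solve for the globally optimal alpha matte given the matting Laplacian.

    L:             (N, N) sparse matting Laplacian over N image pixels.
    scribble_mask: (N,) boolean array, True where the user constrained alpha.
    scribble_vals: (N,) array, 1 for foreground scribbles, 0 for background."""
    D = sp.diags(scribble_mask.astype(float))        # penalize deviation at scribbles
    b = lam * scribble_mask * scribble_vals
    alpha = spsolve((L + lam * D).tocsr(), b)        # sparse linear system
    return np.clip(alpha, 0.0, 1.0)
```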

1,851 citations


Journal ArticleDOI
TL;DR: A set of energy minimization benchmarks is described and used to compare the solution quality and runtime of several common energy minimization algorithms, and a general-purpose software interface is provided that allows vision researchers to easily switch between optimization methods.
Abstract: Among the most exciting advances in early vision has been the development of efficient energy minimization algorithms for pixel-labeling tasks such as depth or texture computation. It has been known for decades that such problems can be elegantly expressed as Markov random fields, yet the resulting energy minimization problems have been widely viewed as intractable. Algorithms such as graph cuts and loopy belief propagation (LBP) have proven to be very powerful: For example, such methods form the basis for almost all the top-performing stereo methods. However, the trade-offs among different energy minimization algorithms are still not well understood. In this paper, we describe a set of energy minimization benchmarks and use them to compare the solution quality and runtime of several common energy minimization algorithms. We investigate three promising methods-graph cuts, LBP, and tree-reweighted message passing-in addition to the well-known older iterated conditional mode (ICM) algorithm. Our benchmark problems are drawn from published energy functions used for stereo, image stitching, interactive segmentation, and denoising. We also provide a general-purpose software interface that allows vision researchers to easily switch between optimization methods. The benchmarks, code, images, and results are available at http://vision.middlebury.edu/MRF/.
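For flavor, a minimal sketch of the simplest benchmarked method, ICM, on a 4-connected grid with a Potts smoothness term (array shapes and the smoothness weight are our assumptions; the benchmark's actual energies are the published ones for stereo, stitching, segmentation, and denoising):

```python
import numpy as np

def icm_potts(unary, smooth=1.0, n_iters=5):
    """Iterated conditional modes on a 4-connected grid with a Potts prior.

    unary: (H, W, K) data costs for K labels; returns an (H, W) labeling."""
    H, W, K = unary.shape
    lab = unary.argmin(axis=2)                     # independent per-pixel init
    for _ in range(n_iters):
        for y in range(H):
            for x in range(W):
                cost = unary[y, x].astype(float)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # Potts penalty for disagreeing with a neighbor
                        cost += smooth * (np.arange(K) != lab[ny, nx])
                lab[y, x] = cost.argmin()          # greedy local update
    return lab
```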

1,065 citations


Journal ArticleDOI
TL;DR: A novel approach for classifying points lying on a connected Riemannian manifold using the geometry of the space of d-dimensional nonsingular covariance matrices as object descriptors.
Abstract: We present a new algorithm to detect pedestrians in still images utilizing covariance matrices as object descriptors. Since the descriptors do not form a vector space, well-known machine learning techniques are not well suited to learn the classifiers. The space of d-dimensional nonsingular covariance matrices can be represented as a connected Riemannian manifold. The main contribution of the paper is a novel approach for classifying points lying on a connected Riemannian manifold using the geometry of the space. The algorithm is tested on the INRIA and DaimlerChrysler pedestrian datasets, where superior detection rates are observed over previous approaches.
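The geometry in question admits a closed-form geodesic distance between two symmetric positive-definite descriptors, which a sketch makes concrete (we show the standard affine-invariant metric; the paper's full classifier additionally maps points to tangent spaces and boosts weak learners there):

```python
import numpy as np
from scipy.linalg import eigvalsh

def geodesic_distance(X, Y):
    """Affine-invariant distance between SPD covariance descriptors:
    d(X, Y) = sqrt(sum_i ln^2 lambda_i), where lambda_i are the
    generalized eigenvalues of X v = lambda Y v."""
    lam = eigvalsh(X, Y)                 # generalized symmetric eigenproblem
    return np.sqrt((np.log(lam) ** 2).sum())
```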

1,044 citations


Journal ArticleDOI
TL;DR: A novel scheme of emotion-specific multilevel dichotomous classification (EMDC) is developed and compared with direct multiclass classification using the pLDA, with improved recognition accuracy of 95 percent and 70 percent for subject-dependent and subject-independent classification, respectively.
Abstract: Little attention has been paid so far to physiological signals for emotion recognition compared to audiovisual emotion channels such as facial expression or speech. This paper investigates the potential of physiological signals as reliable channels for emotion recognition. All essential stages of an automatic recognition system are discussed, from the recording of a physiological data set to a feature-based multiclass classification. In order to collect a physiological data set from multiple subjects over many weeks, we used a musical induction method that spontaneously leads subjects to real emotional states, without any deliberate laboratory setting. Four-channel biosensors were used to measure electromyogram, electrocardiogram, skin conductivity, and respiration changes. A wide range of physiological features from various analysis domains, including time/frequency, entropy, geometric analysis, subband spectra, multiscale entropy, etc., is proposed in order to find the best emotion-relevant features and to correlate them with emotional states. The best features extracted are specified in detail and their effectiveness is proven by classification results. Classification of four musical emotions (positive/high arousal, negative/high arousal, negative/low arousal, and positive/low arousal) is performed by using an extended linear discriminant analysis (pLDA). Furthermore, by exploiting a dichotomic property of the 2D emotion model, we develop a novel scheme of emotion-specific multilevel dichotomous classification (EMDC) and compare its performance with direct multiclass classification using the pLDA. An improved recognition accuracy of 95 percent and 70 percent for subject-dependent and subject-independent classification, respectively, is achieved by using the EMDC scheme.
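A hedged sketch of the dichotomous idea: split the four-class problem along the arousal axis first, then resolve valence with branch-specific classifiers. We substitute scikit-learn's plain LDA for the paper's pLDA and use our own class and variable names:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

class EMDCSketch:
    """Emotion-specific multilevel dichotomous classification, simplified."""
    def fit(self, X, arousal, valence):            # label arrays in {0, 1}
        self.f_arousal = LDA().fit(X, arousal)
        self.f_val_hi = LDA().fit(X[arousal == 1], valence[arousal == 1])
        self.f_val_lo = LDA().fit(X[arousal == 0], valence[arousal == 0])
        return self

    def predict(self, X):
        a = self.f_arousal.predict(X)              # stage 1: high vs. low arousal
        v = np.where(a == 1, self.f_val_hi.predict(X), self.f_val_lo.predict(X))
        return np.stack([a, v], axis=1)            # (arousal, valence) per sample
```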

953 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori.
Abstract: Given two to four synchronized video streams taken at eye level and from different angles, we show that we can effectively combine a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions and lighting changes. In addition, we also derive metrically accurate trajectories for each of them. Our contribution is twofold. First, we demonstrate that our generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori. Second, we show that multiperson tracking can be reliably achieved by processing individual trajectories separately over long sequences, provided that a reasonable heuristic is used to rank these individuals and that we avoid confusing them with one another.

865 citations


Journal ArticleDOI
TL;DR: A novel algorithm for detecting certain types of unusual events, based on multiple local monitors that collect low-level statistics; it is robust and works well in crowded scenes where tracking-based algorithms are likely to fail.
Abstract: We present a novel algorithm for detection of certain types of unusual events. The algorithm is based on multiple local monitors which collect low-level statistics. Each local monitor produces an alert if its current measurement is unusual and these alerts are integrated to a final decision regarding the existence of an unusual event. Our algorithm satisfies a set of requirements that are critical for successful deployment of any large-scale surveillance system. In particular, it requires a minimal setup (taking only a few minutes) and is fully automatic afterwards. Since it is not based on objects' tracks, it is robust and works well in crowded scenes where tracking-based algorithms are likely to fail. The algorithm is effective as soon as sufficient low-level observations representing the routine activity have been collected, which usually happens after a few minutes. Our algorithm runs in real-time. It was tested on a variety of real-life crowded scenes. A ground-truth was extracted for these scenes, with respect to which detection and false-alarm rates are reported.
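A loose sketch of the architecture (the statistics collected and the integration rule here are our simplifications, not the paper's exact ones): each monitor learns an empirical distribution of its local measurement during routine activity and alerts on low-probability values, and the alerts are then integrated:

```python
import numpy as np

class LocalMonitor:
    """Collects low-level measurements for one image region and flags rare ones."""
    def __init__(self, bins=20, alert_prob=0.01):
        self.history, self.bins, self.alert_prob = [], bins, alert_prob

    def observe(self, value):
        self.history.append(value)                 # routine-activity training phase

    def is_unusual(self, value):
        hist, edges = np.histogram(self.history, bins=self.bins)
        p = hist / hist.sum()                      # empirical probability per bin
        i = np.clip(np.searchsorted(edges, value) - 1, 0, self.bins - 1)
        return p[i] < self.alert_prob

def unusual_event(monitors, measurements, min_alerts=5):
    """Final decision: enough local monitors must alert simultaneously."""
    return sum(m.is_unusual(v) for m, v in zip(monitors, measurements)) >= min_alerts
```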

822 citations



Journal ArticleDOI
TL;DR: This work introduces a novel vocabulary using dense color SIFT descriptors and investigates the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM).
Abstract: We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual words representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.
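For concreteness, a compact EM implementation of pLSA over a visual-word count matrix (a dense, memory-naive sketch with our own names; it assumes every image contains at least one visual word). The resulting per-image topic vectors P(z|d) are what the multiway classifier is trained on:

```python
import numpy as np

def plsa(counts, n_topics, n_iters=100, seed=0):
    """EM for pLSA on a (W, D) visual-word-by-image count matrix.

    Returns P(w|z) of shape (W, Z) and P(z|d) of shape (Z, D)."""
    rng = np.random.default_rng(seed)
    W, D = counts.shape
    p_w_z = rng.random((W, n_topics)); p_w_z /= p_w_z.sum(axis=0)
    p_z_d = rng.random((n_topics, D)); p_z_d /= p_z_d.sum(axis=0)
    for _ in range(n_iters):
        # E-step: P(z|w,d) proportional to P(w|z) P(z|d)
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]        # (W, Z, D)
        p_z_wd = joint / joint.sum(axis=1, keepdims=True)
        # M-step: reweight by observed counts
        nz = counts[:, None, :] * p_z_wd                     # (W, Z, D)
        p_w_z = nz.sum(axis=2); p_w_z /= p_w_z.sum(axis=0, keepdims=True)
        p_z_d = nz.sum(axis=0); p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    return p_w_z, p_z_d
```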

778 citations


Journal ArticleDOI
Nojun Kwak1
TL;DR: A method of principal component analysis (PCA) based on a new L1-norm optimization technique which is robust to outliers and invariant to rotations and also proven to find a locally maximal solution.
Abstract: A method of principal component analysis (PCA) based on a new L1-norm optimization technique is proposed. Unlike conventional PCA which is based on L2-norm, the proposed method is robust to outliers because it utilizes L1-norm which is less sensitive to outliers. It is invariant to rotations as well. The proposed L1-norm optimization technique is intuitive, simple, and easy to implement. It is also proven to find a locally maximal solution. The proposed method is applied to several datasets and the performances are compared with those of other conventional methods.
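The optimization is simple enough to sketch in a few lines; this follows the fixed-point iteration described in the paper (our function name; further components can be obtained by deflating, X <- X - (X w) w^T, and re-running):

```python
import numpy as np

def pca_l1(X, n_iters=100):
    """First projection vector of PCA-L1: maximize sum_i |w^T x_i|, ||w|| = 1.

    X: (n_samples, n_features), assumed zero-mean."""
    w = X[np.argmax((X ** 2).sum(axis=1))].astype(float)  # largest sample as init
    w /= np.linalg.norm(w)
    for _ in range(n_iters):
        s = np.sign(X @ w)
        s[s == 0] = 1.0                 # avoid stalling on sign(0)
        w_new = X.T @ s
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):       # converged to a locally maximal solution
            break
        w = w_new
    return w
```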

Journal ArticleDOI
TL;DR: New optimization and estimation techniques to address two fundamental problems in machine learning are developed, which serve as the basis for the Automatic Linguistic Indexing of Pictures - Real Time (ALIPR) system of fully automatic and high speed annotation for online pictures.
Abstract: Developing effective methods for automated annotation of digital pictures continues to challenge computer scientists. The capability of annotating pictures by computers can lead to breakthroughs in a wide range of applications, including Web image search, online picture-sharing communities, and scientific experiments. In this work, the authors developed new optimization and estimation techniques to address two fundamental problems in machine learning. These new techniques serve as the basis for the automatic linguistic indexing of pictures - real time (ALIPR) system of fully automatic and high-speed annotation for online pictures. In particular, the D2-clustering method, in the same spirit as K-Means for vectors, is developed to group objects represented by bags of weighted vectors. Moreover, a generalized mixture modeling technique (kernel smoothing as a special case) for nonvector data is developed using the novel concept of hypothetical local mapping (HLM). ALIPR has been tested by thousands of pictures from an Internet photo-sharing site, unrelated to the source of those pictures used in the training process. Its performance has also been studied at an online demonstration site, where arbitrary users provide pictures of their choices and indicate the correctness of each annotation word. The experimental results show that a single computer processor can suggest annotation terms in real time and with good accuracy.

Journal ArticleDOI
TL;DR: It is shown that kAS substantially outperform IPs for detecting shape-based classes, and the object detector is compared to the recent state-of-the-art system by Dalal and Triggs (2005).
Abstract: We present a family of scale-invariant local shape features formed by chains of k connected roughly straight contour segments (kAS), and their use for object class detection. kAS are able to cleanly encode pure fragments of an object boundary without including nearby clutter. Moreover, they offer an attractive compromise between information content and repeatability and encompass a wide variety of local shape structures. We also define a translation and scale invariant descriptor encoding the geometric configuration of the segments within a kAS, making kAS easy to reuse in other frameworks, for example, as a replacement or addition to interest points (IPs). Software for detecting and describing kAS is released at http://lear.inrialpes.fr/software. We demonstrate the high performance of kAS within a simple but powerful sliding-window object detection scheme. Through extensive evaluations, involving eight diverse object classes and more than 1,400 images, we (1) study the evolution of performance as the degree of feature complexity k varies and determine the best degree, (2) show that kAS substantially outperform IPs for detecting shape-based classes, and (3) compare our object detector to the recent state-of-the-art system by Dalal and Triggs (2005).

Journal ArticleDOI
TL;DR: This work proposes the use of a modified version of the correlation coefficient as a performance criterion for the image alignment problem and proposes an efficient approximation that leads to a closed form solution which is of low computational complexity.
Abstract: In this work we propose the use of a modified version of the correlation coefficient as a performance criterion for the image alignment problem. The proposed modification has the desirable characteristic of being invariant with respect to photometric distortions. Since the resulting similarity measure is a nonlinear function of the warp parameters, we develop two iterative schemes for its maximization, one based on the forward additive approach and the second on the inverse compositional method. As it is customary in iterative optimization, in each iteration the nonlinear objective function is approximated by an alternative expression for which the corresponding optimization is simple. In our case we propose an efficient approximation that leads to a closed form solution (per iteration) which is of low computational complexity, the latter property being particularly strong in our inverse version. The proposed schemes are tested against the forward additive Lucas-Kanade and the simultaneous inverse compositional algorithm through simulations. Under noisy conditions and photometric distortions our forward version achieves more accurate alignments and exhibits faster convergence whereas our inverse version has similar performance as the simultaneous inverse compositional algorithm but at a lower computational complexity.
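OpenCV's findTransformECC implements this enhanced-correlation-coefficient criterion, so the method can be tried directly. A usage sketch with hypothetical filenames (the optional mask/filter arguments vary slightly across OpenCV versions):

```python
import cv2
import numpy as np

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

warp = np.eye(2, 3, dtype=np.float32)             # identity initialization (affine)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
cc, warp = cv2.findTransformECC(template, image, warp, cv2.MOTION_AFFINE, criteria)

# cc is the final correlation coefficient; warp aligns image to template.
aligned = cv2.warpAffine(image, warp, template.shape[::-1],
                         flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
```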

Journal ArticleDOI
TL;DR: Experiments on three multibiometric databases indicate that the proposed fusion framework achieves consistently high performance compared to commonly used score fusion techniques based on score transformation and classification.
Abstract: Multibiometric systems fuse information from different sources to compensate for the limitations in performance of individual matchers. We propose a framework for the optimal combination of match scores that is based on the likelihood ratio test. The distributions of genuine and impostor match scores are modeled as a finite Gaussian mixture model. The proposed fusion approach is general in its ability to handle 1) discrete values in biometric match score distributions, 2) arbitrary scales and distributions of match scores, 3) correlation between the scores of multiple matchers, and 4) sample quality of multiple biometric sources. Experiments on three multibiometric databases indicate that the proposed fusion framework achieves consistently high performance compared to commonly used score fusion techniques based on score transformation and classification.
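A minimal sketch of the likelihood-ratio rule using scikit-learn's Gaussian mixtures (the component count and function names are our choices):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_likelihood_ratio(genuine_scores, impostor_scores, n_components=3):
    """genuine_scores, impostor_scores: (N, n_matchers) arrays of match scores.

    Returns a function giving log LR(s) = log p_gen(s) - log p_imp(s);
    accept an identity claim when it exceeds a chosen threshold."""
    gen = GaussianMixture(n_components).fit(genuine_scores)
    imp = GaussianMixture(n_components).fit(impostor_scores)
    return lambda s: gen.score_samples(s) - imp.score_samples(s)
```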

Journal ArticleDOI
TL;DR: A unified framework for automatic estimation and removal of color noise from a single image using piecewise smooth image models is proposed and an upper bound of the real NLF is estimated by fitting a lower envelope to the standard deviations of per-segment image variances.
Abstract: Image denoising algorithms often assume an additive white Gaussian noise (AWGN) process that is independent of the actual RGB values. Such approaches cannot effectively remove color noise produced by today's CCD digital cameras. In this paper, we propose a unified framework for two tasks: automatic estimation and removal of color noise from a single image using piecewise smooth image models. We introduce the noise level function (NLF), which is a continuous function describing the noise level as a function of image brightness. We then estimate an upper bound of the real NLF by fitting a lower envelope to the standard deviations of per-segment image variances. For denoising, the chrominance of color noise is significantly removed by projecting pixel values onto a line fit to the RGB values in each segment. Then, a Gaussian conditional random field (GCRF) is constructed to obtain the underlying clean image from the noisy input. Extensive experiments are conducted to test the proposed algorithm, which is shown to outperform state-of-the-art denoising algorithms.
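The per-segment chrominance step has a simple linear-algebra core; a sketch of projecting one segment's colors onto their dominant line (segmentation, NLF estimation, and the GCRF inference are omitted; names ours):

```python
import numpy as np

def project_segment_colors(rgb):
    """Project one segment's colors onto the dominant line fit to its
    RGB values, suppressing off-line (chrominance) noise.

    rgb: (n_pixels, 3) array of the segment's colors."""
    mean = rgb.mean(axis=0)
    centered = rgb - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    line = vt[0]                                   # first principal direction
    return mean + np.outer(centered @ line, line)  # points snapped to the line
```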

Journal ArticleDOI
TL;DR: A reconstruction method using a Probabilistic Principal Components Analysis shape model and an estimation algorithm that simultaneously estimates 3D shape and motion for each instant, learns the PPCA model parameters, and robustly fills-in missing data points is proposed.
Abstract: This paper describes methods for recovering time-varying shape and motion of nonrigid 3D objects from uncalibrated 2D point tracks. For example, given a video recording of a talking person, we would like to estimate the 3D shape of the face at each instant and learn a model of facial deformation. Time-varying shape is modeled as a rigid transformation combined with a nonrigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed, and thus additional assumptions about deformations are required. We first suggest restricting shapes to lie within a low-dimensional subspace and describe estimation algorithms. However, this restriction alone is insufficient to constrain reconstruction. To address these problems, we propose a reconstruction method using a Probabilistic Principal Components Analysis (PPCA) shape model and an estimation algorithm that simultaneously estimates 3D shape and motion for each instant, learns the PPCA model parameters, and robustly fills-in missing data points. We then extend the model to represent temporal dynamics in object shape, allowing the algorithm to robustly handle severe cases of missing data.

Journal ArticleDOI
TL;DR: This work cast the image-ranking problem into the task of identifying "authority" nodes on an inferred visual similarity graph and proposes VisualRank to analyze the visual link structures among images and describes the techniques required to make this system practical for large-scale deployment in commercial search engines.
Abstract: Because of the relative ease in understanding and processing text, commercial image-search systems often rely on techniques that are largely indistinguishable from text search. Recently, academic studies have demonstrated the effectiveness of employing image-based features to provide either alternative or additional signals to use in this process. However, it remains uncertain whether such techniques will generalize to a large number of popular Web queries and whether the potential improvement to search quality warrants the additional computational cost. In this work, we cast the image-ranking problem into the task of identifying "authority" nodes on an inferred visual similarity graph and propose VisualRank to analyze the visual link structures among images. The images found to be "authorities" are chosen as those that answer the image queries well. To understand the performance of such an approach in a real system, we conducted a series of large-scale experiments based on the task of retrieving images for 2,000 of the most popular product queries. Our experimental results show significant improvement, in terms of user satisfaction and relevancy, in comparison to the most recent Google image search results. Maintaining modest computational cost is vital to ensuring that this procedure can be used in practice; we describe the techniques required to make this system practical for large-scale deployment in commercial search engines.
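Structurally, VisualRank is a PageRank-style random walk over a visual-similarity graph; a sketch of the power iteration under that reading (the paper builds the similarity matrix from local-feature matches and biases the damping vector toward top search results):

```python
import numpy as np

def visual_rank(S, alpha=0.85, n_iters=100):
    """Power iteration for r = alpha * P r + (1 - alpha) * v.

    S: (N, N) nonnegative similarity matrix between images; every image
    is assumed similar to at least one other (nonzero columns)."""
    P = S / S.sum(axis=0, keepdims=True)   # column-stochastic transitions
    N = S.shape[0]
    r = np.full(N, 1.0 / N)
    v = np.full(N, 1.0 / N)                # uniform damping vector here
    for _ in range(n_iters):
        r = alpha * (P @ r) + (1 - alpha) * v
    return r                               # stationary "authority" scores
```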

Journal ArticleDOI
TL;DR: This work studies the mixture of dynamic textures, a statistical model for an ensemble of video sequences that is sampled from a finite collection of visual processes, each of which is a dynamic texture.
Abstract: A dynamic texture is a spatio-temporal generative model for video, which represents video sequences as observations from a linear dynamical system. This work studies the mixture of dynamic textures, a statistical model for an ensemble of video sequences that is sampled from a finite collection of visual processes, each of which is a dynamic texture. An expectation-maximization (EM) algorithm is derived for learning the parameters of the model, and the model is related to previous works in linear systems, machine learning, time-series clustering, control theory, and computer vision. Through experimentation, it is shown that the mixture of dynamic textures is a suitable representation for both the appearance and dynamics of a variety of visual processes that have traditionally been challenging for computer vision (for example, fire, steam, water, vehicle and pedestrian traffic, and so forth). When compared with state-of-the-art methods in motion segmentation, including both temporal texture methods and traditional representations (for example, optical flow or other localized motion representations), the mixture of dynamic textures achieves superior performance in the problems of clustering and segmenting video of such processes.

Journal ArticleDOI
Tong Lin1, Hongbin Zha1
TL;DR: A novel framework based on the assumption that the input high-dimensional data lie on an intrinsically low-dimensional Riemannian manifold, which can learn intrinsic geometric structures of the data, preserve radial geodesic distances, and yield regular embeddings.
Abstract: Recently, manifold learning has been widely exploited in pattern recognition, data analysis, and machine learning. This paper presents a novel framework, called Riemannian manifold learning (RML), based on the assumption that the input high-dimensional data lie on an intrinsically low-dimensional Riemannian manifold. The main idea is to formulate the dimensionality reduction problem as a classical problem in Riemannian geometry, that is, how to construct coordinate charts for a given Riemannian manifold? We implement the Riemannian normal coordinate chart, which has been the most widely used in Riemannian geometry, for a set of unorganized data points. First, two input parameters (the neighborhood size k and the intrinsic dimension d) are estimated based on an efficient simplicial reconstruction of the underlying manifold. Then, the normal coordinates are computed to map the input high-dimensional data into a low-dimensional space. Experiments on synthetic data, as well as real-world images, demonstrate that our algorithm can learn intrinsic geometric structures of the data, preserve radial geodesic distances, and yield regular embeddings.

Journal ArticleDOI
TL;DR: A randomized model verification strategy for RANSAC that removes the requirement for a priori knowledge of the fraction of outliers and estimates the quantity online, and has performance close to the theoretically optimal and is up to four times faster than previously published methods.
Abstract: A randomized model verification strategy for RANSAC is presented. The proposed method finds, like RANSAC, a solution that is optimal with user-specified probability. The solution is found in time that is close to the shortest possible and superior to any deterministic verification strategy. A provably fastest model verification strategy is designed for the (theoretical) situation when the contamination of data by outliers is known. In this case, the algorithm is the fastest possible (on the average) of all randomized RANSAC algorithms guaranteeing a confidence in the solution. The derivation of the optimality property is based on Wald's theory of sequential decision making, in particular, a modified sequential probability ratio test (SPRT). Next, the R-RANSAC with SPRT algorithm is introduced. The algorithm removes the requirement for a priori knowledge of the fraction of outliers and estimates the quantity online. We show experimentally that on standard test data, the method has performance close to the theoretically optimal and is 2 to 10 times faster than standard RANSAC and is up to four times faster than previously published methods.
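The verification test itself is short; a sketch of the per-model SPRT loop (parameter names follow the usual Wald setup: epsilon is the probability a point is consistent under a good model, delta under a bad one, A the decision threshold; model_consistent is a caller-supplied predicate):

```python
def sprt_verify(model_consistent, points, epsilon, delta, A):
    """Wald's sequential probability ratio test for RANSAC model verification.

    Returns False as soon as the likelihood ratio exceeds A (model rejected
    early as likely bad), True if all points are processed without rejection."""
    lam = 1.0
    for p in points:
        if model_consistent(p):
            lam *= delta / epsilon            # consistent: ratio shrinks (< 1)
        else:
            lam *= (1 - delta) / (1 - epsilon)  # inconsistent: ratio grows
        if lam > A:
            return False                      # early rejection
    return True
```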

Journal ArticleDOI
TL;DR: A model-based approach to interpret the image observations by multiple partially occluded human hypotheses in a Bayesian framework is proposed, which defines a joint image likelihood for multiple humans based on the appearance of the humans, the visibility of the body obtained by occlusion reasoning, and foreground/background separation.
Abstract: Segmentation and tracking of multiple humans in crowded situations is made difficult by interobject occlusion. We propose a model-based approach to interpret the image observations by multiple partially occluded human hypotheses in a Bayesian framework. We define a joint image likelihood for multiple humans based on the appearance of the humans, the visibility of the body obtained by occlusion reasoning, and foreground/background separation. The optimal solution is obtained by using an efficient sampling method, data-driven Markov chain Monte Carlo (DDMCMC), which uses image observations for proposal probabilities. Knowledge of various aspects, including human shape, camera model, and image cues, are integrated in one theoretically sound framework. We present experimental results and quantitative evaluation, demonstrating that the resulting approach is effective for very challenging data.

Journal ArticleDOI
TL;DR: A new automatic visual recognition system based only on local contour features, capable of localizing objects in space and scale, is proposed and compared with other methods based on contour and local descriptors in a detailed evaluation over 17 challenging categories.
Abstract: Psychophysical studies show that we can recognize objects using fragments of outline contour alone. This paper proposes a new automatic visual recognition system based only on local contour features, capable of localizing objects in space and scale. The system first builds a class-specific codebook of local fragments of contour using a novel formulation of chamfer matching. These local fragments allow recognition that is robust to within-class variation, pose changes, and articulation. Boosting combines these fragments into a cascaded sliding-window classifier, and mean shift is used to select strong responses as a final set of detections. We show how learning can be performed iteratively on both training and test sets to bootstrap an improved classifier. We compare with other methods based on contour and local descriptors in our detailed evaluation over 17 challenging categories and obtain highly competitive results. The results confirm that contour is indeed a powerful cue for multiscale and multiclass visual object recognition.
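The chamfer cost underlying the codebook matching can be computed with a distance transform. A plain (unoriented) sketch with our own names and integer template coordinates assumed; the paper's novel formulation additionally incorporates edge orientation:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_cost(template_points, edge_map):
    """Mean distance from each template contour point to the nearest image edge.

    template_points: (K, 2) integer (row, col) coordinates of the fragment,
                     already placed at a candidate location and scale.
    edge_map:        (H, W) boolean edge image."""
    dt = distance_transform_edt(~edge_map)     # distance to nearest edge pixel
    return dt[template_points[:, 0], template_points[:, 1]].mean()  # lower = better
```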

Journal ArticleDOI
TL;DR: This communication shows that the basic ideas of UDP and LPP are identical, and UDP is just a simplified version of LPP on the assumption that the local density is uniform.
Abstract: In (Yang et al., 2007), UDP is proposed to address the limitation of LPP for the clustering and classification tasks. In this communication, we show that the basic ideas of UDP and LPP are identical. In particular, UDP is just a simplified version of LPP on the assumption that the local density is uniform.
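For reference, the two criteria can be written side by side in the usual notation (a hedged reconstruction from the standard formulations of LPP and UDP, not text quoted from this communication), where W_ij are the locality weights:

```latex
\text{LPP:}\; \min_{\mathbf{w}} \sum_{i,j} \bigl(\mathbf{w}^{\top}\mathbf{x}_i-\mathbf{w}^{\top}\mathbf{x}_j\bigr)^{2} W_{ij},
\qquad
\text{UDP:}\; \max_{\mathbf{w}} \frac{\sum_{i,j} \bigl(\mathbf{w}^{\top}\mathbf{x}_i-\mathbf{w}^{\top}\mathbf{x}_j\bigr)^{2}\,(1-W_{ij})}{\sum_{i,j} \bigl(\mathbf{w}^{\top}\mathbf{x}_i-\mathbf{w}^{\top}\mathbf{x}_j\bigr)^{2}\,W_{ij}}
```

The numerator of the UDP ratio is the nonlocal scatter and the denominator the same weighted local scatter that LPP minimizes, which is the sense in which UDP reduces to a simplified LPP when the local density is uniform.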

Journal ArticleDOI
TL;DR: A novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem is presented, formulated in a minimum description length hypothesis selection framework, which allows the system to recover from mismatches and temporarily lost tracks.
Abstract: We present a novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem. Our approach is formulated in a minimum description length hypothesis selection framework, which allows our system to recover from mismatches and temporarily lost tracks. Building upon a state-of-the-art object detector, it performs multiview/multicategory object recognition to detect cars and pedestrians in the input images. The 2D object detections are checked for their consistency with (automatically estimated) scene geometry and are converted to 3D observations which are accumulated in a world coordinate frame. A subsequent trajectory estimation module analyzes the resulting 3D observations to find physically plausible spacetime trajectories. Tracking is achieved by performing model selection after every frame. At each time instant, our approach searches for the globally optimal set of spacetime trajectories which provides the best explanation for the current image and for all evidence collected so far while satisfying the constraints that no two objects may occupy the same physical space nor explain the same image pixels at any point in time. Successful trajectory hypotheses are then fed back to guide object detection in future frames. The optimization procedure is kept efficient through incremental computation and conservative hypothesis pruning. We evaluate our approach on several challenging video sequences and demonstrate its performance on both a surveillance-type scenario and a scenario where the input videos are taken from inside a moving vehicle passing through crowded city areas.

Journal ArticleDOI
TL;DR: One of the findings was that the automatic face alignment methods did not increase the gender classification rates, but manual alignment increased classification rates a little, which suggests that automatic alignment would be useful when the alignment methods are further improved.
Abstract: We present a systematic study on gender classification with automatically detected and aligned faces. We experimented with 120 combinations of automatic face detection, face alignment, and gender classification. One of the findings was that the automatic face alignment methods did not increase the gender classification rates. However, manual alignment increased classification rates a little, which suggests that automatic alignment would be useful when the alignment methods are further improved. We also found that the gender classification methods performed almost equally well with different input image sizes. In any case, the best classification rate was achieved with a support vector machine. A neural network and Adaboost achieved almost as good classification rates as the support vector machine and could be used in applications where classification speed is considered more important than the maximum classification accuracy.

Journal ArticleDOI
TL;DR: A novel graph matching algorithm is proposed and applied to shape recognition based on object silhouettes, comparing the geodesic paths between skeleton endpoints; it is motivated by the fact that visually similar skeleton graphs may have completely different topological structures.
Abstract: This paper proposes a novel graph matching algorithm and applies it to shape recognition based on object silhouettes. The main idea is to match skeleton graphs by comparing the geodesic paths between skeleton endpoints. In contrast to typical tree or graph matching methods, we do not consider the topological graph structure. Our approach is motivated by the fact that visually similar skeleton graphs may have completely different topological structures. The proposed comparison of geodesic paths between endpoints of skeleton graphs yields correct matching results in such cases. The skeletons are pruned by contour partitioning with discrete curve evolution, which implies that the endpoints of skeleton branches correspond to visual parts of the objects. The experimental results demonstrate that our method is able to produce correct results in the presence of articulations, stretching, and contour deformations.
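A sketch of the path-enumeration step using networkx (graph construction from a binary skeleton and the paper's path-similarity measure, which compares radii of maximal disks sampled along the paths, are omitted; names ours):

```python
import networkx as nx

def endpoint_geodesics(skel):
    """All geodesic paths between endpoints of a skeleton graph.

    skel: networkx Graph whose nodes are skeleton points, with edge weights
    equal to Euclidean length; endpoints are the degree-1 nodes."""
    ends = [n for n in skel if skel.degree(n) == 1]
    return {(s, t): nx.shortest_path(skel, s, t, weight="weight")
            for i, s in enumerate(ends) for t in ends[i + 1:]}
```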

Journal ArticleDOI
TL;DR: This paper introduces a discriminative model for the retrieval of images from text queries that formalizes the retrieval task as a ranking problem, and introduces a learning procedure optimizing a criterion related to the ranking performance.
Abstract: This paper introduces a discriminative model for the retrieval of images from text queries. Our approach formalizes the retrieval task as a ranking problem, and introduces a learning procedure optimizing a criterion related to the ranking performance. The proposed model hence addresses the retrieval problem directly and does not rely on an intermediate image annotation task, which contrasts with previous research. Moreover, our learning procedure builds upon recent work on the online learning of kernel-based classifiers. This yields an efficient, scalable algorithm, which can benefit from recent kernels developed for image comparison. The experiments performed over stock photography data show the advantage of our discriminative ranking approach over state-of-the-art alternatives (e.g., our model yields 26.3% average precision on the Corel dataset, compared to 22.0% for the best alternative model evaluated). Further analysis of the results shows that our model is especially advantageous over difficult queries such as queries with few relevant pictures or multiple-word queries.
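The online learning step can be sketched as a passive-aggressive update on a (query, relevant image, irrelevant image) triplet in a joint feature space; this is the generic linear form of such ranking updates with our own names, not the paper's exact kernel-based procedure:

```python
import numpy as np

def pa_rank_update(w, feat_relevant, feat_irrelevant, C=0.1):
    """One passive-aggressive update on a ranking triplet, where the query
    and each image are already mapped to joint feature vectors."""
    loss = max(0.0, 1.0 - w @ feat_relevant + w @ feat_irrelevant)  # margin loss
    if loss > 0.0:
        diff = feat_relevant - feat_irrelevant
        tau = min(C, loss / (diff @ diff))     # clipped aggressive step size
        w = w + tau * diff                     # push relevant above irrelevant
    return w
```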

Journal ArticleDOI
TL;DR: This work introduces Extremely Randomized Clustering Forests-ensembles of randomly created clustering trees-and shows that they provide more accurate results, much faster training and testing, and good resistance to background clutter.
Abstract: Some of the most effective recent methods for content-based image classification work by quantizing image descriptors, and accumulating histograms of the resulting visual word codes. Large numbers of descriptors and large codebooks are required for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests - ensembles of randomly created clustering trees - and show that they provide more accurate results, much faster training and testing, and good resistance to background clutter. Second, an efficient image classification method is proposed. It combines ERC-Forests and saliency maps very closely with the extraction of image information. For a given image, a classifier builds a saliency map online and uses it to classify the image. We show in several state-of-the-art image classification tasks that this method can speed up the classification process enormously. Finally, we show that the proposed ERC-Forests can also be used very successfully for learning distances between images. The distance computation algorithm consists of learning the characteristic differences between local descriptors sampled from pairs of same or different objects. These differences are vector quantized by ERC-Forests and the similarity measure is computed from this quantization. The similarity measure has been evaluated on four very different datasets and always outperforms the state-of-the-art competitive approaches.
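A hedged sketch of using extremely randomized trees as a vocabulary: each descriptor is dropped down every tree and the leaf it lands in acts as a visual word. scikit-learn's ExtraTreesClassifier stands in for the paper's clustering trees, and all names and parameter values are our own:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def build_erc_codebook(train_descriptors, train_labels, n_trees=5, n_leaves=512):
    """Image-class labels only guide the splits; the trees are read as codebooks."""
    forest = ExtraTreesClassifier(n_estimators=n_trees, max_leaf_nodes=n_leaves)
    return forest.fit(train_descriptors, train_labels)

def encode(forest, descriptors):
    """Concatenated histogram of leaf indices across trees: the image's code."""
    leaves = forest.apply(descriptors)             # (n_descriptors, n_trees)
    sizes = [t.tree_.node_count for t in forest.estimators_]
    hist = np.concatenate([np.bincount(leaves[:, i], minlength=n)
                           for i, n in enumerate(sizes)])
    return hist / max(1, len(descriptors))         # normalized code vector
```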

Journal ArticleDOI
TL;DR: Video synopsis provides a short video representation, while preserving the essential activities of the original video, and can be used to create a synopsis of endless video streams, as generated by webcams and by surveillance cameras.
Abstract: The amount of captured video is growing with the increased numbers of video cameras, especially the increase of millions of surveillance cameras that operate 24 hours a day. Since video browsing and retrieval is time consuming, most captured video is never watched or examined. Video synopsis is an effective tool for browsing and indexing such video. It provides a short video representation, while preserving the essential activities of the original video. The activity in the video is condensed into a shorter period by simultaneously showing multiple activities, even when they originally occurred at different times. The synopsis video is also an index into the original video by pointing to the original time of each activity. Video synopsis can be applied to create a synopsis of endless video streams, as generated by webcams and by surveillance cameras. It can address queries like "show in one minute the synopsis of this camera broadcast during the past day". This process includes two major phases: (i) an online conversion of the endless video stream into a database of objects and activities (rather than frames) and (ii) a response phase, generating the video synopsis as a response to the user's query.