Showing papers by "Dorin Comaniciu" published in 2002


Journal ArticleDOI
TL;DR: The convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function is proved, establishing its utility in detecting the modes of the density.
Abstract: A general non-parametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure: the mean shift. For discrete data, we prove the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity-preserving smoothing and image segmentation, are described as applications. In these algorithms, the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.

11,727 citations
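
The mean shift iteration named in the abstract above can be illustrated with a minimal sketch. It assumes a Gaussian kernel with a single fixed bandwidth; the function name `mean_shift_mode`, its parameters, and the synthetic two-cluster data are hypothetical and are not taken from the paper. Each step replaces the current estimate with the kernel-weighted mean of the samples, which moves it uphill on the estimated density until it settles at a nearby mode.

```python
import math
import random


def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Follow mean shift steps from `start` until the estimate stops moving."""
    x = list(start)
    for _ in range(max_iter):
        # Gaussian kernel weight of every sample relative to the current estimate.
        weights = [
            math.exp(-0.5 * sum((p_d - x_d) ** 2 for p_d, x_d in zip(p, x))
                     / bandwidth ** 2)
            for p in points
        ]
        total = sum(weights)
        # The weighted sample mean becomes the next estimate.
        new_x = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(len(x))
        ]
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return x


# Synthetic data (an assumption for illustration): two well-separated clusters.
rng = random.Random(0)
points = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(200)]
points += [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(200)]

mode_a = mean_shift_mode(points, (0.5, -0.5), bandwidth=0.5)
mode_b = mean_shift_mode(points, (4.5, 5.5), bandwidth=0.5)
print(mode_a, mode_b)
```

Starting points in different basins of attraction end at different modes, which is how the procedure delineates clusters. The bandwidth plays the role of the "resolution of the analysis" mentioned in the abstract.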


Patent
20 Dec 2002
TL;DR: In this article, an apparatus and method for video object generation and selective encoding is provided, which includes a detection module for detecting a first object in at least one image frame of a series of image frames; a tracking module for tracking the first object and segmenting the object from a background, the background being a second object; and an encoder for encoding the first and second objects to be transmitted to a receiver, wherein the first object is compressed at a high compression rate and the second object is compressed at a low compression rate.
Abstract: An apparatus and method for video object generation and selective encoding is provided. The apparatus includes a detection module for detecting a first object in at least one image frame of a series of image frames; a tracking module for tracking the first object in successive image frames and segmenting the first object from a background, the background being a second object; and an encoder for encoding the first and second objects to be transmitted to a receiver, wherein the first object is compressed at a high compression rate and the second object is compressed at a low compression rate. The receiver merges the first and second objects to form a composite image frame. The method provides for detecting, tracking and segmenting one or more objects, such as a face, from a background to be encoded at the same or different compression rates to conserve bandwidth.

129 citations


Patent
02 Jul 2002
TL;DR: In this article, a method and system for tracking a position and orientation (pose) of a camera using real scene features is presented, which includes the steps of capturing a video sequence by the camera, extracting features from the video sequence, estimating a first pose of the camera by an external tracking system, constructing a model of the features from the first pose, and estimating a second pose by tracking the model of the features.
Abstract: A method and system for tracking a position and orientation (pose) of a camera using real scene features is provided. The method includes the steps of capturing a video sequence by the camera; extracting features from the video sequence; estimating a first pose of the camera by an external tracking system; constructing a model of the features from the first pose; and estimating a second pose by tracking the model of the features, wherein after the second pose is estimated, the external tracking system is eliminated. The system includes an external tracker for estimating a reference pose; a camera for capturing a video sequence; a feature extractor for extracting features from the video sequence; a model builder for constructing a model of the features from the reference pose; and a pose estimator for estimating a pose of the camera by tracking the model of the features.

69 citations


Proceedings ArticleDOI
10 Dec 2002
TL;DR: A system for video object generation and selective encoding with applications in surveillance, mobile videophones, and the automotive industry, which belongs to a new generation of intelligent vision sensors called smart cameras, which execute autonomous vision tasks and report events and data to a remote base-station.
Abstract: The paper presents a system for video object generation and selective encoding with applications in surveillance, mobile videophones, and the automotive industry. Object tracking and MPEG-4 compression are performed in real-time. The system belongs to a new generation of intelligent vision sensors called smart cameras, which execute autonomous vision tasks and report events and data to a remote base-station. A detection module signals the presence of an object of interest within the camera field of view, while the tracking part follows the target to generate temporal trajectories. The compression is MPEG-4 compliant and implements the simple profile of the standard, which is capable of encoding up to four video objects. At the same time, the compression is selective, maintaining a higher quality for foreground objects and a lower quality for background representation. This property contributes to bandwidth reduction while preserving the essential information of foreground objects. The system performance is demonstrated in experiments that involve objects representing faces and vehicles seen from both static and moving cameras.

50 citations


Book ChapterDOI
25 Sep 2002
TL;DR: A robust and efficient method for the segmentation of 3D structures in CT and MR images is presented based on 3D ray propagation by mean shift analysis with a smoothness constraint, which is used to guide an evolving surface due to its computational efficiency.
Abstract: A robust and efficient method for the segmentation of 3D structures in CT and MR images is presented. The proposed method is based on 3D ray propagation by mean shift analysis with a smoothness constraint. Specifically, ray propagation is used to guide an evolving surface due to its computational efficiency. In addition, non-parametric analysis and shape priors are incorporated into the proposed technique for robust convergence. Several examples are depicted to illustrate its effectiveness.

47 citations


Book ChapterDOI
Dorin Comaniciu
16 Sep 2002
TL;DR: This paper analyzes the simplest parameterization represented by translation in both domains and presents a gradient-based iterative solution that accommodates variations in the target appearance, while being robust to outliers represented by partial occlusions.
Abstract: We present a Bayesian approach to real-time object tracking using nonparametric density estimation. The target model and candidates are represented by probability densities in the joint spatial-intensity domain. The new location and appearance of the target are jointly derived by computing the maximum likelihood estimate of the parameter vector that characterizes the transformation from the candidate to the model. This probabilistic formulation accommodates variations in the target appearance, while being robust to outliers represented by partial occlusions. In this paper we analyze the simplest parameterization represented by translation in both domains and present a gradient-based iterative solution. Various tracking sequences demonstrate the superior behavior of the method.

37 citations


Proceedings ArticleDOI
10 Dec 2002
TL;DR: A novel statistical framework for image segmentation based on nonparametric clustering is discussed; it employs the mean shift procedure to identify image regions as clusters in the joint color-spatial domain.
Abstract: We discuss a novel statistical framework for image segmentation based on nonparametric clustering. By employing the mean shift procedure for analysis, image regions are identified as clusters in the joint color-spatial domain. To measure the significance of each cluster we use a test statistic that compares the estimated density of the cluster mode with the estimated density on the cluster boundary. The cluster boundary in the color domain is defined by saddle points lying on the cluster borders defined in the spatial domain. The proposed technique compares favorably to other segmentation methods described in the literature.

29 citations


Journal ArticleDOI
TL;DR: A real-time foveation system for remote and distributed surveillance that performs face detection, tracking, selective encoding of the face and background, and efficient data transmission using a Java-based client-server architecture.
Abstract: We present a real-time foveation system for remote and distributed surveillance. The system performs face detection, tracking, selective encoding of the face and background, and efficient data transmission. A Java-based client-server architecture connects a radial network of camera servers to their central processing unit. Each camera server includes a detection and tracking module that signals the human presence within an observation area and provides the 2D face coordinates and its estimated scale to the video transmission module. The captured video data are then efficiently represented in log-polar coordinates, with the foveation point centered on the face, and sent to the connecting client modules for further processing. The current set-up of the system employs active cameras that track the detected person, by switching between smooth pursuit and saccadic movements, as a function of the target presence in the fovea region. High reconstruction resolution in the fovea region enables the successive application of recognition/verification modules on the transmitted video without sacrificing their performance. The system modules are well suited for implementation on the next generation of Java-based intelligent cameras.

28 citations
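
The log-polar representation mentioned in the abstract above can be sketched as a coordinate change around the foveation point. This is only an illustration under stated assumptions: the function names, the `r_min` clamp near the center, and the sample coordinates are hypothetical, and the system's actual sampling grid is not described in the abstract.

```python
import math


def to_log_polar(x, y, cx, cy, r_min=1.0):
    """Map pixel (x, y) to (rho, theta) around the foveation point (cx, cy)."""
    dx, dy = x - cx, y - cy
    # Clamp the radius so the logarithm stays defined at the center itself.
    r = max(math.hypot(dx, dy), r_min)
    return math.log(r / r_min), math.atan2(dy, dx)


def from_log_polar(rho, theta, cx, cy, r_min=1.0):
    """Invert the mapping back to Cartesian pixel coordinates."""
    r = r_min * math.exp(rho)
    return cx + r * math.cos(theta), cy + r * math.sin(theta)


# Hypothetical face center at (100, 100) and one pixel away from it.
rho, theta = to_log_polar(120.0, 80.0, 100.0, 100.0)
x_back, y_back = from_log_polar(rho, theta, 100.0, 100.0)

# Equal steps in rho cover ever wider radial bands away from the fovea.
band_near = math.exp(1.0) - math.exp(0.0)
band_far = math.exp(4.0) - math.exp(3.0)
print(rho, theta, x_back, y_back, band_near, band_far)
```

Because a uniform grid in `rho` spends most of its samples close to the center, resolution is high in the fovea region and falls off toward the periphery, which is the property the abstract relies on for bandwidth-efficient transmission.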


Book ChapterDOI
28 May 2002
TL;DR: A gradient-based iterative algorithm for saddle point detection is developed and shown to be effective in various data decomposition tasks, and significance measures that allow formal hypothesis testing of cluster existence are computed.
Abstract: Decomposition methods based on nonparametric density estimation define a cluster as the basin of attraction of a local maximum (mode) of the density function, with the cluster borders being represented by valleys surrounding the mode. To measure the significance of each delineated cluster we propose a test statistic that compares the estimated density of the mode with the estimated maximum density on the cluster boundary. While for a given kernel bandwidth the modes can be safely obtained by using the mean shift procedure, the detection of maximum density points on the cluster boundary (i.e., the saddle points) is not straightforward for multivariate data. We therefore develop a gradient-based iterative algorithm for saddle point detection and show its effectiveness in various data decomposition tasks. After finding the largest density saddle point associated with each cluster, we compute significance measures that allow formal hypothesis testing of cluster existence. The new statistical framework is extended and tested for the task of image segmentation.

24 citations


Proceedings ArticleDOI
05 Dec 2002
TL;DR: A statistical solution to the fusion problem based on variable-bandwidth kernel density estimation is presented; the fusion estimate is shown to be consistent and conservative, and superior experimental results validate the theory.
Abstract: Vision tasks, such as motion analysis, object tracking, robot localization, and 3D modeling, often require the fusion of estimates coming from different sources. Most of the fusion algorithms, however, are not robust with respect to outliers and consider only single-source models. Their performance deteriorates when initial assumptions are not valid (e.g., the presence of outliers in the data or data corresponding to multiple motions). The paper presents a statistical solution to the fusion problem based on variable-bandwidth kernel density estimation. The fusion estimate is represented by the mode of a density function that exploits the uncertainty of the estimates to be fused. We show that the fusion estimate is consistent and conservative. Since our construction is founded on density estimation, it naturally handles outliers and multiple source models. We test the density-based fusion for the task of multiple motion computation. Superior experimental results validate our theory.

20 citations
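
The idea of taking the fusion estimate as the mode of a density built from uncertain estimates can be sketched in one dimension. This is a simplified illustration, not the paper's algorithm: it assumes scalar estimates, a Gaussian kernel whose width is each estimate's own standard deviation, and a single starting point. The name `fuse_estimates` and the sample values are hypothetical.

```python
import math


def fuse_estimates(estimates, sigmas, start, tol=1e-9, max_iter=1000):
    """Seek a mode of the density (1/n) * sum_i N(x; x_i, sigma_i^2)."""
    x = float(start)
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for x_i, s_i in zip(estimates, sigmas):
            # Kernel response of estimate i at x, using its own bandwidth s_i.
            w = math.exp(-0.5 * ((x - x_i) / s_i) ** 2) / s_i
            # Setting the density gradient to zero gives this weighted mean.
            num += w * x_i / s_i ** 2
            den += w / s_i ** 2
        new_x = num / den
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Four consistent estimates and one outlier (made-up values).
estimates = [1.0, 1.1, 0.9, 1.05, 8.0]
sigmas = [0.2, 0.2, 0.2, 0.2, 0.2]

plain_mean = sum(estimates) / len(estimates)
fused = fuse_estimates(estimates, sigmas, start=plain_mean)
print(plain_mean, fused)
```

The plain average is pulled toward the outlier, while the mode-seeking iteration settles among the four consistent estimates. With small bandwidths the result depends on the starting point, so a practical version would need a strategy for choosing it.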


Journal ArticleDOI
TL;DR: A new paradigm that combines data modeling and vector quantization in an effective coding technique is presented that fits a statistical model to the input data and uses the best fit parameters to synthesize training vector sets with statistics similar to the input.

Book ChapterDOI
28 May 2002
TL;DR: This paper presents a generic modelling scheme to characterize the nonlinear structure of the manifold and to learn its multimodal distribution, and shows results on both synthetic and real training sets, and demonstrates that the proposed scheme has the ability to reveal important structures of the data.
Abstract: In many vision problems, the observed data lies in a nonlinear manifold in a high-dimensional space. This paper presents a generic modelling scheme to characterize the nonlinear structure of the manifold and to learn its multimodal distribution. Our approach represents the data as a linear combination of parameterized local components, where the statistics of the component parameterization describe the nonlinear structure of the manifold. The components are adaptively selected from the training data through a progressive density approximation procedure, which leads to the maximum likelihood estimate of the underlying density. We show results on both synthetic and real training sets, and demonstrate that the proposed scheme has the ability to reveal important structures of the data.

Patent
22 Oct 2002
TL;DR: In this paper, a set of rays from the seed is initialized to form a curve and their speed function is determined, and a curve is evolved by propagating the rays based on the speed function.
Abstract: The method involves receiving and visualizing two-dimensional (2D) image data on a display device. A structure in the image data is selected by placing a seed in it. A set of rays from the seed is initialized to form a curve and their speed function is determined. A curve is evolved by propagating the rays based on the speed function. The structure is segmented when all the rays converge on the structure's boundary. An independent claim is also included for a system for segmenting structures from 2D images.
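
The seed-and-rays procedure in the abstract above can be sketched on a synthetic image. The abstract does not specify the speed function, so this sketch substitutes a simple stand-in: a ray advances at constant speed while the pixel intensity stays within `threshold` of the seed's intensity, and its speed drops to zero otherwise. The function `propagate_rays`, its parameters, and the test image are all hypothetical.

```python
import math


def propagate_rays(image, seed, n_rays=16, threshold=0.5, step=0.5, max_steps=200):
    """Cast rays from `seed` (row, col) and return where each one stops."""
    rows, cols = len(image), len(image[0])
    seed_row, seed_col = seed
    seed_value = image[seed_row][seed_col]
    endpoints = []
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays
        y, x = float(seed_row), float(seed_col)
        for _ in range(max_steps):
            next_y = y + step * math.sin(angle)
            next_x = x + step * math.cos(angle)
            iy, ix = int(round(next_y)), int(round(next_x))
            outside = not (0 <= iy < rows and 0 <= ix < cols)
            # Stand-in speed function: stop where intensity departs from the seed.
            if outside or abs(image[iy][ix] - seed_value) > threshold:
                break
            y, x = next_y, next_x
        endpoints.append((y, x))
    return endpoints


# Synthetic 41x41 image: a bright disk of radius 10 centered at (20, 20).
image = [
    [1.0 if (col - 20) ** 2 + (row - 20) ** 2 <= 100 else 0.0 for col in range(41)]
    for row in range(41)
]

endpoints = propagate_rays(image, seed=(20, 20))
radii = [math.dist(point, (20.0, 20.0)) for point in endpoints]
print(min(radii), max(radii))
```

The stopped ray tips trace the disk boundary, which corresponds to the abstract's statement that the structure is segmented once all rays converge on its boundary.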