
Showing papers presented at "British Machine Vision Conference in 2004"


Proceedings ArticleDOI
07 Sep 2004
TL;DR: This paper proposes a new approach for finding expressive and geometrically invariant parts for modeling 3D objects that remain approximately affinely rigid across a range of views of an object, and across multiple instances of the same object class.
Abstract: This paper proposes a new approach for finding expressive and geometrically invariant parts for modeling 3D objects. The approach relies on identifying groups of local affine regions (image features having a characteristic appearance and elliptical shape) that remain approximately affinely rigid across a range of views of an object, and across multiple instances of the same object class. These groups, termed semi-local affine parts, are learned using correspondence search between pairs of unsegmented and cluttered input images, followed by validation against additional training images. The proposed approach is applied to the recognition of butterflies in natural imagery.

287 citations


Journal ArticleDOI
04 Oct 2004
TL;DR: An algorithm to measure the vessel diameter to subpixel accuracy is presented, based on a two-dimensional difference of Gaussian model, which is optimized to fit a two-dimensional intensity vessel segment.
Abstract: Changes in retinal vessel diameter are an important sign of diseases such as hypertension, arteriosclerosis and diabetes mellitus. Obtaining precise measurements of vascular widths is a critical and demanding process in automated retinal image analysis as the typical vessel is only a few pixels wide. This paper presents an algorithm to measure the vessel diameter to subpixel accuracy. The diameter measurement is based on a two-dimensional difference of Gaussian model, which is optimized to fit a two-dimensional intensity vessel segment. The performance of the method is evaluated against Brinchmann-Hansen's half height, Gregson's rectangular profile and Zhou's Gaussian model. Results from 100 sample profiles show that the presented algorithm is over 30% more precise than the compared techniques and is accurate to a third of a pixel.
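The half-height rule of Brinchmann-Hansen, one of the baselines the paper compares against, is simple to state concretely. Below is a minimal sketch (plain Python, hypothetical function name) for a 1D cross-sectional intensity profile of a dark vessel on a bright background; the paper's own two-dimensional difference-of-Gaussian fit is more involved and is not reproduced here.

```python
def half_height_width(profile):
    """Estimate vessel width from a 1D cross-sectional intensity profile.

    Half-height rule: the width is the distance between the two points
    where intensity crosses halfway between the background level and the
    minimum of the (dark) vessel. Sub-pixel crossings are found by
    linear interpolation. `profile` is a list of intensity samples.
    """
    background = max(profile)          # assume a bright background
    trough = min(profile)
    half = (background + trough) / 2.0
    i_min = profile.index(trough)

    # walk left from the trough until intensity rises above half-height
    left = i_min
    while left > 0 and profile[left] < half:
        left -= 1
    # interpolate the crossing between samples `left` and `left + 1`
    x_left = left + (half - profile[left]) / (profile[left + 1] - profile[left])

    # walk right from the trough, symmetric case
    right = i_min
    while right < len(profile) - 1 and profile[right] < half:
        right += 1
    x_right = right - (half - profile[right]) / (profile[right - 1] - profile[right])

    return x_right - x_left
```

On an ideal symmetric profile the half-height crossings fall exactly between the sampled points, which is why sub-pixel interpolation matters when the typical vessel is only a few pixels wide.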

275 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: A novel shape constraint technique which is incorporated into a multi-stage algorithm to automatically locate features of the human face is described, which outperforms previously published results on the BIOID test set.
Abstract: We describe a novel shape constraint technique which is incorporated into a multi-stage algorithm to automatically locate features on the human face. The method is coarse-to-fine. First a face detector is applied to find the approximate scale and location of the face in the image. Then individual feature detectors are applied and combined using a novel algorithm known as Pairwise Reinforcement of Feature Responses (PRFR). The points predicted by this method are then refined using a version of the Active Appearance Model (AAM) search, which is tuned to edge and corner features. The final output of the three stage algorithm is shown to give much better results than any other combination of methods. The method outperforms previously published results on the BIOID test set [11].

231 citations


Proceedings ArticleDOI
16 Sep 2004
TL;DR: A novel wide-baseline matching algorithm is described that can identify corresponding building facades in two views despite significant changes of viewpoint and lighting.
Abstract: We describe the prototype of a system intended to allow a user to navigate in an urban environment using a mobile telephone equipped with a camera. The system uses a database of views of building facades to determine the pose of a query view provided by the user. Our method is based on a novel wide-baseline matching algorithm that can identify corresponding building facades in two views despite significant changes of viewpoint and lighting. We show that our system is capable of localising query views reliably in a large part of Cambridge city centre.

228 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: It is demonstrated that a relatively simple set of features extracted from sections of car front images can be used to obtain high performance verification and recognition of vehicle type (both car model and class) for secure access and traffic monitoring applications.
Abstract: We describe an investigation into feature representations for a rigid-structure recognition framework for objects with a multitude of classes. The intended application is automatic recognition of vehicle type for secure access and traffic monitoring applications, a problem not hitherto considered at such a level of accuracy. We demonstrate that a relatively simple set of features extracted from sections of car front images can be used to obtain high performance verification and recognition of vehicle type (both car model and class). We describe the approach and resulting system in full, and the results of experiments comparing a wide variety of different features. The final system is capable of recognition rates of over 93% and verification equal error rates of less than 5.6% when tested on over 1000 images containing 77 different classes. The system is shown to be robust for a wide range of weather and lighting conditions.

182 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: This paper proposes a novel block-based algorithm for background subtraction based on the Local Binary Pattern (LBP) texture measure that operates in real-time under the assumption of a stationary camera with fixed focal length.
Abstract: The detection of moving objects from video frames plays an important and often very critical role in different kinds of machine vision applications including human detection and tracking, traffic monitoring, human-machine interfaces and military applications, since it usually is one of the first phases in a system architecture. A common way to detect moving objects is background subtraction. In background subtraction, moving objects are detected by comparing each video frame against an existing model of the scene background. In this paper, we propose a novel block-based algorithm for background subtraction. The algorithm is based on the Local Binary Pattern (LBP) texture measure. Each image block is modelled as a group of weighted adaptive LBP histograms. The algorithm operates in real-time under the assumption of a stationary camera with fixed focal length. It can adapt to inherent changes in scene background and can also handle multimodal backgrounds.
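The LBP texture measure at the core of the algorithm is inexpensive to compute, which is what makes real-time operation plausible. The sketch below (plain Python, hypothetical function names) derives the standard 8-neighbour LBP code and the per-block histogram that the paper models with groups of weighted adaptive histograms; the weighting and adaptation logic of the background model itself is omitted.

```python
def lbp_code(img, y, x):
    """8-neighbour Local Binary Pattern code at pixel (y, x).

    Each neighbour contributes one bit: 1 if its grey value is >= the
    centre pixel, 0 otherwise. `img` is a 2D list of grey values.
    """
    centre = img[y][x]
    # clockwise neighbour offsets starting at the top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code

def block_lbp_histogram(img, y0, x0, h, w):
    """Normalised 256-bin LBP histogram over one image block.

    A new frame's block histogram would be compared against the stored
    model histograms to decide background vs. foreground.
    """
    hist = [0] * 256
    count = 0
    # stay one pixel inside the image border so all 8 neighbours exist
    for y in range(max(y0, 1), min(y0 + h, len(img) - 1)):
        for x in range(max(x0, 1), min(x0 + w, len(img[0]) - 1)):
            hist[lbp_code(img, y, x)] += 1
            count += 1
    return [v / count for v in hist]
```

Because the code depends only on intensity ordering between neighbours, the histogram is largely invariant to monotonic illumination changes, which suits background modelling.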

175 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: This work shows that long-term tracking is improved by treating salient feature patches as observations of locally planar regions on 3D world surfaces within a SLAM framework, enabling pre-warping of templates for matching.
Abstract: The performance of sequential structure from motion systems, where scene mapping is sparse to permit real-time operation, depends greatly on the ability to repeatedly measure the same visual features from a wide range of viewpoints. While previous systems have tracked features as 2D templates in image space, we show that long-term tracking is improved by treating salient feature patches as observations of locally planar regions on 3D world surfaces. Within a SLAM framework for motion and structure estimation, a gradient-based image alignment method is used to deduce feature surface normal estimates, enabling pre-warping of templates for matching. As an added benefit these normals provide a richer description of the scene.

154 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: This work proposes a robust method for detecting local planar regions in a scene with an uncalibrated stereo based on random sampling using distributions of feature point locations and finds the largest consensus set of the homography.
Abstract: We propose a robust method for detecting local planar regions in a scene with an uncalibrated stereo pair. Our method is based on random sampling using distributions of feature point locations. For RANSAC, we use distributions for each feature point defined by the distances between that point and the other points. We first choose a correspondence using a uniform distribution and next choose candidate correspondences using the distribution of the chosen point. Then, we compute a homography from the chosen correspondences and find the largest consensus set of the homography. We repeat this procedure until all regions are detected. We demonstrate through simulations and real image examples that our method is robust to outliers in a scene.
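The inner loop described here, sampling correspondences, fitting a homography, and keeping the largest consensus set, can be sketched as follows. Note the paper's contribution is the *non-uniform* sampling built from inter-point distances; for brevity this sketch uses plain uniform sampling, and the DLT fit and function names are standard textbook machinery, not the paper's code.

```python
import random
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform: homography from >= 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(A, dtype=float))
    H = vt[-1].reshape(3, 3)       # null-space vector = homography entries
    return H / H[2, 2]

def project(H, pt):
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

def ransac_homography(src, dst, iters=300, thresh=1e-3, seed=1):
    """Largest-consensus homography via RANSAC (uniform sampling)."""
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(iters):
        idx = rng.sample(range(len(src)), 4)
        try:
            H = fit_homography([src[i] for i in idx], [dst[i] for i in idx])
        except np.linalg.LinAlgError:
            continue                # degenerate sample, try again
        inliers = [i for i in range(len(src))
                   if np.linalg.norm(project(H, src[i]) - np.array(dst[i])) < thresh]
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

Repeating this procedure on the remaining (non-consensus) correspondences, as the abstract describes, peels off one planar region per pass.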

140 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: This paper presents a novel approach to the detection of unusual or interesting events in videos involving certain types of intentional behaviour, such as pedestrian scenes, based upon building an understanding of the way people navigate towards a goal.
Abstract: This paper presents a novel approach to the detection of unusual or interesting events in videos involving certain types of intentional behaviour, such as pedestrian scenes. The approach is not based upon a statistical measure of typicality, but upon building an understanding of the way people navigate towards a goal. The activity of agents moving around within the scene is evaluated based upon whether the behaviour in question is consistent with a simple model of goal-directed behaviour and a model of those goals and obstacles known to be in the scene. The advantages of such an approach are multiple: it handles the presence of movable obstacles (for example, parked cars) with ease; trajectories which have never before been presented to the system can be classified as explicable; and the technique as a whole has a prima facie psychological plausibility. A system based upon these principles is demonstrated in two scenes: a car-park, and a foyer scenario.

130 citations


Proceedings ArticleDOI
01 Jan 2004
TL;DR: An empirical evaluation shows that Person Specific AAMs are, as expected, both easier to build and more robust to fit than Generic AAMs, and it is shown that building a generic shape model is far easier than building a generic appearance model.
Abstract: Active Appearance Models (AAMs) are generative parametric models that have been successfully used in the past to model faces. Anecdotal evidence, however, suggests that the performance of an AAM built to model the variation in appearance of a single person across pose, illumination, and expression (a Person Specific AAM) is substantially better than the performance of an AAM built to model the variation in appearance of many faces, including unseen subjects not in the training set (a Generic AAM). In this paper, we present an empirical evaluation that shows that Person Specific AAMs are, as expected, both easier to build and more robust to fit than Generic AAMs. Moreover, we show that: (1) building a generic shape model is far easier than building a generic appearance model, and (2) the shape component is the main cause of the reduced fitting robustness of Generic AAMs. We then proceed to describe two refinements to Generic AAMs to improve their performance: (1) a refitting procedure to improve the quality of the ground-truth data used to build the AAM and (2) a new fitting algorithm. For both refinements we demonstrate dramatically improved fitting performance. Finally, we evaluate the effect of these improvements on a combined model construction and fitting task.

117 citations


Proceedings ArticleDOI
07 Sep 2004
TL;DR: A flexible monocular system capable of recognising sign lexicons far greater in number than previous approaches and generating extremely high recognition rates for large lexicons with as little as a single training instance per sign is presented.
Abstract: This paper presents a flexible monocular system capable of recognising sign lexicons far greater in number than previous approaches. The power of the system is due to four key elements: (i) Head and hand detection based upon boosting which removes the need for temperamental colour segmentation; (ii) A body centred description of activity which overcomes issues with camera placement, calibration and user; (iii) A two stage classification in which stage I generates a high level linguistic description of activity which naturally generalises and hence reduces training; (iv) A stage II classifier bank which does not require HMMs, further reducing training requirements. The result is a system capable of running in real time and generating extremely high recognition rates for large lexicons with as little as a single training instance per sign. We demonstrate classification rates as high as 92% for a lexicon of 164 words with extremely low training requirements, outperforming previous approaches that require thousands of training examples.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: The Mercer property of matching kernels which mimic classical matching algorithms used in techniques based on points of interest is studied, and a new statistical approach to kernel positiveness is introduced, which can provide bounds on the probability that the Gram matrix is actually positive definite for kernels in a large class of functions.
Abstract: On the one hand, Support Vector Machines have met with significant success in solving difficult pattern recognition problems with global feature representations. On the other hand, local features in images have been shown to be suitable representations for efficient object recognition. It is therefore natural to combine the SVM approach with local feature representations to gain advantages on both sides. We study in this paper the Mercer property of matching kernels which mimic classical matching algorithms used in techniques based on points of interest. We introduce a new statistical approach to kernel positiveness. We show that despite the absence of an analytical proof of the Mercer property, we can provide bounds on the probability that the Gram matrix is actually positive definite for kernels in a large class of functions, under reasonable assumptions. A few experiments validate these bounds on object recognition tasks.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: An unsupervised algorithm for following ill-structured roads in which dominant texture orientations computed with Gabor wavelet filters vote for a consensus road vanishing point location is described.
Abstract: Many rural roads lack sharp, smoothly curving edges and a homogeneous surface appearance, hampering traditional vision-based road-following methods. However, they often have strong texture cues parallel to the road direction in the form of ruts and tracks left by other vehicles. This paper describes an unsupervised algorithm for following ill-structured roads in which dominant texture orientations computed with Gabor wavelet filters vote for a consensus road vanishing point location. The technique is first described for estimating the direction of straight-road segments, then extended to curved and undulating roads by tracking the vanishing point indicated by a differential “strip” of voters moving up toward the nominal vanishing line. Finally, the vanishing point is used to constrain a search for the road boundaries by maximizing texture- and color-based region discriminant functions. Results are shown for a variety of road scenes including gravel roads, dirt trails, and highways.
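The voting step can be made concrete with a bare-bones accumulator: each pixel's dominant texture orientation (as a Gabor filter bank would estimate it) casts a ray, and every accumulator cell the ray crosses receives one vote; the strongest cell is the consensus vanishing point. This sketch, with hypothetical names and a coarse per-cell grid, omits the Gabor filtering itself and the differential "strip" refinement.

```python
import math

def vanishing_point_votes(orientations, grid_h, grid_w):
    """Accumulate vanishing-point votes from per-pixel orientations.

    `orientations` maps (y, x) -> dominant texture angle in radians.
    Each pixel votes once for every accumulator cell that lies on the
    ray cast from the pixel along its orientation; the cell with the
    most votes is returned as the consensus vanishing point.
    """
    acc = [[0] * grid_w for _ in range(grid_h)]
    for (y, x), theta in orientations.items():
        # image y grows downward, so an upward ray has negative dy
        dy, dx = -math.sin(theta), math.cos(theta)
        hits = set()                      # de-duplicate cells per ray
        for t in range(1, 8 * max(grid_h, grid_w)):
            vy = round(y + dy * t * 0.25)  # march in quarter-cell steps
            vx = round(x + dx * t * 0.25)
            if 0 <= vy < grid_h and 0 <= vx < grid_w:
                hits.add((vy, vx))
        for vy, vx in hits:
            acc[vy][vx] += 1
    best = max((acc[r][c], (r, c)) for r in range(grid_h) for c in range(grid_w))
    return best[1], acc
```

Texture rows parallel to the road all point at the same image location, so even noisy per-pixel orientations concentrate votes near the true vanishing point.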

Proceedings ArticleDOI
01 Jan 2004
TL;DR: This work proposes an approach based on representing the induced transformation between images using Radial Basis Functions (RBF) and shows that the computed registrations are sufficiently accurate to allow convincing augmentations of the images.
Abstract: Registering images of a deforming surface is a well-studied problem. Solutions include computing optic flow or estimating a parameterized motion model. In the case of optic flow it is necessary to include some regularization. We propose an approach based on representing the induced transformation between images using Radial Basis Functions (RBF). The approach can be viewed as a direct, i.e. intensity-based, method, or equivalently, as a way of using RBFs as non-linear regularizers on the optic flow field. The approach is demonstrated on several image sequences of deforming surfaces. It is shown that the computed registrations are sufficiently accurate to allow convincing augmentations of the images.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: The goal of this paper is to recognize various deformable objects from images by extending the class of generative probabilistic models known as pictorial structures, particularly suited to represent articulated structures.
Abstract: The goal of this paper is to recognize various deformable objects from images. To this end we extend the class of generative probabilistic models known as pictorial structures. This class of models is particularly suited to represent articulated structures, and has previously been used by Felzenszwalb and Huttenlocher for pose estimation of humans. We extend pictorial structures in three ways: (i) likelihoods are included for both the boundary and the enclosed texture of the animal; (ii) a complete graph is modelled (rather than a tree structure); (iii) it is demonstrated that the model can be fitted in polynomial time using belief propagation. We show examples for two types of quadrupeds, cows and horses. We achieve excellent recognition performance for cows with an equal error rate of 3% for 500 positive and 5000 negative images.

Proceedings Article
07 Sep 2004
TL;DR: This work presents an approach to full human-body tracking, using markerless multiview images as input, performing acquisition, reconstruction and tracking in real-time on a single PC, using a hierarchical visual-hull algorithm which segments only the most interesting regions of the images and includes colour information.
Abstract: We present an approach to full human-body tracking, using markerless multiview images as input, performing acquisition, reconstruction and tracking in real-time on a single PC. Our approach employs a hierarchical visual-hull algorithm which segments only the most interesting regions of the images and includes colour information. The tracking step uses blobs attached to a kinematic model to recover joint angles in an expectation-maximization framework. We demonstrate the robustness of the approach on video sequences of various body configurations in an unaugmented office environment. We also show that tracking challenging poses with self-occlusions is possible without the processing cost of stochastic sampling schemes.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: Preliminary experimental results show that the learning process captures common errors in SSD matching including the fattening effect, the aperture effect, and mismatches in occluded or low texture regions, and significantly improves the accuracy of the depth computation.
Abstract: This paper describes a novel learning-based approach for improving the performance of stereo computation. It is based on the observation that whether the image matching scores lead to true or erroneous depth values is dependent on the original stereo images and the underlying scene structure. This function is learned from training data and is integrated into a depth estimation algorithm using the MAP-MRF framework. Because the resultant likelihood function is dependent on the states of a large neighboring region around each pixel, we propose to solve the high-order MRF inference problem using the simulated annealing algorithm combined with a Metropolis-Hastings sampler. A segmentation-based approach is proposed to accelerate the computational speed and improve the performance. Preliminary experimental results show that the learning process captures common errors in SSD matching including the fattening effect, the aperture effect, and mismatches in occluded or low texture regions. It is also demonstrated that the proposed approach significantly improves the accuracy of the depth computation.

Proceedings ArticleDOI
07 Sep 2004
TL;DR: A real time approach to locate and track the upper torso of the human body using background suppression and a general approximation to body shape, applied within a particle filter framework, making use of integral images to maintain real-time performance.
Abstract: This paper presents a real time approach to locate and track the upper torso of the human body. Our main interest is not in 3D biometric accuracy, but rather a sufficient discriminatory representation for visual interaction. The algorithm employs background suppression and a general approximation to body shape, applied within a particle filter framework, making use of integral images to maintain real-time performance. Furthermore, we present a novel method to disambiguate the hands of the subject and to predict the likely position of elbows. The final system is demonstrated segmenting multiple subjects from a cluttered scene at above real time operation.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: This paper describes how a single AAM can be fit to multiple images, captured simultaneously by cameras with arbitrary geometry and response functions, and retains the major benefits of Coupled-View AAMs: the integration of information from multiple images into a single model, and improved fitting robustness.
Abstract: Active Appearance Models (AAMs) are a well studied 2D deformable model. One recently proposed extension of AAMs to multiple images is the Coupled-View AAM. Coupled-View AAMs model the 2D shape and appearance of a face in two or more views simultaneously. The major limitation of Coupled-View AAMs, however, is that they are specific to a particular set of cameras, both in geometry and the photometric responses. In this paper, we describe how a single AAM can be fit to multiple images, captured simultaneously by cameras with arbitrary geometry and response functions. Our algorithm retains the major benefits of Coupled-View AAMs: the integration of information from multiple images into a single model, and improved fitting robustness.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: A method of constructing parametric statistical models of shape variation which can generate continuous diffeomorphic (non-folding) deformation fields is described, and examples of the resulting models are given.
Abstract: We describe a method of constructing parametric statistical models of shape variation which can generate continuous diffeomorphic (non-folding) deformation fields. Traditional statistical shape models are constructed by analysis of the positions of a set of landmark points. Here we analyse the parameters of continuous warp fields, constructed by composing simple parametric diffeomorphic warps. The warps are composed in such a way that the deformations are always defined in a reference frame. This allows the parameters controlling the deformations to be meaningfully compared from one example to another. A linear model is learnt to represent the variations in the warp parameters across the training set. This model can then be used to generalise the deformations. Models can be built either from sets of annotated points, or from unlabelled images. In the latter case, we use techniques from non-rigid registration to construct the warp fields deforming a reference image into each example. We describe the technique in detail and give examples of the resulting models.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: A real time system for tracking targets across blind regions of multiple cameras with non-overlapping fields of views (FOVs) using camera topology, and targets’ motion and shape information is proposed.
Abstract: In this paper, we propose a real time system for tracking targets across blind regions of multiple cameras with non-overlapping fields of views (FOVs) using camera topology, and targets’ motion and shape information. Kalman filters are used to robustly track each target’s shape and motion in each camera view and the common ground plane view composed of all camera views. The target’s track in the blind region between cameras is obtained using Kalman filter predictions. For multi-camera correspondence matching we compute the Gaussian distributions of the tracking parameters across cameras for the target motion and position in the ground plane view. Matching of targets across camera views uses a graph based track initialization scheme, which accumulates information from occurrences of a target in several consecutive frames of the video. Probabilistic matching is carried out by comparing the track parameters for new tracks obtained from the graph in one camera view with the parameters of the terminated tracks learnt by Kalman filters in the other camera views and the ground plane view. We obtain 85% accuracy for correspondence matching while tracking vehicles observed from two cameras monitoring a highway.
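The blind-region prediction step rests on the standard constant-velocity Kalman filter: while a target is visible its state is updated from measurements, and between camera FOVs the track is carried forward on the predict step alone. The sketch below shows that mechanism in NumPy; the specific noise values are illustrative assumptions, not the paper's, and the shape-tracking and Gaussian matching parts are omitted.

```python
import numpy as np

def make_cv_kalman(dt=1.0, q=1e-2, r=1.0):
    """Constant-velocity Kalman matrices for a 2D position track.

    State is [x, y, vx, vy]; only position is measured.
    `q` and `r` are illustrative process/measurement noise levels.
    """
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    return F, H, q * np.eye(4), r * np.eye(2)

def kf_step(x, P, z, F, H, Q, R):
    """One predict(+update) cycle; pass z=None inside a blind region
    to propagate the track on prediction alone."""
    x = F @ x                          # predict state
    P = F @ P @ F.T + Q                # predict covariance
    if z is not None:
        y = np.asarray(z, float) - H @ x       # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + K @ y
        P = (np.eye(4) - K @ H) @ P
    return x, P
```

A track seen leaving one FOV at a known velocity can thus be extrapolated across the blind region, giving the predicted position and its Gaussian uncertainty that correspondence matching needs.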

Proceedings ArticleDOI
01 Sep 2004
TL;DR: This paper presents a method for computing the localization of a mobile robot with reference to a learning video sequence, where the robot is first guided on a path by a human, while the camera records a monocular learning sequence.
Abstract: In this paper we present a method for computing the localization of a mobile robot with reference to a learning video sequence. The robot is first guided on a path by a human, while the camera records a monocular learning sequence. Then the computer builds a map of the environment. This is done by first extracting key frames from the learning sequence. Then the epipolar geometry and camera motion are computed between key frames. Additionally, a hierarchical bundle adjustment is used to refine the reconstruction. The map stored for localization includes the position of the camera associated with each key frame as well as a set of interest points detected in the images and reconstructed in 3D. Using this map it is possible to compute the localization of the robot in real time during the automatic driving phase.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: This paper proposes an algorithm which performs line detection in the space of the equivalent sphere, the unified domain of central catadioptric sensors, and proposes to apply this unifying model in order to calibrate the intrinsic parameters required for the projection onto the sphere.
Abstract: Central catadioptric sensors enable the acquisition of panoramic images over a 360 degree field of view while preserving a single viewpoint. These advantages account for the growing use of these sensors in applications such as surveillance, navigation or modelling. However, the deformations of the image do not allow classical perspective image algorithms or operators to be applied directly. Typically, straight line detection, simple in a perspective image, becomes a delicate and complex conic detection problem in a central catadioptric image. Previous methods proposed in the literature were essentially motivated by particular cases such as horizontal line detection or paracatadioptric line detection. In this paper, we propose an algorithm which performs the detection in the space of the equivalent sphere, the unified domain of central catadioptric sensors. On this sphere, real lines project onto great circles that we detect using the Hough transform. We also propose to apply this unifying model in order to perform the calibration of the intrinsic parameters required for the projection onto the sphere. We show results on synthetic and real catadioptric images (parabolic, hyperbolic) to demonstrate the relevance of detection on the sphere.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: This work reports results of a psychological study that provides independent evidence regarding the validity of the face reconstruction task, and demonstrates an improved reconstruction approach using a positive, local linear representation that provides demonstrably lower reconstruction error than is obtainable with a global representation.
Abstract: A number of investigators have had success using domain-specific prior knowledge to produce improved super-resolution images of faces (“hallucinating faces”). These efforts address the scenario where a face image is obtained from a low-resolution camera. A related but less studied problem occurs when the missing information is the result of occlusion rather than low camera resolution, as in the case when a person is wearing sunglasses. Recently Hwang and Lee [14] introduced the first algorithm for solving this reconstruction “inpainting” problem. In the current work we report results of a psychological study that provides independent evidence regarding the validity of the face reconstruction task, and we demonstrate an improved reconstruction approach using a positive, local linear representation. The positive, local mixture operates on real-world images without manual intervention in many cases, and provides demonstrably lower reconstruction error than is obtainable with a global representation.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: The results obtained on a complex database of mono- and bi-manual gestures are studied using an Input/Output Hidden Markov Model (IOHMM), implemented within the framework of an open source machine learning library, and are compared to a Hidden Markov Model (HMM).
Abstract: In this paper, we address the problem of the recognition of isolated complex mono- and bi-manual hand gestures. In the proposed system, hand gestures are represented by the 3D trajectories of blobs obtained by tracking colored body parts. We study the results obtained on a complex database of mono- and bi-manual gestures. These results are obtained using an Input/Output Hidden Markov Model (IOHMM), implemented within the framework of an open source machine learning library, and are compared to a Hidden Markov Model (HMM).

Proceedings ArticleDOI
01 Sep 2004
TL;DR: This paper is concerned with allowing the user of a wearable, portable, vision system to interact with the visual information using hand movements and gestures and builds on earlier work which recovers 3D scene structure at video-rate.
Abstract: This paper is concerned with allowing the user of a wearable, portable, vision system to interact with the visual information using hand movements and gestures. Two example scenarios are explored. The first, in 2D, uses the wearer’s hand to both guide an active wearable camera and to highlight objects of interest using a grasping vector. The second is based in 3D, and builds on earlier work which recovers 3D scene structure at video-rate, allowing real-time purposive redirection of the camera to any scene point. Here, a range of hand gestures are used to highlight and select 3D points within the structure and in this instance used to insert 3D graphical objects into the scene. Structure recovery, gesture recognition, scene annotation and augmentation are achieved in parallel and at video-rate.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: This work focuses on the incremental image-to-image updating of the homography matrix, which greatly facilitates real-time operation and relies on the automatic tracking of interest lines and points and on their use for robust homography estimation.
Abstract: We describe an important element of an automatic sport analysis system. This element continuously estimates the image-to-model homography from the video stream of a single camera. Here, we focus on the incremental image-to-image updating of the homography matrix, which greatly facilitates real-time operation. This updating relies on the automatic tracking of interest lines and points and on their use for robust homography estimation. Results on real video sequences are shown.
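The incremental update described here is, at its core, a composition of homographies: if the previous frame's image-to-model mapping is known and the frame-to-frame mapping is estimated from tracked lines and points, the current frame's image-to-model mapping is their product. The sketch below (NumPy, hypothetical names) shows just that chaining step; the robust line/point tracking that supplies the frame-to-frame matrix is not reproduced.

```python
import numpy as np

def update_model_homography(H_model_prev, H_prev_curr):
    """Chain the image-to-model homography to the current frame.

    H_model_prev maps the previous frame to the model; H_prev_curr
    maps the current frame to the previous frame. Their composition
    maps the current frame directly to the model, avoiding a full
    image-to-model re-estimation every frame.
    """
    H = H_model_prev @ H_prev_curr
    return H / H[2, 2]                 # keep the usual scale normalisation

def warp_point(H, pt):
    """Apply a homography to a 2D point (homogeneous divide included)."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

One caveat with any such chaining is error accumulation over long sequences, which is why anchoring the estimate with tracked interest lines and points each frame matters.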

Proceedings ArticleDOI
01 Jan 2004
TL;DR: A fully automatic face recognition system that uses video information to achieve illumination and pose robustness and is shown to greatly outperform state-of-the-art algorithms.
Abstract: Illumination and pose invariance are the most challenging aspects of face recognition. In this paper we describe a fully automatic face recognition system that uses video information to achieve illumination and pose robustness. In the proposed method, highly nonlinear manifolds of face motion are approximated using three Gaussian pose clusters. Pose robustness is achieved by comparing the corresponding pose clusters and probabilistically combining the results to derive a measure of similarity between two manifolds. Illumination is normalized on a per-pose basis. Region-based gamma intensity correction is used to correct for coarse illumination changes, while further refinement is achieved by combining a learnt linear manifold of illumination variation with constraints on face pattern distribution, derived from video. Comparative experimental evaluation is presented and the proposed method is shown to greatly outperform state-of-the-art algorithms. Consistent recognition rates of 94-100% are achieved across dramatic changes in illumination.
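Gamma intensity correction, applied here per face region to normalise coarse illumination changes, can be sketched in a few lines. This is an illustrative interpretation, not the paper's implementation: the canonical mean of 0.5 and the rule of choosing gamma so that the region's mean intensity maps to that canonical value are assumptions made for the example.

```python
import numpy as np

def gamma_correct_region(region, target_mean=0.5):
    """Region-based gamma intensity correction (illustrative sketch).

    Chooses the gamma for which a pixel at the region's mean intensity
    maps to `target_mean`, then applies I -> I ** (1/gamma) to the
    whole region. Intensities are assumed normalised to (0, 1].
    """
    region = np.clip(np.asarray(region, dtype=float), 1e-6, 1.0)
    m = region.mean()
    # solve m ** (1/gamma) == target_mean  for gamma
    gamma = np.log(m) / np.log(target_mean)
    return region ** (1.0 / gamma)
```

Applying this per region rather than globally lets differently lit parts of the face (e.g. one shadowed cheek) be normalised independently, which is the point of the region-based variant.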

Proceedings ArticleDOI
01 Jan 2004
TL;DR: This paper presents a novel 3D face modeling approach from a monocular video captured using a conventional camera that relies on matching a generic 3D face model to the outer contours of the face to be modeled and a few of its internal features.
Abstract: In this paper, we present a novel 3D face modeling approach from a monocular video captured using a conventional camera. The proposed algorithm relies on matching a generic 3D face model to the outer contours of the face to be modeled and a few of its internal features. At the first stage of the method, we estimate the head pose by comparing the edges extracted from video frames, with the contours extracted from a generic face model. Next, the generic face model is adapted to the actual 3D face by global and local deformations. An affine model is used for global deformation. The 3D model is locally deformed by computing the optimal perturbations of a sparse set of control points using a stochastic search optimization method. The deformations are integrated over a set of poses in the video sequence, leading to an accurate 3D model.

Proceedings ArticleDOI
01 Jan 2004
TL;DR: This work proposes a method to search the state space that fairly distributes impoverishment effects between the objects by defining a set of mixture components and performing PS in each of these components using one of a small set of representative object orderings.
Abstract: Multi-Object tracking (MOT) is an important problem in a number of vision applications. For particle filter (PF) tracking, as the number of objects tracked increases, the search space for random sampling explodes in dimension. Partitioned sampling (PS) solves this problem by partitioning the search space, then searching each partition sequentially. However, sequential weighted resampling steps cause an impoverishment effect that increases with the number of objects. This effect depends on the specific order in which the partitions are explored, creating an erratic and undesirable performance. We propose a method to search the state space that fairly distributes these impoverishment effects between the objects by defining a set of mixture components and performing PS in each of these components using one of a small set of representative object orderings. Using synthetic and real data, we show that our method retains the overall performance and reduced computational cost of PS, while improving performance in scenes where the impoverishment effect is significant.