
Showing papers presented at "British Machine Vision Conference in 2005"


Proceedings ArticleDOI
01 Sep 2005
TL;DR: The ability of the particle filter to deal with non-linearities and non-Gaussian statistics suggests the potential to provide improved robustness over existing approaches, such as those based on the Kalman filter.
Abstract: We describe a particle filtering method for vision based tracking of a hand held calibrated camera in real-time. The ability of the particle filter to deal with non-linearities and non-Gaussian statistics suggests the potential to provide improved robustness over existing approaches, such as those based on the Kalman filter. In our approach, the particle filter provides recursive approximations to the posterior density for the 3-D motion parameters. The measurements are inlier/outlier counts of likely correspondence matches for a set of salient points in the scene. The algorithm is simple to implement and we present results illustrating good tracking performance using a ‘live’ camera. We also demonstrate the potential robustness of the method, including the ability to recover from loss of track and to deal with severe occlusion.
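
As a rough illustration of the recursion described above, here is a minimal SIR (sampling-importance-resampling) particle filter step in Python. The random-walk motion model and the likelihood function are stub assumptions; the paper's actual measurement is an inlier/outlier count over salient-point correspondences, which would be supplied as `likelihood_fn`.

```python
import numpy as np

def particle_filter_step(particles, weights, motion_noise, likelihood_fn, rng):
    """One predict-weight-resample cycle of a generic SIR particle filter.

    particles: (N, D) array of motion-parameter hypotheses.
    likelihood_fn: maps one hypothesis to a scalar likelihood (in the
    paper this comes from inlier/outlier counts of point matches;
    stubbed here as an assumption).
    """
    n = len(particles)
    # Predict: diffuse each hypothesis (simple random-walk dynamics;
    # the paper's motion model may differ).
    particles = particles + rng.normal(scale=motion_noise, size=particles.shape)
    # Weight: evaluate the measurement likelihood of every hypothesis.
    weights = np.array([likelihood_fn(p) for p in particles], dtype=float)
    weights /= weights.sum()
    # Resample: draw N particles in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```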

155 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: A learning-based method for counting people in crowds from a single camera that takes into account feature normalization to deal with perspective projection and different camera orientation and can be deployed with minimal setup for a new site.
Abstract: This paper describes a learning-based method for counting people in crowds from a single camera. Our method takes into account feature normalization to deal with perspective projection and different camera orientation. Thus, our system is trained to be viewpoint invariant and can be deployed with minimal setup for a new site. This is achieved by applying background subtraction and edge detection to each frame and extracting edge orientation and blob size histograms as features. A homography is computed between the ground plane and the image plane coordinates for the region of interest (ROI). A density map that measures the relative size of individuals and a global scale measuring camera orientation are also estimated and used for feature normalization. The relationship between the feature histograms and the number of pedestrians in the crowds is learned from labeled training data. The two training methods used in the current system are linear fitting and neural networks. Experimental results from different sites with different camera orientation demonstrate the performance and the potential of our method.
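
The simpler of the two training methods mentioned, linear fitting, amounts to least-squares regression from the normalised feature histograms to the pedestrian count. A minimal sketch, assuming feature extraction and normalisation are done elsewhere (the neural-network variant is not shown):

```python
import numpy as np

def fit_linear_counter(features, counts):
    """Least-squares map from normalised feature histograms to counts.

    features: (M, K) array, one histogram per training frame.
    counts:   (M,) array of ground-truth pedestrian counts.
    """
    F = np.hstack([features, np.ones((len(features), 1))])  # add bias column
    w, *_ = np.linalg.lstsq(F, counts, rcond=None)
    return w

def predict_count(w, feature):
    return float(np.append(feature, 1.0) @ w)
```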

151 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: A probabilistic framework of assembling detected human body parts into a full 2D human configuration is presented and the combination of skin colour likelihood and detection likelihood to further reduce false hand and face detections is illustrated.
Abstract: This paper presents a probabilistic framework of assembling detected human body parts into a full 2D human configuration. The face, torso, legs and hands are detected in cluttered scenes using boosted body part detectors trained by AdaBoost. Body configurations are assembled from the detected parts using RANSAC, and a coarse heuristic is applied to eliminate obvious outliers. An a priori mixture model of upper-body configurations is used to provide a pose likelihood for each configuration. A joint-likelihood model is then determined by combining the pose, part detector and corresponding skin model likelihoods. The assembly with the highest likelihood is selected by RANSAC, and the elbow positions are inferred. This paper also illustrates the combination of skin colour likelihood and detection likelihood to further reduce false hand and face detections.

97 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: A method capable of recognising one of N objects in log(N) time, which preserves all the strengths of local affine region methods: robustness to background clutter, occlusion, and large changes of viewpoint.
Abstract: Realistic approaches to large scale object recognition, i.e. for detection and localisation of hundreds or more objects, must support sub-linear time indexing. In the paper, we propose a method capable of recognising one of N objects in log(N) time. The "visual memory" is organised as a binary decision tree that is built to minimise average time to decision. Leaves of the tree represent a few local image areas, and each non-terminal node is associated with a 'weak classifier'. In the recognition phase, a single invariant measurement decides in which subtree a corresponding image area is sought. The method preserves all the strengths of local affine region methods: robustness to background clutter, occlusion, and large changes of viewpoint. Experimentally we show that it supports near real-time recognition of hundreds of objects with state-of-the-art recognition rates. After the test image is processed (in about a second on a current PC), the recognition via indexing into the visual memory requires milliseconds.
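
The log(N) behaviour comes from descending the binary tree, spending one invariant measurement per internal node. A schematic descent loop; all names and fields here are illustrative, not taken from the paper:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Node:
    test: Any = None                 # weak classifier at this node
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    candidates: Any = None           # leaf payload: a few local image areas

def descend(tree: Node, measure: Callable[[Any], float]):
    """Walk the visual-memory tree to a leaf in O(log N) steps."""
    node = tree
    while node.left is not None:
        # A single invariant measurement decides the subtree to search.
        node = node.left if measure(node.test) < node.threshold else node.right
    return node.candidates
```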

92 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: A simple statistical test is proposed that permits the scoring process to be terminated early, potentially yielding large computational savings; the approach is evaluated by estimation of the fundamental matrix for a large number of image pairs and is shown to offer a significant reduction in computational cost.
Abstract: The random sample consensus (RANSAC) algorithm, along with its many cousins such as MSAC and MLESAC, has become a standard choice for robust estimation in many computer vision problems. Recently, a raft of modifications to the basic RANSAC algorithm have been proposed aimed at improving its efficiency. Many of these optimizations work by reducing the number of hypotheses that need to be evaluated. This paper proposes a complementary strategy that aims to reduce the average amount of time spent computing the consensus score for each hypothesis. A simple statistical test is proposed that permits the scoring process to be terminated early, potentially yielding large computational savings. The proposed test is simple to implement, imposes negligible computational overhead, and is effective for any given size of data set. The approach is evaluated by estimation of the fundamental matrix for a large number of image pairs and is shown to offer a significant reduction in computational cost compared to recently proposed RANSAC modifications.
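
To illustrate the early-termination idea (a simplified stand-in, not the paper's exact statistical test), the sketch below compares the running inlier fraction against a binomial-style confidence bound around the best-so-far inlier rate and abandons scoring once the hypothesis cannot plausibly win; `check_every` and `margin` are assumed tuning knobs:

```python
import numpy as np

def score_with_bailout(residuals, thresh, best_frac, check_every=100, margin=2.0):
    """Count inliers for one hypothesis, bailing out early when the
    running inlier fraction falls well below the best rate seen so far.
    Returns the inlier count, or None if scoring was abandoned."""
    inliers = 0
    for i, r in enumerate(residuals, start=1):
        inliers += abs(r) < thresh
        if i % check_every == 0 and 0.0 < best_frac < 1.0:
            # Std. dev. of the running fraction under the best-so-far rate.
            sd = np.sqrt(best_frac * (1.0 - best_frac) / i)
            if inliers / i < best_frac - margin * sd:
                return None  # hypothesis cannot beat the current best
    return inliers
```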

91 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: An algorithm for obtaining correspondences across a group of images of deformable objects, to construct a statistical model of appearance which can encode the training images as compactly as possible using a Minimum Description Length framework.
Abstract: We describe an algorithm for obtaining correspondences across a group of images of deformable objects. The approach is to construct a statistical model of appearance which can encode the training images as compactly as possible (a Minimum Description Length framework). Correspondences are defined by piece-wise linear interpolation between a set of control points defined on each image. Given such points a model can be constructed, which can approximate every image in the set. The description length encodes the cost of the model, the parameters and, most importantly, the residuals not explained by the model. By modifying the positions of the control points we can optimise the description length, leading to good correspondence. We describe the algorithm in detail and give examples of its application to MR brain images and to faces. We also describe experiments which use a recently-introduced specificity measure to evaluate the performance of different components of the algorithm.
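
Schematically, the quantity being minimised over the control-point positions decomposes in the usual MDL way (this is the generic decomposition, not the paper's exact coding scheme):

```latex
\mathcal{L}_{\mathrm{total}}
  = \mathcal{L}(\mathrm{model})
  + \mathcal{L}(\mathrm{parameters})
  + \mathcal{L}(\mathrm{data}\mid\mathrm{model},\mathrm{parameters})
```

Moving the control points trades model and parameter cost against the residual coding cost; correspondences are good when the shared model encodes every training image cheaply.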

88 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: Shape Context trained on real edge images rather than on clean pedestrian silhouettes combined with the Hessian-Laplace detector outperforms all other tested approaches for the detection of pedestrians.
Abstract: Pedestrian detection in real world scenes is a challenging problem. In recent years a variety of approaches have been proposed, and impressive results have been reported on a variety of databases. This paper systematically evaluates (1) various local shape descriptors, namely Shape Context and Local Chamfer descriptor and (2) four different interest point detectors for the detection of pedestrians. Those results are compared to the standard global Chamfer matching approach. A main result of the paper is that Shape Context trained on real edge images rather than on clean pedestrian silhouettes combined with the Hessian-Laplace detector outperforms all other tested approaches.

77 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: To fill holes in photographs of structured, man-made environments, a technique is proposed which automatically adjusts and clones large image patches that have similar structure, handling macrostructure with an adjustable degree of automation.
Abstract: To fill holes in photographs of structured, man-made environments, we propose a technique which automatically adjusts and clones large image patches that have similar structure. These source patches can come from elsewhere in the same image, or from other images shot from different perspectives. Two significant developments of this work are the ability to automatically detect and adjust source patches whose macrostructure is compatible with the hole region, and, alternately, the ability to interactively specify a user's desired search regions. In contrast to existing photomontage algorithms, which either synthesize microstructure or require careful user interaction to fill holes, our approach handles macrostructure with an adjustable degree of automation.

76 citations


Proceedings ArticleDOI
16 Sep 2005
TL;DR: A new hierarchical part-based pose estimation method for the upper-body that efficiently searches the high dimensional articulation space using a modified Viterbi algorithm for smooth trajectory of articulation between frames.
Abstract: This paper addresses the problem of automatic detection and recovery of three-dimensional human body pose from monocular video sequences for HCI applications. We propose a new hierarchical part-based pose estimation method for the upper-body that efficiently searches the high dimensional articulation space. The body is treated as a collection of parts linked in a kinematic structure. Search for configurations of this collection is commenced from the most reliably detectable part. The rest of the parts are searched based on the detected locations of this anchor as they all are kinematically linked. Each part is represented by a set of 2D templates created from a 3D model, hence inherently encoding the 3D joint angles. The tree data structure is exploited to efficiently search through these templates. Multiple hypotheses are computed for each frame. By modelling these with a HMM, temporal coherence of body motion is exploited to find a smooth trajectory of articulation between frames using a modified Viterbi algorithm. Experimental results show that the proposed technique produces good estimates of the human 3D pose on a range of test videos in a cluttered environment.
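
The temporal step is a standard Viterbi maximisation over the per-frame hypothesis sets. A generic log-domain sketch; the transition score (e.g. a smoothness penalty on joint angles) is an assumed placeholder:

```python
import numpy as np

def viterbi_path(unary, transition):
    """unary: list of length T; unary[t] holds log-likelihoods of the
    hypotheses at frame t. transition(i, j): log-score for moving from
    hypothesis i at frame t-1 to hypothesis j at frame t."""
    score = np.asarray(unary[0], dtype=float)
    back = []
    for t in range(1, len(unary)):
        cur = np.asarray(unary[t], dtype=float)
        trans = np.array([[transition(i, j) for j in range(len(cur))]
                          for i in range(len(score))])
        total = score[:, None] + trans          # predecessor x current
        back.append(np.argmax(total, axis=0))   # best predecessor per state
        score = cur + np.max(total, axis=0)
    path = [int(np.argmax(score))]              # backtrack the best path
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]
```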

76 citations


Proceedings ArticleDOI
01 Sep 2005
TL;DR: This paper proposes a method which can produce shadow-free images quickly and without artifacts, and is based on a Hamiltonian path that enters and exits each shadow region once.
Abstract: For some computer vision tasks, the presence of shadows in images can cause problems. For example, object tracks can be lost as an object crosses over a shadow boundary. Recently, it has been shown that it is possible to remove shadows from images. Assuming that the locations of the shadows are known, shadow-free images are obtained in three steps. First, the image is differentiated. Second, the derivatives at the shadow edge are set to zero. Third, reintegration delivers an image without shadows. While this process can work well, the resultant shadow-free image often has artifacts and, moreover, the reintegration is an expensive computational procedure. In this paper we propose a method which can produce shadow-free images quickly and without artifacts. Our algorithm is based on two observations. First, shadows in images are closed regions, and if they are not closed, artifacts can result during reintegration. Thus we propose to extend the existing methods and enforce the constraint that shadow boundaries must be closed prior to reintegration. Second, the standard reintegration method used (solving a 2D Poisson equation) also, necessarily, introduces artifacts. The solution here is to reintegrate shadow and non-shadow regions almost separately. Specifically, we reintegrate the image along a Hamiltonian path that enters and exits each shadow region once. Detail that was masked out at the shadow boundary is then infilled in a second step. The resulting reintegrated image has far fewer artifacts. Moreover, since the reintegration method is path-based it is both simple and fast. Experiments validate our approach.
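
The heart of the path-based reintegration is a 1-D cumulative sum of image derivatives along the traversal, with the derivative zeroed wherever the path crosses a shadow boundary. A minimal sketch; a fixed pixel traversal stands in for the paper's Hamiltonian path, and working in log-intensity is typical but assumed:

```python
import numpy as np

def reintegrate_along_path(image, path, shadow_edge):
    """image: 2-D (e.g. log-intensity) array; path: sequence of (row, col)
    pixels covering the image; shadow_edge: boolean mask of shadow-boundary
    pixels. Returns the reintegrated values along the path."""
    out = np.empty(len(path))
    out[0] = image[path[0]]
    for k in range(1, len(path)):
        step = image[path[k]] - image[path[k - 1]]
        # Kill the derivative where the path crosses the shadow boundary,
        # so the shadow step is not re-created on reintegration.
        if shadow_edge[path[k]] or shadow_edge[path[k - 1]]:
            step = 0.0
        out[k] = out[k - 1] + step
    return out
```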

70 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: Experimental results show that the proposed tennis ball tracking algorithm is robust and has a tracking accuracy that is sufficiently high for automatic annotation of tennis matches.
Abstract: Several tennis ball tracking algorithms have been reported in the literature. However, most of them use high quality video and multiple cameras, and the emphasis has been on coordinating the cameras, or visualising the tracking results. In this paper, we propose a tennis ball tracking algorithm for low quality off-air video recorded with a single camera. Multiple visual cues are exploited for tennis candidate detection. A particle filter with improved sampling efficiency is used to track the tennis candidates. Experimental results show that our algorithm is robust and has a tracking accuracy that is sufficiently high for automatic annotation of tennis matches.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: This paper revisits the problem using the division model to parameterise the lens distortion, and demonstrates that the locus of distorted points from a straight line is a circular arc, which allows distortion estimation to be reformulated as circle-fitting for which many algorithms are available.
Abstract: A powerful and popular approach for estimating radial lens distortion parameters is to use the fact that lines which are straight in the scene should be imaged as straight lines under the pinhole camera model. This paper revisits this problem using the division model to parameterise the lens distortion. This turns out to have significant advantages over the more conventional parameterisation, especially for a single parameter model. In particular, we demonstrate that the locus of distorted points from a straight line is a circular arc. This allows distortion estimation to be reformulated as circle-fitting for which many algorithms are available. We compare a number of suboptimal methods offering closed-form solutions with an optimal, iterative technique which minimises a cost function on the actual image plane as opposed to existing techniques which suffer from a bias due to the fact that they optimise a geometric cost function on the undistorted image plane.
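
For reference, the single-parameter division model maps a distorted image point to its undistorted position as (centre of distortion taken at the origin):

```latex
\mathbf{x}_u \;=\; \frac{\mathbf{x}_d}{1 + \lambda\,\lVert\mathbf{x}_d\rVert^{2}}
```

Under this model the distorted image of a scene line is a circular arc, which is what lets the paper recast distortion estimation as circle fitting.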

Proceedings ArticleDOI
01 Jan 2005
TL;DR: This paper investigates the extent to which factors affect graph spectra in practice, and whether they can be mitigated by choosing a particular matrix representation of the graph, and studies the use of the spectrum derived from the heat kernel matrix and path length distribution matrix.
Abstract: The spectrum of a graph has been widely used in graph theory to characterise the properties of a graph and extract information from its structure. It has been less popular as a representation for pattern matching for two reasons. Firstly, more than one graph may share the same spectrum. It is well known, for example, that very few trees can be uniquely specified by their spectrum. Secondly, the spectrum may change dramatically with a small change in structure. In this paper we investigate the extent to which these factors affect graph spectra in practice, and whether they can be mitigated by choosing a particular matrix representation of the graph. There are a wide variety of graph matrix representations from which the spectrum can be extracted. In this paper we analyse the adjacency matrix, combinatorial Laplacian, normalised Laplacian and unsigned Laplacian. We also study the use of the spectrum derived from the heat kernel matrix and path length distribution matrix. We investigate the cospectrality of these matrices over large graph sets and show that the Euclidean distance between spectra tracks the edit distance over a wide range of edit costs, and we analyse the stability of this relationship. We then use the spectra to match and classify the graphs and demonstrate the effect of the graph matrix formulation on error rates.
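
A minimal sketch of the spectral comparison for three of the matrix representations studied (zero-padding smaller graphs to a common size is an assumed convention; the heat-kernel and path-length variants are omitted):

```python
import numpy as np

def spectrum_distance(A1, A2, rep="laplacian"):
    """Euclidean distance between the sorted spectra of two graphs,
    given equal-sized symmetric adjacency matrices A1 and A2."""
    def matrix(A):
        if rep == "adjacency":
            return A
        D = np.diag(A.sum(axis=1))
        if rep == "laplacian":        # combinatorial Laplacian L = D - A
            return D - A
        if rep == "normalised":       # normalised Laplacian I - D^-1/2 A D^-1/2
            d = 1.0 / np.sqrt(np.clip(A.sum(axis=1), 1e-12, None))
            return np.eye(len(A)) - d[:, None] * A * d[None, :]
        raise ValueError(rep)

    s1 = np.sort(np.linalg.eigvalsh(matrix(A1)))
    s2 = np.sort(np.linalg.eigvalsh(matrix(A2)))
    return float(np.linalg.norm(s1 - s2))
```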

Proceedings ArticleDOI
01 Jan 2005
TL;DR: In this paper, the authors address the problem of learning Gaussian Mixture Models (GMMs) incrementally, where the current fit is updated with the assumption that the number of components is fixed, which is increased when enough evidence for a new component is seen.
Abstract: In this paper we address the problem of learning Gaussian Mixture Models (GMMs) incrementally. Unlike previous approaches, which universally assume that new data comes in blocks representable by GMMs which are then merged with the current model estimate, our method works for the case when novel data points arrive one-by-one, while requiring little additional memory. We keep only two GMMs in the memory and no historical data. The current fit is updated under the assumption that the number of components is fixed; this number is increased (or reduced) when enough evidence for a new component is seen. This is deduced from the change from the oldest fit of the same complexity, termed the Historical GMM, the concept of which is central to our method. The performance of the proposed method is demonstrated qualitatively and quantitatively on several synthetic data sets and video sequences of faces acquired in realistic imaging conditions.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: Extensive experiments on the Cohn-Kanade database illustrate that LBP features are effective for expression analysis, and that CMIB enables much faster training than AdaBoost and yields a classifier of improved classification performance.
Abstract: This paper proposes a novel approach for facial expression recognition by boosting Local Binary Patterns (LBP) based classifiers. Low-cost LBP features are introduced to effectively describe local features of face images. A novel learning procedure, Conditional Mutual Information based Boosting (CMIB), is proposed. CMIB learns a sequence of weak classifiers that maximize their mutual information about a candidate class, conditional on the response of any weak classifier already selected; a strong classifier is constructed by combining the learned weak classifiers using Naive Bayes. Extensive experiments on the Cohn-Kanade database illustrate that LBP features are effective for expression analysis, and that CMIB enables much faster training than AdaBoost and yields a classifier of improved classification performance.
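
For concreteness, the plain 8-neighbour LBP operator underlying such features can be computed as below (the paper's exact LBP variant and block layout are not specified here):

```python
import numpy as np

def lbp_histogram(gray):
    """256-bin histogram of basic 8-neighbour LBP codes for a 2-D
    grayscale image: each pixel is coded by thresholding its eight
    neighbours against the centre value."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                      # interior pixels (the centres)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit
    return np.bincount(code.ravel(), minlength=256)
```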

Proceedings ArticleDOI
01 Jan 2005
TL;DR: An object classification method that can learn from a single training example that significantly improves the performance of the baseline algorithm and can combine the merits of many different classification methods.
Abstract: We describe an object classification method that can learn from a single training example. In this method, a novel class is characterized by its similarity to a number of previously learned, familiar classes. We demonstrate that this similarity is well-preserved across different class instances. As a result, it generalizes well to new instances of the novel class. A simple comparison of the similarity patterns is therefore sufficient to obtain useful classification performance from a single training example. The similarity between the novel class and the familiar classes in the proposed method can be evaluated using a wide variety of existing classification schemes. It can therefore combine the merits of many different classification methods. Experiments on a database of 107 widely varying object classes demonstrate that the proposed method significantly improves the performance of the baseline algorithm.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: The system can be used in any application requiring zero form factor and minimized or no contact with a medium, as in a large number of cases in human-to-computer interaction, virtual reality, game control, 3D designs, etc.
Abstract: In this paper, a complete system is presented which mimics a QWERTY keyboard on an arbitrary surface. The system consists of a pattern projector and a true-3D range camera for detecting the typing events. We exploit depth information acquired with the 3D range camera and detect the hand region using a pre-computed reference frame. The fingertips are found by analyzing the hands' contour and fitting the depth curve with different feature models. To detect a keystroke, we analyze the feature of the depth curve and map it back to a global coordinate system to find which key was pressed. These steps are fully automated and do not require human intervention. The system can be used in any application requiring zero form factor and minimized or no contact with a medium, as in a large number of cases in human-to-computer interaction, virtual reality, game control, 3D designs, etc.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: The key idea is that the frontal texture is estimated, and a correct estimation leads to the most consistent surface, and in addition to surface shape, a frontal view of the texture is also recovered.
Abstract: We present a method for Shape-from-Texture in one of its most general forms. Previous Shape-from-Texture papers assume that the texture is constrained by one or more of the following properties: homogeneity, isotropy, stationarity, or viewed orthographically. We make none of these assumptions. We do not presume that the frontal texture is known a priori, or from a known set, or even present in the image. Instead, surface smoothness is assumed, and the surface is recovered via a consistency constraint. The key idea is that the frontal texture is estimated, and a correct estimation leads to the most consistent surface. In addition to surface shape, a frontal view of the texture is also recovered. Results are given for synthetic and real examples.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: A new approach to overcome the problem caused by illumination variation in face recognition is proposed, and significantly better results are achieved for both automatic and semi-automatic face recognition experiments on LED illuminated faces than on face images under ambient illuminations.
Abstract: A new approach to overcome the problem caused by illumination variation in face recognition is proposed in this paper. Active Near-Infrared (Near-IR) illumination projected by a Light Emitting Diode (LED) light source is used to provide a constant illumination. The difference between two face images, captured when the LED light is on and off respectively, is the image of a face under just the LED illumination, and is independent of ambient illumination. In preliminary experiments with various ambient illuminations, significantly better results are achieved for both automatic and semi-automatic face recognition experiments on LED-illuminated faces than on face images under ambient illuminations.

Proceedings ArticleDOI
08 Sep 2005
TL;DR: A very fast method of estimating the camera rotation from a single frame which does not require any detection, matching or extraction of feature points and can be used as a motion estimator to reduce the search range for feature matching algorithms that may be subsequently applied to the image.
Abstract: Rapid camera rotations (e.g. camera shake) are a significant problem when real-time computer vision algorithms are applied to video from a handheld or head-mounted camera. Such camera motions cause image features to move large distances in the image and cause significant motion blur. Here we propose a very fast method of estimating the camera rotation from a single frame which does not require any detection, matching or extraction of feature points and can be used as a motion estimator to reduce the search range for feature matching algorithms that may be subsequently applied to the image. This method exploits the motion blur in the frame, using features which remain sharp to rapidly compute the axis of rotation of the camera, and using blurred features to estimate the magnitude of the camera's rotation.

Proceedings ArticleDOI
01 Sep 2005
TL;DR: A 3-D human-body tracker capable of handling fast and complex motions in real-time is introduced and a new evaluation scheme based on volumetric reconstruction and blobs-fitting, where appearance models and image evidences are represented by Gaussian mixtures is presented.
Abstract: In this paper, we introduce a 3-D human-body tracker capable of handling fast and complex motions in real-time. The parameter space, augmented with first order derivatives, is automatically partitioned into Gaussian clusters each representing an elementary motion: hypothesis propagation inside each cluster is therefore accurate and efficient. The transitions between clusters use the predictions of a Variable Length Markov Model which can explain high-level behaviours over a long history. Using Monte-Carlo methods, evaluation of model candidates is critical for both speed and robustness. We present a new evaluation scheme based on volumetric reconstruction and blobs-fitting, where appearance models and image evidences are represented by Gaussian mixtures. We demonstrate the application of our tracker to long video sequences exhibiting rapid and diverse movements.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: This paper investigates the utility of a neuroscience-inspired set of visual features in other common scene-understanding tasks and shows that this outstanding performance extends to shape-based object detection in the usual windowing framework, to amorphous object detection as a texture classification task, and finally to context understanding.
Abstract: Recently, a neuroscience-inspired set of visual features was introduced. It was shown that this representation facilitates better performance than state-of-the-art vision systems for object recognition in cluttered and unsegmented images. In this paper, we investigate the utility of these features in other common scene-understanding tasks. We show that this outstanding performance extends to shape-based object detection in the usual windowing framework, to amorphous object detection as a texture classification task, and finally to context understanding. These tasks are performed on a large set of images which were collected as a benchmark for the problem of scene understanding. The final system is able to reliably identify cars, pedestrians, bicycles, sky, road, buildings and trees in a diverse set of images.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: The lazy random walk is characterised using the commute time between nodes, and it is shown how this quantity may be computed from the Laplacian spectrum using the discrete Green’s function.
Abstract: This paper exploits the properties of the commute time to develop a graph-spectral method for image segmentation. Our starting point is the lazy random walk on the graph, which is determined by the heat kernel of the graph and can be computed from the spectrum of the graph Laplacian. We characterise the random walk using the commute time between nodes, and show how this quantity may be computed from the Laplacian spectrum using the discrete Green's function. We explore the application of the commute time for image segmentation using the eigenvector corresponding to the smallest eigenvalue of the commute time matrix.
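
For a connected graph the commute time has a closed form in the Laplacian eigensystem, CT(u, v) = vol(G) · Σ_{i≥2} (φ_i(u) − φ_i(v))² / λ_i, i.e. the graph volume times the effective resistance. A direct sketch of the full commute-time matrix:

```python
import numpy as np

def commute_time_matrix(A):
    """Pairwise commute times from the combinatorial Laplacian spectrum
    of a connected graph with symmetric adjacency matrix A."""
    L = np.diag(A.sum(axis=1)) - A
    vol = A.sum()
    lam, phi = np.linalg.eigh(L)
    lam, phi = lam[1:], phi[:, 1:]      # drop the zero eigenvalue
    emb = phi / np.sqrt(lam)            # embedding scaled by 1/sqrt(lambda)
    sq = np.sum(emb**2, axis=1)
    # CT(u, v) = vol * ||emb_u - emb_v||^2, expanded for all pairs at once.
    return vol * (sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T)
```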

Proceedings ArticleDOI
01 Jan 2005
TL;DR: The proposed method, based on comparisons between corresponding vector subspaces, is shown to outperform state-of-the-art methods in the literature; the paper also demonstrates how boosting can be used for application-optimal principal angle fusion.
Abstract: In this paper we address the problem of classifying vector sets. We motivate and introduce a novel method based on comparisons between corresponding vector subspaces. In particular, there are two main areas of novelty: (i) we extend the concept of principal angles between linear subspaces to manifolds with arbitrary nonlinearities; (ii) it is demonstrated how boosting can be used for application-optimal principal angle fusion. The strengths of the proposed method are empirically demonstrated on the task of automatic face recognition (AFR), in which it is shown to outperform state-of-the-art methods in the literature.
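
In the linear case, principal angles between two subspaces reduce to an SVD of the product of their orthonormal bases; a minimal sketch (the paper's extension to nonlinear manifolds and the boosted fusion are not reproduced):

```python
import numpy as np

def principal_angles(X, Y):
    """Principal angles (radians, ascending) between the column spans
    of X and Y."""
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # Singular values of Qx^T Qy are the cosines of the principal angles.
    cos = np.clip(np.linalg.svd(Qx.T @ Qy, compute_uv=False), -1.0, 1.0)
    return np.arccos(cos)
```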

Proceedings ArticleDOI
01 Jan 2005
TL;DR: A robust approach to non-rigid object tracking in video sequences by a 2-dimensional point distribution model whose landmarks correspond to interest points that are automatically extracted from the object and described by their geometrical position and their local appearance.
Abstract: This paper presents a robust approach to non-rigid object tracking in video sequences. The object to track is described by a 2-dimensional point distribution model whose landmarks correspond to interest points that are automatically extracted from the object and described by their geometrical position and their local appearance. The approach is novel in that we describe the appearance locally instead of using the raw texture information. This provides a natural way to robustly handle partial occlusions. A second contribution is that we present a method that allows the model to be learned automatically. Our algorithms have been successfully tested on several video streams taken from soccer games and video surveillance footage. They have been implemented with the aim of achieving near real-time performance.

Proceedings ArticleDOI
16 Sep 2005
TL;DR: An approach to recognising 10 elementary gestures, applicable to sign language recognition, is proposed; it works reliably in real-time without relying on accurate tracking, and gives a probabilistic output that is useful in complex motion analysis.
Abstract: An approach to recognising 10 elementary gestures, applicable to sign language recognition, is proposed. In this work, a motion gradient orientation image is extracted directly from a raw video input and transformed into a motion feature vector. This feature vector is then classified into one of the 10 elementary gestures by a sparse Bayesian classifier. A training set of 628 samples and a testing set of over 1000 samples were obtained to evaluate the proposed method. A real-time system was built and trained with the training set. In the experiments, the classification accuracy is 90% and the system runs at around 25 frames per second. Compared with other recently proposed methods that involve the use of hand tracking, the system works reliably in real-time without relying on accurate tracking, and gives a probabilistic output that is useful in complex motion analysis.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: This work proposes a practical method for the recovery of projective depths, camera motion and non-rigid 3D shape from a sequence of images under strong perspective conditions based on minimizing 2D reprojection errors, solving the minimization as four weighted least squares problems.
Abstract: In this paper we address the problem of projective reconstruction for deformable objects. Recent work in non-rigid factorization has proved that it is possible to model deformations as a linear combination of basis shapes, allowing the recovery of camera motion and 3D shape under weak perspective viewing conditions. However, the performance of these methods degrades when the object of interest is close to the camera and strong perspective distortion is present in the data. The main contribution of this work is the proposal of a practical method for the recovery of projective depths, camera motion and non-rigid 3D shape from a sequence of images under strong perspective conditions. Our approach is based on minimizing 2D reprojection errors, solving the minimization as four weighted least squares problems. Results using synthetic and real data are given to illustrate the performance of our method.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: This paper demonstrates how to determine canonical views of objects in a way that is simple, robust and versatile, and compares it qualitatively to alternatives.
Abstract: This paper demonstrates how to determine canonical views of objects in a way that is simple, robust and versatile. Common parlance loosely defines canonical views as the “front”, “side”, and “top” views of an object. Our approach determines these views for objects whether represented by images or three-dimensional points; neither image segmentation nor model analysis is required. It is easy to introduce constraints so that other views can be determined, as desired. We explain our method and compare it qualitatively to alternatives.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: Experiments show that the adaptive use of colour and orientation information improves over either feature taken separately, both in terms of tracking accuracy and of reduction of lost tracks, and the automatic scale selection for the derivative filters results in increased robustness.
Abstract: We propose an accurate tracking algorithm based on a multi-feature statistical model. The model combines in a single particle filter colour and gradient-based orientation information. A reliability measure derived from the particle distribution is used to adaptively weigh the contribution of the two features. Furthermore, information from the tracker is used to set the dimension of the filters for the computation of the gradient, effectively solving the scale selection problem. Experiments over a set of real-world sequences show that the adaptive use of colour and orientation information improves over either feature taken separately, both in terms of tracking accuracy and of reduction of lost tracks. Also, the automatic scale selection for the derivative filters results in increased robustness.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: A novel region-based matching approach for automatic 3D face recognition which is robust to facial expressions, facial hair, illumination changes and large occlusions and shows that the eyes-forehead is the most significant region for 3D face recognition.
Abstract: We present a novel region-based matching approach for automatic 3D face recognition which is robust to facial expressions, facial hair, illumination changes and large occlusions. Each 3D face in the gallery is segmented offline into three disjoint regions, namely eyes-forehead, nose and cheeks. Recognition is performed on the basis of only the eyes-forehead and nose regions to avoid the effects of expressions and artifacts that occur in 3D faces due to a mustache or beard. These two regions of the gallery are matched with a probe using a modified version of the ICP algorithm and their matching scores are fused. The identity of the gallery face which gets the highest score is declared as the identity of the probe. Experiments were performed on the UND Biometrics Database which is so far the largest known database of 3D faces. We achieved a combined identification rate of 100% and a maximum verification rate of 99.42%. Our results also show that the eyes-forehead is the most significant region for 3D face recognition with individual identification and verification rates of 97.32% and 97.25% respectively.
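
The matcher is a modified ICP; for orientation, one textbook point-to-point ICP iteration (nearest-neighbour correspondences followed by a Kabsch alignment) looks like this, with the paper's modifications and score fusion not shown:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(probe, gallery):
    """One ICP iteration: match each probe point to its nearest gallery
    point, then solve for the best rigid alignment. probe and gallery
    are (N, 3) and (M, 3) point arrays; returns the moved probe and the
    mean residual distance (usable as a matching score)."""
    dist, idx = cKDTree(gallery).query(probe)
    matched = gallery[idx]
    mu_p, mu_m = probe.mean(axis=0), matched.mean(axis=0)
    H = (probe - mu_p).T @ (matched - mu_m)     # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_m - R @ mu_p
    return probe @ R.T + t, float(dist.mean())
```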