Showing papers in "Computer Vision and Image Understanding in 2008"
TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
Abstract: This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster. This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps. The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important parameters. We conclude the article with SURF's application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF's usefulness in a broad range of topics in computer vision.
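Much of SURF's speed comes from the integral-image trick the abstract mentions: once an integral image is built, the sum over any rectangular region (and hence any box-filter approximation to a Gaussian derivative) costs four lookups, independent of region size. A minimal NumPy sketch of the idea (a generic illustration, not the authors' implementation):

```python
import numpy as np

def integral_image(img):
    """S[y, x] = sum of img[:y, :x]; the extra row/column of zeros
    avoids boundary checks."""
    S = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    S[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return S

def box_sum(S, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in four lookups, whatever the box size."""
    return S[y1, x1] - S[y0, x1] - S[y1, x0] + S[y0, x0]
```

Because `box_sum` is O(1), the Hessian-based detector can evaluate its box filters at every scale without image pyramids.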
TL;DR: An extensive evaluation of the unsupervised objective evaluation methods that have been proposed in the literature are presented and the advantages and shortcomings of the underlying design mechanisms in these methods are discussed and analyzed.
Abstract: Image segmentation is an important processing step in many image, video and computer vision applications. Extensive research has been done in creating many different approaches and algorithms for image segmentation, but it is still difficult to assess whether one algorithm produces more accurate segmentations than another, whether it be for a particular image or set of images, or more generally, for a whole class of images. To date, the most common method for evaluating the effectiveness of a segmentation method is subjective evaluation, in which a human visually compares the image segmentation results for separate segmentation algorithms, which is a tedious process and inherently limits the depth of evaluation to a relatively small number of segmentation comparisons over a predetermined set of images. Another common evaluation alternative is supervised evaluation, in which a segmented image is compared against a manually-segmented or pre-processed reference image. Evaluation methods that require user assistance, such as subjective evaluation and supervised evaluation, are infeasible in many vision applications, so unsupervised methods are necessary. Unsupervised evaluation enables the objective comparison of both different segmentation methods and different parameterizations of a single method, without requiring human visual comparisons or comparison with a manually-segmented or pre-processed reference image. Additionally, unsupervised methods generate results for individual images and images whose characteristics may not be known until evaluation time. Unsupervised methods are crucial to real-time segmentation evaluation, and can furthermore enable self-tuning of algorithm parameters based on evaluation results. In this paper, we examine the unsupervised objective evaluation methods that have been proposed in the literature. An extensive evaluation of these methods is presented. The advantages and shortcomings of the underlying design mechanisms in these methods are discussed and analyzed through analytical evaluation and empirical evaluation. Finally, possible future directions for research in unsupervised evaluation are proposed.
TL;DR: This survey covers the historical development and current state of the art in image understanding for iris biometrics and suggests a short list of recommended readings for someone new to the field to quickly grasp the big picture of iris biometrics.
Abstract: This survey covers the historical development and current state of the art in image understanding for iris biometrics. Most research publications can be categorized as making their primary contribution to one of the four major modules in iris biometrics: image acquisition, iris segmentation, texture analysis and matching of texture representations. Other important research includes experimental evaluations, image databases, applications and systems, and medical conditions that may affect the iris. We also suggest a short list of recommended readings for someone new to the field to quickly grasp the big picture of iris biometrics.
TL;DR: This paper model the distribution of the texture features using a mixture of Gaussian distributions, allowing the mixture components to be degenerate or nearly-degenerate, and shows that such a mixture distribution can be effectively segmented by a simple agglomerative clustering algorithm derived from a lossy data compression approach.
Abstract: In this paper, we cast natural-image segmentation as a problem of clustering texture features as multivariate mixed data. We model the distribution of the texture features using a mixture of Gaussian distributions. Unlike most existing clustering methods, we allow the mixture components to be degenerate or nearly-degenerate. We contend that this assumption is particularly important for mid-level image segmentation, where degeneracy is typically introduced by using a common feature representation for different textures in an image. We show that such a mixture distribution can be effectively segmented by a simple agglomerative clustering algorithm derived from a lossy data compression approach. Using either 2D texture filter banks or simple fixed-size windows to obtain texture features, the algorithm effectively segments an image by minimizing the overall coding length of the feature vectors. We conduct comprehensive experiments to measure the performance of the algorithm in terms of visual evaluation and a variety of quantitative indices for image segmentation. The algorithm compares favorably against other well-known image-segmentation methods on the Berkeley image database.
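The coding-length criterion that drives the agglomerative clustering can be sketched as follows. This is an illustrative NumPy version of a lossy coding-length formula for Gaussian-like data, with `eps` the allowed distortion; the exact constants and terms may differ from the authors' implementation:

```python
import numpy as np

def coding_length(X, eps=0.1):
    """Approximate number of bits needed to code the feature vectors
    in X (shape d x n) up to distortion eps^2. A hedged sketch of the
    lossy-coding idea, not the paper's exact formula."""
    d, n = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    # Bits for the mean-subtracted data: grows with the log-volume of
    # the (regularised) covariance, so tight clusters are cheap to code.
    cov_term = np.eye(d) + (d / (eps**2 * n)) * (Xc @ Xc.T)
    data_bits = ((n + d) / 2.0) * np.log2(np.linalg.det(cov_term))
    # Bits for the mean vector itself.
    mean_bits = (d / 2.0) * np.log2(1.0 + (mu**2).sum() / eps**2)
    return data_bits + mean_bits
```

The greedy agglomeration is then well-defined: at each step, merge the pair of segments whose merge most decreases the total coding length, and stop when no merge decreases it.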
TL;DR: This paper provides a brief description of each method, highlighting its basic assumptions and mathematical properties, and proposes some numerical benchmarks in order to compare the methods in terms of their efficiency and accuracy in the reconstruction of surfaces corresponding to synthetic, as well as to real images.
Abstract: Many algorithms have been suggested for the shape-from-shading problem, and some years have passed since the publication of the survey paper by Zhang et al. [R. Zhang, P.-S. Tsai, J.E. Cryer, M. Shah, Shape from shading: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (8) (1999) 690-706]. In this new survey paper, we try to update their presentation including some recent methods which seem to be particularly representative of three classes of methods: methods based on partial differential equations, methods using optimization and methods approximating the image irradiance equation. One of the goals of this paper is to set the comparison of these methods on a firm basis. To this end, we provide a brief description of each method, highlighting its basic assumptions and mathematical properties. Moreover, we propose some numerical benchmarks in order to compare the methods in terms of their efficiency and accuracy in the reconstruction of surfaces corresponding to synthetic, as well as to real images.
TL;DR: The proposed algorithm learns conformity in the traversed paths and hence the inter-camera relationships in the form of multivariate probability density of space-time variables (entry and exit locations, velocities, and transition times) using kernel density estimation.
Abstract: Tracking across cameras with non-overlapping views is a challenging problem. Firstly, the observations of an object are often widely separated in time and space when viewed from non-overlapping cameras. Secondly, the appearance of an object in one camera view might be very different from its appearance in another camera view due to the differences in illumination, pose and camera properties. To deal with the first problem, we observe that people or vehicles tend to follow the same paths in most cases, i.e., roads, walkways, corridors etc. The proposed algorithm uses this conformity in the traversed paths to establish correspondence. The algorithm learns this conformity and hence the inter-camera relationships in the form of multivariate probability density of space-time variables (entry and exit locations, velocities, and transition times) using kernel density estimation. To handle the appearance change of an object as it moves from one camera to another, we show that all brightness transfer functions from a given camera to another camera lie in a low dimensional subspace. This subspace is learned by using probabilistic principal component analysis and used for appearance matching. The proposed approach does not require explicit inter-camera calibration, rather the system learns the camera topology and subspace of inter-camera brightness transfer functions during a training phase. Once the training is complete, correspondences are assigned using the maximum likelihood (ML) estimation framework using both location and appearance cues. Experiments with real world videos are reported which validate the proposed approach.
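The space-time component of the approach can be illustrated with a small kernel density estimate. The training rows below are synthetic stand-ins for observed correspondences (exit location and transition time between two cameras), and SciPy's `gaussian_kde` plays the role of the paper's kernel density estimator:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical training data: each row is (exit_x, exit_y, transition_time)
# for a correspondence observed between two cameras during training.
rng = np.random.default_rng(0)
train = np.column_stack([
    rng.normal(10.0, 1.0, 200),  # exit x-location
    rng.normal(5.0, 0.5, 200),   # exit y-location
    rng.normal(3.0, 0.3, 200),   # transition time (s)
])

kde = gaussian_kde(train.T)  # multivariate kernel density over space-time

# Score candidate correspondences: higher density = more plausible.
likely = kde([[10.0], [5.0], [3.0]])[0]
unlikely = kde([[20.0], [9.0], [8.0]])[0]
```

In the full system this density is one factor in the maximum likelihood correspondence score, combined with the appearance cue from the brightness-transfer-function subspace.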
TL;DR: Experiments and comparative results with multilevel thresholding methods over a synthetic histogram and real images show the efficiency of the proposed method.
Abstract: In this paper, a multilevel thresholding method which allows the determination of the appropriate number of thresholds as well as the adequate threshold values is proposed. This method combines a genetic algorithm with a wavelet transform. First, the length of the original histogram is reduced by using the wavelet transform. Based on this lower resolution version of the histogram, the number of thresholds and the threshold values are determined by using a genetic algorithm. The thresholds are then projected onto the original space. In this step, a refinement procedure may be added to detect accurate threshold values. Experiments and comparative results with multilevel thresholding methods over a synthetic histogram and real images show the efficiency of the proposed method.
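The histogram-reduction step can be sketched with a plain Haar low-pass filter: each level halves the histogram length, the genetic algorithm then searches the short histogram, and found thresholds are projected back by the corresponding power of two. `haar_reduce` and `project_threshold` are illustrative helper names, not the paper's code:

```python
import numpy as np

def haar_reduce(hist, levels=1):
    """Shrink a histogram by keeping only the Haar approximation
    (low-pass) coefficients; each level halves the length."""
    h = np.asarray(hist, dtype=float)
    for _ in range(levels):
        if len(h) % 2:  # pad odd lengths
            h = np.append(h, 0.0)
        h = (h[0::2] + h[1::2]) / np.sqrt(2.0)  # Haar low-pass filter
    return h

def project_threshold(t_reduced, levels=1):
    """Map a threshold found on the reduced histogram back onto the
    original bin space (a coarse projection; a refinement step would
    then search locally for the exact value)."""
    return t_reduced * (2 ** levels)
```

Searching a 64-bin histogram instead of a 256-bin one substantially shrinks the genetic algorithm's search space for both the number and the values of the thresholds.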
TL;DR: A novel method based on principles from linear programming and, in particular, on primal-dual strategies that generalizes prior state-of-the-art methods and can be used for efficiently minimizing NP-hard problems with complex pair-wise potential functions.
Abstract: In this paper we introduce a novel method to address minimization of static and dynamic MRFs. Our approach is based on principles from linear programming and, in particular, on primal-dual strategies. It generalizes prior state-of-the-art methods such as α-expansion, while it can also be used for efficiently minimizing NP-hard problems with complex pair-wise potential functions. Furthermore, it offers a substantial speedup - of about an order of magnitude - over existing techniques, due to the fact that it exploits information coming not only from the original MRF problem, but also from a dual one. The proposed technique consists of recovering a pair of solutions for the primal and the dual such that the gap between them is minimized. Therefore, it can also boost performance of dynamic MRFs, where one should expect that the new pair of primal-dual solutions is close to the previous one. Promising results in a number of applications, and theoretical as well as numerical comparisons with the state of the art, demonstrate the great potential of this approach.
TL;DR: This work proposes a novel method for detecting and estimating the count of people in groups, dense or otherwise, as well as tracking them, using prior knowledge obtained from the scene and accurate camera calibration.
Abstract: The goal of this work is to provide a system which can aid in monitoring crowded urban environments, which often contain tight groups of people. In this paper, we consider the problem of counting the number of people in the scene and also tracking them reliably. We propose a novel method for detecting and estimating the count of people in groups, dense or otherwise, as well as tracking them. Using prior knowledge obtained from the scene and accurate camera calibration, the system learns the parameters required for estimation. This information can then be used to estimate the count of people in the scene, in real-time. Groups are tracked in the same manner as individuals, using Kalman filtering techniques. Favorable results are shown for groups of various sizes moving in an unconstrained fashion.
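Tracking a group "in the same manner as an individual" amounts to running a standard Kalman filter on the group centroid. The constant-velocity sketch below is a generic illustration; all parameter values (noise covariances, initial uncertainty) are made up, not taken from the paper:

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter for a 2D centroid.
    State is (x, y, vx, vy); parameters are illustrative."""
    def __init__(self, x, y, dt=1.0):
        self.x = np.array([x, y, 0.0, 0.0])            # state estimate
        self.P = np.eye(4) * 10.0                      # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * 0.01                      # process noise
        self.R = np.eye(2) * 1.0                       # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x            # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

The predict step also supplies a search gate for associating detected groups with existing tracks between frames.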
TL;DR: A novel MRF-based model for deformable image matching that is able to match much wider deformations than was considered previously in global optimization framework and applies TRW-S (Sequential Tree-Reweighted Message passing) algorithm to solve the relaxed problem.
Abstract: We propose a novel MRF-based model for deformable image matching (also known as registration). The deformation is described by a field of discrete variables, representing displacements of (blocks of) pixels. Discontinuities in the deformation are prohibited by imposing hard pairwise constraints in the model. Exact maximum a posteriori inference is intractable and we apply a linear programming relaxation technique. We show that, when reformulated in the form of two coupled fields of x- and y-displacements, the problem leads to a simpler relaxation to which we apply the sequential tree-reweighted message passing (TRW-S) algorithm [Wainwright-03, Kolmogorov-05]. This enables image registration with large displacements at a single scale. We employ fast message updates for a special type of interaction as was proposed [Felzenszwalb and Huttenlocher-04] for the max-product belief propagation (BP) and introduce a few independent speedups. In contrast to BP, the TRW-S allows us to compute per-instance approximation ratios and thus to evaluate the quality of the optimization. The performance of our technique is demonstrated on both synthetic and real-world experiments.
TL;DR: This paper designs and implements a novel graph-based min-cut/max-flow algorithm that incorporates topology priors as global constraints and introduces a label attribute for each node to explicitly handle the topology constraints.
Abstract: Topology is an important prior in many image segmentation tasks. In this paper, we design and implement a novel graph-based min-cut/max-flow algorithm that incorporates topology priors as global constraints. We show that the optimization of the energy function we consider here is NP-hard. However, our algorithm is guaranteed to find an approximate solution that conforms to the initialization, which is a desirable property in many applications since the globally optimum solution does not consider any initialization information. The key innovation of our algorithm is the organization of the search for maximum flow in a way that allows consideration of topology constraints. In order to achieve this, we introduce a label attribute for each node to explicitly handle the topology constraints, and we use a distance map to keep track of those nodes that are closest to the boundary. We employ the bucket priority queue data structure that records nodes of equal distance and we efficiently extract the node with minimal distance value. Our methodology of embedding distance functions in a graph-based algorithm is general and can also account for other geometric priors. Experimental results show that our algorithm can efficiently handle segmentation cases that are challenging for graph-cut algorithms. Furthermore, our algorithm is a natural choice for problems with rich topology priors such as object tracking.
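The bucket priority queue mentioned above is a standard structure for small integer keys such as distance-map values: pushes are O(1), and pop-min scans forward through at most as many buckets as there are distinct distances. A minimal sketch (generic, not the authors' implementation):

```python
from collections import deque

class BucketQueue:
    """Bucket priority queue for integer priorities in [0, max_priority].
    Nodes of equal distance share a bucket; pop_min returns an item
    with the smallest priority currently stored."""
    def __init__(self, max_priority):
        self.buckets = [deque() for _ in range(max_priority + 1)]
        self.cursor = 0  # lowest bucket that might be non-empty

    def push(self, priority, item):
        self.buckets[priority].append(item)
        if priority < self.cursor:
            self.cursor = priority  # a smaller key re-opens earlier buckets

    def pop_min(self):
        while self.cursor < len(self.buckets) and not self.buckets[self.cursor]:
            self.cursor += 1
        if self.cursor == len(self.buckets):
            raise IndexError("pop from empty BucketQueue")
        return self.buckets[self.cursor].popleft()
```

Within a bucket, items come out in FIFO order, which is all a distance-ordered boundary sweep needs.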
TL;DR: The min-marginal energies obtained by the proposed algorithm are exact, as opposed to the ones obtained from other inference algorithms like loopy belief propagation and generalized belief propagation.
Abstract: In recent years graph cuts have become a popular tool for performing inference in Markov and conditional random fields. In this context the question arises as to whether it might be possible to compute a measure of uncertainty associated with the graph cut solutions. In this paper we answer this particular question by showing how the min-marginals associated with the label assignments of a random field can be efficiently computed using a new algorithm based on dynamic graph cuts. The min-marginal energies obtained by our proposed algorithm are exact, as opposed to the ones obtained from other inference algorithms like loopy belief propagation and generalized belief propagation. The paper also shows how min-marginals can be used for parameter learning in conditional random fields.
TL;DR: The experimental results show that the incremental and adaptive behaviour modelling approach is superior to a conventional batch-mode one in terms of both performance on abnormality detection and computational efficiency.
Abstract: We develop a novel visual behaviour modelling approach that performs incremental and adaptive model learning for online abnormality detection in a visual surveillance scene. The approach has the following key features that make it advantageous over previous ones: (1) Fully unsupervised learning: both feature extraction for behaviour pattern representation and model construction are carried out without the laborious and unreliable process of data labelling. (2) Robust abnormality detection: using the Likelihood Ratio Test (LRT) for abnormality detection, the proposed approach is robust to noise in behaviour representation. (3) Online and incremental model construction: after being initialised using a small bootstrapping dataset, our behaviour model is learned incrementally whenever a new behaviour pattern is captured. This makes our approach computationally efficient and suitable for real-time applications. (4) Model adaptation to reflect changes in visual context. Online model structure adaptation is performed to accommodate changes in the definition of normality/abnormality caused by visual context changes. This caters for the need to reclassify what may initially be considered as being abnormal to be normal over time, and vice versa. These features are not only desirable but also necessary for processing large volumes of unlabelled surveillance video data with visual context changing over time. The effectiveness and robustness of our approach are demonstrated through experiments using noisy datasets collected from a real world surveillance scene. The experimental results show that our incremental and adaptive behaviour modelling approach is superior to a conventional batch-mode one in terms of both performance on abnormality detection and computational efficiency.
TL;DR: This paper presents a novel framework for matching video sequences using the spatiotemporal segmentation of videos that uses interest point trajectories to generate video volumes and employs an Earth Mover's Distance based approach for the comparison of volume features.
Abstract: This paper presents a novel framework for matching video sequences using the spatiotemporal segmentation of videos. Instead of using appearance features for region correspondence across frames, we use interest point trajectories to generate video volumes. Point trajectories, which are generated using the SIFT operator, are clustered to form motion segments by analyzing their motion and spatial properties. The temporal correspondence between the estimated motion segments is then established based on most common SIFT correspondences. A two pass correspondence algorithm is used to handle splitting and merging regions. Spatiotemporal volumes are extracted using the consistently tracked motion segments. Next, a set of features including color, texture, motion, and SIFT descriptors are extracted to represent a volume. We employ an Earth Mover's Distance (EMD) based approach for the comparison of volume features. Given two videos, a bipartite graph is constructed by modeling the volumes as vertices and their similarities as edge weights. Maximum matching of this graph produces volume correspondences between the videos, and these volume matching scores are used to compute the final video matching score. Experiments for video retrieval were performed on a variety of videos obtained from different sources including BBC Motion Gallery and promising results were achieved. We present qualitative and quantitative analysis of retrieval along with a comparison with two baseline methods.
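For one-dimensional feature histograms, the Earth Mover's Distance reduces to the first Wasserstein distance, which SciPy computes directly. The histograms below are made-up stand-ins for per-volume colour features, not data from the paper:

```python
import numpy as np
from scipy.stats import wasserstein_distance

bins = np.arange(8)  # histogram bin centres

# Hypothetical colour histograms for three spatiotemporal volumes:
# a and c are similar, b concentrates its mass at the opposite end.
hist_a = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])
hist_b = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.3, 0.5])
hist_c = np.array([0.4, 0.4, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])

# EMD between weighted histograms = 1-D Wasserstein distance.
d_ab = wasserstein_distance(bins, bins, hist_a, hist_b)
d_ac = wasserstein_distance(bins, bins, hist_a, hist_c)
```

In the full framework such distances (negated or inverted into similarities) become the edge weights of the bipartite volume-matching graph.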
TL;DR: A weighted fragment based approach that tackles partial occlusion is proposed that is computationally simple enough to be executed in real-time and can be directly extended to a multiple object tracking system.
Abstract: Object tracking is critical to visual surveillance, activity analysis and event/gesture recognition. The major issues to be addressed in visual tracking are illumination changes, occlusion, appearance and scale variations. In this paper, we propose a weighted fragment based approach that tackles partial occlusion. The weights are derived from the difference between the fragment and background colors. Further, a fast and yet stable model update method is described. We also demonstrate how edge information can be merged into the mean shift framework without having to use a joint histogram. This is used for tracking objects of varying sizes. Ideas presented here are computationally simple enough to be executed in real-time and can be directly extended to a multiple object tracking system.
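The fragment-weighting idea (fragments that look least like the background count most, so occluded or background-like fragments are down-weighted) can be sketched with a Bhattacharyya coefficient between colour histograms. This is an illustration of the principle; the paper derives weights from fragment/background colour differences but not necessarily with this exact measure:

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalised histograms:
    1.0 for identical distributions, 0.0 for disjoint support."""
    return float(np.sum(np.sqrt(p * q)))

def fragment_weight(frag_hist, bg_hist):
    """Illustrative weight: fragments whose colour distribution differs
    most from the background receive the largest weight."""
    return 1.0 - bhattacharyya(frag_hist, bg_hist)
```

A fragment that becomes occluded starts to resemble the occluder or background, its weight drops, and the remaining fragments dominate the mean shift localisation.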
TL;DR: A novel representation for human actions which encodes the variations in the shape and motion of the performing actor in a unified manner and is robust to viewpoint changes is presented.
Abstract: This paper presents a novel representation for human actions which encodes the variations in the shape and motion of the performing actor. When an actor performs an action, at each time instant, the outer object boundary is projected to the image plane as a 2D contour. A sequence of such contours forms a 3D volume in the spatiotemporal space. The differential geometric analysis of the volume surface results in a set of action descriptors. These descriptors constitute the action sketch which is used to represent the human actions. The action sketch captures the changes in the shape and motion of the performing actor in a unified manner. Since the action sketch is obtained from the extrema of the differential geometric surface features, it is robust to viewpoint changes. We demonstrate the versatility of the action sketch in the context of action recognition, which is formulated as a view geometric similarity problem.
TL;DR: A prototype mobile leaf image retrieval system is implemented using an adaptive grid-based matching algorithm built on the Nearest Neighbor (NN) search scheme, and various experiments are carried out on a database of 1,032 leaf images.
Abstract: In this paper, we propose a new scheme for similarity-based leaf image retrieval. For the effective measurement of leaf similarity, we have considered shape and venation features together. In the shape domain, we construct a matrix of interest points to model the similarity between two leaf images. In order to improve the retrieval performance, we implemented an adaptive grid-based matching algorithm. Based on the Nearest Neighbor (NN) search scheme, this algorithm computes a minimum weight from the constructed matrix and uses it as the degree of similarity between two leaf images. This reduces the search space necessary for matching. In the venation domain, we construct an adjacency matrix from the intersection and end points of a venation to model similarity between two leaf images. Based on these features, we implemented a prototype mobile leaf image retrieval system and carried out various experiments on a database with 1,032 leaf images. Experimental results show that our scheme achieves a significant performance improvement compared to other existing methods.
TL;DR: Fully automatic methods are presented for the estimation of scene structure and camera motion from an image sequence acquired by a catadioptric system, and many experiments dealing with robustness, accuracy, uncertainty, comparisons between both central and non-central models, and piecewise planar 3D modeling are provided.
Abstract: Fully automatic methods are presented for the estimation of scene structure and camera motion from an image sequence acquired by a catadioptric system. The first contribution is the design of bundle adjustments for both central and non-central models, by taking care of the smoothness of the minimized error functions. The second contribution is an extensive experimental study for long sequences of catadioptric images in a context useful for applications: a hand-held and equiangular camera moving on the ground. An equiangular camera is non-central and provides uniform resolution in the image radial direction. Many experiments dealing with robustness, accuracy, uncertainty, comparisons between both central and non-central models, and piecewise planar 3D modeling are provided.
TL;DR: A new technique to compute belief propagation messages in time linear with respect to clique size for a large class of potential functions over real-valued variables and develops a form of nonparametric belief representation specifically designed to address issues common to networks with higher-order cliques.
Abstract: Belief propagation over pairwise-connected Markov random fields has become a widely used approach, and has been successfully applied to several important computer vision problems. However, pairwise interactions are often insufficient to capture the full statistics of the problem. Higher-order interactions are sometimes required. Unfortunately, the complexity of belief propagation is exponential in the size of the largest clique. In this paper, we introduce a new technique to compute belief propagation messages in time linear with respect to clique size for a large class of potential functions over real-valued variables. We discuss how this technique can be generalized to still wider classes of potential functions at varying levels of efficiency. Also, we develop a form of nonparametric belief representation specifically designed to address issues common to networks with higher-order cliques and also to the use of guaranteed-convergent forms of belief propagation. To illustrate these techniques, we perform efficient inference in graphical models where the spatial prior of natural images is captured by 2x2 cliques. This approach shows significant improvement over the commonly used pairwise-connected models, and may benefit a variety of applications using belief propagation to infer images or range images, including stereo, shape-from-shading, image-based rendering, segmentation, and matting.
TL;DR: This paper reviews what the authors see as current best practices in empirical evaluation, distinguishes the historical emphasis on algorithmic novelty from the increasing importance of validation on particular data sets and problems, and suggests refinements that may benefit the field of computer vision.
Abstract: It is frequently remarked that designers of computer vision algorithms and systems cannot reliably predict how algorithms will respond to new problems. A variety of reasons have been given for this situation and a variety of remedies prescribed in the literature. Most of these involve, in some way, paying greater attention to the domain of the problem and to performing detailed empirical analysis. The goal of this paper is to review what we see as current best practices in these areas and also suggest refinements that may benefit the field of computer vision. A distinction is made between the historical emphasis on algorithmic novelty and the increasing importance of validation on particular data sets and problems.
TL;DR: A new computational framework for modelling visual-object-based attention and attention-driven eye movements within an integrated system in a biologically inspired approach is presented, resulting in sophisticated performance in complicated natural scenes.
Abstract: This paper presents a new computational framework for modelling visual-object-based attention and attention-driven eye movements within an integrated system in a biologically inspired approach. Attention operates at multiple levels of visual selection by space, feature, object and group depending on the nature of targets and visual tasks. Attentional shifts and gaze shifts are constructed upon their common process circuits and control mechanisms but separated according to their different functional roles, working together to fulfil flexible visual selection tasks in complicated visual environments. The framework integrates the important aspects of human visual attention and eye movements, resulting in sophisticated performance in complicated natural scenes. The proposed approach aims at exploring a useful visual selection system for computer vision, especially for usage in cluttered natural visual environments.
TL;DR: Two new fusion quality indexes are proposed, implemented using the phase congruency measurement of the input images to provide a blind evaluation of the image fusion result, i.e. no reference image is needed.
Abstract: Pixel-level image fusion has been investigated in various applications and a number of algorithms have been developed and proposed. However, few authors have addressed the problem of how to assess the performance of those algorithms and evaluate the resulting fused images objectively and quantitatively. In this study, two new fusion quality indexes are proposed and implemented using the phase congruency measurement of the input images. These feature-based measurements therefore provide a blind evaluation of the image fusion result, i.e. no reference image is needed. The metrics take advantage of the phase congruency measurement, which provides a dimensionless contrast- and brightness-invariant representation of image features. The fusion quality indexes are compared with recently developed blind evaluation metrics. The validity of the new metrics is verified by tests on the fusion results achieved by a number of multiresolution pixel-level fusion algorithms.
TL;DR: Several useful strategies for improving the performance of skeleton-based shape matching algorithms are described; experimental results show that incorporating them significantly improves shape database retrieval accuracy.
Abstract: Skeletons are often used as a framework for part-based shape description and matching. This paper describes some useful strategies that can be employed to improve the performance of such shape matching algorithms. Firstly, it is important that ligature-sensitive information be incorporated into the part decomposition and shape matching processes. Secondly, part decomposition should be treated as a dynamic process in which the selection of the final decomposition of a shape is deferred until the shape matching stage. Thirdly, both local and global measures must be employed when computing shape dissimilarity. Finally, skeletal segments must be weighted by appropriate visual saliency measures during the part matching process. These saliency measures include curvature and ligature-based measures. Experimental results show that the incorporation of these strategies significantly improves shape database retrieval accuracy.
TL;DR: A Selective Coefficient Mask Shift (SCMShift) coding method, implemented over regions of interest (ROIs), is proposed, based on shifting the wavelet coefficients that belong to different subbands, depending on the coefficients relative to the original image.
Abstract: Image compression can improve the performance of digital systems by reducing the time and cost of image storage and transmission without significant reduction of image quality. JPEG2000 has emerged as the new state-of-the-art standard for image compression. In this paper, a Selective Coefficient Mask Shift (SCMShift) coding method is proposed. The technique, implemented over regions of interest (ROIs), is based on shifting the wavelet coefficients that belong to different subbands, depending on the coefficients relative to the original image. The method allows: (1) coding of multiple ROIs at various degrees of interest, (2) arbitrarily shaped ROI coding, and (3) flexible adjustment of the compression quality of the ROI and the background. No modification of the standard JPEG2000 decoder is required. The method was applied to different types of images, and results show better performance for the selected regions when the ROI coding method was employed, across the whole set of images. We believe this method is an excellent tool for future image compression research, especially for images where ROI coding is of interest, such as medical imaging modalities and several multimedia applications.
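The coefficient-shifting idea underlying SCMShift can be illustrated with the generic Maxshift-style mechanism from JPEG2000 ROI coding, on which such methods build. The sketch below is a deliberate simplification (a flat list of coefficients, a single uniform quantizer; the function and parameter names are made up for illustration), not the paper's subband-dependent scheme:

```python
def roi_shift_quantize(coeffs, roi_mask, shift=4, step=8):
    """Maxshift-style ROI coding sketch: scale coefficients inside the
    ROI up by 2**shift before uniform quantization, so the ROI survives
    a coarse quantizer better than the background. `coeffs` and
    `roi_mask` are same-length lists. Illustrates the general idea
    behind SCMShift, not the paper's exact per-subband scheme."""
    # Encoder side: shift ROI coefficients up, then quantize everything.
    shifted = [c * (2 ** shift) if m else c for c, m in zip(coeffs, roi_mask)]
    quantized = [round(s / step) * step for s in shifted]
    # Decoder side: shift ROI coefficients back down.
    return [q / (2 ** shift) if m else q for q, m in zip(quantized, roi_mask)]
```

With the defaults, a small coefficient inside the ROI is reconstructed exactly while the same coefficient in the background is quantized to zero, which is precisely the ROI/background quality trade-off the abstract describes.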
TL;DR: Experimental results using face images of the UTK-LRHM database demonstrate a significant improvement in recognition rates after assessment and enhancement of degradations.
Abstract: In this paper, we describe a face video database, UTK-LRHM, acquired from long distances and with high magnifications. Both indoor and outdoor sequences are collected under uncontrolled surveillance conditions. To our knowledge, it is the first database to provide face images from long distances (indoor: 10-16m and outdoor: 50-300m). The corresponding system magnifications range from 3x to 20x indoors and up to 284x outdoors. The database has applications in experiments on human identification and authentication in long-range surveillance and wide-area monitoring. Deteriorations unique to long-range, high-magnification face images are investigated in terms of face recognition rates on the UTK-LRHM database. Magnification blur is shown to be a major degradation source, the effect of which is quantified using a novel blur assessment measure and alleviated via adaptive deblurring algorithms. A comprehensive processing algorithm, including frame selection, enhancement, and super-resolution, is introduced for long-range and high-magnification face images with a large variety of resolutions. Experimental results using face images of the UTK-LRHM database demonstrate a significant improvement in recognition rates after assessment and enhancement of degradations.
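The paper introduces its own blur assessment measure; as a hedged stand-in, a widely used generic sharpness score is the variance of the image Laplacian, where lower values indicate stronger blur. The function below is a minimal illustration of that generic idea, not the authors' measure:

```python
def laplacian_blur_score(img):
    """Variance of the 4-neighbour Laplacian of a grayscale image
    (2-D list of numbers). Sharp images have strong edge responses
    and thus high variance; lower scores indicate more blur.
    A generic focus/blur measure, not the paper's novel one."""
    h, w = len(img), len(img[0])
    vals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)
```

Such a score can drive both frame selection (keep the sharpest frames) and the stopping criterion of an adaptive deblurring loop.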
TL;DR: This paper proposes a novel stereo rectification method for dual-PTZ-camera system, which is essential to greatly increase the efficiency of stereo matching and results show that this approach works well.
Abstract: Traditional stereo vision research is mainly based on static cameras. Because PTZ (Pan-Tilt-Zoom) cameras can obtain multi-view-angle and multi-resolution information, they have received growing attention in both research and real applications. Stereo vision using a dual-PTZ-camera system is much more challenging than using a dual-static-camera system, but such a system can have a more extensive scope of application by combining the merits of PTZ cameras. However, few works on stereo vision with dual-PTZ-camera systems are found in the literature. In this paper, we propose a novel stereo rectification method for a dual-PTZ-camera system, which is essential to greatly increase the efficiency of stereo matching. In a dual-PTZ-camera system, the inconsistency of intensities between the two camera images, caused by each camera's self-adjustment of intensity under different illumination conditions and different fields of view, is a further challenge in stereo matching. To deal with this problem, we propose a two-step stereo matching strategy. Experimental results show that our approach works well.
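The intensity inconsistency between the two PTZ cameras is commonly tackled by a radiometric normalisation step before matching. The sketch below shows one simple, widely used remedy (gain/offset normalisation so the means and standard deviations match); it is an illustrative assumption, not the authors' two-step strategy:

```python
def match_mean_std(src, ref):
    """Gain/offset normalisation: remap `src` (2-D list of grayscale
    values) so its mean and standard deviation match those of `ref`.
    A common, simple remedy for the auto-exposure mismatch between
    two cameras; not the paper's own two-step matching strategy."""
    flat_s = [v for row in src for v in row]
    flat_r = [v for row in ref for v in row]
    ms = sum(flat_s) / len(flat_s)
    mr = sum(flat_r) / len(flat_r)
    ss = (sum((v - ms) ** 2 for v in flat_s) / len(flat_s)) ** 0.5
    sr = (sum((v - mr) ** 2 for v in flat_r) / len(flat_r)) ** 0.5
    gain = sr / ss if ss else 1.0
    return [[(v - ms) * gain + mr for v in row] for row in src]
```

After this normalisation, intensity-based matching costs (e.g., sum of absolute differences) become far less sensitive to the two cameras' independent exposure settings.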
TL;DR: A 3-D human-body tracker capable of handling fast and complex motions in real-time is introduced, built upon the Monte-Carlo Bayesian framework, and novel prediction and evaluation methods improving the robustness and efficiency of the tracker are proposed.
Abstract: In this paper, we introduce a 3-D human-body tracker capable of handling fast and complex motions in real-time. We build upon the Monte-Carlo Bayesian framework, and propose novel prediction and evaluation methods improving the robustness and efficiency of the tracker. The parameter space, augmented with first order derivatives, is automatically partitioned into Gaussian clusters each representing an elementary motion: hypothesis propagation inside each cluster is therefore accurate and efficient. The transitions between clusters use the predictions of a variable length Markov model which can explain high-level behaviours over a long history. Using Monte-Carlo methods, evaluation of model candidates is critical for both speed and robustness. We present a new evaluation scheme based on hierarchical 3-D reconstruction and blob-fitting, where appearance models and image evidences are represented by mixtures of Gaussian blobs. Our tracker is also capable of automatic-initialisation and self-recovery. We demonstrate the application of our tracker to long video sequences exhibiting rapid and diverse movements.
TL;DR: An object-based approach based on the F-Measure, a single-valued ROC-like measure which enables a straightforward mechanism for both optimising and comparing motion detection algorithms.
Abstract: The majority of visual surveillance algorithms rely on effective and accurate motion detection. However, most evaluation techniques described in the literature do not address the complexity and range of issues which underpin the design of a good evaluation methodology. In this paper, we explore the problems associated with both optimising the operating point of a motion detection algorithm and objectively comparing the performance of competing algorithms. In particular, we develop an object-based approach based on the F-Measure, a single-valued ROC-like measure which enables a straightforward mechanism for both optimising and comparing motion detection algorithms. Despite its advantages over pixel-based ROC approaches, a number of important issues associated with parameterising the evaluation algorithm need to be addressed. The approach is illustrated by a comparison of three motion detection algorithms, including the well-known Stauffer and Grimson algorithm, based on results obtained on two datasets.
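The single-valued score at the heart of this evaluation can be written down directly. The following is a minimal pixel-level sketch of the F-Measure on binary masks (the paper's contribution is an object-based variant, which this simplification omits):

```python
def f_measure(detected, truth):
    """F-Measure of a binary detection mask against ground truth.
    `detected` and `truth` are same-sized 2-D lists of 0/1 values.
    A pixel-level sketch of the single-valued score; the paper's
    object-based formulation is not reproduced here."""
    tp = fp = fn = 0
    for drow, trow in zip(detected, truth):
        for d, t in zip(drow, trow):
            if d and t:
                tp += 1          # correctly detected motion pixel
            elif d and not t:
                fp += 1          # false alarm
            elif t and not d:
                fn += 1          # missed motion pixel
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Sweeping a detector's threshold and keeping the setting that maximises this score is the optimisation mechanism the abstract refers to; comparing the maxima across detectors gives the algorithm comparison.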
TL;DR: The fuzzy metric peer group concept is used to build novel switching vector filters, and comparisons are provided to show that the proposed approach suppresses impulsive noise while preserving image details.
Abstract: A new method for removing impulsive noise in color images is presented. The fuzzy metric peer group concept is used to build novel switching vector filters. In the proposed filtering procedure, a set of noise-free pixels of high reliability is determined by applying a highly restrictive condition based on the peer group concept. Afterwards, an iterative detection process is used to refine the initial findings by detecting additional noise-free pixels. Finally, noisy pixels are filtered by maximizing the employed fuzzy distance criterion between the pixels inside the filter window. Comparisons are provided to show that our approach suppresses impulsive noise while preserving image details. In addition, the method is analyzed in order to justify the necessity of the iterative process and demonstrate the computational efficiency of the proposed approach.
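The peer group test that drives the detection step can be sketched as follows. Note the hedge: the paper uses a fuzzy metric, whereas this illustration substitutes plain Euclidean colour distance, and the parameter defaults are arbitrary:

```python
def peer_group_clean(img, m=3, d=60.0):
    """Flag pixels as noise-free when at least `m` of their 3x3
    neighbours lie within colour distance `d`. `img` is a 2-D list of
    (r, g, b) tuples. The paper employs a fuzzy metric; plain
    Euclidean distance is substituted here, and m, d are arbitrary
    illustrative defaults."""
    h, w = len(img), len(img[0])
    clean = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            peers = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        dist = sum((a - b) ** 2 for a, b in
                                   zip(img[y][x], img[ny][nx])) ** 0.5
                        if dist <= d:
                            peers += 1
            clean[y][x] = peers >= m
    return clean
```

An isolated colour impulse has no peers and is flagged for filtering, while pixels in smooth regions pass the test; the paper's iterative stage then grows the noise-free set from this highly restrictive seed.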
TL;DR: This paper addresses the task of human gait and activity analysis from image sequences by learning and recognition of sequential data under a general integrated framework and carries out extensive experiments in three related domains: human activity recognition, abnormal gait analysis, and gait-based human identification.
Abstract: Human motion analysis is increasingly attracting much attention from computer vision researchers. This paper aims to address the task of human gait and activity analysis from image sequences by learning and recognition of sequential data under a general integrated framework. Human movements generally exhibit intrinsically nonlinear spatiotemporal characteristics in the high-dimensional ambient space. An attractive framework, which we explore here, is to: (1) Extract simple and reliable features from image sequences. (2) Find a low-dimensional feature representation embedded in high-dimensional image data. (3) Then characterize/classify the motions in this low-dimensional feature space. We examine two simple alternatives for step 1: silhouette and a distance transformed silhouette; and three quite different methods for step 3: Gaussian mixture models (GMM) based classification, a matching-based approach with the mean Hausdorff distance, and continuous hidden Markov models (HMM) based modelling and recognition. The core is step 2 where we choose to use LPP (locality preserving projections), an optimal linear approximation to a nonlinear spectral embedding technique (i.e., Laplacian eigenmap). In essence our aim is to see whether this core, together with simple approaches to steps 1 and 3, can solve problems across several types of human gait and activity. To see how well the proposed framework performs, we carry out extensive experiments in three related domains: human activity recognition, abnormal gait analysis, and gait-based human identification. The experimental results show that the proposed framework performs well across all three areas.
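Of the three classification approaches examined for step 3, the matching-based approach with the mean Hausdorff distance is the simplest to write down. The sketch below computes a symmetric mean Hausdorff distance between two point sets (e.g., the low-dimensional feature vectors of two sequences); it is a minimal illustration of that one component, not the paper's full pipeline:

```python
def mean_hausdorff(a, b):
    """Symmetric mean Hausdorff distance between two point sets
    (lists of coordinate tuples): the average nearest-neighbour
    distance from each set to the other, averaged both ways.
    A minimal sketch of the matching cost used in step 3."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

    def directed(s, t):
        # Mean distance from each point of s to its nearest point in t.
        return sum(min(dist(p, q) for q in t) for p in s) / len(s)

    return 0.5 * (directed(a, b) + directed(b, a))
```

Classification then amounts to assigning a query sequence the label of the gallery sequence with the smallest mean Hausdorff distance in the embedded space.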