Showing papers in "Image and Vision Computing in 2004"
TL;DR: The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes.
Abstract: The wide-baseline stereo problem, i.e. the problem of establishing correspondences between a pair of images taken from different viewpoints, is studied. A new set of image elements that are put into correspondence, the so-called extremal regions, is introduced. Extremal regions possess highly desirable properties: the set is closed under (1) continuous (and thus projective) transformation of image coordinates and (2) monotonic transformation of image intensities. An efficient (near-linear complexity) and practically fast (near frame rate) detection algorithm is presented for an affinely invariant stable subset of extremal regions, the maximally stable extremal regions (MSER). A new robust similarity measure for establishing tentative correspondences is proposed. The robustness ensures that invariants from multiple measurement regions (regions obtained by invariant constructions from extremal regions), some of which are significantly larger (and hence more discriminative) than the MSERs, may be used to establish tentative correspondences. The high utility of MSERs, multiple measurement regions and the robust metric is demonstrated in wide-baseline experiments on image pairs from both indoor and outdoor scenes. Significant change of scale (3.5×), illumination conditions, out-of-plane rotation, occlusion, locally anisotropic scale change and 3D translation of the viewpoint are all present in the test problems. Good estimates of epipolar geometry (average distance from corresponding points to the epipolar line below 0.09 of the inter-pixel distance) are obtained.
3,422 citations
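The stability criterion behind MSERs can be sketched in a few lines: for each intensity threshold, extremal regions are connected components of the thresholded image, and a region is maximally stable where its relative area change across neighbouring thresholds is locally minimal. This is a toy illustration (one flood fill per threshold, far from the paper's near-linear algorithm); the function names and the 5×5 test image are invented for the example.

```python
def region_area(img, seed, t):
    """Area of the connected component of pixels <= t containing seed
    (4-connectivity); img is a 2D list of intensities."""
    h, w = len(img), len(img[0])
    if img[seed[0]][seed[1]] > t:
        return 0
    seen, stack = {seed}, [seed]
    while stack:
        r, c = stack.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen and img[nr][nc] <= t:
                seen.add((nr, nc))
                stack.append((nr, nc))
    return len(seen)

def stability(img, seed, t, delta=1):
    """Relative area change across +/- delta thresholds; maximally stable
    regions are local minima of this score (seed assumed inside the region)."""
    area = region_area(img, seed, t)
    return (region_area(img, seed, t + delta) - region_area(img, seed, t - delta)) / area
```

On a dark 3×3 blob (value 2) against a bright border (value 8), the region containing the centre pixel has the same area for every threshold between the two intensities, so its stability score there is 0, i.e. maximally stable.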
TL;DR: A robust algorithm, called CHEVP, is presented for providing a good initial position for the B-Snake model, and a minimum error method based on Minimum Mean Square Error (MMSE) is proposed to determine the control points of the B-Snake model from the overall image forces on the two sides of the lane.
Abstract: In this paper, we propose a B-Snake based lane detection and tracking algorithm that requires no camera parameters. Compared with other lane models, the B-Snake based lane model is able to describe a wider range of lane structures, since a B-Spline can form any arbitrary shape from a set of control points. The problem of detecting both sides of the lane markings (or boundaries) is merged here into the problem of detecting the mid-line of the lane, using the knowledge of perspective parallel lines. Furthermore, a robust algorithm, called CHEVP, is presented for providing a good initial position for the B-Snake. Also, a minimum error method based on Minimum Mean Square Error (MMSE) is proposed to determine the control points of the B-Snake model from the overall image forces on the two sides of the lane. Experimental results show that the proposed method is robust against noise, shadows, and illumination variations in the captured road images. It is also applicable to both marked and unmarked roads, as well as to roads with dashed or solid paint lines.
812 citations
TL;DR: A new randomized (hypothesis evaluation) version of the ransac algorithm, r-ransac, is introduced, along with a mathematically tractable class of statistical preverification tests of samples and an approximate relation for the optimal setting of the test's single parameter.
Abstract: Many computer vision algorithms include a robust estimation step where model parameters are computed from a data set containing a significant proportion of outliers. The ransac algorithm is possibly the most widely used robust estimator in the field of computer vision. In the paper we show that, under a broad range of conditions, ransac efficiency is significantly improved if its hypothesis evaluation step is randomized. A new randomized (hypothesis evaluation) version of the ransac algorithm, r-ransac, is introduced. Computational savings are achieved by typically evaluating only a fraction of the data points for models contaminated with outliers. The idea is implemented in a two-step evaluation procedure. A mathematically tractable class of statistical preverification tests of samples is introduced, and for this class of tests we derive an approximate relation for the optimal setting of its single parameter. The proposed pre-test is evaluated on both synthetic data and real-world problems, and a significant increase in speed is shown.
297 citations
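The two-step evaluation idea can be sketched as follows: before counting inliers over the whole data set, a T(d,d)-style pre-test checks d randomly chosen points and rejects the hypothesis cheaply if any of them fails. The function names and the plain-Python representation are illustrative, not the paper's implementation.

```python
import random

def passes_pretest(residual, data, threshold, d=1):
    """T(d,d)-style preverification: evaluate the hypothesis on d random
    points and accept it for full evaluation only if all d are inliers."""
    probe = random.sample(data, min(d, len(data)))
    return all(residual(p) <= threshold for p in probe)

def consensus_size(residual, data, threshold, d=1):
    """Randomized hypothesis evaluation: hypotheses contaminated with
    outliers are usually rejected by the cheap pre-test; surviving
    hypotheses get a full inlier count."""
    if not passes_pretest(residual, data, threshold, d):
        return 0
    return sum(residual(p) <= threshold for p in data)
```

Since most hypotheses generated from contaminated minimal samples are wrong, most full evaluations over all N points are skipped, which is where the speed-up comes from.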
TL;DR: Retrieval using the MCM is better than with the CCM, since the MCM captures third-order image statistics in the local neighborhood; experiments confirm that the use of the MCM considerably improves retrieval performance.
Abstract: We present a new technique for content-based image retrieval using the motif co-occurrence matrix (MCM). The MCM is derived from the motif-transformed image. The whole image is divided into 2×2 pixel grids. Each grid is replaced by a scan motif that minimizes the local gradient while traversing the 2×2 grid, forming a motif-transformed image. The MCM is then defined as a 3D matrix whose (i,j,k) entry denotes the probability of finding a motif i at a distance k from a motif j in the transformed image. Conceptually, the MCM is quite similar to the color co-occurrence matrix (CCM); however, retrieval using the MCM is better than with the CCM, since it captures third-order image statistics in the local neighborhood. Experiments confirm that the use of the MCM considerably improves retrieval performance.
293 citations
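A simplified sketch of the motif transform and the co-occurrence counting, assuming a reduced set of four scan orders (the paper uses the full family of space-filling motifs) and a dictionary-based MCM restricted to horizontal offsets:

```python
# Reduced set of scan orders over the 2x2 grid (pixel indices 0..3 in
# row-major order); the paper uses the full family of space-filling motifs.
SCANS = [(0, 1, 2, 3), (0, 2, 1, 3), (0, 1, 3, 2), (0, 2, 3, 1)]

def motif(grid):
    """Index of the scan order that minimizes the traversal gradient,
    i.e. the sum of absolute intensity steps along the scan."""
    def cost(scan):
        return sum(abs(grid[scan[i + 1]] - grid[scan[i]]) for i in range(3))
    return min(range(len(SCANS)), key=lambda i: cost(SCANS[i]))

def mcm(motif_img, k=1):
    """Probability of finding motif i at horizontal distance k from
    motif j in the motif-transformed image (horizontal offsets only)."""
    counts, total = {}, 0
    for row in motif_img:
        for j in range(len(row) - k):
            key = (row[j], row[j + k])
            counts[key] = counts.get(key, 0) + 1
            total += 1
    return {key: c / total for key, c in counts.items()}
```

The grid is given as a 4-tuple of intensities in row-major order; for `(0, 5, 1, 6)` the column-first scan `(0, 2, 1, 3)` has the smallest gradient, so its index is the motif.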
TL;DR: The proposed area-based stereo algorithm relies on the uniqueness constraint and on a matching process that rejects previous matches as soon as more reliable ones are found, and is compared with bidirectional matching.
Abstract: This paper proposes an area-based stereo algorithm suitable to real time applications. The core of the algorithm relies on the uniqueness constraint and on a matching process that rejects previous matches as soon as more reliable ones are found. The proposed approach is also compared with bidirectional matching (BM), since the latter is the basic method for detecting unreliable matches in most area-based stereo algorithms. We describe the algorithm's matching core, the additional constraints introduced to improve the reliability and the computational optimizations carried out to achieve a very fast implementation. We provide a large set of experimental results, obtained on a standard set of images with ground-truth as well as on stereo sequences, and computation time measurements. These data are used to evaluate the proposed algorithm and compare it with a well-known algorithm based on BM.
273 citations
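For contrast with the paper's uniqueness-based rejection, the bidirectional matching (BM) baseline can be sketched on a single scanline: a left-image disparity is kept only if the right-image disparity field maps back to roughly the same pixel. The names `disp_lr`/`disp_rl` and the tolerance are illustrative, not the paper's API.

```python
def bidirectional_check(disp_lr, disp_rl, tol=1):
    """Bidirectional matching (BM) on one scanline: keep a left-image
    disparity only if the right-image disparity maps (roughly) back to
    the same column; otherwise mark the match as unreliable (None)."""
    out = []
    for x, d in enumerate(disp_lr):
        xr = x - d                      # corresponding right-image column
        if 0 <= xr < len(disp_rl) and abs(disp_rl[xr] - d) <= tol:
            out.append(d)
        else:
            out.append(None)            # rejected as unreliable
    return out
```

BM requires computing two full disparity fields; the paper's uniqueness-constraint core avoids that by overwriting earlier matches as soon as more reliable ones are found.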
TL;DR: A vehicle tracking algorithm is presented based on the combination of a novel per-pixel (Gaussian Mixture Based) background model and a set of foreground models of object size, position, velocity, and colour distribution, which is robust assuming sufficient image resolution is available and vehicle sizes do not greatly exceed the priors on object size used in object initialisation.
Abstract: In this paper a vehicle tracking algorithm is presented based on the combination of a novel per-pixel (Gaussian Mixture Based) background model and a set of foreground models of object size, position, velocity, and colour distribution. Each pixel in the scene is ‘explained’ as either background, belonging to a foreground object, or as noise. A projective ground-plane transform is used within the foreground model to strengthen object size and velocity consistency assumptions. A learned model of typical road travel direction and speed is used to provide a prior estimate of object velocity, which is used to initialise the velocity model for each of the foreground objects. The system runs at near video framerate (>20 fps) on modest hardware and is robust assuming sufficient image resolution is available and vehicle sizes do not greatly exceed the priors on object size used in object initialisation.
265 citations
TL;DR: It is shown that kernel density estimation applied in the joint spatial–range domain yields a powerful processing paradigm—the mean shift procedure, related to bilateral filtering but having additional flexibility, which establishes an attractive relationship between the theory of statistics and that of diffusion and energy minimization.
Abstract: In this paper, a common framework is outlined for nonlinear diffusion, adaptive smoothing, bilateral filtering and mean shift procedure. Previously, the relationship between bilateral filtering and the nonlinear diffusion equation was explored by using a consistent adaptive smoothing formulation. However, both nonlinear diffusion and adaptive smoothing were treated as local processes applying a 3×3 window at each iteration. Here, these two approaches are extended to an arbitrary window, showing their equivalence and stressing the importance of using large windows for edge-preserving smoothing. Subsequently, it follows that bilateral filtering is a particular choice of weights in the extended diffusion process that is obtained from geometrical considerations. We then show that kernel density estimation applied in the joint spatial–range domain yields a powerful processing paradigm—the mean shift procedure, related to bilateral filtering but having additional flexibility. This establishes an attractive relationship between the theory of statistics and that of diffusion and energy minimization. We experimentally compare the discussed methods and give insights on their performance.
246 citations
TL;DR: In a contour detection task, the Canny operator augmented with the proposed suppression and post-processing step achieves better results than the traditional Canny edge detector and the SUSAN edge detector.
Abstract: We propose a computational step, called surround suppression, to improve detection of object contours and region boundaries in natural scenes. This step is inspired by the mechanism of non-classical receptive field inhibition that is exhibited by most orientation selective neurons in the primary visual cortex and that influences the perception of groups of edges or lines. We illustrate the principle and the effect of surround suppression by adding this step to the Canny edge detector. The resulting operator responds strongly to isolated lines and edges, region boundaries, and object contours, but exhibits a weaker or no response to texture edges. Additionally, we introduce a new post-processing method that further suppresses texture edges. We use natural images with associated subjectively defined desired output contour and boundary maps to evaluate the performance of the proposed additional steps. In a contour detection task, the Canny operator augmented with the proposed suppression and post-processing step achieves better results than the traditional Canny edge detector and the SUSAN edge detector. The performance gain is highest at scales for which these latter operators strongly react to texture in the input image.
238 citations
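The effect of surround suppression can be sketched as subtracting an average of gradient magnitudes taken in an annular surround from each pixel's own magnitude: isolated edges survive, while densely textured edges inhibit each other. This is a minimal box-annulus illustration, not the paper's oriented centre-surround weighting; all names and parameters are invented for the example.

```python
def surround_suppress(mag, alpha=1.0, inner=1, outer=2):
    """Subtract alpha times the mean gradient magnitude in an annular
    surround (Chebyshev distance in (inner, outer]) from each pixel,
    clamping at zero; mag is a 2D list of gradient magnitudes."""
    h, w = len(mag), len(mag[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            ring, n = 0.0, 0
            for dr in range(-outer, outer + 1):
                for dc in range(-outer, outer + 1):
                    if max(abs(dr), abs(dc)) > inner and 0 <= r + dr < h and 0 <= c + dc < w:
                        ring += mag[r + dr][c + dc]
                        n += 1
            out[r][c] = max(0.0, mag[r][c] - alpha * ring / n) if n else mag[r][c]
    return out
```

An isolated edge pixel (empty surround) keeps its full magnitude, whereas in a uniformly textured patch the surround mean equals the centre value and the response is suppressed to zero.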
TL;DR: A new camera-based automatic system is described that utilizes Kalman filtering for tracking and Learning Vector Quantization for classifying the observations into pedestrians and cyclists.
Abstract: Camera-based systems are routinely used for monitoring highway traffic, supplementing the inductive loops and microwave sensors employed for counting purposes. These techniques achieve very good counting accuracy and are capable of discriminating trucks from cars. However, pedestrians and cyclists are mostly counted manually. In this paper, we describe a new camera-based automatic system that utilizes Kalman filtering for tracking and Learning Vector Quantization for classifying the observations into pedestrians and cyclists. Both the requirements for such systems and the algorithms used are described. The tests performed show that the system achieves around 80–90% accuracy in counting and classification.
206 citations
TL;DR: A novel neural classifier, LImited Receptive Area (LIRA), is presented for image recognition; it contains three neuron layers (sensor, associative and output) and shows sufficiently good results in the task of pin–hole position estimation.
Abstract: We have developed a novel neural classifier, LImited Receptive Area (LIRA), for image recognition. The classifier LIRA contains three neuron layers: sensor, associative and output layers. The sensor layer is connected to the associative layer by non-modifiable random connections, and the associative layer is connected to the output layer by trainable connections. The training process converges sufficiently fast. This classifier uses no floating point or multiplication operations. The classifier was tested on two image databases. The first is the MNIST database, which contains 60,000 handwritten digit images for classifier training and 10,000 handwritten digit images for classifier testing. The second database contains 441 images of an assembly microdevice; the problem under investigation is to recognize the position of the pin relative to the hole. A random procedure was used to partition this database into training and testing subsets. There are many results for the MNIST database in the literature; in the best cases, the error rates are 0.7, 0.63 and 0.42%. The classifier LIRA gives an error rate of 0.61% as the mean of three trials. In the pin–hole position estimation task, the classifier LIRA also shows sufficiently good results.
175 citations
TL;DR: A novel algorithm for face detection is developed by combining the Eigenface and SVM methods; it performs almost as fast as the Eigenface method but with significantly improved accuracy.
Abstract: Detecting faces across multiple views is more challenging than in a fixed view, e.g. the frontal view, owing to the significant non-linear variation caused by rotation in depth, self-occlusion and self-shadowing. To address this problem, a novel approach is presented in this paper. The view sphere is separated into several small segments, and a face detector is constructed on each segment. We explicitly estimate the pose of an image regardless of whether or not it is a face, using a pose estimator constructed with Support Vector Regression. The pose information is used to choose the appropriate face detector to determine whether the image is a face. With this pose-estimation based method, considerable computational efficiency is achieved. Meanwhile, the detection accuracy is also improved, since each detector is constructed on a small range of views. We developed a novel algorithm for face detection by combining the Eigenface and SVM methods; it performs almost as fast as the Eigenface method but with significantly improved accuracy. Detailed experimental results are presented in this paper, including tuning the parameters of the pose estimators and face detectors, performance evaluation, and applications to video-based face detection and frontal-view face recognition.
TL;DR: This work describes a new combination of plan-view statistics that better represents the shape of tracked objects and provides a more robust substrate for person detection and tracking than prior plan-view algorithms, and introduces a new method of plan-view person tracking, using adaptive statistical templates and Kalman prediction.
Abstract: As the cost of computing per-pixel depth imagery from stereo cameras in real time has fallen rapidly in recent years, interest in using stereo vision for person tracking has greatly increased. Methods that attempt to track people directly in these ‘camera-view’ depth images are confronted by their substantial amounts of noise and unreliable data. Some recent methods have therefore found it useful to first compute overhead, ‘plan-view’ statistics of the depth data, and then track people in images of these statistics. We describe a new combination of plan-view statistics that better represents the shape of tracked objects and provides a more robust substrate for person detection and tracking than prior plan-view algorithms. We also introduce a new method of plan-view person tracking, using adaptive statistical templates and Kalman prediction. Adaptive templates provide more detailed models of tracked objects than prior choices such as Gaussians, and we illustrate that the typical problems with template-based tracking in camera-view images are easily avoided in a plan-view framework. We compare results of our method with those for techniques using different plan-view statistics or person models, and find our method to exhibit superior tracking through challenging phenomena such as complex inter-person occlusions and close interactions. Reasonable values for most system parameters may be derived from physically measurable quantities such as average person dimensions.
TL;DR: Four different types of corner extractors are analyzed, which have been widely used for a variety of applications, and corner stability and corner localization properties are used as measures to evaluate the quality of the features extracted by the four detectors.
Abstract: In this paper we assess the performance of a variety of corner (point) detecting algorithms for feature tracking applications. We analyze four different types of corner extractors, which have been widely used for a variety of applications (they are described later in the paper). We use corner stability and corner localization properties as measures to evaluate the quality of the features extracted by the four detectors. For effective assessment of the corner detectors, first, we employed image sequences with no motion (simply static image sequences), so that the appearance and disappearance of corners in each frame is purely due to image plane noise and illumination conditions. The second stage included experiments on sequences with small motion. The experiments were devised to make the testing environment ideal to analyze the stability and localization properties of the corners extracted. The corners detected from the initial frame are then matched through the sequence using a corner matching strategy. We employed two different types of matchers, namely the GVM (Gradient Vector Matcher) and the Product Moment Coefficient Matcher (PMCM). Each of the corner detectors was tested with each of the matching algorithms to evaluate their performance in tracking (matching) the features. The experiments were carried out on a variety of image sequences with and without motion.
TL;DR: An image tracking system and its applications for traffic monitoring and accident detection at road intersections using the active contour model approach and a contour initialization method based on the concept of contour growing are presented.
Abstract: This paper presents an image tracking system and its applications for traffic monitoring and accident detection at road intersections. Locations of motorcycles as well as automobiles are obtained in real time using the active contour model approach. Image measurement is further incorporated with Kalman filtering techniques to track individual vehicle motion. To initialize image tracking of vehicles at a junction, we propose a contour initialization method based on the concept of contour growing. Using a specially designed circuit board, a stand-alone image tracker has been designed and created for automatic traffic monitoring. We successfully achieved real-time image tracking of multi-lane vehicles. Interesting experimental results are presented to demonstrate the effectiveness of the proposed system.
TL;DR: It is shown that more can be achieved than simply combining the sensor data within a statistical filter: besides using inertial data to provide predictions for theVisual sensor, this data can be used to dynamically tune the parameters of each feature detector in the visual sensor.
Abstract: This paper presents a novel method for increasing the robustness of visual tracking systems by incorporating information from inertial sensors. We show that more can be achieved than simply combining the sensor data within a statistical filter: besides using inertial data to provide predictions for the visual sensor, this data can be used to dynamically tune the parameters of each feature detector in the visual sensor. This allows the visual sensor to provide useful information even in the presence of substantial motion blur. Finally, the visual sensor can be used to calibrate the parameters of the inertial sensor to eliminate drift.
TL;DR: This paper proposes similarity measures based on neighbourhoods, so that the relevant structures of the images are observed better; 13 new similarity measures are found to be appropriate for the comparison of images.
Abstract: Fuzzy techniques can be applied in several domains of image processing. In this paper, we show how notions of fuzzy set theory are used in establishing measures for image comparison. Objective quality measures, or measures of comparison, are of great importance in the field of image processing. These measures serve as a tool to evaluate and to compare different algorithms designed to solve problems such as noise reduction, deblurring and compression, and consequently as a basis on which one algorithm is preferred to another. It is well known that classical quality measures, such as the MSE (mean square error) or the PSNR (peak signal-to-noise ratio), do not always correspond to visual observations. Therefore, several researchers are (and have been) looking for new quality measures, better adapted to human perception. Van der Weken et al. [Proceedings of ICASSP'2002, Orlando, 2002] gave an overview of similarity measures, originally introduced to express the degree of comparison between two fuzzy sets, which can be applied to images. These similarity measures are all pixel-based, and therefore do not always give satisfactory results. To cope with this drawback, we propose similarity measures based on neighbourhoods, so that the relevant structures of the images are observed better. In this way, 13 new similarity measures were found to be appropriate for the comparison of images.
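One minimal way to make a pixel-based fuzzy similarity measure neighbourhood-aware is to compare local means instead of raw pixels, so that structure rather than isolated noise drives the score. This sketch is an illustration of the principle only, not one of the paper's 13 measures; all names are invented for the example.

```python
def neighbourhood_similarity(a, b, radius=1):
    """Fuzzy-set style similarity on local neighbourhood means rather
    than raw pixels; a and b are 2D lists with intensities in [0, 1].
    Returns 1 for identical images, 0 for maximally different ones."""
    def local_mean(img, r, c):
        h, w = len(img), len(img[0])
        vals = [img[i][j]
                for i in range(max(0, r - radius), min(h, r + radius + 1))
                for j in range(max(0, c - radius), min(w, c + radius + 1))]
        return sum(vals) / len(vals)

    h, w = len(a), len(a[0])
    diffs = [abs(local_mean(a, r, c) - local_mean(b, r, c))
             for r in range(h) for c in range(w)]
    return 1.0 - sum(diffs) / len(diffs)
```

Because each comparison averages over a window, a single flipped noise pixel changes the score far less than it would in a purely pixel-based measure.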
TL;DR: This work proposes new algorithms to extract and track the positions of eyes in a real-time video stream using a template of ‘Between-the-Eyes,’ which is updated frame-by-frame, instead of the eyes themselves.
Abstract: A head-off gaze-camera needs eye location information for head-free usage. For this purpose, we propose new algorithms to extract and track the positions of eyes in a real-time video stream. For extraction of eye positions, we detect blinks based on the differences between successive images. However, eyelid regions are fairly small. To distinguish them from dominant head movement, we elaborate a head movement cancellation process. For eye-position tracking, we use a template of ‘Between-the-Eyes,’ which is updated frame-by-frame, instead of the eyes themselves. Eyes are searched based on the current position of ‘Between-the-Eyes’ and their geometrical relations to the position in the previous frame. The ‘Between-the-Eyes’ pattern is easier to locate accurately than eye patterns. We implemented the system on a PC with a Pentium III 866-MHz CPU. The system runs at 30 frames/s and robustly detects and tracks the eyes.
TL;DR: This paper presents the vision-based technology which allows one in such a setup to significantly enhance the perceptual power of the computer and provides a complete solution for building intelligent hands-free input devices.
Abstract: Due to the recent increase in computer power and decrease in camera cost, it has become very common to see a camera on top of a computer monitor. This paper presents vision-based technology that allows one, in such a setup, to significantly enhance the perceptual power of the computer. The described techniques for tracking a face using a convex-shape nose feature, as well as for face tracking with two off-the-shelf cameras, allow one to track faces robustly and precisely in both 2D and 3D with low-resolution cameras. Supplemented by a mechanism for detecting multiple eye blinks, this technology provides a complete solution for building intelligent hands-free input devices. The theory behind the technology is presented, and results from running several perceptual user interfaces built with this technology are shown.
TL;DR: A new contour tracker is presented based on the unscented Kalman filter, which is superior to the extended Kalman filter both in theory and in many practical situations; it employs a more accurate nonlinear measurement model, without computation of a Jacobian matrix.
Abstract: Visual contour tracking in a complex background is a difficult task. The measurement model is often nonlinear due to clutter in images. Traditional visual trackers based on Kalman filters employ simple linear measurement models and often collapse during the tracking process. This paper presents a new contour tracker based on the unscented Kalman filter, which is superior to the extended Kalman filter both in theory and in many practical situations. The new tracker employs a more accurate nonlinear measurement model, without computation of a Jacobian matrix. During each time step, the tracker makes multiple measurements at a set of appropriately chosen sample points, thus obtaining the best observation according to the measurement density. The resulting algorithm obtains a more exact estimate of the state of the system, while having the same order of complexity as an extended Kalman filter. The experiments show that the new algorithm is superior to those based on Kalman filters.
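The key ingredient, the unscented transform, can be sketched in the scalar case: deterministically chosen sigma points are pushed through the nonlinearity, and the transformed mean and variance are recovered from weighted sums, with no Jacobian. The parameter name `kappa` follows the common convention; this is a one-dimensional illustration, not the paper's full contour tracker.

```python
import math

def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Scalar unscented transform: propagate sigma points through a
    nonlinearity f and recover the mean and variance of f(x), avoiding
    the Jacobian linearization an EKF would need."""
    n = 1                                   # state dimension
    spread = math.sqrt((n + kappa) * var)
    sigmas = [mean, mean + spread, mean - spread]
    w0 = kappa / (n + kappa)                # weight of the central point
    wi = 1.0 / (2 * (n + kappa))            # weight of each outer point
    weights = [w0, wi, wi]
    ys = [f(s) for s in sigmas]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var
```

For an affine f the transform is exact: with f(x) = 2x + 1, a zero-mean unit-variance input yields mean 1 and variance 4, as the closed form predicts.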
TL;DR: A specific verification technology making use of hand-based features is proposed, in which the positive Boolean function (PBF) and the bootstrapping method integrate multi-resolution palmprint results to verify the identity of samples possessing confusable hand shapes.
Abstract: Biometrics-based verification is an effective approach to personal authentication using biological features extracted from the individual. In this paper, we propose a specific verification technology making use of hand-based features. Two hand-based features, the hand geometry and the palmprint, are grabbed simultaneously by CCD camera-based devices. Basically, the geometrical features of the hands are used to roughly verify the identity; samples possessing confusable hand shapes are then re-checked using the palmprint features. First, the crucial points and the region of interest (ROI) of the palmprint are determined in the preprocessing stage. Hand shape features of length 11 are computed from these detected points. Next, multi-resolution palmprint features are extracted from the ROI and the three middle fingers; in this way, reference vectors are obtained for computing similarity values at various resolutions. In addition, the verification results at multiple resolutions are integrated using the positive Boolean function (PBF) and the bootstrapping method to achieve better performance. Experimental results show the effectiveness of the proposed approaches.
TL;DR: The neural-edge-based vehicle detection method is effective and the correct rate of vehicle detection is higher than 96%, independent of environmental conditions.
Abstract: Vehicle detection is a fundamental component of an image-based traffic monitoring system. In this paper, we propose a neural-edge-based vehicle detection method to improve the accuracy of vehicle detection and classification. In this method, the feature information is extracted by a seed-filling-based method and presented to the input of a neural network for vehicle detection and classification. The neural-edge-based vehicle detection method is effective, and the correct rate of vehicle detection is higher than 96%, independent of environmental conditions. Also, traffic parameters, such as vehicle count, vehicle class, and vehicle speed, are extracted via a vehicle tracking method.
TL;DR: This paper describes a system for recognition of various human actions from compressed video based on motion history information, and introduces the notion of quantifying the motion involved, through what is called Motion Flow History (MFH).
Abstract: Human motion analysis is a recent topic of interest among the computer vision and video processing community. Research in this area is motivated by its wide range of applications, such as surveillance and monitoring systems. In this paper we describe a system for recognition of various human actions from compressed video based on motion history information. We introduce the notion of quantifying the motion involved, through what we call Motion Flow History (MFH). The encoded motion information readily available in the compressed MPEG stream is used to construct the coarse Motion History Image (MHI) and the corresponding MFH. The features extracted from the static MHI and MFH compactly characterize the spatio-temporal and motion vector information of the action. Since the features are extracted from the partially decoded sparse motion data, the computational load is minimized to a great extent. The extracted features are used to train KNN, neural network, SVM and Bayes classifiers for recognizing a set of seven human actions. The performance of each feature set with respect to the various classifiers is analyzed.
TL;DR: A novel method of image based fingerprint matching based on the features extracted from the integrated Wavelet and the Fourier–Mellin Transform framework is proposed to remedy problems of minutiae-based and image-based fingerprint authentication.
Abstract: Today, minutiae-based and image-based methods are the two major approaches to fingerprint authentication. The image-based approach offers much higher computational efficiency with minimal pre-processing, and proves effective even when the image quality is too low to allow reliable minutia extraction. However, this approach is vulnerable to shape distortion as well as variation in position, scale and orientation angle. In this paper, a novel method of image-based fingerprint matching, based on features extracted from an integrated Wavelet and Fourier–Mellin Transform (WFMT) framework, is proposed to remedy these problems. The wavelet transform, with its energy compaction property, is used to preserve local edges and reduce noise in the low-frequency domain after image decomposition, making the fingerprint images less sensitive to shape distortion. The Fourier–Mellin transform (FMT) serves to produce a translation-, rotation- and scale-invariant feature. Multiple WFMT features can be combined into a reference invariant feature through the linearity property of the FMT, reducing the variability of the input fingerprint images. Based on this integrated framework, a fingerprint verification system is designed. The experiments show that equal error rates of 5.66 and 1.01% are achieved with single and multiple WFMT features, respectively.
TL;DR: In this article, the authors explore using stereo performance on two different images from a single view as a confidence measure for a binocular stereo system incorporating that single view, and explore the performance characteristics of each metric under a variety of conditions.
Abstract: Although stereo vision research has progressed remarkably, stereo systems still need a fast, accurate way to estimate confidence in their output. In the current paper, we explore using stereo performance on two different images from a single view as a confidence measure for a binocular stereo system incorporating that single view. Although it seems counterintuitive to search for correspondence in two different images taken in quick succession from the same view, such a search gives us precise quantitative performance data. Correspondences significantly far from the same location are erroneous because there is little to no displacement between the two images. Using hand-generated ground truth, we quantitatively compare this new confidence metric with five commonly used confidence metrics. We explore the performance characteristics of each metric under a variety of conditions.
TL;DR: A kinematics-based approach to recovering the motion parameters of walking people from monocular video sequences using robust image matching is proposed, together with a hierarchical search strategy designed around the tree-like structure of the human body model.
Abstract: Human tracking is currently one of the most active research topics in computer vision. This paper proposes a kinematics-based approach to recovering motion parameters of people walking from monocular video sequences using robust image matching and hierarchical search. Tracking a human with unconstrained movements in monocular image sequences is extremely challenging. To reduce the search space, we design a hierarchical search strategy in a divide-and-conquer fashion according to the tree-like structure of the human body model. A kinematics-based algorithm is then proposed to recursively refine the joint angles. To measure the matching error, we present a pose evaluation function combining both boundary and region information. We also address initialization by matching the first frame to six key poses acquired by clustering; the pose with the minimal matching error is chosen as the initial pose. Experimental results in both indoor and outdoor scenes demonstrate that our approach performs well.
TL;DR: This work shows that the sign of the difference between two pixel measurements is maintained across global illumination changes and uses this result along with a statistical model for the camera noise to develop a change detection algorithm that deals with sudden changes in illumination.
Abstract: Effective change detection under dynamic illumination conditions is an active research topic. Most research has concentrated on adaptive statistical representations for the appearance of the background scene. There is limited work that develops the statistical models for background representation by taking into account an explicit model for the camera response function, the camera noise model, and illumination priors. Assuming a monotone but non-linear camera response function, a Phong shading model for the surface material, and a locally constant but spatially varying illumination, we show that the sign of the difference between two pixel measurements is maintained across global illumination changes. We use this result along with a statistical model for the camera noise to develop a change detection algorithm that deals with sudden changes in illumination. The performance evaluation of the algorithm is done through simulations and on real data.
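The paper's core observation, that the ordering of two pixel measurements survives a monotone global illumination change, can be sketched as a noise-gated sign test on neighbour differences (a hypothetical illustration, not the authors' algorithm; `sigma` and `k` are assumed camera-noise parameters):

```python
import numpy as np

def sign_change_mask(background, frame, sigma=2.0, k=2.0):
    """Flag locations where the sign of the horizontal neighbour difference
    flips between background and current frame. Differences smaller than
    k*sigma*sqrt(2) (noise std of a difference of two pixels) are ignored,
    since their sign is unreliable under camera noise."""
    db = np.diff(background.astype(float), axis=1)
    df = np.diff(frame.astype(float), axis=1)
    margin = k * sigma * np.sqrt(2.0)
    reliable = (np.abs(db) > margin) & (np.abs(df) > margin)
    flipped = np.sign(db) != np.sign(df)
    return reliable & flipped   # True where a genuine scene change is suspected
```

A global gain change multiplies all differences by a positive factor and so flips no signs, whereas an object overwriting part of the scene typically does.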
TL;DR: A correspondence is established between the two common representations of filter outputs, textons and binned histograms, and it is shown that two classification methodologies, nearest neighbour matching and Bayesian classification, are equivalent for particular choices of the distance measure.
Abstract: The objective of this paper is to examine statistical approaches to the classification of textured materials from a single image obtained under unknown viewpoint and illumination. The approaches investigated here are based on the joint probability distribution of filter responses. We review previous work based on this formulation and make two observations. First, we show that there is a correspondence between the two common representations of filter outputs—textons and binned histograms. Second, we show that two classification methodologies, nearest neighbour matching and Bayesian classification, are equivalent for particular choices of the distance measure. We describe the pros and cons of these alternative representations and distance measures, and illustrate the discussion by classifying all the materials in the Columbia-Utrecht (CUReT) texture database. These equivalences allow us to perform direct comparisons between the texton frequency matching framework, best exemplified by the classifiers of Leung and Malik [Int. J. Comput. Vis. 43 (2001) 29], Cula and Dana [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2001) 1041], and Varma and Zisserman [Proceedings of the Seventh European Conference on Computer Vision 3 (2002) 255], and the Bayesian framework most closely represented by the work of Konishi and Yuille [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2000) 125].
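The nearest-neighbour side of the comparison reduces to matching binned filter-response histograms under a distance measure; a minimal sketch using the chi-square distance (illustrative only; chi-square is one common choice in this literature, not necessarily the exact measure used by each classifier cited above):

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def nn_classify(query, model_hists, model_labels):
    """Nearest-neighbour texture classification: assign the label of the
    stored model histogram closest to the query histogram."""
    dists = [chi_square(query, m) for m in model_hists]
    return model_labels[int(np.argmin(dists))]
```

The equivalence discussed in the paper hinges on how such a distance relates to the likelihood used by a Bayesian classifier over the same binned representation.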
TL;DR: A method for registering pairs of digital images of the retina is presented, using a small set of intrinsic control points whose matching is not known; bilinear and second-order polynomial transformation models both prove appropriate for the final registration transform.
Abstract: A method for registering pairs of digital images of the retina is presented, using a small set of intrinsic control points whose matching is not known. Control point matching is then achieved by calculating similarity transformation (ST) coefficients for all possible combinations of control point pairs. The cluster of coefficients associated with the matched control point pairs is identified by calculating the Euclidean distance between each set of ST coefficients and its Rth nearest neighbour, followed by use of the Expectation–Maximization (EM) algorithm. Registration is then achieved using linear regression to optimize similarity, bilinear or second-order polynomial transformations for the matching control point pairs. Results are presented of (a) the cross-modal image registration of an optical image and a fluorescein angiogram, (b) temporal registration of two images of an infant eye, and (c) mono-modal registration of a set of seven standard field optical photographs. For cross-modal registration, using a set of independent matched control points, points are mapped with an estimated accuracy of 2.9 pixels for 575 × 480 pixel images. Bilinear and second-order polynomial transformation models both prove to be appropriate for the final registration transform.
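Calculating ST coefficients for a combination of control-point pairs amounts to solving a small linear system; a sketch under the usual 4-parameter similarity model x' = a*x - b*y + tx, y' = b*x + a*y + ty (the function name and parameterization are illustrative assumptions, not the paper's code):

```python
import numpy as np

def st_from_two_pairs(p1, p2, q1, q2):
    """Solve for similarity-transform coefficients (a, b, tx, ty) mapping
    p1 -> q1 and p2 -> q2, i.e. two control-point correspondences.
    Each correspondence contributes two linear equations in (a, b, tx, ty)."""
    A = np.array([
        [p1[0], -p1[1], 1.0, 0.0],
        [p1[1],  p1[0], 0.0, 1.0],
        [p2[0], -p2[1], 1.0, 0.0],
        [p2[1],  p2[0], 0.0, 1.0],
    ])
    rhs = np.array([q1[0], q1[1], q2[0], q2[1]], dtype=float)
    return np.linalg.solve(A, rhs)   # [a, b, tx, ty]
```

Running this over all candidate pairings and clustering the resulting coefficient vectors is what isolates the consistent (matched) combinations.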
TL;DR: A frame-rate, low-power, omni-directional tracking system (LOTS) is described; a novel system component, quasi-connected-components (QCC), combines gap filling, thresholding-with-hysteresis (TWH), and a novel region merging/cleaning approach.
Abstract: Perimeter security generally requires watching areas that afford trespassers reasonable cover and concealment. By definition, such 'interesting' areas have limited visibility distance. Furthermore, targets of interest generally attempt to conceal themselves within the cover, sometimes adding camouflage to further reduce their visibility, and are often visible only while in motion. The combined effect of limited visibility distance and low target visibility severely reduces the usefulness of any approach using a standard Pan/Tilt/Zoom (PTZ) camera. As a result, these situations call for a very sensitive system with a wide field of view, and are a natural application for Omni-directional Video Surveillance and Monitoring. This paper describes a frame-rate, low-power, omni-directional tracking system (LOTS). The paper discusses related background work, including resolution issues in omni-directional imaging. A novel system component is quasi-connected-components (QCC), which combines gap filling, thresholding-with-hysteresis (TWH) and a novel region merging/cleaning approach. Multi-background modeling and dynamic thresholding make the system well suited to difficult situations such as outdoor tracking in high clutter. The paper also describes target geolocation and issues in the system user interface. The single-viewpoint property of the omni-directional imaging system used simplifies backprojection and unwarping. We end with a summary of an external evaluation of an early form of the system and comments about recent work and field tests.
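The TWH ingredient of QCC can be sketched as a flood fill from strong pixels through weak ones (a hypothetical illustration of hysteresis thresholding only; the paper's QCC additionally performs gap filling and region merging/cleaning):

```python
import numpy as np
from collections import deque

def hysteresis_mask(diff, low, high):
    """Thresholding-with-hysteresis on a difference image: keep pixels above
    `low` only if they are 4-connected to at least one pixel above `high`.
    Weak responses thus survive when supported by a strong detection."""
    weak = diff >= low
    out = np.zeros_like(weak)
    queue = deque(zip(*np.nonzero(diff >= high)))   # seeds: strong pixels
    while queue:
        y, x = queue.popleft()
        if out[y, x] or not weak[y, x]:
            continue
        out[y, x] = True
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < weak.shape[0] and 0 <= nx < weak.shape[1]
                    and weak[ny, nx] and not out[ny, nx]):
                queue.append((ny, nx))
    return out
```

Isolated weak responses (noise) are suppressed, while weak pixels attached to a strong detection are retained, which is what makes the scheme sensitive without being noisy.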
TL;DR: A robust and accurate system for 3D reconstruction of real objects with high-resolution shape and texture is presented, together with a texture-mapping strategy based on surface particles that adequately addresses photography-related problems such as inhomogeneous lighting, highlights and occlusion.
Abstract: We present a robust and accurate system for 3D reconstruction of real objects with high-resolution shape and texture. Our reconstruction method is passive; the only information needed is 2D images obtained with a calibrated camera from different view angles as the object rotates on a turntable. The triangle surface model is obtained by a scheme combining octree construction and the marching cubes algorithm, adapted to the shape-from-silhouette problem. We develop a texture-mapping strategy based on surface particles to adequately address photography-related problems such as inhomogeneous lighting, highlights and occlusion. Reconstruction results are included to demonstrate the attained quality.
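The shape-from-silhouette test at the heart of the octree construction (keep a cell only if its projection falls inside every silhouette) can be sketched over a flat voxel list (illustrative only; the paper uses a hierarchical octree, and `carve` and the projection callables are assumptions):

```python
import itertools
import numpy as np

def carve(voxels_xyz, silhouettes, projections):
    """Shape from silhouette: keep only voxels whose projection lands inside
    every binary silhouette. `projections[i]` maps an (N, 3) array of voxel
    centres to (N, 2) integer pixel coordinates (u, v) for view i."""
    keep = np.ones(len(voxels_xyz), dtype=bool)
    for sil, proj in zip(silhouettes, projections):
        uv = proj(voxels_xyz)
        inside = ((uv[:, 0] >= 0) & (uv[:, 0] < sil.shape[1]) &
                  (uv[:, 1] >= 0) & (uv[:, 1] < sil.shape[0]))
        hit = np.zeros(len(voxels_xyz), dtype=bool)
        hit[inside] = sil[uv[inside, 1], uv[inside, 0]]   # index as (row=v, col=u)
        keep &= hit   # a voxel outside any silhouette is carved away
    return keep
```

With orthographic projections this reproduces the visual hull of the silhouettes; an octree applies the same test coarse-to-fine instead of per voxel.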