Philip F. McLauchlan
Other affiliations: University of California, Berkeley
Bio: Philip F. McLauchlan is an academic researcher from the University of Oxford. The author has contributed to research in topics: Machine vision & Active vision. The author has an h-index of 16, co-authored 28 publications receiving 1986 citations. Previous affiliations of Philip F. McLauchlan include University of California, Berkeley.
TL;DR: This paper describes the issues associated with feature-based tracking and presents the real-time implementation of a prototype system and its performance on a large data set.
Abstract: Increasing congestion on freeways and problems associated with existing detectors have spawned an interest in new vehicle detection technologies such as video image processing. Existing commercial image processing systems work well in free-flowing traffic, but the systems have difficulties with congestion, shadows and lighting transitions. These problems stem from vehicles partially occluding one another and the fact that vehicles appear differently under various lighting conditions. We are developing a feature-based tracking system for detecting vehicles under these challenging conditions. Instead of tracking entire vehicles, vehicle features are tracked to make the system robust to partial occlusion. The system is fully functional under changing lighting conditions because the most salient features at the given moment are tracked. After the features exit the tracking region, they are grouped into discrete vehicles using a common motion constraint. The groups represent individual vehicle trajectories which can be used to measure traditional traffic parameters as well as new metrics suitable for improved automated surveillance. This paper describes the issues associated with feature-based tracking and presents the real-time implementation of a prototype system and its performance on a large data set.
17 Jun 1997
TL;DR: This paper describes the feature-based tracking approach for the task of tracking vehicles under congestion, a real-time implementation using a network of DSP chips, and experiments of the system on approximately 44 lane hours of video data.
Abstract: For the problem of tracking vehicles on freeways using machine vision, existing systems work well in free-flowing traffic. Traffic engineers, however, are more interested in monitoring freeways when there is congestion, and current systems break down for congested traffic due to the problem of partial occlusion. We are developing a feature-based tracking approach for the task of tracking vehicles under congestion. Instead of tracking entire vehicles, vehicle sub-features are tracked to make the system robust to partial occlusion. In order to group together sub-features that come from the same vehicle, the constraint of common motion is used. In this paper we describe the system, a real-time implementation using a network of DSP chips, and experiments of the system on approximately 44 lane hours of video data.
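The common-motion constraint mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_by_common_motion`, the `max_spread` threshold, and the use of union-find to merge features transitively are all assumptions made for illustration. The idea shown is only that two features are grouped when their relative offset stays nearly constant over the frames in which they are tracked.

```python
from itertools import combinations


def group_by_common_motion(tracks, max_spread=2.0):
    """Group feature tracks whose relative offset stays nearly constant.

    tracks: dict mapping feature id -> list of (x, y) positions, one per
            frame, all of equal length (a simplifying assumption).
    max_spread: hypothetical threshold, in pixels, on how much the offset
            between two features may vary over the whole track.
    """
    ids = list(tracks)
    parent = {i: i for i in ids}  # union-find forest over feature ids

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(ids, 2):
        dx = [pa[0] - pb[0] for pa, pb in zip(tracks[a], tracks[b])]
        dy = [pa[1] - pb[1] for pa, pb in zip(tracks[a], tracks[b])]
        # Features that move rigidly together keep a near-constant offset.
        if max(dx) - min(dx) <= max_spread and max(dy) - min(dy) <= max_spread:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Two features on one vehicle ("a", "b") and one on a slower vehicle ("c").
example_tracks = {
    "a": [(0, 0), (5, 0), (10, 0)],
    "b": [(2, 1), (7, 1), (12, 1)],
    "c": [(0, 10), (2, 10), (4, 10)],
}
vehicle_groups = group_by_common_motion(example_tracks)
```

Because the merge is transitive, a chain of pairwise-consistent features ends up in one group; a real system would likely need a stricter rule to avoid merging adjacent vehicles travelling at the same speed.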
20 Jun 1995
TL;DR: A statistical framework is proposed that enables 3D structure and motion to be computed optimally from an image sequence, on the assumption that feature measurement errors are independent and Gaussian distributed.
Abstract: The paper proposes a statistical framework that enables 3D structure and motion to be computed optimally from an image sequence, on the assumption that feature measurement errors are independent and Gaussian distributed. The analysis and results demonstrate that computing both camera/scene motion and 3D structure is essential to computing either with any accuracy. Having computed optimal estimates of structure and motion over a small number of initial images, a recursive version of the algorithm (previously reported) recomputes suboptimal estimates given new image data. The algorithm is designed explicitly for real-time implementation, and the complexity is proportional to the number of tracked features. 3D projective, affine and Euclidean models of structure and motion recovery have been implemented, incorporating both point and line features into the computation. The framework can handle any feature type and camera model that may be encapsulated as a projection equation from scene to image.
TL;DR: In this article, the authors describe the design, implementation and testing of a high speed controlled stereo "head/eye" platform which facilitates the rapid redirection of gaze in response to visual input.
Abstract: This paper describes the design, implementation and testing of a high speed controlled stereo “head/eye” platform which facilitates the rapid redirection of gaze in response to visual input. It details the mechanical device, which is based around geared DC motors, and describes hardware aspects of the controller and vision system, which are implemented on a reconfigurable network of general purpose parallel processors. The servo-controller is described in detail and higher level gaze and vision constructs outlined. The paper gives performance figures gained both from mechanical tests on the platform alone, and from closed loop tests on the entire system using visual feedback from a feature detector.
TL;DR: A methodology for, and real-time demonstrations of, the use of motion detection and segmentation processes to initiate “capture saccades” towards a moving object, and demonstrates in repeated trials that the transition from saccadic motion to tracking is more likely to succeed using position and velocity control, than when using position alone.
Abstract: Within the context of active vision, scant attention has been paid to the execution of motion saccades—rapid re-adjustments of the direction of gaze to attend to moving objects. In this paper we first develop a methodology for, and give real-time demonstrations of, the use of motion detection and segmentation processes to initiate “capture saccades” towards a moving object. The saccade is driven by both position and velocity of the moving target under the assumption of constant target velocity, using prediction to overcome the delay introduced by visual processing. We next demonstrate the use of a first order approximation to the segmented motion field to compute bounds on the time-to-contact in the presence of looming motion. If the bound falls below a safe limit, a “panic saccade” is fired, moving the camera away from the approaching object. We then describe the use of image motion to realize smooth pursuit, tracking using velocity information alone, where the camera is moved so as to null a single constant image motion fitted within a central image region. Finally, we glue together capture saccades with smooth pursuit, thus effecting changes in both what is being attended to and how it is being attended to. To couple the different visual activities of waiting, saccading, pursuing and panicking, we use a finite state machine which provides inherent robustness outside of visual processing and provides a means of making repeated exploration. We demonstrate in repeated trials that the transition from saccadic motion to tracking is more likely to succeed using position and velocity control, than when using position alone.
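The finite state machine that couples waiting, saccading, pursuing and panicking can be sketched as below. The four state names come from the abstract; the transition rules, the `next_state` signature, and the `safe_ttc` limit are assumptions chosen for illustration, not the transitions used in the paper.

```python
from enum import Enum, auto


class State(Enum):
    WAITING = auto()
    SACCADING = auto()
    PURSUING = auto()
    PANICKING = auto()


def next_state(state, motion_detected, target_in_fovea, ttc_bound, safe_ttc=1.0):
    """Return the next gaze-control state.

    motion_detected: True if the motion segmentation found a moving object.
    target_in_fovea: True if the target lies in the central image region.
    ttc_bound: bound on time-to-contact in seconds, or None if no looming
               motion was measured.
    safe_ttc: hypothetical safety limit below which a panic saccade fires.
    """
    # A looming object overrides every other activity.
    if ttc_bound is not None and ttc_bound < safe_ttc:
        return State.PANICKING
    if state is State.PANICKING:
        # Panic saccade finished and no threat remains: resume watching.
        return State.WAITING
    if state is State.WAITING:
        return State.SACCADING if motion_detected else State.WAITING
    if state is State.SACCADING:
        # A capture saccade hands over to smooth pursuit only if it
        # actually brought the target into the central region.
        return State.PURSUING if target_in_fovea else State.WAITING
    # PURSUING: keep tracking while the target stays centred.
    return State.PURSUING if target_in_fovea else State.WAITING


# Walk through one capture, pursuit, panic and recovery cycle.
trace = [State.WAITING]
for inputs in [(True, False, None), (True, True, None),
               (True, True, 0.5), (False, False, None)]:
    trace.append(next_state(trace[-1], *inputs))
```

Falling back to WAITING whenever a saccade or pursuit loses the target is one simple way to get the repeated exploration the abstract mentions, since the machine recovers without relying on the visual processing itself.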
TL;DR: The first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera, achieving real-time but drift-free performance inaccessible to structure from motion approaches, is presented.
Abstract: We present a real-time algorithm which can recover the 3D trajectory of a monocular camera, moving rapidly through a previously unknown scene. Our system, which we dub MonoSLAM, is the first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera, achieving real-time but drift-free performance inaccessible to structure from motion approaches. The core of the approach is the online creation of a sparse but persistent map of natural landmarks within a probabilistic framework. Our key novel contributions include an active approach to mapping and measurement, the use of a general motion model for smooth camera movement, and solutions for monocular feature initialization and feature orientation estimation. Together, these add up to an extremely efficient and robust algorithm which runs at 30 Hz with standard PC and camera hardware. This work extends the range of robotic systems in which SLAM can be usefully applied, but also opens up new areas. We present applications of MonoSLAM to real-time 3D localization and mapping for a high-performance full-size humanoid robot and live augmented reality with a hand-held camera.
TL;DR: This paper focuses on motion tracking and shows how one can use observed motion to learn patterns of activity in a site and create a hierarchical binary-tree classification of the representations within a sequence.
Abstract: Our goal is to develop a visual monitoring system that passively observes moving objects in a site and learns patterns of activity from those observations. For extended sites, the system will require multiple cameras. Thus, key elements of the system are motion tracking, camera coordination, activity classification, and event detection. In this paper, we focus on motion tracking and show how one can use observed motion to learn patterns of activity in a site. Motion segmentation is based on an adaptive background subtraction method that models each pixel as a mixture of Gaussians and uses an online approximation to update the model. The Gaussian distributions are then evaluated to determine which are most likely to result from a background process. This yields a stable, real-time outdoor tracker that reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. While a tracking system is unaware of the identity of any object it tracks, the identity remains the same for the entire tracking sequence. Our system leverages this information by accumulating joint co-occurrences of the representations within a sequence. These joint co-occurrence statistics are then used to create a hierarchical binary-tree classification of the representations. This method is useful for classifying sequences, as well as individual instances of activities in a site.
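The per-pixel mixture-of-Gaussians background model can be illustrated for a single grayscale pixel. This is a simplified sketch, not the paper's algorithm: the class name `PixelMixture` and all default parameter values are assumptions, the learning rate for the matched component is taken to be a constant `alpha` rather than scaled by the component likelihood, and a variance floor is added to keep the toy example stable.

```python
import math


class PixelMixture:
    """Online mixture of Gaussians for one grayscale pixel (simplified)."""

    def __init__(self, k=3, alpha=0.05, init_var=36.0, min_var=4.0,
                 match_sigmas=2.5, bg_weight=0.7):
        self.k = k                        # maximum number of components
        self.alpha = alpha                # learning rate (assumed constant)
        self.init_var = init_var          # variance given to new components
        self.min_var = min_var            # floor so variance never collapses
        self.match_sigmas = match_sigmas  # match window in standard deviations
        self.bg_weight = bg_weight        # weight mass treated as background
        self.components = []              # each entry: [weight, mean, variance]

    def _rank(self):
        # Heavy, narrow components are the most background-like.
        self.components.sort(key=lambda c: c[0] / math.sqrt(c[2]), reverse=True)

    def update(self, x):
        """Absorb observation x; return True if x is classed as background."""
        self._rank()
        matched = None
        for c in self.components:
            if abs(x - c[1]) <= self.match_sigmas * math.sqrt(c[2]):
                matched = c
                break
        if matched is None:
            # No component explains x: add one, or replace the least likely.
            new = [self.alpha, float(x), self.init_var]
            if len(self.components) < self.k:
                self.components.append(new)
            else:
                self.components[-1] = new
        # Decay every weight and reinforce only the matched component.
        for c in self.components:
            hit = 1.0 if c is matched else 0.0
            c[0] = (1.0 - self.alpha) * c[0] + self.alpha * hit
        total = sum(c[0] for c in self.components)
        for c in self.components:
            c[0] /= total
        if matched is None:
            return False
        # Move the matched component's mean and variance towards x.
        matched[1] += self.alpha * (x - matched[1])
        d = x - matched[1]
        matched[2] = max(self.min_var,
                         (1.0 - self.alpha) * matched[2] + self.alpha * d * d)
        # Background = top-ranked components up to the bg_weight threshold.
        self._rank()
        cumulative = 0.0
        for c in self.components:
            cumulative += c[0]
            if c is matched:
                return True
            if cumulative > self.bg_weight:
                break
        return False


pixel = PixelMixture()
history = [pixel.update(100) for _ in range(30)]  # static scene
object_passes = pixel.update(200)                 # sudden bright object
scene_returns = pixel.update(100)                 # background visible again
```

In this run the first observation is foreground because no model exists yet, the steady value is then absorbed as background, and a single outlier is flagged as foreground without displacing the established background component.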
21 Sep 1999
TL;DR: A survey of the theory and methods of photogrammetric bundle adjustment can be found in this article, with a focus on general robust cost functions rather than restricting attention to traditional nonlinear least squares.
Abstract: This paper is a survey of the theory and methods of photogrammetric bundle adjustment, aimed at potential implementors in the computer vision community. Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal structure and viewing parameter estimates. Topics covered include: the choice of cost function and robustness; numerical optimization including sparse Newton methods, linearly convergent approximations, updating and recursive methods; gauge (datum) invariance; and quality control. The theory is developed for general robust cost functions rather than restricting attention to traditional nonlinear least squares.
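For orientation, the robust cost that bundle adjustment minimizes can be written in a generic form. The notation here is an assumption chosen for illustration and is not taken from the survey: $P_i$ are the viewing parameters of image $i$, $X_j$ the 3D features, $x_{ij}$ the observed image measurement of feature $j$ in image $i$, $\pi$ the projection equation, $\Sigma_{ij}$ the measurement covariance, $\mathcal{V}$ the set of visible (image, feature) pairs, and $\rho$ a robust kernel.

```latex
\min_{\{P_i\},\,\{X_j\}} \;
\sum_{(i,j)\in\mathcal{V}}
\rho\!\left( r_{ij}^{\top}\,\Sigma_{ij}^{-1}\, r_{ij} \right),
\qquad
r_{ij} = x_{ij} - \pi(P_i, X_j)
```

With $\rho(s) = s$ this reduces to traditional weighted nonlinear least squares; a kernel that grows more slowly for large $s$ reduces the influence of outlying measurements.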
17 Jun 2006
TL;DR: This paper first surveys multi-view stereo algorithms and compares them qualitatively using a taxonomy that differentiates their key properties, then describes the process for acquiring and calibrating multiview image datasets with high-accuracy ground truth and introduces the evaluation methodology.
Abstract: This paper presents a quantitative comparison of several multi-view stereo reconstruction algorithms. Until now, the lack of suitable calibrated multi-view image datasets with known ground truth (3D shape models) has prevented such direct comparisons. In this paper, we first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties. We then describe our process for acquiring and calibrating multiview image datasets with high-accuracy ground truth and introduce our evaluation methodology. Finally, we present the results of our quantitative comparison of state-of-the-art multi-view stereo reconstruction algorithms on six benchmark datasets. The datasets, evaluation details, and instructions for submitting new models are available online at http://vision.middlebury.edu/mview.
01 Aug 2004
TL;DR: This paper reviews recent developments and general strategies of the processing framework of visual surveillance in dynamic scenes, and analyzes possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, fusion of information from multiple sensors, and remote surveillance.
Abstract: Visual surveillance in dynamic scenes, especially for humans and vehicles, is currently one of the most active research topics in computer vision. It has a wide spectrum of promising applications, including access control in special areas, human identification at a distance, crowd flux statistics and congestion analysis, detection of anomalous behaviors, and interactive surveillance using multiple cameras, etc. In general, the processing framework of visual surveillance in dynamic scenes includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, understanding and description of behaviors, human identification, and fusion of data from multiple cameras. We review recent developments and general strategies of all these stages. Finally, we analyze possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, a combination of motion analysis and biometrics, anomaly detection and behavior prediction, content-based retrieval of surveillance videos, behavior understanding and natural language description, fusion of information from multiple sensors, and remote surveillance.