scispace - formally typeset
Search or ask a question

Showing papers in "IEEE Transactions on Pattern Analysis and Machine Intelligence in 2000"


Journal ArticleDOI
TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
Abstract: We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups as well as the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion. We applied this approach to segmenting static images, as well as motion sequences, and found the results to be very encouraging.

13,789 citations


Journal ArticleDOI
ZhenQiu Zhang1
TL;DR: A flexible technique to easily calibrate a camera that only requires the camera to observe a planar pattern shown at a few (at least two) different orientations is proposed and advances 3D computer vision one more step from laboratory environments to real world use.
Abstract: We propose a flexible technique to easily calibrate a camera. It only requires the camera to observe a planar pattern shown at a few (at least two) different orientations. Either the camera or the planar pattern can be freely moved. The motion need not be known. Radial lens distortion is modeled. The proposed procedure consists of a closed-form solution, followed by a nonlinear refinement based on the maximum likelihood criterion. Both computer simulation and real data have been used to test the proposed technique and very good results have been obtained. Compared with classical techniques which use expensive equipment such as two or three orthogonal planes, the proposed technique is easy to use and flexible. It advances 3D computer vision one more step from laboratory environments to real world use.

13,200 citations


Journal ArticleDOI
TL;DR: The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.

6,527 citations


Journal ArticleDOI
TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.
Abstract: Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.

6,447 citations


Journal ArticleDOI
TL;DR: Two of the most critical requirements in support of producing reliable face-recognition systems are a large database of facial images and a testing procedure to evaluate systems.
Abstract: Two of the most critical requirements in support of producing reliable face-recognition systems are a large database of facial images and a testing procedure to evaluate systems. The Face Recognition Technology (FERET) program has addressed both issues through the FERET database of facial images and the establishment of the FERET tests. To date, 14,126 images from 1,199 individuals are included in the FERET database, which is divided into development and sequestered portions of the database. In September 1996, the FERET program administered the third in a series of FERET face-recognition tests. The primary objectives of the third test were to 1) assess the state of the art, 2) identify future areas of research, and 3) measure algorithm performance.

4,816 citations


Journal ArticleDOI
TL;DR: In this paper, the primary goal of pattern recognition is supervised or unsupervised classification, and the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been used.
Abstract: The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical ap...

4,307 citations


Journal ArticleDOI
TL;DR: A look at progress in the field over the last 20 years is looked at and some of the challenges that remain for the years to come are suggested.
Abstract: The analysis of medical images has been woven into the fabric of the pattern analysis and machine intelligence (PAMI) community since the earliest days of these Transactions. Initially, the efforts in this area were seen as applying pattern analysis and computer vision techniques to another interesting dataset. However, over the last two to three decades, the unique nature of the problems presented within this area of study have led to the development of a new discipline in its own right. Examples of these include: the types of image information that are acquired, the fully three-dimensional image data, the nonrigid nature of object motion and deformation, and the statistical variation of both the underlying normal and abnormal ground truth. In this paper, we look at progress in the field over the last 20 years and suggest some of the challenges that remain for the years to come.

4,249 citations


Journal ArticleDOI
TL;DR: This paper focuses on motion tracking and shows how one can use observed motion to learn patterns of activity in a site and create a hierarchical binary-tree classification of the representations within a sequence.
Abstract: Our goal is to develop a visual monitoring system that passively observes moving objects in a site and learns patterns of activity from those observations. For extended sites, the system will require multiple cameras. Thus, key elements of the system are motion tracking, camera coordination, activity classification, and event detection. In this paper, we focus on motion tracking and show how one can use observed motion to learn patterns of activity in a site. Motion segmentation is based on an adaptive background subtraction method that models each pixel as a mixture of Gaussians and uses an online approximation to update the model. The Gaussian distributions are then evaluated to determine which are most likely to result from a background process. This yields a stable, real-time outdoor tracker that reliably deals with lighting changes, repetitive motions from clutter, and long-term scene changes. While a tracking system is unaware of the identity of any object it tracks, the identity remains the same for the entire tracking sequence. Our system leverages this information by accumulating joint co-occurrences of the representations within a sequence. These joint co-occurrence statistics are then used to create a hierarchical binary-tree classification of the representations. This method is useful for classifying sequences, as well as individual instances of activities in a site.

3,631 citations


Journal ArticleDOI
TL;DR: W/sup 4/ employs a combination of shape analysis and tracking to locate people and their parts and to create models of people's appearance so that they can be tracked through interactions such as occlusions.
Abstract: W/sup 4/ is a real time visual surveillance system for detecting and tracking multiple people and monitoring their activities in an outdoor environment. It operates on monocular gray-scale video imagery, or on video imagery from an infrared camera. W/sup 4/ employs a combination of shape analysis and tracking to locate people and their parts (head, hands, feet, torso) and to create models of people's appearance so that they can be tracked through interactions such as occlusions. It can determine whether a foreground region contains multiple people and can segment the region into its constituent people and track them. W/sup 4/ can also determine whether people are carrying objects, and can segment objects from their silhouettes, and construct appearance models for them so they can be identified in subsequent frames. W/sup 4/ can recognize events between people and objects, such as depositing an object, exchanging bags, or removing an object. It runs at 25 Hz for 320/spl times/240 resolution images on a 400 MHz dual-Pentium II PC.

2,870 citations


Journal ArticleDOI
TL;DR: The nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms are described.
Abstract: Handwriting has continued to persist as a means of communication and recording information in day-to-day life even with the introduction of new technologies. Given its ubiquity in human transactions, machine recognition of handwriting has practical significance, as in reading handwritten notes in a PDA, in postal addresses on envelopes, in amounts in bank checks, in handwritten fields in forms, etc. This overview describes the nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms. Both the online case (which pertains to the availability of trajectory data during writing) and the off-line case (which pertains to scanned images) are considered. Algorithms for preprocessing, character and word recognition, and performance with practical systems are indicated. Other fields of application, like signature verification, writer authentification, handwriting learning tools are also considered.

2,653 citations


Journal ArticleDOI
TL;DR: The capability of the human visual system with respect to these problems is discussed, and it is meant to serve as an ultimate goal and a guide for determining recommendations for development of an automatic facial expression analyzer.
Abstract: Humans detect and interpret faces and facial expressions in a scene with little or no effort. Still, development of an automated system that accomplishes this task is rather difficult. There are several related problems: detection of an image segment as a face, extraction of the facial expression information, and classification of the expression (e.g., in emotion categories). A system that performs these operations accurately and in real time would form a big step in achieving a human-like interaction between man and machine. The paper surveys the past work in solving these problems. The capability of the human visual system with respect to these problems is discussed, too. It is meant to serve as an ultimate goal and a guide for determining recommendations for development of an automatic facial expression analyzer.

Journal ArticleDOI
TL;DR: A real-time computer vision and machine learning system for modeling and recognizing human behaviors in a visual surveillance task and demonstrates the ability to use these a priori models to accurately classify real human behaviors and interactions with no additional tuning or training.
Abstract: We describe a real-time computer vision and machine learning system for modeling and recognizing human behaviors in a visual surveillance task. The system deals in particularly with detecting when interactions between people occur and classifying the type of interaction. Examples of interesting interaction behaviors include following another person, altering one's path to meet another, and so forth. Our system combines top-down with bottom-up information in a closed feedback loop, with both components employing a statistical Bayesian approach. We propose and compare two different state-based learning architectures, namely, HMMs and CHMMs for modeling behaviors and interactions. Finally, a synthetic "Alife-style" training system is used to develop flexible prior models for recognizing human interactions. We demonstrate the ability to use these a priori models to accurately classify real human behaviors and interactions with no additional tuning or training.

Journal ArticleDOI
TL;DR: An assessing method of mixture model in a cluster analysis setting with integrated completed likelihood appears to be more robust to violation of some of the mixture model assumptions and it can select a number of dusters leading to a sensible partitioning of the data.
Abstract: We propose an assessing method of mixture model in a cluster analysis setting with integrated completed likelihood. For this purpose, the observed data are assigned to unknown clusters using a maximum a posteriori operator. Then, the integrated completed likelihood (ICL) is approximated using the Bayesian information criterion (BIC). Numerical experiments on simulated and real data of the resulting ICL criterion show that it performs well both for choosing a mixture model and a relevant number of clusters. In particular, ICL appears to be more robust than BIC to violation of some of the mixture model assumptions and it can select a number of dusters leading to a sensible partitioning of the data.

Journal ArticleDOI
TL;DR: A new approach named Hermes is proposed, which exploits aspects from the well-known front propagation algorithms and compares favorably to them, and very promising experimental results are provided using real video sequences.
Abstract: This paper presents a new variational framework for detecting and tracking multiple moving objects in image sequences. Motion detection is performed using a statistical framework for which the observed interframe difference density function is approximated using a mixture model. This model is composed of two components, namely, the static (background) and the mobile (moving objects) one. Both components are zero-mean and obey Laplacian or Gaussian law. This statistical framework is used to provide the motion detection boundaries. Additionally, the original frame is used to provide the moving object boundaries. Then, the detection and the tracking problem are addressed in a common framework that employs a geodesic active contour objective function. This function is minimized using a gradient descent method. A new approach named Hermes is proposed, which exploits aspects from the well-known front propagation algorithms and compares favorably to them. Very promising experimental results are provided using real video sequences.

Journal ArticleDOI
TL;DR: A calibration procedure for precise 3D computer vision applications is described that introduces bias correction for circular control points and a nonrecursive method for reversing the distortion model and indicates improvements in the calibration results in limited error conditions.
Abstract: Modern CCD cameras are usually capable of a spatial accuracy greater than 1/50 of the pixel size. However, such accuracy is not easily attained due to various error sources that can affect the image formation process. Current calibration methods typically assume that the observations are unbiased, the only error is the zero-mean independent and identically distributed random noise in the observed image coordinates, and the camera model completely explains the mapping between the 3D coordinates and the image coordinates. In general, these conditions are not met, causing the calibration results to be less accurate than expected. In the paper, a calibration procedure for precise 3D computer vision applications is described. It introduces bias correction for circular control points and a nonrecursive method for reversing the distortion model. The accuracy analysis is presented and the error sources that can reduce the theoretical accuracy are discussed. The tests with synthetic images indicate improvements in the calibration results in limited error conditions. In real images, the suppression of external error sources becomes a prerequisite for successful calibration.

Journal ArticleDOI
TL;DR: It is shown that the pose estimation problem can be formulated as that of minimizing an error metric based on collinearity in object (as opposed to image) space, and an iterative algorithm which directly computes orthogonal rotation matrices and which is globally convergent is derived.
Abstract: Determining the rigid transformation relating 2D images to known 3D geometry is a classical problem in photogrammetry and computer vision. Heretofore, the best methods for solving the problem have relied on iterative optimization methods which cannot be proven to converge and/or which do not effectively account for the orthonormal structure of rotation matrices. We show that the pose estimation problem can be formulated as that of minimizing an error metric based on collinearity in object (as opposed to image) space. Using object space collinearity error, we derive an iterative algorithm which directly computes orthogonal rotation matrices and which is globally convergent. Experimentally, we show that the method is computationally efficient, that it is no less accurate than the best currently employed optimization methods, and that it outperforms all tested methods in robustness to outliers.

Journal ArticleDOI
TL;DR: New techniques to detect and analyze periodic motion as seen from both a static and a moving camera are described and the periodicity is analyzed robustly using the 2D lattice structures inherent in similarity matrices.
Abstract: We describe new techniques to detect and analyze periodic motion as seen from both a static and a moving camera. By tracking objects of interest, we compute an object's self-similarity as it evolves in time. For periodic motion, the self-similarity measure is also periodic and we apply time-frequency analysis to detect and characterize the periodic motion. The periodicity is also analyzed robustly using the 2D lattice structures inherent in similarity matrices. A real-time system has been implemented to track and classify objects using periodicity. Examples of object classification (people, running dogs, vehicles), person counting, and nonstationary periodicity are provided.

Journal ArticleDOI
TL;DR: A probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents and how the system correctly interprets activities of multiple interacting objects is demonstrated.
Abstract: This paper describes a probabilistic syntactic approach to the detection and recognition of temporally extended activities and interactions between multiple agents. The fundamental idea is to divide the recognition problem into two levels. The lower level detections are performed using standard independent probabilistic event detectors to propose candidate detections of low-level features. The outputs of these detectors provide the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser provide longer range temporal constraints, disambiguate uncertain low-level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. We develop a real-time system and demonstrate the approach in several experiments on gesture recognition and in video surveillance. In the surveillance application, we show how the system correctly interprets activities of multiple interacting objects.

Journal ArticleDOI
TL;DR: In this paper, the head is modeled as a texture mapped cylinder and tracking is formulated as an image registration problem in the cylinder's texture map image, which is solved by regularized weighted least squares error minimization.
Abstract: A technique for 3D head tracking under varying illumination is proposed. The head is modeled as a texture mapped cylinder. Tracking is formulated as an image registration problem in the cylinder's texture map image. The resulting dynamic texture map provides a stabilized view of the face that can be used as input to many existing 2D techniques for face recognition, facial expressions analysis, lip reading, and eye tracking. To solve the registration problem with lighting variation and head motion, the residual registration error is modeled as a linear combination of texture warping templates and orthogonal illumination templates. Fast stable online tracking is achieved via regularized weighted least-squares error minimization. The regularization tends to limit potential ambiguities that arise in the warping and illumination templates. It enables stable tracking over extended sequences. Tracking does not require a precise initial model fit; the system is initialized automatically using a simple 2D face detector. It is assumed that the target is facing the camera in the first frame. The formulation uses texture mapping hardware. The nonoptimized implementation runs at about 15 frames per second on a SGI O2 graphic workstation. Extensive experiments evaluating the effectiveness of the formulation are reported. The sensitivity of the technique to illumination, regularization parameters, errors in the initial positioning, and internal camera parameters are analyzed. Examples and applications of tracking are reported.

Journal ArticleDOI
TL;DR: This paper investigates and develops a methodology that serves to automatically identify a subset of aROIs (algorithmically detected ROIs) using different image processing algorithms (IPAs), and appropriate clustering procedures, and compares hROIs with hROI as a criterion for evaluating and selecting bottom-up, context-free algorithms.
Abstract: Many machine vision applications, such as compression, pictorial database querying, and image understanding, often need to analyze in detail only a representative subset of the image, which may be arranged into sequences of loci called regions-of-interest (ROIs). We have investigated and developed a methodology that serves to automatically identify such a subset of aROIs (algorithmically detected ROIs) using different image processing algorithms (IPAs), and appropriate clustering procedures. In human perception, an internal representation directs top-down, context-dependent sequences of eye movements to fixate on similar sequences of hROIs (human identified ROIs). In the paper, we introduce our methodology and we compare aROIs with hROIs as a criterion for evaluating and selecting bottom-up, context-free algorithms. An application is finally discussed.

Journal ArticleDOI
TL;DR: Presents a stereo algorithm for obtaining disparity maps with occlusion explicitly detected, and presents the processing results from synthetic and real image pairs, including ones with ground-truth values for quantitative comparison with other methods.
Abstract: Presents a stereo algorithm for obtaining disparity maps with occlusion explicitly detected. To produce smooth and detailed disparity maps, two assumptions that were originally proposed by Marr and Poggio (1976, 1979) are adopted: uniqueness and continuity. That is, the disparity maps have a unique value per pixel and are continuous almost everywhere. These assumptions are enforced within a three-dimensional array of match values in disparity space. Each match value corresponds to a pixel in an image and a disparity relative to another image. An iterative algorithm updates the match values by diffusing support among neighboring values and inhibiting others along similar lines of sight. By applying the uniqueness assumption, occluded regions can be explicitly identified. To demonstrate the effectiveness of the algorithm, we present the processing results from synthetic and real image pairs, including ones with ground-truth values for quantitative comparison with other methods.

Journal ArticleDOI
TL;DR: The contributions to document image analysis of 99 papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) are clustered, summarized, interpolated, interpreted, and evaluated.
Abstract: The contributions to document image analysis of 99 papers published in the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) are clustered, summarized, interpolated, interpreted, and evaluated.

Journal ArticleDOI
TL;DR: This work applied a cognitively motivated similarity measure to shape matching of object contours in various image databases and compared it to well-known approaches in the literature, justifying that the shape matching procedure gives an intuitive shape correspondence and is stable with respect to noise distortions.
Abstract: A cognitively motivated similarity measure is presented and its properties are analyzed with respect to retrieval of similar objects in image databases of silhouettes of 2D objects. To reduce influence of digitization noise, as well as segmentation errors, the shapes are simplified by a novel process of digital curve evolution. To compute our similarity measure, we first establish the best possible correspondence of visual parts (without explicitly computing the visual parts). Then, the similarity between corresponding parts is computed and aggregated. We applied our similarity measure to shape matching of object contours in various image databases and compared it to well-known approaches in the literature. The experimental results justify that our shape matching procedure gives an intuitive shape correspondence and is stable with respect to noise distortions.

Journal ArticleDOI
TL;DR: Experimental results, up to a 97 percent rate of success in classification, will show the possibility of using this biometric system in medium/high security environments with full acceptance from all users.
Abstract: A work in defining and implementing a biometric system based on hand geometry identification is presented here. Hand features are extracted from a color photograph taken when the user has placed his hand on a platform designed for such a task. Different pattern recognition techniques have been tested to be used in classification and/or verification from Euclidean distance to neural networks. Experimental results, up to a 97 percent rate of success in classification, will show the possibility of using this system in medium/high security environments with full acceptance from all users.

Journal ArticleDOI
TL;DR: The 11 papers in this special section illustrate topics and techniques at the forefront of video surveillance research, touching on many of the core topics of computer vision, pattern analysis, and aritificial intelligence.
Abstract: UTOMATED video surveillance addresses real-time observation of people and vehicles within a busy environment, leading to a description of their actions and interactions. The technical issues include moving object detection and tracking, object classification, human motion analysis, and activity understanding, touching on many of the core topics of computer vision, pattern analysis, and aritificial intelligence. Video surveillance has spawned large research projects in the United States, Europe, and Japan, and has been the topic of several international conferences and workshops in recent years. There are immediate needs for automated surveillance systems in commercial, law enforcement, and military applications. Mounting video cameras is cheap, but finding available human resources to observe the output is expensive. Although surveillance cameras are already prevalent in banks, stores, and parking lots, video data currently is used only “after the fact” as a forensic tool, thus losing its primary benefit as an active, real-time medium. What is needed is continuous 24-hour monitoring of surveillance video to alert security officers to a burglary in progress or to a suspicious individual loitering in the parking lot, while there is still time to prevent the crime. In addition to the obvious security applications, video surveillance technology has been proposed to measure traffic flow, detect accidents on highways, monitor pedestrian congestion in public spaces, compile consumer demographics in shopping malls and amusement parks, log routine maintainence tasks at nuclear facilities, and count endangered species. The numerous military applications include patrolling national borders, measuring the flow of refugees in troubled areas, monitoring peace treaties, and providing secure perimeters around bases and embassies. The 11 papers in this special section illustrate topics and techniques at the forefront of video surveillance research. These papers can be loosely organized into three categories. Detection and tracking involves real-time extraction of moving objects from video and continuous tracking over time to form persistent object trajectories. C. Stauffer and W.E.L. Grimson introduce unsupervised statistical learning techniques to cluster object trajectories produced by adaptive background subtraction into descriptions of normal scene activity. Viewpoint-specific trajectory descriptions from multiple cameras are combined into a common scene coordinate system using a calibration technique described by L. Lee, R. Romano, and G. Stein, who automatically determine the relative exterior orientation of overlapping camera views by observing a sparse set of moving objects on flat terrain. Two papers address the accumulation of noisy motion evidence over time. R. Pless, T. Brodský, and Y. Aloimonos detect and track small objects in aerial video sequences by first compensating for the self-motion of the aircraft, then accumulating residual normal flow to acquire evidence of independent object motion. L. Wixson notes that motion in the image does not always signify purposeful travel by an independently moving object (examples of such “motion clutter” are wind-blown tree branches and sun reflections off rippling water) and devises a flow-based salience measure to highlight objects that tend to move in a consistent direction over time. Human motion analysis is concerned with detecting periodic motion signifying a human gait and acquiring descriptions of human body pose over time. R. Cutler and L.S. Davis plot an object’s self-similarity across all pairs of frames to form distinctive patterns that classify bipedal, quadripedal, and rigid object motion. Y. Ricquebourg and P. Bouthemy track apparent contours in XT slices of an XYT sequence volume to robustly delineate and track articulated human body structure. I. Haritaoglu, D. Harwood, and L.S. Davis present W4, a surveillance system specialized to the task of looking at people. The W4 system can locate people and segment their body parts, build simple appearance models for tracking, disambiguate between and separately track multiple individuals in a group, and detect carried objects such as boxes and backpacks. Activity analysis deals with parsing temporal sequences of object observations to produce high-level descriptions of agent actions and multiagent interactions. In our opinion, this will be the most important area of future research in video surveillance. N.M. Oliver, B. Rosario, and A.P. Pentland introduce Coupled Hidden Markov Models (CHMMs) to detect and classify interactions consisting of two interleaved agent action streams and present a training method based on synthetic agents to address the problem of parameter estimation from limited real-world training examples. M. Brand and V. Kettnaker present an entropyminimization approach to estimating HMM topology and

Journal ArticleDOI
TL;DR: The paper examines the mathematical tools that have proven successful, provides a taxonomy of the problem domain, and then examines the state of the art: person identification, surveillance/monitoring, 3D methods, and smart rooms/perceptual user interfaces.
Abstract: The research topic of looking at people, that is, giving machines the ability to detect, track, and identify people and more generally, to interpret human behavior, has become a central topic in machine vision research. Initially thought to be the research problem that would be hardest to solve, it has proven remarkably tractable and has even spawned several thriving commercial enterprises. The principle driving application for this technology is "fourth generation" embedded computing: "smart" environments and portable or wearable devices. The key technical goals are to determine the computer's context with respect to nearby humans (e.g., who, what, when, where, and why) so that the computer can act or respond appropriately without detailed instructions. The paper examines the mathematical tools that have proven successful, provides a taxonomy of the problem domain, and then examines the state of the art. Four areas receive particular attention: person identification, surveillance/monitoring, 3D methods, and smart rooms/perceptual user interfaces. Finally, the paper discusses some of the research challenges and opportunities.

Journal ArticleDOI
TL;DR: In this paper, a planar alignment matrix is used to align the scene's ground plane across multiple views and decompose the alignment matrix to recover the 3D relative camera and ground plane positions.
Abstract: Monitoring of large sites requires coordination between multiple cameras, which in turn requires methods for relating events between distributed cameras. This paper tackles the problem of automatic external calibration of multiple cameras in an extended scene, that is, full recovery of their 3D relative positions and orientations. Because the cameras are placed far apart, brightness or proximity constraints cannot be used to match static features, so we instead apply planar geometric constraints to moving objects tracked throughout the scene. By robustly matching and fitting tracked objects to a planar model, we align the scene's ground plane across multiple views and decompose the planar alignment matrix to recover the 3D relative camera and ground plane positions. We demonstrate this technique in both a controlled lab setting where we test the effects of errors in the intrinsic camera parameters, and in an uncontrolled, outdoor setting. In the latter, we do not assume synchronized cameras and we show that enforcing geometric constraints enables us to align the tracking data in time. In spite of noise in the intrinsic camera parameters and in the image data, the system successfully transforms multiple views of the scene's ground plane to an overhead view and recovers the relative 3D camera and ground plane positions.

Journal ArticleDOI
TL;DR: An effective fingerprint verification system is presented, which assumes that an existing reference fingerprint image must validate the identity of a person by means of a test fingerprint image acquired online and in real-time using minutiae matching.
Abstract: An effective fingerprint verification system is presented. It assumes that an existing reference fingerprint image must validate the identity of a person by means of a test fingerprint image acquired online and in real-time using minutiae matching. The matching system consists of two main blocks: The first allows for the extraction of essential information from the reference image off-line, the second performs the matching itself online. The information is obtained from the reference image by filtering and careful minutiae extraction procedures. The fingerprint identification is based on triangular matching to cope with the strong deformation of fingerprint images due to static friction or finger rolling. The matching is finally validated by dynamic time warping. Results reported on the NIST Special Database 4 reference set, featuring 85 percent correct verification (15 percent false negative) and 0.05 percent false positive, demonstrate the effectiveness of the verification technique.

Journal ArticleDOI
TL;DR: In this article, Hidden Markov Models (HMMs) are used to organize observed activity into meaningful states by minimizing the entropy of the joint distribution of the HMMs' internal state machine.
Abstract: Hidden Markov models (HMMs) have become the workhorses of the monitoring and event recognition literature because they bring to time-series analysis the utility of density estimation and the convenience of dynamic time warping. Once trained, the internals of these models are considered opaque; there is no effort to interpret the hidden states. We show that by minimizing the entropy of the joint distribution, an HMM's internal state machine can be made to organize observed activity into meaningful states. This has uses in video monitoring and annotation, low bit-rate coding of scene activity, and detection of anomalous behavior. We demonstrate with models of office activity and outdoor traffic, showing how the framework learns principal modes of activity and patterns of activity change. We then show how this framework can be adapted to infer hidden state from extremely ambiguous images, in particular, inferring 3D body orientation and pose from sequences of low-resolution silhouettes.

Journal ArticleDOI
TL;DR: This work defines principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution, making it possible to theoretically analyze principal curve learning from training data and it also leads to a new practical construction.
Abstract: Principal curves have been defined as "self-consistent" smooth curves which pass through the "middle" of a d-dimensional probability distribution or data cloud. They give a summary of the data and also serve as an efficient feature extraction tool. We take a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution. The new definition makes it possible to theoretically analyze principal curve learning from training data and it also leads to a new practical construction. Our theoretical learning scheme chooses a curve from a class of polygonal lines with k segments and with a given total length to minimize the average squared distance over n training points drawn independently. Convergence properties of this learning scheme are analyzed and a practical version of this theoretical algorithm is implemented. In each iteration of the algorithm, a new vertex is added to the polygonal line and the positions of the vertices are updated so that they minimize a penalized squared distance criterion. Simulation results demonstrate that the new algorithm compares favorably with previous methods, both in terms of performance and computational complexity, and is more robust to varying data models.