Author
Amit K. Roy Chowdhury
Bio: Amit K. Roy Chowdhury is an academic researcher from the University of Maryland, College Park. The author has contributed to research in topics including structure from motion and motion estimation, has an h-index of 10, and has co-authored 15 publications receiving 690 citations.
Papers
21 Jul 2003
TL;DR: It is shown that, if the person is far enough from the camera, a side view can be synthesized from any other arbitrary view using a single camera, via either the perspective projection model or the optical-flow-based structure-from-motion equations.
Abstract: Human gait is a spatio-temporal phenomenon and typifies the motion characteristics of an individual. The gait of a person is easily recognizable when extracted from a side-view of the person. Accordingly, gait-recognition algorithms work best when presented with images where the person walks parallel to the camera image plane. However, it is not realistic to expect this assumption to be valid in most real-life scenarios. Hence, it is important to develop methods whereby the side-view can be generated from any other arbitrary view in a simple, yet accurate, manner. This is the main theme of the paper. We show that if the person is far enough from the camera, it is possible to synthesize a side view (referred to as canonical view) from any other arbitrary view using a single camera. Two methods are proposed for doing this: (i) using the perspective projection model; (ii) using the optical flow based structure from motion equations. A simple camera calibration scheme for this method is also proposed. Examples of synthesized views are presented. Preliminary testing with gait recognition algorithms gives encouraging results. A by-product of this method is a simple algorithm for synthesizing novel views of a planar scene.
212 citations
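The paper above notes a by-product: a simple algorithm for synthesizing novel views of a planar scene. For a planar scene (or a distant subject approximated as planar), a new view is related to the original by a 3x3 homography. The sketch below, a hypothetical illustration and not the paper's calibration scheme, inverse-warps an image through a given homography with nearest-neighbour sampling:

```python
import numpy as np

def warp_homography(image, H, out_shape):
    """Inverse-warp a 2D image through a 3x3 homography H (source -> target).

    For each target pixel, apply H^-1 to find the corresponding source pixel
    (nearest neighbour); pixels mapping outside the source stay zero.
    Illustrative sketch of planar novel-view synthesis.
    """
    H_inv = np.linalg.inv(H)
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    ones = np.ones_like(xs)
    pts = np.stack([xs.ravel(), ys.ravel(), ones.ravel()]).astype(float)  # 3 x N
    src = H_inv @ pts
    src /= src[2]                                  # perspective divide
    sx = np.round(src[0]).astype(int)
    sy = np.round(src[1]).astype(int)
    out = np.zeros(out_shape, dtype=image.dtype)
    valid = (sx >= 0) & (sx < image.shape[1]) & (sy >= 0) & (sy < image.shape[0])
    out.ravel()[valid] = image[sy[valid], sx[valid]]
    return out
```

With H estimated from the camera geometry (the paper proposes a simple calibration scheme for this), the same warp maps an arbitrary view of the walking plane to the canonical side view.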
23 Aug 2004
TL;DR: The paper poses video-to-video face recognition as a dynamical system identification and classification problem and uses an autoregressive and moving average (ARMA) model to represent such a system.
Abstract: The paper poses video-to-video face recognition as a dynamical system identification and classification problem. We model a moving face as a linear dynamical system whose appearance changes with pose. An autoregressive and moving average (ARMA) model is used to represent such a system. The choice of ARMA model is based on its ability to take care of the change in appearance while modeling the dynamics of pose, expression etc. Recognition is performed using the concept of sub space angles to compute distances between probe and gallery video sequences. The results obtained are very promising given the extent of pose, expression and illumination variation in the video data used for experiments.
129 citations
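Recognition in the paper above compares ARMA models via subspace angles. A standard way to compute principal angles between two column spaces (the Bjorck-Golub method, shown here as an illustrative sketch rather than the paper's exact ARMA-distance code) is to orthonormalize each basis and take the SVD of their product:

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B.

    Orthonormalize each basis with QR; the singular values of Q_A^T Q_B
    are the cosines of the principal angles. Sketch of the subspace-angle
    distance used to compare dynamical models.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))
```

Identical subspaces give all-zero angles; orthogonal subspaces give pi/2. A distance between probe and gallery sequences can then be built from these angles.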
TL;DR: The main advantage of the proposed 3D reconstruction algorithm over others is that it is able to retain the specific features of the face in the video sequence even when these features are different from those of the generic model and it does so even as the quality of the input video varies.
Abstract: Reconstructing a 3D model of a human face from a monocular video sequence is an important problem in computer vision, with applications to recognition, surveillance, multimedia, etc. However, the quality of 3D reconstructions using structure from motion (SfM) algorithms is often not satisfactory. One of the reasons is the poor quality of the input video data. Hence, it is important that 3D face reconstruction algorithms take into account the statistics representing the quality of the video. Also, because of the structural similarities of most faces, it is natural that the performance of these algorithms can be improved by using a generic model of a face. Most of the existing work using this approach initializes the reconstruction algorithm with this generic model. The problem with this approach is that the algorithm can converge to a solution very close to this initial value, resulting in a reconstruction which resembles the generic model rather than the particular face in the video which needs to be modeled. In this paper, we propose a method of 3D reconstruction of a human face from video in which the 3D reconstruction algorithm and the generic model are handled separately. We show that it is possible to obtain a reasonably good 3D SfM estimate purely from the video sequence, provided the quality of the input video is statistically assessed and incorporated into the algorithm. The final 3D model is obtained after combining the SfM estimate and the generic model using an energy function that corrects for the errors in the estimate by comparing the local regions in the two models. The main advantage of our algorithm over others is that it is able to retain the specific features of the face in the video sequence even when these features are different from those of the generic model and it does so even as the quality of the input video varies. 
The evolution of the 3D model through the various stages of the algorithm and an analysis of its accuracy are presented.
70 citations
07 Nov 2002
TL;DR: A method of 3D reconstruction of a human face from video in which the 3D reconstruction algorithm and the generic model are handled separately; the main advantage of this algorithm is that it is able to retain the specific features of the face in the video sequence even when these features are different from those of the generic model.
Abstract: Reconstructing a 3D model of a human face from a video sequence is an important problem in computer vision, with applications to recognition, surveillance, multimedia etc. However, the quality of 3D reconstructions using structure from motion (SfM) algorithms is often not satisfactory. One common method of overcoming this problem is to use a generic model of a face. Existing work using this approach initializes the reconstruction algorithm with this generic model. The problem with this approach is that the algorithm can converge to a solution very close to this initial value, resulting in a reconstruction which resembles the generic model rather than the particular face in the video which needs to be modeled. We propose a method of 3D reconstruction of a human face from video in which the 3D reconstruction algorithm and the generic model are handled separately. A 3D estimate is obtained purely from the video sequence using SfM algorithms without use of the generic model. The final 3D model is obtained after combining the SfM estimate and the generic model using an energy function that corrects for the errors in the estimate by comparing local regions in the two models. The optimization is done using a Markov chain Monte Carlo (MCMC) sampling strategy. The main advantage of our algorithm over others is that it is able to retain the specific features of the face in the video sequence even when these features are different from those of the generic model. The evolution of the 3D model through the various stages of the algorithm is presented.
63 citations
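The optimization in the paper above uses a Markov chain Monte Carlo (MCMC) sampling strategy. The minimal Metropolis-Hastings sampler below is an illustrative sketch of that generic strategy; the paper's actual state space (local corrections to the 3D face estimate) and energy function are more elaborate, and the quadratic energy in the usage example is hypothetical:

```python
import numpy as np

def metropolis_hastings(energy, x0, n_iters=5000, step=0.1, seed=0):
    """Minimal Metropolis-Hastings sampler targeting exp(-energy(x)).

    Symmetric Gaussian random-walk proposal; a candidate is accepted with
    probability min(1, exp(e_current - e_candidate)). Tracks the lowest-
    energy state seen, so the sampler doubles as a stochastic minimizer.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    e = energy(x)
    best_x, best_e = x.copy(), e
    for _ in range(n_iters):
        cand = x + step * rng.standard_normal(x.shape)
        e_cand = energy(cand)
        if np.log(rng.random()) < e - e_cand:   # accept/reject step
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x.copy(), e
    return best_x, best_e
```

For example, with the toy energy `lambda x: np.sum((x - 2.0)**2)` and `x0 = np.zeros(2)`, the chain settles near the minimizer at (2, 2).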
Patent
21 Aug 2003
TL;DR: A method of 3D modeling of an object from a video sequence using an SfM algorithm and a generic object model, in which the generic model is incorporated only after the SfM algorithm generates a 3D estimate of the object directly from the input video.
Abstract: In a novel method of 3D modeling of an object from a video sequence using an SfM algorithm and a generic object model, the generic model is incorporated after the SfM algorithm generates a 3D estimate of the object model purely and directly from the input video sequence. An optimization framework provides for comparison of the local trends of the 3D estimate and the generic model so that the errors in the 3D estimate are corrected. The 3D estimate is obtained by fusing intermediate 3D reconstructions of pairs of frames of the video sequence after computing the uncertainty of the two frame solutions. The quality of the fusion algorithm is tracked using a rate-distortion function. In order to combine the generic model with the 3D estimate, an energy function minimization procedure is applied to the 3D estimate. The optimization is performed using a Metropolis-Hasting sampling strategy.
54 citations
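The patent above fuses intermediate two-frame reconstructions after computing the uncertainty of each two-frame solution. The textbook form of such uncertainty-weighted combination is inverse-variance (information) fusion, sketched below; this is an illustration of the principle, not the patent's exact recursion or its rate-distortion tracking:

```python
import numpy as np

def fuse_estimates(estimates, variances):
    """Inverse-variance fusion of independent estimates of the same quantity.

    Each estimate x_i with variance v_i is weighted by 1/v_i: the fused
    value is sum(x_i / v_i) / sum(1 / v_i), and the fused variance is
    1 / sum(1 / v_i), which is always smaller than any input variance.
    """
    x = np.asarray(estimates, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v
    fused = np.sum(w * x, axis=0) / np.sum(w, axis=0)
    fused_var = 1.0 / np.sum(w, axis=0)
    return fused, fused_var
```

Applied per 3D point across frame pairs, less certain two-frame depths contribute less to the fused structure estimate.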
Cited by
TL;DR: In this paper, the authors provide an up-to-date critical survey of still-and video-based face recognition research, and provide some insights into the studies of machine recognition of faces.
Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.
6,384 citations
TL;DR: This survey reviews recent trends in video-based human capture and analysis, as well as discussing open problems for future research to achieve automatic visual analysis of human movement.
Abstract: This survey reviews advances in human motion capture and analysis from 2000 to 2006, following a previous survey of papers up to 2000 [T.B. Moeslund, E. Granum, A survey of computer vision-based human motion capture, Computer Vision and Image Understanding, 81(3) (2001) 231-268.]. Human motion capture continues to be an increasingly active research area in computer vision with over 350 publications over this period. A number of significant research advances are identified together with novel methodologies for automatic initialization, tracking, pose estimation, and movement recognition. Recent research has addressed reliable tracking and pose estimation in natural scenes. Progress has also been made towards automatic understanding of human actions and behavior. This survey reviews recent trends in video-based human capture and analysis, as well as discussing open problems for future research to achieve automatic visual analysis of human movement.
2,738 citations
TL;DR: A novel approach for recognizing DTs is proposed and its simplifications and extensions to facial image analysis are also considered and both the VLBP and LBP-TOP clearly outperformed the earlier approaches.
Abstract: Dynamic texture (DT) is an extension of texture to the temporal domain. Description and recognition of DTs have attracted growing attention. In this paper, a novel approach for recognizing DTs is proposed and its simplifications and extensions to facial image analysis are also considered. First, the textures are modeled with volume local binary patterns (VLBP), which are an extension of the LBP operator widely used in ordinary texture analysis, combining motion and appearance. To make the approach computationally simple and easy to extend, only the co-occurrences of the local binary patterns on three orthogonal planes (LBP-TOP) are then considered. A block-based method is also proposed to deal with specific dynamic events such as facial expressions in which local information and its spatial locations should also be taken into account. In experiments with two DT databases, DynTex and Massachusetts Institute of Technology (MIT), both the VLBP and LBP-TOP clearly outperformed the earlier approaches. The proposed block-based method was evaluated with the Cohn-Kanade facial expression database with excellent results. The advantages of our approach include local processing, robustness to monotonic gray-scale changes, and simple computation.
2,653 citations
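The VLBP and LBP-TOP operators in the paper above both extend the ordinary 2D local binary pattern. As background, the sketch below computes the basic 8-neighbour LBP code for each interior pixel of a 2D image; it illustrates the underlying operator the paper generalizes, not the paper's volumetric or three-plane variants:

```python
import numpy as np

def lbp_8(image):
    """Basic 8-neighbour local binary pattern over the interior pixel grid.

    Each neighbour contributes one bit (1 if its intensity >= the centre's),
    packed clockwise from the top-left into an 8-bit code. The resulting
    codes are invariant to monotonic gray-scale changes.
    """
    img = np.asarray(image, dtype=float)
    c = img[1:-1, 1:-1]
    # Neighbour offsets, clockwise from top-left; bit i carries weight 2**i.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for i, (dy, dx) in enumerate(offs):
        nb = img[1 + dy: img.shape[0] - 1 + dy, 1 + dx: img.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.uint8) << i)
    return code
```

VLBP applies this idea to spatio-temporal volumes, while LBP-TOP keeps computation simple by taking the codes only on the three orthogonal planes (XY, XT, YT) and concatenating their histograms.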
TL;DR: These course notes describe how local derivatives, approximated by the local difference of intensities computed at each pixel, can be used to detect intensity edges in images.
Abstract: Most of the signal processing that we will study in this course involves local operations on a signal, namely transforming the signal by applying linear combinations of values in the neighborhood of each sample point. You are familiar with such operations from Calculus, namely taking derivatives, and you are also familiar with this from optics, namely blurring a signal. We will be looking at sampled signals only. Let's start with a few basic examples.

Local difference. Suppose we have a 1D image and we take the local difference of intensities, DI(x) = (1/2)(I(x + 1) − I(x − 1)), which gives a discrete approximation to a partial derivative. (We compute this for each x in the image.) What is the effect of such a transformation? One key idea is that such a derivative would be useful for marking positions where the intensity changes. Such a change is called an edge. It is important to detect edges in images because they often mark locations at which object properties change. These can include changes in illumination along a surface due to a shadow boundary, or a material (pigment) change, or a change in depth as when one object ends and another begins. The computational problem of finding intensity edges in images is called edge detection.

We could look for positions at which DI(x) has a large negative or positive value. Large positive values indicate an edge that goes from low to high intensity, and large negative values indicate an edge that goes from high to low intensity.

Example. Suppose the image consists of a single (slightly sloped) edge:
1,829 citations
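The local difference DI(x) = (1/2)(I(x + 1) − I(x − 1)) from the notes above, and the thresholding of its magnitude to mark edges, can be sketched directly (the threshold value in the usage example is an arbitrary choice):

```python
import numpy as np

def central_difference(signal):
    """DI(x) = (1/2) * (I(x+1) - I(x-1)) for interior samples of a 1D image."""
    I = np.asarray(signal, dtype=float)
    return 0.5 * (I[2:] - I[:-2])

def detect_edges(signal, threshold):
    """Indices (into the original signal) where |DI(x)| exceeds the threshold.

    Large positive DI marks a low-to-high edge, large negative a high-to-low
    edge, exactly as the notes describe.
    """
    d = central_difference(signal)
    return np.nonzero(np.abs(d) > threshold)[0] + 1  # +1: d[0] sits at x = 1
```

For the step signal [0, 0, 0, 1, 1, 1], DI is [0, 0.5, 0.5, 0] over the interior samples, and a threshold of 0.25 marks positions 2 and 3 as edge locations.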
31 Aug 2011
TL;DR: This highly anticipated new edition provides a comprehensive account of face recognition research and technology, spanning the full range of topics needed for designing operational face recognition systems, as well as offering challenges and future directions.
Abstract: This highly anticipated new edition provides a comprehensive account of face recognition research and technology, spanning the full range of topics needed for designing operational face recognition systems. After a thorough introductory chapter, each of the following chapters focuses on a specific topic, reviewing background information, up-to-date techniques, and recent results, as well as offering challenges and future directions. Features: fully updated, revised and expanded, covering the entire spectrum of concepts, methods, and algorithms for automated face detection and recognition systems; provides comprehensive coverage of face detection, tracking, alignment, feature extraction, and recognition technologies, and issues in evaluation, systems, security, and applications; contains numerous step-by-step algorithms; describes a broad range of applications; presents contributions from an international selection of experts; integrates numerous supporting graphs, tables, charts, and performance data.
1,609 citations