
Showing papers on "Object detection published in 2007"


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A simple method for visual saliency detection is presented, independent of features, categories, or other forms of prior knowledge of the objects, together with a fast method to construct the corresponding saliency map in the spatial domain.
Abstract: The ability of the human visual system to detect visual saliency is extraordinarily fast and reliable. However, computational modeling of this basic intelligent behavior still remains a challenge. This paper presents a simple method for visual saliency detection. Our model is independent of features, categories, or other forms of prior knowledge of the objects. By analyzing the log-spectrum of an input image, we extract the spectral residual of the image in the spectral domain, and propose a fast method to construct the corresponding saliency map in the spatial domain. We test this model on both natural pictures and artificial images such as psychological patterns. The results indicate that our method performs saliency detection quickly and robustly.

3,464 citations
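The spectral residual pipeline above (log-amplitude spectrum, local average subtraction, inverse transform) is compact enough to sketch. Below is a minimal NumPy version, using a 3x3 box filter as a stand-in for the paper's exact averaging and smoothing kernels; the function names are illustrative:

```python
import numpy as np

def box3(a):
    """3x3 mean filter with edge padding (stand-in for the paper's local average)."""
    p = np.pad(a, 1, mode='edge')
    return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def spectral_residual(img):
    """Saliency map from the spectral residual of the log-amplitude spectrum."""
    f = np.fft.fft2(img.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    residual = log_amp - box3(log_amp)           # spectral residual
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return box3(sal)                              # mild spatial smoothing

# A small bright square on a uniform background should dominate the map.
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0
sal = spectral_residual(img)
```

Because the residual suppresses the smooth, "expected" part of the spectrum, uniform regions are attenuated and the anomalous square stands out in the resulting map.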


Proceedings ArticleDOI
17 Jun 2007
TL;DR: An approach for measuring similarity between visual entities (images or videos) based on matching internal self-similarities, measured densely throughout the image/video, at multiple scales, while accounting for local and global geometric distortions is presented.
Abstract: We present an approach for measuring similarity between visual entities (images or videos) based on matching internal self-similarities. What is correlated across images (or across video sequences) is the internal layout of local self-similarities (up to some distortions), even though the patterns generating those local self-similarities are quite different in each of the images/videos. These internal self-similarities are efficiently captured by a compact local "self-similarity descriptor", measured densely throughout the image/video, at multiple scales, while accounting for local and global geometric distortions. This gives rise to matching capabilities for complex visual data, including detection of objects in real cluttered images using only rough hand-sketches, handling textured objects with no clear boundaries, and detecting complex actions in cluttered video data with no prior learning. We compare our measure to commonly used image-based and video-based similarity measures, and demonstrate its applicability to object detection, retrieval, and action detection.

1,162 citations


Proceedings ArticleDOI
Tie Liu, Jian Sun, Nanning Zheng, Xiaoou Tang, Heung-Yeung Shum
17 Jun 2007
TL;DR: A set of novel features including multi-scale contrast, center-surround histogram, and color spatial distribution are proposed to describe a salient object locally, regionally, and globally for salient object detection.
Abstract: We study visual attention by detecting a salient object in an input image. We formulate salient object detection as an image segmentation problem, where we separate the salient object from the image background. We propose a set of novel features including multi-scale contrast, center-surround histogram, and color spatial distribution to describe a salient object locally, regionally, and globally. A conditional random field is learned to effectively combine these features for salient object detection. We also constructed a large image database containing tens of thousands of images carefully labeled by multiple users. To our knowledge, it is the first large image database for quantitative evaluation of visual attention algorithms. We validate our approach on this image database, which is publicly available with this paper.

1,010 citations


Journal ArticleDOI
TL;DR: A multitask learning procedure, based on boosted decision stumps, that reduces computational and sample complexity by finding common features that can be shared across the classes (and/or views), considerably reducing the computational cost of multiclass object detection.
Abstract: We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (runtime) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. We present a multitask learning procedure, based on boosted decision stumps, that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required, and therefore the runtime cost of the classifier, is observed to scale approximately logarithmically with the number of classes. The features selected by joint training are generic edge-like features, whereas the features chosen by training each class separately tend to be more object-specific. The generic features generalize better and considerably reduce the computational cost of multiclass object detection.

812 citations
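The weak learner behind this multitask procedure is the boosted decision stump: a single-feature threshold test selected to minimize weighted error. A minimal sketch of stump fitting on one binary task (the sharing of stumps across classes, which is the paper's key contribution, is not shown; the function name is illustrative):

```python
import numpy as np

def fit_stump(X, y, w):
    """Best single-feature threshold classifier under sample weights w (y in {-1,+1})."""
    best = (np.inf, None)  # (weighted error, (feature, threshold, polarity))
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - thr) > 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, (f, thr, pol))
    return best

# A perfectly separable 1-D toy problem with uniform weights.
X = np.array([[0.1], [0.4], [0.6], [0.9]])
y = np.array([-1, -1, 1, 1])
w = np.full(4, 0.25)
err, (f, thr, pol) = fit_stump(X, y, w)
```

In joint boosting, each round would evaluate candidate stumps against every subset of classes and keep the one that reduces the summed weighted error the most, which is what drives the logarithmic scaling reported above.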


Journal ArticleDOI
TL;DR: An automatic road-sign detection and recognition system based on support vector machines that is able to detect and recognize circular, rectangular, triangular, and octagonal signs and, hence, covers all existing Spanish traffic-sign shapes.
Abstract: This paper presents an automatic road-sign detection and recognition system based on support vector machines (SVMs). In automatic traffic-sign maintenance and in a visual driver-assistance system, road-sign detection and recognition are two of the most important functions. Our system is able to detect and recognize circular, rectangular, triangular, and octagonal signs and, hence, covers all existing Spanish traffic-sign shapes. Road signs provide drivers with important information and help them to drive more safely and more easily by guiding and warning them and thus regulating their actions. The proposed recognition system is based on the generalization properties of SVMs. The system consists of three stages: 1) segmentation according to the color of the pixel; 2) traffic-sign detection by shape classification using linear SVMs; and 3) content recognition based on Gaussian-kernel SVMs. Because the segmentation stage operates on red, blue, yellow, white, or combinations of these colors, all traffic signs can be detected, and some of them can be detected using several colors. Results show a high success rate and a very low number of false positives in the final recognition stage. From these results, we can conclude that the proposed algorithm is invariant to translation, rotation, scale, and, in many situations, even to partial occlusions.

687 citations


Journal ArticleDOI
TL;DR: Based on the SLAM with DATMO framework, practical algorithms are proposed which deal with issues of perception modeling, data association, and moving object detection.
Abstract: Simultaneous localization, mapping and moving object tracking (SLAMMOT) involves both simultaneous localization and mapping (SLAM) in dynamic environments and detecting and tracking these dynamic objects. In this paper, a mathematical framework is established to integrate SLAM and moving object tracking. Two solutions are described: SLAM with generalized objects, and SLAM with detection and tracking of moving objects (DATMO). SLAM with generalized objects calculates a joint posterior over all generalized objects and the robot. Such an approach is similar to existing SLAM algorithms, but with additional structure to allow for motion modeling of generalized objects. Unfortunately, it is computationally demanding and generally infeasible. SLAM with DATMO decomposes the estimation problem into two separate estimators. By maintaining separate posteriors for stationary objects and moving objects, the resulting estimation problems are much lower dimensional than SLAM with generalized objects. Both SLAM and moving object tracking from a moving vehicle in crowded urban areas are daunting tasks. Based on the SLAM with DATMO framework, practical algorithms are proposed which deal with issues of perception modeling, data association, and moving object detection. The implementation of SLAM with DATMO was demonstrated using data collected from the CMU Navlab11 vehicle at high speeds in crowded urban environments. Extensive experimental results show the feasibility of the proposed theory and algorithms.

662 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: A real-time, non-intrusive liveness detection approach against photograph spoofing in face recognition, based on recognizing spontaneous eyeblinks, which outperforms cascaded AdaBoost and HMMs in the task of eyeblink detection.
Abstract: We present a real-time liveness detection approach against photograph spoofing in face recognition that recognizes spontaneous eyeblinks in a non-intrusive manner. The approach requires no extra hardware except for a generic web camera. Eyeblink sequences often have a complex underlying structure. We formulate blink detection as inference in an undirected conditional graphical framework, and are able to learn compact and efficient observation and transition potentials from data. For quick and accurate recognition of the blink behavior, eye closity, an easily computed discriminative measure derived from the adaptive boosting algorithm, is developed and then smoothly embedded into the conditional model. An extensive set of experiments is presented to show the effectiveness of our approach and how it outperforms cascaded AdaBoost and HMMs in the task of eyeblink detection.

611 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work proposes a novel iterative approach that first infers scene geometry using belief propagation and then resolves interactions between objects using a global optimization procedure, which leads to a robust solution in a few iterations, while allowing object detection to benefit from geometry estimation and vice versa.
Abstract: In this paper, we address the challenging problem of simultaneous pedestrian detection and ground-plane estimation from video while walking through a busy pedestrian zone. Our proposed system integrates robust stereo depth cues, ground-plane estimation, and appearance-based object detection in a principled fashion using a graphical model. Object-object occlusions lead to complex interactions in this model that make an exact solution computationally intractable. We therefore propose a novel iterative approach that first infers scene geometry using belief propagation and then resolves interactions between objects using a global optimization procedure. This approach leads to a robust solution in a few iterations, while allowing object detection to benefit from geometry estimation and vice versa. We quantitatively evaluate the performance of our proposed approach on several challenging test sequences showing strolls through busy shopping streets. Comparisons to various baseline systems show that it outperforms both a system using no scene geometry and one just relying on structure-from-motion without dense stereo.

575 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A novel approach for classifying points lying on a Riemannian manifold by incorporating the a priori information about the geometry of the space.
Abstract: We present a new algorithm to detect humans in still images utilizing covariance matrices as object descriptors. Since these descriptors do not lie on a vector space, well known machine learning techniques are not adequate to learn the classifiers. The space of d-dimensional nonsingular covariance matrices can be represented as a connected Riemannian manifold. We present a novel approach for classifying points lying on a Riemannian manifold by incorporating the a priori information about the geometry of the space. The algorithm is tested on the INRIA human database, where superior detection rates are observed over previous approaches.

540 citations
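The covariance descriptor itself is straightforward to construct; the paper's novelty lies in classifying these descriptors on the Riemannian manifold of symmetric positive-definite matrices, which this sketch omits. A minimal version, assuming one common choice of per-pixel features (coordinates, intensity, absolute gradients); the function name is illustrative:

```python
import numpy as np

def region_covariance(img, r0, r1, c0, c1):
    """Covariance descriptor of a region over per-pixel features (y, x, I, |Iy|, |Ix|)."""
    Iy, Ix = np.gradient(img.astype(float))
    rows, cols = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    feats = np.stack([rows, cols, img, np.abs(Iy), np.abs(Ix)], axis=-1)
    patch = feats[r0:r1, c0:c1].reshape(-1, 5)
    # Variables are the 5 feature channels; samples are the region's pixels.
    return np.cov(patch, rowvar=False)

rng = np.random.default_rng(0)
img = rng.random((32, 32))
C = region_covariance(img, 4, 20, 4, 20)
```

The resulting 5x5 symmetric matrix is a fixed-size descriptor regardless of region size, which is what makes it convenient for sliding-window detection.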


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper shows that formulating the problem in a Naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust, and it scales well to handle a large number of classes.
Abstract: While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a Naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust. Furthermore, it scales well to handle a large number of classes. To recognize the patches surrounding keypoints, our classifier uses hundreds of simple binary features and models class posterior probabilities. We make the problem computationally tractable by assuming independence between arbitrary sets of features. Even though this is not strictly true, we demonstrate that our classifier nevertheless performs remarkably well on image datasets containing very significant perspective changes.

519 citations
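The core of such a classifier is Bernoulli naive Bayes over binary patch features: class posteriors factor into independent per-feature likelihoods. A minimal sketch on toy data (the paper actually groups features into semi-independent sets rather than assuming full independence as here; names and data are illustrative):

```python
import numpy as np

def train_nb(features_by_class):
    """Per-class Bernoulli parameters for binary features, with Laplace smoothing."""
    return {c: (F.sum(axis=0) + 1.0) / (F.shape[0] + 2.0)
            for c, F in features_by_class.items()}

def classify(params, f):
    """Most probable class under independent binary features (log-posterior argmax)."""
    f = np.asarray(f, float)
    scores = {c: (np.log(p) * f + np.log(1 - p) * (1 - f)).sum()
              for c, p in params.items()}
    return max(scores, key=scores.get)

# Toy training sets: rows are observed binary feature vectors per class.
data = {
    'A': np.array([[1, 1, 0], [1, 0, 0], [1, 1, 0]]),
    'B': np.array([[0, 0, 1], [0, 1, 1], [0, 0, 1]]),
}
params = train_nb(data)
```

Training reduces to counting, and classification to summing log-likelihoods, which is why the approach avoids the expensive patch preprocessing discussed above.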


Journal ArticleDOI
TL;DR: An integrated local surface descriptor for surface representation and object recognition is introduced and, in order to speed up the search process and deal with a large set of objects, model local surface patches are indexed into a hash table.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work proposes a technique for event recognition in crowded videos that reliably identifies actions in the presence of partial occlusion and background clutter, enabling robustness against occlusions and actor variability.
Abstract: Real-world actions occur often in crowded, dynamic environments. This poses a difficult challenge for current approaches to video event detection because it is difficult to segment the actor from the background due to distracting motion from other objects in the scene. We propose a technique for event recognition in crowded videos that reliably identifies actions in the presence of partial occlusion and background clutter. Our approach is based on three key ideas: (1) we efficiently match the volumetric representation of an event against oversegmented spatio-temporal video volumes; (2) we augment our shape-based features using flow; (3) rather than treating an event template as an atomic entity, we separately match by parts (both in space and time), enabling robustness against occlusions and actor variability. Our experiments on human actions, such as picking up a dropped object or waving in a crowd, show reliable detection with few false positives.

Proceedings ArticleDOI
10 Apr 2007
TL;DR: This paper proposes an approach that utilizes a supervised learning technique to create a classifier that facilitates the detection of people in two dimensional range scans and applies AdaBoost to train a strong classifier from simple features of groups of neighboring beams corresponding to legs in range data.
Abstract: This paper addresses the problem of detecting people in two dimensional range scans. Previous approaches have mostly used pre-defined features for the detection and tracking of people. We propose an approach that utilizes a supervised learning technique to create a classifier that facilitates the detection of people. In particular, our approach applies AdaBoost to train a strong classifier from simple features of groups of neighboring beams corresponding to legs in range data. Experimental results carried out with laser range data illustrate the robustness of our approach, even in cluttered office environments.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A new hierarchical representation for two-dimensional objects that captures shape information at multiple levels of resolution is described, based on a hierarchical description of an object's boundary, which leads to richer geometric models and more accurate recognition results.
Abstract: We describe a new hierarchical representation for two-dimensional objects that captures shape information at multiple levels of resolution. This representation is based on a hierarchical description of an object's boundary and can be used in an elastic matching framework, both for comparing pairs of objects and for detecting objects in cluttered images. In contrast to classical elastic models, our representation explicitly captures global shape information. This leads to richer geometric models and more accurate recognition results. Our experiments demonstrate classification results that are significantly better than the current state-of-the-art in several shape datasets. We also show initial experiments in matching shapes to cluttered images.

Journal ArticleDOI
TL;DR: A new background-subtraction technique fusing contours from thermal and visible imagery for persistent object detection in urban settings is presented, evaluated quantitatively and compared with other low- and high-level fusion techniques using manually segmented data.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A linear programming relaxation scheme is proposed for the class of multiple object tracking problems where the inter-object interaction metric is convex and the intra-object term quantifying object state continuity may use any metric; it is found to recover the global optimum with high probability.
Abstract: We propose a linear programming relaxation scheme for the class of multiple object tracking problems where the inter-object interaction metric is convex and the intra-object term quantifying object state continuity may use any metric. The proposed scheme models object tracking as a multi-path searching problem. It explicitly models track interaction, such as object spatial layout consistency or mutual occlusion, and optimizes multiple object tracks simultaneously. The proposed scheme does not rely on track initialization and complex heuristics. It has much less average complexity than previous efficient exhaustive search methods such as extended dynamic programming and is found to be able to find the global optimum with high probability. We have successfully applied the proposed method to multiple object tracking in video streams.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: The semantics of image labels are used to integrate prior knowledge about inter-class relationships into the visual appearance learning and to build and train a semantic hierarchy of discriminative classifiers and how to use it to perform object detection.
Abstract: In this paper we propose to use lexical semantic networks to extend the state-of-the-art object recognition techniques. We use the semantics of image labels to integrate prior knowledge about inter-class relationships into the visual appearance learning. We show how to build and train a semantic hierarchy of discriminative classifiers and how to use it to perform object detection. We evaluate how our approach influences the classification accuracy and speed on the Pascal VOC challenge 2006 dataset, a set of challenging real-world images. We also demonstrate additional features that become available to object recognition due to the extension with semantic inference tools: we can classify high-level categories, such as animals, and we can train part detectors, for example a window detector, by pure inference in the semantic network.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: It is demonstrated that it is possible to automatically learn object models from video of household activities and employ these models for activity recognition, without requiring any explicit human labeling.
Abstract: We propose an approach to activity recognition based on detecting and analyzing the sequence of objects that are being manipulated by the user. In domains such as cooking, where many activities involve similar actions, object-use information can be a valuable cue. In order for this approach to scale to many activities and objects, however, it is necessary to minimize the amount of human-labeled data that is required for modeling. We describe a method for automatically acquiring object models from video without any explicit human supervision. Our approach leverages sparse and noisy readings from RFID tagged objects, along with common-sense knowledge about which objects are likely to be used during a given activity, to bootstrap the learning process. We present a dynamic Bayesian network model which combines RFID and video data to jointly infer the most likely activity and object labels. We demonstrate that our approach can achieve activity recognition rates of more than 80% on a real-world dataset consisting of 16 household activities involving 33 objects with significant background clutter. We show that the combination of visual object recognition with RFID data is significantly more effective than the RFID sensor alone. Our work demonstrates that it is possible to automatically learn object models from video of household activities and employ these models for activity recognition, without requiring any explicit human labeling.

Journal ArticleDOI
TL;DR: A unified shot boundary detection system based on graph partition model is presented and it is shown that the proposed approach is among the best in the evaluation of TRECVID 2005.
Abstract: This paper conducts a formal study of the shot boundary detection problem. First, a general formal framework of shot boundary detection techniques is proposed. Three critical techniques, i.e., the representation of visual content, the construction of continuity signal and the classification of continuity values, are identified and formulated in the perspective of pattern recognition. Meanwhile, the major challenges to the framework are identified. Second, a comprehensive review of the existing approaches is conducted. The representative approaches are categorized and compared according to their roles in the formal framework. Based on the comparison of the existing approaches, optimal criteria for each module of the framework are discussed, which will provide a practical guide for developing novel methods. Third, with all the above issues considered, we present a unified shot boundary detection system based on graph partition model. Extensive experiments are carried out on the platform of TRECVID. The experiments not only verify the optimal criteria discussed above, but also show that the proposed approach is among the best in the evaluation of TRECVID 2005. Finally, we conclude the paper and present some further discussions on what shot boundary detection can learn from other related fields.
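One concrete instance of the framework's continuity signal is the histogram intersection between consecutive frames: a hard cut shows up as a sharp dip. A minimal sketch on synthetic frames (the bin count and the intersection measure are assumptions, not the paper's exact choices):

```python
import numpy as np

def continuity_signal(frames, bins=16):
    """Per-transition continuity: histogram intersection of consecutive frames."""
    hists = [np.histogram(f, bins=bins, range=(0, 1))[0] / f.size
             for f in frames]
    return np.array([np.minimum(h0, h1).sum() for h0, h1 in zip(hists, hists[1:])])

# Two synthetic 'shots': dark frames then bright frames; continuity dips at the cut.
dark = [np.full((8, 8), 0.1)] * 3
bright = [np.full((8, 8), 0.9)] * 3
sig = continuity_signal(dark + bright)
```

Classifying the continuity values (here, thresholding the dip) is the third module of the framework; gradual transitions would instead produce an extended valley rather than a single-frame drop.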

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A system that integrates fully automatic scene geometry estimation, 2D object detection, 3D localization, trajectory estimation, and tracking for dynamic scene interpretation from a moving vehicle and demonstrates the performance of this integrated system on challenging real-world data showing car passages through crowded city areas.
Abstract: In this paper, we present a system that integrates fully automatic scene geometry estimation, 2D object detection, 3D localization, trajectory estimation, and tracking for dynamic scene interpretation from a moving vehicle. Our sole input is two video streams from a calibrated stereo rig on top of a car. From these streams, we estimate structure-from-motion (SfM) and scene geometry in real-time. In parallel, we perform multi-view/multi-category object recognition to detect cars and pedestrians in both camera images. Using the SfM self-localization, 2D object detections are converted to 3D observations, which are accumulated in a world coordinate frame. A subsequent tracking module analyzes the resulting 3D observations to find physically plausible spacetime trajectories. Finally, a global optimization criterion takes object-object interactions into account to arrive at accurate 3D localization and trajectory estimates for both cars and pedestrians. We demonstrate the performance of our integrated system on challenging real-world data showing car passages through crowded city areas.

Proceedings ArticleDOI
21 May 2007
TL;DR: A new method to detect falls, which are one of the greatest risks for seniors living alone, is proposed, based on a combination of motion history and human shape variation.
Abstract: Nowadays, Western countries have to face the growing population of seniors. New technologies can help people stay at home by providing a secure environment and improving their quality of life. The use of computer vision systems offers a new promising solution to analyze people's behavior and detect unusual events. In this paper, we propose a new method to detect falls, which are one of the greatest risks for seniors living alone. Our approach is based on a combination of motion history and human shape variation. Our algorithm provides promising results on video sequences of daily activities and simulated falls.
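The motion-history component of such an approach can be sketched with the classic motion history image (MHI) update: pixels that just moved are stamped with a maximum value and all others decay over time, so a fall produces a large, fast-changing MHI region. A minimal version (this is the standard MHI recurrence, not necessarily the paper's exact formulation):

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=10):
    """Motion History Image update: set moving pixels to tau, decay the rest by 1."""
    return np.where(motion_mask, float(tau), np.maximum(mhi - 1.0, 0.0))

# Two frames of motion at different pixels: the older one has already decayed.
mhi = np.zeros((4, 4))
m1 = np.zeros((4, 4), bool); m1[0, 0] = True
m2 = np.zeros((4, 4), bool); m2[1, 1] = True
mhi = update_mhi(mhi, m1)
mhi = update_mhi(mhi, m2)
```

A fall detector built on this would then combine a motion coefficient derived from the MHI with shape-variation cues from the person's silhouette, as the abstract describes.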

Proceedings ArticleDOI
26 Dec 2007
TL;DR: A novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem, formulated in a hypothesis selection framework and builds upon a state-of-the-art pedestrian detector.
Abstract: We present a novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem. It is formulated in a hypothesis selection framework and builds upon a state-of-the-art pedestrian detector. At each time instant, it searches for the globally optimal set of spacetime trajectories which provides the best explanation for the current image and for all evidence collected so far, while satisfying the constraints that no two objects may occupy the same physical space, nor explain the same image pixels at any point in time. Successful trajectory hypotheses are fed back to guide object detection in future frames. The optimization procedure is kept efficient through incremental computation and conservative hypothesis pruning. The resulting approach can initialize automatically and track a large and varying number of persons over long periods and through complex scenes with clutter, occlusions, and large-scale background changes. Also, the global optimization framework allows our system to recover from mismatches and temporarily lost tracks. We demonstrate the feasibility of the proposed approach on several challenging video sequences.

Journal ArticleDOI
TL;DR: A Bayesian model to estimate the a posteriori probability of the object class after a certain match at a node of the tree is presented; it takes into account object scale and saliency and allows for a principled setting of the matching thresholds such that unpromising paths in the tree traversal process are eliminated early on.
Abstract: This paper presents a novel probabilistic approach to hierarchical, exemplar-based shape matching. No feature correspondence is needed among exemplars, just a suitable pairwise similarity measure. The approach uses a template tree to efficiently represent and match the variety of shape exemplars. The tree is generated offline by a bottom-up clustering approach using stochastic optimization. Online matching involves a simultaneous coarse-to-fine approach over the template tree and over the transformation parameters. The main contribution of this paper is a Bayesian model to estimate the a posteriori probability of the object class, after a certain match at a node of the tree. This model takes into account object scale and saliency and allows for a principled setting of the matching thresholds such that unpromising paths in the tree traversal process are eliminated early on. The proposed approach was tested in a variety of application domains. Here, results are presented on one of the more challenging domains: real-time pedestrian detection from a moving vehicle. A significant speed-up is obtained when comparing the proposed probabilistic matching approach with a manually tuned nonprobabilistic variant, both utilizing the same template tree structure.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A robust multi-layer background subtraction technique that takes advantage of local texture features represented by local binary patterns (LBP) and photometric-invariant color measurements in RGB color space, and implicitly smooths detection results over regions of similar intensity while preserving object boundaries.
Abstract: In this paper, we propose a robust multi-layer background subtraction technique which takes advantage of local texture features represented by local binary patterns (LBP) and photometric-invariant color measurements in RGB color space. LBP works robustly with respect to illumination variation on rich texture regions, but not so efficiently on uniform regions. In the latter case, color information should overcome LBP's limitation. Due to the illumination invariance of both the LBP feature and the selected color feature, the method is able to handle local illumination changes such as cast shadows from moving objects. Due to the use of a simple layer-based strategy, the approach can model moving background pixels with quasi-periodic flickering as well as background scenes which may vary over time due to the addition and removal of long-term stationary objects. Finally, the use of a cross-bilateral filter implicitly smooths detection results over regions of similar intensity while preserving object boundaries. Numerical and qualitative experimental results on both simulated and real data demonstrate the robustness of the proposed method.
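The LBP texture feature used here assigns each pixel an 8-bit code by comparing its 3x3 neighbors against the center value, which makes it invariant to any monotonic illumination change (scaling or shifting all intensities preserves every comparison). A minimal sketch of the code computation (the clockwise neighbor ordering is one common convention, an assumption):

```python
import numpy as np

def lbp_code(patch3x3):
    """8-bit LBP code of the center pixel: threshold neighbors at the center value."""
    c = patch3x3[1, 1]
    # Neighbors in clockwise order starting from the top-left corner.
    nbrs = [patch3x3[0, 0], patch3x3[0, 1], patch3x3[0, 2], patch3x3[1, 2],
            patch3x3[2, 2], patch3x3[2, 1], patch3x3[2, 0], patch3x3[1, 0]]
    return sum(int(v >= c) << i for i, v in enumerate(nbrs))

flat = np.full((3, 3), 5.0)                                   # uniform region
edge = np.array([[9., 9., 9.], [1., 5., 9.], [1., 1., 1.]])   # diagonal edge
```

As the abstract notes, the uniform patch produces a degenerate all-ones code, which is exactly why the method falls back on color information in textureless regions.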

Journal ArticleDOI
TL;DR: This paper presents a volumetric formulation for the multiview stereo problem which is amenable to a computationally tractable global optimization using Graph-cuts and uses an occlusion robust photo-consistency metric based on normalized cross correlation, which does not assume any geometric knowledge about the reconstructed object.
Abstract: This paper presents a volumetric formulation for the multiview stereo problem which is amenable to a computationally tractable global optimization using Graph-cuts. Our approach is to seek the optimal partitioning of 3D space into two regions labeled as "object" and "empty" under a cost functional consisting of the following two terms: 1) A term that forces the boundary between the two regions to pass through photo-consistent locations; and 2) a ballooning term that inflates the "object" region. To take account of the effect of occlusion on the first term, we use an occlusion robust photo-consistency metric based on normalized cross correlation, which does not assume any geometric knowledge about the reconstructed object. The globally optimal 3D partitioning can be obtained as the minimum cut solution of a weighted graph.

Proceedings ArticleDOI
25 Apr 2007
TL;DR: MeshEye is introduced, an energy-efficient smart camera mote architecture that has been designed with intelligent surveillance as the target application in mind; basic vision algorithms for object detection, acquisition, and tracking are described and illustrated on real-world data.
Abstract: Surveillance is one of the promising applications to which smart camera motes forming a vision-enabled network can add increasing levels of intelligence. We see a high degree of in-node processing in combination with distributed reasoning algorithms as the key enablers for such intelligent surveillance systems. To put these systems into practice still requires a considerable amount of research ranging from mote architectures, pixel-processing algorithms, up to distributed reasoning engines. This paper introduces MeshEye, an energy-efficient smart camera mote architecture that has been designed with intelligent surveillance as the target application in mind. Special attention is given to MeshEye's unique vision system: a low-resolution stereo vision system continuously determines position, range, and size of moving objects entering its field of view. This information triggers a color camera module to acquire a high-resolution image sub-array containing the object, which can be efficiently processed in subsequent stages. It offers reduced complexity, response time, and power consumption over conventional solutions. Basic vision algorithms for object detection, acquisition, and tracking are described and illustrated on real-world data. The paper also presents a basic power model that estimates lifetime of our smart camera mote in battery-powered operation for intelligent surveillance event processing.

Proceedings ArticleDOI
19 Mar 2007
TL;DR: A model of signal dynamics to allow tracking of transceiver-free objects is proposed based on radio signal strength indicator (RSSI), which is readily available in wireless communication, and three tracking algorithms are proposed to eliminate noise behaviors and improve accuracy.
Abstract: In traditional radio-based localization methods, the target object has to carry a transmitter (e.g., active RFID), a receiver (e.g., 802.11x detector), or a transceiver (e.g., sensor node). However, in some applications, such as safe guard systems, it is not possible to meet this precondition. In this paper, we propose a model of signal dynamics to allow tracking of transceiver-free objects. Based on radio signal strength indicator (RSSI), which is readily available in wireless communication, three tracking algorithms are proposed to eliminate noise behaviors and improve accuracy. The midpoint and intersection algorithms can be applied to track a single object without calibration, while the best-cover algorithm has potential to track multiple objects but requires calibration. Our experimental test-bed is a grid sensor array based on MICA2 sensor nodes. The experimental results show that the best side length between sensor nodes in the grid is 2 meters, and the best-cover algorithm can reach a localization accuracy of 0.99 m.
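The calibration-free midpoint idea can be sketched very simply: a transceiver-free object perturbs the RSSI of nearby links, so pick the link whose signal changed most and place the object at its midpoint. A minimal illustration (the link tuple format and function name are assumptions for this sketch, not the paper's code):

```python
def midpoint_estimate(links):
    """Estimate the position of a transceiver-free object.

    `links` is a list of ((x1, y1), (x2, y2), rssi_change) tuples, one per
    sensor-to-sensor link. The object most disturbs the link it is closest
    to, so we take the link with the largest absolute RSSI change and
    return the midpoint of its two endpoints.
    """
    p1, p2, _ = max(links, key=lambda link: abs(link[2]))
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2)
```

The intersection and best-cover algorithms refine this by combining evidence from several disturbed links instead of a single one.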

Journal ArticleDOI
TL;DR: This paper presents novel classification algorithms for recognizing object activity using object motion trajectory, and uses hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology.
Abstract: Motion trajectories provide rich spatiotemporal information about an object's activity. This paper presents novel classification algorithms for recognizing object activity using object motion trajectory. In the proposed classification system, trajectories are segmented at points of change in curvature, and the subtrajectories are represented by their principal component analysis (PCA) coefficients. We first present a framework to robustly estimate the multivariate probability density function based on PCA coefficients of the subtrajectories using Gaussian mixture models (GMMs). We show that GMM-based modeling alone cannot capture the temporal relations and ordering between underlying entities. To address this issue, we use hidden Markov models (HMMs) with a data-driven design in terms of number of states and topology (e.g., left-right versus ergodic). Experiments using a database of over 5700 complex trajectories (obtained from UCI-KDD data archives and Columbia University Multimedia Group) subdivided into 85 different classes demonstrate the superiority of our proposed HMM-based scheme using PCA coefficients of subtrajectories in comparison with other techniques in the literature.
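The first stage described above segments each trajectory at points of change in curvature before PCA is applied to the pieces. A simplified stand-in for that step, splitting a 2-D trajectory wherever the heading turns sharply (the angle threshold and function name are illustrative; the paper's curvature criterion is more refined):

```python
import math

def segment_at_turns(points, angle_thresh=0.5):
    """Split a 2-D trajectory (list of (x, y) points) into subtrajectories
    wherever the heading changes by more than `angle_thresh` radians.

    Each resulting subtrajectory would then be represented by its PCA
    coefficients in the classification pipeline.
    """
    segments, start = [], 0
    for i in range(1, len(points) - 1):
        ax, ay = points[i][0] - points[i - 1][0], points[i][1] - points[i - 1][1]
        bx, by = points[i + 1][0] - points[i][0], points[i + 1][1] - points[i][1]
        turn = abs(math.atan2(by, bx) - math.atan2(ay, ax))
        turn = min(turn, 2 * math.pi - turn)  # handle angle wraparound
        if turn > angle_thresh:
            segments.append(points[start:i + 1])
            start = i
    segments.append(points[start:])
    return segments
```

Modeling the resulting subtrajectory features with GMMs captures their distribution, but, as the abstract notes, an HMM is needed to capture their temporal ordering.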

Proceedings ArticleDOI
02 Jul 2007
TL;DR: A passive approach to detect digital forgeries by checking inconsistencies of blocking artifact based on the estimated quantization table using the power spectrum of the DCT coefficient histogram is described.
Abstract: Digital images can be forged easily with today's widely available image processing software. In this paper, we describe a passive approach to detect digital forgeries by checking inconsistencies of blocking artifact. Given a digital image, we find that the blocking artifacts introduced during JPEG compression could be used as a "natural authentication code". A blocking artifact measure is then proposed based on the estimated quantization table using the power spectrum of the DCT coefficient histogram. Experimental results also demonstrate the validity of the proposed approach.
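The key observation is that JPEG-quantized DCT coefficients cluster at integer multiples of the quantization step, so the step can be recovered from the coefficients themselves. The paper does this via the power spectrum of the DCT coefficient histogram; the gcd-based version below is a much simplified stand-in that conveys the same intuition (function name and interface are assumptions):

```python
from functools import reduce
from math import gcd

def estimate_quant_step(dct_coeffs):
    """Estimate the quantization step for one DCT frequency band.

    After JPEG quantization and dequantization, coefficients are integer
    multiples of the step, so (in the noise-free case) the step is the gcd
    of the rounded nonzero coefficients. Returns None if no nonzero
    coefficients are available.
    """
    vals = [abs(round(c)) for c in dct_coeffs if round(c) != 0]
    if not vals:
        return None
    return reduce(gcd, vals)
```

A forged region pasted from a differently compressed image yields an inconsistent estimated table, which is exactly the blocking-artifact inconsistency the detector checks for.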

Journal ArticleDOI
TL;DR: A robust face detection technique along with mouth localization, processing every frame in real time (video rate), is presented and "liveness" verification barriers are proposed as applications for which a significant amount of computation is avoided when estimating motion.
Abstract: A robust face detection technique along with mouth localization, processing every frame in real time (video rate), is presented. Moreover, it is exploited for motion analysis onsite to verify "liveness" as well as to achieve lip reading of digits. A methodological novelty is the suggested quantized angle features ("quangles") being designed for illumination invariance without the need for preprocessing (e.g., histogram equalization). This is achieved by using both the gradient direction and the double angle direction (the structure tensor angle), and by ignoring the magnitude of the gradient. Boosting techniques are applied in a quantized feature space. A major benefit is reduced processing time (i.e., the training of effective cascaded classifiers is feasible in very short time, less than 1 h for data sets of order 10^4). Scale invariance is implemented through the use of an image scale pyramid. We propose "liveness" verification barriers as applications for which a significant amount of computation is avoided when estimating motion. Novel strategies to avert advanced spoofing attempts (e.g., replayed videos which include person utterances) are demonstrated. We present favorable results on face detection for the YALE face test set and competitive results for the CMU-MIT frontal face test set as well as on "liveness" verification barriers.
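The core of the quangle idea is easy to sketch: quantize the gradient direction at a pixel into a small number of sectors and discard the magnitude, so that a global illumination change (which scales magnitudes but preserves directions) leaves the feature unchanged. A minimal illustration (function name and bin count are assumptions, not the paper's exact parameterization):

```python
import math

def quangle(gx, gy, bins=8):
    """Quantized gradient angle ('quangle') for one pixel.

    Maps the gradient direction given by components (gx, gy) into one of
    `bins` equal sectors of the full circle, ignoring the gradient
    magnitude entirely; ignoring magnitude is what buys illumination
    invariance without preprocessing.
    """
    angle = math.atan2(gy, gx) % (2 * math.pi)
    return int(angle / (2 * math.pi / bins)) % bins
```

In the full system, both this single-angle feature and the double-angle (structure tensor) direction are quantized, and boosting then selects discriminative combinations in that quantized feature space.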