
Showing papers on "Object-class detection published in 2012"


Book ChapterDOI
05 Nov 2012
TL;DR: A framework for automatic modeling, detection, and tracking of 3D objects with a Kinect and shows how to build the templates automatically from 3D models, and how to estimate the 6 degrees-of-freedom pose accurately and in real-time.
Abstract: We propose a framework for automatic modeling, detection, and tracking of 3D objects with a Kinect. The detection part is mainly based on the recent template-based LINEMOD approach [1] for object detection. We show how to build the templates automatically from 3D models, and how to estimate the 6 degrees-of-freedom pose accurately and in real-time. The pose estimation and the color information allow us to check the detection hypotheses, improving the correct detection rate by 13% with respect to the original LINEMOD. These improvements make our framework suitable for object manipulation in robotics applications. Moreover, we propose a new dataset of 15 registered video sequences of 1100+ frames each, covering 15 different objects, for the evaluation of future competing methods.

1,114 citations


Journal ArticleDOI
TL;DR: A vision-based human-computer interface that detects voluntary eye-blinks and interprets them as control commands; test results indicate the interface's usefulness as an alternative means of communication with computers.
Abstract: A vision-based human-computer interface is presented in the paper. The interface detects voluntary eye-blinks and interprets them as control commands. The employed image processing methods include Haar-like features for automatic face detection, and template matching based eye tracking and eye-blink detection. Interface performance was tested by 49 users (of whom 12 had physical disabilities). Test results indicate the interface's usefulness as an alternative means of communication with computers. The users entered English and Polish text (with an average time of less than 12 s per character) and were able to browse the Internet. The interface runs on a notebook equipped with a typical web camera and requires no extra light sources. The interface application is available on-line as open-source software.

192 citations
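The template-matching step used for eye tracking and blink detection can be illustrated with normalized cross-correlation: the correlation with an open-eye template drops sharply when the eye closes. A minimal sketch, assuming hypothetical 1-D intensity patches and an assumed blink threshold (the paper's actual templates and thresholds are not specified here):

```python
from math import sqrt

def ncc(patch, template):
    """Normalized cross-correlation between two equal-sized grayscale patches."""
    n = len(patch)
    mp = sum(patch) / n
    mt = sum(template) / n
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, template))
    den = sqrt(sum((p - mp) ** 2 for p in patch)
               * sum((t - mt) ** 2 for t in template))
    return num / den if den else 0.0

# Hypothetical 1-D "patches": an open-eye template and two observed frames.
open_eye = [10, 80, 200, 80, 10]      # bright sclera between dark lid edges
frame_open = [12, 78, 190, 85, 14]    # similar structure: eye open
frame_closed = [65, 60, 58, 61, 64]   # flat skin tone: lid shut

BLINK_THRESHOLD = 0.7  # assumed value
eye_is_open = ncc(frame_open, open_eye) > BLINK_THRESHOLD      # high correlation
blink_detected = ncc(frame_closed, open_eye) <= BLINK_THRESHOLD  # correlation collapses
```

A blink command would then be issued when `blink_detected` holds for a voluntary (longer-than-reflex) duration.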


01 Jan 2012
TL;DR: In this paper, a survey of various techniques for improving the security of video surveillance systems is presented, covering background subtraction with alpha, statistical methods, Eigen-background subtraction and temporal frame differencing to detect moving objects.
Abstract: This paper presents a survey of various techniques related to video surveillance systems for improving security. The goal of this paper is to review various moving object detection and object tracking methods. This paper focuses on detection of moving objects in a video surveillance system and then tracking the detected objects in the scene. Moving object detection is the first low-level task for any video surveillance application, and it is a challenging one. Tracking is required in higher-level applications that need the location and shape of the object in every frame. In this survey, I describe background subtraction with alpha, statistical methods, Eigen-background subtraction and temporal frame differencing to detect moving objects. I also describe tracking methods based on point tracking, kernel tracking and silhouette tracking.

186 citations
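The "background subtraction with alpha" technique surveyed above can be sketched in a few lines: a running-average background model updated with learning rate alpha, and a thresholded absolute difference giving the foreground mask. All pixel values below are hypothetical:

```python
def update_background(background, frame, alpha=0.05):
    """Running-average background model: B <- (1 - alpha) * B + alpha * F."""
    return [(1 - alpha) * b + alpha * f for b, f in zip(background, frame)]

def detect_moving(background, frame, threshold=30):
    """Foreground mask: pixels deviating from the background beyond a threshold."""
    return [abs(f - b) > threshold for f, b in zip(frame, background)]

# Hypothetical 1-D scanline of grayscale pixels.
background = [100.0, 100.0, 100.0, 100.0]
frame = [102, 99, 180, 101]   # a bright moving object covers pixel 2

mask = detect_moving(background, frame)            # [False, False, True, False]
background = update_background(background, frame)  # slowly absorbs scene changes
```

Temporal frame differencing is the special case of thresholding `|frame_t - frame_{t-1}|` with no persistent model.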


Patent
30 May 2012
TL;DR: In this article, a system and method for device control using face detection or hand-gesture detection algorithms in a captured image is presented.
Abstract: System and method for control using face detection or hand gesture detection algorithms in a captured image. Based on the existence of a detected human face or a hand gesture in an image captured by a digital camera, a control signal is generated and provided to a device. The control may supply or disconnect power to the device (or part of the device circuits). The location of the detected face in the image may be used to rotate a display screen to achieve a better line of sight with a viewing person. The difference between the location of the detected face and an optimal location is the error to be corrected by rotating the display to the required angular position. Hand-gesture detection can be used as a replacement for a remote control for the controlled unit, such as a television set.

128 citations


21 Apr 2012
TL;DR: The system tries to detect the critical areas of the face using an information theory approach that decomposes face images into a small set of characteristic feature images called "Eigenfaces", which are the principal components of the initial training set of face images.
Abstract: The system tries to detect the critical areas of the face. It is based on matching the image to a map of invariant facial attributes associated with specific areas of the face. PCA: The proposed system is based on an information theory approach that decomposes face images into a small set of characteristic feature images called "Eigenfaces", which are actually the principal components of the initial training set of face images. Recognition is performed by projecting a new image into the subspace spanned by the Eigenfaces ("face space") and then classifying the face by comparing its position in face space with the positions of known individuals. The Eigenface approach gives us an efficient way to find this lower-dimensional space. Eigenfaces are the eigenvectors representative of each of the dimensions of this face space, and they can be considered as various face features. Any face can be expressed as a linear combination of the singular vectors of the set of faces, and these singular vectors are eigenvectors of the covariance matrix.

127 citations
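The recognition step described above (projecting a new image into face space and comparing with known individuals) can be sketched as follows, assuming the mean face and an orthonormal eigenface basis have already been computed offline; all data here is a toy, hypothetical example with 4-pixel "faces":

```python
def project(face, mean_face, eigenfaces):
    """Project a mean-centred face onto the eigenface basis -> weight vector."""
    centred = [p - m for p, m in zip(face, mean_face)]
    return [sum(c * e for c, e in zip(centred, ef)) for ef in eigenfaces]

def nearest(weights, gallery):
    """Nearest neighbour in face space (squared Euclidean distance)."""
    return min(gallery,
               key=lambda name: sum((w - g) ** 2
                                    for w, g in zip(weights, gallery[name])))

# Hypothetical toy data: 4-pixel "faces", two orthonormal eigenfaces.
mean_face = [100, 100, 100, 100]
eigenfaces = [[0.5, 0.5, -0.5, -0.5],
              [0.5, -0.5, 0.5, -0.5]]
gallery = {"alice": [20.0, 0.0], "bob": [-20.0, 10.0]}  # enrolled weight vectors

probe = [120, 121, 80, 79]   # bright top, dark bottom: resembles alice
w = project(probe, mean_face, eigenfaces)
identity = nearest(w, gallery)   # -> "alice"
```

In a real system the eigenfaces come from an eigen-decomposition of the training set's covariance matrix, and a distance-to-face-space check rejects non-faces before classification.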


Proceedings ArticleDOI
06 Aug 2012
TL;DR: Starting from a set of automatically located facial points, geometric invariants are exploited for detecting replay attacks and the presented results demonstrate the effectiveness and efficiency of the proposed indices.
Abstract: Face recognition provides many advantages compared with other available biometrics, but it is particularly subject to spoofing. The most accurate methods in the literature addressing this problem rely on estimating the three-dimensionality of faces, which heavily increases the whole cost of the system. This paper proposes an effective and efficient solution to the problem of face spoofing. Starting from a set of automatically located facial points, we exploit geometric invariants for detecting replay attacks. The presented results demonstrate the effectiveness and efficiency of the proposed indices.

125 citations


Proceedings ArticleDOI
Junjie Yan, Zhiwei Zhang, Zhen Lei, Dong Yi, Stan Z. Li
01 Dec 2012
TL;DR: Three scenic clues are proposed, which are non-rigid motion, face-background consistency and imaging banding effect, to conduct accurate and efficient face liveness detection, which achieves 100% accuracy on Idiap print-attack database and the best performance on self-collected face anti-spoofing database.
Abstract: Liveness detection is an indispensable guarantee for reliable face recognition, and has recently received enormous attention. In this paper we propose three scenic clues, namely non-rigid motion, face-background consistency and image banding effect, to conduct accurate and efficient face liveness detection. The non-rigid motion clue captures facial motions that a genuine face can exhibit, such as blinking; a low-rank matrix decomposition based image alignment approach is designed to extract this non-rigid motion. The face-background consistency clue assumes that the motion of face and background is highly consistent for fake facial photos but has low consistency for genuine faces; this consistency serves as an efficient liveness clue, explored with a GMM-based motion detection method. The image banding effect reflects the imaging quality defects introduced in fake face reproduction, which can be detected by wavelet decomposition. Fusing these three clues yields complementary evidence for liveness detection. The proposed face liveness detection method achieves 100% accuracy on the Idiap print-attack database and the best performance on a self-collected face anti-spoofing database.

98 citations


Proceedings ArticleDOI
13 Oct 2012
TL;DR: An object detection and localization scheme for 3D objects that combines intensity and depth data is presented, proving that it is generic and highly robust to occlusions and clutter.
Abstract: Object detection and localization is a crucial step for inspection and manipulation tasks in robotic and industrial applications. We present an object detection and localization scheme for 3D objects that combines intensity and depth data. A novel multimodal, scale- and rotation-invariant feature is used to simultaneously describe the object's silhouette and surface appearance. The object's position is determined by matching scene and model features via a Hough-like local voting scheme. The proposed method is quantitatively and qualitatively evaluated on a large number of real sequences, proving that it is generic and highly robust to occlusions and clutter. Comparisons with state of the art methods demonstrate comparable results and higher robustness with respect to occlusions.

97 citations
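The Hough-like local voting scheme mentioned above can be illustrated simply: each scene feature matched to a model feature adds its stored offset to vote for the object centre, and consistent matches accumulate at the true position while clutter votes scatter. All coordinates below are hypothetical:

```python
from collections import Counter

def vote_for_centre(matches):
    """Each scene feature matched to a model feature votes for the object centre
    by adding its stored feature-to-centre offset to its scene location."""
    votes = Counter()
    for (sx, sy), (ox, oy) in matches:
        votes[(sx + ox, sy + oy)] += 1
    return votes.most_common(1)[0]  # (peak cell, vote count)

# Hypothetical matches: (scene position, offset from feature to model centre).
matches = [
    ((10, 10), (5, 5)),    # three consistent features all point at (15, 15)
    ((20, 12), (-5, 3)),
    ((12, 20), (3, -5)),
    ((40, 40), (2, 2)),    # clutter match votes elsewhere
]
centre, count = vote_for_centre(matches)   # -> (15, 15), 3
```

A real implementation quantizes votes into bins and also accumulates over scale and rotation, but the peak-finding principle that makes the method robust to occlusion and clutter is the same: missing features only lower the peak, they do not move it.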


Proceedings ArticleDOI
25 Nov 2012
TL;DR: An RGB-D database containing 1581 images (and their depth counterparts) taken from 31 persons in 17 different poses and facial expressions using a Kinect device is proposed and used in a face detection algorithm which is based on the depth information of the images.
Abstract: The very first step in many facial analysis systems is face detection. Though face detection has been studied for many years, there is still no benchmark public database, widely accepted among researchers, for which both color and depth information are obtained by the same sensor. Most of the available 3D databases contain face images that have already been automatically or manually detected, and they are therefore mostly used for face recognition rather than detection. This paper proposes an RGB-D database containing 1581 images (and their depth counterparts) taken from 31 persons in 17 different poses and facial expressions using a Kinect device. The faces are extracted neither in the RGB images nor in their depth counterparts, so the database can be used for both detection and recognition. The proposed database has been used in a face detection algorithm based on the depth information of the images. The challenges and merits of the database are highlighted through experimental results.

89 citations


Proceedings Article
03 Dec 2012
TL;DR: This work proposes strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations, and shows that these strategies are more elegant than sliding windows.
Abstract: The dominant visual search paradigm for object class detection is sliding windows. Although simple and effective, it is also wasteful, unnatural and rigidly hardwired. We propose strategies to search for objects which intelligently explore the space of windows by making sequential observations at locations decided based on previous observations. Our strategies adapt to the class being searched and to the content of a particular test image, exploiting context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. In addition to being more elegant than sliding windows, we demonstrate experimentally on the PASCAL VOC 2010 dataset that our strategies evaluate two orders of magnitude fewer windows while achieving higher object detection performance.

86 citations


Journal ArticleDOI
TL;DR: A figure/ground segmentation method for extraction of image regions that resemble the global properties of a model boundary structure and are perceptually salient and achieves state-of-the-art performance on several object detection and segmentation benchmarks.
Abstract: We address the problem of object detection and segmentation using global holistic properties of object shape. Global shape representations are highly susceptible to clutter inevitably present in realistic images, and thus can be applied robustly only using a precise segmentation of the object. To this end, we propose a figure/ground segmentation method for extraction of image regions that resemble the global properties of a model boundary structure and are perceptually salient. Our shape representation, called the chordiogram, is based on geometric relationships of object boundary edges, while the perceptual saliency cues we use favor coherent regions distinct from the background. We formulate the segmentation problem as an integer quadratic program and use a semidefinite programming relaxation to solve it. The obtained solutions provide a segmentation of the object as well as a detection score used for object recognition. Our single-step approach achieves state-of-the-art performance on several object detection and segmentation benchmarks.

Book ChapterDOI
28 Aug 2012
TL;DR: A novel multi-view (multi-camera) detection approach that combines single-view detections from multiple views and takes advantage of the mutual reinforcement of geometrically consistent hypotheses is proposed.
Abstract: Motivated by aiding human operators in the detection of dangerous objects in passenger luggage, such as in airports, we develop an automatic object detection approach for multi-view X-ray image data. We make three main contributions: First, we systematically analyze the appearance variations of objects in X-ray images from inspection systems. We then address these variations by adapting standard appearance-based object detection approaches to the specifics of dual-energy X-ray data and the inspection scenario itself. To that end we reduce projection distortions, extend the feature representation, and address both in-plane and out-of-plane object rotations, which are a key challenge compared to many detection tasks in photographic images. Finally, we propose a novel multi-view (multi-camera) detection approach that combines single-view detections from multiple views and takes advantage of the mutual reinforcement of geometrically consistent hypotheses. While our multi-view approach can be used atop arbitrary single-view detectors, thus also for multi-camera detection in photographic images, we evaluate our method on detecting handguns in carry-on luggage. Our results show significant performance gains from all components.

Journal ArticleDOI
TL;DR: Extensive experimental results on the BU3D database indicate the effectiveness of the proposed C-RBF network for recovering the 3-D face model from a single 2-Dface image.
Abstract: Reconstruction of a 3-D face model from a single 2-D face image is fundamentally important for face recognition and animation because the 3-D face model is invariant to changes of viewpoint, illumination, background clutter, and occlusions. Given a coupled training set that contains pairs of 2-D faces and the corresponding 3-D faces, we train a novel coupled radial basis function network (C-RBF) to recover the 3-D face model from a single 2-D face image. The C-RBF network explores: 1) the intrinsic representations of 3-D face models and those of 2-D face images; 2) mappings between a 3-D face model and its intrinsic representation; and 3) mappings between a 2-D face image and its intrinsic representation. Since a particular face can be reconstructed by its nearest neighbors, we can assume that the linear combination coefficients for a particular 2-D face image reconstruction are identical to those for the corresponding 3-D face model reconstruction. Therefore, we can reconstruct a 3-D face model by using a single 2-D face image based on the C-RBF network. Extensive experimental results on the BU3D database indicate the effectiveness of the proposed C-RBF network for recovering the 3-D face model from a single 2-D face image.

Book ChapterDOI
05 Nov 2012
TL;DR: Experimental results show that the proposed face recognition algorithm outperforms two commercial state-of-the-art face recognition SDKs (FaceVACS and PittPatt) for long distance face recognition in both daytime and nighttime operations, highlighting the need for better data capture setup and robust face matching algorithms for cross spectral matching at distances greater than 100 meters.
Abstract: Automatic face recognition capability in surveillance systems is important for security applications. However, few studies have addressed the problem of outdoor face recognition at a long distance (over 100 meters) in both daytime and nighttime environments. In this paper, we first report on a system that we have designed to collect face image database at a long distance, called the Long Distance Heterogeneous Face Database (LDHF-DB) to advance research on this topic. The LDHF-DB contains face images collected in an outdoor environment at distances of 60 meters, 100 meters, and 150 meters, with both visible light (VIS) face images captured in daytime and near infrared (NIR) face images captured in nighttime. Given this database, we have conducted two types of cross-distance face matching (matching long-distance probe to 1-meter gallery) experiments: (i) intra-spectral (VIS to VIS) face matching, and (ii) cross-spectral (NIR to VIS) face matching. The proposed face recognition algorithm consists of following three major steps: (i) Gaussian filtering to remove high frequency noise, (ii) Scale Invariant Feature Transform (SIFT) in local image regions for feature representation, and (iii) a random subspace method to build discriminant subspaces for face recognition. Experimental results show that the proposed face recognition algorithm outperforms two commercial state-of-the-art face recognition SDKs (FaceVACS and PittPatt) for long distance face recognition in both daytime and nighttime operations. These results highlight the need for better data capture setup and robust face matching algorithms for cross spectral matching at distances greater than 100 meters.

Proceedings ArticleDOI
03 Jul 2012
TL;DR: Object measurement using a stereo camera is better than object detection using a single camera, as proposed in many previous research works, because it is much easier to calibrate and produces more accurate results.
Abstract: Humans have the ability to roughly estimate the distance and size of an object because of the stereo vision of the human eyes. In this project, we propose to utilize a stereo vision system to accurately measure the distance and size (height and width) of an object in view. Object size identification is very useful in building systems or applications, especially in autonomous system navigation. Many recent works have started to use multiple vision sensors or cameras for different types of applications such as 3D image construction, occlusion detection, etc. Multiple-camera systems have become more popular since cameras are now very cheap and easy to deploy and utilize. The proposed measurement system consists of object detection on the stereo images, blob extraction, distance and size calculation, and object identification. The system also employs a fast algorithm so that the measurement can be done in real time. Object measurement using a stereo camera is better than object detection using a single camera, as proposed in many previous research works: it is much easier to calibrate and produces more accurate results.
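The distance and size calculation step reduces to pinhole stereo geometry: depth is focal length times baseline divided by disparity, and metric size follows by back-projecting the pixel extent. A minimal sketch with assumed (not the paper's) calibration values:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo depth: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

def object_size(depth_m, size_px, focal_px):
    """Back-project a pixel extent to metric size: S = Z * s / f."""
    return depth_m * size_px / focal_px

# Hypothetical calibration: 700 px focal length, 10 cm baseline.
z = depth_from_disparity(focal_px=700, baseline_m=0.10, disparity_px=35)  # ~2.0 m
h = object_size(z, size_px=175, focal_px=700)                             # ~0.5 m tall
```

In practice the disparity comes from matching the object blob between the rectified left and right images, and the formula's sensitivity to small disparities is why accuracy degrades with distance.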

Proceedings ArticleDOI
18 Sep 2012
TL;DR: A video processing chain for detection, segmentation, and tracking of multiple moving objects is presented dealing with the mentioned challenges of camera motion, high object distance, varying object background, multiple objects near to each other, weak signal-to-noise-ratio (SNR), or compression artifacts.
Abstract: Automatic processing of videos coming from small UAVs offers high potential for advanced surveillance applications but is also very challenging. These challenges include camera motion, high object distance, varying object background, multiple objects near to each other, weak signal-to-noise ratio (SNR), and compression artifacts. In this paper, a video processing chain for detection, segmentation, and tracking of multiple moving objects is presented that deals with the mentioned challenges. The foundation is the detection of local image features that are not stationary. By clustering these features and subsequent object segmentation, regions are generated representing object hypotheses. Multi-object tracking is introduced using a Kalman filter and considering the camera motion. Split or merged object regions are handled by fusion of the regions and the local features. Finally, a quantitative evaluation of object segmentation and tracking is provided.
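The Kalman filtering at the core of the tracker above can be illustrated with the smallest possible case: a scalar filter with a random-walk motion model smoothing noisy position detections. This is a deliberately simplified stand-in (the paper's tracker additionally models velocity and compensates camera motion; all values here are hypothetical):

```python
def kalman_step(x, p, z, q=1.0, r=4.0):
    """One predict+update cycle of a scalar Kalman filter.
    x, p: state estimate and its variance; z: new measurement;
    q, r: assumed process and measurement noise variances."""
    # Predict: random-walk model keeps the state, uncertainty grows.
    p = p + q
    # Update: the gain trades off prediction vs. measurement confidence.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1 - k) * p
    return x, p

# Track a horizontal object position from noisy detections (hypothetical pixels).
x, p = 0.0, 100.0   # vague prior: unknown position, large variance
for z in [10.2, 9.8, 10.1, 10.0]:
    x, p = kalman_step(x, p, z)
# x converges towards ~10 while the variance p shrinks
```

During a missed detection only the predict half runs, which is what lets the tracker bridge short occlusions or split/merged regions.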

Proceedings ArticleDOI
16 Jun 2012
TL;DR: It is argued that “one class” methods - ones that focus mainly on modelling the range of the positive class - are a useful alternative to binary discriminants in such applications, particularly in the early stages of the cascade where one-class approaches may allow simpler classifiers and faster rejection.
Abstract: An object detector must detect and localize each instance of the object class of interest in the image. Many recent detectors adopt a sliding window approach, reducing the problem to one of deciding whether the detection window currently contains a valid object instance or background. Machine learning based discriminants such as SVM and boosting are typically used for this, often in the form of classifier cascades to allow more rapid rejection of easy negatives. We argue that “one class” methods — ones that focus mainly on modelling the range of the positive class — are a useful alternative to binary discriminants in such applications, particularly in the early stages of the cascade where one-class approaches may allow simpler classifiers and faster rejection. We implement this in the form of a short cascade of efficient nearest-convex-model one-class classifiers, starting with linear distance-to-affine-hyperplane and interior-of-hypersphere classifiers and finishing with kernelized hypersphere classifiers. We show that our methods have very competitive performance on the Faces in the Wild and ESOGU face detection datasets and state-of-the-art performance on the INRIA Person dataset. As predicted, the one-class formulations provide significant reductions in classifier complexity relative to the corresponding two-class ones.
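The interior-of-hypersphere classifier described above can be sketched as a centroid-plus-radius model over the positive class: fit on positives only, then reject any window falling outside the sphere. This is a simplified stand-in for the paper's nearest-convex-model formulation; the slack factor and toy 2-D features are assumptions:

```python
from math import sqrt

def fit_hypersphere(positives):
    """One-class model: centroid of the positive class plus the radius
    that covers all training positives."""
    n, dim = len(positives), len(positives[0])
    centre = [sum(p[i] for p in positives) / n for i in range(dim)]
    dist = lambda p: sqrt(sum((a - b) ** 2 for a, b in zip(p, centre)))
    return centre, max(dist(p) for p in positives)

def is_positive(x, centre, radius, slack=1.1):
    """Accept a window iff it lies inside the (slightly inflated) hypersphere."""
    d = sqrt(sum((a - b) ** 2 for a, b in zip(x, centre)))
    return d <= radius * slack

# Hypothetical 2-D "feature vectors" for face windows.
faces = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
centre, radius = fit_hypersphere(faces)

near_face = is_positive([1.05, 1.0], centre, radius)   # inside: accept
background = is_positive([5.0, 5.0], centre, radius)   # far outside: fast reject
```

The appeal in a cascade is exactly this cheapness: one distance computation rejects easy negatives, with no negative training set needed for the early stages.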

Proceedings Article
01 Nov 2012
TL;DR: This work proposes an approach for multi-pose face tracking by association of face detection responses in two stages using multiple cues based on location, size and pose, resulting in short but reliable tracklets.
Abstract: We propose an approach for multi-pose face tracking by association of face detection responses in two stages using multiple cues. The low-level stage uses a two-threshold strategy to merge detection responses based on location, size and pose, resulting in short but reliable tracklets. The high-level stage uses different cues for computing a joint similarity measure between tracklets. The facial cue compares facial features of the most frontal face detections in pairs of tracklets. The classifier cue learns a discriminative appearance model for each tracklet, using detection pairs within reliable tracklets and between overlapping tracklets as training data. The constraint cue observes the compatibility of motion of two tracklets. The association of tracklets is globally optimized with the Hungarian algorithm. We validate our approach on two challenging episodes of two TV series and report a Multiple Object Tracking Accuracy (MOTA) of 82% and 68.2%, respectively.
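Tracklet association above is globally optimized with the Hungarian algorithm, i.e. a minimum-cost one-to-one assignment over the joint similarity matrix. The sketch below solves the same problem by brute force over permutations, which is only feasible for tiny matrices but makes the objective explicit; the cost values are hypothetical:

```python
from itertools import permutations

def best_assignment(cost):
    """Minimum-cost one-to-one assignment of rows to columns.
    Brute force over permutations: a stand-in for the Hungarian algorithm,
    which solves the same objective in polynomial time."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(best)

# Hypothetical dissimilarities between 3 earlier and 3 later tracklets
# (low cost = facial, appearance and motion cues all agree).
cost = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9],
        [0.8, 0.9, 0.3]]
assignment = best_assignment(cost)   # -> [0, 1, 2]: each tracklet keeps its match
```

In the full system a cost above a threshold would map to "no link", so tracklets can also start or terminate rather than being forced into a match.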

Proceedings ArticleDOI
25 Oct 2012
TL;DR: In the article, a design embedded in a reconfigurable device is presented that uses the Histogram of Oriented Gradients for feature extraction and SVM classification to detect multiple objects.
Abstract: Object detection and localization in a video stream is an important requirement for almost all vision systems. In the article, a design embedded in a reconfigurable device is presented that uses the Histogram of Oriented Gradients for feature extraction and SVM classification to detect multiple objects. Superior accuracy is achieved by performing all computations with single-precision 32-bit floating point values in all stages of image processing. The resulting implementation is fully pipelined and needs no external memory. Finally, a working system is presented that is able to detect and localize three different classes of objects in color images at 640×480 resolution @ 60 fps, with a computational performance above 9 GFLOPS.
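As a rough illustration of the HOG feature extraction named above, the following computes an unsigned orientation histogram for a single cell, the basic building block of the descriptor. Real HOG adds block normalization and bilinear vote interpolation, which are omitted here; the cell data is hypothetical:

```python
from math import atan2, degrees, hypot

def hog_cell(cell, bins=9):
    """Gradient-orientation histogram for one cell of a grayscale image
    (unsigned gradients, 0-180 degrees), weighted by gradient magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # central differences
            gy = cell[y + 1][x] - cell[y - 1][x]
            angle = degrees(atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / bins)) % bins] += hypot(gx, gy)
    return hist

# Hypothetical 4x4 cell containing a strong vertical edge: the gradients are
# horizontal, so all the histogram mass lands in the 0-20 degree bin.
cell = [[0, 0, 255, 255]] * 4
hist = hog_cell(cell)
dominant_bin = hist.index(max(hist))   # -> 0
```

Concatenating such histograms over a grid of cells yields the feature vector the article's SVM classifies; the fixed per-pixel arithmetic is what makes the pipeline amenable to a fully pipelined FPGA implementation.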

Proceedings Article
27 Sep 2012
TL;DR: In this work, face liveness detection approaches are categorized based on the type of liveness indicator used to help understanding different spoof attacks scenarios and their relation to the developed solutions.
Abstract: Face recognition based on 2D images is a widely used biometric approach, mainly due to its simplicity and high usability. Nonetheless, such solutions are vulnerable to spoof attacks made with non-real faces. In order to identify malicious attacks on such biometric systems, 2D face liveness detection approaches have been developed. In this work, face liveness detection approaches are categorized based on the type of liveness indicator used. This categorization helps in understanding different spoof attack scenarios and their relation to the developed solutions. A review of the latest works on face liveness detection is presented, and a discussion links the state-of-the-art solutions with the presented categorization, along with available and possible future datasets. All of this aims to provide a clear path for the future development of innovative face liveness detection solutions.

01 Jan 2012
TL;DR: This research work aims to detect suspicious activities such as object exchange, entry of a new person, peeping into other's answer sheet and person exchange from the video captured by a surveillance camera during examinations by automating 'suspicious activity detection'.
Abstract: Video analytics is the method of processing a video, gathering data and analysing the data to obtain domain-specific information. In the current trend, besides analysing any video for information retrieval, analysing live surveillance videos to detect activities taking place in their coverage area has become more important. Such systems must run in real time. Automated face recognition from surveillance videos becomes easier when using a training model such as an Artificial Neural Network. Hand detection is assisted by skin colour estimation. This research work aims to detect suspicious activities, such as object exchange, entry of a new person, peeping into another's answer sheet and person exchange, from the video captured by a surveillance camera during examinations. This requires face recognition, hand recognition, and detecting the contact between the face and hands of the same person and among different persons. Automating suspicious activity detection will help decrease the error rate due to manual monitoring.

Proceedings ArticleDOI
02 May 2012
TL;DR: A global motion saliency detection method based on spectral analysis, which aims to discover and localise interesting regions, of which the flows are salient in relation to the dominant crowd flows, is presented.
Abstract: To reduce cognitive overload in CCTV monitoring, it is critical to have an automated way to focus the attention of operators on interesting events taking place in crowded public scenes. We present a global motion saliency detection method based on spectral analysis, which aims to discover and localise interesting regions, of which the flows are salient in relation to the dominant crowd flows. The method is fast and does not rely on prior knowledge specific to a scene and any training videos. We demonstrate its potential on public scene videos, with applications in salient action detection, counter flow detection, and unstable crowd flow detection.

Journal ArticleDOI
TL;DR: This study presents a new method for precise detection of frontal human faces and eyes using a multi-level ellipse detector combined with a support vector machines verifier and demonstrates that the detection error propagation substantially affects the face recognition performance.
Abstract: This study presents a new method for precise detection of frontal human faces and eyes using a multi-level ellipse detector combined with a support vector machine verifier. The main contribution of this study lies in improving the accuracy of eye detection in high-quality images, which is often neglected by alternative methods. Although many approaches to face detection have been proposed recently, relatively little attention has been paid to detection precision. It is worth noting that detection precision is particularly important for face analysis purposes. More specifically, the authors demonstrate that detection error propagation substantially affects face recognition performance. With the proposed improvements, the authors have managed to increase the face recognition rate by 7.7% on the AR database compared with the publicly available implementation of the well-established Viola-Jones face and eye detector.

Proceedings ArticleDOI
06 Dec 2012
TL;DR: Experimental results comparing the proposed method with a state-of-the-art commercial face matcher and densely sampled LBP on a subset of the FERET database show the effectiveness of the proposed 3D face texture model.
Abstract: 3D face modeling from 2D face images is of significant importance for face analysis, animation and recognition. Previous research on this topic mainly focused on 3D face modeling from a single 2D face image; however, a single face image can only provide a limited description of a 3D face. In many applications, for example, law enforcement, multi-view face images are usually captured for a subject during enrollment, which makes it desirable to build a 3D face texture model, given a pair of frontal and profile face images. We first determine the correspondence between un-calibrated frontal and profile face images through facial landmark alignment. An initial 3D face shape is then reconstructed from the frontal face image, followed by shape refinement utilizing the depth information provided by the profile image. Finally, face texture is extracted by mapping the frontal face image on the recovered 3D face shape. The proposed method is utilized for 2D face recognition in two scenarios: (i) normalization of probe image, and (ii) enhancing the representation capability of gallery set. Experimental results comparing the proposed method with a state-of-the-art commercial face matcher and densely sampled LBP on a subset of the FERET database show the effectiveness of the proposed 3D face texture model.

Proceedings ArticleDOI
01 Sep 2012
TL;DR: This work addresses the issue of fire and smoke detection in a scene within a video surveillance framework by means of a motion detection algorithm and a pixel selection based on the dynamics of the area in order to reduce false detection.
Abstract: This work addresses the issue of fire and smoke detection in a scene within a video surveillance framework. Detection of fire and smoke pixels is first achieved by means of a motion detection algorithm. In addition, separation of smoke and fire pixels using colour information (within appropriate colour spaces, specifically chosen to enhance specific chromatic features) is performed. In parallel, a pixel selection based on the dynamics of the area is carried out in order to reduce false detections. The outputs of the three parallel algorithms are eventually fused by means of an MLP.

01 Jan 2012
TL;DR: A skin-based segmentation algorithm for face detection in color images, with detection of multiple faces and skin regions, is proposed; skin color has proven to be a useful and robust cue for face detection, localization and tracking.
Abstract: Because of the increasing instances of identity theft and terrorist incidents in the past few years, biometrics-based security systems have become an area of active research. Modern biometrics is a cutting-edge technology that enables automated systems to distinguish between a genuine person and an impostor. Automated face recognition is one of the most widely used areas of biometrics because of the uniqueness of each human face. Automated face recognition has two basic parts: face detection and recognition of the detected faces. To detect a face from an online surveillance system or an offline image, the main components to detect are the skin areas. This paper proposes a skin-based segmentation algorithm for face detection in color images with detection of multiple faces and skin regions. Skin color has proven to be a useful and robust cue for face detection, localization and tracking.
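A minimal per-pixel version of such skin-based segmentation can be sketched as follows, using the YCbCr chroma ranges often quoted for skin (77 ≤ Cb ≤ 127, 133 ≤ Cr ≤ 173). The paper's exact color space and thresholds are not specified here, so these values are assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """BT.601 full-range RGB -> YCbCr conversion (values in 0..255)."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(r, g, b):
    """Classify a pixel as skin if its chroma falls inside commonly
    cited skin ranges; luma (y) is ignored, making the rule fairly
    robust to brightness changes."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return 77 <= cb <= 127 and 133 <= cr <= 173

print(is_skin(200, 150, 120))  # typical skin tone -> True
print(is_skin(0, 255, 0))      # saturated green -> False
```

Applying `is_skin` to every pixel yields a binary skin mask; connected regions of that mask are then the candidate face areas that a detector would verify.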

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed novel face detection method achieves high detection rates and low false positives over a wide range of facial variations in color, position and lighting conditions.

Proceedings ArticleDOI
29 Oct 2012
TL;DR: In this paper, the authors propose an image representation called Detection Bank, based on the detection images from a large number of windowed object detectors, in which an image is represented by different statistics derived from these detections; the representation is extended to video by aggregating the key-frame-level image representations through mean and max pooling.
Abstract: While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture the complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition. Recognition or retrieval of events and activities can be improved if specific discriminative objects are detected in a video sequence. In this paper, we propose an image representation, called Detection Bank, based on the detection images from a large number of windowed object detectors, where an image is represented by different statistics derived from these detections. This representation is extended to video by aggregating the key frame level image representations through mean and max pooling. We empirically show that it captures complementary information to state-of-the-art representations such as Spatial Pyramid Matching and Object Bank. These descriptors combined with our Detection Bank representation significantly outperform any of the representations alone on TRECVID MED 2011 data.
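The video-level aggregation described above (mean and max pooling over key-frame descriptors) can be sketched directly. Concatenating the two pooled vectors into one descriptor is an assumption about how the statistics are combined, made for illustration.

```python
def detection_bank_video(keyframe_vectors):
    """Aggregate per-keyframe detection-score vectors into one video
    descriptor by mean pooling and max pooling over keyframes, then
    concatenating the two pooled vectors."""
    n = len(keyframe_vectors)
    d = len(keyframe_vectors[0])
    mean_pool = [sum(v[i] for v in keyframe_vectors) / n for i in range(d)]
    max_pool = [max(v[i] for v in keyframe_vectors) for i in range(d)]
    return mean_pool + max_pool

# Three keyframes, each scored by two hypothetical object detectors.
keyframes = [[0.2, 0.9], [0.6, 0.1], [0.4, 0.5]]
print(detection_bank_video(keyframes))  # [mean_0, mean_1, max_0, max_1]
```

Mean pooling captures how consistently an object fires across the video, while max pooling captures whether it appears strongly at least once; keeping both is why the two statistics are complementary.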

Proceedings ArticleDOI
23 May 2012
TL;DR: A modified AdaBoost algorithm based on OpenCV is presented, and experiments on real-time face detection are given using two methods, a timer and a dual thread, showing that the dual-thread method of face detection is simpler, smoother and more precise.
Abstract: Face detection technology has attracted wide attention due to its enormous application value and market potential in areas such as face recognition and video surveillance systems. Real-time face detection is not only one part of an automatic face recognition system but is also developing into an independent research subject, and many approaches to face detection have been proposed. Here a modified AdaBoost algorithm based on OpenCV is presented, and experiments on real-time face detection are conducted using two methods: a timer and a dual thread. The results show that the dual-thread method of face detection is simpler, smoother and more precise.
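The cascade structure underlying OpenCV's AdaBoost-based detector (stages of boosted weak classifiers, with early rejection of non-face windows) can be sketched as follows. The toy window features and stage thresholds are purely illustrative, not OpenCV's actual Haar cascade.

```python
def cascade_detect(window, stages):
    """Evaluate a candidate window against a cascade. Each stage is a
    (weak_classifiers, threshold) pair; the window is rejected as soon
    as one stage's summed score falls below its threshold, which is the
    early-rejection idea that makes AdaBoost cascades fast."""
    for weak_clfs, threshold in stages:
        if sum(clf(window) for clf in weak_clfs) < threshold:
            return False  # rejected early; later stages never run
    return True  # survived every stage

# Hypothetical two-stage cascade over toy window features.
stages = [
    ([lambda w: w["contrast"]], 0.5),
    ([lambda w: w["symmetry"], lambda w: w["edges"]], 1.0),
]
print(cascade_detect({"contrast": 0.8, "symmetry": 0.7, "edges": 0.6}, stages))
print(cascade_detect({"contrast": 0.2, "symmetry": 0.9, "edges": 0.9}, stages))
```

In the real detector the weak classifiers are threshold tests on Haar-like features computed from an integral image, and most windows are discarded in the first one or two stages, which is what makes real-time operation feasible.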

Patent
15 Nov 2012
TL;DR: In this paper, a multi-dimensional virtual beam is used to detect whether the tracked object of interest is continually present in a detection zone within the field view of the scene, and an occurrence of an event is signaled when the tracked target is continuously present in the detection zone.
Abstract: Video analytics is used to track an object of interest represented in video data representing the field of view of a scene observed by a video camera. A multi-dimensional virtual beam is used to detect whether the tracked object of interest is continually present in a detection zone within the field of view of the scene. An occurrence of an event is signaled when the tracked object of interest is continually present in the detection zone during a period beginning when the tracked object of interest enters the detection zone and ending when the tracked object of interest leaves the detection zone through the opposite side, after having completely crossed through the detection zone. Use of a virtual beam detection zone reduces false alarms as compared to the numbers of incidences of false alarms of traditional detection methods, while adding several features and benefits.
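The virtual-beam rule can be sketched with a one-dimensional zone: signal an event only when a track enters through one side, remains inside continuously, and exits through the opposite side. The 1-D zone and the track format are simplifying assumptions; the patent describes a multi-dimensional beam.

```python
def beam_crossing(track, x0, x1):
    """Count complete crossings of the zone [x0, x1] by a track of
    frame-by-frame x positions. An event fires only when the object
    enters through one side and exits through the opposite side;
    entering and backing out the same side signals nothing, which is
    how the virtual beam suppresses false alarms."""
    events = 0
    entered_from = None
    for prev, cur in zip(track, track[1:]):
        prev_in = x0 <= prev <= x1
        cur_in = x0 <= cur <= x1
        if not prev_in and cur_in:
            entered_from = "left" if prev < x0 else "right"
        elif prev_in and not cur_in and entered_from:
            exited_to = "right" if cur > x1 else "left"
            if exited_to != entered_from:
                events += 1  # full crossing: entered one side, left the other
            entered_from = None
    return events

print(beam_crossing([0, 2, 4, 6, 8], 3, 5))  # enters left, exits right -> 1
print(beam_crossing([0, 4, 0], 3, 5))        # bounces back out -> 0
```

A simple tripwire would fire on the bounce-back track as well; requiring a complete, continuous crossing of the zone is what distinguishes the beam from a line trigger.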