Showing papers on "Object detection" published in 2008


Journal ArticleDOI
TL;DR: In this article, a large collection of images with ground truth labels is built for use in object detection and recognition research; such data is useful for supervised learning and quantitative evaluation.
Abstract: We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sharing of such annotations. Using this annotation tool, we have collected a large dataset that spans many object categories, often containing multiple instances over a wide variety of images. We quantify the contents of the dataset and compare against existing state of the art datasets used for object recognition and detection. Also, we show how to extend the dataset to automatically enhance object labels with WordNet, discover object parts, recover a depth ordering of objects in a scene, and increase the number of labels using minimal user supervision and images from the web.

3,501 citations


Journal ArticleDOI
TL;DR: For certain classes that are particularly prevalent in the dataset, such as people, this work is able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.
Abstract: With the advent of the Internet, billions of images are now freely available online and constitute a dense sampling of the visual world. Using a variety of non-parametric methods, we explore this world with the aid of a large dataset of 79,302,017 images collected from the Internet. Motivated by psychophysical results showing the remarkable tolerance of the human visual system to degradations in image resolution, the images in the dataset are stored as 32 x 32 color images. Each image is loosely labeled with one of the 75,062 non-abstract nouns in English, as listed in the WordNet lexical database. Hence the image database gives a comprehensive coverage of all object categories and scenes. The semantic information from WordNet can be used in conjunction with nearest-neighbor methods to perform object classification over a range of semantic levels, minimizing the effects of labeling noise. For certain classes that are particularly prevalent in the dataset, such as people, we are able to demonstrate a recognition performance comparable to class-specific Viola-Jones style detectors.

1,871 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A privacy-preserving system for estimating the size of inhomogeneous crowds, composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking is presented.
Abstract: We present a privacy-preserving system for estimating the size of inhomogeneous crowds, composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking. First, the crowd is segmented into components of homogeneous motion, using the mixture of dynamic textures motion model. Second, a set of simple holistic features is extracted from each segmented region, and the correspondence between features and the number of people per segment is learned with Gaussian process regression. We validate both the crowd segmentation algorithm, and the crowd counting system, on a large pedestrian dataset (2000 frames of video, containing 49,885 total pedestrian instances). Finally, we present results of the system running on a full hour of video.

1,164 citations


Journal ArticleDOI
TL;DR: A novel method for detecting and localizing objects of a visual category in cluttered real-world scenes that is applicable to a range of different object categories, including both rigid and articulated objects and able to achieve competitive object detection performance from training sets that are between one and two orders of magnitude smaller than those used in comparable systems.
Abstract: This paper presents a novel method for detecting and localizing objects of a visual category in cluttered real-world scenes. Our approach considers object categorization and figure-ground segmentation as two interleaved processes that closely collaborate towards a common goal. As shown in our work, the tight coupling between those two processes allows them to benefit from each other and improve the combined performance. The core part of our approach is a highly flexible learned representation for object shape that can combine the information observed on different training examples in a probabilistic extension of the Generalized Hough Transform. The resulting approach can detect categorical objects in novel images and automatically infer a probabilistic segmentation from the recognition result. This segmentation is then in turn used to again improve recognition by allowing the system to focus its efforts on object pixels and to discard misleading influences from the background. Moreover, the information from where in the image a hypothesis draws its support is employed in an MDL based hypothesis verification stage to resolve ambiguities between overlapping hypotheses and factor out the effects of partial occlusion. An extensive evaluation on several large data sets shows that the proposed system is applicable to a range of different object categories, including both rigid and articulated objects. In addition, its flexible representation allows it to achieve competitive object detection performance already from training sets that are between one and two orders of magnitude smaller than those used in comparable systems.

1,084 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A network-flow-based optimization method for the data association needed in multiple object tracking; it is efficient, does not require hypothesis pruning, and is compared with previous results on two public pedestrian datasets to show its improvement.
Abstract: We propose a network flow based optimization method for data association needed for multiple object tracking. The maximum-a-posteriori (MAP) data association problem is mapped into a cost-flow network with a non-overlap constraint on trajectories. The optimal data association is found by a min-cost flow algorithm in the network. The network is augmented to include an explicit occlusion model (EOM) to track with long-term inter-object occlusions. A solution to the EOM-based network is found by an iterative approach built upon the original algorithm. Initialization and termination of trajectories and potential false observations are modeled by the formulation intrinsically. The method is efficient and does not require hypotheses pruning. Performance is compared with previous results on two public pedestrian datasets to show its improvement.

1,046 citations


Journal ArticleDOI
TL;DR: A novel approach for classifying points lying on a connected Riemannian manifold using the geometry of the space, applied to pedestrian detection with d-dimensional nonsingular covariance matrices as object descriptors.
Abstract: We present a new algorithm to detect pedestrians in still images utilizing covariance matrices as object descriptors. Since the descriptors do not form a vector space, well-known machine learning techniques are not well suited to learn the classifiers. The space of d-dimensional nonsingular covariance matrices can be represented as a connected Riemannian manifold. The main contribution of the paper is a novel approach for classifying points lying on a connected Riemannian manifold using the geometry of the space. The algorithm is tested on INRIA and DaimlerChrysler pedestrian datasets where superior detection rates are observed over the previous approaches.

1,044 citations


Journal ArticleDOI
TL;DR: A novel algorithm for detecting certain types of unusual events, based on multiple local monitors that collect low-level statistics; because it does not rely on object tracks, it is robust and works well in crowded scenes where tracking-based algorithms are likely to fail.
Abstract: We present a novel algorithm for detection of certain types of unusual events. The algorithm is based on multiple local monitors which collect low-level statistics. Each local monitor produces an alert if its current measurement is unusual and these alerts are integrated to a final decision regarding the existence of an unusual event. Our algorithm satisfies a set of requirements that are critical for successful deployment of any large-scale surveillance system. In particular, it requires a minimal setup (taking only a few minutes) and is fully automatic afterwards. Since it is not based on objects' tracks, it is robust and works well in crowded scenes where tracking-based algorithms are likely to fail. The algorithm is effective as soon as sufficient low-level observations representing the routine activity have been collected, which usually happens after a few minutes. Our algorithm runs in real-time. It was tested on a variety of real-life crowded scenes. A ground-truth was extracted for these scenes, with respect to which detection and false-alarm rates are reported.

822 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A simple and fast Fourier-transform-based approach called spectral residual (SR) uses the SR of the amplitude spectrum to obtain the saliency map; its results are good, but the reason behind them is questionable.
Abstract: Salient areas in natural scenes are generally regarded as the candidates of attention focus in human eyes, which is the key stage in object detection. In computer vision, many models have been proposed to simulate the behavior of eyes, such as SaliencyToolBox (STB) and the neuromorphic vision toolkit (NVT), but they demand high computational cost and their remarkable results mostly rely on the choice of parameters. Recently a simple and fast approach based on Fourier transform called spectral residual (SR) was proposed, which used SR of the amplitude spectrum to obtain the saliency map. The results are good, but the reason is questionable.

805 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages and converges to a globally optimal solution typically in sublinear time is proposed.
Abstract: Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, because the classifier function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages. It converges to a globally optimal solution typically in sublinear time. We show how our method is applicable to different object detection and retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the chi2-distance. We demonstrate state-of-the-art performance of the resulting systems on the UIUC Cars dataset, the PASCAL VOC 2006 dataset and in the PASCAL VOC 2007 competition.
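The branch-and-bound idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the classifier score of a box is simply the sum of per-point weights inside it (as with a linear classifier over local features), it computes sums by brute force rather than with integral images, and the names `weight_sum`, `upper_bound` and `best_box` are hypothetical.

```python
import heapq

def weight_sum(points, top, bottom, left, right, sign):
    """Sum the weights with the given sign over points inside an inclusive box."""
    return sum(w for x, y, w in points
               if left <= x <= right and top <= y <= bottom and w * sign > 0)

def upper_bound(points, tops, bottoms, lefts, rights):
    """Bound the score of every box whose four edges lie in the given intervals."""
    # Positive weights can at most fill the largest box in the set...
    pos = weight_sum(points, tops[0], bottoms[1], lefts[0], rights[1], +1)
    # ...while negative weights inside the smallest box cannot be avoided.
    neg = weight_sum(points, tops[1], bottoms[0], lefts[1], rights[0], -1)
    return pos + neg

def best_box(points, width, height):
    """Best-first search over sets of boxes; returns ((top, bottom, left, right), score)."""
    # A state is four (lo, hi) intervals for the top, bottom, left and right edges.
    state = ((0, height - 1), (0, height - 1), (0, width - 1), (0, width - 1))
    heap = [(-upper_bound(points, *state), state)]
    while heap:
        neg_bound, state = heapq.heappop(heap)
        sizes = [hi - lo for lo, hi in state]
        i = max(range(4), key=sizes.__getitem__)
        if sizes[i] == 0:
            # Every interval is a single value, so the bound is the exact score,
            # and no other state in the queue can beat it.
            return tuple(iv[0] for iv in state), -neg_bound
        lo, hi = state[i]
        mid = (lo + hi) // 2
        # Split the widest interval in half and queue both children.
        for part in ((lo, mid), (mid + 1, hi)):
            child = state[:i] + (part,) + state[i + 1:]
            heapq.heappush(heap, (-upper_bound(points, *child), child))
```

Because states are expanded in order of their upper bound, the first fully specified box taken from the queue is a global maximizer under the stated scoring assumption; large parts of the search space are never visited.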

801 citations


Journal ArticleDOI
TL;DR: This work proposes an approach based on self organization through artificial neural networks, widely applied in human image processing systems and more generally in cognitive science, that can handle scenes containing moving backgrounds, gradual illumination variations and camouflage, and achieves robust detection for different types of videos taken with stationary cameras.
Abstract: Detection of moving objects in video streams is the first relevant step of information extraction in many computer vision applications. Aside from the intrinsic usefulness of being able to segment video streams into moving and background components, detecting moving objects provides a focus of attention for recognition, classification, and activity analysis, making these later steps more efficient. We propose an approach based on self organization through artificial neural networks, widely applied in human image processing systems and more generally in cognitive science. The proposed approach can handle scenes containing moving backgrounds, gradual illumination variations and camouflage, has no bootstrapping limitations, can include into the background model shadows cast by moving objects, and achieves robust detection for different types of videos taken with stationary cameras. We compare our method with other modeling techniques and report experimental results, both in terms of detection accuracy and in terms of processing speed, for color video sequences that represent typical situations critical for video surveillance systems.

792 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A mobile vision system for multi-person tracking in busy environments that integrates continuous visual odometry computation with tracking-by-detection in order to track pedestrians in spite of frequent occlusions and egomotion of the camera rig is presented.
Abstract: We present a mobile vision system for multi-person tracking in busy environments. Specifically, the system integrates continuous visual odometry computation with tracking-by-detection in order to track pedestrians in spite of frequent occlusions and egomotion of the camera rig. To achieve reliable performance under real-world conditions, it has long been advocated to extract and combine as much visual information as possible. We propose a way to closely integrate the vision modules for visual odometry, pedestrian detection, depth estimation, and tracking. The integration naturally leads to several cognitive feedback loops between the modules. Among others, we propose a novel feedback connection from the object detector to visual odometry which utilizes the semantic knowledge of detection to stabilize localization. Feedback loops always carry the danger that erroneous feedback from one module is amplified and causes the entire system to become instable. We therefore incorporate automatic failure detection and recovery, allowing the system to continue when a module becomes unreliable. The approach is experimentally evaluated on several long and difficult video sequences from busy inner-city locations. Our results show that the proposed integration makes it possible to deliver stable tracking performance in scenes of previously infeasible complexity.

Proceedings ArticleDOI
04 Jun 2008
TL;DR: In this paper, a robust and real-time approach to lane marker detection in urban streets is presented, which is based on generating a top view of the road, filtering using selective oriented Gaussian filters, and using RANSAC line fitting to give initial guesses to a new and fast RANSAC algorithm for fitting Bezier splines, which is then followed by a post-processing step.
Abstract: We present a robust and real-time approach to lane marker detection in urban streets. It is based on generating a top view of the road, filtering using selective oriented Gaussian filters, using RANSAC line fitting to give initial guesses to a new and fast RANSAC algorithm for fitting Bezier Splines, which is then followed by a post-processing step. Our algorithm can detect all lanes in still images of the street in various conditions, while operating at a rate of 50 Hz and achieving comparable results to previous techniques.
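The RANSAC line-fitting step mentioned in this abstract can be sketched generically as follows. This is plain textbook RANSAC on 2D points, not the paper's pipeline: the top-view generation, Gaussian filtering and Bezier spline stages are omitted, and the function name, the parameters `n_iters`, `inlier_tol` and `seed`, and their defaults are assumptions for illustration.

```python
import random

def fit_line_ransac(points, n_iters=200, inlier_tol=1.0, seed=0):
    """Fit a line a*x + b*y + c = 0 (with a^2 + b^2 = 1) to noisy 2D points.

    Returns the best line (a, b, c) and the list of its inlier points.
    """
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iters):
        # Hypothesize a line through two randomly sampled points.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue  # identical points define no line
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # Count points whose perpendicular distance to the line is small.
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c) <= inlier_tol]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers
```

In a lane-detection setting the returned inliers would typically seed a more flexible curve model; a least-squares refit on the inliers is a common follow-up that this sketch leaves out.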

Journal ArticleDOI
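TL;DR: A family of scale-invariant local shape features formed by chains of k connected, roughly straight contour segments (kAS) is shown to substantially outperform interest points (IPs) for detecting shape-based classes, and the resulting object detector is compared to the state-of-the-art system by Dalal and Triggs (2005).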
Abstract: We present a family of scale-invariant local shape features formed by chains of k connected roughly straight contour segments (kAS), and their use for object class detection. kAS are able to cleanly encode pure fragments of an object boundary without including nearby clutter. Moreover, they offer an attractive compromise between information content and repeatability and encompass a wide variety of local shape structures. We also define a translation and scale invariant descriptor encoding the geometric configuration of the segments within a kAS, making kAS easy to reuse in other frameworks, for example, as a replacement or addition to interest points (IPs). Software for detecting and describing kAS is released at http://lear.inrialpes.fr/software. We demonstrate the high performance of kAS within a simple but powerful sliding-window object detection scheme. Through extensive evaluations, involving eight diverse object classes and more than 1,400 images, we (1) study the evolution of performance as the degree of feature complexity k varies and determine the best degree, (2) show that kAS substantially outperform IPs for detecting shape-based classes, and (3) compare our object detector to the recent state-of-the-art system by Dalal and Triggs (2005).

Journal ArticleDOI
TL;DR: The proposed work addresses anomaly detection by means of trajectory analysis, an approach with several application fields, most notably video surveillance and traffic monitoring, based on single-class support vector machine (SVM) clustering, where the novelty detection SVM capabilities are used for the identification of anomalous trajectories.
Abstract: In recent years, the task of automatic event analysis in video sequences has gained increasing attention in the research community. The application domains are disparate, ranging from video surveillance to automatic video annotation for sport videos or TV shots. Whatever the application field, most of the works in event analysis are based on two main approaches: the former based on explicit event recognition, focused on finding high-level, semantic interpretations of video sequences, and the latter based on anomaly detection. This paper deals with the second approach, where the final goal is not the explicit labeling of recognized events, but the detection of anomalous events differing from typical patterns. In particular, the proposed work addresses anomaly detection by means of trajectory analysis, an approach with several application fields, most notably video surveillance and traffic monitoring. The proposed approach is based on single-class support vector machine (SVM) clustering, where the novelty detection SVM capabilities are used for the identification of anomalous trajectories. Particular attention is given to trajectory classification in absence of a priori information on the distribution of outliers. Experimental results prove the validity of the proposed approach.

Book ChapterDOI
20 Oct 2008
TL;DR: This paper clusters image regions based on their ability to serve as context for the detection of objects and shows that the things and stuff (TAS) context model produces meaningful clusters that are readily interpretable, and helps improve detection ability over state-of-the-art detectors.
Abstract: The sliding window approach of detecting rigid objects (such as cars) is predicated on the belief that the object can be identified from the appearance in a small region around the object. Other types of objects of amorphous spatial extent (e.g., trees, sky), however, are more naturally classified based on texture or color. In this paper, we seek to combine recognition of these two types of objects into a system that leverages "context" toward improving detection. In particular, we cluster image regions based on their ability to serve as context for the detection of objects. Rather than providing an explicit training set with region labels, our method automatically groups regions based on both their appearance and their relationships to the detections in the image. We show that our things and stuff (TAS) context model produces meaningful clusters that are readily interpretable, and helps improve our detection ability over state-of-the-art detectors. We also present a method for learning the active set of relationships for a particular dataset. We present results on object detection in images from the PASCAL VOC 2005/2006 datasets and on the task of overhead car detection in satellite images, demonstrating significant improvements over state-of-the-art detectors.

Proceedings ArticleDOI
07 Apr 2008
TL;DR: A novel partition-and-detect framework for trajectory outlier detection is proposed, which partitions a trajectory into a set of line segments, and then, detects outlying line segments for trajectory outliers.
Abstract: Outlier detection has been a popular data mining task. However, there is a lack of serious study on outlier detection for trajectory data. Even worse, an existing trajectory outlier detection algorithm has limited capability to detect outlying sub-trajectories. In this paper, we propose a novel partition-and-detect framework for trajectory outlier detection, which partitions a trajectory into a set of line segments, and then, detects outlying line segments for trajectory outliers. The primary advantage of this framework is to detect outlying sub-trajectories from a trajectory database. Based on this partition-and-detect framework, we develop a trajectory outlier detection algorithm TRAOD. Our algorithm consists of two phases: partitioning and detection. For the first phase, we propose a two-level trajectory partitioning strategy that ensures both high quality and high efficiency. For the second phase, we present a hybrid of the distance-based and density-based approaches. Experimental results demonstrate that TRAOD correctly detects outlying sub-trajectories from real trajectory data.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A novel method for estimating the number of people in surveillance scenes with people gathering and waiting, which combines a mosaic image difference (MID) based foreground segmentation algorithm with a histograms of oriented gradients (HOG) based head-shoulder detection algorithm to provide an accurate estimation of people counts in the observed area.
Abstract: This paper proposes a novel method to address the problem of estimating the number of people in surveillance scenes with people gathering and waiting. The proposed method combines a MID (mosaic image difference) based foreground segmentation algorithm and a HOG (histograms of oriented gradients) based head-shoulder detection algorithm to provide an accurate estimation of people counts in the observed area. In our framework, the MID-based foreground segmentation module provides active areas for the head-shoulder detection module to detect heads and count the number of people. Numerous experiments are conducted and convincing results demonstrate the effectiveness of our method.

Journal ArticleDOI
TL;DR: This paper surveys existing background removal schemes, covering the common pre-processing algorithms used in different situations, different background models, and the most commonly used ways to update and initialize such models.
Abstract: Identifying moving objects is a critical task for many computer vision applications; it provides a classification of the pixels into either foreground or background. A common approach used to achieve such classification is background removal. Even though there exist numerous background removal algorithms in the literature, most of them follow a simple flow diagram, passing through four major steps, which are pre-processing, background modelling, foreground detection and data validation. In this paper, we survey many existing schemes in the literature of background removal, surveying the common pre-processing algorithms used in different situations, presenting different background models, and the most commonly used ways to update such models and how they can be initialized. We also survey how to measure the performance of any moving object detection algorithm, whether the ground truth data is available or not, presenting performance metrics commonly used in both cases.
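As a rough illustration of the background modelling and foreground detection steps named in this abstract, here is a minimal sketch of one of the simplest background models, a running average per pixel. It is not taken from the survey: the grayscale frames as nested lists, the function names, and the `alpha` and `threshold` defaults are all assumptions, and pre-processing and data validation are left out.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (running average)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]

def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background model by more than threshold."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]

if __name__ == "__main__":
    background = [[100.0, 100.0, 100.0], [100.0, 100.0, 100.0]]
    frame = [[100.0, 200.0, 100.0], [100.0, 100.0, 100.0]]
    print(foreground_mask(background, frame))
    background = update_background(background, frame)
```

A small `alpha` makes the model adapt slowly, which resists absorbing moving objects into the background but also reacts slowly to illumination changes; that trade-off is one reason more elaborate models exist.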

Journal ArticleDOI
TL;DR: A model-based approach to interpret the image observations by multiple partially occluded human hypotheses in a Bayesian framework is proposed, which defines a joint image likelihood for multiple humans based on the appearance of the humans, the visibility of the body obtained by occlusion reasoning, and foreground/background separation.
Abstract: Segmentation and tracking of multiple humans in crowded situations is made difficult by interobject occlusion. We propose a model-based approach to interpret the image observations by multiple partially occluded human hypotheses in a Bayesian framework. We define a joint image likelihood for multiple humans based on the appearance of the humans, the visibility of the body obtained by occlusion reasoning, and foreground/background separation. The optimal solution is obtained by using an efficient sampling method, data-driven Markov chain Monte Carlo (DDMCMC), which uses image observations for proposal probabilities. Knowledge of various aspects, including human shape, camera model, and image cues, is integrated in one theoretically sound framework. We present experimental results and quantitative evaluation, demonstrating that the resulting approach is effective for very challenging data.

Journal ArticleDOI
TL;DR: A new automatic visual recognition system based only on local contour features, capable of localizing objects in space and scale, is proposed and compared with other methods based on contour and local descriptors in a detailed evaluation over 17 challenging categories.
Abstract: Psychophysical studies show that we can recognize objects using fragments of outline contour alone. This paper proposes a new automatic visual recognition system based only on local contour features, capable of localizing objects in space and scale. The system first builds a class-specific codebook of local fragments of contour using a novel formulation of chamfer matching. These local fragments allow recognition that is robust to within-class variation, pose changes, and articulation. Boosting combines these fragments into a cascaded sliding-window classifier, and mean shift is used to select strong responses as a final set of detections. We show how learning can be performed iteratively on both training and test sets to bootstrap an improved classifier. We compare with other methods based on contour and local descriptors in our detailed evaluation over 17 challenging categories and obtain highly competitive results. The results confirm that contour is indeed a powerful cue for multiscale and multiclass visual object recognition.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: The proposed method adds a new higher-level layer to the traditional surveillance pipeline for anomalous event detection and scene model feedback, and the resulting scene model is used to detect local as well as global anomalies in object tracks.
Abstract: We present a novel framework for learning patterns of motion and sizes of objects in static camera surveillance. The proposed method provides a new higher-level layer to the traditional surveillance pipeline for anomalous event detection and scene model feedback. Pixel level probability density functions (pdfs) of appearance have been used for background modelling in the past, but modelling pixel level pdfs of object speed and size from the tracks is novel. Each pdf is modelled as a multivariate Gaussian mixture model (GMM) of the motion (destination location & transition time) and the size (width & height) parameters of the objects at that location. Output of the tracking module is used to perform unsupervised EM-based learning of every GMM. We have successfully used the proposed scene model to detect local as well as global anomalies in object tracks. We also show the use of this scene model to improve object detection through pixel-level parameter feedback of the minimum object size and background learning rate. Most object path modelling approaches first cluster the tracks into major paths in the scene, which can be a source of error. We avoid this by building local pdfs that capture a variety of tracks which are passing through them. Qualitative and quantitative analysis of actual surveillance videos proved the effectiveness of the proposed approach.

Journal ArticleDOI
TL;DR: The Constrained Local Model (CLM) algorithm is more robust and more accurate than the AAM search method, which relies on the image reconstruction error to update the model parameters, and improves localisation accuracy on photographs of human faces, magnetic resonance images of the brain and a set of dental panoramic tomograms.

Journal ArticleDOI
TL;DR: A novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem is presented, formulated in a minimum description length hypothesis selection framework, which allows the system to recover from mismatches and temporarily lost tracks.
Abstract: We present a novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem. Our approach is formulated in a minimum description length hypothesis selection framework, which allows our system to recover from mismatches and temporarily lost tracks. Building upon a state-of-the-art object detector, it performs multiview/multicategory object recognition to detect cars and pedestrians in the input images. The 2D object detections are checked for their consistency with (automatically estimated) scene geometry and are converted to 3D observations which are accumulated in a world coordinate frame. A subsequent trajectory estimation module analyzes the resulting 3D observations to find physically plausible spacetime trajectories. Tracking is achieved by performing model selection after every frame. At each time instant, our approach searches for the globally optimal set of spacetime trajectories which provides the best explanation for the current image and for all evidence collected so far while satisfying the constraints that no two objects may occupy the same physical space nor explain the same image pixels at any point in time. Successful trajectory hypotheses are then fed back to guide object detection in future frames. The optimization procedure is kept efficient through incremental computation and conservative hypothesis pruning. We evaluate our approach on several challenging video sequences and demonstrate its performance on both a surveillance-type scenario and a scenario where the input videos are taken from inside a moving vehicle passing through crowded city areas.

Journal ArticleDOI
TL;DR: The experimental result shows that the proposed unsupervised band selection algorithms based on band similarity measurement can yield a better result in terms of information conservation and class separability than other widely used techniques.
Abstract: Band selection is a common approach to reduce the data dimensionality of hyperspectral imagery. It extracts several bands of importance in some sense by taking advantage of high spectral correlation. Driven by detection or classification accuracy, one would expect that, using a subset of original bands, the accuracy is unchanged or tolerably degraded, whereas computational burden is significantly relaxed. When the desired object information is known, this task can be achieved by finding the bands that contain the most information about these objects. When the desired object information is unknown, i.e., unsupervised band selection, the objective is to select the most distinctive and informative bands. It is expected that these bands can provide an overall satisfactory detection and classification performance. In this letter, we propose unsupervised band selection algorithms based on band similarity measurement. The experimental result shows that our approach can yield a better result in terms of information conservation and class separability than other widely used techniques.

Book ChapterDOI
20 Oct 2008
TL;DR: This work proposes to treat object localization in a principled way by posing it as a problem of predicting structured data: it models the problem not as binary classification, but as the prediction of the bounding box of objects located in images.
Abstract: Sliding window classifiers are among the most successful and widely applied techniques for object localization. However, training is typically done in a way that is not specific to the localization task. First a binary classifier is trained using a sample of positive and negative examples, and this classifier is subsequently applied to multiple regions within test images. We propose instead to treat object localization in a principled way by posing it as a problem of predicting structured data: we model the problem not as binary classification, but as the prediction of the bounding box of objects located in images. The use of a joint-kernel framework allows us to formulate the training procedure as a generalization of an SVM, which can be solved efficiently. We further improve computational efficiency by using a branch-and-bound strategy for localization during both training and testing. Experimental evaluation on the PASCAL VOC and TU Darmstadt datasets shows that the structured training procedure improves performance over binary training as well as the best previously published scores.
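
Predicting a bounding box rather than a binary label can be read as taking the argmax of a scoring function over all candidate boxes. The sketch below illustrates only that step, under strong assumptions: the box score is the sum of hypothetical per-pixel scores, and the search is exhaustive over an integral image rather than the branch-and-bound strategy the abstract describes. The names `integral_image`, `box_sum` and `best_box` are illustrative.

```python
def integral_image(scores):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(scores), len(scores[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = scores[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of scores inside the box; all four bounds are inclusive."""
    return (ii[bottom + 1][right + 1] - ii[top][right + 1]
            - ii[bottom + 1][left] + ii[top][left])


def best_box(scores):
    """Exhaustive argmax over all axis-aligned boxes (not branch-and-bound)."""
    ii = integral_image(scores)
    h, w = len(scores), len(scores[0])
    best, best_score = None, float("-inf")
    for top in range(h):
        for bottom in range(top, h):
            for left in range(w):
                for right in range(left, w):
                    s = box_sum(ii, top, left, bottom, right)
                    if s > best_score:
                        best, best_score = (top, left, bottom, right), s
    return best, best_score
```

The exhaustive loop visits every box, which is quadratic in the number of pixels; that cost is what motivates a faster search such as the branch-and-bound strategy mentioned in the abstract.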

Journal ArticleDOI
01 Apr 2008
TL;DR: An intrusion detection algorithm based on the AdaBoost algorithm is proposed, which has low computational complexity and error rates, as compared with algorithms of higher computational complexity, as tested on the benchmark sample data.
Abstract: Network intrusion detection aims at distinguishing the attacks on the Internet from normal use of the Internet. It is an indispensable part of the information security system. Due to the variety of network behaviors and the rapid development of attack fashions, it is necessary to develop fast machine-learning-based intrusion detection algorithms with high detection rates and low false-alarm rates. In this correspondence, we propose an intrusion detection algorithm based on the AdaBoost algorithm. In the algorithm, decision stumps are used as weak classifiers. The decision rules are provided for both categorical and continuous features. By combining the weak classifiers for continuous features and the weak classifiers for categorical features into a strong classifier, the relations between these two different types of features are handled naturally, without any forced conversions between continuous and categorical features. Adaptable initial weights and a simple strategy for avoiding overfitting are adopted to improve the performance of the algorithm. Experimental results show that our algorithm has low computational complexity and error rates, as compared with algorithms of higher computational complexity, as tested on the benchmark sample data.
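
A minimal sketch of boosting decision stumps over mixed feature types follows. It is standard discrete AdaBoost, not the paper's exact procedure: it assumes uniform initial weights (the paper uses adaptable ones), labels in {-1, +1}, threshold tests for continuous features and equality tests for categorical features. All function names and the stump tuple layout are hypothetical.

```python
import math


def stump_predict(stump, x):
    """stump = (feature index, kind, value, polarity); returns -1 or +1."""
    feature, kind, value, polarity = stump
    if kind == "continuous":
        hit = x[feature] <= value      # threshold test
    else:
        hit = x[feature] == value      # categorical equality test, no numeric conversion
    return polarity if hit else -polarity


def best_stump(X, y, w, kinds):
    """Return the stump with the lowest weighted training error."""
    best, best_err = None, float("inf")
    for f, kind in enumerate(kinds):
        for value in sorted(set(row[f] for row in X)):
            for polarity in (1, -1):
                stump = (f, kind, value, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def train_adaboost(X, y, kinds, rounds=5):
    n = len(X)
    w = [1.0 / n] * n                  # assumption: uniform initial weights
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w, kinds)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

A call such as `train_adaboost(X, y, ["continuous", "categorical"])` takes rows like `[12.5, "tcp"]`. Because each stump tests a single feature in its native form, continuous and categorical weak classifiers can sit in the same weighted vote.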

Proceedings ArticleDOI
14 May 2008
TL;DR: The utility of the glandular and nuclear segmentation algorithm in accurate extraction of various morphological and nuclear features is demonstrated for automated grading of prostate cancer and breast cancer, and for distinguishing between cancerous and benign breast histology specimens.
Abstract: Automated detection and segmentation of nuclear and glandular structures is critical for classification and grading of prostate and breast cancer histopathology. In this paper, we present a methodology for automated detection and segmentation of structures of interest in digitized histopathology images. The scheme integrates image information from across three different scales: (1) low-level information based on pixel values, (2) high-level information based on relationships between pixels for object detection, and (3) domain-specific information based on relationships between histological structures. Low-level information is utilized by a Bayesian classifier to generate a likelihood that each pixel belongs to an object of interest. High-level information is extracted in two ways: (i) by a level-set algorithm, where a contour is evolved in the likelihood scenes generated by the Bayesian classifier to identify object boundaries, and (ii) by a template matching algorithm, where shape models are used to identify glands and nuclei from the low-level likelihood scenes. Structural constraints are imposed via domain-specific knowledge in order to verify whether the detected objects do indeed belong to structures of interest. In this paper we demonstrate the utility of our glandular and nuclear segmentation algorithm in accurate extraction of various morphological and nuclear features for (a) automated grading of prostate cancer, (b) automated grading of breast cancer, and (c) distinguishing between cancerous and benign breast histology specimens. The efficacy of our segmentation algorithm is evaluated by comparing breast and prostate cancer grading and benign vs. cancer discrimination accuracies with corresponding accuracies obtained via manual detection and segmentation of glands and nuclei.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A novel human detection system in personal albums based on LBP (local binary pattern) descriptor is developed and carefully designed experiments demonstrate the superiority of LBP over other traditional features for human detection.
Abstract: In recent years, local pattern based object detection and recognition have attracted increasing interest in the computer vision research community. However, to the best of our knowledge, no previous work has focused on utilizing local patterns for the task of human detection. In this paper we develop a novel human detection system in personal albums based on the LBP (local binary pattern) descriptor. First, we review the existing gradient based local features widely used in human detection, analyze their limitations and argue that LBP is more discriminative. Second, the original LBP descriptor does not suit the human detection problem well due to its high complexity and lack of semantic consistency, so we propose two variants of LBP: Semantic-LBP and Fourier-LBP. Carefully designed experiments demonstrate the superiority of LBP over other traditional features for human detection. In particular, we adopt a random ensemble algorithm for better comparison between different descriptors. All experiments are conducted on the INRIA human database.
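
For readers unfamiliar with the descriptor, here is a minimal sketch of the basic 3x3 LBP operator and its histogram. It does not implement the Semantic-LBP or Fourier-LBP variants proposed in the paper. The bit ordering (clockwise from the top-left neighbor) and the `>=` comparison are assumptions, and the function names are hypothetical.

```python
# Neighbor offsets, clockwise starting at the top-left pixel (assumed ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit local binary pattern of the interior pixel at row r, column c."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Set the bit when the neighbor is at least as bright as the center.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

With `img = [[9, 9, 9], [1, 5, 1], [1, 1, 1]]`, only the three top neighbors are at least as bright as the center, so `lbp_code(img, 1, 1)` sets bits 0 to 2 and returns 7. The 256-bin histogram is what would be passed to a classifier as the feature vector.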

Journal ArticleDOI
TL;DR: Experiments show significantly improved accuracy of the proposed approach in comparison with existing tracking methods, under the condition of low frame rate data and abrupt motion of both target and camera.
Abstract: Tracking objects in low frame rate video or with abrupt motion poses two main difficulties which most conventional tracking methods can hardly handle: 1) poor motion continuity and increased search space; 2) fast appearance variation of the target and more background clutter due to the increased search space. In this paper, we address the problem from a view which integrates conventional tracking and detection, and present a temporal probabilistic combination of discriminative observers of different lifespans. Each observer is learned from different ranges of samples, with different subsets of features, to achieve varying levels of discriminative power at varying cost. Efficient fusion and temporal inference are then performed by a cascade particle filter which consists of multiple stages of importance sampling. Experiments show significantly improved accuracy of the proposed approach in comparison with existing tracking methods, under the condition of low frame rate data and abrupt motion of both target and camera.

Journal ArticleDOI
TL;DR: A novel city modeling framework which builds upon this philosophy to create 3D content at high speed by integrating it with an object recognition module that automatically detects cars in the input video streams and localizes them in 3D.
Abstract: Supplying realistically textured 3D city models at ground level promises to be useful for pre-visualizing upcoming traffic situations in car navigation systems. Because this pre-visualization can be rendered from the expected future viewpoints of the driver, the required maneuver will be more easily understandable. 3D city models can be reconstructed from the imagery recorded by surveying vehicles. The vastness of image material gathered by these vehicles, however, puts extreme demands on vision algorithms to ensure their practical usability. Algorithms need to be as fast as possible and should result in compact, memory efficient 3D city models for future ease of distribution and visualization. For the considered application, these are not contradictory demands. Simplified geometry assumptions can speed up vision algorithms while automatically guaranteeing compact geometry models. In this paper, we present a novel city modeling framework which builds upon this philosophy to create 3D content at high speed. Objects in the environment, such as cars and pedestrians, may however disturb the reconstruction, as they violate the simplified geometry assumptions, leading to visually unpleasant artifacts and degrading the visual realism of the resulting 3D city model. Unfortunately, such objects are prevalent in urban scenes. We therefore extend the reconstruction framework by integrating it with an object recognition module that automatically detects cars in the input video streams and localizes them in 3D. The two components of our system are tightly integrated and benefit from each other's continuous input. 3D reconstruction delivers geometric scene context, which greatly helps improve detection precision. The detected car locations, on the other hand, are used to instantiate virtual placeholder models which augment the visual realism of the reconstructed city model.