Journal Article

Spatio-Temporal Processing for Automatic Vehicle Detection in Wide-Area Aerial Video

23 Oct 2020 - IEEE Access (IEEE) - Vol. 8, pp. 199562-199572
TL;DR: A spatio-temporal processing scheme to improve automatic vehicle detection performance by replacing the thresholding step of existing detection algorithms with multi-neighborhood hysteresis thresholding for foreground pixel classification is presented.
Abstract: Vehicle detection in aerial videos often requires post-processing to eliminate false detections. This paper presents a spatio-temporal processing scheme to improve automatic vehicle detection performance by replacing the thresholding step of existing detection algorithms with multi-neighborhood hysteresis thresholding for foreground pixel classification. The proposed scheme also performs spatial post-processing, which includes morphological opening and closing to shape and prune the detected objects, and temporal post-processing to further reduce false detections. We evaluate the performance of the proposed spatial processing on two local aerial video datasets and one parking vehicle dataset, and the performance of the proposed spatio-temporal processing scheme on five local aerial video datasets and one public dataset. Experimental evaluation shows that the proposed schemes improve vehicle detection performance for each of the nine algorithms when evaluated on seven datasets. Overall, the use of the proposed spatio-temporal processing scheme improves average F-score to above 0.8 and achieves an average reduction of 83.8% in false positives.
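
As a rough illustration of the thresholding and spatial steps described in the abstract, here is a minimal Python sketch. It assumes plain two-threshold hysteresis with 8-connectivity and a 3x3 square structuring element. The paper's multi-neighborhood variant, its threshold values, and its temporal post-processing are not reproduced, and every name here (`hysteresis_threshold`, `opening`, `closing`, the toy `frame`) is hypothetical.

```python
from collections import deque

# 8-connected neighbour offsets (an assumption of this sketch,
# not the paper's multi-neighborhood definition).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def hysteresis_threshold(img, low, high):
    """Keep pixels >= high, plus pixels >= low that are connected to them."""
    rows, cols = len(img), len(img[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if img[r][c] >= high:          # strong pixels seed the search
                mask[r][c] = True
                queue.append((r, c))
    while queue:                           # grow seeds through weak pixels
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not mask[nr][nc] and img[nr][nc] >= low):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


def _morph(mask, combine):
    """Apply a 3x3 square structuring element; pixels outside the image are ignored."""
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [mask[nr][nc]
                      for nr in range(max(r - 1, 0), min(r + 2, rows))
                      for nc in range(max(c - 1, 0), min(c + 2, cols))]
            out[r][c] = combine(window)
    return out


def erode(mask):
    return _morph(mask, all)


def dilate(mask):
    return _morph(mask, any)


def opening(mask):
    """Erosion then dilation: removes objects smaller than the structuring element."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation then erosion: fills small gaps inside objects."""
    return erode(dilate(mask))


# Toy 9x9 saliency frame with hypothetical values.
frame = [[0] * 9 for _ in range(9)]
for r in range(2, 5):
    for c in range(2, 5):
        frame[r][c] = 5      # weak 3x3 blob ...
frame[2][3] = 9              # ... containing one strong pixel
frame[3][7] = 5              # isolated weak pixel, no strong neighbour
frame[7][6] = 9              # isolated strong pixel (single-pixel noise)

fg = hysteresis_threshold(frame, low=4, high=8)
cleaned = closing(opening(fg))
```

In this toy frame the hysteresis step keeps the whole blob because it contains a strong pixel, and drops the isolated weak pixel. The opening then prunes the single-pixel detection that survived thresholding.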
Citations
Journal Article
TL;DR: Wang et al. use pre-trained deep models to extract high-level concept and context features for training a denoising autoencoder (DAE), requiring little training time (within 10 seconds on the UCSD Pedestrian datasets).
Abstract: Deep learning-based video anomaly detection methods have drawn significant attention in the past few years due to their superior performance. However, almost all the leading methods for video anomaly detection rely on large-scale training datasets with long training times. As a result, they are still unsuited to many real-world video analysis tasks that need fast deployment. In addition, the leading methods offer little interpretability: their feature representations hide the decision-making process, so the anomaly detection model acts as a black box. Interpretability is nonetheless crucial for anomaly detection, since the appropriate response to an anomaly in the video depends on its severity and nature. To tackle these problems, this paper proposes an efficient deep learning framework for video anomaly detection that also provides explanations. The proposed framework uses pre-trained deep models to extract high-level concept and context features for training a denoising autoencoder (DAE), requiring little training time (i.e., within 10 s on the UCSD Pedestrian datasets) while achieving detection performance comparable to the leading methods. Furthermore, this framework is the first in video anomaly detection to combine an autoencoder with SHapley Additive exPlanations (SHAP) for model interpretability. The framework can explain each anomaly detection result in surveillance videos. In the experiments, we evaluate the proposed framework's effectiveness and efficiency while also explaining the anomalies behind the autoencoder's predictions. On the UCSD Pedestrian datasets, the DAE achieved 85.9% AUC with a training time of 5 s on UCSD Ped1 and 92.4% AUC with a training time of 2.9 s on UCSD Ped2.

17 citations

Journal Article
TL;DR: The combined results show that the enhanced three-stage post-processing scheme achieves a mean average precision (mAP) of 63.9% for feature extraction methods and 82.8% for the machine learning approach.
Abstract: In low-resolution wide-area aerial imagery, object detection algorithms are categorized as feature extraction and machine learning approaches, where the former often requires a post-processing scheme to reduce false detections and the latter demands multi-stage learning followed by post-processing. In this paper, we present an approach for selecting post-processing schemes for aerial object detection. We evaluated combinations of each of ten vehicle detection algorithms with each of seven post-processing schemes, where the best three schemes for each algorithm were determined using the average F-score metric. The performance improvement is quantified using basic information retrieval metrics as well as the classification of events, activities and relationships (CLEAR) metrics. We also implemented a two-stage learning algorithm using a hundred-layer densely connected convolutional neural network for small object detection and evaluated its degree of improvement when combined with the various post-processing schemes. The highest average F-scores after post-processing are 0.902, 0.704 and 0.891 for the Tucson, Phoenix and online VEDAI datasets, respectively. The combined results show that our enhanced three-stage post-processing scheme achieves a mean average precision (mAP) of 63.9% for feature extraction methods and 82.8% for the machine learning approach.

4 citations

References
Journal Article

37,017 citations


"Spatio-Temporal Processing for Auto..." refers methods in this paper

  • ...Simply using normalized grayscale thresholds or applying a widely used scheme such as Otsu thresholding [22] for pixel-based classification may result in an unacceptable error rate [13]–[15]....

Proceedings Article
20 Jun 2009
TL;DR: This paper introduces a method for salient region detection that outputs full-resolution saliency maps with well-defined boundaries of salient objects and outperforms five state-of-the-art methods on both the ground-truth evaluation and the segmentation task, achieving higher precision and better recall.
Abstract: Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. In this paper, we introduce a method for salient region detection that outputs full resolution saliency maps with well-defined boundaries of salient objects. These boundaries are preserved by retaining substantially more frequency content from the original image than other existing techniques. Our method exploits features of color and luminance, is simple to implement, and is computationally efficient. We compare our algorithm to five state-of-the-art salient region detection methods with a frequency domain analysis, ground truth, and a salient object segmentation application. Our method outperforms the five algorithms both on the ground-truth evaluation and on the segmentation task by achieving both higher precision and better recall.

3,723 citations

Proceedings Article
17 Jun 2007
TL;DR: A simple method for visual saliency detection is presented, independent of features, categories, or other forms of prior knowledge of the objects, and a fast method to construct the corresponding saliency map in the spatial domain is proposed.
Abstract: The ability of the human visual system to detect visual saliency is extraordinarily fast and reliable. However, computational modeling of this basic intelligent behavior still remains a challenge. This paper presents a simple method for visual saliency detection. Our model is independent of features, categories, or other forms of prior knowledge of the objects. By analyzing the log-spectrum of an input image, we extract the spectral residual of the image in the spectral domain, and propose a fast method to construct the corresponding saliency map in the spatial domain. We test this model on both natural pictures and artificial images such as psychological patterns. The results indicate fast and robust saliency detection by our method.
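
The log-spectrum analysis described in this abstract can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the authors' implementation: the 3x3 averaging filter, the box-filter smoothing of the final map (a Gaussian is more common), the small epsilon inside the logarithm, and the names `box_filter3` and `spectral_residual_saliency` are all choices made for this sketch.

```python
import numpy as np


def box_filter3(a):
    """3x3 local mean with edge replication."""
    rows, cols = a.shape
    padded = np.pad(a, 1, mode="edge")
    shifted = [padded[i:i + rows, j:j + cols]
               for i in range(3) for j in range(3)]
    return sum(shifted) / 9.0


def spectral_residual_saliency(img):
    """Saliency map built from the spectral residual of the log-amplitude spectrum."""
    spectrum = np.fft.fft2(img)
    log_amp = np.log(np.abs(spectrum) + 1e-8)   # log spectrum; epsilon avoids log(0)
    phase = np.angle(spectrum)                  # phase is kept unchanged
    residual = log_amp - box_filter3(log_amp)   # log spectrum minus its local average
    # Back to the spatial domain using the residual amplitude and original phase.
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return box_filter3(saliency)                # light smoothing (assumed)


# Toy input: a small bright square on a flat background (hypothetical data).
img = np.zeros((32, 32))
img[12:16, 20:24] = 1.0
sal = spectral_residual_saliency(img)
```

The returned map has the same shape as the input. For detection it would still need a binarization step, which is the step the main paper replaces with hysteresis thresholding.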

3,464 citations


"Spatio-Temporal Processing for Auto..." refers background or methods in this paper

  • ...Therefore, our main goal was to investigate how the proposed spatio-temporal processing scheme improves the performance of several detection and segmentation models [7]–[12], [26], [31]–[33] adapted for detecting aerial vehicles....

  • ...We adapted seven saliency enhancement algorithms by removing the binarization step from previously published detection and segmentation algorithms, which we refer to as the spectral residual (SR) approach [7], frequency tuning (FT) [8], maximum symmetric surround saliency (MSSS)...

  • ...We have studied a variety of detection and segmentation algorithms in accordance with their related performance evaluation metrics [7]–[18]....

  • ...In contrast to our prior work, we removed the binarization step in nine automatic algorithms for detection and segmentation [7]–[12], [26], [31]–[33] thereby obtaining adapted algorithms for saliency enhancement....

Journal Article
TL;DR: It is found that the models designed specifically for salient object detection generally work better than models in closely related areas, which provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems.
Abstract: We extensively compare, qualitatively and quantitatively, 41 state-of-the-art models (29 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over seven challenging data sets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows a consistent rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted three years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, which in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity in model performance, which, along with the hard cases for the state-of-the-art models, provide useful hints toward constructing more challenging large-scale data sets and better saliency models. Finally, we propose probable solutions for tackling several open problems, such as evaluation scores and data set bias, which also suggest future research directions in the rapidly growing field of salient object detection.

1,372 citations


"Spatio-Temporal Processing for Auto..." refers background or methods in this paper

  • ...Simply using normalized grayscale thresholds or applying a widely used scheme such as Otsu thresholding [22] for pixel-based classification may result in an unacceptable error rate [13]–[15]....

  • ...iii) We used sensitivity analysis to evaluate the performance of two detection algorithms using basic information retrieval metrics [13]–[15], [30] under various overlap thresholds for classifying detections, and measured how the size of structuring elements affects the average F-score on four algorithms for spatial processing....

  • ...Basic information retrieval (IR) metrics [13]–[15], [30] include precision, recall, F-score, and percentage of wrong...

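For reference, the first three IR metrics named in the excerpts above follow directly from counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions; the function name `ir_metrics` and the example counts are hypothetical, and the truncated fourth metric is deliberately left out.

```python
def ir_metrics(tp, fp, fn):
    """Return (precision, recall, F-score) from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0   # no correct detections at all
    return precision, recall, f_score


# Hypothetical counts: 80 correct detections, 20 false alarms, 10 misses.
precision, recall, f_score = ir_metrics(tp=80, fp=20, fn=10)
```

With these counts precision is 0.8 and recall is 80/90. The F-score is their harmonic mean, which also equals 2TP / (2TP + FP + FN).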

Proceedings Article
03 Dec 2010
TL;DR: This paper introduces a method for salient region detection that retains the advantages of such saliency maps while overcoming their shortcomings, and compares it to six state-of-the-art salient region detection methods using publicly available ground truth.
Abstract: Detection of visually salient image regions is useful for applications like object segmentation, adaptive compression, and object recognition. Recently, full-resolution salient maps that retain well-defined boundaries have attracted attention. In these maps, boundaries are preserved by retaining substantially more frequency content from the original image than older techniques. However, if the salient regions comprise more than half the pixels of the image, or if the background is complex, the background gets highlighted instead of the salient object. In this paper, we introduce a method for salient region detection that retains the advantages of such saliency maps while overcoming their shortcomings. Our method exploits features of color and luminance, is simple to implement and is computationally efficient. We compare our algorithm to six state-of-the-art salient region detection methods using publicly available ground truth. Our method outperforms the six algorithms by achieving both higher precision and better recall. We also show application of our saliency maps in an automatic salient object segmentation scheme using graph-cuts.

386 citations


"Spatio-Temporal Processing for Auto..." refers background in this paper

  • ...[9], Laplacian pyramid transform (LPT) [10], morphological filtering (MF) [11], variational minimax optimization (VMO) [26], [33] and contextual information saliency (CIS) detection [32]....
