Journal ArticleDOI

A study on various methods used for video summarization and moving object detection for video surveillance applications

01 Sep 2018-Multimedia Tools and Applications (Springer US)-Vol. 77, Iss: 18, pp 23273-23290
TL;DR: This paper surveys the various methods used for video summarization, gives a comparative study of the different techniques, and presents the object detection, object classification and object tracking algorithms available in the literature.
Abstract: With the advancement in digital video technology, video surveillance has been playing a vital role in ensuring safety and security. Surveillance systems are deployed in a wide range of applications to monitor objects and to analyse activities in the environment. From single or multiple surveillance cameras, a huge amount of data is generated, stored and processed for security purposes. Due to time constraints, it is a very tedious process for an analyst to go through the full content. This limitation is overcome by video summarization, which is intended to afford a comprehensible analysis of a video by removing duplication and extracting key frames. To produce an easily interpreted outline, the available video summarization methods attempt to present a summary of the main occurrences, scenes, or objects in a frame. Depending on the application, it is required to summarize the happenings in the scene and detect the objects (static/dynamic) which are recorded in the video. Hence this paper provides the various methods used for video summarization and a comparative study of the different techniques. It also presents the different object detection, object classification and object tracking algorithms available in the literature.
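As a minimal illustration of the duplicate-removal idea described in the abstract (not the survey's own algorithm), the sketch below keeps a frame as a key frame only when its colour histogram differs sufficiently from the last kept frame; OpenCV, the HSV histogram and the 0.4 threshold are assumptions.

```python
import cv2

def extract_keyframes(video_path, diff_threshold=0.4):
    """Keep a frame as a key frame when its HSV histogram differs enough
    from the last kept key frame (simple duplicate removal)."""
    cap = cv2.VideoCapture(video_path)
    keyframes, last_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if last_hist is None or cv2.compareHist(last_hist, hist,
                                                cv2.HISTCMP_BHATTACHARYYA) > diff_threshold:
            keyframes.append((idx, frame))   # scene changed enough: keep frame
            last_hist = hist
        idx += 1
    cap.release()
    return keyframes
```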
Citations
Journal ArticleDOI
TL;DR: This article focuses on correlation filter-based object tracking algorithms; the various methods are summarized together with their tracking results on different vision problems, and a reliability-based visual tracking method is examined.
Abstract: An important area of computer vision is real-time object tracking, which is now widely used in intelligent transportation and smart industry technologies. Although correlation filter object tracking methods have a good real-time tracking effect, they still face many challenges such as scale variation, occlusion, and boundary effects. Many scholars have continuously improved existing methods for better efficiency and tracking performance. To provide a comprehensive understanding of the background, key technologies and algorithms of single object tracking, this article focuses on the correlation filter-based object tracking algorithms. Specifically, the background and current advancement of object tracking methodologies are introduced, along with the main datasets. The various methods are summarized together with their tracking results on different vision problems, and a visual tracking method based on reliability is examined.
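To make the correlation filter idea concrete, here is a minimal MOSSE-style sketch (an assumed simplification, not the article's exact algorithms): a filter is learned in the frequency domain so that correlating it with the template yields a Gaussian peak, and the peak location in the response over a search patch gives the target displacement.

```python
import numpy as np

def correlation_filter_step(template, search_patch, lam=1e-3):
    """Learn a single-channel correlation filter from a grayscale template whose
    desired response is a centred Gaussian, correlate it with a same-sized
    search patch, and return the displacement of the response peak."""
    h, w = template.shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-(((ys - h // 2) ** 2 + (xs - w // 2) ** 2)
                 / (2 * (0.1 * min(h, w)) ** 2)))        # desired Gaussian response
    F, G = np.fft.fft2(template), np.fft.fft2(g)
    W = (G * np.conj(F)) / (F * np.conj(F) + lam)        # filter such that W * F ≈ G
    response = np.real(np.fft.ifft2(W * np.fft.fft2(search_patch)))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return dy - h // 2, dx - w // 2                      # estimated shift of the target
```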

193 citations

Journal ArticleDOI
TL;DR: This article achieves MVS by integrating deep neural network based soft computing techniques in a two-tier framework that extracts deep features from each frame of a sequence in the lookup table and passes them to deep bidirectional long short-term memory (DB-LSTM) to acquire probabilities of informativeness and generates a summary.
Abstract: The massive amount of video data produced by surveillance networks in industries instigates various challenges in exploring these videos for many applications, such as video summarization (VS), analysis, indexing, and retrieval. The task of multiview video summarization (MVS) is very challenging due to the gigantic size of the data, redundancy, overlapping views, light variations, and inter-view correlations. To address these challenges, various low-level features and clustering-based soft computing techniques have been proposed, but they cannot fully exploit MVS. In this article, we achieve MVS by integrating deep neural network based soft computing techniques in a two-tier framework. The first online tier performs target-appearance-based shot segmentation and stores the shots in a lookup table that is transmitted to the cloud for further processing. The second tier extracts deep features from each frame of a sequence in the lookup table and passes them to a deep bidirectional long short-term memory (DB-LSTM) to acquire probabilities of informativeness and generate a summary. Experimental evaluation on a benchmark dataset and industrial surveillance data from YouTube confirms the better performance of our system compared to the state-of-the-art MVS methods.
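A minimal sketch of the second tier's scoring stage, assuming PyTorch (the feature dimension, hidden size and 15% budget are illustrative choices, not the cited DB-LSTM configuration): a bidirectional LSTM maps per-frame deep features to informativeness probabilities, from which the highest-scoring frames form the summary.

```python
import torch
import torch.nn as nn

class FrameImportanceScorer(nn.Module):
    """Bidirectional LSTM mapping a sequence of per-frame deep features to
    per-frame probabilities of being informative (summary-worthy)."""
    def __init__(self, feat_dim=2048, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, feats):            # feats: (batch, n_frames, feat_dim)
        out, _ = self.lstm(feats)        # (batch, n_frames, 2 * hidden)
        return torch.sigmoid(self.head(out)).squeeze(-1)  # (batch, n_frames)

# Example: score 120 frames of 2048-D CNN features and keep the top 15%.
scores = FrameImportanceScorer()(torch.randn(1, 120, 2048))
summary_idx = scores[0].topk(int(0.15 * 120)).indices.sort().values
```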

106 citations


Cites background from "A study on various methods used for..."

  • ...applications [13]–[15] including indoor and outdoor CCTV automatic monitoring [16] for activities and events detection...


Journal ArticleDOI
TL;DR: This paper provides a comprehensive and systematic review on the literature from various video surveillance system studies published from 2010 through 2019 to illustrate the research trends, datasets, methods, and frameworks used in the field of video surveillance.
Abstract: Video surveillance systems have attracted great interest as application-oriented studies that have grown rapidly in the past decade. The most recent studies attempt to integrate computer vision, image processing, and artificial intelligence capabilities into video surveillance applications. Although many datasets, methods, and frameworks have been published, few papers provide a comprehensive picture of the current state of video surveillance system research. This paper provides a comprehensive and systematic review of the literature from video surveillance system studies published from 2010 through 2019. Through a study selection and extraction process, 220 journal publications were identified and analyzed to illustrate the research trends, datasets, methods, and frameworks used in the field of video surveillance; to provide an in-depth explanation of the research trends and the topics researchers focus on; to provide references on the public datasets that are often used for comparison and for developing test methods; and to give an account of the improvement and integration of network infrastructure design to meet the demand for multimedia data. At the end of the paper, several opportunities and challenges related to research on video surveillance systems are mentioned.

49 citations


Cites methods from "A study on various methods used for..."

  • ...lance field dealing with implementation of algorithms to perform traditional techniques of image processing such as motion detection [24], [26], [57], [58], object detection [17], [25], [31], [57], [59]– [71], [99], object or event classification [16], [20], [34], [35], [54], [66], [144], [145],...


Journal ArticleDOI
TL;DR: A framework for personalized visualization of synopsis video is propounded, integrating pertinent object attributes such as color, type, size, speed, travel path and direction towards generation of a synopsis video for precise inference of user needs.
Abstract: Video synopsis is an effective technique for the efficient analysis of long videos in a short time. To generate a compact video, multiple tracks of moving objects, which we call tubes, are displayed simultaneously by rearranging them along the time axis. Contemporary video synopsis approaches focus on collision avoidance or preservation of chronological order among tubes. However, the generation of an adaptive, personalized, user-oriented synopsis video congruent to users' preferences has not yet been thoroughly explored. This paper propounds a framework for personalized visualization of synopsis video, integrating pertinent object attributes such as color, type, size, speed, travel path and direction towards the generation of a synopsis video for precise inference of user needs. The framework lets users interactively define queries for the creation of the targeted synopsis. User queries are classified into visual, temporal, spatial, and spatio-temporal queries in accordance with the visual and spatio-temporal attributes. Tubes relevant to a user query are selected and grouped according to their original behavioral interactions, followed by their rearrangement, to generate a synopsis video with fewer false collisions. To evaluate the proffered technique, two evaluation metrics are proposed and extensive experiments on publicly available surveillance videos are conducted. The experimental results demonstrate the propriety and usability of the proposed approach.
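As a toy illustration of the query-driven tube selection described above (the attribute names and data structure are assumptions, not the paper's implementation), tubes can be filtered by a user's visual criteria and then ordered chronologically before rearrangement:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Tube:
    """A moving-object track with the kind of attributes the framework queries on."""
    color: str
    obj_type: str
    speed: float                                   # e.g. pixels per frame
    path: List[Tuple[int, int]] = field(default_factory=list)
    start_frame: int = 0

def select_tubes(tubes, color=None, obj_type=None, min_speed=None):
    """Visual query: keep only tubes matching the user's criteria, then sort
    them chronologically before rearrangement into the synopsis."""
    hits = [t for t in tubes
            if (color is None or t.color == color)
            and (obj_type is None or t.obj_type == obj_type)
            and (min_speed is None or t.speed >= min_speed)]
    return sorted(hits, key=lambda t: t.start_frame)
```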

11 citations

Journal ArticleDOI
01 May 2020-Optik
TL;DR: A new method for moving object detection using a background subtraction technique with a new color feature descriptor that extracts the color feature of a pixel by looking at its neighbors and is able to accurately detect and extract moving objects.
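The overall background-subtraction pipeline can be sketched as below, assuming OpenCV's stock MOG2 subtractor in place of the paper's neighbourhood colour descriptor (the minimum area and morphology kernel are arbitrary choices):

```python
import cv2

def detect_moving_objects(video_path, min_area=500):
    """Background subtraction + contour extraction: yields per-frame bounding
    boxes of moving objects (stock MOG2, not the cited descriptor)."""
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=True)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove speckle noise
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
        yield frame, boxes
    cap.release()
```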

8 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work presents a new benchmark dataset and evaluation methodology for the area of video object segmentation, named DAVIS (Densely Annotated VIdeo Segmentation), and provides a comprehensive analysis of several state-of-the-art segmentation approaches using three complementary metrics.
Abstract: Over the years, datasets and benchmarks have proven their fundamental importance in computer vision research, enabling targeted progress and objective comparisons in many fields. At the same time, legacy datasets may impede the evolution of a field due to saturated algorithm performance and the lack of contemporary, high quality data. In this work we present a new benchmark dataset and evaluation methodology for the area of video object segmentation. The dataset, named DAVIS (Densely Annotated VIdeo Segmentation), consists of fifty high quality, Full HD video sequences, spanning multiple occurrences of common video object segmentation challenges such as occlusions, motion blur and appearance changes. Each video is accompanied by densely annotated, pixel-accurate and per-frame ground truth segmentation. In addition, we provide a comprehensive analysis of several state-of-the-art segmentation approaches using three complementary metrics that measure the spatial extent of the segmentation, the accuracy of the silhouette contours and the temporal coherence. The results uncover strengths and weaknesses of current approaches, opening up promising directions for future works.

1,656 citations


"A study on various methods used for..." refers background or methods in this paper

  • ...Region similarity J: the Jaccard index J is defined as the intersection-over-union of the predicted segmentation and the ground truth mask [34]....


  • ...Some segmentation quality evaluation methods and quantitative parameters that address the temporal aspect of video sequences are suggested below [34]....


  • ...In order to be robust to small inaccuracies, the contour-based precision Pc and recall Rc between the contour points of c(M) and c(G) can be computed, via a bipartite graph matching [34]....

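The region and contour measures quoted above can be illustrated with a short NumPy/SciPy sketch (an assumed approximation: boundary matching uses a dilation tolerance rather than the exact bipartite graph matching):

```python
import numpy as np
from scipy.ndimage import binary_dilation

def region_similarity(pred, gt):
    """Jaccard index J: intersection-over-union of the predicted segmentation
    mask and the ground-truth mask (both boolean arrays)."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

def boundary_f_measure(pred_contour, gt_contour, tol=2):
    """Approximate contour precision Pc and recall Rc by matching boundary
    pixels within a small dilation tolerance, then combine into an F-measure."""
    gt_zone = binary_dilation(gt_contour, iterations=tol)
    pred_zone = binary_dilation(pred_contour, iterations=tol)
    precision = (pred_contour & gt_zone).sum() / max(pred_contour.sum(), 1)
    recall = (gt_contour & pred_zone).sum() / max(gt_contour.sum(), 1)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
```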

Journal ArticleDOI
TL;DR: This paper demonstrates that motion will be exploited most effectively, if it is regarded over larger time windows, and suggests working with a paradigm that starts with semi-dense motion cues first and that fills up textureless areas afterwards based on color.
Abstract: Motion is a strong cue for unsupervised object-level grouping. In this paper, we demonstrate that motion will be exploited most effectively, if it is regarded over larger time windows. Opposed to classical two-frame optical flow, point trajectories that span hundreds of frames are less susceptible to short-term variations that hinder separating different objects. As a positive side effect, the resulting groupings are temporally consistent over a whole video shot, a property that requires tedious post-processing in the vast majority of existing approaches. We suggest working with a paradigm that starts with semi-dense motion cues first and that fills up textureless areas afterwards based on color. This paper also contributes the Freiburg-Berkeley motion segmentation (FBMS) dataset, a large, heterogeneous benchmark with 59 sequences and pixel-accurate ground truth annotation of moving objects.

581 citations


"A study on various methods used for..." refers background in this paper

  • ...[35] proposed a dense point tracker with the variational optical flow....

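A minimal sketch of building long-term point trajectories by chaining dense optical flow across frames (Farneback flow in OpenCV is an assumed stand-in for the cited variational tracker):

```python
import cv2
import numpy as np

def chain_point_trajectories(frames, step=8):
    """Seed points on a grid in the first frame and propagate them through
    dense optical flow, producing long-term trajectories usable for motion grouping."""
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    h, w = prev.shape
    pts = np.stack(np.meshgrid(np.arange(0, w, step), np.arange(0, h, step)), -1)
    pts = pts.reshape(-1, 2).astype(np.float32)          # (x, y) seed points
    tracks = [[tuple(p)] for p in pts]
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        xi = np.clip(pts[:, 0].astype(int), 0, w - 1)
        yi = np.clip(pts[:, 1].astype(int), 0, h - 1)
        pts = pts + flow[yi, xi]                         # move each point along the flow
        for track, p in zip(tracks, pts):
            track.append(tuple(p))
        prev = gray
    return tracks
```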

Posted Content
TL;DR: Long Short-Term Memory (LSTM), a special type of recurrent neural network, is used to model the variable-range dependencies entailed in the task of video summarization, and domain adaptation across auxiliary annotated datasets improves summarization by reducing the discrepancies in statistical properties across those datasets.
Abstract: We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the problem as a structured prediction problem on sequential data, our main idea is to use Long Short-Term Memory (LSTM), a special type of recurrent neural network, to model the variable-range dependencies entailed in the task of video summarization. Our learning models attain state-of-the-art results on two benchmark video datasets. Detailed analysis justifies the design of the models. In particular, we show that it is crucial to take into consideration the sequential structures in videos and model them. Besides advances in modeling techniques, we introduce techniques to address the need for a large amount of annotated data for training complex learning models. There, our main idea is to exploit the existence of auxiliary annotated video datasets, albeit heterogeneous in visual styles and contents. Specifically, we show that domain adaptation techniques can improve summarization by reducing the discrepancies in statistical properties across those datasets.

441 citations

Book ChapterDOI
08 Oct 2016
TL;DR: In this paper, the task of video summarization is cast as a structured prediction problem, and LSTM is used to model the variable-range temporal dependency among video frames to derive both representative and compact video summaries.
Abstract: We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the task as a structured prediction problem, our main idea is to use Long Short-Term Memory (LSTM) to model the variable-range temporal dependency among video frames, so as to derive both representative and compact video summaries. The proposed model successfully accounts for the sequential structure crucial to generating meaningful video summaries, leading to state-of-the-art results on two benchmark datasets. In addition to advances in modeling techniques, we introduce a strategy to address the need for a large amount of annotated data for training complex learning approaches to summarization. There, our main idea is to exploit auxiliary annotated video summarization datasets, in spite of their heterogeneity in visual styles and contents. Specifically, we show that domain adaptation techniques can improve learning by reducing the discrepancies in the original datasets’ statistical properties.
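Once per-shot importance scores are available (e.g. from an LSTM scorer), a common way to assemble a summary under a length budget is a 0/1 knapsack over shots; the sketch below is such a generic selection step, with the scores, lengths and 15% budget chosen only for illustration, not taken from the cited paper:

```python
def select_subshots(shot_scores, shot_lengths, budget):
    """0/1 knapsack over shots: maximise total importance while keeping the
    summary length (in frames) within the budget."""
    best = [(0.0, [])] * (budget + 1)        # best[j] = (score, chosen shots) for capacity j
    for i in range(len(shot_scores)):
        new_best = best[:]
        for j in range(shot_lengths[i], budget + 1):
            cand = best[j - shot_lengths[i]][0] + shot_scores[i]
            if cand > new_best[j][0]:
                new_best[j] = (cand, best[j - shot_lengths[i]][1] + [i])
        best = new_best
    return sorted(best[budget][1])

# Example: pick subshots covering at most 15% of a 1000-frame video.
chosen = select_subshots([0.9, 0.2, 0.7, 0.4], [60, 40, 50, 30], budget=150)
```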

411 citations

Journal ArticleDOI
TL;DR: This paper analyzes the ME structure in HEVC and proposes a parallel framework to decouple ME for different partitions on many-core processors and achieves more than 30 and 40 times speedup for 1920 × 1080 and 2560 × 1600 video sequences, respectively.
Abstract: High Efficiency Video Coding (HEVC) provides superior coding efficiency compared with previous video coding standards, at the cost of increased encoding complexity. The complexity increase of the motion estimation (ME) procedure is rather significant, especially when considering the complicated partitioning structure of HEVC. Fully exploiting the coding efficiency brought by HEVC requires a huge amount of computation. In this paper, we analyze the ME structure in HEVC and propose a parallel framework to decouple ME for different partitions on many-core processors. Based on the local parallel method (LPM), we first use a directed acyclic graph (DAG)-based order to parallelize coding tree units (CTUs) and adopt an improved LPM (ILPM) within each CTU (DAGILPM), which exploits CTU-level and prediction unit (PU)-level parallelism. Then, we find that there exist completely independent PUs (CIPUs) and partially independent PUs (PIPUs). When the degree of parallelism (DP) is smaller than the maximum DP of DAGILPM, we process the CIPUs and PIPUs, which further increases the DP. The data dependencies and coding efficiency stay the same as LPM. Experiments show that on a 64-core system, compared with serial execution, our proposed scheme achieves more than 30 and 40 times speedup for 1920 × 1080 and 2560 × 1600 video sequences, respectively.
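The CTU-level DAG ordering can be illustrated with a tiny wavefront sketch, assuming each CTU depends only on its left and upper neighbours (this shows the general idea of dependency levels, not the paper's DAGILPM scheme):

```python
def ctu_wavefront_levels(rows, cols):
    """Group CTUs into dependency levels: a CTU at (r, c) can start once its left
    (r, c-1) and upper (r-1, c) neighbours are done, so all CTUs on the same
    anti-diagonal r + c form one level and can be coded in parallel."""
    levels = {}
    for r in range(rows):
        for c in range(cols):
            levels.setdefault(r + c, []).append((r, c))
    return [levels[k] for k in sorted(levels)]

# Example: a 3x4 CTU grid yields 6 levels with up to 3 CTUs processed in parallel.
for level, ctus in enumerate(ctu_wavefront_levels(3, 4)):
    print(level, ctus)
```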

366 citations


"A study on various methods used for..." refers background in this paper

  • ...[8, 9] proposed a one-stage Supervised Deep Hashing framework (SDHP) to learn high-quality binary codes for intelligent transport system....
