Proceedings ArticleDOI

A Study on Application Scenario of Video Summarization

29 Mar 2018-pp 936-943
TL;DR: Different applications of video summarization and methods for generating video summaries are categorized, and some of the techniques that fit each particular application are presented.
Abstract: We are living in an era where video content is widely available on the internet and on hard disk drives. Since the duration of videos ranges from a few minutes to many hours, systems for content classification, browsing, indexing, retrieval and storage have become essential. Moreover, with the advancement of digital video technologies, huge volumes of video data are produced every day. Video summarization, the process of generating a synopsis of a video, plays an important role in processing these data. In this paper, we present application scenarios of video summarization and some of the techniques which fit each particular application. With the fast evolution of video technology, many new multimedia applications are now available. We categorize the different applications of video summarization and the methods for generating a video summary, and we introduce the different types of videos and the need for summarization for each type.
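
As a hedged illustration of what "generating a synopsis of the video" can mean in the simplest keyframe-based setting (this is not the method proposed in the paper), the sketch below samples frames and keeps those whose color histogram differs sufficiently from the last kept frame. The OpenCV calls are standard; the sampling rate and threshold are assumed values.

```python
# Minimal keyframe-based summary sketch (illustrative only, not the paper's method).
# Assumes OpenCV (cv2) and NumPy; the sampling stride and threshold are hypothetical.
import cv2
import numpy as np

def summarize(video_path, hist_diff_threshold=0.4, sample_every=30):
    """Return keyframes whose color histogram differs enough from the last kept frame."""
    cap = cv2.VideoCapture(video_path)
    keyframes, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:                      # sample sparsely for speed
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
            hist = cv2.normalize(hist, None).flatten()
            if prev_hist is None or cv2.compareHist(
                    prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > hist_diff_threshold:
                keyframes.append(frame)                  # content changed enough: keep frame
                prev_hist = hist
        idx += 1
    cap.release()
    return keyframes
```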
Citations
Journal ArticleDOI
TL;DR: This research work addresses the competency and limitations of existing IE techniques related to data pre-processing, data extraction and transformation, and representation of huge volumes of multidimensional unstructured data, and presents a systematic literature review of state-of-the-art techniques for a variety of big data types.
Abstract: Information extraction (IE) is used to extract useful information from unstructured or semi-structured data. Big data raises new challenges for IE techniques with the rapid growth of multifaceted, also called multidimensional, unstructured data. Traditional IE systems are inefficient at dealing with this huge deluge of unstructured big data. The volume and variety of big data demand improvements in the computational capabilities of these IE systems. It is necessary to understand the competency and limitations of existing IE techniques related to data pre-processing, data extraction and transformation, and representation for huge volumes of multidimensional unstructured data. Numerous studies have been conducted on IE, addressing the challenges and issues for different data types such as text, image, audio and video. Very little consolidated research has been conducted to investigate the task-dependent and task-independent limitations of IE covering all data types in a single study. This research work addresses this limitation and presents a systematic literature review of state-of-the-art techniques for a variety of big data, consolidating all data types. Recent challenges of IE are also identified and summarized. Potential solutions are proposed, giving future research directions in big data IE. The research is significant in terms of recent trends and challenges related to big data analytics. The outcome of the research and its recommendations will help improve big data analytics by making it more productive.

102 citations

Proceedings ArticleDOI
17 Mar 2021
TL;DR: This paper provides a comprehensive review of the background work on video summarization and the techniques employed by various researchers to summarize videos, along with a brief overview of the various applications of existing summarization techniques.
Abstract: Video summarization is an effective tool for evaluating the contents of a video. Video summaries help in searching, browsing and retrieving large video files. This paper aims to provide a comprehensive review of the background work on video summarization and the techniques employed by various researchers to summarize videos. In addition, the paper provides a brief overview of the various applications of existing video summarization techniques. The paper concludes by discussing the challenges faced by existing video summarization techniques and the future scope expected in the video summarization domain.
References
Proceedings ArticleDOI
23 Jun 2014
TL;DR: This work studies multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggests a multiresolution, foveated architecture as a promising way of speeding up the training.
Abstract: Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).

4,876 citations


"A Study on Application Scenario of ..." refers background in this paper

  • ...The convolutional neural networks (CNNs) are a type of deep model which consist of multiple layers for extracting the features from videos [29]....

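The abstract above studies how to extend a CNN's connectivity into the time domain. A minimal PyTorch sketch of one such scheme, early fusion, where a short stack of frames is merged at the first 2D convolution, is given below; the clip length and layer sizes are illustrative assumptions rather than the architecture evaluated in the paper (only the 487-class output matches the dataset described above).

```python
# Illustrative "early fusion" sketch: a 2D CNN whose first convolution sees a short
# stack of frames at once (T frames x 3 channels). Layer sizes and clip length are
# assumptions, not the exact architecture from the cited paper.
import torch
import torch.nn as nn

class EarlyFusionCNN(nn.Module):
    def __init__(self, clip_len=10, num_classes=487):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3 * clip_len, 96, kernel_size=11, stride=4),  # fuse time at the input
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, clip):                 # clip: (batch, T, 3, H, W)
        b, t, c, h, w = clip.shape
        x = clip.reshape(b, t * c, h, w)     # stack frames along the channel axis
        x = self.features(x).flatten(1)
        return self.classifier(x)

# logits = EarlyFusionCNN()(torch.randn(2, 10, 3, 170, 170))  # -> shape (2, 487)
```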

Journal ArticleDOI
TL;DR: A novel 3D CNN model for action recognition is developed that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
Abstract: We consider the automated recognition of human actions in surveillance videos. Most current methods build classifiers based on complex handcrafted features computed from the raw inputs. Convolutional neural networks (CNNs) are a type of deep model that can act directly on the raw inputs. However, such models are currently limited to handling 2D inputs. In this paper, we develop a novel 3D CNN model for action recognition. This model extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The developed model generates multiple channels of information from the input frames, and the final feature representation combines information from all channels. To further boost the performance, we propose regularizing the outputs with high-level features and combining the predictions of a variety of different models. We apply the developed models to recognize human actions in the real-world environment of airport surveillance videos, and they achieve superior performance in comparison to baseline methods.

4,545 citations
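
The 3D convolutions described above can be expressed directly with standard deep-learning layers. The following PyTorch sketch, with illustrative layer sizes (not the paper's model), shows how a kernel spanning (time, height, width) lets adjacent frames contribute to each feature map, which is what captures motion information.

```python
# Minimal 3D CNN sketch: kernels extend over (time, height, width), so adjacent
# frames contribute to each feature map. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=(3, 7, 7)),   # 3 frames x 7x7 spatial window
            nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),                   # pool spatially, keep temporal length
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5)),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):                 # clip: (batch, 1, frames, H, W)
        x = self.features(clip).flatten(1)
        return self.classifier(x)

# Tiny3DCNN()(torch.randn(2, 1, 9, 60, 40)).shape  # -> torch.Size([2, 10])
```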

Proceedings Article
21 Jun 2010
TL;DR: A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
Abstract: We consider the fully automated recognition of actions in uncontrolled environments. Most existing work relies on domain knowledge to construct complex handcrafted features from inputs. In addition, the environments are usually assumed to be controlled. Convolutional neural networks (CNNs) are a type of deep model that can act directly on the raw inputs, thus automating the process of feature construction. However, such models are currently limited to handling 2D inputs. In this paper, we develop a novel 3D CNN model for action recognition. This model extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The developed model generates multiple channels of information from the input frames, and the final feature representation is obtained by combining information from all channels. We apply the developed model to recognize human actions in a real-world environment, and it achieves superior performance without relying on handcrafted features.

4,087 citations


"A Study on Application Scenario of ..." refers background in this paper

  • ...For dealing with motion information, 3D CNN architecture is proposed in [40], [24]....

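The abstract above notes that the model "generates multiple channels of information from the input frames". As a hedged illustration of that idea (the exact channel set used in the paper may differ), the sketch below derives grayscale and spatial-gradient channels from each frame of a clip using OpenCV; the resulting (channels, frames, H, W) array could feed a 3D CNN like the one sketched earlier.

```python
# Illustrative pre-processing: derive several information channels (grayscale and
# spatial gradients here; the paper's exact channel set may differ) from each frame
# of a clip, producing a (channels, frames, H, W) array for a 3D CNN.
import cv2
import numpy as np

def multi_channel_clip(frames_bgr):
    """frames_bgr: list of HxWx3 uint8 frames -> float32 array of shape (3, T, H, W)."""
    gray_c, gx_c, gy_c = [], [], []
    for frame in frames_bgr:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
        gray_c.append(gray)
        gx_c.append(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3))   # horizontal gradient
        gy_c.append(cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3))   # vertical gradient
    return np.stack([np.stack(gray_c), np.stack(gx_c), np.stack(gy_c)])
```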

Journal ArticleDOI
TL;DR: The proposed framework includes some novel low-level processing algorithms, such as dominant color region detection, robust shot boundary detection, and shot classification, as well as some higher-level algorithms for goal detection, referee detection, and penalty-box detection.
Abstract: We propose a fully automatic and computationally efficient framework for analysis and summarization of soccer videos using cinematic and object-based features. The proposed framework includes some novel low-level processing algorithms, such as dominant color region detection, robust shot boundary detection, and shot classification, as well as some higher-level algorithms for goal detection, referee detection, and penalty-box detection. The system can output three types of summaries: i) all slow-motion segments in a game; ii) all goals in a game; iii) slow-motion segments classified according to object-based features. The first two types of summaries are based on cinematic features only for speedy processing, while the summaries of the last type contain higher-level semantics. The proposed framework is efficient, effective, and robust. It is efficient in the sense that there is no need to compute object-based features when cinematic features are sufficient for the detection of certain events, e.g., goals in soccer. It is effective in the sense that the framework can also employ object-based features when needed to increase accuracy (at the expense of more computation). The efficiency, effectiveness, and robustness of the proposed framework are demonstrated over a large data set, consisting of more than 13 hours of soccer video, captured in different countries and under different conditions.

943 citations
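
The low-level steps listed above (dominant color region detection, shot classification) are essentially color and histogram operations. A minimal sketch of grass-dominant-region detection via an HSV hue threshold follows; the hue range and the 0.5 coverage cutoff are assumptions for illustration, not the calibrated detectors described in the paper.

```python
# Rough dominant-color (grass) region sketch: threshold the hue channel around green
# and measure the covered fraction of the frame. The hue range and the 0.5 cutoff are
# illustrative assumptions, not the calibrated detector described in the paper.
import cv2
import numpy as np

def grass_ratio(frame_bgr, hue_range=(35, 85)):
    """Fraction of pixels whose HSV values fall in an assumed grass-green range."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([hue_range[0], 40, 40], dtype=np.uint8)
    upper = np.array([hue_range[1], 255, 255], dtype=np.uint8)
    mask = cv2.inRange(hsv, lower, upper)          # 255 where the pixel looks grass-like
    return float(np.count_nonzero(mask)) / mask.size

def is_long_shot(frame_bgr, min_grass=0.5):
    """Classify a frame as a wide field view when grass dominates it."""
    return grass_ratio(frame_bgr) >= min_grass
```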