Unsupervised video summarization framework using keyframe extraction and video skimming
Shruti Jadon,Mahmood Jasim +1 more
TLDR
This paper attempts to solve video summarization through unsupervised learning by employing traditional vision-based algorithmic methodologies for accurate feature extraction from video frames and proposes a deep learning-based feature extraction followed by multiple clustering methods to find an effective way of summarizing a video by interesting key-frame extraction.
Abstract:
Video is a robust source of information, and the consumption of online and offline videos has reached an unprecedented level in the last few years. A fundamental challenge of extracting information from videos is that a viewer has to watch the complete video to understand the context, whereas an image conveys its information in a single frame. Beyond context understanding, it is almost impossible to create a universal summarized video for everyone, because each viewer has their own bias about what counts as a keyframe. For example, in a soccer game, a coach might favor frames that show player placement, techniques, etc., whereas a person with less knowledge of soccer will focus more on frames that show goals and the scoreboard. Therefore, tackling video summarization through supervised learning would require extensive personalized labeling of data. In this paper, we attempt to solve video summarization through unsupervised learning by employing traditional vision-based algorithmic methodologies for accurate feature extraction from video frames. We also propose deep learning-based feature extraction followed by multiple clustering methods to find an effective way of summarizing a video through interesting keyframe extraction. We compare the performance of these approaches on the SumMe dataset and show that deep learning-based feature extraction performs better on dynamic-viewpoint videos.
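The pipeline described in the abstract (per-frame feature extraction, then clustering, then one representative frame per cluster) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not the authors' implementation: frames are modeled as flat lists of grayscale pixel values, the feature is a coarse intensity histogram, the clustering method is plain k-means with a deterministic initialization, and the names `histogram_feature`, `kmeans`, and `extract_keyframes` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a flat sequence of grayscale pixel values in 0..255.
Frame = Sequence[int]


def histogram_feature(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized intensity histogram, a simple hand-crafted frame feature."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: List[List[float]], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means; centroids start at evenly spaced frames (illustrative choice)."""
    n = len(features)
    centroids = [list(features[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: each frame joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(f, centroids[c]))
                  for f in features]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def extract_keyframes(frames: Sequence[Frame], k: int) -> List[int]:
    """Return indices of the frame closest to each cluster centroid."""
    features = [histogram_feature(f) for f in frames]
    labels, centroids = kmeans(features, k)
    keyframes = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            keyframes.append(
                min(members, key=lambda i: sq_dist(features[i], centroids[c])))
    return sorted(keyframes)
```

On a toy "video" of three dark frames followed by three bright frames, `extract_keyframes(frames, 2)` picks one index from each group. A real system would replace `histogram_feature` with whichever traditional or deep feature extractor is under comparison, which is the axis the abstract says the paper evaluates.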