Journal ArticleDOI

VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method

TL;DR: VSUMM is presented: a methodology for producing static video summaries based on color feature extraction from video frames and the k-means clustering algorithm, together with a novel approach for evaluating static video summaries.
About: This article was published in Pattern Recognition Letters on 2011-01-01 and has received 627 citations to date. The article focuses on the topics: Video tracking & Video compression picture types.
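The method described above lends itself to a compact illustration. Below is a minimal sketch of a VSUMM-style summarizer, assuming OpenCV and scikit-learn are available: sample frames, extract a color histogram per frame, cluster with k-means, and keep the frame nearest each centroid as a keyframe. The sampling rate, bin count, and k are illustrative choices, not the paper's exact settings.

```python
# Minimal sketch of a VSUMM-style static summarizer: sample frames,
# extract color histograms, cluster with k-means, and keep the frame
# closest to each cluster centroid as a keyframe.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def summarize(video_path, k=5, sample_every=25, bins=16):
    cap = cv2.VideoCapture(video_path)
    frames, feats = [], []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            # Hue-channel histogram as a cheap color feature (illustrative).
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0], None, [bins], [0, 180]).flatten()
            feats.append(hist / (hist.sum() + 1e-9))
            frames.append(frame)
        idx += 1
    cap.release()

    X = np.array(feats)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # One keyframe per cluster: the sampled frame nearest its centroid.
    keyframes = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        keyframes.append(frames[members[np.argmin(dists)]])
    return keyframes
```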
Citations
Book ChapterDOI
06 Sep 2014
TL;DR: This paper proposes a novel approach and a new benchmark for video summarization, focusing on user videos (raw videos containing a set of interesting events), and generates high-quality results comparable to manual, human-created summaries.
Abstract: This paper proposes a novel approach and a new benchmark for video summarization. Thereby we focus on user videos, which are raw videos containing a set of interesting events. Our method starts by segmenting the video by using a novel “superframe” segmentation, tailored to raw videos. Then, we estimate visual interestingness per superframe using a set of low-, mid- and high-level features. Based on this scoring, we select an optimal subset of superframes to create an informative and interesting summary. The introduced benchmark comes with multiple human created summaries, which were acquired in a controlled psychological experiment. This data paves the way to evaluate summarization methods objectively and to get new insights in video summarization. When evaluating our method, we find that it generates high-quality results, comparable to manual, human-created summaries.
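The selection step this abstract describes — picking the best subset of superframes under a summary-length budget — is essentially a 0/1 knapsack. A minimal sketch, with illustrative scores and lengths standing in for the paper's feature-based interestingness estimates:

```python
# Sketch of budgeted subset selection: given per-segment interestingness
# scores and lengths (in frames), pick segments maximizing total score
# under a summary-length budget via 0/1 knapsack dynamic programming.
def select_segments(scores, lengths, budget):
    # dp[b] = (best total score, chosen segment indices) within budget b.
    dp = [(0.0, [])] * (budget + 1)
    for i in range(len(scores)):
        for b in range(budget, lengths[i] - 1, -1):
            cand = dp[b - lengths[i]][0] + scores[i]
            if cand > dp[b][0]:
                dp[b] = (cand, dp[b - lengths[i]][1] + [i])
    return dp[budget][1]

# e.g. select_segments([0.9, 0.4, 0.7], [30, 20, 25], budget=50) -> [0, 1]
```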

592 citations


Cites background or methods from "VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method"

  • ...human comparison has already been used successfully for keyframes [1,15]....


  • ...Keyframes are typically extracted using change detection [5] or clustering based on low-level features [1] or objects [18]....


  • ...One way of coping with the search challenge is visual indexing, where keyframes are selected such that they best summarize the video [28,5,1,13,18,15,16]....


Proceedings Article
08 Dec 2014
TL;DR: This work proposes the sequential determinantal point process (seqDPP), a probabilistic model for diverse sequential subset selection, which heeds the inherent sequential structures in video data, thus overcoming the deficiency of the standard DPP.
Abstract: Video summarization is a challenging problem with great application potential. Whereas prior approaches, largely unsupervised in nature, focus on sampling useful frames and assembling them as summaries, we consider video summarization as a supervised subset selection problem. Our idea is to teach the system to learn from human-created summaries how to select informative and diverse subsets, so as to best meet evaluation metrics derived from human-perceived quality. To this end, we propose the sequential determinantal point process (seqDPP), a probabilistic model for diverse sequential subset selection. Our novel seqDPP heeds the inherent sequential structures in video data, thus overcoming the deficiency of the standard DPP, which treats video frames as randomly permutable items. Meanwhile, seqDPP retains the power of modeling diverse subsets, essential for summarization. Our extensive results of summarizing videos from 3 datasets demonstrate the superior performance of our method, compared to not only existing unsupervised methods but also naive applications of the standard DPP model.
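For orientation on the machinery this paper builds on: a determinantal point process (DPP) scores a subset Y of items by det(L_Y), the determinant of the kernel submatrix indexed by Y, which is large for diverse selections. The toy sketch below uses greedy MAP selection on an RBF kernel; both are common illustrative choices, not the paper's seqDPP inference.

```python
# Toy DPP sketch: det(L_Y) rewards diverse subsets, so greedily adding the
# item that most increases the determinant avoids near-duplicate frames.
import numpy as np

def dpp_greedy_map(L, k):
    """Greedily add the item that most increases det(L_Y)."""
    selected = []
    for _ in range(k):
        best, best_det = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            Y = selected + [i]
            d = np.linalg.det(L[np.ix_(Y, Y)])
            if d > best_det:
                best, best_det = i, d
        selected.append(best)
    return selected

# Similar frames yield similar rows in L, so det(L_Y) collapses if both are picked:
feats = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
L = np.exp(-0.5 * np.linalg.norm(feats[:, None] - feats[None, :], axis=-1) ** 2)
print(dpp_greedy_map(L, 2))  # picks the two dissimilar items, e.g. [0, 2]
```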

463 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper addresses the problem of unsupervised video summarization, formulated as selecting a sparse subset of video frames that optimally represent the input video, with a novel generative adversarial framework.
Abstract: This paper addresses the problem of unsupervised video summarization, formulated as selecting a sparse subset of video frames that optimally represent the input video. Our key idea is to learn a deep summarizer network to minimize distance between training videos and a distribution of their summarizations, in an unsupervised way. Such a summarizer can then be applied on a new video for estimating its optimal summarization. For learning, we specify a novel generative adversarial framework, consisting of the summarizer and discriminator. The summarizer is the autoencoder long short-term memory network (LSTM) aimed at, first, selecting video frames, and then decoding the obtained summarization for reconstructing the input video. The discriminator is another LSTM aimed at distinguishing between the original video and its reconstruction from the summarizer. The summarizer LSTM is cast as an adversary of the discriminator, i.e., trained so as to maximally confuse the discriminator. This learning is also regularized for sparsity. Evaluation on four benchmark datasets, consisting of videos showing diverse events in first- and third-person views, demonstrates our competitive performance in comparison to fully supervised state-of-the-art approaches.
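A toy PyTorch sketch of this adversarial setup may help: an LSTM summarizer scores frames and reconstructs the video from the score-weighted features, while an LSTM discriminator tries to separate originals from reconstructions. Layer sizes, the sparsity weight, and the single training step shown are illustrative assumptions, not the paper's configuration.

```python
# Toy adversarial summarizer sketch: summarizer selects (soft-weights) frames
# and reconstructs the video; discriminator separates real from reconstruction.
import torch
import torch.nn as nn

class Summarizer(nn.Module):
    def __init__(self, d=128, h=64):
        super().__init__()
        self.scorer = nn.LSTM(d, h, batch_first=True)
        self.score_head = nn.Sequential(nn.Linear(h, 1), nn.Sigmoid())
        self.decoder = nn.LSTM(d, d, batch_first=True)

    def forward(self, x):                       # x: (B, T, d) frame features
        s, _ = self.scorer(x)
        scores = self.score_head(s)             # (B, T, 1) selection weights
        recon, _ = self.decoder(x * scores)     # reconstruct from selection
        return recon, scores

class Discriminator(nn.Module):
    def __init__(self, d=128, h=64):
        super().__init__()
        self.lstm = nn.LSTM(d, h, batch_first=True)
        self.head = nn.Sequential(nn.Linear(h, 1), nn.Sigmoid())

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])            # real/fake from last state

S, D = Summarizer(), Discriminator()
opt_s = torch.optim.Adam(S.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()
x = torch.randn(4, 60, 128)                     # stand-in frame features

recon, scores = S(x)
# Discriminator step: originals -> 1, reconstructions -> 0.
d_loss = bce(D(x), torch.ones(4, 1)) + bce(D(recon.detach()), torch.zeros(4, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()
# Summarizer step: fool D, plus a sparsity penalty on the selection scores.
s_loss = bce(D(recon), torch.ones(4, 1)) + 0.1 * scores.mean()
opt_s.zero_grad(); s_loss.backward(); opt_s.step()
```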

456 citations

Posted Content
TL;DR: Long Short-Term Memory (LSTM), a special type of recurrent neural network, is used to model the variable-range dependencies entailed in the task of video summarization; domain adaptation techniques improve summarization by reducing the discrepancies in statistical properties across annotated video datasets.
Abstract: We propose a novel supervised learning technique for summarizing videos by automatically selecting keyframes or key subshots. Casting the problem as a structured prediction problem on sequential data, our main idea is to use Long Short-Term Memory (LSTM), a special type of recurrent neural network, to model the variable-range dependencies entailed in the task of video summarization. Our learning models attain state-of-the-art results on two benchmark video datasets. Detailed analysis justifies the design of the models. In particular, we show that it is crucial to take into consideration the sequential structures in videos and to model them. Besides advances in modeling techniques, we introduce techniques to address the need for a large amount of annotated data for training complex learning models. There, our main idea is to exploit the existence of auxiliary annotated video datasets, albeit heterogeneous in visual styles and contents. Specifically, we show that domain adaptation techniques can improve summarization by reducing the discrepancies in statistical properties across those datasets.
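The supervised formulation reduces to sequence labeling: a recurrent network reads frame features and predicts per-frame importance against human annotations. A minimal sketch with a bidirectional LSTM follows; the dimensions and MSE objective are illustrative assumptions, not the paper's exact models.

```python
# Sketch of supervised per-frame importance scoring: a bidirectional LSTM
# sees past and future context and regresses human-annotated scores.
import torch
import torch.nn as nn

class FrameScorer(nn.Module):
    def __init__(self, d=128, h=64):
        super().__init__()
        self.lstm = nn.LSTM(d, h, batch_first=True, bidirectional=True)
        self.head = nn.Sequential(nn.Linear(2 * h, 1), nn.Sigmoid())

    def forward(self, x):                  # x: (B, T, d)
        out, _ = self.lstm(x)              # (B, T, 2h): both directions
        return self.head(out).squeeze(-1)  # (B, T) importance in [0, 1]

model = FrameScorer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(2, 100, 128)               # stand-in frame features
y = torch.rand(2, 100)                      # stand-in human importance scores
loss = nn.MSELoss()(model(x), y)
opt.zero_grad(); loss.backward(); opt.step()
```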

441 citations

Book ChapterDOI
06 Sep 2014
TL;DR: In large video collections with clusters of typical categories, such as “birthday party” or “flash-mob”, category-specific video summarization can produce higher quality video summaries than unsupervised approaches that are blind to the video category.
Abstract: In large video collections with clusters of typical categories, such as “birthday party” or “flash-mob”, category-specific video summarization can produce higher quality video summaries than unsupervised approaches that are blind to the video category.

430 citations

References
Journal ArticleDOI
TL;DR: In this article, a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), is introduced for the purpose of statistical identification; it is free from the ambiguities inherent in the application of conventional hypothesis testing procedures.
Abstract: The history of the development of statistical hypothesis testing in time series analysis is reviewed briefly, and it is pointed out that the hypothesis testing procedure is not adequately defined as a procedure for statistical model identification. The classical maximum likelihood estimation procedure is reviewed, and a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), designed for the purpose of statistical identification, is introduced. When there are several competing models, the MAICE is defined by the model and the maximum likelihood estimates of the parameters which give the minimum of AIC, defined by AIC = (-2)log(maximum likelihood) + 2(number of independently adjusted parameters within the model). MAICE provides a versatile procedure for statistical model identification which is free from the ambiguities inherent in the application of conventional hypothesis testing procedures. The practical utility of MAICE in time series analysis is demonstrated with some numerical examples.
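The quoted rule is easy to apply numerically. A worked instance with two hypothetical fitted models (the log-likelihoods and parameter counts below are invented for illustration): the MAICE is simply the model minimizing AIC.

```python
# Worked instance of AIC = -2*log(maximum likelihood) + 2*(free parameters):
# compute it for two hypothetical fitted models and keep the minimizer (MAICE).
models = {"AR(1)": (-412.3, 2), "AR(3)": (-410.9, 4)}  # (log-likelihood, params)
aic = {name: -2 * ll + 2 * k for name, (ll, k) in models.items()}
best = min(aic, key=aic.get)
print(aic, "->", best)   # AR(1): 828.6, AR(3): 829.8 -> AR(1)
```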

47,133 citations


"VSUMM: A mechanism designed to prod..." refers methods in this paper

  • ...Furthermore, techniques to estimate the number of clusters can be exploited, for example, Akaike’s Information Criterion (AIC) (Akaike, 1974) or Minimum Description Length (MDL) (Rissanen, 1978).... (a sketch of this idea follows below)

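One common way to realize the idea in the bullet above is an AIC-style penalized within-cluster error for k-means, RSS + 2·d·k, where d is the feature dimension. This heuristic stands in for the criteria named in the text rather than reproducing VSUMM's procedure; data and the search range are illustrative.

```python
# Estimate the number of clusters for k-means with an AIC-style criterion:
# within-cluster sum of squares (RSS) plus a 2*d*k complexity penalty.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.2, size=(50, 2)) for loc in (0.0, 3.0, 6.0)])

def aic_kmeans(X, k):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return km.inertia_ + 2 * X.shape[1] * k   # RSS + 2*d*k

best_k = min(range(1, 8), key=lambda k: aic_kmeans(X, k))
print(best_k)   # expected: 3 for this well-separated toy data
```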

01 Jan 1967
TL;DR: The 'k-means' process described in this paper partitions an N-dimensional population into k sets on the basis of a sample; the k-means concept is a generalization of the ordinary sample mean, and the procedure is shown to give partitions which are reasonably efficient in the sense of within-class variance.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, S = {S_1, S_2, ..., S_k} is a partition of E_N, and u_i, i = 1, 2, ..., k, is the conditional mean of p over the set S_i, then W^2(S) = Σ_{i=1}^{k} ∫_{S_i} |z − u_i|^2 dp(z) tends to be low for the partitions S generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4. The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special
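The quantity W^2(S) defined above has a direct empirical counterpart: the mean squared distance from each sample to its cluster's conditional mean. A self-contained sketch follows, with toy data and no empty-cluster handling (both illustrative simplifications):

```python
# Basic k-means plus the empirical within-class variance W^2(S):
# mean squared distance of each point to its cluster's mean.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == c].mean(0) for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.5, size=(40, 2)) for m in (0.0, 4.0)])
labels, centers = kmeans(X, 2)
w2 = np.mean(((X - centers[labels]) ** 2).sum(-1))   # empirical W^2(S)
print(round(float(w2), 3))
```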

24,320 citations


"VSUMM: A mechanism designed to prod..." refers methods in this paper

  • ...The k-means clustering algorithm (MacQueen, 1967) is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem (Duda et al., 2001)....


Book
01 Jan 1973

20,541 citations

Proceedings Article
01 Jan 1996
TL;DR: DBSCAN, a new clustering algorithm relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape, is presented which requires only one input parameter and supports the user in determining an appropriate value for it.
Abstract: Clustering algorithms are attractive for the task of class identification in spatial databases. However, the application to large spatial databases raises the following requirements for clustering algorithms: minimal requirements of domain knowledge to determine the input parameters, discovery of clusters with arbitrary shape, and good efficiency on large databases. The well-known clustering algorithms offer no solution to the combination of these requirements. In this paper, we present the new clustering algorithm DBSCAN, relying on a density-based notion of clusters which is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it. We performed an experimental evaluation of the effectiveness and efficiency of DBSCAN using synthetic data and real data of the SEQUOIA 2000 benchmark. The results of our experiments demonstrate that (1) DBSCAN is significantly more effective in discovering clusters of arbitrary shape than the well-known algorithm CLARANS, and that (2) DBSCAN outperforms CLARANS by a factor of more than 100 in terms of efficiency.
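For orientation, a minimal usage sketch of the density-based idea via scikit-learn's DBSCAN implementation: eps is the neighborhood radius, min_samples the density threshold, and points outside any dense region are labeled -1 (noise). The data and parameter values are illustrative.

```python
# DBSCAN usage sketch: it recovers a non-convex ring and a dense blob as
# separate clusters and marks an isolated point as noise (-1).
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 200)
ring = np.column_stack([np.cos(t), np.sin(t)])     # non-convex ring shape
blob = rng.normal(0, 0.1, size=(50, 2))            # dense core at the origin
X = np.vstack([ring, blob, [[3.0, 3.0]]])          # plus one isolated outlier

labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
print(sorted(set(labels)))   # e.g. [-1, 0, 1]: noise, ring, and blob
```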

14,297 citations


"VSUMM: A mechanism designed to prod..." refers methods in this paper

  • ...Other clustering algorithms can also be investigated, for example, DBSCAN (Ester et al., 1996), a density-based clustering method....
