90 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 12, NO. 2, FEBRUARY 2002
Shot-Boundary Detection: Unraveled and Resolved?
Alan Hanjalic, Member, IEEE
Abstract—Partitioning a video sequence into shots is the first
step toward video-content analysis and content-based video
browsing and retrieval. A video shot is defined as a series of inter-
related consecutive frames taken contiguously by a single camera
and representing a continuous action in time and space. As such,
shots are considered to be the primitives for higher level content
analysis, indexing, and classification. The objective of this paper
is twofold. First, we analyze the shot-boundary detection problem
in detail and identify major issues that need to be considered
in order to solve this problem successfully. Then, we present a
conceptual solution to the shot-boundary detection problem in
which all issues identified in the previous step are considered. This
solution is provided in the form of a statistical detector that is
based on minimization of the average detection-error probability.
We model the required statistical functions using a robust metric
for visual content discontinuities (based on motion compensation)
and take into account all (a priori) knowledge that we found
relevant to shot-boundary detection. This knowledge includes
the shot-length distribution, visual discontinuity patterns at shot
boundaries, and characteristic temporal changes of visual features
around a boundary. Major advantages of the proposed detector
are its robust and sequence-independent performance, while there
is also the possibility to detect different types of shot boundaries
simultaneously. We demonstrate the performance of our detector
for the two most widely used types of shot boundaries: hard
cuts and dissolves.
Index Terms—Shot-boundary detection, video analysis, video
databases, video retrieval.
I. INTRODUCTION
THE DEVELOPMENT of shot-boundary detection algorithms
has the longest and richest history in the area of
content-based video analysis and retrieval—longest, because
this area was actually initiated some decade ago by the attempts
to detect hard cuts in a video, and richest, because a vast
majority of all works published in this area so far address in one
way or another the problem of shot-boundary detection. This
is not surprising, since detection of shot boundaries provides
a base for nearly all video abstraction and high-level video
segmentation approaches. Therefore, solving the problem
of shot-boundary detection is one of the major prerequisites
for revealing higher level video content structure. Moreover,
other research areas can profit considerably from successful
automation of shot-boundary detection processes as well.
A good example is the area of video restoration. There, the
restoration efficiency can be improved by comparing each shot
with previous ones and, if a visually similar shot is found in
the past, by adopting the restoration settings already used for
that shot. Further, in the process of coloring black-and-white
movies, knowledge of the shot boundaries provides the time
stamps at which a switch to a different gray-to-color look-up
table should take place.
Manuscript received February 7, 2000; revised November 23, 2001. This
paper was recommended by Associate Editor S.-F. Chang.
The author is with the Faculty of Information Technology and Systems,
Department of Mediamatics, Delft University of Technology, 2628 CD Delft,
The Netherlands (e-mail: A.Hanjalic@its.tudelft.nl).
Publisher Item Identifier S 1051-8215(02)02015-3.
However, despite countless proposed approaches and tech-
niques so far, robust algorithms for detecting various types of
shot boundaries have not been found yet. We relate here the at-
tribute “robust” to the following major criteria:
1) excellent detection performance for all types of shot
boundaries (hard cuts and gradual transitions);
2) constant quality of the detection performance for any ar-
bitrary sequence, with minimized need for manual fine-
tuning of detection parameters in different sequences.
Regarding the usage of shot-boundary detection algorithms in
the processes of video restoration and coloring, fulfilling the two
aforementioned criteria is the major prerequisite to a successful
automation of these processes. If the detection performance is
poor, substantial involvement of the operator is required in order
to correct wrong restoration settings or gray-to-color look-up
table. Moreover, if the detection performance is sequence de-
pendent, it can be difficult for the operator to find optimal de-
tector settings for each sequence to be restored or colored. For
the processes of high-level video content analysis, it is even
more important that the shot-boundary detector fulfills the
aforementioned criteria. First, bad detection performance may
negatively influence the performance of subsequent high-level
video analysis modules (e.g., movie segmentation into episodes,
movie abstraction, broadcast news segmentation into reports).
Second, if we cannot expect a video restoration/coloring oper-
ator (expert) to adjust the shot-boundary detector settings to dif-
ferent sequences, this can be expected even less from a nonpro-
fessional user of commercial video-retrieval equipment.
The objective of this paper is twofold. We first analyze the
problem of shot-boundary detection in detail and identify all is-
sues that need to be considered in order to solve this problem in
view of the two criteria listed above. Then we present a concep-
tual solution to the shot-boundary detection problem in which
all issues identified in the previous step are considered and using
which we aim at fulfilling the two aforementioned robustness
criteria. This solution is provided in the form of a statistical
detector that is based on minimization of the average detec-
tion-error probability.
The paper is structured as follows. Section II gives a detailed
analysis of the shot-change detection problem, while Section III
provides an extensive overview of the solutions to this problem,
proposed so far. The main purpose of Sections II and III is to
unravel the shot-boundary detection problem and so to explain
our motivation for developing our statistical detector in the first
place and also to justify the choices made in the process of detector
development. We present our statistical detector in detail
in Section IV, while in Section V we demonstrate its performance
for the two most widely used types of shot boundaries:
hard cuts and dissolves. We conclude this paper with a discussion
in Section VI.
1051–8215/02$17.00 © 2002 IEEE
Fig. 1. The problem of unseparated ranges $R$ and $\bar{R}$.
II. SHOT-BOUNDARY DETECTION: A PROBLEM ANALYSIS
The basis of detecting shot boundaries in video sequences is
the fact that frames surrounding a boundary generally display a
significant change in their visual contents. The detection process
is then the recognition of considerable discontinuities in the
visual-content flow of a video sequence. In the first step of this
process, feature extraction is performed, where the features depict
various aspects of the visual content of a video. Then, a
metric is used to quantify the feature variation from frame $k$
to frame $k+l$, with $l$ being the inter-frame distance (skip) and
$l \geq 1$. The discontinuity value $z(k, k+l)$ is the magnitude of
this variation and serves as an input into the detector. There, it
is compared against a threshold $T$. If the threshold is exceeded,
a shot boundary between frames $k$ and $k+l$ is detected.
To be able to draw reliable conclusions about the presence
or absence of a shot boundary between frames $k$ and $k+l$, we
need to use features and metrics for computing the discontinuity
values $z(k, k+l)$ that are as discriminating as possible.
This means that a clear separation should exist between
discontinuity-value ranges for measurements performed within shots
and at shot boundaries. In the following, we will refer to these
ranges as $R$ and $\bar{R}$, respectively. The problem of having
unseparated ranges $R$ and $\bar{R}$ is illustrated in Fig. 1, where some
discontinuity values within shot 1 belong to the overlap area. For such
values $z(k, k+l)$, it is difficult to decide about the presence
or absence of a shot boundary between frames $k$ and $k+l$
without making detection mistakes, i.e., missed or falsely detected
boundaries.
We realistically assume that the visual-content differences
between consecutive frames within the same shot are mainly
caused by two factors: object/camera motion and lighting
changes. Depending on the magnitude of these factors, the
computed discontinuity values within shots vary and sometimes
lie in the overlap area, as shown in Fig. 1. Thus, the easiest
way of obtaining good discrimination between ranges $R$ and
$\bar{R}$ is to use features and metrics that are insensitive to motion
and lighting changes. Moreover, since different types of
sequences can globally be characterized by their average rates
and magnitudes of object/camera motion and lighting changes
(e.g., high-action movies versus stationary dramas), eliminating
these distinguishing factors also provides a high level of
consistency of ranges $R$ and $\bar{R}$ across different sequences. If the
ranges $R$ and $\bar{R}$ are consistent, the parameters of the detection
system (e.g., the threshold $T$) can first be optimized on a set
of training sequences to maximize the detection reliability,
and then the system can be used to detect shot boundaries in
an arbitrary sequence without any human supervision, while
retaining a high detection reliability. In this way, selecting
features and metrics as described above would automatically
lead to a shot-boundary detector that conforms to the criteria
defined in the introduction to this paper.
However, while features and metrics can be found such that
the influence of motion on discontinuity values is strongly
reduced, the influence of strong and abrupt lighting changes on
discontinuity values and thus also on the detection performance
cannot be reduced that easily. For instance, one could try
working only with chromatic color components, since common
lighting changes can mostly be captured by luminance vari-
ations. But this is not an effective solution in extreme cases,
where all color components are changed. Strong and abrupt
lighting changes can result in a series of high discontinuity
values, which can be mistaken for the actual shot boundaries.
In the remainder of this paper, we refer to possible causes for
high discontinuity values within shots as extreme factors. These
factors basically include strong and abrupt lighting changes,
but also some extreme motion cases that cannot be captured
effectively by selecting features and metrics as mentioned
above.
An effective way to reduce the influence of extreme factors
on the detection performance is to embed additional information
in the shot-boundary detector. The main characteristic of
this information is that it is not based on the range of
discontinuity values but on other measurements performed on
a video that, each in its own way, indicate the presence or
absence of a shot boundary between frames $k$ and $k+l$. As
a first example, we introduce the information resulting from a
comparison of a temporal pattern created by consecutive
discontinuity values (measured pattern) and known temporal patterns
that are specific for different types of shot boundaries (template
patterns). In general, we can distinguish hard cuts, which are
the most common boundaries and occur between two consecutive
frames, from gradual transitions, such as fades, wipes, and
dissolves, which are spread over several frames. Then, the decision
about the presence or absence of a shot boundary between
frames $k$ and $k+l$ made by the detector is based not only on
range information, that is, on the comparison of the discontinuity
value $z(k, k+l)$ and the threshold $T$, but also, as shown
in Fig. 2, on the match between the measured pattern formed
by discontinuity values surrounding $z(k, k+l)$ and a template
pattern of a shot boundary.
Fig. 2. Matching of the temporal pattern formed by $N$ consecutive
discontinuity values and a temporal pattern characteristic for a shot boundary.
The quality of match between two patterns provides an indication for boundary
presence between frames $k$ and $k+l$ that can be used as additional information
in the detector.
Another type of additional information that can be useful in
supporting the decision process in the shot-boundary detector
results from observation of the characteristic behavior of some
visual features along frames surrounding a shot boundary for the
cases of gradual transitions. Let us for this purpose consider one
specific boundary type, a dissolve, and observe the temporal
behavior of intensity variance that is measured for every frame
within a dissolve. Since a dissolve is the result of mixing the
visual material from two neighboring shots, it can be expected
that variance values measured per frame along a dissolve ideally
reveal a downwards-parabolic pattern [2], [15], [17]. Hence, the
decision about the presence of a dissolve can be supported by
investigating the behavior of the intensity variance in the “sus-
pected” series of frames (e.g., those where pattern matching
from Fig. 2 shows good results) and by checking how well this
behavior fits the downwards-parabolic pattern.
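The expected variance pattern can be illustrated with a small synthetic experiment. The sketch below is only an illustration under stated assumptions, not part of the detector described in this paper: it assumes a linear dissolve, frame(a) = (1 - a) A + a B, between two statistically independent random frames, and the function names `variance` and `dissolve_variances` are hypothetical. For uncorrelated frames the variance is approximately (1 - a)^2 var(A) + a^2 var(B), which is a parabola in the mixing factor a with its minimum inside the dissolve.

```python
import random


def variance(pixels):
    """Population variance of a flat list of intensity values."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def dissolve_variances(frame_a, frame_b, steps):
    """Intensity variance of every frame of a linear dissolve.

    Assumed mixing model (an assumption made for this example):
    frame(a) = (1 - a) * frame_a + a * frame_b, with a going from 0 to 1.
    """
    result = []
    for s in range(steps + 1):
        a = s / steps
        mixed = [(1 - a) * pa + a * pb for pa, pb in zip(frame_a, frame_b)]
        result.append(variance(mixed))
    return result


# Two independent synthetic "frames" (flat lists of random intensities).
rng = random.Random(0)
frame_a = [rng.uniform(0, 255) for _ in range(5000)]
frame_b = [rng.uniform(0, 255) for _ in range(5000)]

curve = dissolve_variances(frame_a, frame_b, steps=10)
for step, value in enumerate(curve):
    print(step, round(value, 1))
```

The printed values fall toward the middle of the dissolve and rise again toward its end. Real shots are not independent noise, so measured curves will only approximate this shape.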
Further improvement of the detection performance can be ob-
tained by taking into account a priori information about the
presence or absence of a shot boundary at a certain time stamp
along a video. We differentiate here between additional and a
priori information because the latter is not based on any mea-
surement performed on a video sequence. An example of a
priori information is the dependence of the probability for shot
boundary occurrence on the number of elapsed frames since the
last detected shot boundary. While it can be assumed zero at
the beginning of a shot, this probability grows and converges to
the value 0.5 with increasing number of frames in the shot. The
main purpose of this probability is to make the detection of one
shot boundary immediately after another one practically impos-
sible and so to contribute to a reduction of false detection rate.
Therefore, by properly modeling a priori probability and by se-
curing its convergence to 0.5, the influence of this probability
on the detection performance should be minimized as soon as a
reasonable shot length is reached.
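To make the described behavior concrete, the following sketch shows one possible shape for such an a priori probability. It is purely hypothetical: the saturating exponential, the function name `prior_boundary_probability`, and the parameter `time_constant` are assumptions chosen for illustration and are not the model used by the detector in Section IV. The function only reproduces the two stated properties, a value of zero at the start of a shot and convergence to 0.5 as the shot grows.

```python
import math


def prior_boundary_probability(frames_since_last_boundary, time_constant=50.0):
    """Hypothetical a priori probability of a shot boundary.

    Starts at 0 right after a detected boundary and converges to 0.5
    as the number of elapsed frames grows. The exponential form and
    `time_constant` are illustrative assumptions, not the paper's model.
    """
    if frames_since_last_boundary < 0:
        raise ValueError("elapsed frame count must be non-negative")
    return 0.5 * (1.0 - math.exp(-frames_since_last_boundary / time_constant))


for elapsed in (0, 10, 50, 200, 1000):
    print(elapsed, round(prior_boundary_probability(elapsed), 4))
```

Once the value is close to 0.5, both hypotheses (boundary and no boundary) are treated as equally likely a priori, so the decision rests on the measured data alone.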
In view of the discussion in previous paragraphs, combining
motion compensating features and metrics for computing the
discontinuity values with additional information that can help
reducing the influence of extreme factors and with a priori in-
formation about the shot-boundary presence or absence at a cer-
tain time stamp, we are likely to provide a solid base for creating
a detector that is optimal with respect to the criteria defined in
Section I. Such a detector is illustrated in Fig. 3.
Variation of the detection threshold $T(k)$ for each frame $k$ is a
consequence of embedding additional and a priori information into
the detector. This information regulates the detection process
by continuously adapting the threshold, e.g., to the quality of
the boundary or variance pattern match for each new series of
consecutive discontinuity values and to the time elapsed since
the last detected shot boundary. The remaining task is to define
a detector where the above components are integrated such that
the resulting threshold function $T(k)$ provides optimal detection
performance. We will proceed with the development of such a
detector in Section IV after investigating the advantages and
disadvantages of shot-boundary detection methods published in
recent literature.
III. PREVIOUS WORK ON SHOT-BOUNDARY DETECTION
Developing techniques for detecting shot boundaries in a
video has been the subject of substantial research over the last
decade. In this section, we give an overview of the relevant
literature. The overview concentrates, on the one hand, on the
capability of features and metrics to reduce the motion influ-
ence on discontinuity values. On the other hand, it investigates
existing approaches to shot-boundary detection, involving the
threshold specification, treatment of different boundary types,
and usage of additional and a priori information to improve the
detection performance.
A. From Features and Metrics to Discontinuity Values
Different methods exist for computing discontinuity values,
employing various features related to the visual content of a
video. For each selected feature, a number of suitable metrics
can be applied. Good comparisons of features and metrics used
for shot-boundary detection with respect to the quality of the
obtained discontinuity values can be found in [1], [6], [9], [13],
[15].
The simplest way of measuring the visual-content discontinuity
between two frames is to compute the mean absolute
change of intensity $I(x, y)$ between the frames $k$ and $k+l$ for all
frame pixels, i.e., for $1 \leq x \leq X$ and $1 \leq y \leq Y$, where $X$ and
$Y$ are the frame dimensions [12]. A modification of this technique
is only counting the pixels that change considerably from
one frame to another [20]. Here, the absolute intensity change is
compared with the pre-specified threshold $T$, and is only
considerable if it exceeds that threshold, that is
$$z(k, k+l) = \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} D(k, k+l, x, y)$$
with
$$D(k, k+l, x, y) = \begin{cases} 1 & \text{if } |I_k(x, y) - I_{k+l}(x, y)| > T \\ 0 & \text{else.} \end{cases} \tag{1}$$
An important problem of the two approaches presented above is
the sensitivity of discontinuity values $z(k, k+l)$ to camera and
object motion. To reduce the motion influence, a modification
of the described techniques was presented in [30], where a
$3 \times 3$ averaging filter was applied to frames before performing the
pixel comparison.
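The two pixel-based measures described above can be sketched in a few lines. This is a minimal illustration, assuming frames are given as equally sized lists of rows of integer intensities; the function names are hypothetical, and reporting the count of changed pixels as a fraction of all pixels is a choice made for this example.

```python
def mean_absolute_difference(frame_a, frame_b):
    """Mean absolute intensity change over all pixels of two frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def changed_pixel_fraction(frame_a, frame_b, threshold):
    """Fraction of pixels whose absolute change exceeds `threshold`."""
    changed = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            if abs(pa - pb) > threshold:
                changed += 1
            count += 1
    return changed / count


frame_k = [[10, 10, 10],
           [10, 10, 10]]
frame_next = [[10, 12, 10],
              [200, 10, 10]]

# One pixel changes slightly (by 2), one changes strongly (by 190).
print(mean_absolute_difference(frame_k, frame_next))            # 32.0
print(changed_pixel_fraction(frame_k, frame_next, threshold=20))  # one of six pixels
```

The thresholded variant ignores the small change of 2 and counts only the strongly changed pixel, whereas the mean absolute difference is dominated by that single pixel.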
Fig. 3. Shot-boundary detector where all issues are taken into account that are relevant for optimizing the detection performance.
Approaches based on motion compensation show much higher motion
independence. There, a block matching
procedure is applied to find for each block $b_i(k)$ in frame $k$ a
corresponding block $b_{i,m}(k+l)$ in frame $k+l$, such that it is
most similar to the block $b_i(k)$ according to a chosen criterion
(difference formula) $D$, that is
$$D\big(b_i(k), b_{i,m}(k+l)\big) = \min_{j = 1, \ldots, N_{\text{candidates}}} D\big(b_i(k), b_{i,j}(k+l)\big) \tag{2}$$
Here, $N_{\text{candidates}}$ is the number of candidate blocks $b_{i,j}(k+l)$
considered in the procedure to find the best match for a block
$b_i(k)$. If $k$ and $k+l$ are neighboring frames of the same shot, the
values $D(b_i(k), b_{i,m}(k+l))$ can generally be assumed low. This is
because, for a block $b_i(k)$, an almost identical block $b_{i,m}(k+l)$ can be
found due to a global constancy of the visual content along
consecutive frames of a shot. This is not the case if frames $k$ and
$k+l$ surround a shot boundary because, in general, the difference
between corresponding blocks in the two frames will be large due
to a radical change in visual content across a boundary. Thus,
computing the discontinuity value $z(k, k+l)$ as a function of
differences $D(b_i(k), b_{i,m}(k+l))$ is likely to provide a reliable base for
detecting shot boundaries.
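The block-matching idea can be sketched as follows. This is a minimal illustration under assumptions, not the procedure of any cited method: frames are lists of rows of intensities, the difference formula is the mean absolute difference, candidates are all displacements within `search` pixels, and the frame-level value is the plain average of the best-match differences. All function and parameter names are hypothetical.

```python
def block_difference(frame_a, frame_b, top_a, left_a, top_b, left_b, size):
    """Mean absolute intensity difference between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[top_a + dy][left_a + dx]
                         - frame_b[top_b + dy][left_b + dx])
    return total / (size * size)


def best_match_difference(frame_a, frame_b, top, left, size, search):
    """Smallest difference over all candidate blocks within +/- search pixels."""
    height, width = len(frame_b), len(frame_b[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            # Skip candidates that would fall outside the frame.
            if 0 <= t <= height - size and 0 <= l <= width - size:
                d = block_difference(frame_a, frame_b, top, left, t, l, size)
                if best is None or d < best:
                    best = d
    return best


def motion_compensated_discontinuity(frame_a, frame_b, size=2, search=1):
    """Average best-match difference over nonoverlapping blocks of frame_a."""
    height, width = len(frame_a), len(frame_a[0])
    diffs = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            diffs.append(
                best_match_difference(frame_a, frame_b, top, left, size, search))
    return sum(diffs) / len(diffs)


row = [10, 50, 90, 130, 170, 210]
frame_a = [row, row]
shifted = [[10, 10, 50, 90, 130, 170]] * 2      # same content moved one pixel right
other = [[200, 200, 20, 20, 200, 20]] * 2       # unrelated content

within_shot = motion_compensated_discontinuity(frame_a, shifted)
no_compensation = motion_compensated_discontinuity(frame_a, shifted, search=0)
across_cut = motion_compensated_discontinuity(frame_a, other)
print(within_shot, no_compensation, across_cut)
```

With a search range of one pixel the shifted frame yields a much lower value than without compensation (`search=0`), while the unrelated frame stays high. The block at the right frame border cannot find its displaced match, which is why the within-shot value is not exactly zero.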
An example of computing the discontinuity values based
on the results of the block-matching procedure is given in [23].
There, a frame $k$ is divided into nonoverlapping
blocks and the differences $D(b_i(k), b_{i,m}(k+l))$ are computed
by comparing pixel-intensity values within blocks. Then, the
obtained differences are sorted and normalized
between 0 and 1 (where 0 indicates a perfect match), giving the
values $d_i(k, k+l)$. These values are multiplied with weighting
factors $c_i$ and summed over the entire frame to give the
discontinuity values, that is
$$z(k, k+l) = \sum_{i} c_i \, d_i(k, k+l) \tag{3}$$
A popular alternative to pixel-based approaches is using
histograms as features. Consecutive frames within a shot con-
taining similar global visual material will show little difference
in their histograms, compared to frames on both sides of a
shot boundary. Although it can be argued that frames having
completely different visual contents can still have similar
histograms, the probability of such a case is small. Since
histograms ignore spatial changes within a frame, histogram
differences are considerably less sensitive to object motion
with a constant background than pixel-wise comparisons are.
However, a histogram difference remains sensitive to camera
motion, such as panning, tilting, or zooming.
If histograms are used as features, the discontinuity value
can be obtained by computing the bin-wise difference between
frame histograms. Both grey-level and color histograms are
used in literature, and their differences are computed by a
number of metrics. Among the most frequently used ones are the
sum of absolute differences of corresponding bins [29] and the
so-called $\chi^2$-test [18]. Further, a metric involving histograms in the $HVC$
color space [9] (Hue: color type; Value: intensity, luminance;
Chroma: saturation, the degree to which color is present)
exploits the advantage of the invariance of Hue under different
lighting conditions. This is useful in reducing the influence
of common (weak) lighting changes on discontinuity values.
Such an approach is proposed in [4], where only histograms
of the $H$ and $C$ components are used. These 1-D histograms are
combined into a 2-D surface, serving as a feature. Based on
this, the discontinuity is computed from the differences between
the bins at corresponding coordinates in the $HC$-surfaces of
frames $k$ and $k+l$, taking into account the resolutions of the
Hue and Chroma components used to form the 2-D histogram surface.
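The histogram-based measures can be illustrated with a short sketch on grey-level frames. It is a minimal example under assumptions: frames are lists of rows of integer intensities in the range 0 to 255, the number of bins is arbitrary, and the chi-square style measure uses one common form, the sum of (a - b)^2 / (a + b) over non-empty bins, which is not necessarily the exact form used in [18]. The function names are hypothetical.

```python
def histogram(frame, bins=8, max_value=256):
    """Grey-level histogram of a frame with integer intensities in [0, max_value)."""
    counts = [0] * bins
    for row in frame:
        for p in row:
            counts[p * bins // max_value] += 1
    return counts


def sum_abs_bin_difference(h1, h2):
    """Sum of absolute differences of corresponding bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def chi_square_difference(h1, h2):
    """Chi-square style difference (assumed form), skipping empty bin pairs."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return total


frame_1 = [[0, 0, 255, 255]]
frame_2 = [[255, 0, 0, 255]]        # same pixels, rearranged (object motion)
frame_3 = [[128, 128, 128, 128]]    # different content

h1, h2, h3 = histogram(frame_1), histogram(frame_2), histogram(frame_3)
print(sum_abs_bin_difference(h1, h2), chi_square_difference(h1, h2))
print(sum_abs_bin_difference(h1, h3), chi_square_difference(h1, h3))
```

Rearranging pixels within a frame leaves the histogram unchanged, so both measures are zero for the first pair, which illustrates why histograms are less sensitive to object motion than pixel-wise comparisons.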
Histograms computed block-wise can also be used for
shot-boundary detection, as shown in [18]. There, both the
images $k$ and $k+l$ are divided into 16 blocks, histograms
$H_i(k)$ and $H_i(k+l)$ are computed for blocks $b_i(k)$ and $b_i(k+l)$, and
the $\chi^2$-test is used to compare corresponding block histograms.
When computing the discontinuity as a sum of region-histogram
differences, the eight largest differences were discarded
to efficiently reduce the influence of motion and noise. An
alternative to this approach can be found in [27], where first the
number of blocks is increased to 48, and then the discontinuity
value is computed as the total number of blocks within a
frame for which the block-wise histogram difference
$\Delta\big(H_i(k), H_i(k+l)\big)$ exceeds a pre-specified threshold $T$, that is
$$z(k, k+l) = \sum_{i=1}^{48} DB(k, k+l, i) \tag{5}$$
with
$$DB(k, k+l, i) = \begin{cases} 1 & \text{if } \Delta\big(H_i(k), H_i(k+l)\big) > T \\ 0 & \text{else.} \end{cases} \tag{6}$$
According to [19], the approach from [27] is much more sensitive
to hard cuts than the one proposed in [18]. However, since
emphasis is put on the blocks that change most from one frame
to another, the approach from [27] also becomes highly sensitive
to motion.
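Counting the blocks whose histogram changes considerably can be sketched as below. This is an illustration under assumptions rather than the implementation from [27]: the block-wise histogram difference is taken to be the sum of absolute bin differences, frames are lists of rows of integer intensities in the range 0 to 255, and the grid size, bin count, and function names are hypothetical.

```python
def block_histogram(frame, top, left, height, width, bins, max_value=256):
    """Grey-level histogram of one rectangular block of a frame."""
    counts = [0] * bins
    for r in range(top, top + height):
        for c in range(left, left + width):
            counts[frame[r][c] * bins // max_value] += 1
    return counts


def count_changed_blocks(frame_a, frame_b, grid_rows, grid_cols, threshold, bins=4):
    """Number of blocks whose histogram difference exceeds `threshold`.

    The histogram difference is the sum of absolute bin differences,
    which is an assumption made for this example.
    """
    block_h = len(frame_a) // grid_rows
    block_w = len(frame_a[0]) // grid_cols
    changed = 0
    for i in range(grid_rows):
        for j in range(grid_cols):
            ha = block_histogram(frame_a, i * block_h, j * block_w, block_h, block_w, bins)
            hb = block_histogram(frame_b, i * block_h, j * block_w, block_h, block_w, bins)
            difference = sum(abs(a - b) for a, b in zip(ha, hb))
            if difference > threshold:
                changed += 1
    return changed


frame_a = [[10] * 4 for _ in range(4)]
frame_b = [[200, 200, 10, 10],
           [200, 200, 10, 10],
           [10, 10, 10, 10],
           [10, 10, 10, 10]]

# Only the top-left of the four blocks has changed.
print(count_changed_blocks(frame_a, frame_b, grid_rows=2, grid_cols=2, threshold=4))
```

Because every block contributes either zero or one, a single strongly changed block, for example one covered by a moving object, raises the value as much as a block changed by a cut, which is consistent with the motion sensitivity noted above.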

Another characteristic feature that proved to be useful in de-
tecting shot boundaries is edges. As described in [16], first the
overall motion between frames is computed. Based on the mo-
tion information, two frames are registered and the number and
position of edges detected in both frames are compared. The
total difference is then expressed as the total edge change per-
centage, i.e., the percentage of edges that enter and exit from
one frame to another. Due to registration of frames prior to edge
comparison, this feature is robust against motion. However, the
complexity of computing the discontinuity values is also high.
Let $\rho_{\text{out}}$ be the percentage of edge pixels in frame $k$ for which the
distance to the closest edge pixel in frame $k+l$ is larger than
the pre-specified threshold $T$. In the same way, let $\rho_{\text{in}}$ be the
percentage of edge pixels in frame $k+l$ for which the distance
to the closest edge pixel in frame $k$ is larger than the
pre-specified threshold $T$. Then, the discontinuity value between these
frames is computed as
$$z(k, k+l) = \max(\rho_{\text{in}}, \rho_{\text{out}}) \tag{7}$$
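The edge-based measure can be sketched on precomputed edge maps. The example below is a simplified illustration under assumptions: edge detection and frame registration are taken as already done, each edge map is a set of `(row, col)` pixel coordinates, distances are Euclidean, and the function names are hypothetical. The measure is taken to be the larger of the entering and exiting edge fractions.

```python
import math


def fraction_far_from(edges_from, edges_to, max_distance):
    """Fraction of pixels in edges_from with no pixel of edges_to within max_distance."""
    if not edges_from:
        return 0.0
    far = 0
    for (r, c) in edges_from:
        near = any(math.hypot(r - r2, c - c2) <= max_distance
                   for (r2, c2) in edges_to)
        if not near:
            far += 1
    return far / len(edges_from)


def edge_change_discontinuity(edges_k, edges_next, max_distance):
    """Larger of the exiting and entering edge-pixel fractions."""
    exiting = fraction_far_from(edges_k, edges_next, max_distance)
    entering = fraction_far_from(edges_next, edges_k, max_distance)
    return max(entering, exiting)


edges_k = {(0, 0), (0, 1), (0, 2), (0, 3)}
edges_next = {(1, 0), (1, 1), (5, 5), (6, 5), (7, 5), (8, 5)}

# Two of four old edge pixels disappear; four of six new edge pixels appear.
print(edge_change_discontinuity(edges_k, edges_next, max_distance=1))
```

The brute-force nearest-edge search is quadratic in the number of edge pixels, which hints at the computational cost of this feature; practical implementations would use a faster distance computation.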
Finally, we mention here the computation of the discontinuity
value $z(k, k+l)$ using the analysis of the motion field measured
between two frames. An example of this is the approach proposed
in [3], where the discontinuity value $z(k, k+1)$ between
two consecutive frames is computed as the inverse of motion
smoothness.
B. Detection Approaches
Threshold Specification for Detecting Hard Cuts: The
problem of choosing the right threshold for evaluating the com-
puted discontinuity values has not been addressed extensively
in literature. Most authors work with heuristically chosen
global thresholds [4], [18], [20]. An alternative is given in [30],
where first the statistical distribution of discontinuity values
within a shot is measured. Then the obtained distribution is
modeled by a Gaussian function with parameters $\mu$ and $\sigma$, and
the threshold value is computed as
$$T = \mu + r\sigma \tag{8}$$
Here, $r$ is the parameter related to the prespecified tolerated
probability for false detections. For instance, $r$ can be chosen
such that the probability of having falsely detected shot
boundaries is 0.1%.
The specification of the parameter $r$ can only explicitly control
the rate of false detections. The rate of missed detections is
implicit and cannot be regulated, since the distribution of
discontinuity values measured on boundaries is not taken into account.
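The relation between the threshold and the tolerated false-detection probability can be checked numerically. The sketch below assumes that within-shot discontinuity values are Gaussian, computes a threshold as mean plus a multiple of the standard deviation, and evaluates the one-sided Gaussian tail above that threshold. The multiplier is called `r` here, and the function names and sample values are hypothetical.

```python
import math


def gaussian_threshold(within_shot_values, r):
    """Threshold mu + r * sigma estimated from within-shot discontinuity values."""
    n = len(within_shot_values)
    mu = sum(within_shot_values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in within_shot_values) / n)
    return mu + r * sigma


def false_detection_probability(r):
    """One-sided Gaussian tail probability P(z > mu + r * sigma)."""
    return 0.5 * math.erfc(r / math.sqrt(2.0))


values = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up within-shot values
print(gaussian_threshold(values, r=3))       # mean 5, sigma 2, so 11.0
print(false_detection_probability(3))        # roughly 0.00135
```

Under the Gaussian assumption a multiplier of 3 corresponds to a tail probability of about 0.13%, the same order as the 0.1% quoted above. As the text notes, nothing in this computation says how many true boundaries fall below the threshold.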
However, even if they can be specified in a nonheuristic way,
the crucial problem related to the global threshold still remains,
as illustrated in Fig. 4. If the prespecified global threshold is too
low, many false detections will appear in the shot, where high
discontinuity values are caused by extreme factors, as defined
in Section II. If the threshold is made higher to avoid falsely
detected boundaries, then the high discontinuity value corre-
sponding to the shot boundary close to frame 500 (in Fig. 4)
will not be detected.
A much better alternative is to work with adaptive thresholds,
i.e., with thresholds computed locally. The improved detection
performance that results from using an adaptive threshold function
$T(k)$ instead of the global threshold $T$ is also illustrated
in Fig. 4. If the value of the function $T(k)$ is computed at each
frame $k$ based on the extra information embedded in the detector
(Fig. 3), high discontinuity values computed within shots can be
distinguished from those computed at shot boundaries.
Fig. 4. Improved detection performance when using an adaptive threshold
function $T(k)$ instead of a global threshold $T$.
A method for detecting hard cuts using an adaptive threshold
is presented in [29]. There, the values $T(k)$ are computed using
the information about the temporal pattern that is characteristic
for hard cuts. The authors compute the discontinuity values with
the inter-frame distance $l = 1$. As shown in Fig. 5, the $N$
last computed consecutive discontinuity values are considered,
forming a sliding window. The presence of a shot boundary is
checked at each window position, in the middle of the window,
according to the following criterion:
$$\text{if } z(k, k+1) = \max_{\text{window}} z \ \text{ and } \ z(k, k+1) \geq \alpha\, z_{sm} \ \Rightarrow \ \text{abrupt shot boundary} \tag{9}$$
In other words, a hard cut is detected between frames $k$ and
$k+1$ if the discontinuity value $z(k, k+1)$ is the window maximum
and $\alpha$ times larger than the second largest discontinuity value
$z_{sm}$ within the window. The parameter $\alpha$ can be understood as
the shape parameter of the boundary pattern. This pattern is
characterized by an isolated sharp peak in a series of
discontinuity values. Applying (9) to such a series at each position of
a sliding window is nothing else than matching the ideal pattern
shape and the actual behavior of discontinuity values found
within the window. The major weakness of this approach is the
heuristically chosen and fixed parameter $\alpha$. Because $\alpha$ is fixed,
the detection procedure is too coarse and too inflexible, and
because it is chosen heuristically, one cannot make statements
about the scope of its validity.
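The sliding-window criterion can be sketched as follows. This is a minimal illustration, not the implementation from [29]: `z[k]` is assumed to hold the discontinuity value between frames `k` and `k + 1`, the window is centered on the tested position, the ratio parameter is called `alpha` here, and a strict maximum is required so that ties do not produce duplicate detections. All names and the sample values are hypothetical.

```python
def detect_hard_cuts(z, window_size, alpha):
    """Indices k where z[k] is an isolated peak within a centered window.

    A cut is reported when z[k] is the strict maximum of the window and
    at least `alpha` times the second largest value in that window.
    """
    if window_size < 3 or window_size % 2 == 0:
        raise ValueError("window_size must be an odd number >= 3")
    half = window_size // 2
    cuts = []
    for center in range(half, len(z) - half):
        window = z[center - half:center + half + 1]
        peak = z[center]
        others = window[:half] + window[half + 1:]
        second_largest = max(others)
        if peak > second_largest and peak >= alpha * second_largest:
            cuts.append(center)
    return cuts


# An isolated peak at index 3 and a burst of moderately high values at 6..8.
z = [1, 2, 1, 30, 2, 1, 8, 9, 8, 1, 1]
print(detect_hard_cuts(z, window_size=5, alpha=3))  # [3]
```

The burst at indices 6 to 8, as might be caused by strong motion, is rejected because its largest value is not sufficiently larger than its neighbors, while the isolated peak is accepted. The result depends directly on the fixed choice of `alpha`, which is the weakness discussed above.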
In order to make the threshold specification in [29] less
heuristic, a detection approach was proposed in [11], which
combines the sliding window methodology with the Gaussian
distribution of discontinuity values proposed in [30]. Instead
of choosing the form parameter $\alpha$ heuristically, this parameter

References
More filters
Book

Random variables and stochastic processes

TL;DR: An electromagnetic pulse counter having successively operable, contact-operating armatures that are movable to a rest position, an intermediate position and an active position between the main pole and the secondary pole of a magnetic circuit.
Book

Film Art: An Introduction

TL;DR: In this paper, Bordwell and Thompson's Film Art has been the best-selling and most widely respected introduction to the analysis of cinema, supporting a skills-centered approach supported by examples from many periods and countries.
Journal ArticleDOI

Automatic partitioning of full-motion video

TL;DR: A twin-comparison approach has been developed to solve the problem of detecting transitions implemented by special effects, and a motion analysis algorithm is applied to determine whether an actual transition has occurred.
Journal Article

Rapid scene analysis on compressed video

TL;DR: Experimental results show that the proposed rapid scene analysis algorithms are fast and effective in detecting abrupt scene changes, gradual transitions including fade-ins and fade-outs, flashlight scenes and in deriving intrashot variations.
Frequently Asked Questions (13)
Q1. What are the contributions mentioned in the paper "Shot-boundary detection: unraveled and resolved?" ?

The objective of this paper is twofold. First, the authors analyze the shot-boundary detection problem in detail and identify major issues that need to be considered in order to solve this problem successfully. Then, the authors present a conceptual solution to the shot-boundary detection problem in which all issues identified in the previous step are considered. This solution is provided in the form of a statistical detector that is based on minimization of the average detection-error probability. The authors demonstrate the performance of their detector on the two most widely used types of shot boundaries: hard cuts and dissolves.

Due to small frame dimensions in the resulting DC sequence, the authors selected the dimensions of the blocks used in the block-matching procedure as 4 × 4 pixels.

If histograms are used as features, the discontinuity value can be obtained by bin-wise computing the difference between frame histograms. 
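
As a minimal sketch of this idea, the snippet below builds intensity histograms for two small grayscale frames and sums the absolute bin-wise differences. The function names, the equal-width binning, and the choice of summing absolute differences are assumptions made for illustration; the answer above only states that the difference is computed bin-wise.

```python
def intensity_histogram(frame, bins=8, max_value=256):
    """Count pixel intensities of a grayscale frame (list of rows) into equal-width bins."""
    hist = [0] * bins
    for row in frame:
        for pixel in row:
            # Integer arithmetic keeps the bin index exact; the clamp guards pixel == max_value.
            index = min(pixel * bins // max_value, bins - 1)
            hist[index] += 1
    return hist


def histogram_difference(hist_a, hist_b):
    """Sum of absolute bin-wise differences (an assumed aggregation, see lead-in)."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Hypothetical 2 x 2 frames with intensities in 0..255.
example_frame_a = [[0, 0], [255, 255]]
example_frame_b = [[0, 0], [0, 255]]
example_discontinuity = histogram_difference(
    intensity_histogram(example_frame_a, bins=2),
    intensity_histogram(example_frame_b, bins=2),
)
```

Because a histogram ignores where pixels are located, this value reacts less to object motion than a pixel-wise comparison does, at the cost of missing changes that preserve the overall intensity distribution.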

When computing the discontinuity as a sum of region-histogram differences, the eight largest differences were discarded to reduce the influence of motion and noise.
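
The discarding step can be sketched as follows, assuming the per-region histogram differences have already been computed. The function name and the example of 16 regions are hypothetical; only the idea of dropping the eight largest differences comes from the answer above.

```python
def robust_region_discontinuity(region_differences, discard=8):
    """Sum the region differences after dropping the `discard` largest ones."""
    if discard < 0 or discard >= len(region_differences):
        raise ValueError("discard must be non-negative and smaller than the number of regions")
    # Sort ascending and keep everything except the largest `discard` values.
    kept = sorted(region_differences)[: len(region_differences) - discard]
    return sum(kept)


# Hypothetical differences for 16 regions; the eight largest (9..16) are ignored.
example_region_differences = list(range(1, 17))
example_robust_value = robust_region_discontinuity(example_region_differences)
```

Regions dominated by a moving object or by noise tend to produce the largest differences, so dropping them leaves a value driven by the regions that changed consistently.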

The basis of detecting shot boundaries in video sequences is the fact that frames surrounding a boundary generally display a significant change in their visual contents. 

An effective way to reduce the influence of extreme factors on the detection performance is to embed additional information in the shot-boundary detector. 

The major advantages of the proposed detector over the methods from recent literature are that it can operate on a wide range of video sequences without human supervision and that it maintains a consistently high detection quality for each of them.

For this purpose, the authors compute the discontinuity values by compensating the motion between video frames using a block-matching procedure similar to the one proposed in [23].
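
To illustrate the general idea of a motion-compensated discontinuity value, here is a sketch of generic exhaustive-search block matching. It is not the procedure from [23], whose details are not given here; the function names, the mean-absolute-difference matching cost, the search radius, and the averaging over blocks are all assumptions.

```python
def block_cost(frame_a, frame_b, top, left, dy, dx, size):
    """Mean absolute difference between a block in frame_a and a displaced block in frame_b."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(frame_a[top + y][left + x] - frame_b[top + dy + y][left + dx + x])
    return total / (size * size)


def motion_compensated_discontinuity(frame_a, frame_b, size=4, radius=1):
    """Average, over all blocks of frame_a, of the best matching cost found in frame_b."""
    height, width = len(frame_a), len(frame_a[0])
    best_costs = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            best = None
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    t, l = top + dy, left + dx
                    # Skip displacements that would move the block outside frame_b.
                    if t < 0 or l < 0 or t + size > height or l + size > width:
                        continue
                    cost = block_cost(frame_a, frame_b, top, left, dy, dx, size)
                    if best is None or cost < best:
                        best = cost
            best_costs.append(best)
    return sum(best_costs) / len(best_costs)


# Hypothetical 8 x 9 frame and a copy shifted one pixel to the right.
reference_frame = [[(7 * y + 13 * x) % 50 for x in range(9)] for y in range(8)]
shifted_frame = [[reference_frame[y][max(x - 1, 0)] for x in range(9)] for y in range(8)]
```

With `radius=1` the one-pixel shift is fully compensated and the discontinuity is zero, whereas with `radius=0` (no compensation) the same pair of frames yields a positive value.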

The shape of the distribution in Fig. 8(a) indicates that a good analytic estimate for this distribution, and so for the likelihood function, can be found in the family of functions given as (19). Using the similar principle of global shape matching, the distribution in Fig. 8(b), and so the likelihood function, can best be modeled using a Gaussian function (20). The most suitable parameter combinations for the two functions are then found experimentally, such that the rate of detection mistakes for the training sequences is minimized.

The simplest way of measuring the visual-content discontinuity between two frames is to compute the mean absolute change of intensity between them, averaged over all frame pixels, i.e., over every pixel position within the frame dimensions [12].
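
A minimal sketch of this measure, assuming grayscale frames stored as equally sized lists of rows (the function name and frame representation are illustrative choices):

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute intensity change between two equally sized grayscale frames."""
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have identical dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


# Hypothetical 2 x 2 frames: absolute differences are 0, 10, 0 and 20.
example_value = mean_abs_difference([[0, 10], [20, 30]], [[0, 20], [20, 10]])
```

Since every pixel is compared with the pixel at the same position, any camera or object motion changes the value directly, which is why this measure is simple but sensitive to motion.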

The decision about the presence of a dissolve can be supported by investigating the behavior of the intensity variance in the “suspected” series of frames (e.g., those where pattern matching from Fig. 2 shows good results) and by checking how well this behavior fits the downwards-parabolic pattern.
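
One way such a check could look is sketched below. It assumes an idealized linear cross-fade of two uncorrelated shots, for which the variance of the blended frame, α²σ_A² + (1 − α)²σ_B², dips to a minimum inside the transition. The sketch fits a parabola to the per-frame variances by least squares and tests for an interior minimum with a small fitting error. The function names and the error threshold are hypothetical, and reading "downwards-parabolic" as a dip towards a minimum is an interpretation.

```python
def intensity_variance(frame):
    """Variance of all pixel intensities in a grayscale frame (list of rows)."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def _det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(values):
    """Least-squares fit of values[t] ~ a*t^2 + b*t + c for t = 0..n-1; returns (a, b, c)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three samples")
    ts = range(n)
    s = [sum(t ** k for t in ts) for k in range(5)]            # s[k] = sum of t^k
    r = [sum((t ** k) * v for t, v in zip(ts, values)) for k in range(3)]
    matrix = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [r[2], r[1], r[0]]
    det = _det3(matrix)
    coeffs = []
    for col in range(3):                                       # Cramer's rule
        replaced = [row[:] for row in matrix]
        for i in range(3):
            replaced[i][col] = rhs[i]
        coeffs.append(_det3(replaced) / det)
    return tuple(coeffs)


def matches_dissolve_pattern(variances, max_rms_error):
    """True if the variances follow a parabola with a minimum strictly inside the series."""
    a, b, c = fit_parabola(variances)
    n = len(variances)
    rms = (sum((a * t * t + b * t + c - v) ** 2 for t, v in enumerate(variances)) / n) ** 0.5
    has_interior_minimum = a > 0 and 0 < -b / (2 * a) < n - 1
    return has_interior_minimum and rms <= max_rms_error


# Hypothetical variance series following 2*(t - 3)^2 + 5.
example_variances = [2 * (t - 3) ** 2 + 5 for t in range(7)]
```

In practice the threshold `max_rms_error` would have to be tuned on training material, since real variance curves are only approximately parabolic.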

By properly modeling the a priori probability and by securing its convergence to 0.5, the influence of this probability on the detection performance should be minimized as soon as a reasonable shot length is reached.
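
The effect of a prior of 0.5 can be seen from the generic Bayes minimum-error decision rule, sketched below. This is the textbook rule, not the paper's specific detector; the function and parameter names are assumptions, and the likelihood values in the example are made up.

```python
def boundary_detected(likelihood_boundary, likelihood_no_boundary, prior_boundary):
    """Choose 'boundary' when its (unnormalized) posterior exceeds that of 'no boundary'."""
    if not 0.0 <= prior_boundary <= 1.0:
        raise ValueError("prior_boundary must lie in [0, 1]")
    return (likelihood_boundary * prior_boundary
            > likelihood_no_boundary * (1.0 - prior_boundary))


# Hypothetical likelihoods of the same observed discontinuity value under both hypotheses.
with_neutral_prior = boundary_detected(0.6, 0.4, prior_boundary=0.5)   # 0.30 > 0.20
with_small_prior = boundary_detected(0.6, 0.4, prior_boundary=0.1)     # 0.06 > 0.36 is false
```

With a prior of 0.5 both sides are scaled by the same factor, so the decision reduces to comparing the two likelihoods; a small prior, as right after a detected boundary, suppresses detections supported only by moderate evidence.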

While features and metrics can be found such that the influence of motion on discontinuity values is strongly reduced, the influence of strong and abrupt lighting changes on discontinuity values, and thus also on the detection performance, cannot be reduced that easily.