
Chapter 1
Video fragmentation and reverse search on the
Web
Evlampios Apostolidis, Konstantinos Apostolidis, Ioannis Patras, Vasileios
Mezaris

Evlampios Apostolidis
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki,
Greece, and School of Electronic Engineering and Computer Science, Queen Mary University of
London, London, UK, e-mail: apostolid@iti.gr
Konstantinos Apostolidis
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki,
Greece, e-mail: kapost@iti.gr
Ioannis Patras
School of Electronic Engineering and Computer Science, Queen Mary University of London,
London, UK, e-mail: i.patras@qmul.ac.uk
Vasileios Mezaris
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki,
Greece, e-mail: bmezaris@iti.gr

Abstract This chapter is focused on methods and tools for video fragmentation and
reverse search on the Web. These technologies can assist journalists when they are
dealing with fake news - which nowadays are rapidly spread via social media plat-
forms - that rely on the reuse of a previously posted video from a past event with
the intention to mislead the viewers about a contemporary event. The fragmentation
of a video into visually and temporally coherent parts and the extraction of a rep-
resentative keyframe for each defined fragment enables the provision of a complete
and concise keyframe-based summary of the video. Contrary to straightforward ap-
proaches that sample video frames with a constant step, the summary generated
through video fragmentation and keyframe extraction is considerably more effec-
tive for discovering the video content and performing a fragment-level search for
the video on the Web. This chapter starts by explaining the nature and character-
istics of this type of reuse-based fake news in its introductory part, and continues
with an overview of existing approaches for temporal fragmentation of single-shot
videos into sub-shots (the most appropriate level of temporal granularity when deal-
ing with user-generated videos) and tools for performing reverse search of a video
on the Web. Subsequently, it describes two state-of-the-art methods for video sub-
shot fragmentation - one relying on the assessment of the visual coherence over
sequences of frames, and another one that is based on the identification of camera
activity during the video recording - and presents the InVID web application that
enables the fine-grained (at the fragment-level) reverse search for near-duplicates
of a given video on the Web. In the sequel, the chapter reports the findings of a
series of experimental evaluations regarding the efficiency of the above-mentioned
technologies, which indicate their competence to generate a concise and complete
keyframe-based summary of the video content, and the use of this fragment-level
representation for fine-grained reverse video search on the Web. Finally, it draws
conclusions about the effectiveness of the presented technologies and outlines our
future plans for further advancing them.
1.1 Introduction
The recent advances in video capturing technology made possible the embedding of
powerful, high-resolution video sensors into portable devices, such as camcorders,
digital cameras, tablets and smartphones. Most of these devices now offer network
connectivity and file sharing functionalities. The latter, combined with the rise
and widespread use of social networks (such as Facebook, Twitter, Instagram) and
video sharing platforms (such as YouTube, Vimeo, DailyMotion) resulted in an enor-
mous increase in the number of videos captured and shared online by amateur users
on a daily basis. These user-generated videos (UGVs) can nowadays be recorded at
any time and place using smartphones, tablets and a variety of video cameras (such
as GoPro action cameras) that can be attached to sticks, body parts or even drones.
The ubiquitous use of video capturing devices, combined with the ease of sharing
videos through social networks and video sharing platforms, leads to a wealth of
UGVs available online.
Over the last years these online shared UGVs have been, in many cases, the only
evidence of a breaking or evolving story. The sudden and unexpected occurrence of
these events makes their timely coverage by news or media organizations impossible.
However, the existence (in most cases) of eyewitnesses capturing the story with
their smartphones and instantly sharing the recorded video (even live, i.e. during
its recording) via social networks, makes the UGV the only and highly valuable
source of information about the breaking event. In this newly formed technological
environment that facilitates information diffusion through a variety of social me-
dia platforms, journalists and investigators alike are increasingly turning to these
platforms to find media recordings of events. Newsrooms in TV stations and online
news platforms make use of video to illustrate and report on news events, and since
professional journalists are not always at the scene of a breaking or evolving story
(as mentioned above), it is the content shared by users that can be used for reporting
the story. Nevertheless, the rise of social media as a news source has also seen a rise
in fake news, i.e. the spread of deliberate misinformation or disinformation on these

platforms. Based on this unfortunate fact, the online shared user-generated content
comes into question and people’s trust in journalism is severely shaken.
One type of fakes, probably the easiest to produce and thus one of the most
commonly found by journalists, relies on the reuse of a video from an earlier event
with the claim that it shows a contemporary event. An example of such a fake is
depicted in Fig. 1.1. In this figure, the image on the left is a screenshot of a
video showing a hurricane that struck Dolores, Uruguay on May 29 2016; the image
in the middle is a screenshot of the same video with the claim that it shows
Hurricane Otto striking Bocas del Toro, Panama on November 24 2016; and the image
on the right is a screenshot of a tweet that uses the same video with the claim
that it shows the activity of Hurricane Irma in the islands near the United States
on September 9 2017.
Fig. 1.1: Example of a fake news item based on the reuse of a video from a hurricane
in Uruguay (image on the left) to deliberately mislead people about the strike of
hurricane Otto in Panama (image in the middle) and the strike of hurricane Irma in
the US islands (image on the right).
The identification and debunking of such fakes requires the detection of the orig-
inal video through the search for prior occurrences of this video (or parts of it)
on the Web. Early approaches for performing this task were based on manually
taking screenshots of the video in the player and uploading these images for per-
forming reverse image search using the corresponding functionality of popular Web
search engines (e.g. Google search). This process can be highly laborious and time-
consuming, while its effectiveness depends on a limited set of manually taken screen-
shots of the video. However, the timely identification of media posted online which
(claim to) illustrate a (breaking) news event is for many journalists the foremost
challenge in order to meet deadlines to publish a news story online or fill a news
broadcast with content. The time needed for extensive and effective search regard-
ing the posted video, in combination with the lack of expertise by many journalists
and the time-pressure to publish the story, can seriously affect the credibility of the
published news item. Moreover, the publication or re-publication of fake news can
significantly harm the reliability of the entire news organization. An example of
mis-verification of a fake video by an Italian news organization is presented in Fig. 1.2.
A video from the filming of the “World War Z” movie (left part of Fig. 1.2) was
used in a tweet claiming to show a Hummer attack against police in Notre-Dame,

Paris, France on June 6 2017 (middle part of Fig. 1.2) and another tweet claiming
to show an attack at Gare Centrale, Brussels, Belgium two weeks later (right part of
Fig. 1.2). The fake tweet about the Paris attack was used in a new item published by
the aforementioned news organization, causing a strong defeat in its trustworthiness.
Fig. 1.2: Example of a fake news item based on the reuse of a video from the filming
of the “World War Z” movie (image on the left) to deliberately mislead people about
a Hummer attack in Notre-Dame, Paris (image in the middle) and at Gare
Centrale in Brussels (image on the right).
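The weakness of the manual, fixed-step screenshot approach can be made concrete with a small sketch (our own illustration; the function name and parameters are hypothetical): a constant-step sampler picks frames at fixed intervals regardless of content, so any short sub-shot that falls between two sampling points contributes no screenshot at all.

```python
def constant_step_indices(n_frames, fps, step_seconds=5.0):
    """Frame indices selected by naive constant-step sampling.

    Picks one frame every `step_seconds`, ignoring the visual content,
    so sub-shots shorter than the step can be skipped entirely.
    """
    step = max(int(fps * step_seconds), 1)
    return list(range(0, n_frames, step))
```

For a 10-second clip at 30 fps and a 5-second step this yields only indices 0 and 150; a 2-second sub-shot spanning frames 60 to 120 would be missed entirely, and any reverse search based on these screenshots would never see it.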
Several tools that enable the identification of near-duplicates of a video on the
Web have been developed over the last years, a fact that indicates the usefulness
and applicability of this process for journalists and members of the media verifica-
tion community. Nevertheless, the existing solutions (presented in detail in Sec-
tion 1.2.2) exhibit several limitations that restrict the effectiveness of the video re-
verse search task. In particular, some of these solutions rely on a limited set of video
thumbnails provided by the video sharing platform (e.g. the YouTube DataViewer of
Amnesty International¹ and the Custom Reverse Image Search of IntelTechniques²).
Other technologies demand the extraction of video frames for performing reverse
image search (e.g. the TinEye search engine³ and the Karma Decay⁴ web applica-
tion). A number of tools enable this reverse search only on closed collections of
videos, which significantly limits the boundaries of investigation (e.g. the Berify⁵,
the RevIMG⁶ and the Videntifier⁷ platforms). Last but not least, a commonality
among the aforementioned technologies is that none of them supports the analysis
of locally stored videos.

¹ https://citizenevidence.amnestyusa.org/
² https://inteltechniques.com/osint/reverse.video.html
³ https://tineye.com/
⁴ http://karmadecay.com/
⁵ https://berify.com/
⁶ http://www.revimg.com/
⁷ http://www.videntifier.com
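Several of these services expose URL-based search, which is what makes programmatic, per-keyframe querying conceivable. The sketch below builds one reverse-search query URL per keyframe image URL; the endpoint templates are assumptions based on the URL-based search these engines have publicly exposed and may change, and the helper names are our own.

```python
from urllib.parse import quote

# Assumed endpoint templates; both patterns have been publicly exposed by the
# respective services, but they are not guaranteed to be stable.
REVERSE_SEARCH_ENDPOINTS = {
    "google": "https://www.google.com/searchbyimage?image_url={url}",
    "tineye": "https://tineye.com/search?url={url}",
}

def reverse_search_urls(keyframe_urls, engine="google"):
    """Build one reverse-image-search query URL per keyframe image URL."""
    template = REVERSE_SEARCH_ENDPOINTS[engine]
    # Percent-encode the keyframe URL so it survives as a query parameter.
    return [template.format(url=quote(u, safe="")) for u in keyframe_urls]
```

Issuing one query per extracted keyframe, rather than per manually taken screenshot, is what turns reverse image search into a fragment-level video search.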

Aiming to offer a more effective approach for reverse video search on the Web,
in InVID we developed: a) an algorithm for temporal fragmentation of (single-shot)
UGVs into sub-shots (presented in Section 1.3.1.1), and b) a web application that
integrates this algorithm and makes possible the time-efficient and at the fragment-
level reverse search for near-duplicates of a given video on the Web (described in
Section 1.3.2). The developed algorithm allows the identification of visually and
temporally coherent parts of the processed video, and the extraction of a dynamic
number of keyframes in a manner that secures a complete and concise representation
of the defined - visually discrete - parts of the video. Moreover, the compatibility
of the web application with several video sharing platforms and social networks is
further extended by the ability to directly process videos that are locally stored in
the user’s machine. In a nutshell, our complete technology assists users to quickly
discover the temporal structure of the video, extract detailed information about the
video content and use this data in their reverse video search queries.
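The overall idea of fragmenting a video into visually coherent parts and extracting one keyframe per part can be sketched as follows. This is a deliberately minimal illustration rather than the actual InVID algorithm: the plain histogram-intersection criterion, the fixed threshold and all function names are our own simplifications.

```python
import numpy as np

def frame_histogram(frame, bins=16):
    """Normalized grey-level histogram of a frame (H x W uint8 array)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def fragment_video(frames, threshold=0.5):
    """Split a frame sequence into visually coherent sub-shots.

    A new sub-shot starts whenever the histogram intersection between
    consecutive frames drops below `threshold`. Returns a list of
    (start, end) index pairs, with `end` exclusive.
    """
    boundaries = [0]
    for i in range(1, len(frames)):
        h_prev = frame_histogram(frames[i - 1])
        h_cur = frame_histogram(frames[i])
        similarity = np.minimum(h_prev, h_cur).sum()  # in [0, 1]
        if similarity < threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))

def keyframes(frames, fragments):
    """Pick the middle frame of each fragment as its representative keyframe."""
    return [(start + end) // 2 for start, end in fragments]
```

On real videos one would compare a few sampled frames per second rather than every consecutive pair, and use a far more robust similarity criterion; the point of the sketch is only that the number of keyframes adapts to the visual variety of the content instead of being fixed by a sampling step.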
In the following, Section 1.2 discusses the current state of the art on methods
for video sub-shot fragmentation (Section 1.2.1) and tools for reverse video search
on the Web (Section 1.2.2). Then, Section 1.3 is dedicated to the presentation of two
advanced approaches for video sub-shot fragmentation - the InVID method that re-
lies on the visual resemblance of the video content (see Section 1.3.1.1) and another
algorithm that is based on the extraction of motion information (see Section 1.3.1.2)
- and the description of the InVID web application for reverse video search on the
Web (see Section 1.3.2). Subsequently, Section 1.4 reports the extracted findings re-
garding the performance of the aforementioned methods (see Section 1.4.1) and tool
(see Section 1.4.2), while the last Section 1.5 concludes the document and presents
our future plans on this research area.
1.2 Related Work
This part presents the related work, both in terms of methods for temporal frag-
mentation of uninterruptedly captured (i.e. single-shot) videos into sub-shots (Sec-
tion 1.2.1) and tools for finding near-duplicates of a given video on the Web (Sec-
tion 1.2.2).
1.2.1 Video Fragmentation
A variety of methods dealing with the temporal fragmentation of single-shot videos
have been proposed over the last couple of decades. Most of them are related to
approaches for video summarization and keyframe selection (e.g. [21, 9, 29, 15]),
some focus on the analysis of egocentric or wearable videos (e.g. [27, 41, 19]),
others aim to address the need for detecting duplicates of videos (e.g. [8]), a number
of them are related to the indexing and annotation of personal videos (e.g. [28]),
while there is a group of methods that target the indexing and summarization of
rushes video (e.g. [12, 25, 4, 36]).
