
Chapter 1
Video fragmentation and reverse search on the Web
Evlampios Apostolidis, Konstantinos Apostolidis, Ioannis Patras, Vasileios Mezaris
Evlampios Apostolidis
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece and School of Electronic Engineering and Computer Science, Queen Mary University, London, UK, e-mail: apostolid@iti.gr

Konstantinos Apostolidis
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece, e-mail: kapost@iti.gr

Ioannis Patras
School of Electronic Engineering and Computer Science, Queen Mary University, London, UK, e-mail: i.patras@qmul.ac.uk

Vasileios Mezaris
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece, e-mail: bmezaris@iti.gr

Abstract This chapter is focused on methods and tools for video fragmentation and reverse search on the Web. These technologies can assist journalists when they are dealing with fake news - which nowadays are rapidly spread via social media platforms - that rely on the reuse of a previously posted video from a past event with the intention to mislead the viewers about a contemporary event. The fragmentation of a video into visually and temporally coherent parts and the extraction of a representative keyframe for each defined fragment enable the provision of a complete and concise keyframe-based summary of the video. Contrary to straightforward approaches that sample video frames with a constant step, the summary generated through video fragmentation and keyframe extraction is considerably more effective for discovering the video content and for performing a fragment-level search for the video on the Web. This chapter starts by explaining the nature and characteristics of this type of reuse-based fake news in its introductory part, and continues with an overview of existing approaches for the temporal fragmentation of single-shot videos into sub-shots (the most appropriate level of temporal granularity when dealing with user-generated videos) and of tools for performing reverse search of a video on the Web. Subsequently, it describes two state-of-the-art methods for video sub-shot fragmentation - one relying on the assessment of the visual coherence over sequences of frames, and another one that is based on the identification of camera activity during the video recording - and presents the InVID web application that enables the fine-grained (at the fragment level) reverse search for near-duplicates of a given video on the Web. In the sequel, the chapter reports the findings of a series of experimental evaluations regarding the efficiency of the above-mentioned technologies, which indicate their competence to generate a concise and complete keyframe-based summary of the video content and the usefulness of this fragment-level representation for fine-grained reverse video search on the Web. Finally, it draws conclusions about the effectiveness of the presented technologies and outlines our future plans for further advancing them.
1.1 Introduction
Recent advances in video capturing technology have made it possible to embed powerful, high-resolution video sensors into portable devices, such as camcorders, digital cameras, tablets and smartphones, and most of these devices now offer network connectivity and file sharing functionalities. The latter, combined with the rise and widespread use of social networks (such as Facebook, Twitter, Instagram) and video sharing platforms (such as YouTube, Vimeo, DailyMotion), resulted in an enormous increase in the number of videos captured and shared online by amateur users on a daily basis. These user-generated videos (UGVs) can nowadays be recorded at any time and place using smartphones, tablets and a variety of video cameras (such as GoPro action cameras) that can be attached to sticks, body parts or even drones. The ubiquitous use of video capturing devices, combined with the ease of sharing videos through social networks and video sharing platforms, leads to a wealth of UGVs being available online.
In recent years, these online-shared UGVs have been, in many cases, the only evidence of a breaking or evolving story. The sudden and unexpected occurrence of such events makes their timely coverage by news and media organizations impossible. However, the presence (in most cases) of eyewitnesses capturing the story with their smartphones and instantly sharing the recorded video (even live, i.e. during its recording) via social networks makes the UGV the only, and highly valuable, source of information about the breaking event. In this newly formed technological environment that facilitates information diffusion through a variety of social media platforms, journalists and investigators alike are increasingly turning to these platforms to find media recordings of events. Newsrooms in TV stations and online news platforms make use of video to illustrate and report on news events, and since professional journalists are not always at the scene of a breaking or evolving story (as mentioned above), it is the content shared by users that can be used for reporting the story. Nevertheless, the rise of social media as a news source has also seen a rise in fake news, i.e. the spread of deliberate misinformation or disinformation on these platforms. As a result, online-shared user-generated content is increasingly called into question, and people's trust in journalism is severely shaken.
One type of fake, probably the easiest to produce and thus one of the most commonly encountered by journalists, relies on the reuse of a video from an earlier event with the claim that it shows a contemporary event. An example of such a fake is depicted in Fig. 1.1. In this figure, the image on the left is a screenshot of a video showing a hurricane that struck Dolores, Uruguay on May 29, 2016; the image in the middle is a screenshot of the same video posted with the claim that it shows Hurricane Otto striking Bocas del Toro, Panama on November 24, 2016; and the image on the right is a screenshot of a tweet that uses the same video with the claim that it shows the activity of Hurricane Irma in the islands near the United States on September 9, 2017.
Fig. 1.1: Example of a fake news item based on the reuse of a video from a hurricane in Uruguay (image on the left) to deliberately mislead people about the strike of Hurricane Otto in Panama (image in the middle) and the strike of Hurricane Irma in the US islands (image on the right).
The identification and debunking of such fakes requires the detection of the original video through a search for prior occurrences of this video (or parts of it) on the Web. Early approaches to this task were based on manually taking screenshots of the video in the player and uploading these images to the reverse image search functionality of popular Web search engines (e.g. Google search). This process can be highly laborious and time-consuming, while its effectiveness depends on a limited set of manually taken screenshots of the video.
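For illustration, the short sketch below automates this baseline: it samples frames from a video at a constant temporal step and writes them to disk for manual reverse image search. This is a minimal, hypothetical example built on OpenCV's standard video I/O; the step size and file naming are arbitrary assumptions, not part of any tool discussed in this chapter.

```python
import cv2

def sample_frames(video_path, step_seconds=5.0, out_pattern="frame_{:04d}.png"):
    """Save one frame every `step_seconds` (the naive constant-step baseline)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back to 25 fps if unknown
    step = max(1, int(round(fps * step_seconds)))
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video (or read error)
            break
        if index % step == 0:
            cv2.imwrite(out_pattern.format(saved), frame)
            saved += 1
        index += 1
    cap.release()
    return saved  # number of screenshots produced
```

A fixed step inevitably trades completeness against conciseness: a small step floods the analyst with near-identical screenshots, while a large one can skip short but visually distinct parts of the video - exactly the limitation that fragmentation-based keyframe extraction is meant to overcome.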
However, the timely identification of media posted online which (claim to) illustrate a (breaking) news event is for many journalists the foremost challenge, as they need to meet deadlines to publish a news story online or fill a news broadcast with content. The time needed for an extensive and effective search for the posted video, combined with the lack of expertise of many journalists and the time pressure to publish the story, can seriously affect the credibility of the published news item. And the publication or re-publication of fake news can significantly harm the reliability of the entire news organization. An example of mis-verification of a fake video by an Italian news organization is presented in Fig. 1.2.
A video from the filming of the “World War Z” movie (left part of Fig. 1.2) was used in a tweet claiming to show a Hummer attack against police at Notre-Dame, Paris, France on June 6, 2017 (middle part of Fig. 1.2) and in another tweet claiming to show an attack at Gare Centrale, Brussels, Belgium two weeks later (right part of Fig. 1.2). The fake tweet about the Paris attack was used in a news item published by the aforementioned news organization, causing serious damage to its trustworthiness.
Fig. 1.2: Example of a fake news item based on the reuse of a video from the filming of the “World War Z” movie (image on the left) to deliberately mislead people about a Hummer attack at Notre-Dame, Paris (image in the middle) and at Gare Centrale in Brussels (image on the right).
Several tools that enable the identification of near-duplicates of a video on the Web have been developed over the last years, a fact that indicates the usefulness and applicability of this process for journalists and members of the media verification community. Nevertheless, the existing solutions (presented in detail in Section 1.2.2) exhibit several limitations that restrict the effectiveness of the video reverse search task. In particular, some of these solutions rely on a limited set of video thumbnails provided by the video sharing platform; examples are the YouTube DataViewer of Amnesty International (https://citizenevidence.amnestyusa.org/) and the Custom Reverse Image Search of IntelTechniques (https://inteltechniques.com/osint/reverse.video.html). Other technologies demand the extraction of video frames for performing reverse image search, e.g. the TinEye search engine (https://tineye.com/) and the Karma Decay web application (http://karmadecay.com/). A number of tools enable this reverse search only on closed collections of videos, which significantly limits the scope of the investigation; examples are the Berify (https://berify.com/), RevIMG (http://www.revimg.com/) and Videntifier (http://www.videntifier.com) platforms. Last but not least, a commonality among the aforementioned technologies is that none of them supports the analysis of locally stored videos.
Aiming to offer a more effective approach for reverse video search on the Web, in InVID we developed: a) an algorithm for the temporal fragmentation of (single-shot) UGVs into sub-shots (presented in Section 1.3.1.1), and b) a web application that integrates this algorithm and makes possible the time-efficient, fragment-level reverse search for near-duplicates of a given video on the Web (described in Section 1.3.2). The developed algorithm allows the identification of visually and temporally coherent parts of the processed video, and the extraction of a dynamic number of keyframes in a manner that ensures a complete and concise representation of the defined - visually discrete - parts of the video. Moreover, the compatibility of the web application with several video sharing platforms and social networks is complemented by its ability to directly process videos that are stored locally on the user's machine. In a nutshell, our complete technology helps users quickly discover the temporal structure of a video, extract detailed information about the video content, and use this data in their reverse video search queries.
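To give a feel for what such fragment-level keyframe extraction involves, the sketch below declares a sub-shot boundary whenever the visual similarity between consecutive frames drops below a threshold, and keeps the middle frame of each resulting fragment as its keyframe. This is a deliberately simplified stand-in and not the InVID algorithm of Section 1.3.1.1; the HSV-histogram similarity measure and the threshold value are illustrative assumptions only.

```python
import cv2

def fragment_and_pick_keyframes(video_path, threshold=0.6):
    """Split a single-shot video into visually coherent fragments and
    pick one keyframe (the middle frame) per fragment.

    A fragment boundary is declared when the correlation between the
    hue/saturation histograms of consecutive frames falls below
    `threshold` (an arbitrary value chosen for illustration).
    """
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, index = [0], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # 2D hue/saturation histogram as a cheap visual-coherence signal
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None and \
                cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
            boundaries.append(index)  # a new fragment starts at this frame
        prev_hist = hist
        index += 1
    cap.release()
    boundaries.append(index)  # close the last fragment
    keyframes = [(boundaries[i] + boundaries[i + 1]) // 2
                 for i in range(len(boundaries) - 1)]
    return boundaries, keyframes
```

The choice of visual feature matters here: as the evaluations reported later in this chapter indicate, HSV histograms favour precision, whereas the DCT-based features utilised by the method of Section 1.3.1.1 yield notably higher recall and a better overall F-score.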
In the following, Section 1.2 discusses the current state of the art on methods for video sub-shot fragmentation (Section 1.2.1) and tools for reverse video search on the Web (Section 1.2.2). Then, Section 1.3 is dedicated to the presentation of two advanced approaches for video sub-shot fragmentation - the InVID method that relies on the visual resemblance of the video content (see Section 1.3.1.1) and another algorithm that is based on the extraction of motion information (see Section 1.3.1.2, and the sketch below for the basic idea) - and the description of the InVID web application for reverse video search on the Web (see Section 1.3.2). Subsequently, Section 1.4 reports the extracted findings regarding the performance of the aforementioned methods (see Section 1.4.1) and tool (see Section 1.4.2), while the last Section 1.5 concludes the document and presents our future plans in this research area.
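For intuition on the motion-based family of methods, the following sketch computes a mean displacement vector between two consecutive frames using Shi-Tomasi corner detection [38] and Pyramidal Lucas-Kanade (PLK) optical flow [7], both available in OpenCV. It is only an illustrative building block, not the algorithm of Section 1.3.1.2, which, as discussed later, computes such displacement vectors separately for each frame quartile.

```python
import cv2
import numpy as np

def mean_displacement(prev_gray, curr_gray):
    """Mean motion vector between two consecutive grayscale frames,
    estimated from Shi-Tomasi corners tracked with Pyramidal Lucas-Kanade."""
    corners = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=7)
    if corners is None:  # e.g. a textureless frame
        return np.zeros(2)
    moved, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   corners, None)
    good = status.ravel() == 1
    if not good.any():  # tracking failed for all corners
        return np.zeros(2)
    return (moved[good] - corners[good]).reshape(-1, 2).mean(axis=0)
```

A camera-activity-based fragmenter would aggregate such vectors over time: runs of near-zero vectors indicate a static camera, runs of consistently oriented vectors indicate pans or tilts, and a change between such regimes is a natural sub-shot boundary candidate.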
1.2 Related Work
This part presents the related work, both in terms of methods for the temporal fragmentation of uninterruptedly captured (i.e. single-shot) videos into sub-shots (Section 1.2.1) and tools for finding near-duplicates of a given video on the Web (Section 1.2.2).
1.2.1 Video Fragmentation
A variety of methods dealing with the temporal fragmentation of single-shot videos
have been proposed over the last couple of decades. Most of them are related to
approaches for video summarization and keyframe selection (e.g. [21, 9, 29, 15]),
some focus on the analysis of egocentric or wearable videos (e.g. [27, 41, 19]),
others aim to address the need for detecting duplicates of videos (e.g. [8]), a number
of them is related to the indexing and annotation of personal videos (e.g. [28]),

References

[5] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool: Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding 110(3), 346-359 (2008)
[14] M. A. Fischler, R. C. Bolles: Random Sample Consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6), 381-395 (1981)
[26] D. G. Lowe: Object recognition from local scale-invariant features. In: Proc. IEEE Int. Conf. on Computer Vision (ICCV), pp. 1150-1157 (1999)
[37] E. Rublee, V. Rabaud, K. Konolige, G. Bradski: ORB: An efficient alternative to SIFT or SURF. In: Proc. IEEE Int. Conf. on Computer Vision (ICCV), pp. 2564-2571 (2011)
[38] J. Shi, C. Tomasi: Good features to track. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 593-600 (1994)