
A Benchmark of Visual Storytelling in Social Media

TL;DR: The SocialStories benchmark, comprising a total of 40 curated stories covering sports and cultural events, provides the experimental setup and introduces novel quantitative metrics to perform a rigorous evaluation of visual storytelling with social media data.
Abstract: Media editors in the newsroom are constantly pressed to provide a "like-being there" coverage of live events. Social media provides a disorganised collection of images and videos that media professionals need to grasp before publishing their latest news updates. Automated news visual storyline editing with social media content can be very challenging, as it not only entails the task of finding the right content but also making sure that news content evolves coherently over time. To tackle these issues, this paper proposes a benchmark for assessing social media visual storylines. The SocialStories benchmark, comprising a total of 40 curated stories covering sports and cultural events, provides the experimental setup and introduces novel quantitative metrics to perform a rigorous evaluation of visual storytelling with social media data.

Summary (2 min read)

1 INTRODUCTION

  • Editorial coverage of events is often a challenging task, in that media professionals need to identify interesting stories, summarise each story, and illustrate the story episodes, in order to inform the public about how an event unfolded over time.
  • The authors created three types of storylines: news article, investigative topics, and review topics.
  • The authors propose a new metric that assesses the quality of a visual storyline in terms of its relevance and transition between segment illustrations.

2.1 SocialStories: Event Data and Storylines

  • To enable social media visual storyline illustration, a data collection strategy was designed to create a suitable corpus, limiting the number of retrieved documents to those posted during the span of the event.
  • Events adequate for storytelling were selected, namely those with strong social dynamics in terms of temporal variations with respect to their semantics (textual vocabulary and visual content).
  • Le Tour de France (TDF) is one of the main road cycling race competitions.
  • The authors' keyword-based approach consists of querying the social media APIs with a set of keyword terms.
  • Therefore, a set of relevant hashtags grouping content of the same topic was also manually defined.

2.2 Visual Storyline Quality Metric

  • Media editors are constantly judging the quality of news material to decide if it deserves to be published.
  • The task is highly skillful, and deriving a methodology from such a process is not straightforward.
  • The first step towards the quantification of visual storyline quality concerns the human judgement of these different dimensions.
  • Once a visual storyline is generated, annotators will judge the relevance of the story segment illustration as: s_i = 0: the image/video is not relevant to the story segment; s_i = 1: the image/video is relevant to the story segment; s_i = 2: the image/video is highly relevant to the story segment.
  • Given the underlying subjectivity of the task, the values of α and β that optimally represent the human perception of visual stories are, in fact, average values.

3.1 Protocol and Ground-truth

  • The goal of this experiment is to demonstrate the robustness of the proposed benchmark.
  • Target storylines and segments were obtained using several methods, resulting in a total of 40 generated storylines (20 for each event), each comprising 3 to 4 segments.
  • Ground truth for relevant segment illustrations, transitions and global story quality was obtained as described in the following section.
  • Stories were visualised and assessed in a specifically designed prototype interface.
  • Using the subjective assessment of the annotators, the score proposed in Section 2.2 was calculated for each story.

3.2 Quality Metric vs Human Judgement

  • To test the stability of the proposed metric, the authors computed it based on the relevance of segments and the transitions between segments, and related it to the overall story rating assigned by annotators.
  • Figure 3 compares the annotator rating to the quality metric.
  • These values show that linear increments in the ratings provided by the annotators were matched by the metric.
  • Thus, these results show that the metric Quality effectively emulates the human perception of visual storyline quality.

3.3 Automatic Visual Storytelling

  • Figure 4 (a) presents the influence of illustrations in the story Quality metric introduced in Section 2.2.
  • In scenarios where relevant content is scarce, the approach is hindered by noise.
  • Hence, the performance of these baselines was lower than that of Text Retrieval.
  • The CNN Dense baseline minimises the distance between representations extracted from the penultimate layer of the visual concept detector.
  • Additionally, and similarly to what was observed while assessing the segment illustration baselines, Figure 4 shows that creating storylines with good transitions is easier for the TDF dataset than for the EdFest dataset.

4 CONCLUSIONS

  • This paper addressed the problem of automatic visual story editing using social media data, a task that ran in TRECVID 2018.
  • Media professionals are asked to cover large events and are required to manually process large amounts of social media data to create event plots and select appropriate pieces of content for each segment.
  • The main contribution of this paper is a benchmark to assess the overall quality of a visual story based on the relevance of individual illustrations and transitions between consecutive segment illustrations.
  • It was shown that the proposed experimental test-bed proved to be effective in the assessment of story editing and composition with social media material.
  • This work has been partially funded by the GoLocal CMU-Portugal project Ref. CMUP-ERI/TIC/0046/2014, by the COGNITUS H2020 ICT project No 687605 and by the project NOVA LINCS Ref. UID/CEC/04516/2013.


A Benchmark of Visual Storytelling in Social Media
Gonçalo Marcelino (NOVA LINCS, Uni. NOVA de Lisboa, Portugal) goncalo.bfm@gmail.com
David Semedo (NOVA LINCS, Uni. NOVA de Lisboa, Portugal) df.semedo@campus.fct.unl.pt
André Mourão (NOVA LINCS, Uni. NOVA de Lisboa, Portugal) a.mourao@campus.fct.unl.pt
Saverio Blasi (BBC Research and Development, London, UK) saverio.blasi@bbc.co.uk
Marta Mrak (BBC Research and Development, London, UK) marta.mrak@bbc.co.uk
João Magalhães (NOVA LINCS, Uni. NOVA de Lisboa, Portugal) jm.magalhaes@fct.unl.pt
ABSTRACT
Media editors in the newsroom are constantly pressed to provide a "like-being there" coverage of live events. Social media provides a disorganised collection of images and videos that media professionals need to grasp before publishing their latest news updates. Automated news visual storyline editing with social media content can be very challenging, as it not only entails the task of finding the right content but also making sure that news content evolves coherently over time. To tackle these issues, this paper proposes a benchmark for assessing social media visual storylines. The SocialStories benchmark, comprising a total of 40 curated stories covering sports and cultural events, provides the experimental setup and introduces novel quantitative metrics to perform a rigorous evaluation of visual storytelling with social media data.
KEYWORDS
Storytelling, social media, benchmark
ACM Reference Format:
Gonçalo Marcelino, David Semedo, André Mourão, Saverio Blasi, Marta
Mrak, and João Magalhães. 2019. A Benchmark of Visual Storytelling in
Social Media. In International Conference on Multimedia Retrieval (ICMR ’19),
June 10–13, 2019, Ottawa, ON, Canada. ACM, New York, NY, USA, 5 pages.
https://doi.org/10.1145/3323873.3325047
1 INTRODUCTION
Editorial coverage of events is often a challenging task, in that media professionals need to identify interesting stories, summarise each story, and illustrate the story episodes, in order to inform the public about how an event unfolded over time. Thanks to its widespread adoption, social media services offer a vast amount of available content, both textual and visual, and are therefore ideal to support the creation and illustration of these event stories [4-6, 12, 16].
The timeline of an event, e.g. a music festival, a sports tournament or a natural disaster [13], contains visual and textual pieces of information that are strongly correlated. There are several ways of presenting the same event, by covering specific storylines, each offering different perspectives. These storylines, illustrated in Figure 1, refer to a story topic and related subtopics, and are structured into story segments that should describe narrow occurrences over the course of the event. More formally, we define a Visual Storyline as a sequence of segments, referring to an event topic, with each segment being defined by a textual description and comprising an image or a video.
Figure 1: Visual storyline editing task: a news story topic and
story segments can be illustrated by social media content.
In Figure 1 we illustrate the newsroom workflow tackled by this paper: once the story topic and story segments are created, the media editor selects images/videos from social media platforms and organises the retrieved content according to a coherent narrative, i.e. the visual storyline. Many social media platforms, including Twitter, Flickr or YouTube, provide a stream of multimodal social media content, naturally yielding an unfiltered event timeline. These timelines can be listened to, mined [2, 3, 6], and exploited to gather visual content, specifically image and video [11, 14].

The primary contribution of this paper is the introduction of a quality metric to assess visual storylines. This metric is designed to evaluate the quality of an automatically illustrated storyline, based on computational aspects that attempt to mimic the human-driven editorial perspective.

Table 1: Dataset statistics for each event, including both terms and hashtags. The event and crawling time spans are shown.

EdFest: 20 stories; 82,348 docs; docs with images: Twitter 15,439, Flickr 5,908; docs with videos: Twitter 3,690, YouTube 293; crawling span: 2016-07-01 until 2017-01-01; term seeds: Edinburgh Festival, Edfest, Edinburgh Festival 2016, Edfest 2016; hashtag seeds: #edfest, #edfringe, #EdinburghFestival, #edinburghfest.

TDF: 20 stories; 325,074 docs; docs with images: Twitter 34,865, Flickr 6,442; docs with videos: Twitter 8,677, YouTube 983; crawling span: 2016-06-01 until 2017-01-01; term seeds: le tour de france, le tour de france 2016, tour de france; hashtag seeds: #TDF2016, #TDF.
The quality metric focuses on the relevance of each individual segment's illustration and on the general flow of the storyline, i.e. the transitions from one segment illustration to the next. The second contribution of this paper is a social media dataset with news storylines to allow research on visual storytelling with social media data. The benchmark, developed for TRECVID 2018 [1], provides a rigorous setting to research the underpinnings of visual social storytelling. The most relevant aspect of this benchmark is the realistic nature of the storylines, which mimic the newsroom media editorial process: some stories were manually investigated and inferred from social media, and other stories were constructed from existing news articles.
2 SOCIAL STORIES BENCHMARK

(The SocialStories dataset is available at https://novasearch.org/datasets/.)
Assessing the success of news visual storyline creation is a complex task. In this section, we address this task and propose the SocialStories benchmark. Visual storytelling datasets like [7] and [8] contain sequences of image-caption pairs that capture a specific activity, e.g., "playing frisbee with a dog". A characteristic of these stories is that the sequence of visual elements is very coherent (visually and textually), which is highly unlikely to occur in social media. Hence, these stories do not match the ones a journalist or a media professional needs to create and illustrate on a daily basis. This highlights the importance of a suitable experimental test-bed for the task at hand.
The SocialStories benchmark provides the experimental setup and metrics to perform a rigorous evaluation of the task of creating visual storylines from social media data. The core aspects of the SocialStories benchmark are:

Storylines: We created three types of storylines: news article, investigative topics, and review topics. These were either obtained from newswire articles, or created manually through data inspection.

Assessing Visual Storylines Quality: Assessing the quality of a sequence of information is a novel and challenging task. We propose a new metric that assesses the quality of a visual storyline in terms of its relevance and transitions between segment illustrations.

The following sections detail the types of storylines considered in the benchmark and propose a story quality assessment metric.
2.1 SocialStories: Event Data and Storylines
To enable social media visual storyline illustration, a data collection strategy was designed to create a suitable corpus, limiting the number of retrieved documents to those posted during the span of the event. Events adequate for storytelling were selected, namely those with strong social dynamics in terms of temporal variations with respect to their semantics (textual vocabulary and visual content). In other words, the unfolding of the event stories is encoded in each collection. Events that span multiple days, like music festivals, sports competitions, etc., are examples of good storyline candidates. Taking the aforementioned aspects into account, the data for the following events was crawled (Table 1):
The Edinburgh Festival (EdFest)
consists of a celebration
of the performing arts, gathering dance, opera, music and
theatre performers from all over the world. The event takes
place in Edinburgh, Scotland and has a duration of 3 weeks
in August.
Le Tour de France (TDF)
is one of the main road cycling race
competitions. The event takes place in France (16 days), Spain
(1 day), Andorra (3 days) and Switzerland (3 days).
2.1.1 Crawling Strategy. Our keyword-based approach consists of querying the social media APIs with a set of keyword terms. Thus, a curated list of keywords was manually selected for each event. Furthermore, hashtags in social media play the essential role of grouping similar content (e.g. content belonging to the same event) [9]. Therefore, a set of relevant hashtags grouping content of the same topic was also manually defined. The data collected is detailed in Table 1. With no loss of generality, Twitter data was used for the experiments reported in the following sections.
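To make the crawling strategy concrete, the short Python sketch below filters a generic collection of posts down to those that fall inside the event crawling span and mention one of the manually chosen seed terms or hashtags (the seeds echo the EdFest row of Table 1). This is a minimal illustration assuming a simple dictionary schema for posts; it is not the authors' crawler and it does not call any real social media API.

from datetime import datetime

# Hypothetical seed lists in the spirit of Table 1 (EdFest); names and fields
# are illustrative only.
TERMS = ["edinburgh festival", "edfest", "edinburgh festival 2016", "edfest 2016"]
HASHTAGS = {"#edfest", "#edfringe", "#edinburghfestival", "#edinburghfest"}
CRAWL_START = datetime(2016, 7, 1)
CRAWL_END = datetime(2017, 1, 1)

def matches_seeds(text: str) -> bool:
    """A post is kept if it mentions a seed term or a seed hashtag."""
    lowered = text.lower()
    return any(term in lowered for term in TERMS) or any(tag in lowered for tag in HASHTAGS)

def in_crawl_span(posted_at: datetime) -> bool:
    """Restrict the corpus to documents posted during the event crawling span."""
    return CRAWL_START <= posted_at < CRAWL_END

def filter_corpus(posts):
    """posts: iterable of dicts with 'text' and 'posted_at' keys (assumed schema)."""
    return [p for p in posts if in_crawl_span(p["posted_at"]) and matches_seeds(p["text"])]

# Example usage with toy posts
corpus = filter_corpus([
    {"text": "Great show at #edfringe tonight!", "posted_at": datetime(2016, 8, 10)},
    {"text": "Unrelated tweet", "posted_at": datetime(2016, 8, 10)},
])
print(len(corpus))  # -> 1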
2.1.2 Story Segments. For each event, newsworthy, informative and interesting topics are considered, containing diverse visual material either in terms of low-level visual aspects (colour, backgrounds, shapes, etc.) and/or semantic visual aspects. Each storyline contains 3 to 4 story segments.
2.2 Visual Storyline Quality Metric
Media editors are constantly judging the quality of news material to decide if it deserves to be published. The task is highly skillful, and deriving a methodology from such a process is not straightforward. The task of identifying visual material suitable to describe each story segment is, from the perspective of media professionals, highly subjective. The motivation for why some content may be used to illustrate specific segments can derive from a variety of factors. While subjective preference obviously plays a part in this process (and cannot be replicated by an automated process), other factors are also important, which come from common practice and general guidelines, and which can be mimicked by objective quality assessment metrics.

The first step towards the quantification of visual storyline quality concerns the human judgement of these different dimensions. This is achieved in a sound manner by judging specific, objective characteristics of the story. Figure 2 illustrates the visual storyline quality assessment framework.

Figure 2: Benchmarking visual storytelling creation.
In particular, storyline illustrations are assessed in terms of relevance of illustrations (blue links in Figure 2) and coherence of transitions (red links in Figure 2). Once a visual storyline is generated, annotators will judge the relevance of the story segment illustration as:

  s_i = 0: the image/video is not relevant to the story segment;
  s_i = 1: the image/video is relevant to the story segment;
  s_i = 2: the image/video is highly relevant to the story segment.

Similarly, with respect to the coherence of a visual storyline, each story transition is judged by annotators as the degree of affinity between pairs of story segment illustrations:

  t_i = 0: there is no relation between the segment illustrations;
  t_i = 1: there is a relation between the two segments;
  t_i = 2: there is an appealing semantic and visual coherence between the two segment illustrations.

These two dimensions can be used to obtain an overall expression of the "quality" of a given illustration for a story of N segments. This is formalised by the expression:
    Quality = α · s_1 + ((1 − α) / (2(N − 1))) · Σ_{i=2..N} pairwiseQ(i)    (1)

The function pairwiseQ(i) defines quantitatively the perceived quality of two neighbouring segment illustrations based on their relevance and transition:

    pairwiseQ(i) = β · (s_i + s_{i−1}) + (1 − β) · (s_{i−1} · s_i + t_{i−1})    (2)

where the first term captures the relevance of the two segment illustrations and the second term captures the transition between them; α weights the importance of the first segment, and β weights the trade-off between relevance of segment illustrations and coherence of transitions towards the overall quality of the story.
Given the underlying subjectivity of the task, the values of α and β that optimally represent the human perception of visual stories are, in fact, average values. Nevertheless, we posit the following two reasonable criteria: (i) illustrating with non-relevant elements (s_i = 0) completely breaks the story perception and should be penalised. Thus, we consider values of β > 0.5; and (ii) the first image/video perceived is assumed to be more important, as it should grab the attention towards consuming the rest of the story. Thus, α is a boost to the first story segment s_1. It was empirically found that α = 0.1 and β = 0.6 adequately represent human perception of visual story editing.
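To make the metric concrete, the following Python sketch implements equations (1) and (2) as written above, with the reported α = 0.1 and β = 0.6 as defaults. The function and variable names are ours, not the paper's code. Note that with relevance and transition scores kept on the raw 0-2 scale the result is not bounded by 1; the 0-1 range seen in the evaluation plots suggests some normalisation of the scores, which the paper does not spell out, so any rescaling should be treated as an assumption.

def pairwise_q(s_prev: int, s_i: int, t_prev: int, beta: float = 0.6) -> float:
    """Equation (2): perceived quality of two neighbouring segment illustrations.

    s_prev, s_i -- annotator relevance of segments i-1 and i (0, 1 or 2)
    t_prev      -- annotator transition score between the two segments (0, 1 or 2)
    """
    illustration_term = beta * (s_i + s_prev)
    transition_term = (1.0 - beta) * (s_prev * s_i + t_prev)
    return illustration_term + transition_term

def story_quality(s, t, alpha: float = 0.1, beta: float = 0.6) -> float:
    """Equation (1): overall Quality of a storyline with N segments.

    s -- relevance scores [s_1, ..., s_N]
    t -- transition scores [t_1, ..., t_{N-1}]
    """
    n = len(s)
    assert n >= 2 and len(t) == n - 1
    pair_sum = sum(pairwise_q(s[i - 1], s[i], t[i - 1], beta) for i in range(1, n))
    return alpha * s[0] + (1.0 - alpha) / (2.0 * (n - 1)) * pair_sum

# A fully relevant, fully coherent 4-segment story on the raw 0-2 scale
print(f"{story_quality([2, 2, 2, 2], [2, 2, 2]):.2f}")  # -> 2.36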
3 EVALUATION
3.1 Protocol and Ground-truth
Protocol. The goal of this experiment is to demonstrate the robustness of the proposed benchmark. Target storylines and segments were obtained using several methods, resulting in a total of 40 generated storylines (20 for each event), each comprising 3 to 4 segments. Ground truth for relevant segment illustrations, transitions and global story quality was obtained as described in the following section.

Ground-truth. Three annotators were presented with each story title and asked to rate (i) each segment illustration as relevant or non-relevant, (ii) the transitions between each of the segments, and finally, (iii) the overall story quality. Stories were visualised and assessed in a specifically designed prototype interface, which presents media in a sequential manner to create the right story mindset for the user. Using the subjective assessment of the annotators, the score proposed in Section 2.2 was calculated for each story.
3.2 Quality Metric vs Human Judgement
In order to test the stability of the metric proposed to emulate the human perception of visual story quality, we resorted to crowdsourcing. To do so, we computed the metric based on the relevance of segments and the transitions between segments, and related it to the overall story rating assigned by annotators. Figure 3 compares the annotator rating to the quality metric. These values show that linear increments in the ratings provided by the annotators were matched by the metric. As can be seen, the relation is strong and relatively stable, which is a good indicator of the metric's stability. Thus, these results show that the Quality metric effectively emulates the human perception of visual storyline quality.
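The paper reports this comparison visually (Figure 3). As a sketch of how such agreement could be quantified, the snippet below computes Pearson and Spearman correlations between per-story metric values and annotator ratings using SciPy; the numbers are invented placeholders, while the real inputs would be the 40 storyline scores and the corresponding overall ratings.

import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical per-story values (metric from Section 2.2 vs. overall annotator rating)
metric_scores = np.array([0.35, 0.52, 0.61, 0.48, 0.77, 0.29, 0.66, 0.58])
human_ratings = np.array([0.30, 0.55, 0.65, 0.45, 0.80, 0.25, 0.70, 0.60])

r, _ = pearsonr(metric_scores, human_ratings)     # linear agreement
rho, _ = spearmanr(metric_scores, human_ratings)  # rank (ordinal) agreement
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")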
3.3 Automatic Visual Storytelling
3.3.1 Results: Illustrations. Figure 4(a) presents the influence of illustrations in the story Quality metric introduced in Section 2.2.

A Text Retrieval baseline (BM25) was the best performing baseline for both events. This shows the importance of considering the text of social media documents when choosing the images to illustrate the segments. Approaches based on social signals (#Retweets and #Duplicates) also attained good results. In particular, when inspecting the storylines that result from using the #Duplicates baseline (Figure 4), an increase in the aesthetic quality of the images selected for illustration can be noticed. This shows that social signals are a powerful indicator of the quality of social media content. However, in scenarios where relevant content is scarce, the approach is hindered by noise. This is especially problematic in cases where there are no strong social signals associated with the available content.
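As a rough illustration of what a BM25 text-retrieval baseline for segment illustration can look like, the sketch below ranks candidate posts by BM25 similarity between their text and the segment description and returns the image attached to the top-ranked post. It is a simplified stand-in built on the third-party rank-bm25 package, not the authors' implementation, and the post data and field names are invented.

from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Toy candidate posts; in the benchmark these would be crawled documents carrying media.
posts = [
    {"text": "Peloton climbing the Col du Tourmalet at the Tour de France", "image": "img_001.jpg"},
    {"text": "Crowds waiting for the sprint finish on the Champs-Elysees", "image": "img_002.jpg"},
    {"text": "My lunch today", "image": "img_003.jpg"},
]

tokenised = [p["text"].lower().split() for p in posts]
bm25 = BM25Okapi(tokenised)

def illustrate_segment(segment_description: str) -> str:
    """Rank candidate posts against the segment text and return the best image."""
    query = segment_description.lower().split()
    scores = bm25.get_scores(query)
    best = max(range(len(posts)), key=lambda i: scores[i])
    return posts[best]["image"]

print(illustrate_segment("riders climb the Tourmalet"))  # -> img_001.jpg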
Figure 3: Correlation between human judgement and quality metric (scatter plot of the quality metric against human judgment, both on a 0 to 1 scale).

Figure 4: Analysis of quality metric in terms of illustration relevance and transition consistency. (a) Illustrations quality, comparing the BM25, #Retweets, #Duplicates, Concept Pool, Concept Query and Temporal model baselines. (b) Transitions quality, comparing the Colour histograms, CNN Dense, Colour moments, Luminance, Visual entropy and Visual concepts baselines.

A visual concept detector baseline, based on a pre-trained VGG-16 [15] CNN, did not perform as expected. A Concept Pool method
selects the image with the 10 most popular visual concepts. It failed often in correctly attributing concepts to the visual content of both datasets. As an example, in TDF, for segments featuring cyclists, unrelated concepts such as "unicycle", "bathing_cap" or "ballplayer" appeared very frequently. However, even though the extracted concepts lack precision, the concepts are extracted consistently among different pieces of content. This improves the performance of the method, since concepts are used to assess similarities in the content. The Concept Query is a pseudo-relevance feedback-based method that performed even worse. Hence, the performance of these baselines was lower than that of Text Retrieval. Nevertheless, they were able to effectively retrieve relevant content in situations where relevant content that could be found through text retrieval was scarce or non-existent.

Finally, we tested a temporal smoothing method [10], Temp. Modeling, to analyse the importance of temporal evidence. It performed the worst on the EdFest test set, while performing second best for the TDF test set. This high variation in performance is justified by the strong impact the type of story being illustrated has on the behaviour of this baseline. For story segments that take place over the course of the whole event, the approach is not well suited, as the probabilities attributed to each segment illustration candidate are virtually identical, and therefore the baseline does not differentiate well between visual elements. However, in story segments with large variations in the amount of content posted per day, the approach provides a noticeably good performance.
Overall, as shown by the results presented in Figure 4, generating storylines for Tour de France stories is an easier task than doing so for the Edinburgh Festival stories. As a result, 4 of the 6 baselines tested performed better in the task of illustrating Tour de France stories. This happens due to the heterogeneous nature of the media and stories associated with the Edinburgh Festival (where very different activities happen), which makes the task of retrieving relevant content more difficult. Additionally, the availability of fewer images and videos for the Edinburgh Festival accentuates this problem, as there is clearly less data to exploit when creating the storylines.
3.3.2 Results: Transitions. Figure 4(b) shows the performance of the proposed transition baselines on the task of illustrating the EdFest and TDF storylines, using the story quality metric introduced in Section 2.2, calculated based on the judgements of the annotators. All baselines use the same pool of manually selected relevant visual content for each segment.

The CNN Dense baseline minimises the distance between representations extracted from the penultimate layer of the visual concept detector. This baseline provided the best performance, highlighting the importance of taking semantics into account when optimising the quality of transitions. Semantics may not be enough to evaluate the quality of transitions, though. In fact, using single concepts, as is the case for the Visual Concepts baseline, provides very poor results, stressing the importance of taking multiple aspects into account.
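A minimal sketch of this style of transition optimisation is shown below: given, for each segment, a pool of relevant candidate illustrations with precomputed feature vectors (penultimate-layer CNN activations, colour histograms, or any other descriptor), it chains candidates so that consecutive illustrations are close in feature space. The greedy search and the naming are our own simplification; the paper does not describe how the CNN Dense baseline explores the candidate space.

import numpy as np

def chain_cost(chain):
    """Sum of feature distances between consecutive illustrations in a chain."""
    return sum(float(np.linalg.norm(b[1] - a[1])) for a, b in zip(chain, chain[1:]))

def greedy_chain(start, later_segments):
    """Extend a chain greedily, always picking the nearest candidate in feature space."""
    chain = [start]
    for segment in later_segments:
        prev_feat = chain[-1][1]
        chain.append(min(segment, key=lambda c: float(np.linalg.norm(c[1] - prev_feat))))
    return chain

def best_transition_chain(candidates):
    """candidates: one list per story segment of (item_id, feature_vector) pairs,
    where feature_vector is an np.ndarray (e.g. CNN activations or a colour histogram).
    Returns the item ids of the smoothest greedy chain found."""
    chains = [greedy_chain(start, candidates[1:]) for start in candidates[0]]
    best = min(chains, key=chain_cost)
    return [item_id for item_id, _ in best]

# Toy example: 3 segments, 2 candidate illustrations each, random 4-D "features"
rng = np.random.default_rng(0)
candidates = [
    [(f"seg{s}_img{k}", rng.normal(size=4)) for k in range(2)] for s in range(3)
]
print(best_transition_chain(candidates))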
The second and third best performing baselines focus on minimising the colour difference between images in a storyline: Colour Histograms and Colour Moments. This supports the assumption that illustrating storylines using content with similar colour palettes is a solid way to optimise the quality of visual storylines. Conversely, illustrating storylines by selecting images with similar degrees of entropy and luminance, using the Visual Entropy and Luminance baselines, provided worse results. Additionally, and similarly to what was observed while assessing the segment illustration baselines, Figure 4 shows that creating storylines with good transitions is easier for the TDF dataset than for the EdFest dataset.
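For the colour-based baselines, a transition cost can be approximated by comparing normalised colour histograms of consecutive illustrations, as in the sketch below. This is an illustrative stand-in (the bin count and distance are our choices), not the exact Colour Histograms baseline from the paper.

import numpy as np

def colour_histogram(image, bins=8):
    """image: H x W x 3 uint8 array. Returns a normalised joint RGB histogram."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins, bins, bins), range=((0, 256),) * 3
    )
    return hist.ravel() / hist.sum()

def transition_distance(img_a, img_b):
    """Smaller values mean more similar colour palettes, i.e. a smoother transition."""
    return float(np.linalg.norm(colour_histogram(img_a) - colour_histogram(img_b)))

# Toy images: a dark frame and a bright frame
dark = np.zeros((32, 32, 3), dtype=np.uint8)
bright = np.full((32, 32, 3), 240, dtype=np.uint8)
print(transition_distance(dark, bright) > transition_distance(dark, dark))  # -> True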
4 CONCLUSIONS
This paper addressed the problem of automatic visual story editing using social media data, a task that ran in TRECVID 2018. Media professionals are asked to cover large events and are required to manually process large amounts of social media data to create event plots and select appropriate pieces of content for each segment. We tackle the novel task of automating this process, enabling media professionals to take full advantage of social media content. The main contribution of this paper is a benchmark to assess the overall quality of a visual story based on the relevance of individual illustrations and the transitions between consecutive segment illustrations. It was shown that the proposed experimental test-bed proved to be effective in the assessment of story editing and composition with social media material.

Acknowledgements. This work has been partially funded by the GoLocal CMU-Portugal project Ref. CMUP-ERI/TIC/0046/2014, by the COGNITUS H2020 ICT project No 687605 and by the project NOVA LINCS Ref. UID/CEC/04516/2013.

REFERENCES
[1] George Awad, Asad Butt, Keith Curtis, Yooyoung Lee, Jonathan Fiscus, Afzal Godil, David Joy, Andrew Delgado, Alan F. Smeaton, Yvette Graham, Wessel Kraaij, Georges Quénot, Joao Magalhaes, David Semedo, and Saverio Blasi. 2018. TRECVID 2018: Benchmarking Video Activity Detection, Video Captioning and Matching, Video Storytelling Linking and Video Search. In Proceedings of TRECVID 2018. NIST, USA.
[2] Deepayan Chakrabarti and Kunal Punera. 2011. Event Summarization Using Tweets. In International AAAI Conference on Web and Social Media.
[3] Freddy Chong Tat Chua and Sitaram Asur. 2013. Automatic Summarization of Events from Social Media. In ICWSM, Emre Kiciman, Nicole B. Ellison, Bernie Hogan, Paul Resnick, and Ian Soboroff (Eds.). The AAAI Press.
[4] Diogo Delgado, Joao Magalhaes, and Nuno Correia. 2010. Automated Illustration of News Stories. In 2010 IEEE Fourth International Conference on Semantic Computing. IEEE, 73-78.
[5] Erika Doggett and Alejandro Cantarero. 2016. Identifying Eyewitness Newsworthy Events on Twitter. In SocialNLP@EMNLP.
[6] Mengdie Hu, Shixia Liu, Furu Wei, Yingcai Wu, John Stasko, and Kwan-Liu Ma. 2012. Breaking News on Twitter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12).
[7] Ting-Hao K. Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Aishwarya Agrawal, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, et al. 2016. Visual Storytelling. In 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2016).
[8] Gunhee Kim, Seungwhan Moon, and Leonid Sigal. 2015. Ranking and Retrieval of Image Sequences from Multiple Paragraph Queries. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015. IEEE Computer Society, 1993-2001.
[9] David Laniado and Peter Mika. 2010. Making Sense of Twitter. In Proceedings of the 9th International Semantic Web Conference on The Semantic Web - Volume Part I (ISWC '10). Springer-Verlag, Berlin, Heidelberg, 470-485.
[10] Flávio Martins, João Magalhães, and Jamie Callan. 2016. Barbara Made the News: Mining the Behavior of Crowds for Time-Aware Learning to Rank. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining (WSDM '16). ACM, New York, NY, USA, 667-676. https://doi.org/10.1145/2835776.2835825
[11] Philip J. McParlane, Andrew James McMinn, and Joemon M. Jose. 2014. "Picture the Scene...": Visually Summarising Social Media Events. In ACM CIKM.
[12] Steve Paulussen and Pieter Ugille. 2008. User Generated Content in the Newsroom: Professional and Organisational Constraints on Participatory Journalism. Westminster Papers in Communication & Culture 5, 2 (2008).
[13] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. 2010. Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. In Proceedings of the 19th International Conference on World Wide Web (WWW '10).
[14] Manos Schinas, Symeon Papadopoulos, Yiannis Kompatsiaris, and Pericles A. Mitkas. 2015. Visual Event Summarization on Social Media Using Topic Modelling and Graph-based Ranking Algorithms. In Proceedings of the 5th ACM International Conference on Multimedia Retrieval (ICMR '15).
[15] Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556 (2014).
[16] Peter Tolmie, Rob Procter, Dave W. Randall, Mark Rouncefield, Christian Burger, Geraldine Wong Sak Hoi, Arkaitz Zubiaga, and Maria Liakata. 2017. Supporting the Use of User Generated Content in Journalistic Practice. In CHI.
Citations
Proceedings ArticleDOI
21 Oct 2019
TL;DR: Looking forward, this penetration of AI opens new challenges, such as interpretability of deep learning (to enable use AI in an accountable way as well as to enable AI-inspired low-complexity algorithms) and applicability in systems which require low- complexity solutions and/or do not have enough training data.
Abstract: Numerous breakthroughs in multimedia signal processing are being enabled thanks to applications of machine learning in tasks such as multimedia creation, enhancement, classification and compression [1]. Notably, in the context of production and distribution of television programmes, it has been successfully demonstrated how Artificial Intelligence (AI) can support innovation in the creative sector. In the context of delivering TV programmes of stunning visual quality, the applications of deep learning have enabled significant advances when the original content is of poor quality / resolution, or when delivery channels are very limited. Examples when the enhancement of originally poor quality is needed include new content forms (e.g. user generated content) and historical content (e.g. archives), while limitations of delivery channels can, first of all, be addressed by improving content compression. As a state-of-the-art example, the benefits of deep-learning solutions have been recently demonstrated within an end-to-end platform for management of user generated content [2], where deep learning is applied to increase video resolution, evaluate video quality and enrich the video by providing automatic metadata. Within this particular application space where large amount of user generated content is available, the progress has also been made in addressing visual story editing using social media data in automatic ways, making programmes from large amount of content faster [3]. Broadcasters are also interested in restauration of historical content more cheaply. For example, adding colour to "black and white" content has until now been an expensive and time-consuming task. However, recently new algorithms have been developed to perform the task more efficiently. Generative Adversarial Networks (GANs) have become the baseline for many image-to-image translation tasks, including image colourisation. Aiming at the generation of more naturally coloured images from "black and white" sources, newest algorithms are capable of generalisation of the colour of natural images, producing realistic and plausible results [4]. In the context of content delivery, new generations of compression standards enable significant reduction of required bandwidth [5], however, with a cost of increased computational complexity. This is another area where AI can be utilised for better efficiency - either in its simple forms as decision trees [6,7] or more advanced deep convolutional neural networks [8]. Looking forward, this penetration of AI opens new challenges, such as interpretability of deep learning (to enable use AI in an accountable way as well as to enable AI-inspired low-complexity algorithms) and applicability in systems which require low-complexity solutions and/or do not have enough training data. However, overall further benefits of these new approaches include automatization of many traditional production tasks which has the potential to transform the way content providers make their programmes in cheaper and more effective ways.

2 citations


Cites background from "A Benchmark of Visual Storytelling ..."

  • ...Within this particular application space where large amount of user generated content is available, the progress has also been made in addressing visual story editing using social media data in automatic ways, making programmes from large amount of content faster [3]....


Proceedings ArticleDOI
10 Oct 2022
TL;DR: A context-enriched Multimodal Transformer model is proposed, NewsLXMERT, capable of jointly attending to complementary multimodal news data perspectives, to create knowledge-rich and diverse multi-modal sequences.
Abstract: The connection between news and the images that illustrate them goes beyond visual concept to natural language matching. Instead, the open-domain and event-reporting nature of news leads to semantically complex texts, in which images are used as a contextualizing element. This connection is often governed by a certain level of indirection, with journalistic criteria also playing an important role. In this paper, we address the complex challenge of connecting images to news text. A context-enriched Multimodal Transformer model is proposed, NewsLXMERT, capable of jointly attending to complementary multimodal news data perspectives. The idea is to create knowledge-rich and diverse multimodal sequences, going beyond the news headline (often lacking the necessary context) and visual objects, to effectively ground images to news pieces. A comprehensive evaluation of challenging image-news piece matching settings is conducted, where we show the effectiveness of NewsLXMERT, the importance of leveraging the additional context and demonstrate the usefulness of the obtained pre-trained news representations for transfer-learning. Finally, to shed light on the heterogeneous nature of the problem, we contribute with a systematic model-driven study that identifies image-news matching profiles, thus explaining news piece-image matches.
References
Proceedings ArticleDOI
03 Nov 2014
TL;DR: This paper investigates how images can be used as a source for summarising events in social media and proposes new techniques for their automatic selection, ranking and presentation.
Abstract: Due to the advent of social media and web 2.0, we are faced with a deluge of information; recently, research efforts have focused on filtering out noisy, irrelevant information items from social media streams and in particular have attempted to automatically identify and summarise events. However, due to the heterogeneous nature of such social media streams, these efforts have not reached fruition. In this paper, we investigate how images can be used as a source for summarising events. Existing approaches have considered only textual summaries which are often poorly written, in a different language and slow to digest. Alternatively, images are "worth 1,000 words" and are able to quickly and easily convey an idea or scene. Since images in social media can also be noisy, irrelevant and repetitive, we propose new techniques for their automatic selection, ranking and presentation. We evaluate our approach on a recently created social media event data set containing 365k tweets and 50 events, for which we extend by collecting 625k related images. By conducting two crowdsourced evaluations, we firstly show how our approach overcomes the problems of automatically collecting relevant and diverse images from noisy microblog data, before highlighting the advantages of multimedia summarisation over text based approaches.

25 citations


"A Benchmark of Visual Storytelling ..." refers background in this paper

  • ...These timelines can be listened to, mined [2, 3, 6], and exploited to gather visual content, specifically image and video [11, 14]....


Proceedings ArticleDOI
01 Nov 2016
TL;DR: A filter for identifying posts from eyewitnesses to various event types on Twitter, including shootings, police activity, and protests is presented, which combines sociolinguistic markers and targeted language content with straightforward keywords and regular expressions to yield good accuracy in the returned tweets.
Abstract: In this paper we present a filter for identifying posts from eyewitnesses to various event types on Twitter, including shootings, police activity, and protests. The filter combines sociolinguistic markers and targeted language content with straightforward keywords and regular expressions to yield good accuracy in the returned tweets. Once a set of eyewitness posts in a given semantic context has been produced by the filter, eyewitness events can subsequently be identified by enriching the data with additional geolocation information and then applying a spatio-temporal clustering. By applying these steps we can extract a complete picture of the event as it occurs in real-time, sourced entirely from social media.

24 citations

Proceedings ArticleDOI
08 Feb 2016
TL;DR: This paper proposes to leverage on the behavioral dynamics of users to estimate the most relevant time periods for a topic, using a novel time-aware ranking model that leverages on multiple sources of crowd signals.
Abstract: In Twitter, and other microblogging services, the generation of new content by the crowd is often biased towards immediacy: what is happening now. Prompted by the propagation of commentary and information through multiple mediums, users on the Web interact with and produce new posts about newsworthy topics and give rise to trending topics. This paper proposes to leverage on the behavioral dynamics of users to estimate the most relevant time periods for a topic. Our hypothesis stems from the fact that when a real-world event occurs it usually has peak times on the Web: a higher volume of tweets, new visits and edits to related Wikipedia articles, and news published about the event. In this paper, we propose a novel time-aware ranking model that leverages on multiple sources of crowd signals. Our approach builds on two major novelties. First, a unifying approach that given query q, mines and represents temporal evidence from multiple sources of crowd signals. This allows us to predict the temporal relevance of documents for query q. Second, a principled retrieval model that integrates temporal signals in a learning to rank framework, to rank results according to the predicted temporal relevance. Evaluation on the TREC 2013 and 2014 Microblog track datasets demonstrates that the proposed model achieves a relative improvement of 13.2% over lexical retrieval models and 6.2% over a learning to rank baseline.

14 citations


"A Benchmark of Visual Storytelling ..." refers methods in this paper

  • ...Finally, we tested a temporal smoothing method [10], Temp....

