
Overview of ImageCLEF 2018:
Challenges, Datasets and Evaluation

Bogdan Ionescu¹(✉), Henning Müller², Mauricio Villegas³, Alba García Seco de Herrera⁴, Carsten Eickhoff⁵, Vincent Andrearczyk², Yashin Dicente Cid², Vitali Liauchuk⁶, Vassili Kovalev⁶, Sadid A. Hasan⁷, Yuan Ling⁷, Oladimeji Farri⁷, Joey Liu⁷, Matthew Lungren⁸, Duc-Tien Dang-Nguyen⁹, Luca Piras¹⁰, Michael Riegler¹¹,¹², Liting Zhou⁹, Mathias Lux¹³, and Cathal Gurrin⁹

¹ University Politehnica of Bucharest, Bucharest, Romania
  bionescu@alpha.imag.pub.ro
² University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland
³ omni:us, Berlin, Germany
⁴ University of Essex, Colchester, UK
⁵ Brown University, Providence, RI, USA
⁶ United Institute of Informatics Problems, Minsk, Belarus
⁷ Artificial Intelligence Lab, Philips Research North America, Cambridge, MA, USA
⁸ Department of Radiology, Stanford University, Stanford, CA, USA
⁹ Dublin City University, Dublin, Ireland
¹⁰ University of Cagliari and Pluribus One, Cagliari, Italy
¹¹ University of Oslo, Oslo, Norway
¹² Simula Metropolitan Center for Digital Engineering, Oslo, Norway
¹³ Klagenfurt University, Klagenfurt, Austria
Abstract. This paper presents an overview of the ImageCLEF 2018
evaluation campaign, an event that was organized as part of the CLEF
(Conference and Labs of the Evaluation Forum) Labs 2018. ImageCLEF
is an ongoing initiative (it started in 2003) that promotes the evalua-
tion of technologies for annotation, indexing and retrieval with the aim
of providing information access to collections of images in various usage
scenarios and domains. In 2018, the 16th edition of ImageCLEF ran three
main tasks and a pilot task: (1) a caption prediction task that aims at
predicting the caption of a figure from the biomedical literature based
only on the figure image; (2) a tuberculosis task that aims at detecting
the tuberculosis type, severity and drug resistance from CT (Computed
Tomography) volumes of the lung; (3) a LifeLog task (videos, images
and other sources) about daily activities understanding and moment
retrieval, and (4) a pilot task on visual question answering where systems
are tasked with answering medical questions. The strong participation,
with over 100 research groups registering and 31 submitting results for
the tasks, shows an increasing interest in this benchmarking campaign.
© Springer Nature Switzerland AG 2018
P. Bellot et al. (Eds.): CLEF 2018, LNCS 11018, pp. 309–334, 2018.
https://doi.org/10.1007/978-3-319-98932-7_28

1 Introduction
One or two decades ago, getting access to large visual data sets for research was difficult, and open data collections on which researchers could compare their algorithms were rare. Nowadays it is easier to access data collections, but it remains hard to obtain annotated data with a clear evaluation scenario and strong baselines to compare against. Motivated by this, ImageCLEF has for 16 years been an initiative that aims at evaluating multilingual or language-independent annotation and retrieval of images [5,21,23,25,39]. The main goal of ImageCLEF is to support the advancement of the field of visual media analysis, classification, annotation, indexing and retrieval. It proposes novel challenges and develops the necessary infrastructure for evaluating visual systems operating in different contexts, providing reusable resources for benchmarking. It is also linked to initiatives such as Evaluation-as-a-Service (EaaS) [17,18].
Many research groups have participated in these evaluation campaigns over the years, and even more have acquired the datasets for experimentation. The scholarly impact of ImageCLEF is also significant, as indicated by the substantial number of its publications and the citations they have received [36].
There are other evaluation initiatives that have had a close relation with ImageCLEF. LifeCLEF [22] was formerly an ImageCLEF task; however, because assessing technologies for the automated identification and understanding of living organisms requires data that is not restricted to images but also includes videos and sound, it is now organised independently of ImageCLEF. Other CLEF labs linked to ImageCLEF, in particular to its medical tasks, are: CLEFeHealth [14], which deals with processing methods and resources to enrich difficult-to-understand eHealth text, and the BioASQ [4] tasks, which target biomedical semantic indexing and question answering and were formerly run as part of the Question Answering lab. Due to their medical orientation, their organisation is coordinated in close collaboration with the medical tasks in ImageCLEF. In 2017, ImageCLEF explored synergies with the MediaEval Benchmarking Initiative for Multimedia Evaluation [15], which focuses on exploring the "multi" in multimedia: speech, audio, visual content, tags, users, context. MediaEval was founded in 2008 as VideoCLEF, a track in the CLEF Campaign.
This paper presents a general overview of the ImageCLEF 2018 evaluation campaign (http://imageclef.org/2018/), which as usual was an event organised as part of the CLEF labs (http://clef2018.clef-initiative.eu/).
The remainder of the paper is organized as follows. Section 2 presents a general description of the 2018 edition of ImageCLEF, commenting on the overall organisation of and participation in the lab. It is followed by sections dedicated to the four tasks that were organised this year: Sect. 3 for the Caption Task,
Sect. 4 for the Tuberculosis Task, Sect. 5 for the Visual Question Answering
Task, and Sect. 6 for the Lifelog Task. For the full details and complete results
on the participating teams, the reader should refer to the corresponding task

overview papers [7,11,19,20]. The final section concludes the paper by giving an
overall discussion, and pointing towards the challenges ahead and possible new
directions for future research.
2 Overview of Tasks and Participation
ImageCLEF 2018 consisted of three main tasks and a pilot task that covered challenges in diverse fields and usage scenarios. In 2017 [21], the proposed challenges were almost all new in comparison to 2016 [40], the only exception being caption prediction, a subtask already attempted in 2016 for which no participant had submitted results. After such a big change, the objective for 2018 was to continue most of the tasks from 2017. The only change was that the 2017 Remote Sensing pilot task was replaced by a novel one on Visual Question Answering. The 2018 tasks are the following:
ImageCLEFcaption: Interpreting and summarizing the insights gained
from medical images such as radiology output is a time-consuming task that
involves highly trained experts and often represents a bottleneck in clinical
diagnosis pipelines. Consequently, there is a considerable need for automatic
methods that can approximate this mapping from visual information to con-
densed textual descriptions. The task addresses the problem of bio-medical
image concept detection and caption prediction from large amounts of train-
ing data.
ImageCLEFtuberculosis: The main objective of the task is to provide
a tuberculosis severity score based on the automatic analysis of lung CT
images of patients. Being able to extract this information from the image data alone would make it possible to limit lung washing and laboratory analyses for determining the tuberculosis type and drug resistances. This can lead to quicker decisions
on the best treatment strategy, reduced use of antibiotics and lower impact
on the patient.
ImageCLEFlifelog: An increasingly wide range of personal devices is becoming available, including smartphones, video cameras, and wearable devices that can capture pictures, videos, and audio clips of every moment of life. Considering the huge volume of data created, there is a need for systems that can automatically analyse the data in order to categorize and summarize it, and to retrieve the information that the user may require. Hence, this task addresses the problems of lifelog data understanding, summarization and retrieval.
ImageCLEF-VQA-Med (pilot task): Visual Question Answering is a new
and exciting problem that combines natural language processing and com-
puter vision techniques. With the ongoing drive for improved patient engage-
ment and access to the electronic medical records via patient portals, patients
can now review structured and unstructured data from labs and images to
text reports associated with their healthcare utilization. Such access can help
them better understand their conditions in line with the details received from
their healthcare provider. Given a medical image accompanied by a set of clinically relevant questions, participating systems are tasked with answering the questions based on the visual image content.
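To make the input/output contract of this pilot task concrete, below is a minimal late-fusion baseline sketch in PyTorch. It is purely illustrative and not any participant's system: the layer sizes, the fixed answer vocabulary, and the token-id question encoding are all assumptions made for the sketch.

```python
# Illustrative VQA baseline sketch (NOT any participant's system): encode
# the image with a small CNN, the question with a GRU, fuse by elementwise
# product, and classify over a fixed set of candidate answers.
import torch
import torch.nn as nn

class TinyVQABaseline(nn.Module):
    def __init__(self, vocab_size=5000, answer_count=500, embed_dim=128):
        super().__init__()
        # Image branch: a toy CNN standing in for a pretrained encoder.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Question branch: word embeddings followed by a GRU.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, embed_dim, batch_first=True)
        # Fusion and classification over a fixed answer set.
        self.classifier = nn.Linear(embed_dim, answer_count)

    def forward(self, image, question_ids):
        img_feat = self.cnn(image)                    # (batch, embed_dim)
        _, q_hidden = self.gru(self.embed(question_ids))
        q_feat = q_hidden.squeeze(0)                  # (batch, embed_dim)
        return self.classifier(img_feat * q_feat)     # answer logits

# Smoke test with one random image and an 8-token question.
model = TinyVQABaseline()
logits = model(torch.randn(1, 3, 224, 224), torch.randint(0, 5000, (1, 8)))
print(logits.shape)  # torch.Size([1, 500])
```

Real systems typically replace the toy CNN with a pretrained image encoder and the GRU with a stronger language model, but the encode-fuse-classify shape of the pipeline stays the same.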
In order to participate in the evaluation campaign, the research groups first
had to register by following the instructions on the ImageCLEF 2018 web page.
To ease the overall management of the campaign, this year the challenge was organized through the crowdAI platform (https://www.crowdai.org/). To get access to the datasets, the
participants were required to submit a signed End User Agreement (EUA) form.
Table 1 summarizes the participation in ImageCLEF 2018, including the number
of registrations (counting only the ones that downloaded the EUA) and the
number of signed EUAs, indicated both per task and for the overall Lab. The
table also shows the number of groups that submitted results (runs) and the
ones that submitted a working notes paper describing the techniques used.
The number of registrations could be interpreted as the initial interest of the community in the evaluation. However, it is a bit misleading, because several persons from the same institution might register even though in the end they count as a single participating group. The EUA explicitly requires all groups that get access to the data to participate, even though this is not enforced. Unfortunately, the percentage of groups that submit results is often limited. Nevertheless, as observed in studies of scholarly impact [36,37], the datasets and challenges provided by ImageCLEF often get used in subsequent years, in part by researchers who for some reason (e.g. a lack of time or other priorities) were unable to participate in the original event or did not complete the tasks by the deadlines.
After a decrease in participation in 2016, participation increased again in 2017 and grew further in 2018. The number of signed EUAs is considerably higher, mostly due to the fact that this time each task had an independent EUA. Also, due to the change to crowdAI, online registration became easier and attracted research groups beyond the usual audience, which made the registration-to-participation ratio lower than in previous years. On the other hand, crowdAI is a much more modern platform that offers new possibilities, for example continuously running the challenge even beyond the workshop dates. Nevertheless, in the end, 31 groups participated and 28 working notes papers were submitted, which is a slight increase with respect to 2017. The following four sections are dedicated to each of the tasks. Only a short overview is reported, including general objectives, descriptions of the tasks and datasets, and a short summary of the results.
3 The Caption Task
This task studies algorithmic approaches to medical image understanding. As
a testbed for doing so, teams were tasked with automatically “guessing” fitting
keywords or free-text captions that best describe an image from a collection of
images published in the biomedical literature.

Table 1. Key figures of participation in ImageCLEF 2018.

Task         | Registered & downloaded EUA | Signed EUA | Groups that subm. results | Submitted working notes
Caption      | 84                          | 46         | 8                         | 6
Tuberculosis | 85                          | 33         | 11                        | 11
VQA-Med      | 58                          | 28         | 5                         | 5
Lifelog      | 38                          | 25         | 7                         | 7
Overall      | 265*                        | 132*       | 31                        | 29

* Total for all tasks, not unique groups/emails.
3.1 Task Setup
Following the structure of the 2017 edition, two subtasks were proposed. The first task, concept detection, aims to extract the main biomedical concepts represented in an image based only on its visual content. These concepts are UMLS (Unified Medical Language System®) Concept Unique Identifiers (CUIs). The second task, caption prediction, aims to compose coherent free-text captions describing the image based only on the visual information. Participants were, of course, allowed to use the UMLS CUIs extracted in the first task to compose captions from individual concepts. Figure 1 shows an example of the information available in the training set. An image is accompanied by a set of UMLS CUIs and a free-text caption. Compared to 2017, the dataset was strongly modified to respond to some of the difficulties observed with the task in the past [13].
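The full evaluation protocol is described in the corresponding task overview paper; broadly, concept detection lends itself to a set-based F1 over the predicted CUIs of each image, while caption prediction is commonly scored with a string-similarity measure such as BLEU (the BLEU paper appears among this chapter's references). The following is a minimal sketch of both kinds of measure, assuming Python with NLTK installed; the CUIs and captions are invented for illustration.

```python
# Sketch of the two measure families used for this task pair: set-based F1
# over predicted UMLS CUIs, and BLEU over free-text captions. Official
# scoring details are in the task overview paper; this is illustrative only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def cui_f1(predicted, gold):
    """Set-based F1 between predicted and gold CUI sets for one image."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical CUIs: one of the two predictions matches the gold set.
print(cui_f1(["C0040405", "C0817096"], ["C0040405", "C0225754"]))  # 0.5

# BLEU between a predicted and a reference caption; smoothing keeps short
# captions from scoring zero when higher-order n-grams are absent.
reference = "axial ct scan of the chest showing a cavitary lesion".split()
hypothesis = "axial ct of the chest with a cavitary lesion".split()
print(sentence_bleu([reference], hypothesis,
                    smoothing_function=SmoothingFunction().method1))
```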
3.2 Dataset
The dataset used in this task is derived from figures and their corresponding captions extracted from biomedical articles on PubMed Central® (PMC, https://www.ncbi.nlm.nih.gov/pmc/). This dataset was strongly changed compared to the 2017 edition of the task in order to reduce the diversity of the data and limit the number of compound figures. A subset of clinical figures was automatically obtained from the overall set of 5.8 million PMC figures using a deep multimodal fusion of Convolutional Neural Networks (CNNs), described in [2]. In total, the dataset comprises 232,305 image–caption pairs split into disjoint training (222,305 pairs) and test (10,000 pairs) sets. For the concept detection subtask, concepts present in the caption text were extracted using the QuickUMLS library [30]. After having observed a large breadth of concepts and image types in the 2017 edition of the task, this year's continuation focused on radiology artifacts, introducing a greater topical focus to the collection.
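As an illustration of the concept-extraction step just described, the sketch below runs QuickUMLS [30] over a single caption. It assumes a locally built QuickUMLS index derived from a UMLS installation; the data path and the caption are placeholders, and the parameters shown are the library's documented defaults.

```python
# Sketch of CUI extraction from a caption with QuickUMLS [30]. Requires a
# locally built QuickUMLS index (UMLS license needed); the path below is a
# placeholder. threshold/similarity_name are the library defaults.
from quickumls import QuickUMLS

matcher = QuickUMLS("/path/to/quickumls-index",
                    threshold=0.7, similarity_name="jaccard")

caption = "Axial CT scan of the chest showing a cavitary lesion."
for candidates in matcher.match(caption, best_match=True):
    # Each group holds overlapping candidate matches; keep the best one.
    best = max(candidates, key=lambda c: c["similarity"])
    print(best["cui"], best["ngram"], round(best["similarity"], 2))
```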
