Showing papers presented at "Cross-Language Evaluation Forum in 2014"


Book ChapterDOI
15 Sep 2014
TL;DR: This paper reports on the PAN 2014 evaluation lab, which hosts three shared tasks on plagiarism detection, author identification, and author profiling; together with last year's submissions, this year's software submissions form the largest collection for these tasks to date.
Abstract: This paper reports on the PAN 2014 evaluation lab which hosts three shared tasks on plagiarism detection, author identification, and author profiling. To improve the reproducibility of shared tasks in general, and PAN’s tasks in particular, the Webis group developed a new web service called TIRA, which facilitates software submissions. Unlike many other labs, PAN asks participants to submit running software instead of run output. To deal with the organizational overhead involved in handling software submissions, the TIRA experimentation platform helps to significantly reduce the workload for both participants and organizers, while the submitted software is kept in a running state. This year, we addressed the question of who is responsible for the successful execution of submitted software, in order to put participants back in charge of executing their software at our site. In sum, 57 pieces of software were submitted to our lab; together with the 58 software submissions of last year, this forms the largest collection of software for our three tasks to date, all of which is readily available for further analysis. The report concludes with a brief summary of each task.

171 citations


Book ChapterDOI
15 Sep 2014
TL;DR: The results demonstrate the substantial community interest and capabilities of these systems in making clinical reports easier to understand for patients.
Abstract: This paper reports on the 2nd ShARe/CLEFeHealth evaluation lab, which continues our evaluation resource building activities for the medical domain. In this lab we focus on patients’ information needs, as opposed to the more common campaign focus on the specialised information needs of physicians and other healthcare workers. The usage scenario of the lab is to help patients and their next-of-kin understand eHealth information, in particular clinical reports. The 1st ShARe/CLEFeHealth evaluation lab was held in 2013 and consisted of three tasks. Task 1 focused on named entity recognition and normalization of disorders; Task 2 on normalization of acronyms/abbreviations; and Task 3 on information retrieval to address questions patients may have when reading clinical reports. This year’s lab introduces a new challenge in Task 1 on visual-interactive search and exploration of eHealth data. Its aim is to help patients (or their next-of-kin) with readability issues related to their hospital discharge documents and with related information search on the Internet. Task 2 continues the information extraction work of the 2013 lab, specifically focusing on disorder attribute identification and normalization from clinical text. Finally, this year’s Task 3 further extends the 2013 information retrieval task by cleaning the 2013 document collection and introducing a new query generation method and multilingual queries. The de-identified clinical reports used by the three tasks were from US intensive care and originated from the MIMIC II database. Other text documents for Tasks 1 and 3 were from the Internet and originated from the Khresmoi project. Task 2 annotations originated from the ShARe annotations. For Tasks 1 and 3, new annotations, queries, and relevance assessments were created. 50, 79, and 91 people registered their interest in Tasks 1, 2, and 3, respectively. 24 unique teams participated, with 1, 10, and 14 teams in Tasks 1, 2 and 3, respectively. The teams were from Africa, Asia, Canada, Europe, and North America. The Task 1 submission, reviewed by 5 expert peers, related to the task evaluation category of Effective use of interaction and targeted the needs of both expert and novice users. The best system had an Accuracy of 0.868 in Task 2a, an F1-score of 0.576 in Task 2b, and a Precision at 10 (P@10) of 0.756 in Task 3. The results demonstrate the substantial community interest in, and capabilities of, these systems in making clinical reports easier to understand for patients. The organisers have made the data and tools available for future research and development.

105 citations


Proceedings Article
15 Sep 2014
TL;DR: This overview presents the resources and assessment of the task in detail, summarizes the retrieval approaches employed by the participating groups, and provides an analysis of the main evaluation results.
Abstract: The LifeCLEF plant identification task provides a testbed for a system-oriented evaluation of plant identification covering about 500 species of trees and herbaceous plants. Seven types of image content are considered: scan and scan-like pictures of leaves, and six kinds of detailed views photographed directly on the plant under unconstrained conditions: flower, fruit, stem & bark, branch, leaf and entire view. The main originality of this data is that it was specifically built through a citizen science initiative conducted by Tela Botanica, a French social network of amateur and expert botanists, which makes the task closer to the conditions of a real-world application. This overview presents the resources and assessment of the task in detail, summarizes the retrieval approaches employed by the participating groups, and provides an analysis of the main evaluation results. With a total of ten groups from six countries and twenty-seven submitted runs involving distinct and original methods, this fourth year of the task confirms the interest of the Image & Multimedia Retrieval community in biodiversity and botany, and highlights further challenging studies in plant identification.

89 citations


Book ChapterDOI
15 Sep 2014
TL;DR: The tasks and the 2014 competition are described, giving a unifying perspective of the present activities of the ImageCLEF lab while discussing future challenges and opportunities.
Abstract: This paper presents an overview of the ImageCLEF 2014 evaluation lab. Since its first edition in 2003, ImageCLEF has become one of the key initiatives promoting the benchmark evaluation of algorithms for the annotation and retrieval of images in various domains, ranging from public and personal images to data acquired by mobile robot platforms and medical archives. Over the years, by providing new data collections and challenging tasks to the community of interest, the ImageCLEF lab has achieved a unique position in the image annotation and retrieval research landscape. The 2014 edition consists of four tasks: domain adaptation, scalable concept image annotation, liver CT image annotation and robot vision. This paper describes the tasks and the 2014 competition, giving a unifying perspective of the present activities of the lab while discussing future challenges and opportunities.

84 citations


Book ChapterDOI
15 Sep 2014
TL;DR: This paper presents the 2014 edition of LifeCLEF, its pilot edition, which evaluates three tasks related to multimedia information retrieval and fine-grained classification problems in three living worlds, based on large and real-world data.
Abstract: Using multimedia identification tools is considered one of the most promising solutions to help bridge the taxonomic gap and build accurate knowledge of the identity, the geographic distribution and the evolution of living species. Large and structured communities of nature observers (e.g. eBird, Xeno-canto, Tela Botanica, etc.) as well as large-scale monitoring equipment have started to produce outstanding collections of multimedia records. Unfortunately, the performance of state-of-the-art analysis techniques on such data is still not well understood and is far from meeting real-world requirements. The LifeCLEF lab proposes to evaluate these challenges around three tasks related to multimedia information retrieval and fine-grained classification problems in three living worlds. Each task is based on large and real-world data, and the measured challenges are defined in collaboration with biologists and environmental stakeholders in order to reflect realistic usage scenarios. This paper presents the 2014 edition of LifeCLEF, its pilot edition. For each of the three tasks, we report the methodology and the datasets as well as the official results and the main outcomes.

81 citations


Book ChapterDOI
15 Sep 2014
TL;DR: The organisation and results of RepLab 2014 are described; this edition focused on two new tasks, reputation dimensions classification and author profiling, which complement the aspects of reputation analysis studied in the previous campaigns.
Abstract: This paper describes the organisation and results of RepLab 2014, the third competitive evaluation campaign for Online Reputation Management systems. This year the focus lay on two new tasks: reputation dimensions classification and author profiling, which complement the aspects of reputation analysis studied in the previous campaigns. The participants were asked (1) to classify tweets applying a standard typology of reputation dimensions and (2) to categorise Twitter profiles by type of author as well as rank them according to their influence. New data collections were provided for the development and evaluation of the systems that participated in this benchmarking activity.

77 citations


Proceedings Article
01 Jan 2014
TL;DR: This paper describes the submission of the University of Washington's Center for Data Science to the PAN 2014 author profiling task, and reports accuracies obtained by two approaches to the multi-label classification problem of predicting both age and gender.
Abstract: This paper describes the submission of the University of Washington's Center for Data Science to the PAN 2014 author profiling task. We examine the predictive quality, in terms of age and gender, of several sets of features extracted from various genres of online social media. Through comparison, we establish a feature set which maximizes the accuracy of gender and age prediction across all genres examined. We report accuracies obtained by two approaches to the multi-label classification problem of predicting both age and gender: a model wherein the multi-label problem is reduced to a single-label problem using a powerset transformation, and a chained classifier approach wherein the output of a dedicated classifier for gender is used as input for a classifier for age.
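
For readers unfamiliar with the two strategies named above, the following minimal Python sketch contrasts a label-powerset reduction with a gender-then-age classifier chain using scikit-learn. The toy texts, labels and logistic-regression models are illustrative placeholders, not the team's actual features or pipeline.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import numpy as np

# Toy training data (placeholders, not the PAN 2014 corpus).
texts = [
    "omg lol see u at the mall later",
    "please find the quarterly report attached",
    "just finished my homework, gaming time",
    "my grandchildren visited this weekend",
]
gender = ["F", "M", "M", "F"]
age = ["18-24", "35-49", "18-24", "50+"]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

# (1) Label powerset: one single-label class per (gender, age) pair.
powerset = [g + "|" + a for g, a in zip(gender, age)]
powerset_clf = LogisticRegression(max_iter=1000).fit(X, powerset)

# (2) Classifier chain: predict gender first, feed it to the age model.
gender_clf = LogisticRegression(max_iter=1000).fit(X, gender)
is_f = (np.array(gender) == "F").astype(float).reshape(-1, 1)
age_clf = LogisticRegression(max_iter=1000).fit(
    np.hstack([X.toarray(), is_f]), age)

new = vec.transform(["cant wait for the concert tonight"])
g = gender_clf.predict(new)
new_aug = np.hstack([new.toarray(), (g == "F").astype(float).reshape(-1, 1)])
print(powerset_clf.predict(new), g, age_clf.predict(new_aug))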

61 citations


Proceedings Article
15 Sep 2014
TL;DR: An overview of the systems developed by the five participating research groups, the methodology of the evaluation of their performance, and an analysis and discussion of the results obtained are reported.
Abstract: The LifeCLEF challenge BirdCLEF offers a large-scale proving ground for system-oriented evaluation of bird species identification based on audio recordings of their sounds. One of its strengths is that it uses data collected through Xeno-canto, the worldwide community of bird sound recordists. This ensures that BirdCLEF is close to the conditions of real-world application, in particular with regard to the number of species in the training set (1500). The main novelty of the 2017 edition of BirdCLEF was the inclusion of soundscape recordings containing time-coded bird species annotations in addition to the usual Xeno-canto recordings that focus on a single foreground species. This paper reports an overview of the systems developed by the five participating research groups, the methodology of the evaluation of their performance, and an analysis and discussion of the results obtained.

51 citations


Book ChapterDOI
15 Sep 2014
TL;DR: It is argued that this living lab can serve as a reference point for the implementation of living labs for the evaluation of information access systems; the experimental setup of the two benchmarking events is outlined.
Abstract: Most user-centric studies of information access systems in the literature suffer from unrealistic settings or limited numbers of participating users. To address this issue, the idea of a living lab has been promoted. Living labs allow us to evaluate research hypotheses using a large number of users who satisfy their information needs in a real context. In this paper, we introduce a living lab on news recommendation in real time. The living lab was first organized as the News Recommendation Challenge at ACM RecSys’13 and then as the campaign-style evaluation lab NEWSREEL at CLEF’14. Within this lab, researchers were asked to provide news article recommendations to millions of users in real time. Unlike in user studies performed in a laboratory, these users follow their own agenda, so laboratory bias on their behavior can be neglected. We outline the living lab scenario and the experimental setup of the two benchmarking events. We argue that this living lab can serve as a reference point for the implementation of living labs for the evaluation of information access systems.

46 citations


Proceedings Article
01 Jan 2014
TL;DR: The aim of this paper is to give an overview of the data issued during the BioASQ track of the Question Answering Lab at CLEF 2014 and to present the systems that participated in the challenge and for which system descriptions were received.
Abstract: The goal of this task is to push the research frontier towards hybrid information systems. We aim to promote systems and approaches that are able to deal with the whole diversity of the Web, especially for, but not restricted to, the context of biomedicine. This goal is pursued by the organization of challenges. The second challenge consisted of two tasks: semantic indexing and question answering. 61 systems from 18 different teams participated in the semantic indexing task, with between 25 and 45 systems taking part in each batch. The question answering task was tackled by 22 systems, developed by 8 different organizations; between 15 and 19 of these systems addressed each batch. The aim of this paper is twofold. First, we aim to give an overview of the data issued during the BioASQ track of the Question Answering Lab at CLEF 2014. In addition, we aim to present the systems that participated in the challenge and for which we received system descriptions. In particular, we evaluate their performance with respect to dedicated baseline systems. To achieve these goals, we begin with a brief overview of the tasks included in the track, including their timing and the challenge data. Thereafter, we give an overview of the systems which participated in the challenge and provided us with a description of the technologies they relied upon. Detailed descriptions of some of the systems are given in the lab proceedings. The evaluation of the systems, carried out using state-of-the-art measures or manual assessment, is the last focal point of this paper. The conclusion sums up the results of the track.

Proceedings Article
18 Sep 2014
TL;DR: Task 2 of the 2014 ShARe/CLEF eHealth evaluation lab focused on template filling of disorder attributes and comprised two subtasks: attribute normalization and cue identification.
Abstract: This paper reports on Task 2 of the 2014 ShARe/CLEF eHealth evaluation lab, which extended Task 1 of the 2013 ShARe/CLEF eHealth evaluation lab by focusing on template filling of disorder attributes. The task comprised two subtasks: attribute normalization (task 2a) and cue identification (task 2b). We instructed participants to develop a system which either kept or updated a default attribute value for each task. Participant systems were evaluated against a blind reference standard of 133 discharge summaries using Accuracy (task 2a) and F-score (task 2b). In total, ten teams participated in task 2a, and three teams in task 2b. For tasks 2a and 2b, the HITACHI team systems (run 2) had the highest performances, with an overall average accuracy of 0.868 and an F1-score (strict) of 0.676, respectively.

Proceedings Article
18 Sep 2014
TL;DR: This paper presents the recommendation algorithms used by the Insight UCD team participating in the CLEF-NewsREEL 2014 online news recommendation challenge.
Abstract: This paper presents the recommendation algorithms used by the Insight UCD team participating in the CLEF-NewsREEL 2014 online news recommendation challenge.

Proceedings Article
01 Jan 2014
TL;DR: The goal of the Interactive Social Book Search Track was to investigate how users used these two sources of information when looking for books in a leisure context, using one of two book-search interfaces.
Abstract: Users looking for books online are confronted with both professional meta-data and user-generated content. The goal of the Interactive Social Book Search Track was to investigate how users used these two sources of information when looking for books in a leisure context. To this end, participants recruited by four teams performed two different tasks using one of two book-search interfaces. Additionally, one of the two interfaces investigated whether user performance can be improved by a user interface that supports multiple search stages.

Book ChapterDOI
15 Sep 2014
TL;DR: This paper compares the performance of eleven summarisation approaches using four microblog summarisation datasets, with the aim of determining which are the most effective and therefore should be used as baselines in future research.
Abstract: Event detection and tracking using social media and user-generated content has received a lot of attention from the research community in recent years, since such sources can purportedly provide up-to-date information about events as they evolve, e.g. earthquakes. Concisely reporting (summarising) events for users/emergency services using information obtained from social media sources like Twitter is not a solved problem. Current systems either directly apply, or build upon, classical summarisation approaches previously shown to be effective within the newswire domain. However, to date, research into how well these approaches generalise from the newswire to the microblog domain is limited. Hence, in this paper, we compare the performance of eleven summarisation approaches using four microblog summarisation datasets, with the aim of determining which are the most effective and therefore should be used as baselines in future research. Our results indicate that the SumBasic algorithm and Centroid-based summarisation with redundancy reduction are the most effective approaches across the four datasets and five automatic summarisation evaluation measures tested.
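
Since SumBasic emerges as one of the strongest baselines, a compact sketch of its selection loop may be useful. This is the textbook algorithm (frequency-derived word probabilities, squared after each pick to curb redundancy) with deliberately naive tokenisation; it is not the exact variant or preprocessing evaluated in the paper.

from collections import Counter
import re

def sumbasic(sentences, n_select=2):
    tokenised = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    counts = Counter(w for toks in tokenised for w in toks)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}
    chosen = []
    while len(chosen) < min(n_select, len(sentences)):
        # Score each unselected sentence by its mean word probability.
        best = max(
            (i for i in range(len(sentences)) if i not in chosen),
            key=lambda i: sum(prob[w] for w in tokenised[i]) / max(len(tokenised[i]), 1),
        )
        chosen.append(best)
        # Squash probabilities of covered words to reduce redundancy.
        for w in tokenised[best]:
            prob[w] **= 2
    return [sentences[i] for i in sorted(chosen)]

tweets = [
    "Magnitude 6.1 earthquake hits the coast, buildings shaking downtown",
    "Earthquake! Everything is shaking here",
    "Power is out in several districts after the earthquake",
]
print(sumbasic(tweets))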

Proceedings Article
01 Sep 2014
TL;DR: The goal of the INEX 2014 Social Book Search Track is to evaluate approaches for supporting users in searching collections of books based on book metadata and associated user-generated content and to explore the relative value of recommendation and retrieval paradigms for book search.
Abstract: The goal of the INEX 2014 Social Book Search Track is to evaluate approaches for supporting users in searching collections of books based on book metadata and associated user-generated content. The track investigates the complex nature of relevance in book search and the role of traditional and user-generated book metadata in retrieval. We extended last year's investigation into the nature of book suggestions from the LibraryThing forums and how they compare to book relevance judgements. Participants were encouraged to incorporate rich user profiles of both topic creators and other LibraryThing users to explore the relative value of recommendation and retrieval paradigms for book search. We found further support that such suggestions are a valuable alternative to traditional test collections that are based on top-k pooling and editorial relevance judgements.

Proceedings Article
01 Jan 2014
TL;DR: The Antinomyra System that participated in the BioASQ Task 2a Challenge for the large-scale biomedical semantic indexing can automatically annotate MeSH terms for MEDLINE citations using only title and abstract information.
Abstract: This paper describes the Antinomyra System that participated in the BioASQ Task 2a Challenge for large-scale biomedical semantic indexing. The system can automatically annotate MeSH terms for MEDLINE citations using only title and abstract information. With respect to the official test set (batch 3, week 5), based on 1867 annotated citations out of all 4533 citations (June 6, 2014), our best submission achieved 0.6199 in flat Micro F-measure. This is 9.8% higher in relative terms than the performance of the official NLM solution, the Medical Text Indexer (MTI), which achieved 0.5647 in flat F-measure ((0.6199 − 0.5647)/0.5647 ≈ 0.098).

Proceedings Article
01 Jan 2014
TL;DR: The idea was to define, develop and evaluate a simple machine learning classifier able to guess the gender and the age of a given user based on his/her texts, which could become part of the solution portfolio of the company.
Abstract: This paper describes our participation in the PAN 2014 author profiling task. Our idea was to define, develop and evaluate a simple machine learning classifier able to guess the gender and the age of a given user based on his/her texts, which could become part of the solution portfolio of the company. We were interested in finding not the best possible classifier that achieves the highest accuracy, but the optimum balance between performance and throughput, using the simplest strategy that is least dependent on external systems. Results show that our software, using Multinomial Naive Bayes with a term-vector model representation of the text, ranks quite well among the other participants in terms of accuracy.
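
The strategy described, Multinomial Naive Bayes over a term-vector representation, takes only a few lines in scikit-learn. The sketch below uses invented toy documents and labels; the authors' actual preprocessing and feature choices are not reproduced.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy documents and gender labels (placeholders, not PAN 2014 data).
docs = [
    "omg that movie was sooo good lol",
    "the board approved the budget on tuesday",
    "my kids start school next week",
    "new high score, best game ever",
]
gender = ["F", "M", "F", "M"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, gender)
print(model.predict(["meeting moved to thursday afternoon"]))
# A second, identically built pipeline would be trained for the age classes.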

Book ChapterDOI
15 Sep 2014
TL;DR: The outcomes of a longitudinal study on the CLEF Ad Hoc track show a positive trend, even if the performance increase is not always steady from year to year, and bilingual retrieval has demonstrated higher improvements in recent years, probably due to the better linguistic resources now available.
Abstract: This paper reports the outcomes of a longitudinal study on the CLEF Ad Hoc track in order to assess its impact on the effectiveness of monolingual, bilingual and multilingual information access and retrieval systems. Monolingual retrieval shows a positive trend, even if the performance increase is not always steady from year to year; bilingual retrieval has demonstrated higher improvements in recent years, probably due to the better linguistic resources now available; and, multilingual retrieval exhibits constant improvement and performances comparable to bilingual (and, sometimes, even monolingual) ones.

Book ChapterDOI
15 Sep 2014
TL;DR: In this paper, a self-supervised relation extraction approach was applied to the biomedical domain using UMLS, a large biomedical knowledge base containing millions of concepts and relations among them.
Abstract: Self-supervised relation extraction uses a knowledge base to automatically annotate a training corpus which is then used to train a classifier. This approach has been successfully applied to different domains using a range of knowledge bases. This paper applies the approach to the biomedical domain using UMLS, a large biomedical knowledge base containing millions of concepts and relations among them. The approach is evaluated using two different techniques. The presented results are promising and indicate that UMLS is a useful resource for semi-supervised relation extraction.
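
The annotation step at the heart of this approach can be sketched in a few lines: sentences in which two knowledge-base concepts co-occur are labelled with the corresponding relation and become training examples. The miniature dictionary below merely stands in for UMLS, and plain string matching stands in for proper concept recognition.

# Tiny stand-in for a knowledge base of (concept, concept) -> relation.
kb = {
    ("ibuprofen", "headache"): "may_treat",
    ("aspirin", "fever"): "may_treat",
    ("penicillin", "rash"): "may_cause",
}

sentences = [
    "The patient took ibuprofen and the headache resolved.",
    "Penicillin was stopped after a rash appeared.",
    "Aspirin had no effect on the joint pain.",
]

training = []
for sent in sentences:
    low = sent.lower()
    for (c1, c2), rel in kb.items():
        if c1 in low and c2 in low:
            # Co-occurrence of both concepts yields a labelled example.
            training.append((sent, c1, c2, rel))

print(training)
# The labelled examples would then train a conventional relation classifier.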

Proceedings Article
01 Jan 2014
TL;DR: The participation of the SNUMedinfo team at the CLEFeHealth 2014 task 3 is described.
Abstract: This paper describes the participation of the SNUMedinfo team in CLEFeHealth 2014 task 3. We submitted 7 runs to Task 3a (monolingual information retrieval): 1 baseline run using the query likelihood model in the Indri search engine; 3 runs applying UMLS-based lexical query expansion utilizing the discharge summary as an expansion term filter; and 3 runs applying learning-to-rank techniques utilizing various document features. We submitted 4 runs to Task 3b (multilingual information retrieval): 1 baseline run using Google Translate for the English translation, and 3 runs applying learning-to-rank techniques on the translated query.
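
A hedged sketch of the Task 3a expansion idea: candidate synonyms come from a UMLS-like dictionary, but a candidate is only added to the query if it also occurs in the discharge summary associated with the topic. The dictionary, summary text and naive substring matching below are invented stand-ins, and Indri itself is not invoked.

# Stand-in for UMLS synonym lookup (invented entries).
synonyms = {
    "heart attack": ["myocardial infarction", "mi"],
    "shortness of breath": ["dyspnea", "dyspnoea"],
}

def expand(query, discharge_summary):
    summary = discharge_summary.lower()
    terms = [query]
    for phrase, candidates in synonyms.items():
        if phrase in query.lower():
            # Keep only candidates attested in the discharge summary.
            terms += [c for c in candidates if c in summary]
    return " ".join(terms)

summary_text = "Diagnosis: acute myocardial infarction. Patient reports dyspnea."
print(expand("heart attack shortness of breath", summary_text))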


Book ChapterDOI
15 Sep 2014
TL;DR: Automatic processing can be used to dynamically monitor the quality of the CQA content and to compare different data sources and can be useful for symptomatic surveillance and health education campaigns.
Abstract: The paper reports on the evaluation of Russian community question answering (CQA) data in the health domain. About 1,500 question–answer pairs were manually evaluated by medical professionals; in addition, automatic evaluation based on reference disease–medicine pairs was performed. Although the results of the manual and automatic evaluation do not fully match, we find the method still promising and propose several improvements. Automatic processing can be used to dynamically monitor the quality of the CQA content and to compare different data sources. Moreover, the approach can be useful for symptomatic surveillance and health education campaigns.

Book ChapterDOI
15 Sep 2014
TL;DR: The use of the IR nanopublications will facilitate the assessment and comparison of IR systems and enhance the degree of reproducibility and reliability of IR research progress.
Abstract: Retrieval experiments produce plenty of data, like various experiment settings and experimental results, that are usually not all included in the published articles. Even if they are mentioned, they are not easily machine-readable. We propose the use of IR nanopublications to describe in a formal language such information. Furthermore, to support the unambiguous description of IR domain aspects, we present a preliminary IR ontology. The use of the IR nanopublications will facilitate the assessment and comparison of IR systems and enhance the degree of reproducibility and reliability of IR research progress.

Book ChapterDOI
15 Sep 2014
TL;DR: An overview of all the INEX 2014 tracks, their aims and tasks, the test collections built, and the participants, together with an initial analysis of the results.
Abstract: INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2014 evaluation campaign, which consisted of three tracks. The Interactive Social Book Search Track investigated user information seeking behavior when interacting with various sources of information, for realistic task scenarios, and how the user interface impacts search and the search experience. The Social Book Search Track investigated the relative value of authoritative metadata and user-generated content for search and recommendation, using a test collection with data from Amazon and LibraryThing, including user profiles and personal catalogues. The Tweet Contextualization Track investigated helping a user understand a tweet by providing a short background summary generated from relevant Wikipedia passages aggregated into a coherent whole. INEX 2014 was an exciting year in which we ran our workshop as part of the CLEF labs for the third time. This paper gives an overview of all the INEX 2014 tracks, their aims and tasks, the test collections built, and the participants, and gives an initial analysis of the results.

Proceedings Article
15 Sep 2014
TL;DR: Different approaches to integrating user social information, such as the reviews, tags and ratings that users assign to books, are presented.
Abstract: In this article, we describe our participation in the INEX 2014 Social Book Search track. We present different approaches exploiting user social information such as reviews, tags and ratings, which users assign to books. We optimize our models using the INEX Social Book Search 2013 collection and test them on the INEX 2014 Social Book Search track.

Book ChapterDOI
15 Sep 2014
TL;DR: A probabilistic distribution model to represent each document as a feature set to increase the interpretability of the results and features is proposed and a distance measure is introduced to compute the distance between two feature sets.
Abstract: Authorship identification is an important problem in the law and journalism fields and one of the major techniques in plagiarism detection. In this paper, to tackle the authorship verification problem, we propose a probabilistic distribution model that represents each document as a feature set, in order to increase the interpretability of the results and features. We also introduce a distance measure to compute the distance between two feature sets. Finally, we exploit a KNN-based approach and a dynamic feature selection method to detect the features which discriminate the author’s writing style.
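
Purely as an illustration of the verification scheme outlined (per-document probability distributions, a distance between them, and a nearest-neighbour decision), here is a toy sketch using character-unigram distributions and total-variation distance. The paper's actual features, distance measure, dynamic feature selection and decision threshold are not reproduced.

from collections import Counter

def char_dist(text):
    # Represent a document as a character-unigram probability distribution.
    counts = Counter(c for c in text.lower() if c.isalpha())
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def tv_distance(p, q):
    # Total-variation distance between two discrete distributions.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

known = [
    "I have always held that style betrays the author.",
    "Held lightly, style nonetheless betrays us all.",
]
unknown = "Style, I would hold, betrays its author in the end."

dists = [tv_distance(char_dist(k), char_dist(unknown)) for k in known]
# Accept if the nearest known document is close enough (threshold illustrative).
print("same-author" if min(dists) < 0.10 else "different-author")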

Proceedings Article
01 Jan 2014
TL;DR: The participation of the SNUMedinfo team in BioASQ Task 2a and Task 2b of the CLEF 2014 Question Answering track is described; the semantic concept-enriched dependence model showed significant improvement over the baseline.
Abstract: This paper describes the participation of the SNUMedinfo team in BioASQ Task 2a and Task 2b of the CLEF 2014 Question Answering track. Task 2a was about biomedical semantic indexing. We trained SVM classifiers to automatically assign relevant MeSH descriptors to MEDLINE articles. Regarding Task 2b, biomedical question answering, we participated in the document retrieval subtask in Phase A and the ideal answer generation subtask in Phase B. In the document retrieval task, we mostly experimented with the semantic concept-enriched dependence model and the sequential dependence model; the semantic concept-enriched dependence model showed significant improvement over the baseline. In the ideal answer generation task, we reformulated the task as selecting, given relevant lists of passages, the best ones to build the answer, and applied three heuristic methods.
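
The Task 2a set-up described, one binary classifier per MeSH descriptor over title and abstract text, can be sketched as a one-vs-rest linear SVM in scikit-learn. The documents and descriptors below are toy placeholders, not MEDLINE data or the team's feature engineering.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

docs = [
    "Aspirin therapy after myocardial infarction reduces mortality.",
    "Inhaled corticosteroids in the management of asthma.",
    "Aspirin-exacerbated respiratory disease and asthma control.",
]
mesh = [
    ["Aspirin", "Myocardial Infarction"],
    ["Asthma"],
    ["Aspirin", "Asthma"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(mesh)  # one binary column per MeSH descriptor
vec = TfidfVectorizer()
X = vec.fit_transform(docs)

clf = OneVsRestClassifier(LinearSVC()).fit(X, Y)
pred = clf.predict(vec.transform(
    ["Long-term aspirin use in cardiovascular prevention."]))
print(mlb.inverse_transform(pred))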

Proceedings Article
15 Sep 2014
TL;DR: A machine learning approach based on several text representations and on optimized decision trees that take various attributes as input and are learned separately for every training corpus.
Abstract: This article describes our proposal for the Author Identification task in the PAN CLEF Challenge 2014. We have adopted a machine learning approach based on several representations of the texts and on optimized decision trees which take various attributes as input and which are learned separately for every training corpus for this classification task. Our method ranked 2nd overall, with an AUC of 70.7% and a C@1 of 68.4%, placing between 1st and 6th on the six individual corpora.

Book ChapterDOI
15 Sep 2014
TL;DR: The track was divided into three tasks: QALD focused on translating natural language questions into SPARQL queries; BioASQ on the biomedical domain; and Entrance Exams on answering questions to assess machine reading capabilities.
Abstract: This paper describes the CLEF QA Track 2014. In the current general scenario for the CLEF QA Track, the starting point is always a natural language question. However, answering some questions may require querying Linked Data (especially if aggregations or logical inferences are required), some questions may need textual inference and querying free text, and answering some queries may require both sources of information. The track was divided into three tasks: QALD focused on translating natural language questions into SPARQL queries; BioASQ on the biomedical domain; and Entrance Exams on answering questions to assess machine reading capabilities.