Conference

Cross-Language Evaluation Forum 

About: Cross-Language Evaluation Forum is an academic conference. The conference publishes mainly in the areas of CLEF and query expansion. Over its lifetime, the conference has published 1,499 papers, which have received 24,450 citations.


Papers
Book Chapter
03 Sep 2001
TL;DR: The fundamental assumptions and appropriate uses of the Cranfield paradigm, especially as they apply in the context of the evaluation conferences, are reviewed.
Abstract: Evaluation conferences such as TREC, CLEF, and NTCIR are modern examples of the Cranfield evaluation paradigm. In Cranfield, researchers perform experiments on test collections to compare the relative effectiveness of different retrieval approaches. The test collections allow the researchers to control the effects of different system parameters, increasing the power and decreasing the cost of retrieval experiments as compared to user-based evaluations. This paper reviews the fundamental assumptions and appropriate uses of the Cranfield paradigm, especially as they apply in the context of the evaluation conferences.
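To make the Cranfield setup concrete, here is a minimal sketch of batch evaluation against a test collection: ranked runs are scored against shared relevance judgments with mean average precision, so two systems can be compared on identical topics. The data structures and toy data are illustrative assumptions, not TREC/CLEF tooling.

    # Cranfield-style batch evaluation sketch: score ranked runs against a
    # test collection's relevance judgments. Data shapes are illustrative.

    def average_precision(ranking, relevant):
        """Mean of the precision values at each rank where a relevant doc appears."""
        hits, precision_sum = 0, 0.0
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant) if relevant else 0.0

    def mean_average_precision(run, qrels):
        """run: topic -> ranked doc ids; qrels: topic -> set of relevant doc ids."""
        return sum(average_precision(run[t], qrels[t]) for t in qrels) / len(qrels)

    # Two hypothetical systems compared on the same (tiny) collection:
    qrels = {"q1": {"d2", "d5"}, "q2": {"d1"}}
    run_a = {"q1": ["d2", "d3", "d5"], "q2": ["d4", "d1"]}
    run_b = {"q1": ["d3", "d2", "d5"], "q2": ["d1", "d4"]}
    print(mean_average_precision(run_a, qrels))  # ~0.667
    print(mean_average_precision(run_b, qrels))  # ~0.792

Because both runs are scored against the same judgments, the comparison controls for topic and collection effects, which is exactly the experimental control the paradigm provides.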

449 citations

Proceedings Article
01 Jan 2011
TL;DR: In PAN'10, 18 plagiarism detectors were evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length.
Abstract: This paper overviews 18 plagiarism detectors that have been developed and evaluated within PAN'10. We start with a unified retrieval process that summarizes the best practices employed this year. Then, the detectors' performances are evaluated in detail, highlighting several important aspects of plagiarism detection, such as obfuscation, intrinsic vs. external plagiarism, and plagiarism case length. Finally, all results are compared to those of last year's competition.
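PAN's evaluation combines character-level precision and recall with a granularity penalty for detectors that report one plagiarism case as many fragments. Below is a minimal sketch in that spirit; the interval representation and helper names are assumptions of this sketch, not the official PAN measures or code.

    # PAN-style plagiarism score sketch: character-level F1 divided by a
    # log granularity penalty. Representations here are illustrative.
    from math import log2

    def chars(intervals):
        """Set of character offsets covered by (start, end) intervals."""
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end))
        return covered

    def plagdet(cases, detections):
        true, found = chars(cases), chars(detections)
        overlap = true & found
        if not overlap:
            return 0.0
        precision = len(overlap) / len(found)
        recall = len(overlap) / len(true)
        f1 = 2 * precision * recall / (precision + recall)
        # Granularity: average number of detections overlapping each detected case.
        detected = [c for c in cases if chars([c]) & found]
        gran = sum(sum(1 for d in detections if chars([d]) & chars([c]))
                   for c in detected) / len(detected)
        return f1 / log2(1 + gran)

    # One true case found in a single piece vs. split into two fragments:
    case = [(100, 200)]
    print(plagdet(case, [(100, 200)]))              # perfect F1, granularity 1 -> 1.0
    print(plagdet(case, [(100, 150), (150, 200)]))  # same F1, granularity 2 -> ~0.63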

419 citations

Book Chapter
23 Sep 2013
TL;DR: An evaluation lab that aims to support the continuum of care by developing methods and resources that make clinical reports in English easier for patients to understand and help them find information related to their condition.
Abstract: Discharge summaries and other free-text reports in healthcare transfer information between working shifts and geographic locations. Patients are likely to have difficulties in understanding their content because of their medical jargon, non-standard abbreviations, and ward-specific idioms. This paper reports on an evaluation lab that aims to support the continuum of care by developing methods and resources that make clinical reports in English easier for patients to understand and help them find information related to their condition. This ShARe/CLEFeHealth2013 lab offered student mentoring and shared tasks: identification and normalisation of disorders (Tasks 1a and 1b) and normalisation of abbreviations and acronyms (Task 2) in clinical reports with respect to terminology standards in healthcare, as well as information retrieval (Task 3) to address questions patients may have when reading clinical reports. The focus on patients' information needs, as opposed to the specialised information needs of physicians and other healthcare workers, was the main feature distinguishing the lab from previous shared tasks. De-identified clinical reports for the three tasks were from US intensive care and originated from the MIMIC II database. Other text documents for Task 3 were from the Internet and originated from the Khresmoi project. Task 1 annotations originated from the ShARe annotations. For Tasks 2 and 3, new annotations, queries, and relevance assessments were created. 64, 56, and 55 people registered their interest in Tasks 1, 2, and 3, respectively. 34 unique teams (3 members per team on average) participated, with 22, 17, 5, and 9 teams in Tasks 1a, 1b, 2, and 3, respectively. The teams were from Australia, China, France, India, Ireland, Republic of Korea, Spain, UK, and USA. Some teams developed and used additional annotations, but this strategy contributed to system performance only in Task 2. The best systems achieved an F1 score of 0.75 in Task 1a, accuracies of 0.59 and 0.72 in Tasks 1b and 2, and precision at 10 of 0.52 in Task 3. The results demonstrate the substantial community interest in, and the capabilities of, these systems in making clinical reports easier for patients to understand. The organisers have made data and tools available for future research and development.
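The headline numbers above are standard measures; a minimal sketch of how each is computed follows. The input shapes are illustrative stand-ins, not the lab's official evaluation scripts.

    # The three measures quoted above: F1 for disorder identification,
    # accuracy for normalisation, and precision at 10 for retrieval.
    # Input shapes are illustrative, not the official evaluation scripts.

    def f1(gold_spans, predicted_spans):
        """Harmonic mean of precision and recall over sets of annotated spans."""
        tp = len(gold_spans & predicted_spans)
        if tp == 0:
            return 0.0
        precision = tp / len(predicted_spans)
        recall = tp / len(gold_spans)
        return 2 * precision * recall / (precision + recall)

    def accuracy(gold, predicted):
        """Fraction of items (e.g. abbreviations) mapped to the correct concept."""
        return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

    def precision_at_10(ranked_doc_ids, relevant):
        """Fraction of the top 10 retrieved documents that are relevant."""
        return sum(doc in relevant for doc in ranked_doc_ids[:10]) / 10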

232 citations

Book
01 Jan 2004
TL;DR: The paper discusses the evaluation approach adopted, describes the tracks and tasks offered and the test collections used, and provides an outline of the guidelines given to the participants.
Abstract: We describe the overall organization of the CLEF 2003 evaluation campaign, with a particular focus on the cross-language ad hoc and domain-specific retrieval tracks. The paper discusses the evaluation approach adopted, describes the tracks and tasks offered and the test collections used, and provides an outline of the guidelines given to the participants. It concludes with an overview of the techniques employed for results calculation and analysis for the monolingual, bilingual, multilingual, and GIRT tasks.

214 citations

Book Chapter
05 Sep 2016
TL;DR: Proposes a novel early detection task and defines a novel effectiveness measure for systematically comparing early detection algorithms; the measure takes into account both the accuracy of the decisions taken by the algorithm and the delay in detecting positive cases.
Abstract: Several studies in the literature have shown that the words people use are indicative of their psychological states. In particular, depression was found to be associated with distinctive linguistic patterns. However, there is a lack of publicly available data for doing research on the interaction between language and depression. In this paper, we describe our first steps to fill this gap. We outline the methodology we have adopted to build and make publicly available a test collection on depression and language use. The resulting corpus includes a series of textual interactions written by different subjects. The new collection not only encourages research on differences in language between depressed and non-depressed individuals, but also on the evolution of the language use of depressed individuals. Further, we propose a novel early detection task and define a novel effectiveness measure to systematically compare early detection algorithms. This new measure takes into account both the accuracy of the decisions taken by the algorithm and the delay in detecting positive cases. We also present baseline results with novel detection methods that process users’ interactions in different ways.
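The measure described, which penalises wrong decisions with a fixed cost and correct alarms by how late they come, can be sketched as follows. The sigmoid latency cost mirrors the early risk detection error introduced for this task, but the cost constants and parameter names here are illustrative assumptions.

    # Early-detection error sketch: errors pay a fixed cost; true positives
    # pay a latency cost that grows with the number of writings seen.
    # Constants and names are illustrative, not the paper's exact settings.
    from math import exp

    def latency_cost(k, o):
        """Rises from ~0 to ~1 as the decision point k passes the deadline o."""
        return 1 - 1 / (1 + exp(k - o))

    def erde(decision, truth, k, o=50, c_fp=0.1, c_fn=1.0, c_tp=1.0):
        """decision/truth are 'pos'/'neg'; k = writings observed before deciding."""
        if decision == "pos" and truth == "neg":
            return c_fp
        if decision == "neg" and truth == "pos":
            return c_fn
        if decision == "pos" and truth == "pos":
            return latency_cost(k, o) * c_tp  # correct, but penalised by delay
        return 0.0                            # true negative costs nothing

    print(erde("pos", "pos", k=5))    # ~0.0: early correct detection
    print(erde("pos", "pos", k=100))  # ~1.0: correct but far too late

Averaging this per-subject error over a collection rewards systems that decide both correctly and early, which is what distinguishes the task from standard classification.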

199 citations

Performance Metrics
No. of papers from the Conference in previous years
Year    Papers
2021    52
2020    37
2019    53
2018    54
2017    53
2016    63