Journal ArticleDOI

Multilingual and cross-domain temporal tagging

01 Jun 2013-Vol. 47, Iss: 2, pp 269-298
TL;DR: The authors' publicly available temporal tagger HeidelTime is presented, which is easily extensible to further languages due to its strict separation of source code and language resources like patterns and rules.
Abstract: Extraction and normalization of temporal expressions from documents are important steps towards deep text understanding and a prerequisite for many NLP tasks such as information extraction, question answering, and document summarization. There are different ways to express (the same) temporal information in documents. However, after identifying temporal expressions, they can be normalized according to some standard format. This allows the usage of temporal information in a term- and language-independent way. In this paper, we describe the challenges of temporal tagging in different domains, give an overview of existing annotated corpora, and survey existing approaches for temporal tagging. Finally, we present our publicly available temporal tagger HeidelTime, which is easily extensible to further languages due to its strict separation of source code and language resources like patterns and rules. We present a broad evaluation on multiple languages and domains on existing corpora as well as on a newly created corpus for a language/domain combination for which no annotated corpus has been available so far.
Citations
Journal ArticleDOI
TL;DR: A survey of the existing literature on temporal information retrieval is presented that categorizes the relevant research, describes the main contributions, and compares different approaches to provide a coherent view of the field.
Abstract: Temporal information retrieval has been a topic of great interest in recent years. Its purpose is to improve the effectiveness of information retrieval methods by exploiting temporal information in documents and queries. In this article, we present a survey of the existing literature on temporal information retrieval. In addition to giving an overview of the field, we categorize the relevant research, describe the main contributions, and compare different approaches. We organize existing research to provide a coherent view, discuss several open issues, and point out some possible future research directions in this area. Despite significant advances, the area lacks a systematic arrangement of prior efforts and an overview of state-of-the-art approaches. Moreover, an effective end-to-end temporal retrieval system that exploits temporal information to improve the quality of the presented results remains undeveloped.

212 citations

Journal ArticleDOI
TL;DR: This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English and identifies major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.
Abstract: Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.

188 citations

Book
04 Jul 2016
TL;DR: This book brings together computational methods from many disciplines: natural language processing, semantic technologies, data mining, machine learning, network analysis, human-computer interaction, and information visualization, focusing on methods that are commonly used for processing social media messages under time-critical constraints.
Abstract: Social media is an invaluable source of time-critical information during a crisis. However, emergency response and humanitarian relief organizations that would like to use this information struggle with an avalanche of social media messages that exceeds the human capacity to process. Emergency managers, decision makers, and affected communities can make sense of social media through a combination of machine computation and human compassion - expressed by thousands of digital volunteers who publish, process, and summarize potentially life-saving information. This book brings together computational methods from many disciplines: natural language processing, semantic technologies, data mining, machine learning, network analysis, human-computer interaction, and information visualization, focusing on methods that are commonly used for processing social media messages under time-critical constraints, and offering more than 500 references to in-depth information.

183 citations

Proceedings ArticleDOI
01 Jun 2016
TL;DR: There was a gap between the best systems and human performance, but the gap was less than half the gap of Clinical TempEval 2015.
Abstract: Clinical TempEval 2016 evaluated temporal information extraction systems on the clinical domain. Nine sub-tasks were included, covering problems in time expression identification, event expression identification and temporal relation identification. Participant systems were trained and evaluated on a corpus of clinical and pathology notes from the Mayo Clinic, annotated with an extension of TimeML for the clinical domain. 14 teams submitted a total of 40 system runs, with the best systems achieving near-human performance on identifying events and times. On identifying temporal relations, there was a gap between the best systems and human performance, but the gap was less than half the gap of Clinical TempEval 2015.

174 citations


Cites methods from "Multilingual and cross-domain tempo..."

  • ...LIMSI (Grouin and Moriceau, 2016) submitted 2 runs for each phase, based on conditional random fields with lexical, morphological, and word cluster features, and the rule-based HeidelTime (Strötgen and Gertz, 2013)....

    [...]

Proceedings ArticleDOI
01 Aug 2017
TL;DR: Nine sub-tasks were included, covering problems in time expression identification, event expression identification and temporal relation identification, and most tasks observed about a 20 point drop over Clinical TempEval 2016.
Abstract: Clinical TempEval 2017 aimed to answer the question: how well do systems trained on annotated timelines for one medical condition (colon cancer) perform in predicting timelines on another medical condition (brain cancer)? Nine sub-tasks were included, covering problems in time expression identification, event expression identification and temporal relation identification. Participant systems were evaluated on clinical and pathology notes from Mayo Clinic cancer patients, annotated with an extension of TimeML for the clinical domain. 11 teams participated in the tasks, with the best systems achieving F1 scores above 0.55 for time expressions, above 0.70 for event expressions, and above 0.40 for temporal relations. Most tasks observed about a 20 point drop over Clinical TempEval 2016, where systems were trained and evaluated on the same domain (colon cancer).

148 citations


Cites methods from "Multilingual and cross-domain tempo..."

  • ...LIMSI (Grouin and Moriceau, 2016) submitted 2 runs for each phase, based on conditional random fields with lexical, morphological, and word cluster features, and the rule-based HeidelTime (Strötgen and Gertz, 2013)....

    [...]

References
Book
01 Jan 2002
TL;DR: This collection of technical papers from leading researchers in the field not only provides several chapters devoted to the research program and its evaluation paradigm, but also presents the most current research results and describes some of the remaining open challenges.
Abstract: Topic Detection and Tracking: Event-based Information Organization brings together in one place state-of-the-art research in Topic Detection and Tracking (TDT). This collection of technical papers from leading researchers in the field not only provides several chapters devoted to the research program and its evaluation paradigm, but also presents the most current research results and describes some of the remaining open challenges. Topic Detection and Tracking: Event-based Information Organization is an excellent reference for researchers and practitioners in a variety of fields related to TDT, including information retrieval, automatic speech recognition, machine learning, and information extraction.

872 citations


"Multilingual and cross-domain tempo..." refers background in this paper

  • ...For example, in topic detection and tracking, it helps to identify new unreported events and to assign documents to already detected events (see, e.g., Allan 2002; Makkonen et al. 2003)....

    [...]

01 Jan 2003
TL;DR: TimeML is described, a rich specification language for event and temporal expressions in natural language text, developed in the context of the AQUAINT program on Question Answering Systems, and demonstrated for a delayed (underspecified) interpretation of partially determined temporal expressions.
Abstract: In this paper we provide a description of TimeML, a rich specification language for event and temporal expressions in natural language text, developed in the context of the AQUAINT program on Question Answering Systems. Unlike most previous work on event annotation, TimeML captures three distinct phenomena in temporal markup: (1) it systematically anchors event predicates to a broad range of temporally denotating expressions; (2) it orders event expressions in text relative to one another, both intrasententially and in discourse; and (3) it allows for a delayed (underspecified) interpretation of partially determined temporal expressions. We demonstrate the expressiveness of TimeML for a broad range of syntactic and semantic contexts, including aspectual predication, modal subordination, and an initial treatment of lexical and constructional causation in text.

797 citations


Additional excerpts

  • ...Although there are some promising machine learning approaches for the extraction of temporal expressions, we developed HeidelTime as a rule-based system for the following reasons: (1) the divergence of temporal expressions is very limited compared to other named entity recognition and normalization tasks, e.g., the number of persons and organizations as well as the variety of names referring to these entities are probably infinite, (2) the normalization is hardly solvable without using rules, (3) resources for additional languages can be added without the need of an annotated corpus, and (4) the knowledge base can be extended in a modular way, e.g., for adding events and their temporal information such as "soccer world cup final 2010" that took place on July 11, 2010. Furthermore, for the ability to easily add and modify rules (req. E), we developed a well-defined rule syntax (see Sect. 4.1.2). As annotation format, HeidelTime uses the TimeML annotation standard of TIMEX3 tags for temporal expressions. Nevertheless, due to the similarities between TIMEX3 and TIMEX2, the tags can be converted into TIMEX2 as well, although not all attributes are supported. Similar to the transformation from TIMEX2 to TIMEX3 described by Saquete Boro (2010), though the other way around, we used this property to be able to evaluate HeidelTime on corpora annotated with TIMEX2....

    [...]

Proceedings ArticleDOI
03 Oct 2000
TL;DR: An annotation scheme for temporal expressions, and a method for resolving temporal expressions in print and broadcast news, based on both hand-crafted and machine-learnt rules are described.
Abstract: We introduce an annotation scheme for temporal expressions, and describe a method for resolving temporal expressions in print and broadcast news. The system, which is based on both hand-crafted and machine-learnt rules, achieves an 83.2% accuracy (F-measure) against hand-annotated data. Some initial steps towards tagging event chronologies are also described.

392 citations

Proceedings Article
15 Jul 2010
TL;DR: Tempeval-2 comprises evaluation tasks for time expressions, events and temporal relations, the latter of which was split up in four sub tasks, motivated by the notion that smaller subtasks would make both data preparation and temporal relation extraction easier.
Abstract: Tempeval-2 comprises evaluation tasks for time expressions, events and temporal relations, the latter of which was split up in four sub tasks, motivated by the notion that smaller subtasks would make both data preparation and temporal relation extraction easier. Manually annotated data were provided for six languages: Chinese, English, French, Italian, Korean and Spanish.

389 citations


"Multilingual and cross-domain tempo..." refers background or methods in this paper

  • ...On both corpora, HeidelTime significantly [...] Table 3: Results of TempEval2 (Verhagen et al. 2010) and HeidelTime’s publicly available version (columns: P, R, F, Value, Type)...

    [...]

  • ...In the context of TempEval-2, we developed HeidelTime’s first version of English resources using the TempEval-2 training data, which corresponds to the TimeBank corpus (Verhagen et al. 2010)....

    [...]

Proceedings Article
15 Jul 2010
TL;DR: HeidelTime is a rule-based system mainly using regular expression patterns for the extraction of temporal expressions and knowledge resources as well as linguistic clues for their normalization.
Abstract: In this paper, we describe HeidelTime, a system for the extraction and normalization of temporal expressions. HeidelTime is a rule-based system mainly using regular expression patterns for the extraction of temporal expressions and knowledge resources as well as linguistic clues for their normalization. In the TempEval-2 challenge, HeidelTime achieved the highest F-Score (86%) for the extraction and the best results in assigning the correct value attribute, i.e., in understanding the semantics of the temporal expressions.
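To make the idea of rule-based extraction and normalization concrete, here is a minimal sketch in Python. It is not HeidelTime's code or rule syntax: the month lexicon, the two patterns, and the function names `normalize` and `tag` are all hypothetical and chosen only for illustration. It handles two English date formats and wraps each match in a TIMEX3-style tag whose `value` attribute holds the normalized date.

```python
import re

# Hypothetical month lexicon (illustration only, not a HeidelTime resource file).
MONTHS = {
    "january": 1, "february": 2, "march": 3, "april": 4,
    "may": 5, "june": 6, "july": 7, "august": 8,
    "september": 9, "october": 10, "november": 11, "december": 12,
}
MONTH_ALT = "|".join(MONTHS)

# Two illustrative extraction patterns: "July 11, 2010" and "11 July 2010".
PATTERNS = [
    re.compile(
        rf"\b(?P<month>{MONTH_ALT})\s+(?P<day>\d{{1,2}}),\s*(?P<year>\d{{4}})\b",
        re.IGNORECASE,
    ),
    re.compile(
        rf"\b(?P<day>\d{{1,2}})\s+(?P<month>{MONTH_ALT})\s+(?P<year>\d{{4}})\b",
        re.IGNORECASE,
    ),
]


def normalize(match):
    """Map a matched date expression to a YYYY-MM-DD value."""
    month = MONTHS[match.group("month").lower()]
    year = int(match.group("year"))
    day = int(match.group("day"))
    return f"{year:04d}-{month:02d}-{day:02d}"


def tag(text):
    """Wrap every matched date in a TIMEX3-style tag with a normalized value."""
    spans = []
    for pattern in PATTERNS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), normalize(m)))
    spans.sort()

    pieces, pos, tid = [], 0, 0
    for start, end, value in spans:
        if start < pos:
            continue  # skip a match that overlaps one already tagged
        tid += 1
        pieces.append(text[pos:start])
        pieces.append(
            f'<TIMEX3 tid="t{tid}" type="DATE" value="{value}">'
            f"{text[start:end]}</TIMEX3>"
        )
        pos = end
    pieces.append(text[pos:])
    return "".join(pieces)


print(tag("The final took place on July 11, 2010."))
print(tag("Signed 11 July 2010 in Berlin."))
```

Both input sentences express the date differently but receive the same normalized value, which is the point of normalization: downstream components can compare values without caring about the surface form. A real tagger also needs rules for relative and underspecified expressions such as "next Monday", which this sketch does not attempt.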

332 citations


"Multilingual and cross-domain tempo..." refers methods in this paper

  • ...However, these modifications were not performed using an annotated corpus but in the context of our work on spatio-temporal document exploration (Strötgen and Gertz 2010b)....

    [...]

  • ...HeidelTime achieved the best results for both the extraction and the normalization task (English) (Strötgen and Gertz 2010a)....

    [...]

  • ...For example, we built a system called TimeTrails for the exploration of events in documents based on the spatial and temporal information occurring together in the sentences of documents....

    [...]

  • ...Thus, for our research on multilingual temporal information extraction and exploration (Strötgen et al. 2010; Strötgen and Gertz 2010b), we developed HeidelTime, a temporal tagger satisfying the following requirements: A. Extraction and normalization should be of high quality....

    [...]

  • ...Finally, a CAS Consumer writes all extracted pairs of spatial and temporal expressions and thus all events into a database, which is used as knowledge base for the visualization and exploration components of TimeTrails (Strötgen and Gertz 2010b)....

    [...]