Proceedings Article

Overview of BioNLP Shared Task 2013

TL;DR: The BioNLP Shared Task 2013 shows advances in the state of the art and demonstrates that extraction methods can be successfully generalized in various aspects.
Abstract: The BioNLP Shared Task 2013 is the third edition of the BioNLP Shared Task series that is a community-wide effort to address fine-grained, structural information extraction from biomedical literature. The BioNLP Shared Task 2013 was held from January to April 2013. Six main tasks were proposed. 38 final submissions were received, from 22 teams. The results show advances in the state of the art and demonstrate that extraction methods can be successfully generalized in various aspects.
Citations
Proceedings ArticleDOI
Chris Quirk1, Hoifung Poon1
29 Jun 2017
TL;DR: This paper proposes a graph representation that incorporates both standard dependencies and discourse relations, providing a unifying way to model relations within and across sentences, and extracts features from multiple paths in this graph, increasing accuracy and robustness when confronted with linguistic variation and analysis error.
Abstract: The growing demand for structured knowledge has led to great interest in relation extraction, especially in cases with limited supervision. However, existing distant supervision approaches only extract relations expressed in single sentences. In general, cross-sentence relation extraction is under-explored, even in the supervised-learning setting. In this paper, we propose the first approach for applying distant supervision to cross-sentence relation extraction. At the core of our approach is a graph representation that can incorporate both standard dependencies and discourse relations, thus providing a unifying way to model relations within and across sentences. We extract features from multiple paths in this graph, increasing accuracy and robustness when confronted with linguistic variation and analysis error. Experiments on an important extraction task for precision medicine show that our approach can learn an accurate cross-sentence extractor, using only a small existing knowledge base and unlabeled text from biomedical research articles. Compared to the existing distant supervision paradigm, our approach extracted twice as many relations at similar precision, thus demonstrating the prevalence of cross-sentence relations and the promise of our approach.

233 citations

Journal ArticleDOI
TL;DR: This article reviews the different community challenge evaluations held from 2002 to 2014 and their respective tasks and examines these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively.
Abstract: One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations.

201 citations


Additional excerpts

  • ...GRO and GRN tasks in 2013 [77]) and gene inter-...

    [...]

Book
01 Jan 2020
TL;DR: This book explores how advances in machine learning and data mining can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.
Abstract: Advances in Machine Learning and Data Mining for Astronomy documents numerous successful collaborations among computer scientists, statisticians, and astronomers who illustrate the application of state-of-the-art machine learning and data mining techniques in astronomy. Due to the massive amount and complexity of data in most scientific disciplines, the material discussed in this text transcends traditional boundaries between various areas in the sciences and computer science. The book's introductory part provides context to issues in the astronomical sciences that are also important to health, social, and physical sciences, particularly probabilistic and statistical aspects of classification and cluster analysis. The next part describes a number of astrophysics case studies that leverage a range of machine learning and data mining technologies. In the last part, developers of algorithms and practitioners of machine learning and data mining show how these tools and techniques are used in astronomical applications. With contributions from leading astronomers and computer scientists, this book is a practical guide to many of the most important developments in machine learning, data mining, and statistics. It explores how these advances can solve current and future problems in astronomy and looks at how they could lead to the creation of entirely new algorithms within the data mining community.

169 citations

Journal ArticleDOI
TL;DR: An overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine, is presented.
Abstract: Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine.

140 citations

Journal ArticleDOI
TL;DR: This article provides a comprehensive yet up-to-date survey for event extraction from text, which not only summarizes the task definitions, data sources and performance evaluations, but also provides a taxonomy for its solution approaches.
Abstract: Numerous important events happen every day and everywhere but are reported in different media sources with different narrative styles. How to detect whether real-world events have been reported in articles and posts is one of the main tasks of event extraction. Other tasks include extracting event arguments and identifying their roles, as well as clustering and tracking similar events from different texts. As one of the most important research themes in natural language processing and understanding, event extraction has a wide range of applications in diverse domains and has been intensively researched for decades. This article provides a comprehensive yet up-to-date survey for event extraction from text. We not only summarize the task definitions, data sources and performance evaluations for event extraction, but also provide a taxonomy for its solution approaches. In each solution group, we provide detailed analysis for the most representative methods, especially their origins, basics, strengths and weaknesses. Last, we also present our vision of future research directions.

106 citations


Cites background from "Overview of BioNLP Shared Task 2013..."

  • ...In addition, there are also some other event public evaluation programs for event extraction in specific domains, such as the BioNLP in the biomedical domain [20], the TimeBANK for extracting temporal information of events [21]....

    [...]

  • ...event corpus, GeneReg corpus and PPI corpora [12], [20]....

    [...]

References
Proceedings ArticleDOI
25 Jun 2005
TL;DR: This paper describes a simple yet novel method for constructing sets of 50-best parses based on a coarse-to-fine generative parser that generates 50-best lists that are of substantially higher quality than previously obtainable.
Abstract: Discriminative reranking is one method for constructing high-performance statistical parsers (Collins, 2000). A discriminative reranker requires a source of candidate parses for each sentence. This paper describes a simple yet novel method for constructing sets of 50-best parses based on a coarse-to-fine generative parser (Charniak, 2000). This method generates 50-best lists that are of substantially higher quality than previously obtainable. We used these parses as the input to a MaxEnt reranker (Johnson et al., 1999; Riezler et al., 2002) that selects the best parse from the set of parses for each sentence, obtaining an f-score of 91.0% on sentences of length 100 or less.

1,156 citations

Proceedings ArticleDOI
05 Jun 2009
TL;DR: The design and implementation of the BioNLP'09 Shared Task is presented, indicating that state-of-the-art performance is approaching a practically applicable level and revealing some remaining challenges.
Abstract: The paper presents the design and implementation of the BioNLP'09 Shared Task, and reports the final results with analysis. The shared task consists of three sub-tasks, each of which addresses bio-molecular event extraction at a different level of specificity. The data was developed based on the GENIA event corpus. The shared task was run over 12 weeks, drawing initial interest from 42 teams. Of these teams, 24 submitted final results. The evaluation results are encouraging, indicating that state-of-the-art performance is approaching a practically applicable level and revealing some remaining challenges.

633 citations


Additional excerpts

  • ...BioNLP-ST 2013 follows the general outline and goals of the previous tasks, namely BioNLP-ST’09 (Kim et al., 2009) and BioNLP-ST’11 (Kim et al....

    [...]

  • ...It was first organized as the sole task of the initial 2009 edition of BioNLP-ST (Kim et al., 2009)....

    [...]

01 Jan 2007
TL;DR: An error measure is defined, the slot error rate, which combines the different types of error directly, without having to resort to precision and recall as preliminary measures.
Abstract: While precision and recall have served the information extraction community well as two separate measures of system performance, we show that the F-measure, the weighted harmonic mean of precision and recall, exhibits certain undesirable behaviors. To overcome these limitations, we define an error measure, the slot error rate, which combines the different types of error directly, without having to resort to precision and recall as preliminary measures. The slot error rate is analogous to the word error rate that is used for measuring speech recognition performance; it is intended to be a measure of the cost to the user for the system to make the different types of errors.

609 citations


Additional excerpts

  • ...The network prediction submissions have been evaluated against the reference network using an original metric, the Slot Error Rate (Makhoul et al., 1999) that is more adapted to graph comparison than the usual Recall, Precision and F-score measures....

    [...]
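
The two measures contrasted in the abstract above can be sketched briefly. This is a minimal illustration, assuming the commonly cited form of the slot error rate, SER = (S + D + I) / N, where S, D and I count substituted, deleted and inserted slots and N is the number of slots in the reference; the F-measure is the weighted harmonic mean of precision and recall. The function names and the example counts are hypothetical and are not taken from the shared task's evaluation software.

```python
def slot_error_rate(substitutions: int, deletions: int,
                    insertions: int, reference_slots: int) -> float:
    """Assumed form: SER = (S + D + I) / N, with N the reference slot count."""
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    return (substitutions + deletions + insertions) / reference_slots


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only.
correct, substitutions, deletions, insertions = 6, 2, 2, 1
reference_slots = correct + substitutions + deletions    # N: slots in the reference
hypothesis_slots = correct + substitutions + insertions  # M: slots in the prediction

ser = slot_error_rate(substitutions, deletions, insertions, reference_slots)
precision = correct / hypothesis_slots
recall = correct / reference_slots
f1 = f_measure(precision, recall)

print(f"SER = {ser:.3f}, P = {precision:.3f}, R = {recall:.3f}, F1 = {f1:.3f}")
```

Under this form the slot error rate counts every error type once against the reference size, and it can exceed 1 when a prediction contains many spurious slots.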

Book ChapterDOI
TL;DR: This chapter first discusses how biological knowledge is represented, particularly the importance of ontologies or standards in systems biology research, and uses PANTHER Pathway as an example to illustrate how ontologies and standards play a role in data modeling, data entry, and data display.
Abstract: The availability of whole genome sequences from various model organisms and increasing experimental data and literature stimulated the evolution of a systems approach for biological research. The development of computational tools and algorithms to study biological pathway networks has made great progress in helping analyze research data. Pathway databases become an integral part of such an approach. This chapter first discusses how biological knowledge is represented, particularly the importance of ontologies or standards in systems biology research. Next, we use PANTHER Pathway as an example to illustrate how ontologies and standards play a role in data modeling, data entry, and data display. Last, we describe the usage of such systems. We also describe the computational tools that utilize PANTHER pathway information to analyze gene expression experimental data.

545 citations


Additional excerpts

  • ...The PC task corpus was newly annotated for the task and consists of 525 PubMed abstracts, chosen for the relevance to specific pathway reactions selected from SBML models registered in BioModels and PANTHER DB repositories (Mi and Thomas, 2009)....

    [...]

Journal ArticleDOI
TL;DR: A corpus of discharge summaries annotated with temporal information was provided to be used for the development and evaluation of temporal reasoning systems, and the best systems overwhelmingly adopted a rule based approach for value normalization.

440 citations


Additional excerpts

  • ..., 2011) and i2b2 (Informatics for Integrating Biology and the Bedside) Shared-Tasks (Sun et al., 2013)....

    [...]