BioCreAtIvE Task 1A: gene mention finding evaluation
TL;DR: The 80% plus F-measure results are good, but still somewhat lag the best scores achieved in some other domains such as newswire, due in part to the complexity and length of gene names, compared to person or organization names in newswire.
Abstract:
The biological research literature is a major repository of knowledge. As the amount of literature increases, it will get harder to find the information of interest on a particular topic. There has been an increasing amount of work on text mining this literature, but comparing this work is hard because of a lack of standards for making comparisons. To address this, we worked with colleagues at the Protein Design Group, CNB-CSIC, Madrid to develop BioCreAtIvE (Critical Assessment for Information Extraction in Biology), an open common evaluation of systems on a number of biological text mining tasks. We report here on task 1A, which deals with finding mentions of genes and related entities in text. "Finding mentions" is a basic task, which can be used as a building block for other text mining tasks. The task makes use of data and evaluation software provided by the (US) National Center for Biotechnology Information (NCBI). Fifteen teams took part in task 1A. A number of teams achieved scores over 80% F-measure (balanced precision and recall). The teams that tried to use their task 1A systems to help on other BioCreAtIvE tasks reported mixed results. The 80% plus F-measure results are good, but still somewhat lag the best scores achieved in some other domains such as newswire, due in part to the complexity and length of gene names, compared to person or organization names in newswire.
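As a reading aid for the "balanced precision and recall" figure quoted in the abstract, the sketch below computes precision, recall, and the balanced F-measure (their harmonic mean) from mention counts. It assumes simple exact-match counting; the NCBI evaluation software used in task 1A may count matches differently, and the function name and the counts are hypothetical illustrations, not results from the evaluation.

```python
def f_measure(true_positives, false_positives, false_negatives):
    """Return (precision, recall, balanced F-measure) from mention counts.

    Assumption: a predicted mention counts as correct only if it matches
    a gold-standard mention; how matches are decided is outside this sketch.
    """
    predicted = true_positives + false_positives   # mentions the system output
    actual = true_positives + false_negatives      # mentions in the gold standard
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical counts chosen for illustration only.
p, r, f = f_measure(true_positives=80, false_positives=20, false_negatives=20)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a system cannot reach a high F-measure by trading away recall for precision or the reverse.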
Citations
Proceedings Article
An Analysis of Active Learning Strategies for Sequence Labeling Tasks
Burr Settles, Mark Craven
TL;DR: This paper surveys previously used query selection strategies for sequence models, and proposes several novel algorithms to address their shortcomings, and conducts a large-scale empirical comparison.
Journal Article
Overview of BioCreAtIvE: critical assessment of information extraction for biology
TL;DR: The first BioCreAtIvE assessment provided state-of-the-art performance results for a basic task (gene name finding and normalization), where the best systems achieved a balanced 80% precision / recall or better, which potentially makes them suitable for real applications in biology.
Proceedings Article
BANNER: an executable survey of advances in biomedical named entity recognition.
Robert Leaman, Graciela Gonzalez
TL;DR: BANNER is an open-source, executable survey of advances in biomedical named entity recognition, intended to serve as a benchmark for the field and is designed to maximize domain independence by not employing brittle semantic features or rule-based processing steps.
Journal Article
Overview of BioCreative II gene mention recognition
Larry Smith, Lorraine K. Tanabe, Rie Johnson nee Ando, Cheng-Ju Kuo, I-Fang Chung, Chun-Nan Hsu, Yu-Shi Lin, Roman Klinger, Christoph M. Friedrich, Kuzman Ganchev, Manabu Torii, Hongfang Liu, Barry Haddow, Craig A. Struble, Richard J. Povinelli, Andreas Vlachos, William A. Baumgartner, Lawrence Hunter, Bob Carpenter, Richard Tzong-Han Tsai, Hong-Jie Dai, Feng Liu, Yifei Chen, Chengjie Sun, Sophia Katrenko, Pieter Adriaans, Christian Blaschke, Rafael Torres, Mariana Neves, Preslav Nakov, Anna Divoli, Manuel Maña-López, Jacinto Mata, W. John Wilbur
TL;DR: It is demonstrated that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
Journal Article
The CHEMDNER corpus of chemicals and drugs and its annotation principles.
Martin Krallinger, Obdulia Rabal, Florian Leitner, Miguel Vazquez, David Salgado, Zhiyong Lu, Robert Leaman, Yanan Lu, Donghong Ji, Daniel M. Lowe, Roger A. Sayle, Riza Theresa Batista-Navarro, Rafal Rak, Torsten Huber, Tim Rocktäschel, Sérgio Matos, David Campos, Buzhou Tang, Hua Xu, Tsendsuren Munkhdalai, Keun Ho Ryu, S. V. Ramanan, Senthil Nathan, Slavko Žitnik, Marko Bajec, Lutz Weber, Matthias Irmer, Saber A. Akhondi, Jan A. Kors, Shuo Xu, Xin An, Utpal Kumar Sikdar, Asif Ekbal, Masaharu Yoshioka, Thaer M. Dieb, Miji Choi, Karin Verspoor, Madian Khabsa, C. Lee Giles, Hongfang Liu, Komandur Elayavilli Ravikumar, Andre Lamurias, Francisco M. Couto, Hong-Jie Dai, Richard Tzong-Han Tsai, Caglar Ata, Tolga Can, Anabel Usié, Rui Alves, Isabel Segura-Bedmar, Paloma Martínez, Julen Oyarzabal, Alfonso Valencia
TL;DR: The CHEMDNER corpus is presented, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task.
References
Proceedings Article
Transductive Inference for Text Classification using Support Vector Machines
TL;DR: An analysis of why Transductive Support Vector Machines are well suited for text classification is presented, and an algorithm for training TSVMs, handling 10,000 examples and more, is proposed.
Book
Computer-intensive methods for testing hypotheses : an introduction
TL;DR: Approximate Randomization Tests.
Proceedings Article
More accurate tests for the statistical significance of result differences
TL;DR: It is found in a set of experiments that many commonly used tests often underestimate the significance and so are less likely to detect differences that exist between different techniques, including computationally-intensive randomization tests.
Journal Article
Accomplishments and challenges in literature data mining for biology
TL;DR: To encourage participation and accelerate progress in this expanding field of literature data mining, it is proposed creating challenge evaluations, and two specific applications are described in this context.
Related Papers (5)
GENIA corpus—a semantically annotated corpus for bio-textmining
Gene Ontology: tool for the unification of biology
M Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock