Special Report: NCBI disease corpus: A resource for disease name recognition and concept normalization
TLDR
The results show that the NCBI disease corpus has the potential to significantly improve the state of the art in disease name recognition and normalization research by providing a high-quality gold standard, thus enabling the development of machine-learning-based approaches for such tasks.
About
This article is published in Journal of Biomedical Informatics. The article was published on 2014-02-01 and is currently open access. It has received 506 citations to date. The article focuses on the topic: Named-entity recognition.
Citations
Journal Article
BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
TL;DR: This article proposed BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora.
Proceedings Article
SciBERT: A Pretrained Language Model for Scientific Text
Iz Beltagy, Kyle Lo, Arman Cohan
TL;DR: SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.
Journal Article
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon
TL;DR: It is shown that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.
Journal Article
BioCreative V CDR task corpus: a resource for chemical disease relation extraction
Jiao Li, Yueping Sun, Robin J. Johnson, Daniela Sciaky, Chih-Hsuan Wei, Robert Leaman, Allan Peter Davis, Carolyn J. Mattingly, Thomas C. Wiegers, Zhiyong Lu
TL;DR: The BC5CDR corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.
Journal Article
Deep learning with word embeddings improves biomedical named entity recognition.
TL;DR: This work shows that a completely generic method based on deep learning and statistical word embeddings [called long short‐term memory network‐conditional random field (LSTM‐CRF)] outperforms state‐of‐the‐art entity‐specific NER tools, and often by a large margin.
References
Journal Article
The Unified Medical Language System (UMLS): integrating biomedical terminology
TL;DR: The Unified Medical Language System is a repository of biomedical vocabularies developed by the US National Library of Medicine and includes tools for customizing the Metathesaurus (MetamorphoSys), for generating lexical variants of concept names (lvg) and for extracting UMLS concepts from text (MetaMap).
Proceedings Article
Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program
TL;DR: MetaMap is a system developed at the National Library of Medicine (NLM) to map biomedical text to the UMLS Metathesaurus or, equivalently, to discover Metathesaurus concepts referred to in text.
Journal Article
GENIA corpus—a semantically annotated corpus for bio-textmining
TL;DR: The GENIA corpus is a collection of 2000 MEDLINE abstracts, with more than 400,000 words and almost 100,000 annotations for biological terms, built for bio-text mining.
Journal Article
A large-scale evaluation of computational protein function prediction
Predrag Radivojac, Wyatt T. Clark, Tal Ronnen Oron, Alexandra M. Schnoes, Tobias Wittkop, Artem Sokolov, Kiley Graim, Christopher S. Funk, Karin Verspoor, Asa Ben-Hur, Gaurav Pandey, Jeffrey M. Yunes, Ameet Talwalkar, Susanna Repo, Michael L Souza, Damiano Piovesan, Rita Casadio, Zheng Wang, Jianlin Cheng, Hai Fang, Julian Gough, Patrik Koskinen, Petri Törönen, Jussi Nokso-Koivisto, Liisa Holm, Domenico Cozzetto, Daniel W. A. Buchan, Kevin Bryson, David T. Jones, Bhakti Limaye, Harshal Inamdar, Avik Datta, Sunitha K Manjari, Rajendra Joshi, Meghana Chitale, Daisuke Kihara, Andreas Martin Lisewski, Serkan Erdin, Eric Venner, Olivier Lichtarge, Robert Rentzsch, Haixuan Yang, Alfonso E. Romero, Prajwal Bhat, Alberto Paccanaro, Tobias Hamp, Rebecca Kaßner, Stefan Seemayer, Esmeralda Vicedo, Christian Schaefer, Dominik Achten, Florian Auer, Ariane Boehm, Tatjana Braun, Maximilian Hecht, Mark Heron, Peter Hönigschmid, Thomas A. Hopf, Stefanie Kaufmann, Michael Kiening, Denis Krompass, Cedric Landerer, Yannick Mahlich, Manfred Roos, Jari Björne, Tapio Salakoski, Andrew Wong, Hagit Shatkay, Fanny Gatzmann, Ingolf Sommer, Mark N. Wass, Michael J.E. Sternberg, Nives Škunca, Fran Supek, Matko Bošnjak, Panče Panov, Sašo Džeroski, Tomislav Šmuc, Yiannis A. I. Kourmpetis, Aalt D. J. van Dijk, Cajo J. F. ter Braak, Yuanpeng Zhou, Qingtian Gong, Xinran Dong, Weidong Tian, Marco Falda, Paolo Fontana, Enrico Lavezzo, Barbara Di Camillo, Stefano Toppo, Liang Lan, Nemanja Djuric, Yuhong Guo, Slobodan Vucetic, Amos Marc Bairoch, Michal Linial, Patricia C. Babbitt, Steven E. Brenner, Christine A. Orengo, Burkhard Rost, Sean D. Mooney, Iddo Friedberg
TL;DR: Today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets, and there is considerable need for improvement of currently available tools.
Journal Article
Disease Ontology: a backbone for disease semantic integration
Lynn M. Schriml, Cesar Arze, Suvarna Nadendla, Yu-Wei Wayne Chang, Mark J. Mazaitis, Victor Felix, Gang Feng, Warren A. Kibbe
TL;DR: The next iteration of the DO web browser will integrate DO's extended relations and logical definition representation along with these biomedical resource cross-mappings.