Special Report: NCBI disease corpus: A resource for disease name recognition and concept normalization

doi:10.1016/J.JBI.2013.12.006

Open access journal article

Rezarta Islamaj Dogan et al.

Journal of Biomedical Informatics, Vol. 47, pp. 1-10, 01 Feb 2014

TL;DR: The results show that the NCBI disease corpus has the potential to significantly improve the state of the art in disease name recognition and normalization research by providing a high-quality gold standard, thus enabling the development of machine-learning-based approaches for these tasks.

About:

This article was published in the Journal of Biomedical Informatics on 2014-02-01 and is open access. It has received 506 citations to date. Its main topic is named-entity recognition.

Citations

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Jinhyuk Lee et al. Journal article. Bioinformatics, 25 Jan 2019.
TL;DR: This article proposes BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora.

SciBERT: A Pretrained Language Model for Scientific Text
Iz Beltagy et al. Proceedings article.
TL;DR: SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Yu Gu et al. Journal article. arXiv: Computation and Language, 31 Jul 2020.
TL;DR: The authors show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch yields substantial gains over continual pretraining of general-domain language models.

BioCreative V CDR task corpus: a resource for chemical disease relation extraction
Jiao Li et al. Journal article. Database, 01 Jan 2016.
TL;DR: The BC5CDR corpus was successfully used for the BioCreative V challenge tasks and should serve as a valuable resource for the text-mining research community.

Deep learning with word embeddings improves biomedical named entity recognition.
Maryam Habibi et al. Journal article. Bioinformatics, 15 Jul 2017.
TL;DR: This work shows that a completely generic method based on deep learning and statistical word embeddings, a long short-term memory network with a conditional random field layer (LSTM-CRF), outperforms state-of-the-art entity-specific NER tools, often by a large margin.

References

The Unified Medical Language System (UMLS): integrating biomedical terminology
Olivier Bodenreider. Journal article. Nucleic Acids Research, 01 Jan 2004.
TL;DR: The Unified Medical Language System is a repository of biomedical vocabularies developed by the US National Library of Medicine and includes tools for customizing the Metathesaurus (MetamorphoSys), for generating lexical variants of concept names (lvg) and for extracting UMLS concepts from text (MetaMap).

Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program
Alan R. Aronson. Proceedings article.
TL;DR: MetaMap is a system developed at the National Library of Medicine (NLM) to map biomedical text to the UMLS Metathesaurus or, equivalently, to discover Metathesaurus concepts referred to in text.

GENIA corpus: a semantically annotated corpus for bio-textmining
Jin-Dong Kim et al. Journal article. Bioinformatics, 03 Jul 2003.
TL;DR: The GENIA corpus is a large corpus of 2,000 MEDLINE abstracts with more than 400,000 words and almost 100,000 annotations of biological terms, intended for bio-text mining.

A large-scale evaluation of computational protein function prediction
Predrag Radivojac et al. Journal article. Nature Methods, 01 Mar 2013.
TL;DR: Today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets, yet there is considerable need for improvement of currently available tools.

Disease Ontology: a backbone for disease semantic integration
Lynn M. Schriml et al. Journal article. Nucleic Acids Research, 01 Jan 2012.
TL;DR: The next iteration of the DO web browser will integrate DO's extended relations and logical definition representation along with these biomedical resource cross-mappings.

Related Papers (5)

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
Jinhyuk Lee et al. Bioinformatics, 25 Jan 2019.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Neural Architectures for Named Entity Recognition

The Unified Medical Language System (UMLS): integrating biomedical terminology
Nucleic Acids Research

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
John Lafferty et al.