A dictionary to identify small molecules and drugs in free text

doi:10.1093/BIOINFORMATICS/BTP535

Open Access Journal Article

Kristina Hettne, +7 more

- 01 Nov 2009 -

Bioinformatics

- Vol. 25, Iss: 22, pp 2983-2991

TLDR

A dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus is developed.

Abstract:

Motivation: The scientific community has put considerable effort into the correct identification of gene and protein names in text, but much less into the correct identification of chemical names. Dictionary-based term identification can recognize the diverse representations of chemical information in the literature and map the chemicals to their database identifiers.

Results: We developed a dictionary for the identification of small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus. Rule-based term filtering, a manual check of highly frequent terms and disambiguation rules were applied. We tested the combined dictionary and the dictionaries derived from the individual resources on an annotated corpus, and conclude the following: (i) each of the processing steps increases precision with a minor loss of recall; (ii) the overall performance of the combined dictionary is acceptable (precision 0.67, recall 0.40; recall 0.80 for trivial names); (iii) the combined dictionary performed better than the dictionary in the chemical recognizer OSCAR3; (iv) the performance of a dictionary based on ChemIDplus alone is comparable to the performance of the combined dictionary.

Availability: The combined dictionary is freely available as an XML file in Simple Knowledge Organization System format on the web site http://www.biosemantics.org/chemlist.

Contact: k.hettne@erasmusmc.nl

Supplementary information: Supplementary data are available at Bioinformatics online.
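The dictionary-based approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the placeholder identifiers (`CHEM:0001` and so on), the blocked-term set and the helper names `find_chemicals` and `precision_recall_f1` are all assumptions made for illustration. The sketch matches dictionary terms in text, maps them to identifiers, and shows how filtering out an ambiguous term can raise precision on a tiny hand-made gold standard.

```python
import re

# Hypothetical toy dictionary: term -> identifier (placeholder IDs, not real ones).
TOY_DICTIONARY = {
    "aspirin": "CHEM:0001",
    "acetylsalicylic acid": "CHEM:0001",
    "caffeine": "CHEM:0002",
    "lead": "CHEM:0003",
}

# Hypothetical filter list: terms that are often ordinary English words.
AMBIGUOUS_TERMS = {"lead"}


def build_matcher(dictionary, blocked=frozenset()):
    """Compile one case-insensitive pattern over all non-blocked terms."""
    terms = [t for t in dictionary if t not in blocked]
    # Longest terms first, so multi-word names win over their substrings.
    terms.sort(key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def find_chemicals(text, dictionary, blocked=frozenset()):
    """Return (start, end, identifier) for every dictionary hit in text."""
    matcher = build_matcher(dictionary, blocked)
    return [
        (m.start(), m.end(), dictionary[m.group(0).lower()])
        for m in matcher.finditer(text)
    ]


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over annotation tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


text = "Aspirin and caffeine may lead to interactions."
# Hand-made gold annotations for this one sentence.
gold = [(0, 7, "CHEM:0001"), (12, 20, "CHEM:0002")]

unfiltered = find_chemicals(text, TOY_DICTIONARY)
filtered = find_chemicals(text, TOY_DICTIONARY, AMBIGUOUS_TERMS)

print(precision_recall_f1(unfiltered, gold))
print(precision_recall_f1(filtered, gold))
```

In this toy case the verb "lead" is a false positive for the unfiltered dictionary, and blocking it removes the error without losing a true match. On a real corpus, filtering would also be expected to discard some genuine mentions, which is consistent with the minor loss of recall reported above.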

Citations

Journal Article

Deep learning with word embeddings improves biomedical named entity recognition.

Maryam Habibi, +4 more

- 15 Jul 2017 -

Bioinformatics

TL;DR: This work shows that a completely generic method based on deep learning and statistical word embeddings [called long short‐term memory network‐conditional random field (LSTM‐CRF)] outperforms state‐of‐the‐art entity‐specific NER tools, and often by a large margin.

Journal Article

An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition

George Tsatsaronis, +21 more

- 30 Apr 2015 -

BMC Bioinformatics

TL;DR: Overall, BioASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.

Proceedings Article

A Survey on Recent Advances in Named Entity Recognition from Deep Learning models

Vikas Yadav, +1 more

TL;DR: This work presents a comprehensive survey of deep neural network architectures for NER and contrasts them with previous approaches to NER based on feature engineering and other supervised or semi-supervised learning algorithms.

Journal Article

An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition

Ling Luo, +6 more

- 15 Apr 2018 -

Bioinformatics

TL;DR: This work proposes a neural network approach, attention‐based bidirectional Long Short‐Term Memory with a conditional random field layer (Att‐BiLSTM‐CRF), for document‐level chemical NER that achieves better performance with little feature engineering than other state‐of‐the‐art methods.

Journal Article

tmChem: a high performance approach for chemical named entity recognition and normalization

Robert Leaman, +2 more

- 19 Jan 2015 -

Journal of Cheminformatics

TL;DR: tmChem is a state-of-the-art system for chemical named entity recognition that combines two independent machine learning models in an ensemble, achieving a micro-averaged F-measure of 0.8739 on the CEM subtask (mention-level evaluation).
