Open Access Proceedings Article

Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus

TLDR
This article proposes a supervised learning method that improves the UMLS Metathesaurus construction process by suggesting synonymous term pairs, and that scales to the size and diversity of the source vocabularies.
Abstract
With 214 source vocabularies, the UMLS (Unified Medical Language System) Metathesaurus terminology integration system is costly, time-consuming, and error-prone to construct and maintain, because the process relies primarily on (1) lexical and semantic processing to suggest groupings of synonymous terms and (2) the expertise of UMLS editors to curate these synonymy predictions. This paper aims to improve the UMLS Metathesaurus construction process with a novel supervised learning approach to suggesting synonymous pairs that scales to the size and diversity of the UMLS source vocabularies. We evaluate this deep learning (DL) approach against a rule-based approach (RBA) that approximates the current UMLS Metathesaurus construction process. The key to the generalizability of our approach is the use of negative pairs with varying degrees of lexical similarity during training. Our initial experiments show that the DL approach performs strongly across multiple datasets, with recall of 91-92%, precision of 88-99%, and F1 score of 89-95%. It outperforms the RBA in recall (+23%), precision (+2.4%), and F1 score (+14.1%). This approach has great potential to improve the UMLS Metathesaurus construction process by providing better synonymy suggestions to UMLS editors.
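The abstract attributes the approach's generalizability to negative pairs with varying degrees of lexical similarity. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' pipeline, and the paper's actual similarity measure, thresholds, and sampling ratios are not given here. Assumed for illustration: token-level Jaccard similarity as the lexical measure, three hypothetical similarity levels (`high`, `medium`, `low`) with arbitrary cut-offs, the hypothetical helpers `build_training_pairs` and `similarity_level`, and toy terms with placeholder concept IDs `C1` to `C4` rather than real UMLS CUIs.

```python
import random
import re
from collections import defaultdict
from itertools import combinations


def token_set(term):
    """Lowercase a term and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", term.lower()))


def jaccard(a, b):
    """Token-level Jaccard similarity between two terms, in [0, 1]."""
    ta, tb = token_set(a), token_set(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def similarity_level(sim):
    """Map a similarity score to a coarse level (hypothetical thresholds)."""
    if sim >= 0.5:
        return "high"
    if sim > 0.0:
        return "medium"
    return "low"


def build_training_pairs(atoms, per_level=2, seed=0):
    """Build labeled term pairs from (term, concept_id) tuples.

    Positives (label 1): two terms that share a concept ID.
    Negatives (label 0): terms from different concepts, bucketed by lexical
    similarity level and sampled so that every level is represented.
    All pairs are enumerated, so this only suits small toy inputs.
    """
    positives = []
    buckets = defaultdict(list)
    for (t1, c1), (t2, c2) in combinations(atoms, 2):
        if c1 == c2:
            positives.append((t1, t2, 1))
        else:
            level = similarity_level(jaccard(t1, t2))
            buckets[level].append((t1, t2, 0))

    rng = random.Random(seed)
    negatives = []
    for level in ("high", "medium", "low"):
        pool = buckets.get(level, [])
        # Cap each level so easy (low-similarity) negatives do not dominate.
        negatives.extend(rng.sample(pool, min(per_level, len(pool))))
    return positives, negatives, buckets


# Toy atoms with placeholder concept IDs (not real UMLS CUIs).
atoms = [
    ("heart attack", "C1"),
    ("myocardial infarction", "C1"),
    ("heart failure", "C2"),
    ("cardiac failure", "C2"),
    ("acute heart failure", "C3"),
    ("fracture of femur", "C4"),
]
positives, negatives, buckets = build_training_pairs(atoms)
print(len(positives), [len(buckets[lvl]) for lvl in ("high", "medium", "low")], len(negatives))
# prints: 2 [1, 3, 9] 5
```

In this toy data, "heart failure" and "acute heart failure" share two of three tokens yet belong to different concepts, so the pair lands in the high-similarity bucket. Deliberately including such lexically close negatives, alongside easy ones, is one way to keep a classifier from equating surface overlap with synonymy.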

Citations
Journal Article

A Simple Standard for Sharing Ontological Mappings (SSSOM)

TL;DR: The Simple Standard for Sharing Ontological Mappings (SSSOM) is a standard for describing and exchanging mappings that defines a machine-readable, extensible vocabulary for mapping metadata, making imprecision, inaccuracy, and incompleteness in mappings explicit.
Proceedings Article

Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching

TL;DR: Introduces new biomedical ontology matching (OM) tasks involving ontologies extracted from Mondo and UMLS, and proposes a comprehensive evaluation framework that measures OM performance from multiple perspectives for both ML-based and non-ML-based OM systems.
Journal Article

Adding an Attention Layer Improves the Performance of a Neural Network Architecture for Synonymy Prediction in the UMLS Metathesaurus

Vinh-Kim Nguyen, 06 Jun 2022
TL;DR: Adding an attention layer to the LSTM layer of the neural network architecture developed for predicting synonymy between terms in the UMLS Metathesaurus improves its performance, reduces the false positive rate, and minimizes the need for manual curation.
Proceedings Article

Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations

TL;DR: This work adjusts the sampling strategy used in pretraining term embeddings, supplying dynamic hard positive and negative samples during contrastive learning so that the model learns fine-grained representations that improve biomedical term clustering.
Journal Article

BIOS: An Algorithmically Generated Biomedical Knowledge Graph

TL;DR: This work introduces the Biomedical Informatics Ontology System (BIOS), the first large-scale, publicly available biomedical knowledge graph (BioMedKG) generated entirely by machine learning algorithms, and suggests that machine learning-based BioMedKG development is a viable alternative to traditional expert curation.