Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus
Vinh Nguyen,Hong Yung Yip,Olivier Bodenreider +2 more
- Vol. 2021, pp 2672-2683
TLDR
This article proposes a supervised learning approach for suggesting synonymous pairs during UMLS Metathesaurus construction, designed to scale to the size and diversity of the source vocabularies.
Abstract:
With 214 source vocabularies, the construction and maintenance process of the UMLS (Unified Medical Language System) Metathesaurus terminology integration system is costly, time-consuming, and error-prone as it primarily relies on (1) lexical and semantic processing for suggesting groupings of synonymous terms, and (2) the expertise of UMLS editors for curating these synonymy predictions. This paper aims to improve the UMLS Metathesaurus construction process by developing a novel supervised learning approach for improving the task of suggesting synonymous pairs that can scale to the size and diversity of the UMLS source vocabularies. We evaluate this deep learning (DL) approach against a rule-based approach (RBA) that approximates the current UMLS Metathesaurus construction process. The key to the generalizability of our approach is the use of various degrees of lexical similarity in negative pairs during the training process. Our initial experiments demonstrate the strong performance across multiple datasets of our DL approach in terms of recall (91-92%), precision (88-99%), and F1 score (89-95%). Our DL approach largely outperforms the RBA method in recall (+23%), precision (+2.4%), and F1 score (+14.1%). This novel approach has great potential for improving the UMLS Metathesaurus construction process by providing better synonymy suggestions to the UMLS editors.
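The abstract attributes the approach's generalizability to negative pairs that span various degrees of lexical similarity, and reports results as recall, precision, and F1. The sketch below illustrates both ideas on toy data. It is not the authors' implementation: the abstract does not say how lexical similarity is measured, so token-level Jaccard similarity, the 0.5 threshold, the bucket names, and the concept identifiers (`C1`, `C2`) are all assumptions made for illustration.

```python
from itertools import combinations, product


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (an assumed stand-in for lexical similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def bucket_negative_pairs(terms_by_concept, high_threshold=0.5):
    """Group non-synonymous term pairs by how lexically similar they are.

    terms_by_concept maps a (hypothetical) concept ID to its synonymous terms.
    Terms from different concepts are treated as negative (non-synonymous) pairs.
    """
    buckets = {"none": [], "low": [], "high": []}
    for (_, terms_a), (_, terms_b) in combinations(terms_by_concept.items(), 2):
        for a, b in product(terms_a, terms_b):
            sim = jaccard(a, b)
            if sim == 0.0:
                buckets["none"].append((a, b))
            elif sim < high_threshold:
                buckets["low"].append((a, b))
            else:
                buckets["high"].append((a, b))
    return buckets


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    toy = {
        "C1": ["heart attack", "myocardial infarction"],  # hypothetical concept IDs
        "C2": ["heart failure"],
    }
    for name, pairs in bucket_negative_pairs(toy).items():
        print(name, pairs)
    print(precision_recall_f1(tp=9, fp=1, fn=1))
```

Sampling training negatives from each bucket, rather than only from the easy "none" bucket, is one way to expose a model to pairs that look alike but are not synonymous, such as "heart attack" and "heart failure".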
Citations
Journal ArticleDOI
A Simple Standard for Sharing Ontological Mappings (SSSOM)
TL;DR: The Simple Standard for Sharing Ontological Mappings (SSSOM) defines a machine-readable and extensible vocabulary for describing mapping metadata, making imprecision, inaccuracy, and incompleteness in mappings explicit.
Proceedings ArticleDOI
Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching
TL;DR: New biomedical OM tasks involving ontologies extracted from Mondo and UMLS are introduced and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems.
Journal ArticleDOI
Adding an Attention Layer Improves the Performance of a Neural Network Architecture for Synonymy Prediction in the UMLS Metathesaurus
TL;DR: An attention layer is added to the LSTM layer of the neural network architecture developed for predicting synonymy between terms in the UMLS Metathesaurus; the addition improves performance, reduces the false positive rate, and minimizes the need for manual curation.
Proceedings ArticleDOI
Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations
Sihang Zeng,Zheng Yuan,Sheng Yu +2 more
TL;DR: This work adjusts the sampling strategy in pretraining term embeddings by providing dynamic hard positive and negative samples during contrastive learning to learn fine-grained representations which result in better biomedical term clustering.
Journal ArticleDOI
BIOS: An Algorithmically Generated Biomedical Knowledge Graph
Sheng Yu,Zheng Yuan,Jun Xia,Sheng Kang Luo,Huaiyuan Ying,Sihang Zeng,Jingyi Ren,Hongyi Yuan,Zhengyun Zhao,Yucong Lin,Keming Lu,Jing Wang,Yutao Xie,Heung-Yeung Shum +13 more
TL;DR: This work introduces the Biomedical Informatics Ontology System (BIOS), the first large-scale publicly available BioMedKG generated completely by machine learning algorithms, and suggests that machine learning-based BioMedKG development is a viable alternative to traditional expert curation.
Related Papers (5)
Aligning knowledge sources in the UMLS: methods, quantitative results, and applications.
Olivier Bodenreider,Anita Burgun +1 more