Open Access · Proceedings Article · DOI

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus

TL;DR
By applying a novel knowledge augmentation strategy, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models on common named-entity recognition (NER) and clinical natural language inference tasks.
Abstract
Contextual word embedding models, such as BioBERT and Bio_ClinicalBERT, have achieved state-of-the-art results in biomedical natural language processing tasks by focusing their pre-training process on domain-specific corpora. However, such models do not take into consideration structured expert domain knowledge from a knowledge base. We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process via a novel knowledge augmentation strategy. More specifically, the augmentation on UmlsBERT with the Unified Medical Language System (UMLS) Metathesaurus is performed in two ways: i) connecting words that have the same underlying ‘concept’ in UMLS and ii) leveraging semantic type knowledge in UMLS to create clinically meaningful input embeddings. By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models on common named-entity recognition (NER) and clinical natural language inference tasks.
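To make the second idea concrete, the following minimal PyTorch sketch adds a learned UMLS semantic-type embedding on top of the usual BERT token, position, and segment embeddings. The module name, vocabulary sizes, and the convention of reserving id 0 for tokens without a semantic type are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class UmlsAugmentedEmbeddings(nn.Module):
    """Illustrative sketch: BERT-style input embeddings extended with a
    UMLS semantic-type embedding (sizes and id conventions are assumptions)."""

    def __init__(self, vocab_size=30522, hidden_size=768, max_position=512,
                 type_vocab_size=2, num_semantic_types=45, pad_semtype_id=0):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.position_embeddings = nn.Embedding(max_position, hidden_size)
        self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)
        # Extra table: one vector per UMLS semantic type; id 0 means "no type".
        self.semantic_type_embeddings = nn.Embedding(
            num_semantic_types + 1, hidden_size, padding_idx=pad_semtype_id)
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.dropout = nn.Dropout(0.1)

    def forward(self, input_ids, token_type_ids, semantic_type_ids):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        positions = positions.unsqueeze(0).expand_as(input_ids)
        emb = (self.word_embeddings(input_ids)
               + self.position_embeddings(positions)
               + self.token_type_embeddings(token_type_ids)
               + self.semantic_type_embeddings(semantic_type_ids))
        return self.dropout(self.layer_norm(emb))
```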


Citations
Proceedings Article · DOI

Self-alignment pretraining for biomedical entity representations

TL;DR: SapBERT offers a one-model-for-all solution to medical entity linking (MEL), setting a new state-of-the-art (SOTA) on six MEL benchmark datasets and reaching SOTA even without task-specific supervision.
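A SapBERT-style encoder is typically used for MEL by embedding a mention and a dictionary of candidate names with the same model and picking the nearest name by cosine similarity. The sketch below assumes a Hugging Face checkpoint name and simple [CLS] pooling, both of which may differ from the released model.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Checkpoint name is an assumption; substitute the released SapBERT weights.
MODEL = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"
tok = AutoTokenizer.from_pretrained(MODEL)
enc = AutoModel.from_pretrained(MODEL)

def embed(names):
    batch = tok(names, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch).last_hidden_state[:, 0]   # [CLS] pooling
    return torch.nn.functional.normalize(out, dim=-1)

dictionary = ["myocardial infarction", "cerebral infarction", "migraine"]
mention = ["heart attack"]
scores = embed(mention) @ embed(dictionary).T        # cosine similarities
print(dictionary[scores.argmax().item()])            # nearest dictionary name
```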
Posted Content

Improving Biomedical Pretrained Language Models with Knowledge

TL;DR: KeBioLM is a biomedical pretrained language model that explicitly leverages knowledge from the UMLS knowledge bases and is better able to model medical knowledge.
Journal Article · DOI

AMMU: A survey of transformer-based biomedical pretrained language models

TL;DR: Transformer-based pre-trained language models (PLMs), which combine the power of transformers, transfer learning, and self-supervised learning, have started a new era in modern natural language processing (NLP); this survey reviews their biomedical variants.
Journal Article · DOI

Pre-trained language models with domain knowledge for biomedical extractive summarization

TL;DR: The authors propose a knowledge infusion training framework for biomedical text summarization that combines generative and discriminative training techniques to fuse domain knowledge into knowledge adapters, and then applies adapter fusion to efficiently inject those adapters into the base PLMs when fine-tuning for extractive summarization.
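The knowledge adapters referred to here are generally small bottleneck modules inserted into a frozen transformer. The sketch below shows the generic down-project/up-project form with a residual connection; the dimensions are chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# During knowledge infusion only the adapter parameters are trained;
# the backbone PLM stays frozen.
adapter = BottleneckAdapter()
x = torch.randn(2, 16, 768)      # (batch, tokens, hidden)
print(adapter(x).shape)          # torch.Size([2, 16, 768])
```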
Journal Article · DOI

ImpressionGPT: An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT

TL;DR: This article proposes ImpressionGPT, which leverages the in-context learning capability of LLMs by constructing dynamic contexts from domain-specific, individualized data, and designs an iterative optimization algorithm that automatically evaluates the generated impressions and composes corresponding instruction prompts to further optimize the model.
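The dynamic-context idea can be sketched as retrieving a few similar reports and placing them in the prompt before the new case. The similarity measure, prompt template, and the `call_llm` helper mentioned in the comment below are simplified placeholders, not the authors' implementation.

```python
def build_dynamic_prompt(findings, example_pool, k=2):
    """Pick the k most similar reports (crude word-overlap placeholder)
    and format them as in-context examples ahead of the new findings."""
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))

    examples = sorted(example_pool,
                      key=lambda ex: overlap(findings, ex["findings"]),
                      reverse=True)[:k]
    shots = "\n\n".join(
        f"Findings: {ex['findings']}\nImpression: {ex['impression']}"
        for ex in examples)
    return f"{shots}\n\nFindings: {findings}\nImpression:"

# A hypothetical call_llm(prompt) would send this prompt to a chat model,
# and an outer loop would score the returned impression and adjust the
# instructions before the next iteration.
```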
References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
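For reference, the scaled dot-product attention at the core of the Transformer can be written in a few lines of PyTorch (a textbook sketch, not code from the paper):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 64)                     # (batch, tokens, d_k)
print(scaled_dot_product_attention(q, k, v).shape)    # torch.Size([1, 8, 64])
```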
Journal Article

Visualizing Data using t-SNE

TL;DR: t-SNE is a new technique that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
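A typical way to produce such a two-dimensional map of embeddings is scikit-learn's TSNE; the perplexity value and the random input below are arbitrary examples.

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(200, 768)     # e.g. 200 contextual word embeddings
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(coords.shape)              # (200, 2): one 2-D point per embedding
```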
Proceedings Article · DOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
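The "one additional output layer" recipe is what the standard classification heads in the Hugging Face transformers library implement; a minimal sketch, with the checkpoint name and label count as placeholders:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"      # any BERT-style checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
# Adds a single randomly initialized classification layer on top of BERT.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

batch = tok(["The patient denies chest pain."], return_tensors="pt")
logits = model(**batch).logits   # shape (1, 3); fine-tune end to end as usual
print(logits.shape)
```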
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
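Both ideas, phrase detection and negative sampling, are available in Gensim; the toy corpus and hyperparameters below are arbitrary and only meant to show the API.

```python
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser

sentences = [["heart", "attack", "patient"],
             ["myocardial", "infarction", "patient"],
             ["heart", "attack", "risk"]] * 50

# Phrase detection: frequent co-occurring pairs become single tokens.
bigram = Phraser(Phrases(sentences, min_count=2, threshold=0.01))
print(bigram[["heart", "attack", "patient"]])   # e.g. ['heart_attack', 'patient']

# Skip-gram word2vec trained with negative sampling (negative=5).
phrased = [bigram[s] for s in sentences]
model = Word2Vec(phrased, vector_size=50, sg=1, negative=5,
                 min_count=1, epochs=10)
print(model.wv.most_similar("patient", topn=3))
```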
Journal Article · DOI

WordNet: a lexical database for English

TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing; it is an online lexical database designed for use under program control.
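WordNet can be queried programmatically, for example through NLTK (the corpus download is a one-time step):

```python
import nltk
nltk.download("wordnet", quiet=True)          # one-time corpus download
from nltk.corpus import wordnet as wn

# List the senses (synsets) of a word with their glosses.
for synset in wn.synsets("heart"):
    print(synset.name(), "-", synset.definition())

# Synonyms grouped under the first sense.
print([lemma.name() for lemma in wn.synsets("heart")[0].lemmas()])
```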
Trending Questions (1)
What is the most recent model in UMLS medical term embedding?

The paper does not identify the most recent UMLS medical term embedding model; it introduces UmlsBERT, a model that integrates domain knowledge from the UMLS Metathesaurus during the pre-training process.