Open Access · Posted Content

Unsupervised Multimodal Representation Learning across Medical Images and Reports

TL;DR
This paper established baseline joint embedding results measured via both local and global retrieval methods on the soon-to-be-released MIMIC-CXR dataset consisting of both chest X-ray images and the associated radiology reports.
Abstract
Joint embeddings between medical imaging modalities and associated radiology reports have the potential to offer significant benefits to the clinical community, ranging from cross-domain retrieval to conditional generation of reports to the broader goals of multimodal representation learning. In this work, we establish baseline joint embedding results measured via both local and global retrieval methods on the soon-to-be-released MIMIC-CXR dataset consisting of both chest X-ray images and the associated radiology reports. We examine both supervised and unsupervised methods on this task and show that for document retrieval tasks with the learned representations, only a limited amount of supervision is needed to yield results comparable to those of fully supervised methods.
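The cross-domain retrieval the abstract describes amounts to nearest-neighbor search in the shared embedding space. A minimal sketch (illustrative toy vectors, not the paper's code or data): report embeddings are ranked by cosine similarity to an image embedding.

```python
import numpy as np

def retrieve(query_emb, candidate_embs, k=5):
    """Rank candidates by cosine similarity to the query in the shared space."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of each candidate to the query
    return np.argsort(-sims)[:k]  # indices of the k most similar candidates

# Toy example: four report embeddings and one image embedding
reports = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
image = np.array([0.95, 0.05])
top2 = retrieve(image, reports, k=2)
```

In practice the embeddings would come from the learned image and text encoders; the ranking step itself is this simple.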


Citations
Posted Content

Clinically Accurate Chest X-Ray Report Generation

TL;DR: A domain-aware automatic chest X-ray radiology report generation system that first predicts which topics will be discussed in the report, then conditionally generates sentences corresponding to those topics, and is fine-tuned using reinforcement learning.
Journal Article

Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

TL;DR: This paper proposes a taxonomy of six core technical challenges (representation, alignment, reasoning, generation, transference, and quantification) covering historical and recent trends, and defines two key principles, modality heterogeneity and interconnections, that have driven subsequent innovations.
Book Chapter

Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing

TL;DR: This paper proposes a self-supervised joint vision-language approach with a focus on better text modelling, which achieves state-of-the-art results in radiology natural language inference through an improved vocabulary and a novel language pretraining objective that leverages the semantics and discourse characteristics of radiology reports.
Posted Content

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports

TL;DR: External evaluation using the OpenI dataset shows that the joint embedding learned by pre-trained LXMERT, VisualBERT, UNITER, and PixelBERT models demonstrates a performance improvement of 1.4% in thoracic finding classification tasks compared to a pioneering CNN + RNN model.
Proceedings Article

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports

TL;DR: In this article, the authors adopt four pre-trained models, LXMERT, VisualBERT, UNITER, and PixelBERT, to learn multimodal representations from MIMIC-CXR images and associated reports.
References
Proceedings Article

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
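The objective this TL;DR summarizes can be sketched in a few lines (my own illustrative implementation, not the paper's code): GloVe fits word and context vectors so their dot product, plus biases, approximates the log co-occurrence count, weighted by a function that down-weights rare pairs and caps frequent ones.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f(x): grows with the count, capped at 1 for frequent pairs."""
    return min((x / x_max) ** alpha, 1.0)

def glove_loss(W, W_ctx, b, b_ctx, cooc):
    """Weighted least-squares objective over nonzero co-occurrence counts.

    cooc: dict mapping (i, j) word/context index pairs to counts X_ij.
    """
    total = 0.0
    for (i, j), x in cooc.items():
        diff = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(x)
        total += glove_weight(x) * diff ** 2
    return total
```

Training then minimizes this loss over the word vectors `W`, context vectors `W_ctx`, and biases, typically with AdaGrad.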
Journal Article

Dermatologist-level classification of skin cancer with deep neural networks

TL;DR: This work demonstrates an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists, trained end-to-end from images directly, using only pixels and disease labels as inputs.
Book Chapter

Domain-adversarial training of neural networks

TL;DR: In this article, a new representation learning approach is proposed for domain adaptation, where training and test data come from similar but different distributions; by promoting features that cannot discriminate between the training (source) and test (target) domains, the approach encourages representations that remain discriminative for the main learning task on the source domain.
Journal Article

Cumulated gain-based evaluation of IR techniques

TL;DR: This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position, and test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences.
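The cumulated-gain measures described here underlie the (n)DCG metrics now standard in retrieval evaluation. A minimal sketch using the common log2 position discount (one choice among the discount functions such evaluations use):

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulated gain: sum of rel_i / log2(i + 1) for 1-indexed positions."""
    rel = np.asarray(relevances, dtype=float)
    positions = np.arange(1, len(rel) + 1)
    return float(np.sum(rel / np.log2(positions + 1)))

def ndcg(relevances):
    """Normalize by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Because the gain is discounted by rank, a method is credited more for placing highly relevant documents early in the result list, which is exactly the behavior the article's measures are designed to reward.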