Open Access · Posted Content

Unsupervised Multimodal Representation Learning across Medical Images and Reports

TL;DR
This paper established baseline joint embedding results measured via both local and global retrieval methods on the soon-to-be-released MIMIC-CXR dataset consisting of both chest X-ray images and the associated radiology reports.
Abstract
Joint embeddings between medical imaging modalities and associated radiology reports have the potential to offer significant benefits to the clinical community, ranging from cross-domain retrieval to conditional generation of reports to the broader goals of multimodal representation learning. In this work, we establish baseline joint embedding results measured via both local and global retrieval methods on the soon-to-be-released MIMIC-CXR dataset consisting of both chest X-ray images and the associated radiology reports. We examine both supervised and unsupervised methods on this task and show that for document retrieval tasks with the learned representations, only a limited amount of supervision is needed to yield results comparable to those of fully supervised methods.
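The cross-domain retrieval the abstract describes amounts to nearest-neighbor search in the shared embedding space. A minimal sketch (illustrative toy vectors, not the paper's code or data): report embeddings are ranked by cosine similarity to an image embedding.

```python
import numpy as np

def retrieve(query_emb, candidate_embs, k=5):
    """Rank candidates by cosine similarity to the query in the shared space."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity of each candidate to the query
    return np.argsort(-sims)[:k]  # indices of the k most similar candidates

# Toy example: four report embeddings and one image embedding
reports = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]])
image = np.array([0.95, 0.05])
top2 = retrieve(image, reports, k=2)
```

In practice the embeddings would come from the learned image and text encoders; the ranking step itself is this simple.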


Citations
Posted Content

Clinically Accurate Chest X-Ray Report Generation

TL;DR: A domain-aware automatic chest X-ray radiology report generation system that first predicts which topics will be discussed in the report, then conditionally generates sentences corresponding to those topics, and is fine-tuned using reinforcement learning.
Journal Article

Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

TL;DR: This paper proposes a taxonomy of six core technical challenges (representation, alignment, reasoning, generation, transference, and quantification) covering historical and recent trends, and defines two key principles, modality heterogeneity and interconnections, that have driven subsequent innovations.
Book Chapter

Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing

TL;DR: This paper proposes a self-supervised joint vision-language approach with a focus on better text modelling, which achieves state-of-the-art results in radiology natural language inference through an improved vocabulary and a novel language pretraining objective that leverages the semantics and discourse characteristics of radiology reports.
Posted Content

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports

TL;DR: External evaluation using the OpenI dataset shows that the joint embedding learned by pre-trained LXMERT, VisualBERT, UNITER, and PixelBERT models demonstrates a performance improvement of 1.4% in thoracic finding classification tasks compared to a pioneering CNN + RNN model.
Proceedings Article

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports

TL;DR: In this article, the authors adopt four pre-trained models, LXMERT, VisualBERT, UNITER, and PixelBERT, to learn multimodal representations from MIMIC-CXR images and associated reports.
References
Proceedings Article

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
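The objective this TL;DR summarizes can be sketched in a few lines (my own illustrative implementation, not the paper's code): GloVe fits word and context vectors so their dot product, plus biases, approximates the log co-occurrence count, weighted by a function that down-weights rare pairs and caps frequent ones.

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f(x): grows with the count, capped at 1 for frequent pairs."""
    return min((x / x_max) ** alpha, 1.0)

def glove_loss(W, W_ctx, b, b_ctx, cooc):
    """Weighted least-squares objective over nonzero co-occurrence counts.

    cooc: dict mapping (i, j) word/context index pairs to counts X_ij.
    """
    total = 0.0
    for (i, j), x in cooc.items():
        diff = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(x)
        total += glove_weight(x) * diff ** 2
    return total
```

Training then minimizes this loss over the word vectors `W`, context vectors `W_ctx`, and biases, typically with AdaGrad.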
Journal Article

Dermatologist-level classification of skin cancer with deep neural networks

TL;DR: This work demonstrates an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists, trained end-to-end from images directly, using only pixels and disease labels as inputs.
Book Chapter

Domain-adversarial training of neural networks

TL;DR: In this article, a new representation learning approach is proposed for domain adaptation, where training and test data come from similar but different distributions; by promoting features that cannot discriminate between the training (source) and test (target) domains, the approach encourages representations that remain discriminative for the main learning task on the source domain.
Journal Article

Cumulated gain-based evaluation of IR techniques

TL;DR: This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position, and test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences.
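The cumulated-gain measures described here underlie the (n)DCG metrics now standard in retrieval evaluation. A minimal sketch using the common log2 position discount (one choice among the discount functions such evaluations use):

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulated gain: sum of rel_i / log2(i + 1) for 1-indexed positions."""
    rel = np.asarray(relevances, dtype=float)
    positions = np.arange(1, len(rel) + 1)
    return float(np.sum(rel / np.log2(positions + 1)))

def ndcg(relevances):
    """Normalize by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Because the gain is discounted by rank, a method is credited more for placing highly relevant documents early in the result list, which is exactly the behavior the article's measures are designed to reward.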