Open Access · Posted Content

AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

TLDR
In this article, the Axial Fusion Transformer UNet (AFTer-UNet) is proposed, which combines convolutional layers' capability of extracting detailed features with transformers' strength in long-sequence modeling.
Abstract
Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the U-Net model (or its variants), which has shown great success in medical image segmentation under both 2D and 3D settings. Current 2D-based methods either directly replace convolutional layers with pure transformers or insert a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these approaches only consider attention encoding within a single slice and do not exploit the axial-axis information naturally provided by a 3D volume. In the 3D setting, convolution on volumetric data and transformers both consume large amounts of GPU memory. One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits performance. In this paper, we propose the Axial Fusion Transformer UNet (AFTer-UNet), which combines convolutional layers' capability of extracting detailed features with transformers' strength in long-sequence modeling. It considers both intra-slice and inter-slice long-range cues to guide the segmentation. Meanwhile, it has fewer parameters and takes less GPU memory to train than previous transformer-based models. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.
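The axial-fusion idea described above can be made concrete with a short sketch: attention is first computed among tokens within each slice (intra-slice), then among corresponding tokens across slices along the axial axis (inter-slice), which keeps sequence lengths far shorter than full 3D attention. Below is a minimal, hypothetical PyTorch sketch of this pattern; the module name and dimensions are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class AxialFusionBlock(nn.Module):
    """Illustrative sketch: attend within each slice, then across
    slices along the axial axis, as the abstract describes."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, slices, tokens_per_slice, dim) from a conv encoder
        b, s, n, d = x.shape
        # Intra-slice attention: tokens of one slice attend to each other.
        h = x.reshape(b * s, n, d)
        h, _ = self.intra(h, h, h)
        x = h.reshape(b, s, n, d)
        # Inter-slice attention: the same spatial position attends across slices.
        h = x.permute(0, 2, 1, 3).reshape(b * n, s, d)
        h, _ = self.inter(h, h, h)
        return h.reshape(b, n, s, d).permute(0, 2, 1, 3)

# Toy usage: 2 volumes, 8 axial slices, 196 tokens per slice, 64-dim features.
feats = torch.randn(2, 8, 196, 64)
out = AxialFusionBlock(64)(feats)
print(out.shape)  # torch.Size([2, 8, 196, 64])
```

Splitting attention this way keeps each attention sequence at length n or s rather than n x s, which is consistent with the abstract's claim of lower GPU memory use than prior transformer-based models.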


Citations
Journal Article (DOI)

Identifying Malignant Breast Ultrasound Images Using ViT-Patch

TL;DR: Zhang et al. propose an improved ViT architecture that adds a shared MLP head to the output of each patch token to balance feature learning between the class and patch tokens.
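As a rough illustration of that shared-head idea, the sketch below applies one weight-shared linear head to every patch token alongside the usual class-token head; all names and sizes here are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 196 patch tokens plus one class token, 64-dim features.
tokens = torch.randn(2, 1 + 196, 64)      # output of a ViT encoder
cls_head = nn.Linear(64, 2)               # classifies the class token
patch_head = nn.Linear(64, 2)             # one head, weights shared by all patches

cls_logits = cls_head(tokens[:, 0])       # (2, 2)
patch_logits = patch_head(tokens[:, 1:])  # (2, 196, 2): same weights per patch
```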
Journal Article (DOI)

Swin transformer-based GAN for multi-modal medical image translation

TL;DR: MMTrans, a Swin Transformer-based GAN for multi-modal medical image translation, outperformed other advanced medical image translation methods on both aligned and unpaired datasets and has great potential for clinical application.
Proceedings Article (DOI)

STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition

TL;DR: Wang et al. propose STAR-Transformer, a spatio-temporal cross-attention encoder-decoder that represents two cross-modal features as a single recognizable vector.
Posted Content

SSCAP: Self-supervised co-occurrence action parsing for unsupervised temporal action segmentation

TL;DR: SSCAP, an unsupervised method, predicts a likely set of temporal segments across videos. It leverages self-supervised learning to extract distinguishable features, then applies a novel Co-occurrence Action Parsing algorithm that both captures the correlation among sub-actions underlying the structure of activities and estimates the temporal path of the sub-actions in an accurate and general way.
References
More filters
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
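The update rule summarized above is compact enough to sketch directly; the following NumPy version uses the paper's default hyperparameters and is a minimal illustration, not a production optimizer.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: adaptive estimates of the first two moments."""
    m = b1 * m + (1 - b1) * grad        # biased first-moment estimate
    v = b2 * v + (1 - b2) * grad**2     # biased second-moment estimate
    m_hat = m / (1 - b1**t)             # bias correction (t starts at 1)
    v_hat = v / (1 - b2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage: minimize f(x) = x^2 from x = 5.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
print(round(x, 4))  # approaches 0
```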
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
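The core of that architecture is scaled dot-product attention; here is a minimal NumPy rendering of the formula softmax(QK^T / sqrt(d_k))V, for illustration only.

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the Transformer's core operation."""
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V  # each output row is a weighted mix of the value rows

# Toy usage: 4 query tokens attending over 6 key/value tokens, d_k = 8.
Q, K, V = np.random.randn(4, 8), np.random.randn(6, 8), np.random.randn(6, 8)
print(attention(Q, K, V).shape)  # (4, 8)
```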
Book Chapter (DOI)

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; it can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
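The defining U-Net pattern, a contracting path, an expanding path, and skip connections that concatenate encoder features into the decoder, can be sketched in a few lines. The PyTorch toy below is a hedged illustration with made-up channel counts, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    """Sketch of the U-Net pattern: downsample, upsample, and concatenate
    encoder features with decoder features via a skip connection."""
    def __init__(self):
        super().__init__()
        self.enc = block(1, 16)
        self.down = nn.MaxPool2d(2)
        self.mid = block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)         # 32 = 16 skip + 16 upsampled
        self.head = nn.Conv2d(16, 2, 1)  # per-pixel class logits

    def forward(self, x):
        e = self.enc(x)                  # encoder features, kept for the skip
        m = self.mid(self.down(e))       # bottleneck at half resolution
        u = self.up(m)                   # back to full resolution
        return self.head(self.dec(torch.cat([e, u], dim=1)))

print(TinyUNet()(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```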
Proceedings Article (DOI)

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
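That insight, replacing fully connected classifier layers with 1x1 convolutions so the network accepts any input size, is easy to demonstrate; the snippet below is a toy example with assumed channel counts, not the FCN architecture itself.

```python
import torch
import torch.nn as nn

# Only convolutions, so any input size yields a correspondingly sized output.
fcn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 21, 1),  # 1x1 conv acts as a per-pixel classifier (21 classes)
)
for h, w in [(64, 64), (100, 150)]:
    print(fcn(torch.randn(1, 3, h, w)).shape)  # (1, 21, h, w)
```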
Proceedings Article (DOI)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
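The masked-prediction idea behind that bidirectional conditioning can be sketched in a few lines of PyTorch; the vocabulary, sizes, and single masked position below are toy assumptions, not BERT's actual configuration.

```python
import torch
import torch.nn as nn

vocab, dim, mask_id = 1000, 64, 0
emb = nn.Embedding(vocab, dim)
layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(dim, vocab)

ids = torch.randint(1, vocab, (2, 16))   # two toy token sequences
masked = ids.clone()
masked[:, 5] = mask_id                   # hide one token per sequence
logits = lm_head(encoder(emb(masked)))   # every position sees both sides
loss = nn.functional.cross_entropy(logits[:, 5], ids[:, 5])
print(loss.item())
```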
Trending Questions (1)
What are recent development with UNET?

Recent developments include AFTer-UNet, which combines U-Net with transformers for medical image segmentation. It exploits the axial-axis information of 3D volumes and outperforms current state-of-the-art methods while using fewer parameters and less GPU memory.