Open Access · Proceedings Article · DOI

AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

TLDR
Axial Fusion Transformer UNet (AFTer-UNet) is proposed, combining convolutional layers' capability of extracting detailed features with transformers' strength in long-sequence modeling.
Abstract
Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the UNet model (or its variants), which has shown great success in medical image segmentation under both 2D and 3D settings. Current 2D-based methods either directly replace convolutional layers with pure transformers or insert a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these approaches only consider attention encoding within a single slice and do not exploit the axial-axis information naturally provided by a 3D volume. In the 3D setting, both convolution on volumetric data and transformers consume large amounts of GPU memory, so one must either downsample the image or use cropped local patches, which limits performance. In this paper, we propose Axial Fusion Transformer UNet (AFTer-UNet), which combines convolutional layers' capability of extracting detailed features with transformers' strength in long-sequence modeling. It considers both intra-slice and inter-slice long-range cues to guide segmentation, while having fewer parameters and requiring less GPU memory to train than previous transformer-based models. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.
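The core idea of fusing intra-slice and inter-slice cues can be sketched with plain attention operations. The following is a minimal NumPy sketch, not the authors' implementation: it omits learned projections, multi-head structure, positional encodings, and normalization, and the function and shape choices are illustrative assumptions. Each slice's tokens first attend to one another (intra-slice), then each token position attends across slices along the axial axis (inter-slice):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over the second-to-last axis
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def axial_fusion(feats):
    """feats: (D, N, C) — D slices, N tokens per slice, C channels.
    Intra-slice attention mixes the N tokens within each slice;
    inter-slice (axial) attention mixes the D slices at each position."""
    intra = attention(feats, feats, feats)           # (D, N, C), attends over N
    axial = intra.transpose(1, 0, 2)                 # (N, D, C)
    inter = attention(axial, axial, axial)           # attends over D
    return inter.transpose(1, 0, 2)                  # back to (D, N, C)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 32))  # 8 slices, 16 tokens each, 32 channels
y = axial_fusion(x)
print(y.shape)  # (8, 16, 32)
```

Because the axial pass attends over D slices at each of N positions (rather than over all D × N tokens jointly), the attention cost stays linear in the number of token positions per attention map, which is why this style of factorized attention keeps memory usage lower than full 3D attention.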



Citations
Journal ArticleDOI

Identifying Malignant Breast Ultrasound Images Using ViT-Patch

TL;DR: An improved ViT architecture is proposed that adds a shared MLP head to the output of each patch token to balance feature learning between the class and patch tokens.
Journal ArticleDOI

Swin transformer-based GAN for multi-modal medical image translation

TL;DR: A Swin Transformer-based GAN for multi-modal medical image translation, named MMTrans, is proposed; it outperformed other advanced medical image translation methods on both aligned and unpaired datasets and shows great potential for clinical application.
Proceedings ArticleDOI

STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition

TL;DR: A spatio-temporal cross-attention transformer (STAR-Transformer) encoder and decoder is proposed to represent two cross-modal features as a recognizable vector.
Posted Content

SSCAP: Self-supervised co-occurrence action parsing for unsupervised temporal action segmentation

TL;DR: SSCAP, an unsupervised method for temporal action segmentation, leverages self-supervised learning to extract distinguishable features and then applies a novel Co-occurrence Action Parsing algorithm to capture the correlation among sub-actions underlying the structure of activities and to estimate the temporal paths of sub-actions in an accurate and general way.
References
Proceedings ArticleDOI

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Journal ArticleDOI

nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation

TL;DR: nnU-Net is a deep-learning-based segmentation method that automatically configures itself, including preprocessing, network architecture, training, and post-processing, for any new task.
Proceedings ArticleDOI

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

TL;DR: A pure transformer is proposed to encode an image as a sequence of patches, which can be combined with a simple decoder to provide a powerful segmentation model.
Journal ArticleDOI

VoxelMorph: A Learning Framework for Deformable Medical Image Registration

TL;DR: VoxelMorph's unsupervised model achieves accuracy comparable to state-of-the-art methods while operating orders of magnitude faster, promising to speed up medical image analysis and processing pipelines while enabling novel directions in learning-based registration.
Journal ArticleDOI

Automatic Multi-Organ Segmentation on Abdominal CT With Dense V-Networks

TL;DR: It is concluded that the deep-learning-based segmentation represents a registration-free method for multi-organ abdominal CT segmentation whose accuracy can surpass current methods, potentially supporting image-guided navigation in gastrointestinal endoscopy procedures.