Open Access · Posted Content

AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

TLDR
In this article, the Axial Fusion Transformer UNet (AFTer-UNet) is proposed, which combines convolutional layers' capability of extracting detailed features with transformers' strength in long-sequence modeling.
Abstract
Recent advances in transformer-based models have drawn attention to exploring these techniques in medical image segmentation, especially in conjunction with the U-Net model (or its variants), which has shown great success in medical image segmentation under both 2D and 3D settings. Current 2D-based methods either directly replace convolutional layers with pure transformers or insert a transformer as an additional intermediate encoder between the encoder and decoder of U-Net. However, these approaches only consider attention encoding within a single slice and do not exploit the axial-axis information naturally provided by a 3D volume. In the 3D setting, convolution on volumetric data and transformers both consume large amounts of GPU memory. One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits performance. In this paper, we propose the Axial Fusion Transformer UNet (AFTer-UNet), which combines convolutional layers' capability of extracting detailed features with transformers' strength in long-sequence modeling. It considers both intra-slice and inter-slice long-range cues to guide the segmentation. Meanwhile, it has fewer parameters and takes less GPU memory to train than previous transformer-based models. Extensive experiments on three multi-organ segmentation datasets demonstrate that our method outperforms current state-of-the-art methods.
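The axial-fusion idea described above can be made concrete with a short sketch: attention is first computed among tokens within each slice (intra-slice), then among corresponding tokens across slices along the axial axis (inter-slice), which keeps sequence lengths far shorter than full 3D attention. Below is a minimal, hypothetical PyTorch sketch of this pattern; the module name and dimensions are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class AxialFusionBlock(nn.Module):
    """Illustrative sketch: attend within each slice, then across
    slices along the axial axis, as the abstract describes."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, slices, tokens_per_slice, dim) from a conv encoder
        b, s, n, d = x.shape
        # Intra-slice attention: tokens of one slice attend to each other.
        h = x.reshape(b * s, n, d)
        h, _ = self.intra(h, h, h)
        x = h.reshape(b, s, n, d)
        # Inter-slice attention: the same spatial position attends across slices.
        h = x.permute(0, 2, 1, 3).reshape(b * n, s, d)
        h, _ = self.inter(h, h, h)
        return h.reshape(b, n, s, d).permute(0, 2, 1, 3)

# Toy usage: 2 volumes, 8 axial slices, 196 tokens per slice, 64-dim features.
feats = torch.randn(2, 8, 196, 64)
out = AxialFusionBlock(64)(feats)
print(out.shape)  # torch.Size([2, 8, 196, 64])
```

Splitting attention this way keeps each attention sequence at length n or s rather than n x s, which is consistent with the abstract's claim of lower GPU memory use than prior transformer-based models.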


Citations
Journal Article (DOI)

Identifying Malignant Breast Ultrasound Images Using ViT-Patch

TL;DR: Zhang et al. propose an improved ViT architecture that adds a shared MLP head to the output of each patch token to balance feature learning between the class and patch tokens.
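As a rough illustration of that shared-head idea, the sketch below applies one weight-shared linear head to every patch token alongside the usual class-token head; all names and sizes here are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 196 patch tokens plus one class token, 64-dim features.
tokens = torch.randn(2, 1 + 196, 64)      # output of a ViT encoder
cls_head = nn.Linear(64, 2)               # classifies the class token
patch_head = nn.Linear(64, 2)             # one head, weights shared by all patches

cls_logits = cls_head(tokens[:, 0])       # (2, 2)
patch_logits = patch_head(tokens[:, 1:])  # (2, 196, 2): same weights per patch
```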
Journal Article (DOI)

Swin transformer-based GAN for multi-modal medical image translation

TL;DR: MMTrans, a Swin Transformer-based GAN for multi-modal medical image translation, outperformed other advanced medical image translation methods on both aligned and unpaired datasets and has great potential for clinical application.
Proceedings Article (DOI)

STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition

TL;DR: Wang et al. propose STAR-Transformer, a spatio-temporal cross-attention encoder-decoder that represents two cross-modal features as a single recognizable vector.
Posted Content

SSCAP: Self-supervised co-occurrence action parsing for unsupervised temporal action segmentation

TL;DR: SSCAP, an unsupervised method, predicts a likely set of temporal segments across videos. It leverages self-supervised learning to extract distinguishable features, then applies a novel Co-occurrence Action Parsing algorithm that both captures the correlation among sub-actions underlying the structure of activities and estimates the temporal path of the sub-actions in an accurate and general way.
References
More filters
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
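The update rule summarized above is compact enough to sketch directly; the following NumPy version uses the paper's default hyperparameters and is a minimal illustration, not a production optimizer.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: adaptive estimates of the first two moments."""
    m = b1 * m + (1 - b1) * grad        # biased first-moment estimate
    v = b2 * v + (1 - b2) * grad**2     # biased second-moment estimate
    m_hat = m / (1 - b1**t)             # bias correction (t starts at 1)
    v_hat = v / (1 - b2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage: minimize f(x) = x^2 from x = 5.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
print(round(x, 4))  # approaches 0
```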
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
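The core of that architecture is scaled dot-product attention; here is a minimal NumPy rendering of the formula softmax(QK^T / sqrt(d_k))V, for illustration only.

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, the Transformer's core operation."""
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V  # each output row is a weighted mix of the value rows

# Toy usage: 4 query tokens attending over 6 key/value tokens, d_k = 8.
Q, K, V = np.random.randn(4, 8), np.random.randn(6, 8), np.random.randn(6, 8)
print(attention(Q, K, V).shape)  # (4, 8)
```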
Book Chapter (DOI)

U-Net: Convolutional Networks for Biomedical Image Segmentation

TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently; it can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
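The defining U-Net pattern, a contracting path, an expanding path, and skip connections that concatenate encoder features into the decoder, can be sketched in a few lines. The PyTorch toy below is a hedged illustration with made-up channel counts, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    """Sketch of the U-Net pattern: downsample, upsample, and concatenate
    encoder features with decoder features via a skip connection."""
    def __init__(self):
        super().__init__()
        self.enc = block(1, 16)
        self.down = nn.MaxPool2d(2)
        self.mid = block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)         # 32 = 16 skip + 16 upsampled
        self.head = nn.Conv2d(16, 2, 1)  # per-pixel class logits

    def forward(self, x):
        e = self.enc(x)                  # encoder features, kept for the skip
        m = self.mid(self.down(e))       # bottleneck at half resolution
        u = self.up(m)                   # back to full resolution
        return self.head(self.dec(torch.cat([e, u], dim=1)))

print(TinyUNet()(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 2, 64, 64])
```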
Proceedings Article (DOI)

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
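That insight, replacing fully connected classifier layers with 1x1 convolutions so the network accepts any input size, is easy to demonstrate; the snippet below is a toy example with assumed channel counts, not the FCN architecture itself.

```python
import torch
import torch.nn as nn

# Only convolutions, so any input size yields a correspondingly sized output.
fcn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 21, 1),  # 1x1 conv acts as a per-pixel classifier (21 classes)
)
for h, w in [(64, 64), (100, 150)]:
    print(fcn(torch.randn(1, 3, h, w)).shape)  # (1, 21, h, w)
```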
Proceedings Article (DOI)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
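The masked-prediction idea behind that bidirectional conditioning can be sketched in a few lines of PyTorch; the vocabulary, sizes, and single masked position below are toy assumptions, not BERT's actual configuration.

```python
import torch
import torch.nn as nn

vocab, dim, mask_id = 1000, 64, 0
emb = nn.Embedding(vocab, dim)
layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
lm_head = nn.Linear(dim, vocab)

ids = torch.randint(1, vocab, (2, 16))   # two toy token sequences
masked = ids.clone()
masked[:, 5] = mask_id                   # hide one token per sequence
logits = lm_head(encoder(emb(masked)))   # every position sees both sides
loss = nn.functional.cross_entropy(logits[:, 5], ids[:, 5])
print(loss.item())
```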
Trending Questions (1)
What are recent development with UNET?

Recent developments include AFTer-UNet, which combines U-Net with transformers for medical image segmentation. It exploits the axial-axis information of 3D volumes and outperforms current state-of-the-art methods while using fewer parameters and less GPU memory.