scispace - formally typeset

Showing papers on "Image segmentation" published in 2022


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper presented a new multilevel image segmentation method based on the swarm intelligence algorithm (SIA) to enhance the segmentation of COVID-19 X-rays.

88 citations


Journal ArticleDOI
TL;DR: In this paper, a new method for detecting COVID-19 and pneumonia using chest X-ray images was proposed, which can be described as a three-step process and achieved the highest testing classification accuracy of 96.6% using the VGG-19 model associated with the binary robust invariant scalable key-points (BRISK) algorithm.

76 citations


Proceedings ArticleDOI
01 Jun 2022
TL;DR: Mask2Former, as discussed by the authors, is a Masked-attention Mask Transformer whose masked attention extracts localized features by constraining cross-attention within predicted mask regions; a single architecture addresses panoptic, instance, and semantic segmentation.
Abstract: Image segmentation groups pixels with different semantics, e.g., category or instance membership. Each choice of semantics defines a task. While only the semantics of each task differ, current research focuses on designing specialized architectures for each task. We present Masked-attention Mask Transformer (Mask2Former), a new architecture capable of addressing any image segmentation task (panoptic, instance or semantic). Its key components include masked attention, which extracts localized features by constraining cross-attention within predicted mask regions. In addition to reducing the research effort by at least three times, it outperforms the best specialized architectures by a significant margin on four popular datasets. Most notably, Mask2Former sets a new state-of-the-art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO) and semantic segmentation (57.7 mIoU on ADE20K).

68 citations
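
The masked attention described in the Mask2Former abstract above can be illustrated with a small sketch. This is not the authors' implementation: it handles a single query over a flat list of pixel positions, and the function name `masked_attention_weights` and the fallback for an empty mask are assumptions made for the example.

```python
import math


def masked_attention_weights(logits, mask):
    """Softmax over key positions, restricted to the predicted mask region.

    logits: raw attention scores, one per pixel (key) position.
    mask:   booleans, True where the previously predicted mask covers the pixel.
    """
    if not any(mask):
        # Assumption for this sketch: an empty mask falls back to full attention.
        mask = [True] * len(logits)
    kept = [score for score, inside in zip(logits, mask) if inside]
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) if inside else 0.0
            for score, inside in zip(logits, mask)]
    total = sum(exps)
    return [value / total for value in exps]


weights = masked_attention_weights([1.0, 2.0, 3.0, 4.0], [True, True, False, False])
```

Positions outside the mask receive exactly zero weight, so the query only aggregates features from inside its predicted region.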


Journal ArticleDOI
TL;DR: In this article, a comprehensive thematic survey on medical image segmentation using deep learning techniques is presented, in which the authors classify the currently popular literature according to a multi-level structure from coarse to fine.
Abstract: Deep learning has been widely used for medical image segmentation and a large number of papers has been presented recording the success of deep learning in the field. In this paper, we present a comprehensive thematic survey on medical image segmentation using deep learning techniques. This paper makes two original contributions. Firstly, compared to traditional surveys that directly divide literatures of deep learning on medical image segmentation into many groups and introduce literatures in detail for each group, we classify currently popular literatures according to a multi-level structure from coarse to fine. Secondly, this paper focuses on supervised and weakly supervised learning approaches, without including unsupervised approaches since they have been introduced in many old surveys and they are not popular currently. For supervised learning approaches, we analyze literatures in three aspects: the selection of backbone networks, the design of network blocks, and the improvement of loss functions. For weakly supervised learning approaches, we investigate literature according to data augmentation, transfer learning, and interactive segmentation, separately. Compared to existing surveys, this survey classifies the literatures very differently from before and is more convenient for readers to understand the relevant rationale and will guide them to think of appropriate improvements in medical image segmentation based on deep learning approaches.

64 citations


Journal ArticleDOI
TL;DR: The most common rule-driven and data-driven image segmentation algorithms are compared and discussed in this article, and strategies to obtain better results, such as hybrid integration algorithms and optimization methods, are presented.

63 citations


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a dual-scale encoder-decoder architecture with self-attention to enhance the semantic segmentation quality of varying medical images, which can effectively model the non-local dependencies and multi-scale contexts for enhancing the pixel-level intrinsic structural features inside each patch.
Abstract: Automatic medical image segmentation has made great progress owing to the powerful deep representation learning. Inspired by the success of self-attention mechanism in Transformer, considerable efforts are devoted to designing the robust variants of encoder-decoder architecture with Transformer. However, the patch division used in the existing Transformer-based models usually ignores the pixel-level intrinsic structural features inside each patch. In this paper, we propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet), which aims to incorporate the hierarchical Swin Transformer into both encoder and decoder of the standard U-shaped architecture. Our DS-TransUNet benefits from the self-attention computation in Swin Transformer and the designed dual-scale encoding, which can effectively model the non-local dependencies and multi-scale contexts for enhancing the semantic segmentation quality of varying medical images. Unlike many prior Transformer-based solutions, the proposed DS-TransUNet adopts a well-established dual-scale encoding mechanism that utilizes dual-scale encoders based on Swin Transformer to extract the coarse and fine-grained feature representations of different semantic scales. Meanwhile, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively perform the multi-scale information fusion through the self-attention mechanism. Furthermore, we introduce the Swin Transformer block into decoder to further explore the long-range contextual information during the up-sampling process. Extensive experiments across four typical tasks for medical image segmentation demonstrate the effectiveness of DS-TransUNet, and our approach significantly outperforms the state-of-the-art methods.

56 citations


Journal ArticleDOI
TL;DR: This paper summarizes the medical image segmentation technologies based on the U-Net structure variants concerning their structure, innovation, efficiency, etc.
Abstract: Deep learning has been extensively applied to segmentation in medical imaging. U-Net proposed in 2015 shows the advantages of accurate segmentation of small targets and its scalable network architecture. With the increasing requirements for the performance of segmentation in medical imaging in recent years, U-Net has been cited academically more than 2500 times. Many scholars have been constantly developing the U-Net architecture. This paper summarizes the medical image segmentation technologies based on the U-Net structure variants concerning their structure, innovation, efficiency, etc.; reviews and categorizes the related methodology; and introduces the loss functions, evaluation parameters, and modules commonly applied to segmentation in medical imaging, which will provide a good reference for the future research.

54 citations


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a multi-scale residual fusion network (MSRF-Net) for medical image segmentation, which is able to exchange multiscale features of varying receptive fields using a Dual-Scale Dense Fusion (DSDF) block.
Abstract: Methods based on convolutional neural networks have improved the performance of biomedical image segmentation. However, most of these methods cannot efficiently segment objects of variable sizes and train on small and biased datasets, which are common for biomedical use cases. While methods exist that incorporate multi-scale fusion approaches to address the challenges arising with variable sizes, they usually use complex models that are more suitable for general semantic segmentation problems. In this paper, we propose a novel architecture called Multi-Scale Residual Fusion Network (MSRF-Net), which is specially designed for medical image segmentation. The proposed MSRF-Net is able to exchange multi-scale features of varying receptive fields using a Dual-Scale Dense Fusion (DSDF) block. Our DSDF block can exchange information rigorously across two different resolution scales, and our MSRF sub-network uses multiple DSDF blocks in sequence to perform multi-scale fusion. This allows the preservation of resolution, improved information flow and propagation of both high- and low-level features to obtain accurate segmentation maps. The proposed MSRF-Net captures object variabilities and provides improved results on different biomedical datasets. Extensive experiments on MSRF-Net demonstrate that the proposed method outperforms the cutting-edge medical image segmentation methods on four publicly available datasets. We achieve Dice coefficients (DSC) of 0.9217, 0.9420, 0.9224, and 0.8824 on Kvasir-SEG, CVC-ClinicDB, the 2018 Data Science Bowl dataset, and the ISIC-2018 skin lesion segmentation challenge dataset, respectively. We further conducted generalizability tests and achieved DSC of 0.7921 and 0.7575 on CVC-ClinicDB and Kvasir-SEG, respectively.

52 citations
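
The Dice coefficient (DSC) reported in the MSRF-Net abstract above is a standard overlap measure between a predicted and a reference mask. The sketch below shows how it is usually computed on flattened binary masks. The function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions for this example, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks.

    pred, target: equal-length sequences of 0/1 labels.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    size = sum(pred) + sum(target)
    if size == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / size


score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])  # 2 * 1 / (2 + 1)
```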


Journal ArticleDOI
TL;DR: This work presents a review of the literature in the field of medical image segmentation employing deep convolutional neural networks, and examines the various widely used medical image datasets, the different metrics used for evaluating the segmentation tasks, and performances of different CNN based networks.
Abstract: Image segmentation is a branch of digital image processing which has numerous applications in the field of analysis of images, augmented reality, machine vision, and many more. The field of medical image analysis is growing and the segmentation of the organs, diseases, or abnormalities in medical images has become demanding. The segmentation of medical images helps in checking the growth of disease like tumour, controlling the dosage of medicine, and dosage of exposure to radiations. Medical image segmentation is really a challenging task due to the various artefacts present in the images. Recently, deep neural models have shown application in various image segmentation tasks. This significant growth is due to the achievements and high performance of the deep learning strategies. This work presents a review of the literature in the field of medical image segmentation employing deep convolutional neural networks. The paper examines the various widely used medical image datasets, the different metrics used for evaluating the segmentation tasks, and performances of different CNN based networks. In comparison to the existing review and survey papers, the present work also discusses the various challenges in the field of segmentation of medical images and different state-of-the-art solutions available in the literature.

50 citations


Journal ArticleDOI
TL;DR: Image-to-image translation (I2I), which aims to transfer images from a source domain to a target domain while preserving the content representations, has drawn increasing attention and made tremendous progress in recent years; this paper provides an overview of recent I2I works.
Abstract: Image-to-image translation (I2I) aims to transfer images from a source domain to a target domain while preserving the content representations. I2I has drawn increasing attention and made tremendous progress in recent years because of its wide range of applications in many computer vision and image processing problems, such as image synthesis, segmentation, style transfer, restoration, and pose estimation. In this paper, we provide an overview of the I2I works developed in recent years. We will analyze the key techniques of the existing I2I works and clarify the main progress the community has made. Additionally, we will elaborate on the effect of I2I on the research and industry community and point out remaining challenges in related fields.

46 citations


Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper employed a differentiable superpixel generation method to over-segment the single-polarization SAR image and proposed a superpixel-wise statistical dissimilarity measure for converting the soft superpixels set into a self-connected weighted graph.
Abstract: In existing superpixel-wise segmentation algorithms, superpixel generation most often is an isolated preprocessing step. The segmentation performance is determined to a certain extent by the accuracy of superpixels. However, it is still a challenge to develop a stable superpixel generation method. In this paper, we attempt to incorporate the superpixel generation and merging steps into an end-to-end trainable deep network. First, we employ a recently proposed differentiable superpixel generation method to over-segment the single-polarization SAR image. It outputs the statistical likelihood that each pixel belongs to different superpixels. In the superpixel merging part, as one of our main contributions, we propose a superpixel-wise statistical dissimilarity measure method for converting the soft superpixels set into a self-connected weighted graph. More importantly, inspired by the concept of the number of walks in graph theory, we define the k-order connectivity of each vertex. This definition can intelligently indicate the potential soft cluster centers and class assignments in graph. This merging method is differentiable, computationally simple, and free of empirical parameters. The superpixel generation and merging phases can be implemented under a unified deep network. The benefit is that our method can iteratively adjust the shapes of the superpixels according to the boundaries and segmentation results during training, until the satisfactory segmentation results are captured. Experimental results on real SAR images demonstrate that the segmentation precision of our proposed method is superior to other state-of-the-art methods in terms of precision and computational efficiency.

Journal ArticleDOI
TL;DR: UNeXt as discussed by the authors is a convolutional multilayer perceptron (MLP) based network for image segmentation, which has an early convolutional stage and an MLP stage in the latent stage.
Abstract: UNet and its latest extensions like TransUNet have been the leading medical image segmentation methods in recent years. However, these networks cannot be effectively adopted for rapid image segmentation in point-of-care applications as they are parameter-heavy, computationally complex and slow to use. To this end, we propose UNeXt which is a Convolutional multilayer perceptron (MLP) based network for image segmentation. We design UNeXt in an effective way with an early convolutional stage and an MLP stage in the latent stage. We propose a tokenized MLP block where we efficiently tokenize and project the convolutional features and use MLPs to model the representation. To further boost the performance, we propose shifting the channels of the inputs while feeding in to MLPs so as to focus on learning local dependencies. Using tokenized MLPs in latent space reduces the number of parameters and computational complexity while being able to result in a better representation to help segmentation. The network also consists of skip connections between various levels of encoder and decoder. We test UNeXt on multiple medical image segmentation datasets and show that we reduce the number of parameters by 72x, decrease the computational complexity by 68x, and improve the inference speed by 10x while also obtaining better segmentation performance over the state-of-the-art medical image segmentation architectures. Code is available at https://github.com/jeya-maria-jose/UNeXt-pytorch.
Keywords: Medical image segmentation, MLP, Point-of-care

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed an improved ABC (CCABC) based on a horizontal search mechanism and a vertical search mechanism to improve the algorithm's performance and also presented a multilevel thresholding image segmentation (MTIS) method based on CCABC to enhance the effectiveness of the multi-level thresholding approach.

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed an uncertainty guided network (UG-Net) for automatic medical image segmentation, which consists of three parts: a coarse segmentation module (CSM), an uncertainty-guided module (UGM), and a feature refinement module (FRM) embedded with several dual attention (DAT) blocks.
Abstract: Automatic segmentation is a fundamental task in computer-assisted medical image analysis. Convolutional neural networks (CNNs) have been widely used for medical image segmentation tasks. Currently, most deep learning-based methods output a probability map and use a hand-crafted threshold to generate the final segmentation result, while how confident the network is of the probability map remains unclear. The segmentation result can be quite unreliable even though the probability is much higher than the threshold since the uncertainty of the probability can also be high. Moreover, boundary information loss caused by consecutive pooling layers and convolution strides makes the object’s boundary in segmentation even more unreliable. In this paper, we propose an uncertainty guided network referred to as UG-Net for automatic medical image segmentation. Different from previous methods, our UG-Net can learn from and contend with uncertainty by itself in an end-to-end manner. Specifically, UG-Net consists of three parts: a coarse segmentation module (CSM) to obtain the coarse segmentation and the uncertainty map, an uncertainty guided module (UGM) to leverage the obtained uncertainty map in an end-to-end manner, and a feature refinement module (FRM) embedded with several dual attention (DAT) blocks to generate the final segmentations. In addition, to formulate a unified segmentation network and extract richer context information, a multi-scale feature extractor (MFE) is inserted between the encoder and decoder of the CSM. Experimental results show that the proposed UG-Net outperforms the state-of-the-art methods on nasopharyngeal carcinoma (NPC) segmentation, lung segmentation, optic disc segmentation and retinal vessel detection.
• A deep learning-based network named UG-Net is proposed for medical image segmentation.
• An uncertainty guided module is proposed to learn from uncertainty in an end-to-end manner.
• A feature refinement module with dual attention mechanism is designed for further performance promotion.
• A multi-scale feature extractor is devised to fit into different segmentation tasks.

Journal ArticleDOI
TL;DR: KiU-Net as mentioned in this paper uses an overcomplete convolutional architecture where the input image is projected into a higher dimension such that the receptive field is constrained from increasing in the deep layers of the network.
Abstract: Most methods for medical image segmentation use U-Net or its variants as they have been successful in most of the applications. After a detailed analysis of these "traditional" encoder-decoder based approaches, we observed that they perform poorly in detecting smaller structures and are unable to segment boundary regions precisely. This issue can be attributed to the increase in receptive field size as we go deeper into the encoder. The extra focus on learning high level features causes U-Net based approaches to learn less information about low-level features which are crucial for detecting small structures. To overcome this issue, we propose using an overcomplete convolutional architecture where we project the input image into a higher dimension such that we constrain the receptive field from increasing in the deep layers of the network. We design a new architecture for image segmentation, KiU-Net, which has two branches: (1) an overcomplete convolutional network Kite-Net which learns to capture fine details and accurate edges of the input, and (2) U-Net which learns high level features. Furthermore, we also propose KiU-Net 3D which is a 3D convolutional architecture for volumetric segmentation. We perform a detailed study of KiU-Net by performing experiments on five different datasets covering various image modalities. We achieve a good performance with an additional benefit of fewer parameters and faster convergence. We also demonstrate that the extensions of KiU-Net based on residual blocks and dense blocks result in further performance improvements. Code: https://github.com/jeya-maria-jose/KiU-Net-pytorch.

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed a multi-scale residual encoding and decoding network (Ms RED) for skin lesion segmentation, which is able to accurately and reliably segment a variety of lesions with efficiency.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new method called Dilated Transformer, which conducts self-attention alternately in local and global scopes for pair-wise patch relations capturing.
Abstract: Computer-aided medical image segmentation has been applied widely in diagnosis and treatment to obtain clinically useful information of shapes and volumes of target organs and tissues. In the past several years, convolutional neural network (CNN)-based methods (e.g., U-Net) have dominated this area, but still suffered from inadequate long-range information capturing. Hence, recent work presented computer vision Transformer variants for medical image segmentation tasks and obtained promising performances. Such Transformers modeled long-range dependency by computing pair-wise patch relations. However, they incurred prohibitive computational costs, especially on 3D medical images (e.g., CT and MRI). In this paper, we propose a new method called Dilated Transformer, which conducts self-attention alternately in local and global scopes for pair-wise patch relations capturing. Inspired by dilated convolution kernels, we conduct the global self-attention in a dilated manner, enlarging receptive fields without increasing the patches involved and thus reducing computational costs. Based on this design of Dilated Transformer, we construct a U-shaped encoder–decoder hierarchical architecture called D-Former for 3D medical image segmentation. Experiments on the Synapse and ACDC datasets show that our D-Former model, trained from scratch, outperforms various competitive CNN-based or Transformer-based segmentation models at a low computational cost without time-consuming pre-training process.
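
The idea of alternating local and dilated (global) attention scopes can be pictured on a 1-D sequence of patches. The sketch below is an illustration rather than the D-Former code: it only shows which patch indices would attend to each other in each scope. The helper names `local_groups` and `dilated_groups`, and the requirement that the patch count be divisible by the group size, are assumptions.

```python
def local_groups(num_patches, group_size):
    """Group neighbouring patches: each group is a contiguous window."""
    return [list(range(start, start + group_size))
            for start in range(0, num_patches, group_size)]


def dilated_groups(num_patches, group_size):
    """Group patches sampled at a fixed stride, so each group spans the whole sequence."""
    stride = num_patches // group_size
    return [list(range(offset, num_patches, stride)) for offset in range(stride)]


# Assumes num_patches is divisible by group_size.
print(local_groups(8, 4))    # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(dilated_groups(8, 4))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Both scopes attend over groups of the same size, so the per-group cost is unchanged, while the dilated groups cover a wider range of positions.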

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a skip-level round-trip sampling block structure implemented with convolutional neural networks, and thereby constructed a novel pixel-level semantic segmentation network called CrackW-Net.
Abstract: Image-based intelligent detection of road cracks with high accuracy and efficiency is vital to the overall condition assessment of the pavement. However, significant problems of continuous crack interruption and background discrete noise misidentification are frequently observed in current semantic segmentation of pavement cracks, which are mainly caused by traditional segmentation convolutional neural networks. This paper proposes a skip-level round-trip sampling block structure implemented with convolutional neural networks, and thereby constructs a novel pixel-level semantic segmentation network called CrackW-Net. After that, two datasets, including the widely recognized Crack500 dataset and a self-built dataset, were used to train two versions of CrackW-Net, FCN, U-Net and ResU-Net. Meanwhile, comparative experiments are conducted among all these network models for crack detection. Results show that CrackW-Net without residual block performs the best in the task of pavement crack segmentation.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a boundary-aware context neural network (BA-Net) for 2D medical image segmentation to capture richer context and preserve fine spatial information, which incorporates encoder-decoder architecture.

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed a multiple instance learning (MIL) framework, which can be trained in an end-to-end manner using training images with image-level labels.
Abstract: Weakly supervised semantic instance segmentation with only image-level supervision, instead of relying on expensive pixel-wise masks or bounding box annotations, is an important problem to alleviate the data-hungry nature of deep learning. In this article, we tackle this challenging problem by aggregating the image-level information of all training images into a large knowledge graph and exploiting semantic relationships from this graph. Specifically, our effort starts with some generic segment-based object proposals (SOP) without category priors. We propose a multiple instance learning (MIL) framework, which can be trained in an end-to-end manner using training images with image-level labels. For each proposal, this MIL framework can simultaneously compute probability distributions and category-aware semantic features, with which we can formulate a large undirected graph. The category of background is also included in this graph to remove the massive noisy object proposals. An optimal multi-way cut of this graph can thus assign a reliable category label to each proposal. The denoised SOP with assigned category labels can be viewed as pseudo instance segmentation of training images, which are used to train fully supervised models. The proposed approach achieves state-of-the-art performance for both weakly supervised instance segmentation and semantic segmentation. The code is available at https://github.com/yun-liu/LIID.

Journal ArticleDOI
TL;DR: An efficient variant named the fuzzy clustering algorithm with variable multi-pixel fitting spatial information (FCM-VMF) is presented, which has extremely high efficiency and has a better prospect of application.

Journal ArticleDOI
TL;DR: A semi-automated and interactive approach based on the spatial Fuzzy C-Means (sFCM) algorithm is proposed, used to segment masses on dynamic contrast-enhanced MRI of the breast, and is confirmed to outperform the competing literature methods.
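
For readers unfamiliar with Fuzzy C-Means, the sketch below shows the standard (non-spatial) FCM iteration on 1-D intensities. It is not the sFCM variant from the paper above, which additionally uses spatial information. The function name `fcm_1d`, the fuzzifier default `m=2.0`, and the fixed iteration count are assumptions for this example.

```python
def fcm_1d(values, centers, m=2.0, iterations=50):
    """Plain Fuzzy C-Means on scalar intensities.

    values:  pixel intensities.
    centers: initial cluster centers.
    m:       fuzzifier (> 1); larger values give softer memberships.
    Returns the final centers and the membership row of each value.
    """
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A value sitting exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(row)
            memberships.append([r / total for r in row])
        for j in range(len(centers)):
            # Each center is the membership-weighted mean of the values.
            weights = [row[j] ** m for row in memberships]
            centers[j] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return centers, memberships


centers, memberships = fcm_1d([0.0, 0.1, 0.2, 0.9, 1.0, 1.1], [0.3, 0.7])
```

A hard segmentation follows by assigning each pixel to the cluster with the largest membership.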

Journal ArticleDOI
TL;DR: In this paper, an opposition-based golden jackal optimizer (IGJO) was proposed to solve the multilevel thresholding problem using Otsu's method as an objective function.
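
Otsu's objective for multilevel thresholding is the between-class variance of the gray-level histogram. The sketch below evaluates it and, as a stand-in for the metaheuristic, maximizes it by exhaustive search. It does not implement the golden jackal optimizer. The function names and the half-open class convention (a class covers the levels from one threshold up to, but not including, the next) are assumptions.

```python
from itertools import combinations


def between_class_variance(hist, thresholds):
    """Otsu objective: sum over classes of weight * (class mean - global mean)^2.

    hist[i] is the number of pixels with gray level i.
    """
    total = sum(hist)
    mean_total = sum(i * h for i, h in enumerate(hist)) / total
    edges = [0] + sorted(thresholds) + [len(hist)]
    variance = 0.0
    for lo, hi in zip(edges, edges[1:]):
        weight = sum(hist[lo:hi])
        if weight == 0:
            continue  # empty classes contribute nothing
        mean = sum(i * hist[i] for i in range(lo, hi)) / weight
        variance += (weight / total) * (mean - mean_total) ** 2
    return variance


def best_thresholds(hist, k):
    """Exhaustive search over all k-threshold combinations (small histograms only)."""
    candidates = combinations(range(1, len(hist)), k)
    return max(candidates, key=lambda t: between_class_variance(hist, t))


hist = [5, 5, 0, 0, 5, 5, 0, 0, 5, 5]  # three well-separated intensity clusters
thresholds = best_thresholds(hist, 2)
```

Exhaustive search grows combinatorially with the number of thresholds and gray levels, which is why papers in this listing turn to swarm and metaheuristic optimizers for the same objective.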

Journal ArticleDOI
TL;DR: This work proposes an osteosarcoma aided segmentation method based on a guided aggregated bilateral network (OSGABN), which improves the segmentation accuracy of the model and greatly reduces the parameter scale, effectively alleviating the above problems.
Abstract: Osteosarcoma is a primary malignant tumor. It is difficult to cure and expensive to treat. Generally, diagnosis is made by analyzing MRI images of patients. In the process of clinical diagnosis, the mainstream method is still time-consuming and laborious manual screening. Modern computer image segmentation technology can realize the automatic processing of the original image of osteosarcoma and assist doctors in diagnosis. However, to achieve a better effect of segmentation, the complexity of the model is relatively high, and the hardware conditions in developing countries are limited, so it is difficult to use it directly. Based on this situation, we propose an osteosarcoma aided segmentation method based on a guided aggregated bilateral network (OSGABN), which improves the segmentation accuracy of the model and greatly reduces the parameter scale, effectively alleviating the above problems. The fast bilateral segmentation network (FaBiNet) is used to segment images. It is a high-precision model with a detail branch that captures low-level information and a lightweight semantic branch that captures high-level semantic context. We used more than 80,000 osteosarcoma MRI images from three hospitals in China for detection, and the results showed that our model can achieve an accuracy of around 0.95 and a parameter count of 2.33 M.

Journal ArticleDOI
TL;DR: In this article, a modified remora optimization algorithm (MROA) was proposed for global optimization and image segmentation tasks, which used Brownian motion to promote the exploration ability of ROA and provide a greater opportunity to find the optimal solution.
Abstract: Image segmentation is a key stage in image processing because it simplifies the representation of the image and facilitates subsequent analysis. The multi-level thresholding image segmentation technique is considered one of the most popular methods because it is efficient and straightforward. Many related works use meta-heuristic algorithms (MAs) to determine threshold values, but they have issues such as poor convergence accuracy and stagnation into local optimal solutions. Therefore, to alleviate these shortcomings, in this paper, we present a modified remora optimization algorithm (MROA) for global optimization and image segmentation tasks. First, we used Brownian motion to promote the exploration ability of ROA and provide a greater opportunity to find the optimal solution. Second, lens opposition-based learning is introduced to enhance the ability of search agents to jump out of the local optimal solution. To substantiate the performance of MROA, we first used 23 benchmark functions to evaluate the performance. We compared it with seven well-known algorithms regarding optimization accuracy, convergence speed, and significant difference. Subsequently, we tested the segmentation quality of MROA on eight grayscale images with cross-entropy as the objective function. The experimental metrics include peak signal-to-noise ratio (PSNR), structure similarity (SSIM), and feature similarity (FSIM). A series of experimental results have proved that the MROA has significant advantages among the compared algorithms. Consequently, the proposed MROA is a promising method for global optimization problems and image segmentation.
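
Of the metrics named in the abstract above, PSNR is the simplest to state: it is ten times the base-10 logarithm of the squared peak value divided by the mean squared error. A minimal sketch on flattened grayscale images follows. The function name `psnr` and the default peak of 255 (8-bit images) are assumptions; SSIM and FSIM are not covered here.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between two flat grayscale images."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


value = psnr([10, 20, 30, 40], [12, 18, 33, 40])
```

When evaluating thresholding, the test image is typically the thresholded (segmented) image and the reference is the original, so a higher PSNR means the segmented image stays closer to the original.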

Journal ArticleDOI
TL;DR: In this article, a method for training neural networks to perform image or volume segmentation in which prior knowledge about the topology of the segmented object can be explicitly provided and then incorporated into the training process is introduced.
Abstract: We introduce a method for training neural networks to perform image or volume segmentation in which prior knowledge about the topology of the segmented object can be explicitly provided and then incorporated into the training process. By using the differentiable properties of persistent homology, a concept used in topological data analysis, we can specify the desired topology of segmented objects in terms of their Betti numbers and then drive the proposed segmentations to contain the specified topological features. Importantly this process does not require any ground-truth labels, just prior knowledge of the topology of the structure being segmented. We demonstrate our approach in four experiments: one on MNIST image denoising and digit recognition, one on left ventricular myocardium segmentation from magnetic resonance imaging data from the UK Biobank, one on the ACDC public challenge dataset and one on placenta segmentation from 3-D ultrasound. We find that embedding explicit prior knowledge in neural network segmentation tasks is most beneficial when the segmentation task is especially challenging and that it can be used in either a semi-supervised or post-processing context to extract a useful training gradient from images without pixelwise labels.


Proceedings ArticleDOI
23 May 2022
TL;DR: Dootmaan et al. as discussed by the authors proposed a Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinities learning, which calculates self-affinities efficiently through Local-Global Gaussian-Weighted Self-Attention (LGG-SA) and mines inter-connections between data samples through External Attention (EA).
Abstract: Though U-Net has achieved tremendous success in medical image segmentation tasks, it lacks the ability to explicitly model long-range dependencies. Therefore, Vision Transformers have emerged as alternative segmentation structures recently, for their innate ability of capturing long-range correlations through Self-Attention (SA). However, Transformers usually rely on large-scale pre-training and have high computational complexity. Furthermore, SA can only model self-affinities within a single sample, ignoring the potential correlations of the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinities learning. MTM first calculates self-affinities efficiently through our well-designed Local-Global Gaussian-Weighted Self-Attention (LGG-SA). Then, it mines inter-connections between data samples through External Attention (EA). By using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation. We test our method on two different public datasets, and the experimental results show that the proposed method achieves better performance over other state-of-the-art methods. The code is available at: https://github.com/Dootmaan/MT-UNet.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a novel edge-based reversible re-calibration network (ERRNet), which is characterized by two innovative designs, namely Selective Edge Aggregation (SEA) and Reversible Re-Calibration Unit (RRU), which aim to model the visual perception behavior and achieve effective edge prior and cross-comparison between potential camouflaged regions and background.

Journal ArticleDOI
TL;DR: In this paper, a hybrid active contour model driven by pre-fitting energy with an adaptive edge indicator function and an adaptive sign function is proposed to enable the evolution curve to adjust evolution direction and speed.