Author

Jiayu Xu

Bio: Jiayu Xu is an academic researcher. The author has contributed to research in the topics of Computer science & Artificial intelligence. The author has an h-index of 1 and has co-authored 1 publication receiving 3 citations.

Papers
Posted Content
TL;DR: Wang et al. as discussed by the authors proposed dual-scale encoder subnetworks based on the Swin Transformer to extract coarse- and fine-grained feature representations at different semantic scales.
Abstract: Automatic medical image segmentation has made great progress, benefiting from the development of deep learning. However, most existing methods are based on convolutional neural networks (CNNs), which fail to build long-range dependencies and global context connections due to the limited receptive field of the convolution operation. Inspired by the success of the Transformer in modeling long-range contextual information, some researchers have expended considerable effort in designing robust variants of Transformer-based U-Net. Moreover, the patch division used in vision transformers usually ignores the pixel-level intrinsic structural features inside each patch. To alleviate these problems, we propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet), which might be the first attempt to concurrently incorporate the advantages of the hierarchical Swin Transformer into both the encoder and decoder of the standard U-shaped architecture to enhance the semantic segmentation quality of varying medical images. Unlike many prior Transformer-based solutions, the proposed DS-TransUNet first adopts dual-scale encoder subnetworks based on the Swin Transformer to extract coarse- and fine-grained feature representations at different semantic scales. As the core component of DS-TransUNet, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively establish global dependencies between features of different scales through the self-attention mechanism. Furthermore, we also introduce the Swin Transformer block into the decoder to further explore long-range contextual information during the up-sampling process. Extensive experiments across four typical tasks for medical image segmentation demonstrate the effectiveness of DS-TransUNet and show that our approach significantly outperforms state-of-the-art methods.
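To make the cross-scale fusion idea concrete, here is a minimal sketch, not the authors' TIF implementation, of how tokens from a fine-grained branch can attend jointly with a summary of a coarse-grained branch via self-attention. The class name `CrossScaleFusion`, the use of a single pooled global token, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of cross-scale token fusion (illustrative, not the paper's code).
import torch
import torch.nn as nn

class CrossScaleFusion(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, fine_tokens: torch.Tensor, coarse_tokens: torch.Tensor) -> torch.Tensor:
        # Summarize the coarse branch into one global token and prepend it to the
        # fine-scale sequence, so self-attention can mix information across scales.
        global_token = coarse_tokens.mean(dim=1, keepdim=True)   # (B, 1, C)
        mixed = torch.cat([global_token, fine_tokens], dim=1)    # (B, 1+N, C)
        fused, _ = self.attn(self.norm(mixed), self.norm(mixed), self.norm(mixed))
        return fused[:, 1:, :] + fine_tokens                     # residual; drop the global token

fine = torch.randn(2, 196, 96)    # e.g. 14x14 patches, embedding dim 96
coarse = torch.randn(2, 49, 96)   # e.g. 7x7 patches
print(CrossScaleFusion(96)(fine, coarse).shape)   # torch.Size([2, 196, 96])
```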

59 citations

Journal ArticleDOI
TL;DR: A serial attention frame (SAF) containing MAB and SAB is presented to address the effect of complex backgrounds in waste bottle recognition; it exhibited good recognition performance on the collected waste bottle datasets.
Abstract: The multi-label recognition of damaged waste bottles has important significance for environmental protection. However, most previous methods perform poorly, especially on damaged waste bottle classification. In this paper, we propose a serial attention frame (SAF) to overcome this drawback. The proposed network architecture includes three parts: a residual learning block (RB), a mixed attention block (MAB), and a self-attention block (SAB). The RB uses ResNet to pretrain the SAF so as to extract more detailed information. To address the effect of the complex backgrounds in waste bottle recognition, a serial attention mechanism containing the MAB and SAB is presented. The MAB extracts more salient category information via the simultaneous use of spatial attention and channel attention. The SAB exploits the obtained features and its parameters to diversify the features and improve the classification results. The experimental results demonstrate that our proposed model exhibited good recognition performance on the collected waste bottle datasets, with eight labels across three attributes, i.e., the color, whether the bottle was damaged, and whether the wrapper had been removed, as well as on public image classification datasets.
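The mixed attention idea, channel attention and spatial attention applied to the same feature map, can be sketched as below. This is not the authors' MAB code; the class name `MixedAttention`, the reduction ratio, and the 7x7 spatial kernel are illustrative assumptions.

```python
# Minimal sketch of a mixed (channel + spatial) attention block (illustrative).
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dimensions, re-weight channels.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        # Spatial attention: squeeze channels, re-weight spatial positions.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)
        spatial_desc = torch.cat([x.mean(dim=1, keepdim=True),
                                  x.amax(dim=1, keepdim=True)], dim=1)
        return x * self.spatial_gate(spatial_desc)

x = torch.randn(2, 64, 32, 32)
print(MixedAttention(64)(x).shape)   # torch.Size([2, 64, 32, 32])
```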

6 citations

Journal ArticleDOI
TL;DR: A survey of attention mechanisms acting within CNNs for image classification, covering the main architectures of attention-based CNNs, public and collected datasets, and experimental results in image classification.
Abstract: Deep learning techniques, in particular CNNs, can learn powerful context information and have been widely applied in image recognition. However, deep CNNs may rely on large width and large depth, which increases computational costs. Attention mechanisms fused into CNNs can address this problem. In this paper, we survey how attention mechanisms act within CNNs for image classification. Firstly, the survey traces the development of CNNs for image classification. Then, we illustrate the basics of CNNs and attention mechanisms for image classification. Next, we present the main architectures of CNNs with attention, public and collected datasets, and experimental results in image classification. Finally, we point out potential research directions and challenges for attention-based image classification and summarize the whole paper.
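As a minimal illustration of the kind of fusion this survey covers, a squeeze-and-excitation style channel gate can be wrapped around a stage of an off-the-shelf CNN classifier. This is a generic example, not a specific architecture from the survey; the class name `SEGate`, the insertion point, and the 10-class head are assumptions.

```python
# Illustrative sketch: fusing a channel-attention (SE-style) gate into a CNN classifier.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SEGate(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

model = resnet18(weights=None, num_classes=10)
# Wrap the last residual stage (512 channels) with the channel-attention gate.
model.layer4 = nn.Sequential(model.layer4, SEGate(512))
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)   # torch.Size([2, 10])
```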

5 citations

Book ChapterDOI
01 Jan 2022
TL;DR: ConTrans as discussed by the authors is a concurrent structure consisting of two parallel encoders, i.e., a Swin Transformer encoder and a CNN encoder, which couples detailed localization information with global contexts to the maximum extent.
Abstract: Over the past few years, convolutional neural networks (CNNs) and vision transformers (ViTs) have been two dominant architectures in medical image segmentation. Although CNNs can efficiently capture local representations, they have difficulty establishing long-distance dependencies. Comparably, ViTs achieve impressive success owing to their powerful global context modeling capabilities, but they may not generalize well on insufficient datasets due to the lack of the inductive biases inherent to CNNs. To inherit the merits of these two different design paradigms while avoiding their respective limitations, we propose a concurrent structure termed ConTrans, which can couple detailed localization information with global contexts to the maximum extent. ConTrans consists of two parallel encoders, i.e., a Swin Transformer encoder and a CNN encoder. Specifically, the CNN encoder is progressively stacked by the novel Depthwise Attention Block (DAB), with the aim of providing the precise local features we need. Furthermore, a well-designed Spatial-Reduction-Cross-Attention (SRCA) module is embedded in the decoder to form a comprehensive fusion of these two distinct feature representations and eliminate the semantic divergence between them. This allows the model to obtain accurate semantic information and ensures semantic consistency of the up-sampled features in a hierarchical manner. Extensive experiments across four typical tasks show that ConTrans significantly outperforms state-of-the-art methods on ten famous benchmarks.
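The spatial-reduction cross-attention idea can be sketched as follows: decoder queries attend to CNN features whose spatial resolution has been reduced first, which keeps the attention cost manageable. This is not the authors' SRCA code; the class name, the convolutional reduction, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of spatial-reduction cross-attention between transformer tokens
# and a CNN feature map (illustrative, not the paper's implementation).
import torch
import torch.nn as nn

class SpatialReductionCrossAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, sr_ratio: int = 2):
        super().__init__()
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries: torch.Tensor, cnn_feat: torch.Tensor) -> torch.Tensor:
        # queries: (B, N, C) transformer tokens; cnn_feat: (B, C, H, W) local CNN features.
        kv = self.sr(cnn_feat).flatten(2).transpose(1, 2)   # spatially reduced keys/values
        out, _ = self.attn(self.norm_q(queries), self.norm_kv(kv), self.norm_kv(kv))
        return queries + out                                  # residual connection

q = torch.randn(2, 196, 96)
f = torch.randn(2, 96, 28, 28)
print(SpatialReductionCrossAttention(96)(q, f).shape)   # torch.Size([2, 196, 96])
```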

4 citations

Journal ArticleDOI
TL;DR: A novel framework with context to locate and classify nuclei in microscopy image data is proposed and experimental results demonstrate that the method outperforms other recent state-of-the-art models in nucleus identification.
Abstract: MOTIVATION: Nucleus identification supports many quantitative analysis studies that rely on nuclei positions or categories. Contextual information in pathology images refers to information near the to-be-recognized cell, which can be very helpful for nucleus subtyping. Current CNN-based methods do not explicitly encode contextual information within the input images and point annotations. RESULTS: In this paper, we propose a novel framework with context to locate and classify nuclei in microscopy image data. Specifically, we first use state-of-the-art network architectures to extract multi-scale feature representations from multi-field-of-view, multi-resolution input images and then conduct feature aggregation on-the-fly with stacked convolutional operations. Then, two auxiliary tasks are added to the model to effectively utilize the contextual information: one for predicting the frequencies of nuclei, and the other for extracting the regional distribution information of the same kind of nuclei. The entire framework is trained in an end-to-end, pixel-to-pixel fashion. We evaluate our method on two histopathological image datasets with different tissue and stain preparations, and experimental results demonstrate that our method outperforms other recent state-of-the-art models in nucleus identification. AVAILABILITY: The source code of our method is freely available at https://github.com/qjxjy123/DonRabbit. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
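The multi-field-of-view aggregation step can be sketched as below: features from a wide, lower-resolution context view are resized and fused with the target-view features by stacked convolutions. This is a sketch under stated assumptions, not the released code linked above; the class name `MultiFOVAggregator`, the tiny backbones, and the single-channel output map are illustrative.

```python
# Minimal sketch of multi-field-of-view feature aggregation (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiFOVAggregator(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        def conv(cin):  # small conv block used by both branches and the fusion head
            return nn.Sequential(nn.Conv2d(cin, channels, 3, padding=1),
                                 nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.target_branch = conv(3)    # high-resolution field of view
        self.context_branch = conv(3)   # wider, lower-resolution field of view
        self.fuse = nn.Sequential(conv(2 * channels), nn.Conv2d(channels, 1, 1))

    def forward(self, target: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        t = self.target_branch(target)
        c = self.context_branch(context)
        c = F.interpolate(c, size=t.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([t, c], dim=1))   # pixel-wise nucleus map

target = torch.randn(1, 3, 128, 128)    # crop at native resolution
context = torch.randn(1, 3, 128, 128)   # larger crop resized to the same input size
print(MultiFOVAggregator()(target, context).shape)   # torch.Size([1, 1, 128, 128])
```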

Cited by
Journal ArticleDOI
02 Jun 2022
TL;DR: A comprehensive review of state-of-the-art Transformer-based approaches for medical imaging is presented in this paper, organized by the Transformer's key defining properties, which are mostly derived from comparing the Transformer with CNNs, and by the type of architecture, which specifies the manner in which the Transformer and CNN are combined.
Abstract: The Transformer, one of the latest technological advances of deep learning, has gained prevalence in natural language processing and computer vision. Since medical imaging bears some resemblance to computer vision, it is natural to inquire about the status quo of Transformers in medical imaging and ask the question: can Transformer models transform medical imaging? In this paper, we attempt to respond to this inquiry. After a brief introduction to the fundamentals of Transformers, especially in comparison with convolutional neural networks (CNNs), and after highlighting the key defining properties that characterize Transformers, we offer a comprehensive review of the state-of-the-art Transformer-based approaches for medical imaging and exhibit current research progress made in the areas of medical image segmentation, recognition, detection, registration, reconstruction, enhancement, etc. In particular, what distinguishes our review is its organization based on the Transformer's key defining properties, which are mostly derived from comparing the Transformer and the CNN, and on its type of architecture, which specifies the manner in which the Transformer and CNN are combined, all helping readers to best understand the rationale behind the reviewed approaches. We conclude with discussions of future perspectives.

37 citations

Journal ArticleDOI
TL;DR: In this article, the authors proposed an efficient date classification model based on the MobileNetV2 architecture, which achieved 99% accuracy on eight different classes of date fruit and was compared with other existing models such as AlexNet, VGG16, InceptionV3, ResNet, and MobileNetV2.
Abstract: A total of 8.46 million tons of date fruit are produced annually around the world. The date fruit is considered a high-valued confectionery and fruit crop. The hot arid zones of Southwest Asia, North Africa, and the Middle East are the major producers of date fruit. The production of dates in 1961 was 1.8 million tons, which increased to 2.8 million tons in 1985. In 2001, the production of dates was recorded at 5.4 million tons, whereas recently it has reached 8.46 million tons. A common problem found in the industry is the absence of an autonomous system for the classification of date fruit, resulting in reliance solely on manual expertise, which often involves hard work, expense, and bias. Recently, Machine Learning (ML) techniques have been employed in areas such as agriculture and fruit farming and have brought great convenience to human life. An automated system based on ML can carry out the fruit classification and sorting tasks that were previously handled by human experts. In various fields, CNNs (convolutional neural networks) have achieved impressive results in image classification. Considering the success of CNNs and transfer learning in other image classification problems, this research also employs a similar approach and proposes an efficient date classification model. In this research, a dataset of eight different classes of date fruit has been created to train the proposed model. Different preprocessing techniques have been applied in the proposed model, such as image augmentation, decayed learning rate, model checkpointing, and hybrid weight adjustment to increase the accuracy rate. The results show that the proposed model based on the MobileNetV2 architecture has achieved 99% accuracy. The proposed model has also been compared with other existing models such as AlexNet, VGG16, InceptionV3, ResNet, and MobileNetV2. The results prove that the proposed model performs better than all other models in terms of accuracy.
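A minimal transfer-learning sketch in the spirit of this approach is shown below: a MobileNetV2 backbone with a new 8-class head, simple augmentation, and a decayed learning rate. This is an illustrative setup, not the authors' exact pipeline (which also uses model checkpointing and hybrid weight adjustment); the optimizer, schedule, and augmentations are assumptions.

```python
# Illustrative MobileNetV2 transfer-learning setup for an 8-class fruit dataset.
import torch
import torch.nn as nn
from torchvision import models, transforms

# Simple augmentation pipeline (would be applied inside a real DataLoader).
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])

# Start from ImageNet-pretrained weights and replace the classification head.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
model.classifier[1] = nn.Linear(model.last_channel, 8)   # 8 date-fruit classes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)  # decayed learning rate
criterion = nn.CrossEntropyLoss()

# One illustrative training step on dummy data.
images, labels = torch.randn(4, 3, 224, 224), torch.randint(0, 8, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
scheduler.step()
print(float(loss))
```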

30 citations

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a multi-feature integration network (Swin-MFINet), which consists of an encoder, a Swin Transformer-based decoder, and Multi-Feature Integration (MFI) modules.
Abstract: Automatic surface defect detection is critical for manufacturing industries, such as the steel, fabric, and marble industries. This study proposes a Swin Transformer-based model called the Multi-Feature Integration Network (Swin-MFINet) for pixel-level surface defect detection. The proposed model consists of an encoder, a Swin Transformer-based decoder, and Multi-Feature Integration (MFI) modules. In the encoder module of the proposed model, a pre-trained Inception network is used to extract key features from small-size datasets. In the decoder section, global semantic features are obtained from the initial features by using the Swin Transformer block, the newest transformer technology of today. In addition, a convolution layer is used in the last step of the decoder, since transformers are limited in acquiring small spatial details such as edges, colors, and textures, which are important for detecting some small defects. In the last module, called MFI, feature maps from different decoder stages are combined, and a channel squeeze-spatial excitation block is applied to reveal important features. Finally, a prediction map is obtained by applying a convolution layer and a sigmoid activation function to the MFI module output, respectively. The performance of the proposed model is analyzed on the MT and MVTec datasets containing surface defect images. The proposed model obtained mIoU scores of 81.37% and 77.07%, respectively, for these two datasets. These results outperform the state of the art for the surface defect detection problem.
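The channel squeeze-spatial excitation step used in the MFI module can be sketched as follows: multi-stage decoder features are resized, concatenated, spatially re-weighted, and passed through a 1x1 convolution and sigmoid to produce the prediction map. This is not the authors' code; the class name `SpatialExcitationFusion` and all channel sizes are illustrative assumptions.

```python
# Minimal sketch of spatial-excitation-based multi-feature fusion (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialExcitationFusion(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        # Channel squeeze-spatial excitation: one per-pixel gate over the fused map.
        self.sse = nn.Sequential(nn.Conv2d(in_channels, 1, kernel_size=1), nn.Sigmoid())
        # Final 1x1 convolution + sigmoid for the pixel-level prediction map.
        self.head = nn.Sequential(nn.Conv2d(in_channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, features):
        size = features[0].shape[-2:]
        fused = torch.cat(
            [F.interpolate(f, size=size, mode="bilinear", align_corners=False) for f in features],
            dim=1,
        )
        fused = fused * self.sse(fused)   # spatial excitation of the fused features
        return self.head(fused)           # pixel-level defect prediction map

stages = [torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32), torch.randn(1, 128, 16, 16)]
print(SpatialExcitationFusion(32 + 64 + 128)(stages).shape)   # torch.Size([1, 1, 64, 64])
```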

14 citations

Proceedings ArticleDOI
01 Jan 2023
TL;DR: In this article, the authors propose HiFormer, a novel method that efficiently bridges a CNN and a transformer for medical image segmentation; it designs two multi-scale feature representations using the seminal Swin Transformer module and a CNN-based encoder.
Abstract: Convolutional neural networks (CNNs) have been the consensus for medical image segmentation tasks. However, they suffer from the limitation in modeling long-range dependencies and spatial correlations due to the nature of convolution operation. Although transformers were first developed to address this issue, they fail to capture low-level features. In contrast, it is demonstrated that both local and global features are crucial for dense prediction, such as segmenting in challenging contexts. In this paper, we propose HiFormer, a novel method that efficiently bridges a CNN and a transformer for medical image segmentation. Specifically, we design two multi-scale feature representations using the seminal Swin Transformer module and a CNN-based encoder. To secure a fine fusion of global and local features obtained from the two aforementioned representations, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure. Extensive experiments on various medical image segmentation datasets demonstrate the effectiveness of HiFormer over other CNN-based, transformer-based, and hybrid methods in terms of computational complexity, quantitative and qualitative results. Our code is publicly available at GitHub.

8 citations

Posted Content
TL;DR: nnFormer (Not-aNother transFormer) as discussed by the authors combines self-attention and convolution in an interleaved architecture and learns volumetric representations from 3D local volumes.
Abstract: Transformers, the default model of choice in natural language processing, have drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers are promising in helping typical convolutional neural networks (convnets) overcome their inherent shortcomings of spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as assisting modules that help encode global context into convolutional representations, without investigating how to optimally combine self-attention (i.e., the core of transformers) with convolution. To address this issue, in this paper we introduce nnFormer (i.e., Not-aNother transFormer), a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution. In practice, nnFormer learns volumetric representations from 3D local volumes. Compared to the naive voxel-level self-attention implementation, such volume-based operations help to reduce the computational complexity by approximately 98% and 99.5% on the Synapse and ACDC datasets, respectively. In comparison to prior-art network configurations, nnFormer achieves tremendous improvements over previous transformer-based methods on the two commonly used datasets Synapse and ACDC. For instance, nnFormer outperforms Swin-UNet by over 7 percent on Synapse. Even when compared to nnUNet, currently the best performing fully-convolutional medical segmentation network, nnFormer still provides slightly better performance on Synapse and ACDC.
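The volume-based self-attention idea can be sketched as below: a 3D feature map is partitioned into non-overlapping local volumes and self-attention runs inside each volume rather than over all voxels, which is what reduces the cost relative to naive voxel-level attention. This is not the authors' code; the class name, window size, and dimensions are illustrative assumptions.

```python
# Minimal sketch of self-attention restricted to local 3D volumes (illustrative).
import torch
import torch.nn as nn

class LocalVolumeSelfAttention(nn.Module):
    def __init__(self, dim: int, window: int = 4, num_heads: int = 4):
        super().__init__()
        self.window = window
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, D, H, W) with D, H, W divisible by the window size.
        b, c, d, h, w = x.shape
        ws = self.window
        # Partition the volume into (B * num_volumes, ws^3, C) token groups.
        x = x.view(b, c, d // ws, ws, h // ws, ws, w // ws, ws)
        x = x.permute(0, 2, 4, 6, 3, 5, 7, 1).reshape(-1, ws ** 3, c)
        y = self.norm(x)
        y, _ = self.attn(y, y, y)     # attention only within each local volume
        x = x + y                     # residual connection
        # Reverse the partition back to (B, C, D, H, W).
        x = x.reshape(b, d // ws, h // ws, w // ws, ws, ws, ws, c)
        return x.permute(0, 7, 1, 4, 2, 5, 3, 6).reshape(b, c, d, h, w)

vol = torch.randn(1, 48, 16, 16, 16)
print(LocalVolumeSelfAttention(48)(vol).shape)   # torch.Size([1, 48, 16, 16, 16])
```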

7 citations