Author

Bingzhi Chen

Bio: Bingzhi Chen is an academic researcher from Harbin Institute of Technology. The author has contributed to research in topics: Computer science & Artificial intelligence. The author has an h-index of 3 and has co-authored 9 publications receiving 70 citations.

Papers
Journal ArticleDOI
TL;DR: This paper proposes a novel label co-occurrence learning framework based on Graph Convolution Networks (GCNs) to explicitly explore the dependencies between pathologies for the multi-label chest X-ray (CXR) image classification task, which is termed “CheXGCN”.
Abstract: Existing multi-label medical image learning tasks generally contain rich relationship information among pathologies such as label co-occurrence and interdependency, which is of great importance for assisting in clinical diagnosis and can be represented as graph-structured data. However, most state-of-the-art works only focus on regression from the input to the binary labels, failing to make full use of such valuable graph-structured information due to the complexity of graph data. In this paper, we propose a novel label co-occurrence learning framework based on Graph Convolution Networks (GCNs) to explicitly explore the dependencies between pathologies for the multi-label chest X-ray (CXR) image classification task, which we term “CheXGCN”. Specifically, the proposed CheXGCN consists of two modules, i.e., the image feature embedding (IFE) module and the label co-occurrence learning (LCL) module. Thanks to the LCL module, the relationship between pathologies is generalized into a set of classifier scores by introducing the word embedding of pathologies and multi-layer graph information propagation. During end-to-end training, it can be flexibly integrated into the IFE module and then adaptively recalibrate multi-label outputs with these scores. Extensive experiments on the ChestX-Ray14 and CheXpert datasets have demonstrated the effectiveness of CheXGCN as compared with the state-of-the-art baselines.
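
The recalibration idea can be illustrated with a minimal PyTorch-style sketch: label word embeddings are propagated over a normalized co-occurrence graph by two graph-convolution layers to produce per-pathology classifier weights, which are then applied to the pooled image features. The layer sizes, the backbone stand-in, and the random adjacency matrix below are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a CheXGCN-style label co-occurrence recalibration.
# Shapes, layer sizes, and the adjacency matrix are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution step: H' = activation(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        return F.leaky_relu(adj @ self.weight(h))

class CheXGCNSketch(nn.Module):
    def __init__(self, word_dim=300, feat_dim=2048):
        super().__init__()
        # Stand-in for the IFE module: global pooling over a CNN feature map.
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # LCL module sketch: two GCN layers map label word embeddings
        # to per-pathology classifier weights.
        self.gcn1 = GCNLayer(word_dim, 512)
        self.gcn2 = GCNLayer(512, feat_dim)

    def forward(self, feat_map, label_embeddings, adj):
        img_feat = self.pool(feat_map)                                   # (B, feat_dim)
        classifiers = self.gcn2(self.gcn1(label_embeddings, adj), adj)   # (L, feat_dim)
        return img_feat @ classifiers.t()                                # (B, L) logits

# Usage with random stand-ins for the word embeddings and co-occurrence graph.
model = CheXGCNSketch()
feat_map = torch.randn(2, 2048, 7, 7)         # backbone feature map
embeds = torch.randn(14, 300)                 # label word embeddings
adj = torch.softmax(torch.randn(14, 14), -1)  # row-normalized co-occurrence matrix
logits = model(feat_map, embeds, adj)         # (2, 14) multi-label outputs
```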

79 citations

Journal ArticleDOI
TL;DR: A novel dual asymmetric feature learning network named DualCheXNet is presented for multi-label thoracic disease classification in CXRs, and an iterative training strategy is designed to integrate the loss contributions of the involved classifiers into a unified loss and to optimize the process of complementary feature learning in an alternating way.
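
The unified-loss idea can be sketched as a weighted sum of the binary cross-entropy losses from the involved classifier heads; the number of heads and the weights below are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of fusing several classifiers' loss contributions into one
# unified multi-label loss; head count and weights are illustrative assumptions.
import torch
import torch.nn.functional as F

def unified_loss(logits_per_head, targets, weights=(1.0, 1.0, 0.5)):
    """logits_per_head: list of (B, L) logits from the involved classifiers."""
    losses = [F.binary_cross_entropy_with_logits(lg, targets) for lg in logits_per_head]
    return sum(w * l for w, l in zip(weights, losses))

targets = torch.randint(0, 2, (4, 14)).float()    # multi-label ground truth
heads = [torch.randn(4, 14) for _ in range(3)]    # three classifier heads
loss = unified_loss(heads, targets)               # scalar training loss
```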

61 citations

Posted Content
TL;DR: The authors propose dual-scale encoder subnetworks based on the Swin Transformer to extract coarse- and fine-grained feature representations at different semantic scales.
Abstract: Automatic medical image segmentation has made great progress, benefiting from the development of deep learning. However, most existing methods are based on convolutional neural networks (CNNs), which fail to build long-range dependencies and global context connections due to the limited receptive field of the convolution operation. Inspired by the success of the Transformer in modeling long-range contextual information, some researchers have expended considerable effort in designing robust variants of Transformer-based U-Net. Moreover, the patch division used in vision transformers usually ignores the pixel-level intrinsic structural features inside each patch. To alleviate these problems, we propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet), which might be the first attempt to concurrently incorporate the advantages of the hierarchical Swin Transformer into both the encoder and the decoder of the standard U-shaped architecture to enhance the semantic segmentation quality of varying medical images. Unlike many prior Transformer-based solutions, the proposed DS-TransUNet first adopts dual-scale encoder subnetworks based on the Swin Transformer to extract coarse- and fine-grained feature representations at different semantic scales. As the core component of our DS-TransUNet, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively establish global dependencies between features of different scales through the self-attention mechanism. Furthermore, we also introduce the Swin Transformer block into the decoder to further explore long-range contextual information during the up-sampling process. Extensive experiments across four typical tasks for medical image segmentation demonstrate the effectiveness of DS-TransUNet, and show that our approach significantly outperforms the state-of-the-art methods.
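
The fusion step can be illustrated with a minimal sketch in which tokens from the coarse-scale branch attend to the fine-scale token sequence through multi-head attention; the dimensions, token counts, and exact attention pattern are illustrative assumptions, not the released TIF implementation.

```python
# Minimal sketch of Transformer-based fusion between two token scales,
# in the spirit of the TIF module; dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class TwoScaleFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, coarse_tokens, fine_tokens):
        # Coarse-scale tokens query the fine-scale sequence (cross-attention),
        # so each coarse token aggregates global context from the other branch.
        fused, _ = self.attn(coarse_tokens, fine_tokens, fine_tokens)
        return self.norm(coarse_tokens + fused)

fusion = TwoScaleFusion()
coarse = torch.randn(2, 49, 256)    # e.g. tokens from a 7x7 patch grid
fine = torch.randn(2, 196, 256)     # e.g. tokens from a 14x14 patch grid
out = fusion(coarse, fine)          # (2, 49, 256) fused coarse-scale tokens
```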

59 citations

Journal ArticleDOI
TL;DR: By revealing the equivalence of the region-level attention (RLA) and channel-level attention (CLA), it is found that the RLA is available as priors for object localization while the CLA implicitly provides high weights to the attractive channels, which both enable lesion location attention excitation.
Abstract: Traditional clinical experiences have shown the benefit of lesion location attention for improving clinical diagnosis tasks. Inspired by this point of interest, in this paper we propose a novel lesion location attention guided network named LLAGnet to focus on the discriminative features from lesion locations for multi-label thoracic disease classification in chest X-rays (CXRs). By revealing the equivalence of the region-level attention (RLA) and channel-level attention (CLA), we find that the RLA is available as priors for object localization while the CLA implicitly provides high weights to the attractive channels, which both enable lesion location attention excitation. To integrate the advantages from both mechanisms, the proposed LLAGnet is structured with two corresponding attention modules, i.e., the RLA and CLA modules. Specifically, the RLA module consists of the global and local branches. And the weakly supervised attention mechanism embedded in the global branch can obtain visual regions of lesion locations by back-propagating gradients. Then the optimal attention region is amplified and applied to the local branch to provide more fine-grained features for the image classification. Finally, the CLA module adaptively enhances the weights of channel-wise features from the lesion locations by modeling interdependencies among channels. Extensive experiments on the ChestX-ray14 dataset clearly substantiate the effectiveness of LLAGnet as compared with the state-of-the-art baselines.
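
The channel-level attention idea can be sketched as a squeeze-and-excitation-style reweighting of feature channels; the squeeze-and-excitation form and the reduction ratio below are illustrative assumptions, not the exact CLA module.

```python
# Minimal sketch of channel-level attention that re-weights feature channels;
# the SE-style form and reduction ratio are illustrative assumptions.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels=512, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (B, C, H, W) feature map
        weights = self.fc(x.mean(dim=(2, 3)))   # (B, C) per-channel weights
        return x * weights[:, :, None, None]    # emphasize informative channels

cla = ChannelAttention()
feat = torch.randn(2, 512, 14, 14)
out = cla(feat)                                 # same shape, channel-reweighted
```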

43 citations

Proceedings ArticleDOI
Qi Cao, Mixiao Hou, Bingzhi Chen, Zheng Zhang, Guangming Lu
06 Jun 2021
TL;DR: In this paper, a hierarchical network called HNSD was proposed to integrate static and dynamic features for automatic speech emotion recognition (SER) and achieved state-of-the-art performance on the IEMOCAP benchmark dataset.
Abstract: Many studies on automatic speech emotion recognition (SER) have been devoted to extracting meaningful emotional features for generating emotion-relevant representations. However, they generally ignore the complementary learning of static and dynamic features, leading to limited performance. In this paper, we propose a novel hierarchical network called HNSD that can efficiently integrate the static and dynamic features for SER. Specifically, the proposed HNSD framework consists of three different modules. To capture the discriminative features, an effective encoding module is first designed to simultaneously encode both static and dynamic features. Taking the obtained features as inputs, the Gated Multi-features Unit (GMU) is applied to explicitly determine the emotional intermediate representations for frame-level feature fusion, instead of directly fusing these acoustic features. In this way, the learned static and dynamic features can jointly and comprehensively generate unified feature representations. Benefiting from a well-designed attention mechanism, the final classification module is applied to predict the emotional states at the utterance level. Extensive experiments on the IEMOCAP benchmark dataset demonstrate the superiority of our method in comparison with state-of-the-art baselines.
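
The gated fusion of static and dynamic frame-level features can be sketched as follows; the gating form and feature sizes are illustrative assumptions, not the paper's exact GMU.

```python
# Minimal sketch of gated fusion of static and dynamic frame-level features;
# the gating form and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.proj_static = nn.Linear(dim, dim)
        self.proj_dynamic = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, static_feat, dynamic_feat):
        h_s = torch.tanh(self.proj_static(static_feat))
        h_d = torch.tanh(self.proj_dynamic(dynamic_feat))
        # The gate decides, per dimension, how much each view contributes.
        z = torch.sigmoid(self.gate(torch.cat([static_feat, dynamic_feat], dim=-1)))
        return z * h_s + (1 - z) * h_d

gmu = GatedFusion()
static = torch.randn(4, 50, 128)    # e.g. per-frame spectral statistics
dynamic = torch.randn(4, 50, 128)   # e.g. per-frame delta features
fused = gmu(static, dynamic)        # (4, 50, 128) intermediate representations
```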

18 citations


Cited by
Journal ArticleDOI
TL;DR: In this article, a review of deep learning on chest X-ray images is presented, focusing on image-level prediction (classification and regression), segmentation, localization, image generation and domain adaptation.

121 citations

Journal ArticleDOI
TL;DR: An overview of explainable artificial intelligence (XAI) used in deep learning-based medical image analysis can be found in this article, where a framework of XAI criteria is introduced to classify deep learning-based medical image analysis methods.

94 citations

Posted Content
TL;DR: An overview of eXplainable Artificial Intelligence (XAI) used in deep learning-based medical image analysis is presented in this paper, where a framework of XAI criteria is introduced to classify deep learning-based methods.
Abstract: With an increase in deep learning-based methods, the call for explainability of such methods grows, especially in high-stakes decision making areas such as medical image analysis. This survey presents an overview of eXplainable Artificial Intelligence (XAI) used in deep learning-based medical image analysis. A framework of XAI criteria is introduced to classify deep learning-based medical image analysis methods. Papers on XAI techniques in medical image analysis are then surveyed and categorized according to the framework and according to anatomical location. The paper concludes with an outlook of future opportunities for XAI in medical image analysis.

92 citations

Journal ArticleDOI
TL;DR: This paper argues that incorporating an external CXR dataset leads to imperfect training data, which raises challenges, and formulates the multi-label thoracic disease classification problem as weighted independent binary tasks according to the categories, enabling superior knowledge mining ability.
Abstract: Deep learning approaches have demonstrated remarkable progress in automatic chest X-ray analysis. The data-driven nature of deep models requires training data that cover a large distribution. Therefore, it is essential to integrate knowledge from multiple datasets, especially for medical images. However, learning a disease classification model with extra chest X-ray (CXR) data remains challenging. Recent research has demonstrated that a performance bottleneck exists in joint training on different CXR datasets, and few efforts have been made to address the obstacle. In this paper, we argue that incorporating an external CXR dataset leads to imperfect training data, which raises the challenges. Specifically, the imperfect data is twofold: domain discrepancy, as the image appearances vary across datasets; and label discrepancy, as different datasets are partially labeled. To this end, we formulate the multi-label thoracic disease classification problem as weighted independent binary tasks according to the categories. For common categories shared across domains, we adopt task-specific adversarial training to alleviate the feature differences. For categories existing in a single dataset, we present uncertainty-aware temporal ensembling of model predictions to further mine the information from the missing labels. In this way, our framework simultaneously models and tackles the domain and label discrepancies, enabling superior knowledge mining ability. We conduct extensive experiments on three datasets with more than 360,000 chest X-ray images. Our method outperforms other competing models and sets state-of-the-art performance on the official NIH test set with 0.8349 AUC, demonstrating its effectiveness in utilizing the external dataset to improve the internal classification.
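
The formulation as weighted independent binary tasks with partially missing labels can be sketched as a masked, per-category weighted binary cross-entropy; the masking and weighting scheme below is an illustrative assumption, not the paper's exact loss.

```python
# Minimal sketch of weighted independent binary tasks with missing labels
# masked out; the masking and weighting scheme is an illustrative assumption.
import torch
import torch.nn.functional as F

def masked_weighted_bce(logits, targets, label_mask, task_weights):
    """logits, targets, label_mask: (B, L); task_weights: (L,) per-category weights.
    label_mask is 1 where the source dataset provides the label, 0 where missing."""
    per_label = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    per_label = per_label * label_mask * task_weights
    return per_label.sum() / label_mask.sum().clamp(min=1)

logits = torch.randn(8, 14)
targets = torch.randint(0, 2, (8, 14)).float()
mask = torch.randint(0, 2, (8, 14)).float()     # labels absent in some datasets
weights = torch.ones(14)                        # per-category task weights
loss = masked_weighted_bce(logits, targets, mask, weights)
```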

63 citations