Author

Yinghao Zhang

Bio: Yinghao Zhang is an academic researcher from Xiamen University. The author has contributed to research in topics: Deep learning & Transfer of learning. The author has an h-index of 1 and has co-authored 4 publications receiving 5 citations.

Papers
Posted Content · DOI
04 Nov 2021 · medRxiv
TL;DR: REFERS is a cross-supervised approach for radiograph analysis supported by deep learning, which employs a vision transformer and is designed to learn joint representations from multiple views within every patient study.
Abstract: Pre-training lays the foundation for recent successes in radiograph analysis supported by deep learning. It learns transferable image representations by conducting large-scale fully-supervised or self-supervised learning on a source domain. However, supervised pre-training requires a complex and labor-intensive two-stage human-assisted annotation process, while self-supervised learning cannot compete with the supervised paradigm. To tackle these issues, we propose a cross-supervised methodology named REviewing FreE-text Reports for Supervision (REFERS), which acquires free supervision signals from original radiology reports accompanying the radiographs. The proposed approach employs a vision transformer and is designed to learn joint representations from multiple views within every patient study. REFERS outperforms its transfer learning and self-supervised learning counterparts on four well-known X-ray datasets under extremely limited supervision. Moreover, REFERS even surpasses methods based on a source domain of radiographs with human-assisted structured labels. Thus, REFERS has the potential to replace canonical pre-training methodologies.
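The cross-supervision idea can be illustrated with a small sketch: encode each radiograph view of a patient study, fuse the view features into one study embedding, embed the paired free-text report, and align the two with a symmetric contrastive loss. This is a minimal sketch assuming generic `img_encoder`/`txt_encoder` modules and mean-pooling fusion; the attention-based view fusion and report-related objectives of REFERS itself are not reproduced here.

```python
# Illustrative sketch only: cross-supervision from free-text reports by aligning a fused
# multi-view study embedding with a report embedding via a symmetric InfoNCE loss.
# Module names and shapes are assumptions, not the REFERS implementation.
import torch
import torch.nn.functional as F
from torch import nn

class StudyReportAlignment(nn.Module):
    def __init__(self, img_encoder: nn.Module, txt_encoder: nn.Module, dim=768, proj=256, tau=0.07):
        super().__init__()
        self.img_encoder = img_encoder      # e.g. a vision transformer applied to each radiograph view
        self.txt_encoder = txt_encoder      # e.g. a BERT-style encoder applied to the radiology report
        self.img_proj = nn.Linear(dim, proj)
        self.txt_proj = nn.Linear(dim, proj)
        self.tau = tau

    def forward(self, views, reports):
        # views: (B, V, C, H, W) -- V radiograph views per patient study
        B, V = views.shape[:2]
        feats = self.img_encoder(views.flatten(0, 1))          # (B*V, dim), one feature per view
        study = feats.view(B, V, -1).mean(dim=1)               # fuse views (REFERS uses attention fusion)
        z_img = F.normalize(self.img_proj(study), dim=-1)      # (B, proj)
        z_txt = F.normalize(self.txt_proj(self.txt_encoder(reports)), dim=-1)
        logits = z_img @ z_txt.t() / self.tau                  # (B, B) study-report similarity matrix
        targets = torch.arange(B, device=logits.device)
        # symmetric contrastive loss: match each study to its own report and vice versa
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```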

22 citations

Posted Content
TL;DR: REFERS is a cross-supervised approach for radiograph analysis supported by deep learning, which employs a vision transformer and is designed to learn joint representations from multiple views within every patient study.
Abstract: Pre-training lays the foundation for recent successes in radiograph analysis supported by deep learning. It learns transferable image representations by conducting large-scale fully-supervised or self-supervised learning on a source domain. However, supervised pre-training requires a complex and labor-intensive two-stage human-assisted annotation process, while self-supervised learning cannot compete with the supervised paradigm. To tackle these issues, we propose a cross-supervised methodology named REviewing FreE-text Reports for Supervision (REFERS), which acquires free supervision signals from original radiology reports accompanying the radiographs. The proposed approach employs a vision transformer and is designed to learn joint representations from multiple views within every patient study. REFERS outperforms its transfer learning and self-supervised learning counterparts on four well-known X-ray datasets under extremely limited supervision. Moreover, REFERS even surpasses methods based on a source domain of radiographs with human-assisted structured labels. Thus, REFERS has the potential to replace canonical pre-training methodologies.

10 citations

Posted Content
TL;DR: nnFormer (Not-aNother transFormer), as discussed by the authors, combines self-attention and convolution to learn volumetric representations from 3D local volumes.
Abstract: Transformers, the default model of choice in natural language processing, have drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers are promising candidates to help convolutional neural networks (convnets) overcome their inherent shortcomings of spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as auxiliary modules that encode global context into convolutional representations, without investigating how to optimally combine self-attention (i.e., the core of transformers) with convolution. To address this issue, in this paper, we introduce nnFormer (i.e., Not-aNother transFormer), a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution. In practice, nnFormer learns volumetric representations from 3D local volumes. Compared to the naive voxel-level self-attention implementation, such volume-based operations help reduce the computational complexity by approximately 98% and 99.5% on the Synapse and ACDC datasets, respectively. In comparison to prior-art network configurations, nnFormer achieves substantial improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC. For instance, nnFormer outperforms Swin-UNet by over 7 percent on Synapse. Even when compared to nnUNet, currently the best-performing fully-convolutional medical segmentation network, nnFormer still provides slightly better performance on Synapse and ACDC.
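The volume-based self-attention mentioned above can be sketched as follows: the 3D feature map is partitioned into non-overlapping local volumes and self-attention is computed inside each volume only, which is what cuts the quadratic cost relative to voxel-level global attention. The window size and module layout below are illustrative assumptions, not the paper's configuration.

```python
# Illustrative sketch of volume-based (local window) self-attention over 3D feature maps,
# the kind of operation nnFormer relies on instead of naive voxel-level global attention.
import torch
from torch import nn

class LocalVolumeSelfAttention(nn.Module):
    def __init__(self, dim=96, heads=4, window=(4, 4, 4)):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, D, H, W, C) volumetric features; D, H, W must be divisible by the window size
        B, D, H, W, C = x.shape
        wd, wh, ww = self.window
        # partition into non-overlapping local volumes of wd*wh*ww voxels
        x = x.view(B, D // wd, wd, H // wh, wh, W // ww, ww, C)
        x = x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, wd * wh * ww, C)
        # attention is computed inside each local volume only
        out, _ = self.attn(x, x, x)
        # merge the local volumes back into the full feature map
        out = out.view(B, D // wd, H // wh, W // ww, wd, wh, ww, C)
        return out.permute(0, 1, 4, 2, 5, 3, 6, 7).reshape(B, D, H, W, C)
```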

7 citations

Posted Content · DOI
Lin Y, Shaomao Lv, Juan Wang, Jianghe Kang, Yinghao Zhang, Zhi-Peng Feng
06 Apr 2020 · medRxiv
TL;DR: The abnormalities of GGOs with peripleural distribution, consolidated areas, septal thickening, pleural involvement and intralesional vasodilatation on UHR-CT indicate the diagnosis of COVID-19.
Abstract: Background: An ongoing outbreak of pneumonia of unknown origin in Wuhan was attributed to coronavirus disease 2019 (COVID-19). The infectious disease has spread globally and become a major threat to public health. Purpose: We aim to investigate the ultra-high-resolution CT (UHR-CT) findings of imported COVID-19-related pneumonia from the initial diagnosis to early-phase follow-up. Methods: This retrospective study included confirmed cases with early-stage COVID-19-related pneumonia imported from the epicenter. Initial and early-phase follow-up UHR-CT scans (within 5 days) were reviewed to characterize the radiological findings. The normalized total volumes of ground-glass opacities (GGOs) and consolidations were calculated by artificial-intelligence-based methods and compared during the radiological follow-up. Results: Eleven patients (3 males and 8 females, aged 32-74 years) with confirmed COVID-19 were evaluated. Subpleural GGOs with inter/intralobular septal thickening were typical imaging findings. Other diagnostic CT features included distinct margins (8/11, 73%), pleural retraction or thickening (7/11, 64%), and intralesional vasodilatation (6/11, 55%). Normalized volumes of pulmonary GGOs (p=0.003) and consolidations (p=0.003) increased significantly during the CT follow-up. Conclusions: GGOs with peripleural distribution, consolidated areas, septal thickening, pleural involvement and intralesional vasodilatation on UHR-CT indicate the diagnosis of COVID-19. COVID-19 cases can show significantly progressing GGOs and consolidations with increased volume during early-phase CT follow-up.
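The volumetric comparison described in the Methods can be sketched roughly as below: normalize each lesion volume by the total lung volume and compare baseline against follow-up with a paired nonparametric test. The segmentation source, the choice of the Wilcoxon signed-rank test, and all numbers in the sketch are assumptions for illustration, not values from the study.

```python
# Minimal sketch under stated assumptions: lesion volumes come from binary segmentation
# masks (e.g. produced by an AI tool), are normalized by total lung volume, and the
# baseline-vs-follow-up change is tested with a paired Wilcoxon signed-rank test.
import numpy as np
from scipy.stats import wilcoxon

def normalized_lesion_volume(lesion_mask, lung_mask, voxel_volume_mm3):
    """Lesion volume expressed as a fraction of the total lung volume."""
    lesion_ml = lesion_mask.sum() * voxel_volume_mm3 / 1000.0
    lung_ml = lung_mask.sum() * voxel_volume_mm3 / 1000.0
    return lesion_ml / lung_ml

# hypothetical per-patient normalized GGO volumes at baseline and early follow-up (11 patients)
baseline = np.array([0.02, 0.05, 0.01, 0.03, 0.08, 0.02, 0.04, 0.06, 0.01, 0.03, 0.05])
followup = np.array([0.04, 0.09, 0.03, 0.05, 0.15, 0.04, 0.07, 0.10, 0.02, 0.06, 0.09])
stat, p = wilcoxon(baseline, followup)   # paired, nonparametric
print(f"Wilcoxon signed-rank p = {p:.3f}")
```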

1 citation


Cited by
Journal Article · DOI
TL;DR: This review determined the diagnostic accuracy of chest imaging (computed tomography, X-ray and ultrasound) in people with suspected or confirmed COVID-19, presenting the uncertainty of the accuracy estimates as 95% confidence intervals (CIs).
Abstract: Background: The respiratory illness caused by SARS-CoV-2 infection continues to present diagnostic challenges. Our 2020 edition of this review showed thoracic (chest) imaging to be sensitive and moderately specific in the diagnosis of coronavirus disease 2019 (COVID-19). In this update, we include new relevant studies, and have removed studies with case-control designs, and those not intended to be diagnostic test accuracy studies. Objectives: To evaluate the diagnostic accuracy of thoracic imaging (computed tomography (CT), X-ray and ultrasound) in people with suspected COVID-19. Search methods: We searched the COVID-19 Living Evidence Database from the University of Bern, the Cochrane COVID-19 Study Register, The Stephen B. Thacker CDC Library, and repositories of COVID-19 publications through to 30 September 2020. We did not apply any language restrictions. Selection criteria: We included studies of all designs, except for case-control, that recruited participants of any age group suspected to have COVID-19 and that reported estimates of test accuracy or provided data from which we could compute estimates. Data collection and analysis: The review authors independently and in duplicate screened articles, extracted data and assessed risk of bias and applicability concerns using the QUADAS-2 domain-list. We presented the results of estimated sensitivity and specificity using paired forest plots, and we summarised pooled estimates in tables. We used a bivariate meta-analysis model where appropriate. We presented the uncertainty of accuracy estimates using 95% confidence intervals (CIs). Main results: We included 51 studies with 19,775 participants suspected of having COVID-19, of whom 10,155 (51%) had a final diagnosis of COVID-19. Forty-seven studies evaluated one imaging modality each, and four studies evaluated two imaging modalities each. All studies used RT-PCR as the reference standard for the diagnosis of COVID-19, with 47 studies using only RT-PCR and four studies using a combination of RT-PCR and other criteria (such as clinical signs, imaging tests, positive contacts, and follow-up phone calls) as the reference standard. Studies were conducted in Europe (33), Asia (13), North America (3) and South America (2); including only adults (26), all ages (21), children only (1), adults over 70 years (1), and unclear (2); in inpatients (2), outpatients (32), and setting unclear (17). Risk of bias was high or unclear in thirty-two (63%) studies with respect to participant selection, 40 (78%) studies with respect to reference standard, 30 (59%) studies with respect to index test, and 24 (47%) studies with respect to participant flow. For chest CT (41 studies, 16,133 participants, 8110 (50%) cases), the sensitivity ranged from 56.3% to 100%, and specificity ranged from 25.4% to 97.4%. The pooled sensitivity of chest CT was 87.9% (95% CI 84.6 to 90.6) and the pooled specificity was 80.0% (95% CI 74.9 to 84.3). There was no statistical evidence indicating that reference standard conduct and definition for index test positivity were sources of heterogeneity for CT studies. Nine chest CT studies (2807 participants, 1139 (41%) cases) used the COVID-19 Reporting and Data System (CO-RADS) scoring system, which has five thresholds to define index test positivity. 
At a CO-RADS threshold of 5 (7 studies), the sensitivity ranged from 41.5% to 77.9% and the pooled sensitivity was 67.0% (95% CI 56.4 to 76.2); the specificity ranged from 83.5% to 96.2%; and the pooled specificity was 91.3% (95% CI 87.6 to 94.0). At a CO-RADS threshold of 4 (7 studies), the sensitivity ranged from 56.3% to 92.9% and the pooled sensitivity was 83.5% (95% CI 74.4 to 89.7); the specificity ranged from 77.2% to 90.4% and the pooled specificity was 83.6% (95% CI 80.5 to 86.4). For chest X-ray (9 studies, 3694 participants, 2111 (57%) cases) the sensitivity ranged from 51.9% to 94.4% and specificity ranged from 40.4% to 88.9%. The pooled sensitivity of chest X-ray was 80.6% (95% CI 69.1 to 88.6) and the pooled specificity was 71.5% (95% CI 59.8 to 80.8). For ultrasound of the lungs (5 studies, 446 participants, 211 (47%) cases) the sensitivity ranged from 68.2% to 96.8% and specificity ranged from 21.3% to 78.9%. The pooled sensitivity of ultrasound was 86.4% (95% CI 72.7 to 93.9) and the pooled specificity was 54.6% (95% CI 35.3 to 72.6). Based on an indirect comparison using all included studies, chest CT had a higher specificity than ultrasound. For indirect comparisons of chest CT and chest X-ray, or chest X-ray and ultrasound, the data did not show differences in specificity or sensitivity. Authors' conclusions: Our findings indicate that chest CT is sensitive and moderately specific for the diagnosis of COVID-19. Chest X-ray is moderately sensitive and moderately specific for the diagnosis of COVID-19. Ultrasound is sensitive but not specific for the diagnosis of COVID-19. Thus, chest CT and ultrasound may have more utility for excluding COVID-19 than for differentiating SARS-CoV-2 infection from other causes of respiratory illness. Future diagnostic accuracy studies should pre-define positive imaging findings, include direct comparisons of the various modalities of interest in the same participant population, and implement improved reporting practices.
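As an illustration of the underlying accuracy arithmetic (not the review's bivariate random-effects meta-analysis), a single study's sensitivity and specificity with 95% Wilson confidence intervals can be computed from its 2x2 table; the counts below are hypothetical.

```python
# Illustrative sketch only: per-study sensitivity/specificity with 95% Wilson CIs from a
# 2x2 table (TP, FP, FN, TN). The pooled estimates in the review come from a bivariate
# meta-analysis across studies, which this per-study calculation does not reproduce.
from statsmodels.stats.proportion import proportion_confint

def accuracy_with_ci(tp, fp, fn, tn):
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    sens_ci = proportion_confint(tp, tp + fn, alpha=0.05, method="wilson")
    spec_ci = proportion_confint(tn, tn + fp, alpha=0.05, method="wilson")
    return sens, sens_ci, spec, spec_ci

# hypothetical chest CT study: 420 RT-PCR-positive cases, 380 negatives
sens, sens_ci, spec, spec_ci = accuracy_with_ci(tp=370, fp=76, fn=50, tn=304)
print(f"sensitivity {sens:.1%} (95% CI {sens_ci[0]:.1%} to {sens_ci[1]:.1%})")
print(f"specificity {spec:.1%} (95% CI {spec_ci[0]:.1%} to {spec_ci[1]:.1%})")
```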

176 citations

Journal Article · DOI
TL;DR: In this article, the authors outline the key applications enabled by multimodal artificial intelligence, along with the technical and analytical challenges, and survey the data, modeling and privacy challenges that must be overcome to realize its full potential in health.
Abstract: The increasing availability of biomedical data from large biobanks, electronic health records, medical imaging, wearable and ambient biosensors, and the lower cost of genome and microbiome sequencing have set the stage for the development of multimodal artificial intelligence solutions that capture the complexity of human health and disease. In this Review, we outline the key applications enabled, along with the technical and analytical challenges. We explore opportunities in personalized medicine, digital clinical trials, remote monitoring and care, pandemic surveillance, digital twin technology and virtual health assistants. Further, we survey the data, modeling and privacy challenges that must be overcome to realize the full potential of multimodal artificial intelligence in health.

86 citations

Journal Article · DOI
TL;DR: A comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective, dividing their applications into categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images.
Abstract: Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.

16 citations

Proceedings Article · DOI
12 Oct 2022
TL;DR: A novel Multi-Granularity Cross-modal Alignment (MGCA) framework for generalized medical visual representation learning by harnessing the naturally exhibited semantic correspondences between medical images and radiology reports at three different levels, i.e., pathological region-level, instance-level, and disease-level.
Abstract: Learning medical visual representations directly from paired radiology reports has become an emerging topic in representation learning. However, existing medical image-text joint learning methods are limited to instance-level or local supervision, ignoring disease-level semantic correspondences. In this paper, we present a novel Multi-Granularity Cross-modal Alignment (MGCA) framework for generalized medical visual representation learning by harnessing the naturally exhibited semantic correspondences between medical images and radiology reports at three different levels, i.e., pathological region-level, instance-level, and disease-level. Specifically, we first incorporate an instance-wise alignment module by maximizing the agreement between image-report pairs. Further, for token-wise alignment, we introduce a bidirectional cross-attention strategy to explicitly learn the matching between fine-grained visual tokens and text tokens, followed by contrastive learning to align them. More importantly, to leverage high-level inter-subject semantic correspondences (e.g., at the disease level), we design a novel cross-modal disease-level alignment paradigm to enforce cross-modal cluster assignment consistency. Extensive experimental results on seven downstream medical image datasets covering image classification, object detection, and semantic segmentation tasks demonstrate the stable and superior performance of our framework.
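Of the three alignment levels, the disease-level one is the least standard; a rough sketch of the idea is given below: softly assign image and report embeddings to a shared set of learnable prototypes and train each modality to predict the other's assignment. The prototype count and the omission of Sinkhorn balancing are simplifications, not the MGCA implementation.

```python
# Illustrative sketch of disease-level cross-modal cluster-assignment consistency
# (SwAV-style swapped prediction over shared prototypes), not the MGCA code itself.
import torch
import torch.nn.functional as F
from torch import nn

class CrossModalPrototypes(nn.Module):
    def __init__(self, dim=256, num_prototypes=128, tau=0.1):
        super().__init__()
        self.prototypes = nn.Linear(dim, num_prototypes, bias=False)  # shared cluster centers
        self.tau = tau

    def forward(self, z_img, z_txt):
        # z_img, z_txt: (B, dim) L2-normalized embeddings of images and their paired reports
        p_img = self.prototypes(z_img) / self.tau   # (B, K) similarity to each prototype
        p_txt = self.prototypes(z_txt) / self.tau
        q_img = F.softmax(p_img.detach(), dim=-1)   # soft targets (MGCA balances these with Sinkhorn)
        q_txt = F.softmax(p_txt.detach(), dim=-1)
        # swapped prediction: image scores should match the report's assignment and vice versa
        loss = -0.5 * ((q_txt * F.log_softmax(p_img, dim=-1)).sum(-1)
                       + (q_img * F.log_softmax(p_txt, dim=-1)).sum(-1)).mean()
        return loss
```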

12 citations

Posted Content · DOI
31 Aug 2022 · medRxiv
TL;DR: This study quantitatively examines the correlation between automated metrics and the scoring of reports by radiologists, and proposes a composite metric, called RadCliQ, that is able to rank the quality of reports similarly to radiologists and better than existing metrics.
Abstract: The application of AI to medical image interpretation tasks has largely been limited to the identification of a handful of individual pathologies. In contrast, the generation of complete narrative radiology reports more closely matches how radiologists communicate diagnostic information in clinical workflows. Recent progress in artificial intelligence (AI) on vision-language tasks has enabled the possibility of generating high-quality radiology reports from medical images. Automated metrics to evaluate the quality of generated reports attempt to capture overlap in the language or clinical entities between a machine-generated report and a radiologist-generated report. In this study, we quantitatively examine the correlation between automated metrics and the scoring of reports by radiologists. We analyze failure modes of the metrics, namely the types of information the metrics do not capture, to understand when to choose particular metrics and how to interpret metric scores. We propose a composite metric, called RadCliQ, that we find is able to rank the quality of reports similarly to radiologists and better than existing metrics. Lastly, we measure the performance of state-of-the-art report generation approaches using the investigated metrics. We expect that our work can guide both the evaluation and the development of report generation systems that can generate reports from medical images approaching the level of radiologists.
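A minimal sketch of the kind of analysis described, correlating automated metrics with radiologist scoring and fitting a simple weighted combination as a composite, is shown below; the metric names, scores, and the plain linear-regression formulation are illustrative assumptions, not the RadCliQ definition.

```python
# Not the RadCliQ implementation: measure how well each automated metric tracks radiologist
# scores (Kendall's tau), then fit a simple linear combination of the metrics to the
# radiologist scores as an illustrative composite. All values below are hypothetical.
import numpy as np
from scipy.stats import kendalltau
from sklearn.linear_model import LinearRegression

# hypothetical per-report automated metric scores and radiologist error counts
bleu      = np.array([0.21, 0.35, 0.10, 0.42, 0.28, 0.33])
bertscore = np.array([0.61, 0.72, 0.48, 0.78, 0.66, 0.70])
entity_f1 = np.array([0.55, 0.70, 0.40, 0.81, 0.60, 0.68])   # clinical-entity overlap
radiologist_score = np.array([2.0, 1.0, 4.0, 0.0, 2.0, 1.0])  # error count: lower = better report

for name, metric in [("BLEU", bleu), ("BERTScore", bertscore), ("EntityF1", entity_f1)]:
    tau, p = kendalltau(metric, radiologist_score)
    print(f"{name}: Kendall tau vs radiologist scoring = {tau:.2f} (p = {p:.2f})")

X = np.column_stack([bleu, bertscore, entity_f1])
composite = LinearRegression().fit(X, radiologist_score)   # learn a weighted combination
print("composite metric weights:", composite.coef_)
```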

12 citations