Clinically Applicable Deep Learning Strategy for Pulmonary Nodule Risk Prediction: Insights into HONORS
TL;DR: The combination of HONORS and FGP-NET provides well-organized stratification for pulmonary nodules and also offers the potential for reducing medical errors.
Abstract: Background and Purpose: Few clinically applicable optimizations exist for reducing missed diagnoses, misdiagnoses and inter-reader variability in pulmonary nodule diagnosis. We aimed to propose a deep learning-based algorithm and a practical strategy to better stratify the risk of pulmonary nodules, thus reducing medical errors and optimizing the clinical workflow. Materials and Methods: A total of 2,348 pulmonary nodules (1,215 with lung cancer), comprising screened nodules from the National Lung Screening Trial (NLST) and incidentally detected nodules from Jinling Hospital (JLH), were used to train and evaluate a deep learning algorithm, Filter-guided pyramid network (FGP-NET). Internal and external testing of FGP-NET was performed on two independent datasets (n=542). The performance of FGP-NET at the Youden point, which maximizes the Youden index, was compared with that of 126 board-certified radiologists. We further proposed the Hierarchical Ordered Network ORiented Strategy (HONORS), which shifts the emphasis to either sensitivity or specificity to target risk-stratified clinical scenarios, directly making decisions for some patients. Results: FGP-NET achieved a high area under the curve (AUC) of 0.969 and 0.855 for internal and external testing, respectively, and was comparable to or even outperformed the radiologists in terms of sensitivity. HONORS-guided FGP-NET identified benign nodules with a high sensitivity (95.5%) in the screening scenario, and demonstrated satisfactory performance for the remaining ambiguous nodules, with an AUC of 0.945 at the Youden point. FGP-NET also detected lung cancer with a high specificity of 94.5% in the routine diagnostic scenario; an AUC of 0.809 was achieved for the remaining nodules. Conclusion: The combination of HONORS and FGP-NET provides well-organized stratification for pulmonary nodules and also offers the potential for reducing medical errors.
Highlights
- Pulmonary nodules were managed for both screening and diagnostic scenarios
- Proposal of a hierarchical strategy for targeting risk-stratified clinical scenarios
- A large-scale human-deep learning contest for reliable performance evaluation
Summary
- Lung cancer is one of the most common cancers and the leading cause of cancer mortality worldwide [1,2].
- Every step from screening to final diagnosis is indispensable, which makes the entire process time- and labor-consuming.
- The desire to improve the efficacy and efficiency of clinical care continues to drive multiple innovations into practice, including deep learning (DL).
- Most previous work focused on the screening population; studies of incidentally detected nodules in the routine diagnostic scenario, which carry a higher risk, were limited.
- In step 1, benign nodules (screening scenario) or lung cancer (routine diagnostic scenario) can be accurately identified; in step 2, the remaining ambiguous nodules are further stratified to aid clinical decision-making.
- Ethical approval was obtained for this retrospective study, and informed consent was waived for reviewing patients’ medical records.
- The authors retrospectively analyzed 16,801 patients who underwent surgery or biopsy due to lung lesions from May 2009 to June 2018 in Jinling Hospital.
- From the NLST, the authors analyzed 1,060 patients confirmed as having lung cancer and 5,275 patients confirmed as not having it.
- Details for the data curation are available in the Supplementary Information, Inclusion and exclusion criteria.
Annotation and preprocessing
- Automatic nodule detection was performed using the Dr. wise platform [17] (the detection network is described in the Supplementary Information, Pulmonary nodule detection network), and the geometric centers of the enrolled nodules were further revised by two radiologists.
- Because pixel spacing and slice thickness differed between scans, the CT images were linearly interpolated into 3D isotropic images with a voxel spacing of 0.6 × 0.6 × 0.6 mm³.
- Since the datasets were relatively small compared with traditional image classification datasets for deep learning such as CIFAR-10 [18] and ImageNet [19], the authors applied heavy data augmentation to all initially generated image patches (128 × 128 × 128 voxels) containing the nodules, e.g., random rotation by 0-360 degrees, random zooming in or out, random cropping and flipping.
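The isotropic resampling step can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the endpoint-aligned sampling convention and the example spacings are assumptions made for illustration, and a real implementation would interpolate the full 3D volume with an imaging library rather than a single 1-D profile.

```python
def isotropic_shape(shape, spacing, target=0.6):
    """Grid size and per-axis zoom factors for resampling to `target` mm voxels.

    shape   -- voxel counts per axis, e.g. (slices, rows, cols)
    spacing -- original voxel spacing per axis in mm
    """
    zoom = [s / target for s in spacing]
    new_shape = [max(1, round(n * z)) for n, z in zip(shape, zoom)]
    return new_shape, zoom


def resample_linear_1d(values, new_len):
    """Linearly interpolate a 1-D intensity profile onto `new_len` samples.

    Assumption: first and last samples are kept aligned with the original
    endpoints. Applying this along each axis in turn gives trilinear-style
    resampling of a volume.
    """
    if new_len == 1 or len(values) == 1:
        return [float(values[0])] * new_len
    step = (len(values) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        x = i * step
        lo = min(int(x), len(values) - 2)  # left neighbour index
        frac = x - lo                      # distance from left neighbour
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


# Hypothetical scan: 100 slices of 1.5 mm, 512 x 512 pixels of 0.75 mm.
new_shape, zoom = isotropic_shape((100, 512, 512), (1.5, 0.75, 0.75))
```

For this hypothetical scan the resampled grid has 250 × 640 × 640 voxels, from which the 128 × 128 × 128 patches around each nodule center would then be cropped.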
Development and training of FGP-NET
- Nodules from the JLH and NLST datasets were randomly assigned to one of three sets: a training set (JLH, 1086 nodules; NLST, 520 nodules) for optimizing network weights, a validation set (JLH, 100 nodules; NLST, 100 nodules) for choosing hyperparameter values, and an internal test set (JLH, 100 nodules; NLST, 200 nodules) for evaluating performance.
- No patient appeared in more than one of the training, validation and test sets.
- The authors aimed to capture both local and global features, and the interactions between them, to better represent the nodule.
- By concatenating the clear attention map distilled by local feature extractors with raw feature maps from the early stage of the network, FGP-NET was able to keep high-resolution details to describe small-sized local features and to use the accurate localization of large features to guide small ones.
- DenseNet was one of the state-of-the-art network structures in computer vision.
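The requirement that patients do not overlap between the training, validation and test sets can be sketched as a split at the patient level rather than the nodule level. This is a minimal illustration, not the authors' procedure: the function name, the `(nodule_id, patient_id)` input format and the choice to size the sets by number of patients (the source reports nodule counts) are all assumptions.

```python
import random
from collections import defaultdict


def split_by_patient(nodules, n_val_patients, n_test_patients, seed=0):
    """Assign whole patients to train/val/test so that no patient spans two sets.

    nodules -- iterable of (nodule_id, patient_id) pairs (hypothetical format)
    Returns a dict mapping 'train' / 'val' / 'test' to lists of nodule ids.
    """
    by_patient = defaultdict(list)
    for nodule_id, patient_id in nodules:
        by_patient[patient_id].append(nodule_id)

    # Shuffle patients, not nodules, so all nodules of a patient stay together.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    cut1 = n_val_patients
    cut2 = n_val_patients + n_test_patients
    groups = {
        "val": patients[:cut1],
        "test": patients[cut1:cut2],
        "train": patients[cut2:],
    }
    return {
        name: [n for p in members for n in by_patient[p]]
        for name, members in groups.items()
    }


# Hypothetical data: 10 patients with 2 nodules each.
example = [(f"n{i}", f"p{i // 2}") for i in range(20)]
splits = split_by_patient(example, n_val_patients=2, n_test_patients=3)
```

Splitting on patient identifiers avoids the leakage that would occur if two nodules from the same patient landed in the training and test sets.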
Validation of FGP-NET
- FGP-NET outputs a continuous value between 0 and 1 for nodule risk stratification, termed the ‘malignancy score’, which is consistent with the malignancy probability of the nodule.
- The corresponding sets were defined as S1, S2, S3 and S4, respectively.
- To match the input given to FGP-NET, radiologists were informed only of the precise location of the nodules and were blinded to the medical history and pathological results.
- When compared with FGP-NET, the performance of radiologists was evaluated at the average level and at the majority level (a vote over the scores given by the 126 radiologists).
- All statistical tests used in this study were 2-sided and a P-value less than .05 was considered significant.
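The Youden point used to binarize the malignancy score is the threshold that maximizes the Youden index, J = sensitivity + specificity - 1. The sketch below shows one straightforward way to find it by scanning the observed scores; the function name and the convention that a nodule is called malignant when its score is at or above the threshold are assumptions, and the example scores are made up.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    scores -- malignancy scores in [0, 1]
    labels -- 1 for malignant, 0 for benign (both classes must be present)
    A nodule is called malignant when score >= threshold (assumed convention).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes are required")

    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):  # every observed score is a candidate cut
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Made-up scores for five nodules (two malignant, three benign).
threshold, j = youden_threshold([0.1, 0.2, 0.35, 0.4, 0.8], [0, 0, 1, 0, 1])
```

In this toy example the selected threshold is 0.35, where sensitivity is 2/2 and specificity is 2/3. In practice the threshold would be chosen on validation data and then held fixed for testing.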
Interpretation of learned features
- Given the black box property of DL, the authors further conducted a two-way feature interpretation to explore whether FGP-NET learned solid and effective features.
- More specifically, t-distributed stochastic neighbor embedding (t-SNE) [24] and probability heat maps were applied to visualize global features and local features, respectively (Supplementary Information, Feature visualization of FGP-NET).
Evaluation of HONORS
- To validate the performance of HONORS in screening and routine diagnostic scenarios, the authors simulated its application on the NLST test set and the multi-center set.
- The precision of the stratified benign and malignant nodules was evaluated using negative predictive value (NPV) and positive predictive value (PPV), respectively.
- The authors further compared the performance of HONORS and 126 radiologists on incidentally detected nodules using the JLH test set (Supplementary Information, Comparison of HONORS with radiologists).
- In addition to the two-step approach targeted at different scenarios, HONORS can also be implemented as a three-step approach regardless of scenario.
- In the first step, FGP-NET at the HSen point stratified the benign nodules; in the second step, FGP-NET at the HSpe point stratified the lung cancer; and in the third step, FGP-NET at the Youden point stratified the remaining ambiguous nodules.
- Overview of the study design and results: a total of 2,348 pulmonary nodules (1,215 malignant nodules), comprising screened and incidentally detected nodules found by chest CT, were used to train and evaluate the DL algorithm, FGP-NET.
- The authors further investigated whether the FGP-NET was comparable or even superior to radiologists.
- Inter-reader agreement among radiologists remained relatively low, even in the consultant group (κw < 0.4).
- A total of 18 nodules were diagnosed discordantly by the radiologists’ majority opinion and FGP-NET, with each of the two misdiagnosing 9 of them.
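The two-step and three-step forms of HONORS described in this section can be sketched as simple threshold rules on the malignancy score. This is an interpretation of the text, not the authors' code: the function names, the scenario labels and the threshold values (0.1, 0.5, 0.9 for the HSen, Youden and HSpe points) are placeholders, and the rule that scores below the HSen point are called benign while scores at or above the HSpe point are called malignant is an assumption.

```python
def honors_two_step(score, scenario, t_hsen=0.1, t_youden=0.5, t_hspe=0.9):
    """Return (step, call) for one nodule under the two-step strategy.

    scenario -- 'screening' (rule out benign first, at the HSen point) or
                'diagnostic' (rule in lung cancer first, at the HSpe point)
    Thresholds are placeholder values, not those reported in the paper.
    """
    if scenario == "screening" and score < t_hsen:
        return 1, "benign"
    if scenario == "diagnostic" and score >= t_hspe:
        return 1, "malignant"
    # Step 2: remaining ambiguous nodules are split at the Youden point.
    return 2, ("malignant" if score >= t_youden else "benign")


def honors_three_step(score, t_hsen=0.1, t_youden=0.5, t_hspe=0.9):
    """Scenario-independent variant: HSen, then HSpe, then Youden."""
    if score < t_hsen:
        return 1, "benign"
    if score >= t_hspe:
        return 2, "malignant"
    return 3, ("malignant" if score >= t_youden else "benign")


def predictive_value(calls, labels, target):
    """Fraction of nodules called `target` that truly are `target`.

    This is the NPV for target='benign' and the PPV for target='malignant'.
    """
    truths = [y for c, y in zip(calls, labels) if c == target]
    if not truths:
        return float("nan")
    return sum(1 for y in truths if y == target) / len(truths)


# Made-up scores routed through the three-step variant.
routed = [honors_three_step(s) for s in (0.05, 0.3, 0.7, 0.95)]
```

Nodules decided in the early steps are the ones for which the strategy makes a direct call; their NPV (benign calls) and PPV (malignant calls) can be computed with `predictive_value` once the pathological labels are known.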
Interpretation of Features Learned by FGP-NET
- DL is frequently referred to as a black box: data goes in, decisions come out, but the processes between input and output are opaque [25].
- It is crucial to open the black box; the authors therefore interpreted the global features using t-SNE, which is particularly well suited to the visualization of high-dimensional data.
- Specifically, for benign nodules the attribution regions were mainly located on benign features.
- For malignant nodules, by contrast, the regions were predominantly situated on malignant features, suggesting that FGP-NET may concentrate on the irregular margin of the nodule and the solid component within part-solid nodules.
- These attribution regions were consistent with the visual observations of radiologists.
Proposal of HONORS
- This does not eliminate the role of radiologists in making the final decision.
- Therefore, the authors further proposed HONORS, a novel two-step hierarchical strategy for clinical application of FGP-NET.
- HONORS was applied to the NLST test set and the multi-center set to appraise its performance.
- Additionally, a three-step implementation of HONORS was also evaluated in this context.
- The authors developed and validated a DL algorithm, FGP-NET, that stratifies pulmonary nodules with strong performance, comparable to that of a large group of radiologists.
- The study is practical for clinical use because FGP-NET and HONORS were designed for both screening and routine diagnostic scenarios.
- The pyramid structure underlying FGP-NET is a method for extracting multi-scale features and is well suited to pulmonary nodules because of their large variation in size [26].
- Huang et al. developed a computer-aided diagnosis approach with a sensitivity of 95% and a specificity of 88%, which outperformed three radiologists’ combined reading on 186 nodules from the NLST dataset [29].
- Taken together, the authors proposed HONORS to lay the groundwork for applying the DL-based pulmonary nodule stratification algorithm in screening and routine diagnostic scenarios.
Cites methods from "Clinically Applicable Deep Learning..."
...It was finetuned on the basis of Filter-guided Pyramid Network (FGPNET), a novel 3D convolutional network structure designed for the classification of malignant and benign pulmonary nodules in our previous study ....
Frequently Asked Questions (2)
Q1. What have the authors contributed in "Clinically Applicable Deep Learning Strategy for Pulmonary Nodule Risk Prediction: Insights into HONORS"?
Wang et al. proposed a clinically applicable DL-based algorithm, Filter-guided pyramid network (FGP-NET), and a practical strategy, Hierarchical-Ordered Network-ORiented Strategy (HONORS), which involves two steps for two different clinical scenarios (i.e., screening and routine diagnostic scenarios).
Q2. What future works have the authors mentioned in the paper "Clinically Applicable Deep Learning Strategy for Pulmonary Nodule Risk Prediction: Insights into HONORS"?
To apply FGP-NET to different clinical settings, the authors proposed a novel two-step strategy, Hierarchical-Ordered Network-ORiented Strategy (HONORS), which is promising for optimizing clinical workflow and realizing personalized, precise treatment of pulmonary nodules. Future studies are warranted to prospectively assess the performance and generalizability of the algorithm at a variety of sites in real-world-use scenarios, and to determine how HONORS would impact diagnostic accuracy and clinical workflow. It may have the potential to accelerate the process of pulmonary nodule diagnosis and to free doctors, nurses and other healthcare professionals to focus on providing real care for patients.