Diagnostic Accuracy and Failure Mode Analysis of a Deep Learning Algorithm for the Detection of Cervical Spine Fractures.
TLDR
In this article, the authors evaluated the performance of an artificial intelligence decision support system, Aidoc, for the detection of cervical spinal fractures on noncontrast cervical spine CT scans and conducted a failure mode analysis to identify areas of poor performance.
Abstract:
BACKGROUND AND PURPOSE: Artificial intelligence decision support systems are a rapidly growing class of tools to help manage ever-increasing imaging volumes. The aim of this study was to evaluate the performance of an artificial intelligence decision support system, Aidoc, for the detection of cervical spinal fractures on noncontrast cervical spine CT scans and to conduct a failure mode analysis to identify areas of poor performance.
MATERIALS AND METHODS: This retrospective study included 1904 emergent noncontrast cervical spine CT scans of adult patients (mean age, 60 [SD, 22] years; 50.3% men). The presence of cervical spinal fracture was determined by Aidoc and by an attending neuroradiologist; discrepancies were independently adjudicated. Algorithm performance was assessed by calculating diagnostic accuracy, and a failure mode analysis was performed.
RESULTS: Aidoc and the neuroradiologist’s interpretation were concordant in 91.5% of cases. Aidoc correctly identified 67 of 122 fractures (54.9%) and falsely flagged 106 studies as positive. Diagnostic performance was as follows: sensitivity, 54.9% (95% CI, 45.7%–63.9%); specificity, 94.1% (95% CI, 92.9%–95.1%); positive predictive value, 38.7% (95% CI, 33.1%–44.7%); and negative predictive value, 96.8% (95% CI, 96.2%–97.4%). Performance was worse for the detection of chronic fractures; diagnostic performance was not altered by study indication or patient characteristics.
CONCLUSIONS: We observed poor diagnostic accuracy of an artificial intelligence decision support system for the detection of cervical spine fractures. Many similar algorithms have also received little or no external validation, and this study raises concerns about their generalizability, utility, and rapid pace of deployment. Further rigorous evaluations are needed to understand the weaknesses of these tools before widespread implementation.
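The reported metrics all follow from a single 2x2 confusion matrix. The abstract gives 1904 scans, 122 fractures, 67 detected, and 106 false-positive flags. From these, the remaining cells follow arithmetically: 55 missed fractures and 1676 true negatives. The Python sketch below recomputes the point estimates from those counts and shows, via Bayes' theorem, why a specificity of 94.1% still yields a PPV of only 38.7% at a fracture prevalence of about 6.4%. The helper names (`ConfusionMatrix`, `metrics`, `ppv_from_prevalence`, `wilson_interval`) are hypothetical. The abstract does not state how its 95% CIs were computed, so the Wilson score interval here is a swapped-in method. Its bounds may differ slightly from the reported intervals.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 counts, taking the adjudicated neuroradiologist read as the reference standard."""
    tp: int  # fracture present, flagged by the algorithm
    fp: int  # no fracture, but flagged
    fn: int  # fracture present, not flagged
    tn: int  # no fracture, not flagged

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Standard diagnostic accuracy measures derived from the 2x2 table."""
    return {
        "sensitivity": cm.tp / (cm.tp + cm.fn),
        "specificity": cm.tn / (cm.tn + cm.fp),
        "ppv": cm.tp / (cm.tp + cm.fp),
        "npv": cm.tn / (cm.tn + cm.fn),
        "concordance": (cm.tp + cm.tn) / cm.total,
        "prevalence": (cm.tp + cm.fn) / cm.total,
    }


def ppv_from_prevalence(sens: float, spec: float, prev: float) -> float:
    """Bayes' theorem: PPV expressed through sensitivity, specificity and prevalence."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method, not the authors')."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts taken from or derived from the abstract: 122 fractures among 1904 scans,
# 67 true positives, 106 false positives.
cm = ConfusionMatrix(tp=67, fp=106, fn=122 - 67, tn=1904 - 122 - 106)

if __name__ == "__main__":
    m = metrics(cm)
    for name, value in m.items():
        print(f"{name:>12}: {value:.1%}")
    print(f"PPV via Bayes: {ppv_from_prevalence(m['sensitivity'], m['specificity'], m['prevalence']):.1%}")
    lo, hi = wilson_interval(cm.tp, cm.tp + cm.fn)
    print(f"sensitivity, Wilson 95% CI: {lo:.1%} to {hi:.1%}")
```

Because fractures are rare in this cohort, the roughly 6% of negative scans that are falsely flagged (106 studies) outnumber the 67 true detections. That imbalance is what pulls the PPV down even though specificity and NPV look high.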
Citations
Journal Article
Methods for Clinical Evaluation of Artificial Intelligence Algorithms for Medical Diagnosis.
Seong Ho Park, Kyu-Rok Han, Hye-Young Jang, Ji Eun Park, Jung Goo Lee, Dong Wook Kim, Jae Won Choi
TL;DR: The authors discuss the importance of external testing of AI algorithms and strategies for conducting it effectively, along with various metrics and graphical methods for evaluating AI performance and essential methodological points to note when using and interpreting them.
Journal Article
Current development and prospects of deep learning in spine image analysis: a literature review
TL;DR: The authors conducted a systematic literature search of the PubMed and Web of Science databases using the keywords "deep learning" and "spine", covering the period from 1 January 2015 to 20 March 2021.
Journal Article
Artificial Intelligence in "Code Stroke"-A Paradigm Shift: Do Radiologists Need to Change Their Practice?
Achala Vagal, Luca Saba
TL;DR: Artificial intelligence (AI) applications in stroke care are being used currently in clinical practice with multiple Food and Drug Administration–approved and Conformité Européenne mark–certified commercially available platforms and are no longer a mere academic exercise.
Journal Article
Artificial intelligence CAD tools in trauma imaging: a scoping review from the American Society of Emergency Radiology (ASER) AI/ML Expert Panel
David Dreizin, Pedro V. Staziaki, Garvit D. Khatri, Nicholas M. Beckmann, Zhaoyong Feng, Yuanyuan Liang, Zachary S. DelProposto, Maximiliano Klug, J. S. Spann, Nathan Sarkar, Yunting Fu