Diagnostic Accuracy and Failure Mode Analysis of a Deep Learning Algorithm for the Detection of Cervical Spine Fractures.

doi:10.3174/AJNR.A7179

Open Access, Journal Article

Andrew F. Voter, +3 more

11 Jun 2021

American Journal of Neuroradiology

Vol. 42, Iss. 8, pp. 1550-1556

TLDR

In this article, the authors evaluate the performance of an artificial intelligence decision support system, Aidoc, for the detection of cervical spinal fractures on noncontrast cervical spine CT scans and conduct a failure mode analysis to identify areas of poor performance.

Abstract:

BACKGROUND AND PURPOSE: Artificial intelligence decision support systems are a rapidly growing class of tools to help manage ever-increasing imaging volumes. The aim of this study was to evaluate the performance of an artificial intelligence decision support system, Aidoc, for the detection of cervical spinal fractures on noncontrast cervical spine CT scans and to conduct a failure mode analysis to identify areas of poor performance.

MATERIALS AND METHODS: This retrospective study included 1904 emergent noncontrast cervical spine CT scans of adult patients (60 [SD, 22] years, 50.3% men). The presence of cervical spinal fracture was determined by Aidoc and an attending neuroradiologist; discrepancies were independently adjudicated. Algorithm performance was assessed by calculation of the diagnostic accuracy, and a failure mode analysis was performed.

RESULTS: Aidoc and the neuroradiologist’s interpretation were concordant in 91.5% of cases. Aidoc correctly identified 67 of 122 fractures (54.9%) with 106 false-positive flagged studies. Diagnostic performance was calculated as the following: sensitivity, 54.9% (95% CI, 45.7%–63.9%); specificity, 94.1% (95% CI, 92.9%–95.1%); positive predictive value, 38.7% (95% CI, 33.1%–44.7%); and negative predictive value, 96.8% (95% CI, 96.2%–97.4%). Worsened performance was observed in the detection of chronic fractures; differences in diagnostic performance were not altered by study indication or patient characteristics.

CONCLUSIONS: We observed poor diagnostic accuracy of an artificial intelligence decision support system for the detection of cervical spine fractures. Many similar algorithms have also received little or no external validation, and this study raises concerns about their generalizability, utility, and rapid pace of deployment. Further rigorous evaluations are needed to understand the weaknesses of these tools before widespread implementation.
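The point estimates in the Results can be cross-checked from the four counts the abstract reports (1904 scans, 122 fractures, 67 detected, 106 false-positive flags). The sketch below is illustrative only: the false-negative and true-negative counts are not stated in the abstract and are derived here by subtraction, assuming every scan entered the analysis and that the adjudicated neuroradiologist reading is the reference standard. The names `ConfusionCounts` and `counts_from_abstract` are hypothetical, and the confidence intervals are not reproduced because the abstract does not say how they were computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Per-study 2x2 counts for a binary detector against a reference standard."""
    tp: int  # fractures flagged by the algorithm
    fp: int  # fracture-free studies flagged by the algorithm
    fn: int  # fractures the algorithm missed
    tn: int  # fracture-free studies left unflagged

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def concordance(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


def counts_from_abstract(total: int, fractures: int,
                         detected: int, false_positives: int) -> ConfusionCounts:
    """Derive the full 2x2 table from the four numbers given in the abstract.

    Assumption: FN and TN are obtained by subtraction; the abstract
    does not report them directly.
    """
    fn = fractures - detected
    tn = (total - fractures) - false_positives
    return ConfusionCounts(tp=detected, fp=false_positives, fn=fn, tn=tn)


counts = counts_from_abstract(total=1904, fractures=122,
                              detected=67, false_positives=106)

for name in ("sensitivity", "specificity", "ppv", "npv", "concordance"):
    print(f"{name}: {100 * getattr(counts, name):.1f}%")
```

Under these assumptions the derived table has 55 false negatives and 1676 true negatives, and the five percentages round to the values quoted in the Results. The low positive predictive value follows from the low fracture prevalence (122 of 1904 studies) combined with 106 false-positive flags.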
