A Vital Sign-based Prediction Algorithm for Differentiating COVID-19 Versus Seasonal Influenza in Hospitalized Patients

TL;DR: In this article, a supervised machine learning pipeline was developed and validated to distinguish the two viral infections using the available vital signs and demographic dataset from the first hospital/emergency room encounters of 3,883 patients who had confirmed diagnoses of influenza A/B, COVID-19 or negative laboratory test results.
Abstract: Patients with influenza and SARS-CoV-2/coronavirus disease 2019 (COVID-19) infections have different clinical courses and outcomes. We developed and validated a supervised machine learning pipeline to distinguish the two viral infections using the available vital signs and demographic dataset from the first hospital/emergency room encounters of 3,883 patients who had confirmed diagnoses of influenza A/B, COVID-19 or negative laboratory test results. The models were able to achieve an area under the receiver operating characteristic curve (ROC AUC) of at least 97% using our multiclass classifier. The predictive models were externally validated on 15,697 encounters in 3,125 patients available on the TriNetX database, which contains patient-level data from different healthcare organizations. The influenza vs. COVID-19-positive model had an AUC of 98% and 92% on the internal and external test sets, respectively. Our study illustrates the potential of machine-learning models for accurately distinguishing the two viral infections. The code is made available at https://github.com/ynaveena/COVID-19-vs-Influenza and may have utility as a frontline diagnostic tool to aid healthcare workers in triaging patients once the two viral infections start cocirculating in the communities.

Summary (3 min read)

Main

  • Infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causing coronavirus disease 2019 (COVID-19) has led to an unprecedented global crisis due to its vigorous transmission, spectrum of respiratory manifestations, and vascular effects [1] [2] [3].
  • Due to its similar symptomatology, COVID-19 has drawn comparisons to the seasonal influenza epidemic.
  • To help curb this dilemma, front-line providers need the ability to rapidly and accurately triage these patients.
  • With just a few important parameters, clinicians can diagnose the patients well before a laboratory diagnosis.
  • In the present investigation, the authors therefore explored the use of machine learning models to differentiate between SARS-CoV-2 and influenza infection using basic office-based clinical variables.

Baseline Characteristics

  • There was higher prevalence of Black/African Americans in the influenza cohort in comparison with other cohorts.
  • COVID-19-positive patients also had a higher mean body temperature than COVID-19-negative patients and exhibited overall higher systolic and diastolic blood pressures than the other two cohorts (p<0.001 for all variables).
  • Patients in the COVID-19-positive and influenza groups had a higher respiratory rate than the COVID-19-negative group (Table 1).

TriNetX Cohort

  • The external cohort included a total of 15,697 patient encounters from 3,125 patients with body temperature information available (Supplementary Table 2).
  • This subgroup of the external cohort included 6,613 encounters from 1,057 COVID-positive patients and 9,087 encounters from 2,068 influenza patients.
  • The COVID-19-positive group was predominantly male (54%), while the influenza group involved more female (53%) patients.
  • The COVID-19-positive group included more Black/African-American (47%) patients, while the influenza cohort consisted of more White/Caucasian (52%) patients.
  • The vital signs for mean body temperature, heart rate, respiratory rate, and diastolic blood pressure showed a statistical difference between patients with COVID-19 and those with influenza (p <0.0001).

COVID-19 Versus Influenza Infection Prediction at the ED or Hospitalization.

  • The authors used vitals and symptomatic features, which are readily available to providers, in an effort to develop supervised machine learning classifiers that can predict patients who are either COVID-19 positive or negative, while further distinguishing influenza from COVID-19 infection.
  • The WVU hospitalized patient cohort was randomly divided into a training (80%) and testing set (20%) to develop four different context-specific XGBoost predictive models.
  • The authors assessed receiver operating characteristic (ROC) area under the curve (AUC) plots, precision, recall and other threshold evaluation metrics to select the best performing model in each case (Tables 3, 4).
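The evaluation steps described above (a random 80/20 split, then ROC AUC, precision and recall on the held-out set) can be sketched in a few lines. This is a minimal standard-library illustration, not the authors' pipeline: the function names, the fixed seed, the 0.5 threshold and the toy scores are all assumptions made for the example, and a real run would score encounters with a trained XGBoost model instead.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def roc_auc(labels, scores):
    """ROC AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_recall_f1(labels, scores, threshold=0.5):
    """Threshold the scores, then compute precision, recall and F1."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical held-out labels (1 = COVID-19 positive) and model scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

train_ids, test_ids = train_test_split(range(100))
toy_auc = roc_auc(toy_labels, toy_scores)
toy_precision, toy_recall, toy_f1 = precision_recall_f1(toy_labels, toy_scores)
```

The pairwise AUC is quadratic in the number of encounters, which is fine for a toy example; a rank-based implementation would be preferable for cohorts of the size reported here.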

SHAP Importance

  • SHAP is a model-agnostic interpretability method that aids in analyzing feature importance based on their impact on the model's output.
  • The additive importance of each feature for the model is calculated over all possible orderings of features.
  • Encounter-related features such as the month of encounter along with reason for visit (Table S4) and encounter type also contributed to the most-informative variables for predicting influenza compared to COVID-19 positive patients or influenza vs all other patients.
  • A similar trend of feature importance was also reflected in the SHAP summary plot of the three-way multi-class classifier.
  • Vital signs played a more significant role in distinguishing between influenza and COVID-19 positive encounters through parameters such as body temperature, heart rate and blood pressure.
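The idea of averaging a feature's contribution over all possible orderings can be made concrete with a brute-force Shapley computation. The sketch below is an illustration only: `toy_score`, its thresholds, and the baseline and patient values are hypothetical, and the enumeration over every permutation is exponential in the number of features, which is why tree-specific SHAP algorithms are used in practice rather than this approach.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    Features start at their baseline values and are switched to the
    instance's values one at a time, in every possible order.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_score(x):
    """Hypothetical risk score with an interaction between fever and low SpO2."""
    score = 0.0
    if x["temperature_c"] >= 38.0:
        score += 0.4
    if x["heart_rate"] >= 100:
        score += 0.2
    if x["temperature_c"] >= 38.0 and x["spo2"] < 94:
        score += 0.2
    return score

# Hypothetical reference encounter and encounter to explain.
baseline = {"temperature_c": 37.0, "heart_rate": 80, "spo2": 98}
patient = {"temperature_c": 38.6, "heart_rate": 110, "spo2": 91}

phi = shapley_values(toy_score, patient, baseline)
```

In this toy case the interaction term is shared equally between temperature and SpO2, and the contributions sum to the difference between the patient's score and the baseline score, which is the additive property that force plots visualize.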

Model Interpretability

  • Similarly, a second set of SHAP force plots was generated for the COVID-19 positive vs negative classifier.
  • In these plots, Panel A shows an encounter correctly classified as COVID-19 negative while Panel B shows an encounter classified correctly as COVID-19 positive.
  • The authors see in the former that the temperature, reason for visit, lower age and higher value of SpO2 push the prediction towards COVID-19 negative.
  • With these interpretability methods the authors are able to clearly determine the reasons for the model's output and ensure they can be scrutinized.
  • The insights obtained further corroborate with patterns often observed in COVID-19 patients.

Impact of Vitals on Model Performance

  • Stepwise removal of vitals based on feature importance.
  • Each feature's importance to the construction of the machine learning model was assessed through individually removing each vital sign parameter in the internal validation set.
  • Removal of body temperature significantly decreased the models' performance to an accuracy of 74% and F1 score of 68%, compared to the initial performance of 90% and 89%, respectively.
  • Subsequent removal of other vital signs, including heart rate did not affect model performance drastically.

Inclusion of only one vital sign at a time.

  • The authors also assessed how well the model performs when only one vital sign is present at a time in the internal test set.
  • More specifically, when considering the importance of a given vital sign, all other features related to vitals are removed from the internal test set.
  • The demographics and other information were not ablated.
  • In addition to body temperature, heart rate was also seen to contribute to the performance of the COVID-19 positive vs negative model.
  • Taken together, these results suggest that of all the vital signs, body temperature, followed by heart rate and SpO2, could impact the predictive models' performance in discriminating between influenza, COVID-19 positive and negative patient encounters.
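Both ablation schemes described above (removing one vital sign at a time, and keeping only one vital sign at a time while leaving demographics untouched) amount to masking columns in the test set and re-scoring a fixed model. The following sketch assumes a hypothetical rule-based `toy_model` that treats a missing vital as uninformative; the field names, thresholds and five toy encounters are invented for illustration and do not reproduce the study's figures.

```python
VITALS = ("temperature_c", "heart_rate", "spo2")

def toy_model(enc):
    """Hypothetical rule: predict 1 when the available vitals look abnormal.

    A vital set to None is treated as missing and contributes no votes.
    """
    votes = 0
    if enc.get("temperature_c") is not None and enc["temperature_c"] >= 38.0:
        votes += 2
    if enc.get("heart_rate") is not None and enc["heart_rate"] >= 100:
        votes += 1
    if enc.get("spo2") is not None and enc["spo2"] < 94:
        votes += 1
    return 1 if votes >= 2 else 0

def mask(enc, removed):
    """Return a copy of the encounter with the removed fields set to None."""
    return {k: (None if k in removed else v) for k, v in enc.items()}

def accuracy(model, encounters, labels, removed=()):
    """Accuracy of a fixed model on a test set with some fields masked."""
    preds = [model(mask(e, removed)) for e in encounters]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def drop_one(model, encounters, labels):
    """Accuracy after removing each vital sign individually."""
    return {v: accuracy(model, encounters, labels, removed=(v,)) for v in VITALS}

def keep_one(model, encounters, labels):
    """Accuracy when only one vital sign is kept; non-vital fields stay intact."""
    return {
        v: accuracy(model, encounters, labels,
                    removed=tuple(o for o in VITALS if o != v))
        for v in VITALS
    }

# Hypothetical test encounters; age stands in for the non-ablated demographics.
encounters = [
    {"age": 70, "temperature_c": 38.5, "heart_rate": 90, "spo2": 96},
    {"age": 45, "temperature_c": 37.0, "heart_rate": 110, "spo2": 92},
    {"age": 30, "temperature_c": 36.8, "heart_rate": 75, "spo2": 98},
    {"age": 60, "temperature_c": 39.0, "heart_rate": 105, "spo2": 97},
    {"age": 52, "temperature_c": 37.1, "heart_rate": 102, "spo2": 97},
]
labels = [1, 1, 0, 1, 0]

full_accuracy = accuracy(toy_model, encounters, labels)
dropped = drop_one(toy_model, encounters, labels)
kept = keep_one(toy_model, encounters, labels)
```

Masking values rather than deleting columns mirrors how the ablation is applied to the test set only, with the trained model left unchanged; gradient-boosted tree libraries such as XGBoost can score rows with missing feature values, which is what makes this kind of test-time ablation possible.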

External Validation

  • To further assess the generalizability of the predictive models and confirm the stability of the model features at identifying patients positive for COVID-19 or influenza, the authors validated their models using the TriNetX research network dataset external to WVU Medicine.
  • Patients with any missing data related to body temperature were excluded from the analysis.
  • The influenza versus COVID-19 model demonstrated ROC AUC of 92.3% with an accuracy of 85%, and 81% precision at identifying patients that were positive for COVID-19 with 84% recall.
  • These results suggest that while enforcing no missing values in vitals could support a better overall model performance (i.e., AUC ROC 94.3% vs 92.3%), missingness in most of the vitals does not seem to limit its applicability and generalizability.
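The two external-validation settings contrasted above (requiring only body temperature versus enforcing no missing values in any vital) differ only in which encounters are retained. A minimal sketch of that filtering step follows; the field names and the four toy encounters are assumptions for illustration and are not drawn from the TriNetX data.

```python
def filter_complete(encounters, required):
    """Keep only encounters where every required field is present (not None)."""
    return [e for e in encounters if all(e.get(k) is not None for k in required)]

def missing_rate(encounters, field):
    """Fraction of encounters in which the given field is missing."""
    return sum(1 for e in encounters if e.get(field) is None) / len(encounters)

# Hypothetical external encounters with some vitals missing.
external = [
    {"temperature_c": 38.2, "heart_rate": 98, "spo2": 95},
    {"temperature_c": 37.4, "heart_rate": None, "spo2": 97},
    {"temperature_c": None, "heart_rate": 88, "spo2": 93},
    {"temperature_c": 36.9, "heart_rate": 72, "spo2": 99},
]

# Setting 1: exclude only encounters with missing body temperature.
with_temperature = filter_complete(external, ("temperature_c",))

# Setting 2: enforce no missing values in any vital sign.
fully_observed = filter_complete(external, ("temperature_c", "heart_rate", "spo2"))

heart_rate_missing = missing_rate(external, "heart_rate")
```

The stricter filter yields a smaller but cleaner evaluation set, which is the trade-off behind the 94.3% vs 92.3% AUC comparison reported above.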

Discussion

  • This investigation provides multiple machine-learning models to differentiate between COVID-19 positive, -negative and influenza.
  • Further, this is the first machine learning model to leverage a patient population that includes both the initial (February-April) and secondary (May-September) surge of SARS-CoV-2 infections in the United States [17].
  • While the initial presentation of COVID-19 and influenza appear similar, the number, and combination, of signs and symptoms can help provide better stratification.
  • These samples had nine times as much alveolar capillary microthrombi compared to that of influenza lung tissue and even showed evidence of new vessel growth through a mechanism of intussusceptive angiogenesis [1, 20].
  • While the WVU cohort includes patients who have been confirmed to have a negative SARS-CoV-2 test on presentation, the TriNetX dataset does not have this information.

Conclusions

  • Here the authors highlight how machine learning can effectively classify influenza and COVID-19 positive cases through vital signs on clinical presentation.
  • This work is the first step in building a low-cost, robust classification system for the appropriate triage of patients displaying symptoms of a viral respiratory infection.
  • With these algorithms, the identification of proper treatment modalities for both COVID-19 and influenza can be made more rapidly, increasing the effectiveness of patient care.

Naveena Yanamala* (1,2), Nanda H. Krishna (1,2), Quincy A. Hathaway (1), Aditya Radhakrishnan (1,2), Srinidhi Sunkara (1,2), Heenaben Patel (1), Peter Farjo (1), Brijesh Patel (1), Partho P. Sengupta* (1)

Author Affiliations:
1. Division of Cardiology, West Virginia University Medicine Heart & Vascular Institute, Morgantown, WV
2. Institute for Software Research, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA

Funding Information: This work is supported in part by funds from the National Science Foundation (NSF: #1920920) and the National Institute of General Medical Sciences of the National Institutes of Health (NIH: #5U54GM104942-04). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or National Science Foundation.

Word count (main text): 3,437 (Limit 3,500)
Word count (methods): 1,630 (Limit 3,000)

Address of Correspondence:

*Partho P. Sengupta, MD, DM
Heart & Vascular Institute
West Virginia University
1 Medical Center Drive,
Morgantown, WV 26506
E-Mail: Partho.Sengupta@wvumedicine.org
Twitter: @ppsengupta1

*Naveena Yanamala, MS, PhD
Heart & Vascular Institute
West Virginia University
1 Medical Center Drive,
Morgantown, WV 26506
E-Mail: naveena.yanamala.m@wvumedicine.org
Twitter: @YanamalaNaveena

medRxiv preprint doi: https://doi.org/10.1101/2021.01.13.21249540; this version posted March 4, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Abstract

Patients with influenza and SARS-CoV-2/coronavirus disease 2019 (COVID-19) infections have different clinical courses and outcomes. We developed and validated a supervised machine learning pipeline to distinguish the two viral infections using the available vital signs and demographic dataset from the first hospital/emergency room encounters of 3,883 patients who had confirmed diagnoses of influenza A/B, COVID-19 or negative laboratory test results. The models were able to achieve an area under the receiver operating characteristic curve (ROC AUC) of at least 97% using our multiclass classifier. The predictive models were externally validated on 15,697 encounters in 3,125 patients available on the TriNetX database, which contains patient-level data from different healthcare organizations. The influenza vs. COVID-19-positive model had an AUC of 98% and 92% on the internal and external test sets, respectively. Our study illustrates the potential of machine-learning models for accurately distinguishing the two viral infections. The code is made available at https://github.com/ynaveena/COVID-19-vs-Influenza and may have utility as a frontline diagnostic tool to aid healthcare workers in triaging patients once the two viral infections start cocirculating in the communities.

Keywords: Seasonal Influenza, Machine learning, Extreme gradient boosting trees.

Main

Infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causing coronavirus disease 2019 (COVID-19) has led to an unprecedented global crisis due to its vigorous transmission, spectrum of respiratory manifestations, and vascular effects [1-3]. The etiology of the disease is further complicated by a diverse set of clinical presentations, ranging from asymptomatic to progressive viral pneumonia and mortality [4]. Due to its similar symptomatology, COVID-19 has drawn comparisons to the seasonal influenza epidemic [5]. Both infections commonly present with overlapping symptoms, leading to a clinical dilemma for clinicians as SARS-CoV-2 carries a case-fatality rate up to 30 times that of influenza and infects healthcare workers at a significantly higher rate [3,6,7]. Moreover, the concurrence of epidemics appears imminent as the considerable COVID-19 incidence continues and even a moderate influenza season would result in over 35 million cases and 30,000 deaths [5,6]. To help curb this dilemma, front-line providers need the ability to rapidly and accurately triage these patients.

One approach to quickly classifying patients as COVID-19 positive or negative could be through machine learning algorithms. While machine learning has been applied to contact tracing and forecasting during the COVID-19 epidemic [8], it has been explored only to a limited extent as a means for accurately predicting COVID-19 infection on clinical presentation. With just a few important parameters, clinicians can diagnose the patients well before a laboratory diagnosis. Preliminary work has shown the utility of machine and deep learning algorithms in predicting COVID-19 from patient features [9-11] and on CT examination [12,13], but there remains a paucity of research showing the capacity of machine learning algorithms in differentiating between COVID-19 and influenza patients.

Vital signs are a critical piece of information used in the initial triage of patients with COVID-19 and/or influenza by care coordinators and health-care responders in community urgent care centers or emergency rooms. It is becoming clearer that patient vital signs may present uniquely in SARS-CoV-2 infection [9], likely as a result of alterations in gas exchange and microvascular changes [14]. In the present investigation, we therefore explored the use of machine learning models to differentiate between SARS-CoV-2 and influenza infection using basic office-based clinical variables. The use of simple ML-based classification may have utility for the rapid identification, triage, and treatment of COVID-19 and influenza positive patients by front-line healthcare workers, which is especially relevant as the influenza season approaches.

Results

WVU Study Cohort

Baseline Characteristics

The patient cohort included 3883 patients (mean age 52 ± 24 years, 48% males, and 89% White/Caucasian) of whom 747 (19%) tested positive for SARS-CoV-2 (COVID-19 positive cohort), 1913 (49%) tested negative for SARS-CoV-2 (COVID-19 negative cohort), and 1223 (31%) had influenza (Table 1, Figure S1). The majority of the COVID-19 positive and negative patients were older, whereas the influenza cohort was younger (p<0.001). There was a higher prevalence of Black/African Americans in the influenza cohort in comparison with other cohorts. COVID-19 positive patients were more obese (p<0.001). They also had a higher mean body temperature than COVID-19-negative patients and exhibited overall higher systolic and diastolic blood pressures than the other two cohorts (p<0.001 for all variables). The influenza cohort, in turn, had higher mean body temperature, heart rate and oxygen saturations than COVID-19-positive and -negative patients (p<0.001). Patients in the COVID-19-positive and influenza groups had a higher respiratory rate than the COVID-19-negative group (Table 1).

Outcomes

The overall mortality for the cohort was 6.7%. The crude case fatality rate was 6.8% in the COVID-19-positive and 4.2% in the influenza groups, with a 9.5% case fatality rate in the COVID-19 negative group (p<0.001). The COVID-19-positive patients had more than a 3-fold higher rate of ICU admissions than patients with influenza (19.0% vs. 5.7%; p<0.001), but the rate was lower than in the COVID-19 negative group (23.2%) (p<0.001). The average age of patients who died during hospitalization was significantly higher in COVID-19 positive patients (75 ± 14 years) compared to the influenza (69 ± 13 years) and COVID-19 negative groups (72 ± 14 years) (p=0.02), as presented in Table 2.

TriNetX Cohort

The external cohort included a total of 15,697 patient encounters from 3,125 patients with body temperature information available (Supplementary Table 2). This subgroup of the external cohort included 6,613 encounters from 1,057 COVID-positive patients and 9,087 encounters from 2,068 influenza patients. The COVID-19-positive group was predominantly male (54%), while the influenza group involved more female (53%) patients. The COVID-19-positive group included more Black/African-American (47%) patients, while the influenza cohort consisted of more White/Caucasian (52%) patients. The vital signs for mean body temperature, heart rate, respiratory rate, and diastolic blood pressure showed a statistical difference between patients with COVID-19 and those with influenza (p<0.0001).

References
Journal ArticleDOI
TL;DR: Human airway epithelial cells were used to isolate a novel coronavirus, named 2019-nCoV, which formed a clade within the subgenus sarbecovirus, Orthocoronavirinae subfamily, which is the seventh member of the family of coronaviruses that infect humans.
Abstract: In December 2019, a cluster of patients with pneumonia of unknown cause was linked to a seafood wholesale market in Wuhan, China. A previously unknown betacoronavirus was discovered through the use of unbiased sequencing in samples from patients with pneumonia. Human airway epithelial cells were used to isolate a novel coronavirus, named 2019-nCoV, which formed a clade within the subgenus sarbecovirus, Orthocoronavirinae subfamily. Different from both MERS-CoV and SARS-CoV, 2019-nCoV is the seventh member of the family of coronaviruses that infect humans. Enhanced surveillance and further investigation are ongoing. (Funded by the National Key Research and Development Program of China and the National Major Project for Control and Prevention of Infectious Disease in China.).

21,455 citations

Journal ArticleDOI
TL;DR: There is evidence that human-to-human transmission has occurred among close contacts since the middle of December 2019 and considerable efforts to reduce transmission will be required to control outbreaks if similar dynamics apply elsewhere.
Abstract: Background The initial cases of novel coronavirus (2019-nCoV)–infected pneumonia (NCIP) occurred in Wuhan, Hubei Province, China, in December 2019 and January 2020. We analyzed data on the...

13,101 citations

Journal ArticleDOI
TL;DR: In this small series, vascular angiogenesis distinguished the pulmonary pathobiology of Covid-19 from that of equally severe influenza virus infection.
Abstract: Background Progressive respiratory failure is the primary cause of death in the coronavirus disease 2019 (Covid-19) pandemic. Despite widespread interest in the pathophysiology of the dise...

4,134 citations

Journal ArticleDOI
TL;DR: An explanation method for trees is presented that enables the computation of optimal local explanations for individual predictions, and the authors demonstrate their method on three medical datasets.
Abstract: Tree-based machine learning models such as random forests, decision trees and gradient boosted trees are popular nonlinear predictive models, yet comparatively little attention has been paid to explaining their predictions. Here we improve the interpretability of tree-based models through three main contributions. (1) A polynomial time algorithm to compute optimal explanations based on game theory. (2) A new type of explanation that directly measures local feature interaction effects. (3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to (1) identify high-magnitude but low-frequency nonlinear mortality risk factors in the US population, (2) highlight distinct population subgroups with shared risk characteristics, (3) identify nonlinear interaction effects among risk factors for chronic kidney disease and (4) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model’s performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains. Tree-based machine learning models are widely used in domains such as healthcare, finance and public services. The authors present an explanation method for trees that enables the computation of optimal local explanations for individual predictions, and demonstrate their method on three medical datasets.

2,548 citations

Journal ArticleDOI
TL;DR: A new model for automatic COVID-19 detection using raw chest X-ray images is presented and can be employed to assist radiologists in validating their initial screening, and can also be employed via cloud to immediately screen patients.

1,868 citations
