
Posted Content

Predicting individual risk for COVID19 complications using EMR data

05 Jun 2020, medRxiv (Cold Spring Harbor Laboratory Press)

TL;DR: Two approaches are described that can effectively identify patients at high risk of complications, allowing optimized use of resources, more focused follow-up, and early triage of these patients once symptoms worsen.



Summary

Introduction

  • Since January 2020, the COVID-19 pandemic has become a global emergency.
  • Healthcare organizations and governments worldwide are strained by resource shortages and by the need to make timely decisions based on very little reliable data.
  • These decisions include whom to test, how to treat positive cases, how to manage social distancing and reach out to at-risk populations, contact tracing, and more.
  • Many of these decisions could benefit from decision support tools based on EMR and additional data sources, such as geospatial information.
  • Unfortunately, accurate data-driven tools are still difficult to develop because of the limited availability of COVID-19 patient data linked to historical EMR records.

Settings

  • The models were trained using data from Maccabi Healthcare Services (MHS), a large Israeli HMO with a central EMR database containing longitudinal data for 2 million active individuals each year between 2010 and 2018.
  • The data included full EMR information: demographics (e.g., age and sex), behavioral information (smoking status), vital signs, lab test results, diagnoses and procedures (coded with the International Classification of Diseases, 9th Revision), medication prescriptions and purchases, and hospital admissions (dates and departments only).

Analytic approach

  • Since the number of MHS members who are positive for SARS-CoV-2 is relatively low, and the available data are biased by current testing limitations and by challenges in data collection and curation, the authors chose to test two complementary approaches.
  • First, the authors used a proxy model they had derived for identifying patients at high risk of developing complications due to influenza (the influenza-like illness, or ILI-based, model) and applied some required adjustments.
  • Although influenza and COVID-19 are clearly very different diseases, it is already apparent that both share common risk factors for developing complications.
  • Early epidemiological data, however, show that increasing age and male sex raise the risk of complications much more strongly for COVID-19 than for influenza. Because of these differences, the authors modified the ILI-based model to ignore age and sex as risk factors, and then used a Bayesian correction to add these risk factors back using external priors.
  • For training the COVID-19-based model, the authors used information on SARS-CoV-2-positive individuals aged 19 or above within the MHS population, as well as information on hospitalization and in-hospital complications.
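The summary does not say exactly how the ILI-based model was forced to ignore age and sex. One common way, consistent with the age- and sex-matched training matrix reported under Results, is to subsample controls so that every (age band, sex) bin keeps the same case:control ratio; age and sex then carry no information about the label in the training data. The sketch below assumes 5-year age bands and a fixed number of controls per case; `match_controls`, `age_band`, and the row layout are hypothetical names, not the authors' code.

```python
import random
from collections import defaultdict


def age_band(age: int, width: int = 5) -> int:
    """Map an age in years to the lower bound of its age band (5-year bands by default)."""
    return (age // width) * width


def match_controls(rows, controls_per_case=1, seed=0):
    """Subsample controls so each (age band, sex) bin keeps about the same case:control ratio.

    rows: list of dicts with hypothetical keys 'age', 'sex', 'label' (1 = case, 0 = control).
    Returns a new list with all cases and the sampled controls.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    bins = defaultdict(lambda: {"cases": [], "controls": []})
    for row in rows:
        key = (age_band(row["age"]), row["sex"])
        bins[key]["cases" if row["label"] == 1 else "controls"].append(row)

    matched = []
    for group in bins.values():
        # Bins without cases contribute no controls; bins short on controls keep all they have.
        n_controls = min(len(group["controls"]), controls_per_case * len(group["cases"]))
        matched.extend(group["cases"])
        matched.extend(rng.sample(group["controls"], n_controls))
    return matched
```

Because bins with too few controls keep all of them, the ratio is only approximately constant in practice; with abundant controls, as in a large influenza cohort, it is exact.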

Model Derivation

  • To train the ILI-based model, the authors used a training set of all MHS members on September 1st of each calendar year who were not vaccinated during the following flu season.
  • They marked individuals as cases if they were diagnosed with ILI followed by complications within 3 months, and as controls otherwise.
  • Cases and controls were matched by age (5-year groups) and sex.
  • To combine the prediction of the calibrated model with age and sex priors for complications, the authors used the following formula: $$P_{\text{Combined}} = \frac{P_{\text{Model}}\,P_{\text{Prior}}}{P_{\text{Model}}\,P_{\text{Prior}} + \text{Odds} \times (1 - P_{\text{Model}})(1 - P_{\text{Prior}})}$$
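The combination formula above can be read as Bayes' rule in odds form: the model's odds are divided by the training-set odds to obtain a likelihood ratio, which then multiplies the prior odds. The summary does not define `Odds`; the sketch below assumes it is the case:control odds of the matched training set, π/(1 − π), which is the value that makes the formula return the prior unchanged when the model outputs exactly the training prevalence π. `combine_with_prior` is a hypothetical helper name.

```python
def combine_with_prior(p_model: float, p_prior: float, odds: float) -> float:
    """Combine a calibrated model probability with an external prior.

    Computes Pm*Pp / (Pm*Pp + odds*(1 - Pm)*(1 - Pp)). `odds` is ASSUMED to be the
    case:control odds of the training set, so the model's likelihood ratio is
    (Pm / (1 - Pm)) / odds and the result is the posterior probability.
    """
    for name, p in (("p_model", p_model), ("p_prior", p_prior)):
        if not 0.0 < p < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1, got {p}")
    numerator = p_model * p_prior
    return numerator / (numerator + odds * (1.0 - p_model) * (1.0 - p_prior))
```

For example, `combine_with_prior(0.3, 0.10, 0.25)` returns 0.16 up to floating-point rounding: a model score of 0.3 against a training prevalence of 0.2 (odds 0.25) is a likelihood ratio of about 1.71, which lifts a 10% prior to 16%.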

COVID-19-based model

  • The authors used the Israel Ministry of Health definitions of COVID-19 complications: moderate (pneumonia with at least one of the following: respiratory rate above 30 breaths per minute, respiratory distress, or oxygen saturation below 90%) or severe (pneumonia accompanied by sepsis, shock, ARDS, or death).
  • The authors then created a feature vector for each individual, including risk factors and underlying conditions.
  • The authors used XGBoost on the feature matrix to learn a COVID-19 complications predictor.
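In the study, severity was marked for hospitalized patients. To make the Ministry of Health definitions quoted above concrete, the sketch below encodes them as rules over a hypothetical per-patient summary (`Encounter` and its field names are illustrative, not the MHS schema) and derives a binary target with moderate and severe complications as cases. It reads "pneumonia accompanied by sepsis, shock, ARDS or death" literally, so death without recorded pneumonia is not labelled severe; that reading is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Encounter:
    """Hypothetical per-patient summary; field names are illustrative only."""
    pneumonia: bool = False
    respiratory_rate: Optional[float] = None  # breaths per minute
    respiratory_distress: bool = False
    spo2: Optional[float] = None  # oxygen saturation, percent
    sepsis: bool = False
    shock: bool = False
    ards: bool = False
    died: bool = False


def severity(e: Encounter) -> str:
    """Return 'severe', 'moderate' or 'other' following the definitions quoted above."""
    if not e.pneumonia:
        return "other"  # both complication levels require pneumonia in this reading
    if e.sepsis or e.shock or e.ards or e.died:
        return "severe"
    tachypnea = e.respiratory_rate is not None and e.respiratory_rate > 30
    hypoxemia = e.spo2 is not None and e.spo2 < 90
    if tachypnea or e.respiratory_distress or hypoxemia:
        return "moderate"
    return "other"


def is_case(e: Encounter) -> int:
    """Binary target: 1 for moderate or severe complications, 0 otherwise."""
    return int(severity(e) in ("moderate", "severe"))
```

Checking the severe criteria first ensures that a patient meeting both sets of criteria is counted once, at the higher level.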

Performance evaluation

  • Given that real-world data on COVID-19 are currently limited, it is difficult to evaluate the performance of the models.
  • The authors report several methods they used to estimate the value of the models.
  • They examined the excess risk associated with underlying health conditions and compared it with information from the CDC [https://www.cdc.gov/mmwr/volumes/69/wr/mm6913e2.htm#F1_down].
  • They also evaluated model performance on initial COVID-19 complication records.
  • Lift was evaluated by calculating the average prediction over the population with a given underlying condition and comparing it with the average prediction over a reference population.
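Lift as described above takes only a few lines to compute. The summary does not say whether the two averages were compared as a ratio or a difference, nor how the reference population was defined; the sketch assumes a ratio and takes the reference group as a boolean mask supplied by the caller (for example, members without the condition, a hypothetical choice). `lift` is an illustrative name.

```python
from statistics import mean


def lift(predictions, has_condition, in_reference):
    """Mean prediction among members with a condition divided by the mean in a reference group.

    predictions: per-member risk scores; has_condition / in_reference: parallel boolean masks.
    The ratio form and the reference definition are assumptions, not the authors' spec.
    """
    condition_scores = [p for p, c in zip(predictions, has_condition) if c]
    reference_scores = [p for p, r in zip(predictions, in_reference) if r]
    if not condition_scores or not reference_scores:
        raise ValueError("both groups must be non-empty")
    return mean(condition_scores) / mean(reference_scores)
```

A lift of 3 would mean the model assigns members with the condition, on average, three times the predicted risk of the reference group, which can then be set against the excess risk reported by the CDC.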

Results

  • The age- and sex-matched matrix used for training the XGBoost model included, after feature selection, about 690,000 rows and 900 features (compared with about 790,000 rows and 1584 features for the non-matched model).
  • The available dataset included a total of 2137 SARS-CoV-2-positive individuals who were either not hospitalized (n=1658), or hospitalized and marked as mild (n=332), or as having moderate (n=83) or severe (n=64) complications.
  • Individuals who were hospitalized but not assigned a severity level were excluded.
  • All individuals were linked to their MHS medical records to generate the feature matrix.

Performance Evaluation

  • The AUC of the full (non-matched) model for predicting influenza complications was 0.744, whereas that of the matched model was 0.726.
  • Given the small size of the COVID-19 dataset, it is reasonable to suspect that the COVID-19-based model is too specific to MHS and less generalizable than the ILI-based model.
  • Medical staff are the first to contact confirmed COVID-19 patients, question them, and decide on the appropriate treatment facility based on their symptoms and overall medical assessment.
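For the COVID-19 evaluation, moderate (n=83) and severe (n=64) complications give 147 cases, and non-hospitalized (n=1658) plus mild (n=332) patients give 1990 controls. The summary does not say how the AUC confidence intervals were obtained; the sketch below assumes a percentile bootstrap over patients and computes the AUC with the rank-sum (Mann-Whitney) formulation, so each replicate costs O(n log n) rather than O(cases × controls). All function names are hypothetical.

```python
import random


def to_binary(severity_labels):
    """Cases = moderate or severe complications; everything else is a control."""
    return [1 if s in ("moderate", "severe") else 0 for s in severity_labels]


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney rank-sum formula, with mid-ranks for tied scores."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling patients with replacement."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("need at least one case and one control")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip replicates that lack cases or controls
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Resampling patients rather than pairs keeps the case:control mix close to the observed 147:1990 in each replicate.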


Predicting individual risk for COVID19 complications using EMR data
Yaron Kinar, PhD (1); Alon Lanyado, BSc (1); Avi Shoshan, MSc (1); Rachel Yesharim, BSc (1); Tamar Domany, BSc (1); Varda Shalev, MD (2,3); Gabriel Chodick, PhD (2,3)
(1) Medial EarlySign, Hod Hasharon, Israel
(2) Faculty of Medicine, Tel Aviv University, Israel
(3) Maccabi Institute for Research & Innovation, Israel
Corresponding author
Prof. Gabriel Chodick, PhD
Maccabi Institute for Research & Innovation, Israel
Keufman St. 4, Tel Aviv Israel 68125
Tel: 972-3-514-3755
Fax: 972-73-231-2813
Email: hodk_g@mac.org.il
Word count: 2101
No. of tables: 2
No. of figures: 0
All rights reserved. No reuse allowed without permission.
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprintthis version posted June 5, 2020. ; https://doi.org/10.1101/2020.06.03.20121574doi: medRxiv preprint
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Abstract
Background: The global COVID-19 pandemic has challenged healthcare organizations and caused numerous deaths and hospitalizations worldwide. The need for data-based decision support tools for many aspects of controlling and treating the disease is evident but has been hampered by the scarcity of reliable real-world data. Here we describe two approaches: (a) the use of an existing EMR-based model for predicting influenza complications, combined with available epidemiological data, to create a model that identifies individuals at high risk of developing COVID-19 complications; and (b) a preliminary model trained on existing real-world COVID-19 data.
Methods: We used the computerized data of Maccabi Healthcare Services, a state-mandated health organization in Israel with 2.3 million members. The age- and sex-matched matrix used for training the XGBoost ILI-based model included circa 690,000 rows and 900 features. The dataset available for the COVID-19-based model included a total of 2137 SARS-CoV-2-positive individuals who were either not hospitalized (n=1658), or hospitalized and marked as mild (n=332), or as having moderate (n=83) or severe (n=64) complications.
Findings: On the 2137 COVID-19 patients, with moderate and severe complications as cases and all others as controls, the AUC of the ILI-based model combined with the priors was 0.852 [0.824-0.879], and the AUC of the COVID-19-based model was 0.872 [0.847-0.879].
Interpretation: These models can effectively identify patients at high risk of complications, allowing optimized use of resources, more focused follow-up, and early triage of these patients once symptoms worsen.
Funding: There was no funding for this study.

Research in context
Evidence before this study
We searched PubMed for coronavirus[MeSH Major Topic] AND the following MeSH terms: risk score, predictive analytics, algorithm. Only a few studies were found on predictive analytics for developing COVID-19 complications using real-world data. Many of the relevant works were based on self-reported information and are therefore difficult to implement at large scale and without patient or physician participation.
Added value of this study
We describe two models for assessing the risk of COVID-19 complications and mortality based on EMR data. One model was derived by combining a machine-learning model for influenza complications with epidemiological data on age- and sex-dependent COVID-19 mortality rates. The other was derived directly from initial COVID-19 complications data.
Implications of all the available evidence
The developed models may effectively identify patients at high risk of developing COVID-19 complications. Implementing such models in operational data systems may support COVID-19 care workflows and assist in triaging patients.

Introduction
Since January 2020, the COVID-19 pandemic has become a global emergency. Healthcare organizations and governments worldwide are strained by resource shortages and by the need to make timely decisions based on very little reliable data. These decisions include whom to test, how to treat positive cases, how to manage social distancing and reach out to at-risk populations, contact tracing, and more. Many of these decisions could benefit from decision support tools based on EMR and additional data sources, such as geospatial information. Unfortunately, accurate data-driven tools are still difficult to develop because of the limited availability of COVID-19 patient data linked to historical EMR records. Many of the relevant works [1-3] describe risk factors, and the tools already developed [4,5] are based on self-reported information and are therefore difficult to implement at large scale and without patient or physician participation.
Here, we describe two approaches and tools for assessing the individual risk of developing COVID-19 complications from medical records: a model that uses a machine-learning model for influenza-like illness (ILI) as a proxy for COVID-19, and a second model trained on data from COVID-19 patients.
Methods
Settings
The models were trained using data from Maccabi Healthcare Services (MHS), a large Israeli HMO with a central EMR database containing longitudinal data for 2 million active individuals each year between 2010 and 2018. The data included full EMR information: demographics (e.g., age and sex), behavioral information (smoking status), vital signs, lab test results, diagnoses and procedures (coded with the International Classification of Diseases, 9th Revision), medication prescriptions and purchases, and hospital admissions (dates and departments only).

Analytic approach
Since the number of MHS members who are positive for SARS-CoV-2 is relatively low, and the available data are biased by current testing limitations and by challenges in data collection and curation, we chose to test two complementary approaches. First, we use a proxy model that we derived for identifying patients at high risk of developing complications due to influenza, and apply some required adjustments. Although influenza and COVID-19 are clearly very different diseases [6], it is already apparent that both share common risk factors for developing complications. However, the initial epidemiological data for COVID-19 [China CFR, NYC CFR] already show major differences between the two diseases, primarily in the effect of increasing age on the risk of complications (which seems much stronger for COVID-19) and in the much higher risk of COVID-19 complications and mortality among men, a trend less evident in influenza. (Another difference is seasonality, which is clear for influenza and less evident for COVID-19.) Because of these differences, we modified the ILI-based model to ignore age and sex as risk factors, and then used a Bayesian correction to add these risk factors back using external priors.
For training the COVID-19-based model, we used information on SARS-CoV-2-positive individuals aged 19 or above within the MHS population, as well as information on hospitalization and in-hospital complications.
As an initial prior for the age and sex correction, we used the COVID-19 mortality data available from China [https://www.worldometers.info/coronavirus/coronavirus-age-sex-demographics/] as a proxy for complication probabilities (appendix table 1). The fatality rate by sex is given in appendix table 2. Because women are over-represented among the elderly, who have the highest fatality rates, the crude 1:1.65 female-to-male risk ratio understates the difference between the sexes within each age group; we therefore replaced it with a higher 1:2 ratio per age group, as shown in appendix table 3.
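The appendix tables are not reproduced here, so the following is only a sketch of one way to apply a fixed 1:2 female-to-male ratio within an age group while preserving that group's overall rate. Given the group's overall rate p and its female share w, solving p = w·f + (1 − w)·2f gives the female rate f and the male rate 2f. `split_rate_by_sex` and its inputs are hypothetical; the authors' exact procedure is not described in this section.

```python
def split_rate_by_sex(p_overall, female_share, male_to_female=2.0):
    """Split one age group's overall rate into (female, male) rates with a fixed ratio.

    Solves p = w*f + (1 - w)*r*f for f, where w is the female share of the age group
    and r is the assumed male:female risk ratio (a 1:2 female-to-male ratio means r = 2).
    """
    if not 0.0 <= female_share <= 1.0:
        raise ValueError("female_share must be between 0 and 1")
    female_rate = p_overall / (female_share + male_to_female * (1.0 - female_share))
    return female_rate, male_to_female * female_rate
```

For an age group with an overall rate of 3% and equal numbers of women and men, this gives 2% for women and 4% for men; a larger female share (as among the elderly) pushes both sex-specific rates up slightly so that the weighted average still equals the group rate.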



References

Guan WJ, et al. Clinical characteristics of coronavirus disease 2019 in China. N Engl J Med. 2020.
TL;DR: During the first 2 months of the current outbreak, Covid-19 spread rapidly throughout China and caused varying degrees of illness, and patients often presented without fever, and many did not have abnormal radiologic findings.
Abstract: Background Since December 2019, when coronavirus disease 2019 (Covid-19) emerged in Wuhan city and rapidly spread throughout China, data have been needed on the clinical characteristics of...



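Zhou F, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020.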
Abstract: Summary Background Since December, 2019, Wuhan, China, has experienced an outbreak of coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Epidemiological and clinical characteristics of patients with COVID-19 have been reported but risk factors for mortality and a detailed clinical course of illness, including viral shedding, have not been well described. Methods In this retrospective, multicentre cohort study, we included all adult inpatients (≥18 years old) with laboratory-confirmed COVID-19 from Jinyintan Hospital and Wuhan Pulmonary Hospital (Wuhan, China) who had been discharged or had died by Jan 31, 2020. Demographic, clinical, treatment, and laboratory data, including serial samples for viral RNA detection, were extracted from electronic medical records and compared between survivors and non-survivors. We used univariable and multivariable logistic regression methods to explore the risk factors associated with in-hospital death. Findings 191 patients (135 from Jinyintan Hospital and 56 from Wuhan Pulmonary Hospital) were included in this study, of whom 137 were discharged and 54 died in hospital. 91 (48%) patients had a comorbidity, with hypertension being the most common (58 [30%] patients), followed by diabetes (36 [19%] patients) and coronary heart disease (15 [8%] patients). Multivariable regression showed increasing odds of in-hospital death associated with older age (odds ratio 1·10, 95% CI 1·03–1·17, per year increase; p=0·0043), higher Sequential Organ Failure Assessment (SOFA) score (5·65, 2·61–12·23; p Interpretation The potential risk factors of older age, high SOFA score, and d-dimer greater than 1 μg/mL could help clinicians to identify patients with poor prognosis at an early stage. Prolonged viral shedding provides the rationale for a strategy of isolation of infected patients and optimal antiviral interventions in the future. 
Funding Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences; National Science Grant for Distinguished Young Scholars; National Key Research and Development Program of China; The Beijing Science and Technology Project; and Major Projects of National Science and Technology on New Drug Creation and Development.



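Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). 13 Aug 2016.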
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.



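Wu C, et al. Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA Intern Med. 2020.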
TL;DR: Although high fever was associated with the development of ARDS, it was also associated with better outcomes among patients with ARDS and treatment with methylprednisolone may be beneficial for patients who develop ARDS.
Abstract: Importance Coronavirus disease 2019 (COVID-19) is an emerging infectious disease that was first reported in Wuhan, China, and has subsequently spread worldwide. Risk factors for the clinical outcomes of COVID-19 pneumonia have not yet been well delineated. Objective To describe the clinical characteristics and outcomes in patients with COVID-19 pneumonia who developed acute respiratory distress syndrome (ARDS) or died. Design, Setting, and Participants Retrospective cohort study of 201 patients with confirmed COVID-19 pneumonia admitted to Wuhan Jinyintan Hospital in China between December 25, 2019, and January 26, 2020. The final date of follow-up was February 13, 2020. Exposures Confirmed COVID-19 pneumonia. Main Outcomes and Measures The development of ARDS and death. Epidemiological, demographic, clinical, laboratory, management, treatment, and outcome data were also collected and analyzed. Results Of 201 patients, the median age was 51 years (interquartile range, 43-60 years), and 128 (63.7%) patients were men. Eighty-four patients (41.8%) developed ARDS, and of those 84 patients, 44 (52.4%) died. In those who developed ARDS, compared with those who did not, more patients presented with dyspnea (50 of 84 [59.5%] patients and 30 of 117 [25.6%] patients, respectively [difference, 33.9%; 95% CI, 19.7%-48.1%]) and had comorbidities such as hypertension (23 of 84 [27.4%] patients and 16 of 117 [13.7%] patients, respectively [difference, 13.7%; 95% CI, 1.3%-26.1%]) and diabetes (16 of 84 [19.0%] patients and 6 of 117 [5.1%] patients, respectively [difference, 13.9%; 95% CI, 3.6%-24.2%]). 
In bivariate Cox regression analysis, risk factors associated with the development of ARDS and progression from ARDS to death included older age (hazard ratio [HR], 3.26; 95% CI 2.08-5.11; and HR, 6.17; 95% CI, 3.26-11.67, respectively), neutrophilia (HR, 1.14; 95% CI, 1.09-1.19; and HR, 1.08; 95% CI, 1.01-1.17, respectively), and organ and coagulation dysfunction (eg, higher lactate dehydrogenase [HR, 1.61; 95% CI, 1.44-1.79; and HR, 1.30; 95% CI, 1.11-1.52, respectively] and D-dimer [HR, 1.03; 95% CI, 1.01-1.04; and HR, 1.02; 95% CI, 1.01-1.04, respectively]). High fever (≥39 °C) was associated with higher likelihood of ARDS development (HR, 1.77; 95% CI, 1.11-2.84) and lower likelihood of death (HR, 0.41; 95% CI, 0.21-0.82). Among patients with ARDS, treatment with methylprednisolone decreased the risk of death (HR, 0.38; 95% CI, 0.20-0.72). Conclusions and Relevance Older age was associated with greater risk of development of ARDS and death likely owing to less rigorous immune response. Although high fever was associated with the development of ARDS, it was also associated with better outcomes among patients with ARDS. Moreover, treatment with methylprednisolone may be beneficial for patients who develop ARDS.



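Wynants L, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. 2020;369:m1328 (published 07 Apr 2020).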
TL;DR: Proposed models for covid-19 are poorly reported, at high risk of bias, and their reported performance is probably optimistic, according to a review of published and preprint reports.
Abstract: Objective To review and appraise the validity and usefulness of published and preprint reports of prediction models for diagnosing coronavirus disease 2019 (covid-19) in patients with suspected infection, for prognosis of patients with covid-19, and for detecting people in the general population at increased risk of covid-19 infection or being admitted to hospital with the disease. Design Living systematic review and critical appraisal by the COVID-PRECISE (Precise Risk Estimation to optimise covid-19 Care for Infected or Suspected patients in diverse sEttings) group. Data sources PubMed and Embase through Ovid, up to 1 July 2020, supplemented with arXiv, medRxiv, and bioRxiv up to 5 May 2020. Study selection Studies that developed or validated a multivariable covid-19 related prediction model. Data extraction At least two authors independently extracted data using the CHARMS (critical appraisal and data extraction for systematic reviews of prediction modelling studies) checklist; risk of bias was assessed using PROBAST (prediction model risk of bias assessment tool). Results 37 421 titles were screened, and 169 studies describing 232 prediction models were included. The review identified seven models for identifying people at risk in the general population; 118 diagnostic models for detecting covid-19 (75 were based on medical imaging, 10 to diagnose disease severity); and 107 prognostic models for predicting mortality risk, progression to severe disease, intensive care unit admission, ventilation, intubation, or length of hospital stay. The most frequent types of predictors included in the covid-19 prediction models are vital signs, age, comorbidities, and image features. Flu-like symptoms are frequently predictive in diagnostic models, while sex, C reactive protein, and lymphocyte counts are frequent prognostic factors. 
Reported C index estimates from the strongest form of validation available per model ranged from 0.71 to 0.99 in prediction models for the general population, from 0.65 to more than 0.99 in diagnostic models, and from 0.54 to 0.99 in prognostic models. All models were rated at high or unclear risk of bias, mostly because of non-representative selection of control patients, exclusion of patients who had not experienced the event of interest by the end of the study, high risk of model overfitting, and unclear reporting. Many models did not include a description of the target population (n=27, 12%) or care setting (n=75, 32%), and only 11 (5%) were externally validated by a calibration plot. The Jehi diagnostic model and the 4C mortality score were identified as promising models. Conclusion Prediction models for covid-19 are quickly entering the academic literature to support medical decision making at a time when they are urgently needed. This review indicates that almost all published prediction models are poorly reported, and at high risk of bias such that their reported predictive performance is probably optimistic. However, we have identified two (one diagnostic and one prognostic) promising models that should soon be validated in multiple cohorts, preferably through collaborative efforts and data sharing to also allow an investigation of the stability and heterogeneity in their performance across populations and settings. Details on all reviewed models are publicly available at https://www.covprecise.org/. Methodological guidance as provided in this paper should be followed because unreliable predictions could cause more harm than benefit in guiding clinical decisions. Finally, prediction model authors should adhere to the TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) reporting guideline. Systematic review registration Protocol https://osf.io/ehc47/, registration https://osf.io/wy245.
Readers’ note This article is a living systematic review that will be updated to reflect emerging evidence. Updates may occur for up to two years from the date of original publication. This version is update 3 of the original article published on 7 April 2020 (BMJ 2020;369:m1328). Previous updates can be found as data supplements (https://www.bmj.com/content/369/bmj.m1328/related#datasupp). When citing this paper please consider adding the update number and date of access for clarity.


