Predicting individual risk for COVID19 complications using EMR data
TL;DR: Two approaches are described that can effectively identify patients at high risk of complications, allowing optimization of resources, more focused follow-up, and early triage of these patients once symptoms worsen.
Abstract:
Background: The COVID-19 pandemic has challenged healthcare organizations and caused numerous deaths and hospitalizations worldwide. The need for data-based decision support tools for many aspects of controlling and treating the disease is evident, but their development has been hampered by the scarcity of reliable real-world data. Here we describe two approaches: (a) the use of an existing EMR-based model for predicting complications due to influenza, combined with available epidemiological data, to create a model that identifies individuals at high risk of developing complications due to COVID-19; and (b) a preliminary model trained on existing real-world COVID-19 data.
Methods: We used the computerized data of Maccabi Healthcare Services, a 2.3-million-member state-mandated health organization in Israel. The age- and sex-matched matrix used for training the XGBoost ILI-based model included about 690,000 rows and 900 features. The dataset available for the COVID-based model included a total of 2137 SARS-CoV-2-positive individuals who were either not hospitalized (n = 1658), hospitalized and marked as mild (n = 332), or marked as having moderate (n = 83) or severe (n = 64) complications.
Findings: On the 2137 COVID-19 patients, with moderate and severe complications as cases and all others as controls, the AUC was 0.852 [0.824–0.879] for the ILI-based model and 0.872 [0.847–0.879] for the COVID-19-based model.
Interpretation: These models can effectively identify patients at high risk of complications, allowing optimization of resources, more focused follow-up, and early triage of these patients once symptoms worsen.
Funding: There was no funding for this study.
Research in context
Evidence before this study: We searched PubMed for coronavirus[MeSH Major Topic] AND the following MeSH terms: risk score, predictive analytics, algorithm. Only a few studies were found on predictive analytics for developing COVID-19 complications using real-world data. Many of the relevant works were based on self-reported information and are therefore difficult to implement at large scale and without patient or physician participation.
Added value of this study: We have described two models for assessing the risk of COVID-19 complications and mortality, based on EMR data. One model was derived by combining a machine-learning model for influenza complications with epidemiological data on age- and sex-dependent mortality rates due to COVID-19. The other was derived directly from initial COVID-19 complications data.
Implications of all the available evidence: The developed models may effectively identify patients at high risk of developing COVID-19 complications. Implementing such models in operational data systems may support COVID-19 care workflows and assist in triaging patients.
Summary (2 min read)
- Since January 2020, the COVID-19 pandemic has become a global emergency.
- Healthcare organizations and governments worldwide are strained by resource shortages and by the need to make timely decisions based on very little reliable data.
- These decisions include whom to test, how to treat positive cases, how to manage social distancing, how to reach out to populations at risk, how to trace contacts, and more.
- Many of these decisions could benefit from decision support tools based on EMR and additional data sources, such as geospatial information.
- Unfortunately, accurate data-driven tools are still difficult to develop due to the limited availability of COVID19 patients’ data with historical EMR records.
- The models were trained using data from Maccabi Healthcare Services (MHS), a large Israeli HMO with a central EMR database containing longitudinal data for 2 million active individuals each year between 2010 and 2018.
- The data included full EMR information: demographics (e.g. age and sex), behavioral information (smoking status), vital signs, lab test results, diagnoses and procedures (coded using the International Classification of Diseases, 9th revision), medication prescriptions and purchases, and hospital admissions (dates and departments only).
- Since the number of MHS members who are positive for SARS-CoV-2 is relatively low, and the available data is biased by the current limitations of testing and the challenges of data collection and curation, the authors chose to test two complementary approaches.
- First, the authors used a proxy model that they had derived for identifying patients at high risk of developing complications due to influenza, and applied some required adjustments.
- It is already apparent that both diseases have common risk factors for developing complications.
- Because the two diseases differ in how age and sex affect risk, the authors modified the ILI-based model to ignore age and sex as risk factors, and then used a Bayesian correction to add these risk factors back from external priors.
- For training the COVID-19-based model, the authors used information on SARS-CoV-2-positive individuals aged 19 or above within the MHS population, as well as information regarding hospitalization and in-hospital complications.
- For training the ILI-based model, the training set comprised all MHS members as of September 1st of every calendar year who were not vaccinated during the following flu season.
- The authors marked them as cases if they were diagnosed with ILI followed by complications within 3 months, and as controls otherwise.
- Bins were matched for age (5-year groups) and sex.
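- To combine the prediction of the calibrated model with the age and sex priors for complications, the authors used the following formula: $P_{\mathrm{Combined}} = \frac{P_{\mathrm{Model}} P_{\mathrm{Prior}}}{P_{\mathrm{Model}} P_{\mathrm{Prior}} + \mathrm{Odds} \times (1 - P_{\mathrm{Model}})(1 - P_{\mathrm{Prior}})}$.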
- The authors used the definitions of the Israel Ministry of Health for COVID-19 complications: moderate (defined as pneumonia with one of the following: respiratory rate above 30 breaths per minute, respiratory distress, or oxygen saturation below 90%) or severe (pneumonia accompanied by sepsis, shock, ARDS or death).
- The authors then created a vector of features for each individual, including risk factors and underlying conditions.
- The authors used XGBoost on the features matrix to learn a COVID-19 complications predictor based on these features.
- Given that real world data on COVID-19 are currently limited, it is difficult to evaluate the performance of their models.
- The authors report here several methods they have used to estimate the value of the models.
- The authors examined the excess risk of underlying health conditions, compared to information from the CDC [https://www.cdc.gov/mmwr/volumes/69/wr/mm6913e2.htm#F1_down].
- The authors also evaluated the performance of the model on initial COVID-19 complications records.
- Lift was evaluated by calculating the average prediction over the population with the underlying conditions, and comparing to the average prediction over a reference population.
- The age and sex matched matrix used for training the XGBoost model included, after feature selection, about 690,000 rows and 900 features (compared to about 790,000 rows and 1584 features for the non-matched model).
- The available dataset included a total of 2137 SARS-CoV-2-positive individuals who were either not hospitalized (n=1658), hospitalized and marked as mild (n=332), or marked as having moderate (n=83) or severe (n=64) complications.
- Individuals who were hospitalized but not assigned a severity level were excluded.
- All individuals were linked to their MHS medical record in order to generate the features matrix.
- The AUC of the full (non-matched) model for predicting influenza complications was 0.744; the AUC of the matched model was 0.726.
- Given the small size of the COVID-19 dataset, it is reasonable to suspect that the COVID-19-based model is too specific to MHS and less generalizable than the ILI-based model.
- The medical staff are the first to contact confirmed COVID-19 patients, question them and decide on the appropriate treatment facility based on their symptoms and overall medical assessment.
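The combination formula and the lift calculation described in the summary can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names `combine_with_prior` and `lift` are hypothetical. The summary does not define the Odds term, so it is passed in as a caller-supplied constant. The summary also says only that the two average predictions were "compared", so expressing lift as their ratio is an assumption.

```python
from statistics import fmean
from typing import Sequence


def combine_with_prior(p_model: float, p_prior: float, odds: float) -> float:
    """Combine a calibrated model probability with an age/sex prior.

    Computes Pm*Pp / (Pm*Pp + odds * (1 - Pm) * (1 - Pp)).
    `odds` stands for the Odds term of the formula; its definition is not
    given in the summary, so it is an input here (assumption).
    """
    for name, p in (("p_model", p_model), ("p_prior", p_prior)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    if odds <= 0:
        raise ValueError("odds must be positive")
    numerator = p_model * p_prior
    denominator = numerator + odds * (1.0 - p_model) * (1.0 - p_prior)
    if denominator == 0.0:
        # Happens only when one probability is 0 and the other is 1.
        raise ValueError("model and prior are contradictory certainties")
    return numerator / denominator


def lift(
    predictions: Sequence[float],
    in_group: Sequence[bool],
    in_reference: Sequence[bool],
) -> float:
    """Average prediction in a subgroup relative to a reference population.

    Returned as a ratio of the two means (assumption: the summary does not
    say whether a ratio or a difference was used).
    """
    group = [p for p, flag in zip(predictions, in_group) if flag]
    reference = [p for p, flag in zip(predictions, in_reference) if flag]
    return fmean(group) / fmean(reference)
```

With `odds=1.0` and `p_model=0.5`, the model carries no information and `combine_with_prior` returns the prior unchanged, which is a quick sanity check on the formula. For `lift`, the flags for the subgroup with an underlying condition and for the reference population are passed as boolean sequences aligned with the predictions.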
Frequently Asked Questions (1)
Q1. What have the authors contributed in "Predicting individual risk for COVID19 complications using EMR data"?
Here the authors describe two approaches: (a) the use of an existing EMR-based model for predicting complications due to influenza, combined with available epidemiological data, to create a model that identifies individuals at high risk of developing complications due to COVID-19; and (b) a preliminary model trained on existing real-world COVID-19 data. There was no funding for this study. The work is a medRxiv preprint that was not certified by peer review.