The Digital Analytic Patient Reviewer (DAPR) for COVID-19 Data Mart Validation
Heekyong Park, PhD (1), Taowei David Wang, PhD (1,2,3), Nich Wattanasin, MS (1), Victor M. Castro, MS (1), Vivian Gainer, MS (1), Sergey Goryachev, MS (1), Shawn Murphy, MD, PhD (1,2,3)
(1) Mass General Brigham, Somerville, MA, USA; (2) Massachusetts General Hospital, Boston, MA, USA; (3) Harvard Medical School, Boston, MA, USA
Abstract
Objective: To provide high-quality data for COVID-19 research, we validated COVID-19
clinical indicators and 22 associated computed phenotypes, which were derived by machine
learning algorithms, in the Mass General Brigham (MGB) COVID-19 Data Mart.
Materials and Methods: Fifteen reviewers performed a manual chart review for 150 COVID-19
positive patients in the data mart. To support rapid chart review for a wide range of target data,
we offered the Digital Analytic Patient Reviewer (DAPR). DAPR is a web-based chart review
tool that integrates patient notes and provides note search functionalities and a patient-specific
summary view linked with relevant notes. Within DAPR, we developed a COVID-19 validation
task-oriented view and information extraction logic, enabled fast access to data, and considered
privacy and security issues.
Results: The concepts for COVID-19 positive cohort, COVID-19 index date, COVID-19 related
admission, and the admission date were shown to have high values in all evaluation metrics. For
phenotypes, the overall specificities, PPVs, and NPVs were high. However, sensitivities were
relatively low. Based on these results, we removed 3 phenotypes from our data mart. In the
survey about using the tool, participants expressed positive attitudes towards using DAPR for
chart review. They assessed that the validation was easy and that DAPR helped find relevant
information. Some validation difficulties were also discussed.
Discussion and Conclusion: DAPR’s patient summary view accelerated the validation process.
We are in the process of automating the workflow to use DAPR for chart reviews. Moreover, we
will extend its use case to other domains.

medRxiv preprint doi: https://doi.org/10.1101/2021.05.30.21257945; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Introduction
Background
When the COVID-19 pandemic arrived in the US [1], there was a growing demand for
COVID-19-related data in the research community. Providing accurate and fluent data in a timely
manner is essential to conquering this unprecedented disease. Mass General Brigham (MGB) Research
Information Science and Computing (RISC) quickly created data tools, including the COVID-19
Data Mart and the COVID-19 Summary Table [2], and made this information available to
research groups across the MGB system [3-12]. The COVID-19 Data Mart contains
COVID-19-tested patients and their associated data, both structured and unstructured. It provides direct
access to data tables as well as one-stop analysis options without having to pull data out of the
Mart. The COVID-19 Summary Table holds COVID-19 positive patient data in discrete data
columns. It is designed for quick identification and analysis of the COVID-19 positive patient
cohort. By the time we performed this study in July 2020, the COVID-19 Data Mart had reached
over 88,000 patients and the COVID-19 Summary Table had accumulated over 17,000 patients.
However, the advent of the new disease brought many challenges in providing high-quality data.
In the beginning, we did not have a diagnosis code for COVID-19, and there were a lot of false
negatives in COVID-19 test results. Even after the ICD-10 [13], LOINC [14], and CPT [15]
codes for COVID-19 were released, we could not solely rely on the coded data to identify
COVID-19 positive patients. First, most of the codes are recorded for billing purposes at the end
of a hospitalization or after the patient is discharged. If a patient’s data is integrated into a data
mart while the patient is still in hospital, code information is not yet available. Second,
COVID-19 information can be miscoded due to the time gap between a treatment and a COVID-19 test
result. For example, some patients were initially coded as COVID-19 patients but turned out to be
negative later. Lastly, transferred patients often do not have a COVID-19 test result in our
electronic health record (EHR) system. Instead, the information is only available in narrative
reports, making it harder to categorize them. Therefore, various new algorithms were developed
and applied to infer key information.
Associating COVID-19 data with clinically relevant information was also challenging. Since we
did not fully understand COVID-19, it was hard to decide, for example, what the
comorbidities were and what information would be helpful. Moreover, the influx of new patients
created exceptional situations. We did not have data in our system if COVID-19 patients were
transferred in. A large portion of them were healthy prior to admission, so they had no rich data to
mine. The large volume of missing data raised concerns about the reliability of our phenotyping
algorithms [16-28]. In addition, during the surge, many seriously ill patients did not get coded as
having an ICU visit (i.e., a major severity indicator) due to the bed shortage. Therefore,
validating the COVID-19 data sets became an urgent goal.
Problems
Unlike other validations, COVID-19 data validation needed to be completed in a short time,
targeted broad disease domains, and was expected to require more note reviews. Our previous
validation efforts [29-34] typically focused on a single target disease and involved a few experts
on that disease to establish a gold standard by reviewing charts. However, the unprecedented
urgency of the pandemic and the novelty of the disease meant that we needed to rely on
volunteers with diverse clinical backgrounds and different chart review skills. The diversity in
clinical background meant that some validation goals were more difficult for some reviewers and
easier for others, depending on their clinical expertise. In addition, COVID-19 patients often lack
reliably coded data, as many of them are new to our system, so our reviewers had to be even
more reliant on text notes that describe patient history in natural language.
Objective
Our aim was to validate data in the COVID-19 Data Mart to provide a high-quality data resource
to the research community at Mass General Brigham. In the first validation phase, we validated
COVID-19 information and 22 phenotypes of COVID-19 positive patients. The target data were
derived facts computed by rule-based or machine learning algorithms. The task was reviewing
patient history manually to verify the derived values. To support the above objectives, we built
the Digital Analytic Patient Reviewer (DAPR) chart review tool. In this paper, we describe how
we transformed DAPR to serve the COVID-19 Data Mart validation work, how we streamlined
the validation process to utilize DAPR, the validation work itself, and the results.
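
As an illustration of what verifying a derived value against chart review amounts to, the following is a minimal sketch that computes sensitivity, specificity, PPV, and NPV for one binary phenotype. It assumes the derived values and the reviewers' gold-standard labels are available as parallel lists of booleans; the function and class names are hypothetical and are not part of DAPR or the data mart.

```python
from dataclasses import dataclass


@dataclass
class ValidationMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def evaluate_phenotype(derived, reviewed):
    """Compare algorithm-derived flags with chart-review labels (hypothetical helper).

    derived  -- list of bool, the value computed by the phenotyping algorithm
    reviewed -- list of bool, the gold standard from manual chart review
    """
    if len(derived) != len(reviewed):
        raise ValueError("derived and reviewed must have the same length")

    tp = sum(1 for d, r in zip(derived, reviewed) if d and r)
    fp = sum(1 for d, r in zip(derived, reviewed) if d and not r)
    fn = sum(1 for d, r in zip(derived, reviewed) if not d and r)
    tn = sum(1 for d, r in zip(derived, reviewed) if not d and not r)

    return ValidationMetrics(
        sensitivity=_ratio(tp, tp + fn),  # true positives among reviewed positives
        specificity=_ratio(tn, tn + fp),  # true negatives among reviewed negatives
        ppv=_ratio(tp, tp + fp),          # reviewed positives among derived positives
        npv=_ratio(tn, tn + fn),          # reviewed negatives among derived negatives
    )


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    derived = [True, False, False, False, True]
    reviewed = [True, True, True, False, False]
    print(evaluate_phenotype(derived, reviewed))
```

A phenotype that the algorithm misses in many charts shows up here as a low sensitivity even when specificity, PPV, and NPV remain high.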
Materials and methods
Data
We used the COVID-19 Summary Table to validate the MGB COVID-19 Data Mart. The
COVID-19 Summary Table originates from the MGB COVID-19 Data Mart. It includes
COVID-19 positive patient data, one row for every patient. The data types in the columns
include patient demographics, EPIC Infection flags, COVID-19 PCR and antibody laboratory
tests, inpatient admission information and phenotype data derived by various algorithms. We
selected 150 patients to validate the MGB COVID-19 Data Mart. The patients were randomly
chosen from the Summary Table patients who had at least one target phenotype in their history.
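
The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the Summary Table is represented as a list of dictionaries with one row per patient, and the column names (`patient_id`, `pheno_a`, `pheno_b`) are hypothetical placeholders, not the actual schema. The fixed seed is also an assumption, added so that the draw is reproducible.

```python
import random


def sample_validation_cohort(summary_rows, phenotype_columns, n=150, seed=0):
    """Randomly pick n patients who have at least one target phenotype.

    summary_rows      -- list of dicts, one row per COVID-19 positive patient
    phenotype_columns -- names of the phenotype flag columns (hypothetical)
    """
    # Keep only patients with at least one target phenotype flagged.
    eligible = [
        row for row in summary_rows
        if any(row.get(col) for col in phenotype_columns)
    ]
    if len(eligible) < n:
        raise ValueError(f"only {len(eligible)} eligible patients, need {n}")

    # random.Random(seed).sample draws without replacement, so no patient repeats.
    rng = random.Random(seed)
    return rng.sample(eligible, n)


if __name__ == "__main__":
    # Synthetic rows for illustration only.
    rows = [
        {"patient_id": i, "pheno_a": i % 2 == 0, "pheno_b": i % 5 == 0}
        for i in range(1000)
    ]
    cohort = sample_validation_cohort(rows, ["pheno_a", "pheno_b"])
    print(len(cohort))
```

Filtering before sampling keeps every reviewed chart relevant to at least one phenotype, at the cost of a cohort that is not representative of all COVID-19 positive patients.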
We asked the validators to validate the COVID-19 patient cohort indicator (Positive), index date
of COVID-19 positive status, admission associated with COVID-19 (Y/N), COVID-19

References

COVID-19 and coagulation: bleeding and thrombotic manifestations of SARS-CoV-2 infection.
Electronic Medical Records for Discovery Research in Rheumatoid Arthritis
Using electronic medical records to enable large-scale studies in psychiatry: treatment resistant depression as a model