The Digital Analytic Patient Reviewer (DAPR) for COVID-19 Data Mart Validation

- 01 Jun 2021 -

Heekyong Park, PhD; Taowei David Wang, PhD (1,2,3); Nich Wattanasin, MS; Victor M. Castro; Vivian Gainer, MS; Sergey Goryachev, MS; Shawn Murphy, MD, PhD (1,2,3)

(1) Mass General Brigham, Somerville, MA, USA; (2) Massachusetts General Hospital, Boston, MA, USA; (3) Harvard Medical School, Boston, MA, USA

Abstract

Objective: To provide high-quality data for COVID-19 research, we validated COVID-19 clinical indicators and 22 associated computed phenotypes, which were derived by machine learning algorithms, in the Mass General Brigham (MGB) COVID-19 Data Mart.

Materials and Methods: Fifteen reviewers performed a manual chart review for 150 COVID-19 positive patients in the data mart. To support rapid chart review for a wide range of target data, we offered the Digital Analytic Patient Reviewer (DAPR). DAPR is a web-based chart review tool that integrates patient notes and provides note search functionalities and a patient-specific summary view linked with relevant notes. Within DAPR, we developed a COVID-19 validation task-oriented view and information extraction logic, enabled fast access to data, and considered privacy and security issues.

Results: The concepts for COVID-19 positive cohort, COVID-19 index date, COVID-19 related admission, and the admission date were shown to have high values in all evaluation metrics. For phenotypes, the overall specificities, PPVs, and NPVs were high. However, sensitivities were relatively low. Based on these results, we removed 3 phenotypes from our data mart. In the survey about using the tool, participants expressed positive attitudes towards using DAPR for chart review. They reported that the validation was easy and that DAPR helped them find relevant information. Some validation difficulties were also discussed.

Discussion and Conclusion: DAPR’s patient summary view accelerated the validation process. We are in the process of automating the workflow to use DAPR for chart reviews. Moreover, we will extend its use case to other domains.

medRxiv preprint doi: https://doi.org/10.1101/2021.05.30.21257945; this version posted June 1, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Introduction

Background

When the COVID-19 pandemic arrived in the US [1], there was a growing demand for COVID-19-related data in the research community. Providing accurate and fluent data in a timely manner is essential to conquering this unprecedented disease. Mass General Brigham (MGB) Research Information Science and Computing (RISC) quickly created data tools, including the COVID-19 Data Mart and the COVID-19 Summary Table [2], and made this information available to research groups across the MGB system [3-12]. The COVID-19 Data Mart contains COVID-19-tested patients and their associated data, both structured and unstructured. It provides direct access to data tables as well as one-stop analysis options without having to pull data out of the Mart. The COVID-19 Summary Table holds COVID-19 positive patient data in discrete data columns. It is designed for quick identification and analysis of the COVID-19 positive patient cohort. By the time we performed this study in July 2020, the COVID-19 Data Mart had reached over 88,000 patients and the COVID-19 Summary Table had accumulated over 17,000 patients.

However, the advent of the new disease brought many challenges in providing high-quality data. In the beginning, we did not have a diagnosis code for COVID-19, and there were many false negatives in COVID-19 test results. Even after the ICD-10 [13], LOINC [14], and CPT [15] codes for COVID-19 were released, we could not rely solely on the coded data to identify COVID-19 positive patients. First, most of the codes are recorded for billing purposes at the end of a hospitalization or after the patient is discharged. If a patient’s data is integrated into a data mart while the patient is still in the hospital, code information is not yet available. Second, COVID-19 information can be miscoded due to the time gap between a treatment and a COVID-19 test result. For example, some patients were initially coded as COVID-19 patients but later turned out to be negative. Lastly, transferred patients often do not have a COVID-19 test result in our electronic health record (EHR) system. Instead, the information is only available in narrative reports, making it harder to categorize them. Therefore, various new algorithms were developed and applied to infer key information.

Associating COVID-19 data with clinically relevant information was also challenging. Since we did not fully understand COVID-19, it was hard to decide, for example, which conditions were comorbidities and what information would be helpful. Moreover, the influx of new patients created exceptional situations. We did not have data in our system for COVID-19 patients who were transferred in. A large portion of them were healthy prior to admission, so they had no rich data to mine. The large volume of missing data raises concerns about the reliability of our phenotyping algorithms [16-28]. In addition, during the surge, many seriously ill patients were not coded as having an ICU visit (i.e., a major severity indicator) due to the bed shortage. Therefore, validating the COVID-19 data sets became an urgent goal.

Problems

Unlike other validations, COVID-19 data validation needed to be completed in a short time, targeted broad disease domains, and was expected to require more note reviews. Our previous validation efforts [29-34] typically focused on a single target disease and involved a few experts on that disease to establish a gold standard by reviewing charts. However, the unprecedented urgency of the pandemic and the novelty of the disease meant that we needed to rely on volunteers with diverse clinical backgrounds and different chart review skills. The diversity in clinical background meant that some validation goals were more difficult for some reviewers and easier for others, depending on their clinical expertise. In addition, COVID-19 patients often lack reliably coded data, as many of them are new to our system, so our reviewers had to be even more reliant on text notes that describe patient history in natural language.

Objective

Our aim was to validate data in the COVID-19 Data Mart to provide a high-quality data resource to the research community at Mass General Brigham. In the first validation phase, we validated COVID-19 information and 22 phenotypes of COVID-19 positive patients. The target data were derived facts computed by rule-based or machine learning algorithms. The task was to manually review patient history to verify the derived values. To support the above objectives, we built the Digital Analytic Patient Reviewer (DAPR) chart review tool. In this paper, we describe how we transformed DAPR to serve the COVID-19 Data Mart validation work, how we streamlined the validation process to utilize DAPR, the validation work itself, and the results.
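As an illustration of how derived values can be compared against chart-review labels, the following minimal sketch computes the four metrics named in the abstract (sensitivity, specificity, PPV, and NPV) for a single binary phenotype. It is not the authors' implementation: the function name `validation_metrics`, the `Metrics` container, and the example counts are assumptions made for this sketch, and the chart-review label is simply treated as the gold standard.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Metrics:
    """Validation metrics for one binary phenotype (None when undefined)."""
    sensitivity: Optional[float]
    specificity: Optional[float]
    ppv: Optional[float]
    npv: Optional[float]


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # A metric is undefined when its denominator is zero.
    return numerator / denominator if denominator else None


def validation_metrics(pairs: Iterable[Tuple[bool, bool]]) -> Metrics:
    """Compare (derived_value, chart_review_value) pairs for one phenotype.

    The chart-review value is treated as the gold standard (an assumption
    of this sketch).
    """
    tp = fp = tn = fn = 0
    for derived, reviewed in pairs:
        if derived and reviewed:
            tp += 1          # algorithm and reviewer both say present
        elif derived and not reviewed:
            fp += 1          # algorithm says present, reviewer disagrees
        elif not derived and reviewed:
            fn += 1          # algorithm missed a reviewer-confirmed case
        else:
            tn += 1          # both say absent
    return Metrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Made-up counts for illustration only; these are not results from the study.
example_pairs = (
    [(True, True)] * 8
    + [(True, False)] * 2
    + [(False, True)] * 4
    + [(False, False)] * 36
)
example_metrics = validation_metrics(example_pairs)
```

With counts like these, a phenotype can show high specificity, PPV, and NPV while sensitivity stays lower, because sensitivity is the only one of the four metrics whose denominator consists solely of reviewer-confirmed cases.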

Materials and methods

Data

We used the COVID-19 Summary Table to validate the MGB COVID-19 Data Mart. The COVID-19 Summary Table originates from the MGB COVID-19 Data Mart. It includes COVID-19 positive patient data, with one row for every patient. The data types in the columns include patient demographics, EPIC Infection flags, COVID-19 PCR and antibody laboratory tests, inpatient admission information, and phenotype data derived by various algorithms. We selected 150 patients to validate the MGB COVID-19 Data Mart. The patients were randomly chosen from among the summary table patients who had at least one target phenotype in their history.
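The selection step described above can be sketched as a filter followed by a random draw. The paper does not describe how the sampling was implemented, so everything here is an assumption for illustration: the function name `sample_validation_cohort`, the representation of summary table rows as dictionaries, and the column names `patient_id`, `pheno_a`, and `pheno_b`.

```python
import random
from typing import Dict, List, Sequence


def sample_validation_cohort(
    rows: Sequence[Dict[str, object]],
    phenotype_columns: Sequence[str],
    n: int = 150,
    seed: int = 0,
) -> List[Dict[str, object]]:
    """Randomly pick n patients who have at least one target phenotype.

    rows: one dictionary per patient (hypothetical stand-in for summary
          table rows).
    phenotype_columns: names of the phenotype flag columns (hypothetical).
    """
    # Keep only patients with at least one truthy phenotype flag.
    eligible = [
        row for row in rows
        if any(row.get(column) for column in phenotype_columns)
    ]
    if len(eligible) < n:
        raise ValueError(
            f"only {len(eligible)} eligible patients, cannot sample {n}"
        )
    # A seeded generator makes the draw reproducible; sample() draws
    # without replacement, so no patient is selected twice.
    return random.Random(seed).sample(eligible, n)


# Synthetic rows for illustration only.
synthetic_rows = [
    {"patient_id": i, "pheno_a": i % 3 == 0, "pheno_b": i % 5 == 0}
    for i in range(1000)
]
cohort = sample_validation_cohort(
    synthetic_rows, ["pheno_a", "pheno_b"], n=150, seed=42
)
```

Fixing the seed is a design choice of this sketch: it lets the same cohort be regenerated later, for example when assigning patients to reviewers.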

We asked the validators to validate the COVID-19 patient cohort indicator (Positive), index date of COVID-19 positive status, admission associated with COVID-19 (Y/N), COVID-19

Figures

Table 2. Validation results (a) COVID-19 positive cohort and COVID-19 admission validation result

Figure 2. Quantitative survey results (N=10) This figure illustrates quantitative survey results only. Qualitative results are summarized in the manuscript.

Table 1. Target data for the COVID-19 Data Mart validation

References

Katherine P. Liao et al. Electronic Medical Records for Discovery Research in Rheumatoid Arthritis. PubMed Central, 01 Mar 2010.

Robert J. Carroll et al. Portability of an algorithm to identify rheumatoid arthritis in electronic health records. Journal of the American Medical Informat..., 01 Jun 2012.

COVID-19 and coagulation: bleeding and thrombotic manifestations of SARS-CoV-2 infection.

Development of phenotype algorithms using electronic medical records and incorporating natural language processing

Using electronic medical records to enable large-scale studies in psychiatry: treatment resistant depression as a model
