Author

Sebastian Menke

Bio: Sebastian Menke is an academic researcher. The author has co-authored 1 publication.

Papers
Journal Article
TL;DR: In this article, the authors proposed a methodology for the evaluation of clinical NLP (cNLP) systems to assist NLP experts in carrying out this task, and they applied all phases of the methodology to evaluate the performance of a cNLP system called EHRead Technology.
Abstract: Background: Clinical natural language processing (cNLP) systems are of crucial importance due to their increasing capability in extracting clinically important information from free text contained in electronic health records (EHRs). The conversion of a nonstructured representation of a patient’s clinical history into a structured format enables medical doctors to generate clinical knowledge at a level that was not possible before. Finally, the interpretation of the insights provided by cNLP systems has great potential to drive decisions about clinical practice. However, carrying out robust evaluations of those cNLP systems is a complex task that is hindered by a lack of standard guidance on how to systematically approach them. Objective: Our objective was to offer natural language processing (NLP) experts a methodology for the evaluation of cNLP systems to assist them in carrying out this task. By following the proposed phases, the robustness and representativeness of the performance metrics of their own cNLP systems can be assured. Methods: The proposed evaluation methodology comprised five phases: (1) the definition of the target population, (2) the statistical document collection, (3) the design of the annotation guidelines and annotation project, (4) the external annotations, and (5) the cNLP system performance evaluation. We presented the application of all phases to evaluate the performance of a cNLP system called “EHRead Technology” (developed by Savana, an international medical company), applied in a study on patients with asthma. As part of the evaluation methodology, we introduced the Sample Size Calculator for Evaluations (SLiCE), a software tool that calculates the number of documents needed to achieve a statistically useful and resourceful gold standard.
Results: The application of the proposed evaluation methodology on a real use-case study of patients with asthma revealed the benefit of the different phases for cNLP system evaluations. By using SLiCE to adjust the number of documents needed, a meaningful and resourceful gold standard was created. In the presented use-case, using as few as 519 EHRs, it was possible to evaluate the performance of the cNLP system and obtain performance metrics for the primary variable within the expected CIs. Conclusions: We showed that our evaluation methodology can offer guidance to NLP experts on how to approach the evaluation of their cNLP systems. By following the five phases, NLP experts can assure the robustness of their evaluation and avoid unnecessary investment of human and financial resources. Besides the theoretical guidance, we offer SLiCE as an easy-to-use, open-source Python library.

21 citations
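
The abstract above describes sizing a gold standard so that performance metrics fall within expected confidence intervals. As a rough illustration of that idea only, the sketch below uses the textbook normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / d². This is an assumption for illustration, not SLiCE's actual method or API; the function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(expected: float,
                               half_width: float,
                               confidence: float = 0.95) -> int:
    """Instances needed to estimate a proportion (e.g. precision or recall)
    to within +/- half_width at the given confidence level.

    Textbook normal-approximation formula; NOT taken from SLiCE.
    """
    if not 0.0 < expected < 1.0:
        raise ValueError("expected must lie strictly between 0 and 1")
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil(z * z * expected * (1.0 - expected) / half_width ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: an anticipated metric of 0.90, +/- 0.05 at 95%.
    print(sample_size_for_proportion(expected=0.90, half_width=0.05))
```

Note that this counts the instances over which the metric is computed (for recall, the true positive mentions), not documents. Translating it into a number of EHRs would also require the prevalence of the variable in the target population, which is one reason a dedicated tool is useful.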


Cited by
Journal Article
10 Feb 2022-PLOS ONE
TL;DR: The authors' results showed high rates of MACE in a large real-world series of PCI-revascularized patients with T2D and CAD with no history of MI or stroke, which represent a potential opportunity to improve the clinical management of these patients.
Abstract: Introduction and objectives Patients with type 2 diabetes (T2D) and stable coronary artery disease (CAD) previously revascularized with percutaneous coronary intervention (PCI) are at high risk of recurrent ischemic events. We aimed to provide real-world insights into the clinical characteristics and management of this clinical population, excluding patients with a history of myocardial infarction (MI) or stroke, using Natural Language Processing (NLP) technology. Methods This is a multicenter, retrospective study based on the secondary use of 2014–2018 real-world data captured in the Electronic Health Records (EHRs) of 1,579 patients (0.72% of the T2D population analyzed; n = 217,632 patients) from 12 representative hospitals in Spain. To access the unstructured clinical information in EHRs, we used the EHRead® technology, based on NLP and machine learning. Major adverse cardiovascular events (MACE) were considered: MI, ischemic stroke, urgent coronary revascularization, and hospitalization due to unstable angina. The association between MACE rates and the variables included in this study was evaluated following univariate and multivariate approaches. Results Most patients were male (72.13%), with a mean age of 70.5±10 years. Regarding T2D, most patients had non-insulin-dependent T2D (61.75%), with a high prevalence of comorbidities. The median (Q1-Q3) duration of follow-up was 1.2 (0.3–4.5) years. Overall, 35.66% of patients suffered from at least one MACE during follow-up. Using a Cox Proportional Hazards regression model analysis, several independent factors were associated with MACE during follow-up: CAD duration (p < 0.001), COPD/Asthma (p = 0.021), heart valve disease (p = 0.031), multivessel disease (p = 0.005), insulin treatment (p < 0.001), statin treatment (p < 0.001), and clopidogrel treatment (p = 0.039).
Conclusions Our results showed high rates of MACE in a large real-world series of PCI-revascularized patients with T2D and CAD with no history of MI or stroke. These data represent a potential opportunity to improve the clinical management of these patients.

6 citations

Journal Article
TL;DR: This study is the first to use a cNLP system for the identification of CD in EHRs written in Spanish and demonstrates the ability of the EHRead technology to identify patients with CD and their related variables from the free text of EHRs.
Abstract: Background The exploration of clinically relevant information in the free text of electronic health records (EHRs) holds the potential to positively impact clinical practice as well as knowledge regarding Crohn disease (CD), an inflammatory bowel disease that may affect any segment of the gastrointestinal tract. The EHRead technology, a clinical natural language processing (cNLP) system, was designed to detect and extract clinical information from narratives in the clinical notes contained in EHRs. Objective The aim of this study is to validate the performance of the EHRead technology in identifying information of patients with CD. Methods We used the EHRead technology to explore and extract CD-related clinical information from EHRs. To validate this tool, we compared the output of the EHRead technology with a manually curated gold standard to assess the quality of our cNLP system in detecting records containing any reference to CD and its related variables. Results The validation metrics for the main variable (CD) were a precision of 0.88, a recall of 0.98, and an F1 score of 0.93. Regarding the secondary variables, we obtained a precision of 0.91, a recall of 0.71, and an F1 score of 0.80 for CD flare, while for the variable vedolizumab (treatment), a precision, recall, and F1 score of 0.86, 0.94, and 0.90 were obtained, respectively. Conclusions This evaluation demonstrates the ability of the EHRead technology to identify patients with CD and their related variables from the free text of EHRs. To the best of our knowledge, this study is the first to use a cNLP system for the identification of CD in EHRs written in Spanish.

6 citations
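
The F1 scores reported in the abstract above follow from the stated precision and recall values, since F1 is their harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them; the function and variable names are illustrative, and because the published precision and recall are already rounded to two decimals, the recomputation is only expected to agree at that precision.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
reported = {
    "CD": (0.88, 0.98, 0.93),
    "CD flare": (0.91, 0.71, 0.80),
    "vedolizumab": (0.86, 0.94, 0.90),
}

if __name__ == "__main__":
    for variable, (precision, recall, reported_f1) in reported.items():
        computed = f1_score(precision, recall)
        print(f"{variable}: computed F1 = {computed:.4f}, reported = {reported_f1:.2f}")
```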

Journal Article
TL;DR: In this paper, the authors evaluated the frequency of different types of cancer in patients diagnosed with hypothyroidism using big data methodology on the Savana Manager platform and found that patients with this diagnosis had a significantly higher frequency of cancer than non-hypothyroid subjects (OR 2.09, 95% confidence interval [CI] 2.92-1.02).

5 citations

Journal Article
TL;DR: In this paper, a retrospective study was carried out using data from the electronic medical record (EMR) of the Hospital Universitario Puerta de Hierro Majadahonda (Madrid, Spain).

4 citations

Journal Article
TL;DR: This analysis shows that, in a real-life setting, ECOPD hospitalisations are prevalent, complex, repetitive and associated with significant in-hospital mortality.
Abstract: Background Patients with chronic obstructive pulmonary disease (COPD) often suffer episodes of exacerbation of symptoms (ECOPD) that may eventually require hospitalisation due to several, often overlapping, causes. We aimed to analyse the characteristics of patients hospitalised because of ECOPD in a real-life setting using a “big data” approach. Methods The study population included all patients over 40 years old with a diagnosis of COPD (n=69 359; prevalence 3.72%) registered from 1 January 2011 to 1 March 2020 in the database of the public healthcare service (SESCAM) of Castilla-La Mancha (Spain) (n=1 863 759 subjects). We used natural language processing (Savana Manager version 3.0) to identify those who were hospitalised during this period for any cause, including ECOPD. Results During the study 26 453 COPD patients (38.1%) were hospitalised (at least once). Main diagnoses at discharge were respiratory infection (51%), heart failure (38%) or pneumonia (19%). ECOPD was the main diagnosis at discharge (or hospital death) in 8331 patients (12.0% of the entire COPD population and 31.5% of those hospitalised). In-hospital ECOPD-related mortality rate was 3.11%. These patients were hospitalised 2.36 times per patient, with a mean hospital stay of 6.1 days. Heart failure was the most frequent comorbidity in patients hospitalised because of ECOPD (52.6%). Conclusions This analysis shows that, in a real-life setting, ECOPD hospitalisations are prevalent, complex (particularly in relation to heart failure), repetitive and associated with significant in-hospital mortality. In a real-life setting, COPD hospitalisations are prevalent, complex (particularly in relation to heart failure), repetitive and associated with significant in-hospital mortality https://bit.ly/3zCP2ZC

2 citations
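
The percentages in the abstract above can be reproduced from the reported counts, which also makes explicit which denominator each figure uses (whole population, all COPD patients, or hospitalised COPD patients). The sketch below is a simple arithmetic check; the variable names are illustrative and the counts are taken directly from the abstract.

```python
def pct(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts as reported in the abstract.
population = 1_863_759          # subjects in the SESCAM database
copd_patients = 69_359          # patients over 40 with a COPD diagnosis
hospitalised = 26_453           # COPD patients hospitalised at least once
ecopd_main_diagnosis = 8_331    # ECOPD as main diagnosis at discharge

if __name__ == "__main__":
    print(f"COPD prevalence: {pct(copd_patients, population):.2f}%")
    print(f"Hospitalised, of COPD patients: {pct(hospitalised, copd_patients):.1f}%")
    print(f"ECOPD, of COPD patients: {pct(ecopd_main_diagnosis, copd_patients):.1f}%")
    print(f"ECOPD, of hospitalised patients: {pct(ecopd_main_diagnosis, hospitalised):.1f}%")
```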