
Showing papers presented at "American Medical Informatics Association Annual Symposium in 2016"


Proceedings Article
01 Jan 2016
TL;DR: This paper introduces a simple yet powerful knowledge-distillation approach called interpretable mimic learning, which uses gradient boosting trees to learn interpretable models while achieving prediction performance comparable to deep learning models.
Abstract: The exponential surge in health care data, such as longitudinal data from electronic health records (EHR) and sensor data from the intensive care unit (ICU), is providing new opportunities to discover meaningful data-driven characteristics and patterns of diseases. Recently, deep learning models have been employed for many computational phenotyping and healthcare prediction tasks to achieve state-of-the-art performance. However, deep models lack interpretability, which is crucial for wide adoption in medical research and clinical decision-making. In this paper, we introduce a simple yet powerful knowledge-distillation approach called interpretable mimic learning, which uses gradient boosting trees to learn interpretable models while achieving prediction performance comparable to deep learning models. Experimental results on a pediatric ICU dataset for acute lung injury (ALI) show that our proposed method not only outperforms state-of-the-art approaches on mortality and ventilator-free-days prediction tasks but can also provide interpretable models to clinicians.

234 citations
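The mimic-learning recipe in the abstract above can be sketched in a few lines: an interpretable "student" model is fit to the *soft* predictions of a black-box "teacher", rather than to the raw labels. In this hedged sketch the teacher is a toy threshold function and the student is a single regression stump standing in for gradient boosting trees; none of it reflects the paper's actual implementation.

```python
# Sketch of interpretable mimic learning (knowledge distillation).
# Assumptions: `teacher` is a stand-in for a deep model's soft output,
# and a single regression stump stands in for gradient boosting trees.

def teacher(x):
    # Pretend deep model: predicted risk in [0, 1].
    return 0.9 if x >= 5 else 0.1

def fit_stump(xs, targets):
    """Fit a one-split regression stump minimizing squared error."""
    best = None
    for t in xs:                      # candidate thresholds
        left = [y for x, y in zip(xs, targets) if x < t]
        right = [y for x, y in zip(xs, targets) if x >= t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - lm) ** 2 for y in left)
               + sum((y - rm) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

xs = [1, 2, 3, 4, 5, 6, 7, 8]
soft_labels = [teacher(x) for x in xs]  # distillation targets
student = fit_stump(xs, soft_labels)    # interpretable mimic model
```

The student reproduces the teacher's decision boundary while remaining a single inspectable rule; the paper's version repeats this idea with full gradient boosting trees over ICU features.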


Proceedings Article
01 Jan 2016
TL;DR: The study identifies technical, social, and organizational challenges in sharing and fully leveraging patient-generated data in clinical practice, and the findings point researchers toward potential enablers of, and barriers to, sharing patient-generated data in clinical settings.
Abstract: Patients are tracking and generating an increasingly large volume of personal health data outside the clinic due to an explosion of wearable sensing and mobile health (mHealth) apps. The potential usefulness of these data is enormous, as they can provide good measures of everyday behavior and lifestyle. However, how we can fully leverage patient-generated data (PGD) and integrate them into clinical practice is less clear. In this interview study, we aim to understand how patients and clinicians currently share patient-generated data in clinical care practice. From the study, we identified technical, social, and organizational challenges in sharing and fully leveraging patient-generated data in clinical practices. Our findings point researchers toward potential enablers of, and barriers to, sharing patient-generated data in clinical settings.

88 citations


Proceedings Article
01 Jan 2016
TL;DR: A new approach to detecting similar questions based on Recognizing Question Entailment (RQE) is proposed; it treats Frequently Asked Questions (FAQs) as a valuable and widespread source of information and proposes an existing answer whenever an FAQ similar to a consumer health question exists.
Abstract: With the increasing heterogeneity and specialization of medical texts, automated question answering is becoming more and more challenging. In this context, answering a given medical question by retrieving similar questions that have already been answered by human experts seems to be a promising solution. In this paper, we propose a new approach to the detection of similar questions based on Recognizing Question Entailment (RQE). In particular, we consider Frequently Asked Questions (FAQs) as a valuable and widespread source of information. Our final goal is to automatically provide an existing answer whenever an FAQ similar to a consumer health question exists. We evaluate our approach using consumer health questions received by the National Library of Medicine and FAQs collected from NIH websites. Our first results are promising and suggest the feasibility of our approach as a valuable complement to classic question answering approaches.

66 citations


Proceedings Article
01 Jan 2016
TL;DR: The study results call for the attention of international stakeholders (educators, managers, policy makers) to address the current issues with EHRs from a nursing perspective.
Abstract: This study presents a qualitative content analysis of nurses' satisfaction and issues with current electronic health record (EHR) systems, as reflected in one of the largest international surveys …

60 citations


Proceedings Article
Xiang Li, Haifeng Liu, Xin Du, Ping Zhang, Gang Hu, Guotong Xie, Shijing Guo, Meilin Xu, Xiaoping Xie
01 Jan 2016
TL;DR: In this study, integrated machine learning and data mining approaches are used to build 2-year TE prediction models for AF from Chinese Atrial Fibrillation Registry data, achieving higher prediction performance and identifying new potential risk factors.
Abstract: Atrial fibrillation (AF) is a common cardiac rhythm disorder, which increases the risk of ischemic stroke and other thromboembolism (TE). Accurate prediction of TE is highly valuable for early intervention in AF patients. However, the prediction performance of previous TE risk models for AF is not satisfactory. In this study, we used integrated machine learning and data mining approaches to build 2-year TE prediction models for AF from Chinese Atrial Fibrillation Registry data. We first performed data cleansing and imputation on the raw data to generate a usable dataset. Then a series of feature construction and selection methods were used to identify predictive risk factors, based on which supervised learning methods were applied to build the prediction models. The experimental results show that our approach can achieve higher prediction performance (AUC: 0.71~0.74) than previous TE prediction models for AF (AUC: 0.66~0.69) and identify new potential risk factors as well.

36 citations


Proceedings Article
01 Jan 2016
TL;DR: Inpatient portal use was associated with patients who were white, male, and had longer lengths of stay, and viewing health record data and secure messaging were the most commonly used functions.
Abstract: Patient portal research has focused on medical outpatient settings, with little known about portal use during hospitalizations or by surgical patients. We measured portal adoption among patients admitted to surgical services over two years. Surgical services managed 37,025 admissions of 31,310 unique patients. One-fourth of admissions (9,362, 25.3%) involved patients registered for the portal. Registration rates were highest for admissions to laparoscopic/gastrointestinal (55%) and oncology/endocrine (50%) services. Portal use occurred during 1,486 surgical admissions, 4% of all and 16% of those registered at admission. Inpatient portal use was associated with patients who were white, male, and had longer lengths of stay (p < 0.01). Viewing health record data and secure messaging were the most commonly used functions, accessed in 4,836 (72.9%) and 1,626 (24.5%) user sessions. Without specific encouragement, hospitalized surgical patients are using our patient portal. The surgical inpatient setting may provide opportunities for patient engagement using patient portals.

31 citations


Proceedings Article
01 Jan 2016
TL;DR: The feasibility of identifying frailty indicators from clinical notes and linking them to clinically relevant outcomes is demonstrated; future work includes integrating frailty indicators into validated predictive tools.
Abstract: Frailty is an important health outcomes indicator and valuable for guiding healthcare decisions in older adults, but is rarely collected in a quantitative, systematic fashion in routine healthcare. Using a cohort of 12,000 Veterans with heart failure, we investigated the feasibility of topic modeling to identify frailty topics in clinical notes. Topics were generated through unsupervised learning and then manually reviewed by an expert. A total of 53 frailty topics were identified from 100,000 notes. We further examined associations of frailty with age-, sex-, and Charlson Comorbidity Index-adjusted 1-year hospitalizations and mortality (composite outcome) using logistic regression. Frailty (≥ 4 topics versus < 4) was associated with twice the risk of the composite outcome, Odds Ratio: 2.2, 95% Confidence Interval: (2.0-2.4). This study demonstrates the feasibility of identifying frailty indicators from clinical notes and linking these to clinically relevant outcomes. Future work includes integrating frailty indicators into validated predictive tools.

29 citations
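For readers unfamiliar with the reported statistic (Odds Ratio 2.2, 95% CI 2.0-2.4): an *unadjusted* odds ratio with a Wald confidence interval can be computed from a 2×2 table as below. The paper's estimate is covariate-adjusted via logistic regression, and the counts here are made up for illustration.

```python
import math

# Unadjusted odds ratio with Wald 95% CI from a hypothetical 2x2 table:
# a, b = frail patients with / without the outcome
# c, d = non-frail patients with / without the outcome
def odds_ratio_ci(a, b, c, d):
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(300, 700, 150, 850)  # illustrative counts
```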


Proceedings Article
01 Jan 2016
TL;DR: This work determines the most common topics in patient comments, designs automatic topic classifiers, identifies comments' sentiment, and uses topic modeling to find unexpected topics in negative comments.
Abstract: Important information is encoded in free-text patient comments. We determine the most common topics in patient comments, design automatic topic classifiers, identify comments' sentiment, and find new topics in negative comments. Our annotation scheme consisted of 28 topics, with positive and negative sentiment. Within those 28 topics, the seven most frequent accounted for 63% of annotations. For automated topic classification, we developed vocabulary-based and Naive Bayes classifiers. For sentiment analysis, another Naive Bayes classifier was used. Finally, we used topic modeling to search for unexpected topics within negative comments. The seven most common topics were appointment access, appointment wait, empathy, explanation, friendliness, practice environment, and overall experience. The best F-measures from our classifiers were 0.52 (NB), 0.57 (NB), 0.36 (Vocab), 0.74 (NB), 0.40 (NB), and 0.44 (Vocab), respectively. F-scores ranged from 0.16 to 0.74. The sentiment classification F-score was 0.84. Negative comment topic modeling revealed complaints about appointment access, appointment wait, and time spent with the physician.

29 citations
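A multinomial Naive Bayes classifier of the kind used for the sentiment task above can be written from scratch in a few lines; the toy comments, labels, and vocabulary below are illustrative assumptions, not the study's data.

```python
import math
from collections import Counter

# Toy training comments (text, sentiment); not the study's data.
train = [
    ("friendly staff great visit", "pos"),
    ("great explanation very friendly", "pos"),
    ("long wait rude staff", "neg"),
    ("wait too long bad experience", "neg"),
]

class_docs = Counter(label for _, label in train)
word_counts = {c: Counter() for c in class_docs}
for text, label in train:
    word_counts[label].update(text.split())
vocab = {w for text, _ in train for w in text.split()}

def predict(text):
    scores = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        score = math.log(class_docs[c] / len(train))   # class prior
        for w in text.split():
            # Laplace (add-one) smoothing over the shared vocabulary.
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)
```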


Proceedings Article
01 Jan 2016
TL;DR: This work presents a realistic use case by testing several semi-supervised classification algorithms on a large hand-annotated corpus of occurrences of 74 ambiguous abbreviations in medical records, and demonstrates that semi-supervised abbreviation disambiguation is a viable and extensible option for medical NLP systems.
Abstract: Abbreviation disambiguation in clinical texts is a problem handled well by fully supervised machine learning methods. Acquiring training data, however, is expensive and would be impractical for large numbers of abbreviations in specialized corpora. An alternative is a semi-supervised approach, in which training data are automatically generated by substituting long forms in natural text with their corresponding abbreviations. Most prior implementations of this method either focus on very few abbreviations or do not test on real-world data. We present a realistic use case by testing several semi-supervised classification algorithms on a large hand-annotated corpus of occurrences of 74 ambiguous abbreviations in medical records. Despite notable differences between training and test corpora, classifiers achieve up to 90% accuracy. Our tests demonstrate that semi-supervised abbreviation disambiguation is a viable and extensible option for medical NLP systems.

28 citations
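The data-generation trick described above — replacing long forms in natural text with the ambiguous abbreviation so that the replaced long form becomes a free sense label — can be sketched as follows. The sense inventory and sentences are hypothetical.

```python
import re

# Hypothetical sense inventory for one ambiguous abbreviation.
SENSES = {"pt": ["patient", "physical therapy"]}

def make_training_data(sentences, abbrev):
    """Substitute each long form with the abbreviation; the replaced
    long form becomes the sense label for the resulting example."""
    examples = []
    for sent in sentences:
        for sense in SENSES[abbrev]:
            pattern = r"\b%s\b" % re.escape(sense)
            if re.search(pattern, sent):
                examples.append((re.sub(pattern, abbrev, sent), sense))
    return examples

data = make_training_data(
    ["the patient was admitted overnight",
     "referred for physical therapy twice weekly"],
    "pt",
)
# data: automatically sense-labeled training examples for "pt"
```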


Proceedings Article
01 Jan 2016
TL;DR: A time motion study to understand nursing workflow, specifically multitasking and task-switching activities, using TimeCaT, a comprehensive electronic time capture tool, to capture observational data, with workflow visualizations reviewed to uncover multitasking events.
Abstract: A fundamental understanding of multitasking within nursing workflow is important in today's dynamic and complex healthcare environment. We conducted a time motion study to understand nursing workflow, specifically multitasking and task-switching activities. We used TimeCaT, a comprehensive electronic time capture tool, to capture observational data. We established inter-observer reliability prior to data collection. We completed 56 hours of observation of 10 registered nurses. We found, on average, nurses had 124 communications and 208 hands-on tasks per 4-hour block of time. They multitasked (having communication and hands-on tasks simultaneously) 131 times, representing 39.48% of all times; the total multitasking duration ranged from 14.6 minutes to 109 minutes, with an average of 44.98 minutes (18.63%). We also reviewed workflow visualizations to uncover the multitasking events. Our study design and methods provide a practical and reliable approach to conducting and analyzing time motion studies from both quantitative and qualitative perspectives.

27 citations


Proceedings Article
01 Jan 2016
TL;DR: Simulation studies based on the characteristics of the Electronic Medical Records and Genomics (eMERGE) Network quantified the loss of power due to misclassifications in case ascertainment and measurement errors in covariate status extraction under a variety of conditions in EHR-based association studies.
Abstract: Over the last decade, Electronic Health Record (EHR) systems have been increasingly implemented at US hospitals. Despite their great potential, the complex and uneven nature of clinical documentation and data quality brings additional challenges for analyzing EHR data. A critical challenge is the information bias due to measurement errors in outcome and covariates. We conducted empirical studies to quantify the impact of this information bias on association studies. Specifically, we designed our simulation studies based on the characteristics of the Electronic Medical Records and Genomics (eMERGE) Network. Through simulation studies, we quantified the loss of power due to misclassifications in case ascertainment and measurement errors in covariate status extraction, with respect to different levels of misclassification rates, disease prevalence, and covariate frequencies. These empirical findings can inform investigators for a better understanding of the potential power loss due to misclassification and measurement errors under a variety of conditions in EHR-based association studies.

Proceedings Article
01 Jan 2016
TL;DR: A feature selection module to identify the most discriminative Single Nucleotide Polymorphisms (SNPs) based on informativeness, and an Expectation Maximization (EM)-based Maximum Likelihood estimator to estimate individual admixture, are implemented.
Abstract: In this paper we propose PRivacy-preserving EstiMation of Individual admiXture (PREMIX), a framework built on Intel Software Guard Extensions (SGX). SGX is a suite of software and hardware architectures that enables efficient and secure computation over confidential data. PREMIX enables multiple sites to securely collaborate on estimating individual admixture within a secure enclave inside Intel SGX. We implemented a feature selection module to identify the most discriminative Single Nucleotide Polymorphisms (SNPs) based on informativeness, and an Expectation Maximization (EM)-based Maximum Likelihood estimator to estimate the individual admixture. Experimental results based on both simulated and 1000 Genomes data demonstrated the efficiency and accuracy of the proposed framework. PREMIX ensures a high level of security as all operations on sensitive genomic data are conducted within a secure enclave using SGX.

Proceedings Article
01 Jan 2016
TL;DR: The authors' approach scored best on all widely used metrics, such as precision, recall, and the ratio of relevant predictions among the top-ranked results, with improvements of as much as 125.79% over the next best approach.
Abstract: We propose a new computational method for discovery of possible adverse drug reactions. The method consists of two key steps. First we use openly available resources to semi-automatically compile a consolidated data set describing drugs and their features (e.g., chemical structure, related targets, indications or known adverse reactions). The data set is represented as a graph, which allows for the definition of graph-based similarity metrics. The metrics can then be used for propagating known adverse reactions between similar drugs, which leads to weighted (i.e., ranked) predictions of previously unknown links between drugs and their possible side effects. We implemented the proposed method in the form of a software prototype and evaluated our approach by discarding known drug-side effect links from our data and checking whether our prototype is able to re-discover them. As this is an evaluation methodology used by several recent state-of-the-art approaches, we could compare our results with them. Our approach scored best on all widely used metrics, such as precision, recall, and the ratio of relevant predictions among the top-ranked results. The improvement was as much as 125.79% over the next best approach. For instance, the F1 score was 0.5606 (66.35% better than the next best method). Most importantly, in 95.32% of cases the top five results contain at least one, and typically three, correctly predicted side effects (36.05% better than the second best approach).
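The propagation step — ranking candidate side effects by similarity-weighted votes from drugs with known reactions — might look like the minimal sketch below, with Jaccard similarity over made-up feature sets standing in for the paper's graph-based metrics.

```python
# Made-up drug feature sets (targets, indications, substructures).
drug_features = {
    "drugA": {"targetX", "indicationY", "ringZ"},
    "drugB": {"targetX", "indicationY", "ringW"},
    "drugC": {"targetQ", "indicationR", "ringS"},
}
known_ades = {"drugA": {"nausea"}, "drugC": {"rash"}}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def predict_ades(candidate):
    """Propagate known ADEs to `candidate`, weighted by similarity."""
    scores = {}
    for other, ades in known_ades.items():
        if other == candidate:
            continue
        sim = jaccard(drug_features[candidate], drug_features[other])
        for ade in ades:
            scores[ade] = scores.get(ade, 0.0) + sim
    return sorted(scores.items(), key=lambda kv: -kv[1])  # ranked list

ranked = predict_ades("drugB")
```

Here drugB shares two of four pooled features with drugA, so drugA's known reaction receives the highest propagated weight.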

Proceedings Article
01 Jan 2016
TL;DR: An automated postoperative complication detection application was developed using structured electronic health record (EHR) data, applying several machine learning methods to detect commonly occurring complications, including three subtypes of surgical site infection, pneumonia, urinary tract infection, sepsis, and septic shock.
Abstract: Manual Chart Review (MCR) is an important but labor-intensive task for clinical research and quality improvement. In this study, aiming to accelerate the process of extracting postoperative outcomes from medical charts, we developed an automated postoperative complication detection application using structured electronic health record (EHR) data. We applied several machine learning methods to the detection of commonly occurring complications, including three subtypes of surgical site infection, pneumonia, urinary tract infection, sepsis, and septic shock. In particular, we applied one single-task and five multi-task learning methods and compared their detection performance. The models demonstrated high detection performance, which supports the feasibility of accelerating MCR. Specifically, one of the multi-task learning methods, propensity weighted observations (PWO), demonstrated the highest detection performance, with single-task learning a close second.

Proceedings Article
01 Jan 2016
TL;DR: Examination of use of patient-provider messaging in a patient portal across pediatric specialties during the three years after implementation of pediatric portal accounts at Vanderbilt University Medical Center found rapid growth in messaging volume over time.
Abstract: Few studies have explored adoption of patient portals for pediatric patients outside primary care or disease-specific applications. We examined use of patient-provider messaging in a patient portal across pediatric specialties during the three years after implementation of pediatric portal accounts at Vanderbilt University Medical Center. We determined the number of patient-initiated message threads and clinic visits for pediatric specialties and percentage of these outpatient interactions (i.e., message threads + clinic visits) done through messaging. Generalized estimating equations measured the likelihood of message-based interaction. During the study period, pediatric families initiated 33,503 messages and participated in 318,386 clinic visits. The number of messages sent (and messaging percentage of outpatient interaction) increased each year from 2,860 (2.7%) to 18,772 (17%). Primary care received 4,368 messages (3.4% of outpatient interactions); pediatric subspecialties, 29,135 (13.0%). Rapid growth in messaging volume over time was seen in primary care and most pediatric specialties (OR>1.0; p<0.05).

Proceedings Article
01 Jan 2016
TL;DR: This study validates and extends the 13 steps that Shiffman et al. identified for translating CPG knowledge for use in CDS, and provides details on an updated model that outlines all of the steps used to translate CPG knowledge into a CDS integrated with existing health information technology.
Abstract: As utilization of clinical decision support (CDS) increases, it is important to continue the development and refinement of methods to accurately translate the intention of clinical practice guidelines (CPG) into a computable form. In this study, we validate and extend the 13 steps that Shiffman et al. identified for translating CPG knowledge for use in CDS. During an implementation project of ATHENA-CDS, we encoded complex CPG recommendations for five common chronic conditions for integration into an existing clinical dashboard. Major decisions made during the implementation process were recorded and categorized according to the 13 steps. During the implementation period, we categorized 119 decisions and identified 8 new categories required to complete the project. We provide details on an updated model that outlines all of the steps used to translate CPG knowledge into a CDS integrated with existing health information technology.

Proceedings Article
01 Jan 2016
TL;DR: It is found that participants' EHR-interactive behavior was associated with their routine processes, patient case complexity, and EHR default settings, and the proposed approach has significant potential to inform resource allocation for observation and training.
Abstract: There are numerous methods to study workflow. However, few produce the kinds of in-depth analyses needed to understand EHR-mediated workflow. Here we investigated variations in clinicians' EHR workflow by integrating quantitative analysis of patterns of users' EHR interactions with in-depth qualitative analysis of user performance. We characterized 6 clinicians' patterns of information-gathering using a sequential process-mining approach. The analysis revealed 519 different screen transition patterns performed across 1,569 patient cases. No one pattern was followed for more than 10% of patient cases, the 15 most frequent patterns accounted for over half of patient cases (53%), and 27% of cases exhibited unique patterns. By triangulating quantitative and qualitative analyses, we found that participants' EHR-interactive behavior was associated with their routine processes, patient case complexity, and EHR default settings. The proposed approach has significant potential to inform resource allocation for observation and training. In-depth observations helped us to explain variation across users.

Proceedings Article
01 Jan 2016
TL;DR: An NLP ensemble pipeline is built to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts.
Abstract: Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strengths of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort.

Proceedings Article
01 Jan 2016
TL;DR: A compartmental model with a vector-host structure for ZIKV is proposed and the basic reproduction number is derived to gain insight on containment strategies and to help decision makers select and invest in the strategies most effective to combat the infection spread.
Abstract: The Zika virus (ZIKV) outbreak in South American countries and its potential association with microcephaly in newborns and Guillain-Barre Syndrome led the World Health Organization to declare a Public Health Emergency of International Concern. To understand the ZIKV disease dynamics and evaluate the effectiveness of different containment strategies, we propose a compartmental model with a vector-host structure for ZIKV. The model utilizes logistic growth in human population and dynamic growth in vector population. Using this model, we derive the basic reproduction number to gain insight on containment strategies. We contrast the impact and influence of different parameters on the virus trend and outbreak spread. We also evaluate different containment strategies and their combination effects to achieve early containment by minimizing total infections. This result can help decision makers select and invest in the strategies most effective to combat the infection spread. The decision-support tool demonstrates the importance of "digital disease surveillance" in response to waves of epidemics including ZIKV, Dengue, Ebola and cholera.
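A vector-host compartmental model of this general shape can be explored with forward-Euler integration, and a basic reproduction number written down for the simplified formulation used here. The parameters, the constant vector birth-death balance, and the R0 expression are all assumptions of this sketch; the paper's model (with logistic human growth) derives its own R0.

```python
# Simplified vector-host SIR/SI model, integrated with forward Euler.
# sh/ih: susceptible/infectious hosts; sv/iv: susceptible/infectious vectors.
def simulate(beta_h, beta_v, gamma, mu, nh, nv, ih0, days, dt=0.1):
    sh, ih = nh - ih0, ih0
    sv, iv = nv, 0.0
    history = [ih]
    for _ in range(int(days / dt)):
        new_h = beta_h * sh * iv / nh           # vector -> host infections
        new_v = beta_v * sv * ih / nh           # host -> vector infections
        sh -= dt * new_h
        ih += dt * (new_h - gamma * ih)         # recovery at rate gamma
        sv += dt * (mu * nv - new_v - mu * sv)  # vector birth/death balance
        iv += dt * (new_v - mu * iv)            # vector death at rate mu
        history.append(ih)
    return history

def r0(beta_h, beta_v, gamma, mu, nh, nv):
    # Next-generation R0 for this simplified formulation (an assumption
    # of the sketch, not the paper's derivation).
    return (beta_h * beta_v * nv / (gamma * mu * nh)) ** 0.5

params = dict(beta_h=0.3, beta_v=0.3, gamma=0.1, mu=0.05, nh=1000, nv=2000)
growth = simulate(**params, ih0=1, days=30)
```

With these illustrative parameters R0 > 1 and the infectious-host curve rises from the initial seed; lowering beta_h or beta_v (e.g., via vector control) until R0 < 1 is how such a model compares containment strategies.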

Proceedings Article
01 Jan 2016
TL;DR: A double-loop interactive machine learning process named ReQ-ReC (ReQuery-ReClassify) is described that is comparable with or faster than current active learning methods in building WSD models.
Abstract: Resolving word ambiguity in clinical text is critical for many natural language processing applications. Effective word sense disambiguation (WSD) systems rely on training a machine learning based classifier with abundant clinical text that is accurately annotated, the creation of which can be costly and time-consuming. We describe a double-loop interactive machine learning process, named ReQ-ReC (ReQuery-ReClassify), and demonstrate its effectiveness on multiple evaluation corpora. Using ReQ-ReC, a human expert first uses her domain knowledge to include sense-specific contextual words into the ReQuery loops and searches for instances relevant to the senses. Then, in the ReClassify loops, the expert only annotates the most ambiguous instances found by the current WSD model. Even with machine-generated queries only, the framework is comparable with or faster than current active learning methods in building WSD models. The process can be further accelerated when human experts use their domain knowledge to guide the search process.

Proceedings Article
01 Jan 2016
TL;DR: This work evaluates machine learning classifiers applied to high-dimensional vector representations of relationships extracted from the literature as a means to identify substantiated drug/ADE connections, and shows that applying classifiers to such representations improves performance over previous approaches.
Abstract: An important aspect of post-marketing drug surveillance involves identifying potential side-effects utilizing adverse drug event (ADE) reporting systems and/or Electronic Health Records. These data are noisy, necessitating that identified drug/ADE associations be manually reviewed - a human-intensive process that scales poorly with large numbers of possibly dangerous associations and the rapid growth of biomedical literature. Recent work has employed Literature Based Discovery methods that exploit implicit relationships between biomedical entities within the literature to estimate the plausibility of drug/ADE connections. We extend this work by evaluating machine learning classifiers applied to high-dimensional vector representations of relationships extracted from the literature as a means to identify substantiated drug/ADE connections. Using a curated reference standard, we show that applying classifiers to such representations improves performance (+≈37% AUC) over previous approaches. These trained systems reproduce outcomes of the manual literature review process used to create the reference standard, but further research is required to establish their generalizability.

Proceedings Article
01 Jan 2016
TL;DR: The results suggest that multitrajectory modeling via GBTM can shed light on the developmental course of CKD and the interactions between related complications.
Abstract: An ever increasing number of people are affected by chronic kidney disease (CKD). A better understanding of the progression of CKD and its complications is needed to address what is becoming a major burden for health-care systems worldwide. Utilizing a rich data set consisting of the Electronic Health Records (EHRs) of more than 33,000 patients from a leading community nephrology practice in Western Pennsylvania, we applied group-based trajectory modeling (GBTM) in order to detect patient risk groups and uncover typical progressions of CKD and related comorbidities and complications. We have found distinct risk groups with differing trajectories and are able to classify new patients into these groups with high accuracy (up to ≈ 90%). Our results suggest that multitrajectory modeling via GBTM can shed light on the developmental course of CKD and the interactions between related complications.

Proceedings Article
01 Jan 2016
TL;DR: A multi-modal EEG patient cohort retrieval system called MERCuRY is presented which leverages the heterogeneous nature of EEG data by processing both the clinical narratives from EEG reports as well as the raw electrode potentials derived from the recorded EEG signal data.
Abstract: Clinical electroencephalography (EEG) is the most important investigation in the diagnosis and management of epilepsies. An EEG records the electrical activity along the scalp and measures spontaneous electrical activity of the brain. Because the EEG signal is complex, its interpretation is known to produce only moderate inter-observer agreement among neurologists. This problem can be addressed by providing clinical experts with the ability to automatically retrieve similar EEG signals and EEG reports through a patient cohort retrieval system operating on a vast archive of EEG data. In this paper, we present a multi-modal EEG patient cohort retrieval system called MERCuRY which leverages the heterogeneous nature of EEG data by processing both the clinical narratives from EEG reports and the raw electrode potentials derived from the recorded EEG signal data. At the core of MERCuRY is a novel multimodal clinical indexing scheme which relies on EEG data representations obtained through deep learning. The index is used by two clinical relevance models that we have generated for identifying patient cohorts satisfying the inclusion and exclusion criteria expressed in natural language queries. Evaluations of the MERCuRY system measured the relevance of the patient cohorts, obtaining a MAP score of 69.87% and an NDCG of 83.21%.

Proceedings Article
01 Jan 2016
TL;DR: It was found that patients and parents both valued MyChart, but had different views about the role of the PHR for care communication and management, and different attitudes about its impact on the patient's ability to manage care.
Abstract: Supporting adolescent patient engagement in care is an important yet underexplored topic in consumer health informatics. Personal Health Records (PHRs) show potential, but designing PHR systems to accommodate both emerging adults and their parents is challenging. We conducted a mixed-methods study with adolescent patients (ages 13-17) with cancer and blood disorders, and their parents, to investigate their experiences with MyChart, a tethered PHR system. Through analyses of usage logs and independently conducted surveys and interviews, we found that patients and parents both valued MyChart, but had different views about the role of the PHR for care communication and management, and different attitudes about its impact on the patient's ability to manage care. Specific motivations for using MyChart included patient-parent coordination of care activities, communication around hospital encounters, and support for transitioning to adult care. Finally, some parents had concerns about certain diagnostic test results being made available to their children.

Proceedings Article
01 Jan 2016
TL;DR: The results indicate that structured fields for tobacco use alone may not provide complete tobacco use information, and further work is needed to improve the integration of tobacco use information from different parts of the EHR.
Abstract: The electronic health record (EHR) provides an opportunity for improved use of clinical documentation, including leveraging tobacco use information by clinicians and researchers. In this study, we investigated the content, consistency, and completeness of tobacco use data from structured and unstructured sources in the EHR. A natural language processing (NLP) pipeline was utilized to extract details about tobacco use from clinical notes and free-text tobacco use comments within the social history module of an EHR system. We analyzed the consistency of tobacco use information within clinical notes, comments, and available structured fields for tobacco use. Our results indicate that structured fields for tobacco use alone may not be able to provide complete tobacco use information. While there was better consistency for some elements (e.g., status and type), inconsistencies were found particularly for temporal information. Further work is needed to improve the integration of tobacco use information from different parts of the EHR.
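
To make the extraction step concrete, here is a minimal rule-based sketch of classifying smoking status from free-text comments. The patterns are illustrative assumptions only; a production clinical NLP pipeline would additionally handle negation scope, note sections, and the temporal modifiers the study found inconsistent:

```python
import re

# Hypothetical status patterns -- illustrative, not the pipeline used in the study.
STATUS_PATTERNS = [
    ("never",   re.compile(r"\b(never smoker|denies (tobacco|smoking)|non-?smoker)\b", re.I)),
    ("former",  re.compile(r"\b(former smoker|quit smoking|ex-?smoker)\b", re.I)),
    ("current", re.compile(r"\b(current smoker|smokes \d+|tobacco use: yes)\b", re.I)),
]

def tobacco_status(text):
    """Return the first matching smoking status label, or 'unknown'."""
    for label, pattern in STATUS_PATTERNS:
        if pattern.search(text):
            return label
    return "unknown"

notes = [
    "Social history: former smoker, quit smoking 10 years ago.",
    "Patient denies tobacco use.",
    "Smokes 1 ppd x 20 years.",
]
print([tobacco_status(n) for n in notes])  # → ['former', 'never', 'current']
```

Comparing such NLP-derived labels against the structured smoking-status field is one simple way to quantify the consistency gaps the abstract describes.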

Proceedings Article
01 Jan 2016
TL;DR: This work focused on post-discharge ICU mortality prediction, incorporating the ICD-9-CM hierarchy into Bayesian topic model learning to extract topic features from medical notes; the interpretability of these topic features may facilitate understanding of the complex relationship between mortality and diseases.
Abstract: Electronic health records provide valuable resources for understanding the correlation between various diseases and mortality. The analysis of post-discharge mortality is critical for healthcare professionals to follow up on potential causes of death after a patient is discharged from the hospital and give prompt treatment. Moreover, it may reduce the cost derived from readmissions and improve the quality of healthcare. Our work focused on post-discharge ICU mortality prediction. In addition to features derived from physiological measurements, we incorporated the ICD-9-CM hierarchy into Bayesian topic model learning and extracted topic features from medical notes. We achieved the highest AUCs of 0.835 and 0.829 for 30-day and 6-month post-discharge mortality prediction using baseline features and topic proportions derived from Labeled-LDA. Moreover, our work emphasized the interpretability of topic features derived from the topic model, which may facilitate the understanding and investigation of the complex relationship between mortality and diseases.
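
The feature pipeline described above (topic proportions from notes feeding a classifier evaluated by AUC) can be sketched with scikit-learn. Note the assumptions: this uses plain unsupervised LDA rather than the Labeled-LDA with ICD-9-CM supervision used in the paper, and the data are synthetic, not real ICU notes:

```python
# Hedged sketch of the topic-feature pipeline; synthetic data, unsupervised LDA.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(200, 50))   # 200 "notes" x 50-term vocabulary
labels = rng.integers(0, 2, size=200)       # synthetic mortality labels

lda = LatentDirichletAllocation(n_components=5, random_state=0)
topics = lda.fit_transform(counts)          # per-note topic proportions as features

X_tr, X_te, y_tr, y_te = train_test_split(topics, labels, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

With random labels the AUC hovers near 0.5; the point of the sketch is only the shape of the pipeline, where each note's topic-proportion vector becomes an interpretable feature vector.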

Proceedings Article
01 Jan 2016
TL;DR: Identification of communicability and high-level clinical reasoning as important factors determining user satisfaction can lead to development and design of more usable electronic health records with higher user satisfaction.
Abstract: Introduction. Implementations of electronic health records (EHR) have been met with mixed outcome reviews. Complaints about these systems have led to many attempts to develop useful measures of end-user satisfaction. However, most user satisfaction assessments do not focus on high-level reasoning, despite the complaints of many physicians. Our study attempts to identify some of these determinants. Method. We developed a user satisfaction survey instrument based on pre-identified and important clinical and non-clinical clinician tasks. We surveyed a sample of in-patient physicians and used exploratory factor analyses to identify underlying high-level cognitive tasks. We used the results to create unique, orthogonal variables representative of latent structure predictive of user satisfaction. Results. Our findings identified three latent high-level tasks that were associated with end-user satisfaction: a) high-level clinical reasoning, b) communicating/coordinating care, and c) following the rules/compliance. Conclusion. We were able to successfully identify latent variables associated with satisfaction. Identification of communicability and high-level clinical reasoning as important factors determining user satisfaction can lead to the development and design of more usable electronic health records with higher user satisfaction.
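
The exploratory-factor-analysis step described in the Method section can be illustrated with scikit-learn. Everything here is a synthetic stand-in: the item responses, the number of items, and the three planted latent factors are assumptions meant only to show how orthogonal factor scores are recovered from survey data:

```python
# Illustrative sketch of EFA over Likert-style survey responses; synthetic data only.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n_respondents, n_items, n_factors = 120, 9, 3
latent = rng.normal(size=(n_respondents, n_factors))   # e.g., reasoning / communication / compliance
loadings = rng.normal(size=(n_factors, n_items))
responses = latent @ loadings + rng.normal(scale=0.5, size=(n_respondents, n_items))

fa = FactorAnalysis(n_components=n_factors, rotation="varimax", random_state=1)
scores = fa.fit_transform(responses)     # factor scores per respondent
print(scores.shape)                      # (120, 3): one score per latent task factor
print(fa.components_.shape)              # (3, 9): item loadings on each factor
```

Inspecting which survey items load heavily on each rotated factor is what lets the factors be named ("clinical reasoning", "communicate/coordinate", "compliance") and then used as predictors of satisfaction.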

Proceedings Article
01 Jan 2016
TL;DR: Overall, participants utilized the tool regularly and appreciated its presence and their interactions with it, and the findings indicate the feasibility of a digital companion for people with MCI.
Abstract: Study Objective: The purpose of this study was to examine the feasibility of a digital companion system used by older adults with mild cognitive impairment (MCI). We utilized a commercially available system that is comprehensive in its functionalities (including conversation ability, use of pictures and other media, and reminders) to explore the system's impact on older adults' social interactions, anxiety, depressive symptoms, and acceptance of the system. Study Design: We conducted a three-month mixed-methods evaluation study of the digital companion. Results: Ten female community-dwelling older adults (average age 78.3 years) participated in the study. Overall, participants utilized the tool regularly and appreciated its presence and their interactions. Participants scored higher at the end of the study in cognition and social support scales, and lower in presence of depressive symptoms. Conclusion: Findings indicate the feasibility of a digital companion for people with MCI and inform the need for additional research.

Proceedings Article
01 Jan 2016
TL;DR: RealRisks, a web-based patient-facing decision aid, informs high-risk women about the risks and benefits of chemoprevention and facilitates shared decision-making with their primary care provider; usability testing led to interface changes that make RealRisks accessible to users with varying health literacy and acculturation.
Abstract: Chemoprevention with antiestrogens could decrease the incidence of invasive breast cancer, but uptake has been low among high-risk women in the United States. We have designed a web-based patient-facing decision aid, called RealRisks, to inform high-risk women about the risks and benefits of chemoprevention and facilitate shared decision-making with their primary care provider. We conducted two rounds of usability testing to determine how subjects engaged with and understood the information in RealRisks. A total of 7 English-speaking and 4 Spanish-speaking subjects completed testing. Using surveys, think-aloud protocols, and subject recordings, we identified several themes relating to the usability of RealRisks, specifically in the content, ease of use, and navigability of the application. By conducting studies in two languages with a diverse multi-ethnic population, we were able to implement interface changes to make RealRisks accessible to users with varying health literacy and acculturation.

Proceedings Article
01 Jan 2016
TL;DR: This paper proposes a topic recognition approach based on biomedical and open-domain knowledge bases that outperformed the results obtained by individual knowledge bases by up to 16.5% F1 and achieved state-of-the-art performance.
Abstract: Determining the main topics in consumer health questions is a crucial step in their processing, as it allows narrowing the search space to a specific semantic context. In this paper we propose a topic recognition approach based on biomedical and open-domain knowledge bases. In the first step of our method, we recognize named entities in consumer health questions using an unsupervised method that relies on a biomedical knowledge base, UMLS, and an open-domain knowledge base, DBpedia. In the next step, we cast topic recognition as a binary classification problem of deciding whether a named entity is the question topic or not. We evaluated our approach on a dataset from the National Library of Medicine (NLM), introduced in this paper, and another from the Genetic and Rare Diseases Information Center (GARD). The combination of knowledge bases outperformed the results obtained by individual knowledge bases by up to 16.5% F1 and achieved state-of-the-art performance. Our results demonstrate that combining open-domain knowledge bases with biomedical knowledge bases can lead to a substantial improvement in understanding user-generated health content.
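
The second step above (binary classification over candidate entities) can be sketched as follows. The features, entity records, and tiny training set are toy assumptions, not the paper's NLM/GARD data or feature set; the point is only the framing of "is this entity the question topic?" as a per-entity classifier:

```python
# Minimal sketch: each recognized entity becomes a candidate, and a binary
# classifier decides "topic" vs "not topic". All features/data are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def entity_features(entity):
    return {
        "in_umls": entity["in_umls"],          # found in the biomedical KB
        "in_dbpedia": entity["in_dbpedia"],    # found in the open-domain KB
        "is_first_entity": entity["position"] == 0,
        "num_tokens": len(entity["text"].split()),
    }

train = [
    ({"text": "cystic fibrosis", "in_umls": True, "in_dbpedia": True, "position": 0}, 1),
    ({"text": "my daughter", "in_umls": False, "in_dbpedia": False, "position": 0}, 0),
    ({"text": "Marfan syndrome", "in_umls": True, "in_dbpedia": True, "position": 1}, 1),
    ({"text": "treatment", "in_umls": True, "in_dbpedia": False, "position": 2}, 0),
]
vec = DictVectorizer()
X = vec.fit_transform([entity_features(e) for e, _ in train])
y = [label for _, label in train]
clf = LogisticRegression().fit(X, y)

candidate = {"text": "sickle cell anemia", "in_umls": True, "in_dbpedia": True, "position": 0}
print(clf.predict(vec.transform([entity_features(candidate)]))[0])
```

Whether the UMLS and DBpedia membership features fire is exactly where combining the two knowledge bases pays off: an entity missed by one KB can still be recovered by the other.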