scispace - formally typeset
Author

Madonna Kemp

Bio: Madonna Kemp is an academic researcher from the Commonwealth Scientific and Industrial Research Organisation. The author has contributed to research on SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms). The author has an h-index of 5 and has co-authored 10 publications that have received 278 citations. Previous affiliations of Madonna Kemp include the Royal Brisbane and Women's Hospital.

Papers
Journal Article · DOI
TL;DR: A new, richly annotated corpus of medical forum posts on patient-reported adverse drug events (ADEs), whose text is largely written in colloquial language and often deviates from formal English grammar and punctuation rules.

217 citations

Journal Article · DOI
TL;DR: The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths.
Abstract: Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV.

44 citations

01 Jan 2015
TL;DR: In this paper, a set of machine learning and rule-based methods was used to automatically classify death certificates according to four high-impact diseases of interest: diabetes, influenza, pneumonia and HIV.
Abstract: Background: Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV. Methods: Two classification methods are presented: i) a machine learning approach, where detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000–2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors. Results: Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification effectiveness was more variable but still high (F-measure 0.80). The error analysis revealed that word variations as well as certain word combinations adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of the effectiveness. Conclusions: The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths. In addition, the methods are generally applicable to other diseases of interest and to other sources of medical free-text besides death certificates.

32 citations
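To make the keyword-matching rules and the evaluation measures in the abstract above more concrete, here is a minimal Python sketch. The patterns in `KEYWORD_RULES`, the function names and the example certificates are hypothetical illustrations, not the rules or data used in the study, and the SVM-based classifier is not shown. Precision (positive predictive value), recall (sensitivity) and F-measure are computed per disease in the usual way, with F = 2PR / (P + R).

```python
import re

# Hypothetical keyword patterns per disease (illustrative only; not the study's rule set).
KEYWORD_RULES = {
    "diabetes": [r"\bdiabet\w*"],
    "influenza": [r"\binfluenza\b", r"\bflu\b"],
    "pneumonia": [r"\bpneumoni\w*", r"\bbronchopneumonia\b"],
    "HIV": [r"\bHIV\b", r"\bhuman immunodeficiency virus\b"],
}


def classify(text: str) -> set[str]:
    """Return every disease whose keyword patterns match the free-text certificate."""
    return {
        disease
        for disease, patterns in KEYWORD_RULES.items()
        if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)
    }


def precision_recall_f(
    gold: list[set[str]], predicted: list[set[str]], label: str
) -> tuple[float, float, float]:
    """Precision (PPV), recall (sensitivity) and F-measure for one disease label."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if label in p and label in g:
            tp += 1  # correctly flagged
        elif label in p:
            fp += 1  # flagged, but not in the ground truth
        elif label in g:
            fn += 1  # in the ground truth, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


certificates = [
    "Acute myocardial infarction; type 2 diabetes mellitus",
    "Bronchopneumonia secondary to influenza A",
    "Metastatic carcinoma of lung",
]
gold = [{"diabetes"}, {"pneumonia", "influenza"}, set()]
predicted = [classify(c) for c in certificates]
print(precision_recall_f(gold, predicted, "pneumonia"))  # (1.0, 1.0, 1.0)
```

Simple rules like these are cheap to run, but, as the error analysis above notes, word variations can defeat them, which is one reason to pair them with a learned classifier.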

Proceedings Article
05 Dec 2018
TL;DR: The results show the potential of advanced NLP-based approaches that leverage SNOMED CT to ICD-10 mapping for hospital in-patient coding across a broad spectrum of diagnostic codes and, in particular, the effectiveness of using SNOMED CT for ICD-10 diagnosis coding.
Abstract: Computer-assisted (diagnostic) coding (CAC) aims to improve the operational productivity and accuracy of clinical coders. The level of accuracy achievable, especially for a wide range of complex and less prevalent clinical cases, remains an open research problem. This study investigates this problem on a broad spectrum of diagnostic codes and, in particular, investigates the effectiveness of utilising SNOMED CT for ICD-10 diagnosis coding. Hospital progress notes were used to provide the narrative-rich electronic patient records for the investigation. A natural language processing (NLP) approach using mappings between SNOMED CT and ICD-10-AM (Australian Modification) was used to guide the coding. The proposed approach achieved 54.1% sensitivity and 70.2% positive predictive value. Given the complexity of the task, this result was encouraging, considering the simplicity of the approach and the level projected as possible by a manual diagnosis code validation study (76.3% sensitivity). The results show the potential for advanced NLP-based approaches that leverage SNOMED CT to ICD-10 mapping for hospital in-patient coding.

31 citations
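The mapping-guided coding step described above can be illustrated with a small Python sketch. It assumes SNOMED CT concepts have already been extracted from the progress notes by some NLP step (not shown). The three concept-to-code pairs in `SNOMED_TO_ICD10AM` and the function names are illustrative assumptions, not entries taken from the official SNOMED CT-AU to ICD-10-AM map or code from the paper. Sensitivity and positive predictive value are micro-averaged over coded episodes, which is one common choice and not necessarily the one used in the study.

```python
# Illustrative SNOMED CT -> ICD-10-AM pairs only; NOT entries from the official map.
SNOMED_TO_ICD10AM = {
    "22298006": "I21.9",   # Myocardial infarction
    "44054006": "E11.9",   # Type 2 diabetes mellitus
    "233604007": "J18.9",  # Pneumonia
}


def codes_from_concepts(concept_ids: list[str]) -> set[str]:
    """Translate extracted SNOMED CT concept IDs into candidate ICD-10-AM codes.

    Concepts with no entry in the map are ignored.
    """
    return {SNOMED_TO_ICD10AM[c] for c in concept_ids if c in SNOMED_TO_ICD10AM}


def micro_sensitivity_ppv(episodes: list[tuple[set[str], set[str]]]) -> tuple[float, float]:
    """Micro-averaged sensitivity and PPV over (gold codes, predicted codes) pairs."""
    tp = sum(len(gold & pred) for gold, pred in episodes)
    n_gold = sum(len(gold) for gold, _ in episodes)
    n_pred = sum(len(pred) for _, pred in episodes)
    sensitivity = tp / n_gold if n_gold else 0.0
    ppv = tp / n_pred if n_pred else 0.0
    return sensitivity, ppv


# "0000000" stands in for an extracted concept that has no map entry.
predicted = codes_from_concepts(["22298006", "44054006", "0000000"])
gold = {"I21.9", "J18.9"}
print(micro_sensitivity_ppv([(gold, predicted)]))  # (0.5, 0.5)
```

A real system would also need to handle one-to-many map entries and coding rules that depend on context, which this sketch deliberately leaves out.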

Journal Article · DOI
TL;DR: Mapping existing sets of clinical terms to a national emergency department SNOMED CT reference set will facilitate consistency between emergency department data collections and improve the usefulness of the data for clinical and analytical purposes.
Abstract:
• Emergency departments around Australia use a range of software to capture data on patients' reason for encounter, presenting problem and diagnosis. The data collected are mainly based on descriptions and codes of the International Classification of Diseases, 10th revision, Australian modification (ICD-10-AM), with each emergency department having a tailored list of terms.
• The National E-Health Transition Authority is introducing a standard clinical terminology, the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT), as one of the building blocks of an e-health infrastructure in Australia. The Australian e-Health Research Centre has developed a software platform, Snapper, which facilitates mapping of existing clinical terms to the SNOMED CT terminology.
• Using the Snapper software, reference sets of terms for emergency departments are being developed, based on the Australian version of SNOMED CT (SNOMED CT-AU). Existing software systems need to be able to implement these reference sets to support standardised recording of data at the point of care.
• As the terms collected will be part of a larger terminology, they will be useful for patients' admission and discharge summaries and for computerised clinical decision making.
• Mapping existing sets of clinical terms to a national emergency department SNOMED CT reference set will facilitate consistency between emergency department data collections and improve the usefulness of the data for clinical and analytical purposes.

26 citations


Cited by
Journal Article · DOI
TL;DR: This Review provides a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting the demand for efficient access to chemical information contained in scientific literature, patents, technical reports, or the web.
Abstract: Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the const...

197 citations

Proceedings Article · DOI
06 Jun 2016
TL;DR: This work investigates the use of neural networks to learn the transition between the layman's language used in social media messages and the formal medical language used in the descriptions of medical concepts in a standard ontology, and proposes approaches that outperform existing effective baselines.
Abstract: Automatically recognising medical concepts mentioned in social media messages (e.g. tweets) enables several applications for improving the health of people in a community, e.g. real-time monitoring of infectious diseases in a population. However, the discrepancy between the language used in social media and that used in medical ontologies poses a major challenge. Existing studies deal with this challenge by employing techniques such as lexical term matching and statistical machine translation. In this work, we handle medical concept normalisation at the semantic level. We investigate the use of neural networks to learn the transition between the layman's language used in social media messages and the formal medical language used in the descriptions of medical concepts in a standard ontology. We evaluate our approaches using three different datasets, where social media texts are extracted from Twitter messages and blog posts. Our experimental results show that our proposed approaches significantly and consistently outperform existing effective baselines, which achieved state-of-the-art performance on several medical concept normalisation tasks, by up to 44%.

127 citations

Proceedings Article · DOI
01 Dec 2020
TL;DR: It is shown that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets, through experiments on two data sets from the biomedical and materials science domains.
Abstract: Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.

124 citations
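One simple augmentation of the kind described above is label-wise token replacement: each token is, with some probability, swapped for another training token that carries the same label, so the label sequence stays valid. The Python sketch below is a minimal, hypothetical illustration of that idea on toy BIO-tagged sentences; the function names, the replacement probability `p` and the example data are assumptions, not the paper's code or settings. It also assumes every label being augmented occurs in the training data.

```python
import random
from collections import defaultdict


def build_label_vocab(corpus):
    """Collect, for each label, every token seen with that label in the training data."""
    vocab = defaultdict(list)
    for tokens, labels in corpus:
        for token, label in zip(tokens, labels):
            vocab[label].append(token)
    return vocab


def label_wise_token_replacement(tokens, labels, vocab, p=0.3, rng=None):
    """With probability p per position, replace a token by a random token sharing its label.

    Labels are returned unchanged, so a BIO sequence stays well-formed.
    """
    rng = rng or random.Random()
    new_tokens = [
        rng.choice(vocab[label]) if rng.random() < p else token
        for token, label in zip(tokens, labels)
    ]
    return new_tokens, list(labels)


# Toy BIO-tagged training data (hypothetical).
train = [
    (["Patient", "given", "aspirin"], ["O", "O", "B-Drug"]),
    (["Start", "heparin", "infusion"], ["O", "B-Drug", "O"]),
]
vocab = build_label_vocab(train)
rng = random.Random(42)  # fixed seed so the augmented copies are reproducible
augmented = [label_wise_token_replacement(t, l, vocab, p=0.5, rng=rng) for t, l in train]
for tokens, labels in augmented:
    print(tokens, labels)
```

The augmented sentences would typically be added alongside the originals rather than replacing them, which is where such techniques tend to help most on small training sets.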