
Showing papers by "Sarvnaz Karimi published in 2015"


Journal ArticleDOI
TL;DR: A new richly annotated corpus of medical forum posts on patient-reported Adverse Drug Events (ADEs) is presented; it contains text that is largely written in colloquial language and often deviates from formal English grammar and punctuation rules.

217 citations


Journal ArticleDOI
TL;DR: In order to highlight the importance of contributions made by computer scientists in this area so far, the existing approaches are categorized and reviewed, and, most importantly, areas where more research should be undertaken are identified.
Abstract: We review data mining and related computer science techniques that have been studied in the area of drug safety to identify signals of adverse drug reactions from different data sources, such as spontaneous reporting databases, electronic health records, and medical literature. Development of such techniques has become more crucial for public heath, especially with the growth of data repositories that include either reports of adverse drug reactions, which require fast processing for discovering signals of adverse reactions, or data sources that may contain such signals but require data or text mining techniques to discover them. In order to highlight the importance of contributions made by computer scientists in this area so far, we categorize and review the existing approaches, and most importantly, we identify areas where more research should be undertaken.

124 citations


Proceedings Article
25 Jul 2015
TL;DR: The described system uses natural language processing and data mining techniques to extract situation awareness information from Twitter messages generated during various disasters and crises.
Abstract: Social media platforms, such as Twitter, offer a rich source of real-time information about real-world events, particularly during mass emergencies. Sifting valuable information from social media provides useful insight into time-critical situations for emergency officers to understand the impact of hazards and act on emergency responses in a timely manner. This work focuses on analyzing Twitter messages generated during natural disasters, and shows how natural language processing and data mining techniques can be utilized to extract situation awareness information from Twitter. We present key relevant approaches that we have investigated including burst detection, tweet filtering and classification, online clustering, and geotagging.
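The burst detection step listed above can be sketched as a simple rolling z-score test over per-bin tweet counts; the window size, threshold, and `detect_bursts` helper below are illustrative assumptions, not the authors' actual method or parameters.

```python
from collections import deque

def detect_bursts(counts, window=24, threshold=3.0):
    """Flag time bins whose tweet count exceeds the rolling mean of the
    previous `window` bins by more than `threshold` standard deviations."""
    history = deque(maxlen=window)
    bursts = []
    for t, count in enumerate(counts):
        if len(history) == window:
            mean = sum(history) / window
            var = sum((x - mean) ** 2 for x in history) / window
            std = var ** 0.5 or 1.0  # guard against a zero-variance window
            if (count - mean) / std > threshold:
                bursts.append(t)
        history.append(count)
    return bursts
```

In a real system the flagged bins would then feed the downstream filtering, classification, and clustering stages.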

46 citations


Journal ArticleDOI
TL;DR: The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths.
Abstract: Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV.

44 citations


01 Jan 2015
TL;DR: In this paper, a set of machine learning and rule-based methods were used to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV.
Abstract: Background: Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV. Methods: Two classification methods are presented: i) a machine learning approach, where detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000–2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors. Results: Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification effectiveness was more variable but still high (F-measure 0.80). The error analysis revealed that word variations as well as certain word combinations adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of the effectiveness. Conclusions: The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths. In addition, the methods are generally applicable to other diseases of interest and to other sources of medical free-text besides death certificates.
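The keyword-matching rules (method ii) can be sketched roughly as below; the keyword lists and the `classify_certificate` helper are illustrative stand-ins, not the rules used in the study.

```python
import re

# Illustrative keyword lists for the four target diseases;
# the study's actual rule set is not reproduced here.
RULES = {
    "diabetes": ["diabetes", "diabetic"],
    "influenza": ["influenza", "flu"],
    "pneumonia": ["pneumonia", "pneumonitis"],
    "hiv": ["hiv", "aids", "human immunodeficiency"],
}

def classify_certificate(text):
    """Return the set of target diseases whose keywords appear in the
    free-text certificate. Word boundaries avoid false hits such as
    'flu' matching inside 'fluid'."""
    found = set()
    for disease, keywords in RULES.items():
        for k in keywords:
            if re.search(r"\b" + re.escape(k) + r"\b", text, re.IGNORECASE):
                found.add(disease)
                break
    return found
```

The machine learning variant (method i) would instead turn each certificate into term and n-gram features and train an SVM per disease.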

32 citations


Posted Content
TL;DR: This study is the first to systematically examine the effect of popular concept extraction methods in the area of signal detection for adverse reactions, and shows that the choice of algorithm or controlled vocabulary has a significant impact on concept extraction, which will impact the overall signal detection process.
Abstract: Social media is becoming an increasingly important source of information to complement traditional pharmacovigilance methods. In order to identify signals of potential adverse drug reactions, it is necessary to first identify medical concepts in the social media text. Most of the existing studies use dictionary-based methods which are not evaluated independently from the overall signal detection task. We compare different approaches to automatically identify and normalise medical concepts in consumer reviews in medical forums. Specifically, we implement several dictionary-based methods popular in the relevant literature, as well as a method we suggest based on a state-of-the-art machine learning method for entity recognition. MetaMap, a popular biomedical concept extraction tool, is used as a baseline. Our evaluations were performed in a controlled setting on a common corpus which is a collection of medical forum posts annotated with concepts and linked to controlled vocabularies such as MedDRA and SNOMED CT. To our knowledge, our study is the first to systematically examine the effect of popular concept extraction methods in the area of signal detection for adverse reactions. We show that the choice of algorithm or controlled vocabulary has a significant impact on concept extraction, which will impact the overall signal detection process. We also show that our proposed machine learning approach significantly outperforms all the other methods in identification of both adverse reactions and drugs, even when trained with a relatively small set of annotated text.
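A minimal sketch of the dictionary-based family of methods the paper compares against: greedy longest-match lookup of token spans in a controlled vocabulary. The toy lexicon below is illustrative; the codes are not real MedDRA mappings.

```python
# Toy lexicon mapping surface phrases to controlled-vocabulary codes
# (codes are made up for illustration).
TOY_LEXICON = {
    "headache": "MedDRA:C001",
    "stomach pain": "MedDRA:C002",
    "nausea": "MedDRA:C003",
}

def extract_concepts(text, lexicon=TOY_LEXICON, max_len=4):
    """Greedy longest-match concept extraction: scan left to right,
    preferring the longest token span present in the lexicon."""
    tokens = text.lower().split()
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                spans.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            i += 1
    return spans
```

The machine learning alternative the paper proposes would instead label token sequences with a trained entity recogniser rather than rely on exact dictionary hits.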

24 citations


Proceedings Article
27 Jun 2015
TL;DR: This work focuses on analyzing Twitter messages generated during natural disasters, and shows how natural language processing and data mining techniques can be utilized to extract situation awareness information from Twitter.
Abstract: Social media platforms, such as Twitter, offer a rich source of real-time information about real-world events, particularly during mass emergencies. Sifting valuable information from social media provides useful insight into time-critical situations for emergency officers to understand the impact of hazards and act on emergency responses in a timely manner. This work focuses on analyzing Twitter messages generated during natural disasters, and shows how natural language processing and data mining techniques can be utilized to extract situation awareness information from Twitter. We present key relevant approaches that we have investigated including burst detection, tweet filtering and classification, online clustering, and geotagging.

14 citations


Journal ArticleDOI
TL;DR: By ignoring the statistical dependence of text messages published on social media, standard cross-validation can produce misleading conclusions in a machine learning task; this work explores alternative evaluation methods that explicitly deal with statistical dependence in text.
Abstract: In recent years, many studies have been published on data collected from social media, especially microblogs such as Twitter. However, rather few of these studies have considered evaluation methodologies that take into account the statistically dependent nature of such data, which breaks the theoretical conditions for using cross-validation. Despite concerns raised in the past about using cross-validation for data of similar characteristics, such as time series, some of these studies evaluate their work using standard k-fold cross-validation. Through experiments on Twitter data collected during a two-year period that includes disastrous events, we show that by ignoring the statistical dependence of the text messages published in social media, standard cross-validation can result in misleading conclusions in a machine learning task. We explore alternative evaluation methods that explicitly deal with statistical dependence in text. Our work also raises concerns for any other data for which similar conditions might hold.
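One dependence-aware alternative to random k-fold splitting is forward-chaining evaluation, where each test fold is strictly later in time than all of its training data. The sketch below assumes simple expanding-window folds and is not necessarily the authors' exact protocol.

```python
def time_based_folds(items, timestamps, n_folds=5):
    """Yield (train, test) splits ordered by time: fold k trains on the
    first k chronological slices and tests on the next one, so no test
    item precedes any training item."""
    order = sorted(range(len(items)), key=lambda i: timestamps[i])
    fold_size = len(items) // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train = [items[i] for i in order[:k * fold_size]]
        test = [items[i] for i in order[k * fold_size:(k + 1) * fold_size]]
        yield train, test
```

Unlike shuffled k-fold, this keeps temporally (and hence topically) dependent tweets from leaking between training and test sets.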

13 citations


Proceedings ArticleDOI
22 Oct 2015
TL;DR: CADEminer, a system that mines consumer reviews on medications in order to facilitate discovery of drug side effects that may not have been identified in clinical trials, is introduced.
Abstract: We introduce CADEminer, a system that mines consumer reviews on medications in order to facilitate discovery of drug side effects that may not have been identified in clinical trials. CADEminer utilises search and natural language processing techniques to (a) extract mentions of side effects, and other relevant concepts such as drug names and diseases in reviews; (b) normalise the extracted mentions to their unified representation in ontologies such as SNOMED CT and MedDRA; (c) identify relationships between extracted concepts, such as a drug caused a side effect; (d) search in authoritative lists of known drug side effects to identify whether or not the extracted side effects are new and therefore require further investigation; and finally (e) provide statistics and visualisation of the data.
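The five stages (a)–(e) form a pipeline, which can be sketched abstractly as below. The function names, their composition, and the known-ADE list are hypothetical illustrations, not CADEminer's actual API.

```python
# Stand-in for an authoritative list of known drug side effects (stage d).
KNOWN_ADES = {("ibuprofen", "stomach pain")}

def mine_review(text, extract, normalise, relate):
    """Hypothetical composition of the pipeline stages described above."""
    mentions = extract(text)                                # (a) extract mentions
    concepts = [normalise(m) for m in mentions]             # (b) map to SNOMED CT / MedDRA
    relations = relate(concepts)                            # (c) drug-caused-effect pairs
    novel = [r for r in relations if r not in KNOWN_ADES]   # (d) drop already-known effects
    return {"relations": relations, "novel": novel}         # (e) feed stats/visualisation
```

Each stage could be swapped independently, e.g. replacing the extractor with a trained entity recogniser while keeping the rest of the pipeline unchanged.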

12 citations