scispace - formally typeset

Showing papers by "Sarvnaz Karimi published in 2019"


Proceedings ArticleDOI
01 Jun 2019
TL;DR: Three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data are proposed and shown to be good predictors of the usefulness of pretrained models for Named Entity Recognition (NER) over 30 data pairs.
Abstract: Word vectors and Language Models (LMs) pretrained on a large amount of unlabelled data can dramatically improve various Natural Language Processing (NLP) tasks. However, the measure and impact of similarity between pretraining data and target task data are left to intuition. We propose three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data. We demonstrate that these measures are good predictors of the usefulness of pretrained models for Named Entity Recognition (NER) over 30 data pairs. Results also suggest that pretrained LMs are more effective and more predictable than pretrained word vectors, but pretrained word vectors are better when pretraining data is dissimilar.
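
The three measures themselves are not given in this abstract. As a hedged illustration only, one cheap, corpus-level similarity measure in this spirit is vocabulary (Jaccard) overlap between the pretraining and target corpora; the function names and toy corpora below are assumptions, not the authors' actual measures.

```python
# Hypothetical sketch of a cheap pretraining/target similarity measure:
# Jaccard overlap between the two corpus vocabularies. Not the paper's
# actual three measures, which this abstract does not specify.

def vocabulary(texts):
    """Collect the set of lowercased whitespace tokens in a corpus."""
    return {tok.lower() for text in texts for tok in text.split()}

def vocab_jaccard(pretrain_texts, target_texts):
    """Jaccard similarity between two corpus vocabularies, in [0, 1]."""
    a, b = vocabulary(pretrain_texts), vocabulary(target_texts)
    return len(a & b) / len(a | b) if a | b else 0.0

pretrain = ["stocks rallied on wall street", "shares of the bank fell"]
target = ["the bank reported quarterly earnings", "shares rose sharply"]
print(round(vocab_jaccard(pretrain, target), 3))
```

Under this sketch, a higher score would predict a more useful pretrained model for the target task.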

36 citations


Proceedings ArticleDOI
01 Jul 2019
TL;DR: NNE as discussed by the authors is a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB) with up to 6 layers of nested entity mentions.
Abstract: Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE—a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.
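
As a sketch of what nested annotation involves (the NNE release format itself is not shown in this abstract), mentions can be modelled as character spans whose nesting depth follows from span containment; the spans and types below are illustrative assumptions.

```python
# Illustrative sketch (not the NNE data format): nested entity mentions
# as (start, end, type) character spans, with each mention's nesting
# depth computed by counting the spans that properly contain it.

def nesting_depth(mentions):
    """Return {(start, end, type): depth}, where depth 1 = outermost layer."""
    depths = {}
    for s, e, t in mentions:
        # Depth is 1 plus the number of distinct spans containing this one.
        containers = sum(1 for s2, e2, _ in mentions
                         if (s2 <= s and e <= e2) and (s2, e2) != (s, e))
        depths[(s, e, t)] = 1 + containers
    return depths

# Toy "Wall Street Journal" example: an ORG mention containing a LOC mention.
mentions = [(0, 19, "ORG"), (0, 11, "LOC")]
print(nesting_depth(mentions))
```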

29 citations


Journal ArticleDOI
TL;DR: This survey discusses approaches for epidemic intelligence that use textual datasets, referring to it as “text-based epidemic intelligence,” and views past work in terms of two broad categories: health mention classification and health event detection.
Abstract: Epidemic intelligence deals with the detection of outbreaks using formal (such as hospital records) and informal sources (such as user-generated text on the web) of information. In this survey, we discuss approaches for epidemic intelligence that use textual datasets, referring to it as “text-based epidemic intelligence.” We view past work in terms of two broad categories: health mention classification (selecting relevant text from a large volume) and health event detection (predicting epidemic events from a collection of relevant text). The focus of our discussion is the underlying computational linguistic techniques in the two categories. The survey also provides details of the state of the art in annotation techniques, resources, and evaluation strategies for epidemic intelligence.

23 citations


Posted Content
TL;DR: This work describes NNE—a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank, which comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting.
Abstract: Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE---a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.

22 citations


Journal ArticleDOI
TL;DR: This article discusses and compares using an Exponentially Weighted Moving Average (EWMA) statistic for the TBEs against the EWMA of counts, and presents a robust monitoring plan that efficiently detects many different levels of shifts.
Abstract: This article focuses on monitor plans aimed at the early detection of the increase in the frequency of events. The literature recommends either monitoring the time between events (TBE) if events ar...
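
The EWMA statistic compared in this article can be sketched as the standard recursion z_t = λ·x_t + (1−λ)·z_{t−1}; the smoothing weight and starting value below are illustrative choices, not the article's settings.

```python
# Sketch of the EWMA statistic applied to times between events (TBE):
# z_t = lam * x_t + (1 - lam) * z_{t-1}. The weight lam=0.2 and the
# choice of starting value are illustrative assumptions.

def ewma(xs, lam=0.2, z0=None):
    """Return the EWMA sequence for observations xs."""
    z = xs[0] if z0 is None else z0
    out = []
    for x in xs:
        z = lam * x + (1 - lam) * z
        out.append(z)
    return out

# As times between events shrink, the EWMA of TBE drifts downward,
# signalling an increase in event frequency.
tbe = [5.0, 5.0, 4.0, 2.0, 1.0, 0.5]
print([round(z, 3) for z in ewma(tbe)])
```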

19 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: An encoder-decoder based framework that can automatically generate radiology reports from medical images is proposed that uses a Convolutional Neural Network as an encoder coupled with a multi-stage Stacked Long Short-Term Memory as a decoder to generate reports.
Abstract: Interpreting medical images and summarising them in the form of radiology reports is a challenging, tedious, and complex task. A radiologist provides a complete description of a medical image in the form of a radiology report by describing normal or abnormal findings and providing a summary for decision making. Research shows that radiology practice is error-prone due to the limited number of experts, increasing patient volumes, and the subjective nature of human perception. To reduce the number of diagnostic errors and to alleviate the task of radiologists, there is a need for a computer-aided report generation system that can automatically generate a radiology report for a given medical image. We propose an encoder-decoder based framework that can automatically generate radiology reports from medical images. Specifically, we use a Convolutional Neural Network as an encoder coupled with a multi-stage Stacked Long Short-Term Memory as a decoder to generate reports. We perform experiments on the Indiana University Chest X-ray collection, a publicly available dataset, to measure the effectiveness of our model. Experimental results show the effectiveness of our model in automatically generating radiology reports from medical images.

16 citations


Posted Content
TL;DR: This survey discusses approaches for epidemic intelligence that use textual datasets, referring to it as `text-based epidemic intelligence', and views past work in terms of two broad categories: health mention classification and health event detection.
Abstract: Epidemic intelligence deals with the detection of disease outbreaks using formal (such as hospital records) and informal sources (such as user-generated text on the web) of information. In this survey, we discuss approaches for epidemic intelligence that use textual datasets, referring to it as `text-based epidemic intelligence'. We view past work in terms of two broad categories: health mention classification (selecting relevant text from a large volume) and health event detection (predicting epidemic events from a collection of relevant text). The focus of our discussion is the underlying computational linguistic techniques in the two categories. The survey also provides details of the state-of-the-art in annotation techniques, resources and evaluation strategies for epidemic intelligence.

15 citations


Posted Content
TL;DR: In this paper, the authors combine a state-of-the-art figurative usage detector with CNN-based personal health mention detection for predicting whether or not a given sentence is a report of a health condition.
Abstract: Personal health mention detection deals with predicting whether or not a given sentence is a report of a health condition. Past work mentions errors in this prediction when symptom words, i.e. names of symptoms of interest, are used in a figurative sense. Therefore, we combine a state-of-the-art figurative usage detector with CNN-based personal health mention detection. To do so, we present two methods: a pipeline-based approach and a feature augmentation-based approach. The introduction of figurative usage detection results in an average improvement of 2.21% in the F-score of personal health mention detection in the case of the feature augmentation-based approach. This paper demonstrates the promise of using figurative usage detection to improve personal health mention detection.

14 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: The promise of using figurative usage detection to improve personal health mention detection is demonstrated by presenting two methods: a pipeline-based approach and a feature augmentation-based approach.
Abstract: Personal health mention detection deals with predicting whether or not a given sentence is a report of a health condition. Past work mentions errors in this prediction when symptom words, i.e., names of symptoms of interest, are used in a figurative sense. Therefore, we combine a state-of-the-art figurative usage detector with CNN-based personal health mention detection. To do so, we present two methods: a pipeline-based approach and a feature augmentation-based approach. The introduction of figurative usage detection results in an average improvement of 2.21% in the F-score of personal health mention detection in the case of the feature augmentation-based approach. This paper demonstrates the promise of using figurative usage detection to improve personal health mention detection.
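
The feature-augmentation idea described here can be sketched minimally: the output of a figurative-usage detector is appended to a sentence's feature vector before it reaches the health-mention classifier. The detector, cue list, and feature values below are toy stand-ins, not the paper's CNN models.

```python
# Hedged sketch of the feature-augmentation approach: a figurative-usage
# score is appended as one extra feature for the downstream classifier.
# The cue-based "detector" here is a toy stand-in for the real model.

def figurative_score(sentence):
    """Toy figurative-usage detector: flags a few idiomatic cues."""
    cues = ("sick of", "heart of gold", "headache of a")
    return 1.0 if any(c in sentence.lower() for c in cues) else 0.0

def augment(features, sentence):
    """Append the figurative-usage score as one extra feature."""
    return features + [figurative_score(sentence)]

feats = [0.3, 0.7]  # stand-in sentence features
print(augment(feats, "I am sick of waiting"))       # figurative use of "sick"
print(augment(feats, "I have been sick all week"))  # literal health mention
```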

14 citations


Posted Content
TL;DR: This article proposes three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data and demonstrates that these measures are good predictors of the usefulness of pretrained models for NER over 30 data pairs.
Abstract: Word vectors and Language Models (LMs) pretrained on a large amount of unlabelled data can dramatically improve various Natural Language Processing (NLP) tasks. However, the measure and impact of similarity between pretraining data and target task data are left to intuition. We propose three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data. We demonstrate that these measures are good predictors of the usefulness of pretrained models for Named Entity Recognition (NER) over 30 data pairs. Results also suggest that pretrained LMs are more effective and more predictable than pretrained word vectors, but pretrained word vectors are better when pretraining data is dissimilar.

13 citations


01 Jan 2019
TL;DR: It is shown that ROUGE cannot distinguish opinion summaries of similar or opposite polarities for the same aspect, and three recommendations are presented for future work that uses ROUGE to evaluate opinion summarisation.
Abstract: One of the most common metrics to automatically evaluate opinion summaries is ROUGE, a metric developed for text summarisation. ROUGE counts the overlap of words or word units between a candidate summary and reference summaries. This formulation treats all words in the reference summary equally. In opinion summaries, however, not all words in the reference are equally important. Opinion summarisation requires correctly pairing two types of semantic information: (1) the aspect or opinion target; and (2) the polarity of candidate and reference summaries. We investigate the suitability of ROUGE for evaluating opinion summaries of online reviews. Using three simulation-based experiments, we evaluate the behaviour of ROUGE for opinion summarisation on the ability to match aspect and polarity. We show that ROUGE cannot distinguish opinion summaries of similar or opposite polarities for the same aspect. Moreover, ROUGE scores have significant variance under different configuration settings. As a result, we present three recommendations for future work that uses ROUGE to evaluate opinion summarisation.
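
The polarity blindness described here can be reproduced with a minimal unigram-recall variant of ROUGE. The implementation below is a simplified sketch, not the official ROUGE toolkit: a candidate of opposite polarity still scores highly because it shares every word but one with the reference.

```python
# Simplified ROUGE-1 recall: unigram overlap of a candidate summary
# against a reference, divided by the reference length. A sketch only,
# not the official ROUGE implementation.
from collections import Counter

def rouge1_recall(candidate, reference):
    """Clipped unigram overlap over the number of reference unigrams."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum(min(c[w], r[w]) for w in r)
    return overlap / sum(r.values())

reference = "the battery life is great"
positive = "the battery life is great"   # same aspect, same polarity
negative = "the battery life is awful"   # same aspect, opposite polarity
print(rouge1_recall(positive, reference))
print(rouge1_recall(negative, reference))
```

Only one word separates the two candidates, so the opposite-polarity summary still scores 0.8 against a perfect 1.0.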

Proceedings ArticleDOI
18 Jul 2019
TL;DR: An on-line system is presented which enables experimentation in search for precision medicine within the framework provided by the TREC Precision Medicine (PM) track; it provides some of the most promising gene mention expansion methods, as well as learning-to-rank using neural networks.
Abstract: Precision medicine - where data from patients, their genes, their lifestyles and the available treatments and their combination are taken into account for finding a suitable treatment - requires searching the biomedical literature and other resources such as clinical trials with the patients' information. The retrieved information could then be used in curating data for clinicians for decision-making. We present information retrieval researchers with an on-line system which enables experimentation in search for precision medicine within the framework provided by the TREC Precision Medicine (PM) track. A number of query and document processing and ranking approaches are provided. These include some of the most promising gene mention expansion methods, as well as learning-to-rank using neural networks.

Proceedings ArticleDOI
01 Aug 2019
TL;DR: The authors compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification, and personal health mention classification, showing that context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR are better than Word2Vec, GloVe and the two adapted using the MESH ontology.
Abstract: Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification and personal health mention classification. For statistical classifiers trained for each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR are better than Word2Vec, GloVe and the two adapted using the MESH ontology. There is an improvement of 2-4% in the accuracy when these context-based representations are used instead of word-based representations.
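
The "composition of word vectors" representation that this abstract contrasts with context-based sentence vectors can be sketched as simple averaging; the tiny lexicon below is a made-up stand-in for real Word2Vec or GloVe embeddings.

```python
# Toy sketch of a word-based sentence representation: average the word
# vectors of in-vocabulary tokens. The 3-dimensional lexicon is made up;
# real systems would use pretrained Word2Vec/GloVe embeddings.

def average_vector(sentence, lexicon, dim=3):
    """Mean of the word vectors for the sentence's in-vocabulary tokens."""
    vecs = [lexicon[w] for w in sentence.lower().split() if w in lexicon]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

lexicon = {"flu": [1.0, 0.0, 0.0],
           "fever": [0.8, 0.2, 0.0],
           "today": [0.0, 0.0, 1.0]}
print(average_vector("flu fever today", lexicon))
```

Context-based encoders such as ELMo instead produce a vector for the sentence as a whole, conditioned on word order and context, which is what the reported 2-4% accuracy gain compares against.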

Posted Content
TL;DR: Context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR are better than Word2Vec, GloVe and the two adapted using the MESH ontology for statistical classifiers trained for three classification problems: influenza infection classification, drug usage classification and personal health mention classification.
Abstract: Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification and personal health mention classification. For statistical classifiers trained for each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR are better than Word2Vec, GloVe and the two adapted using the MESH ontology. There is an improvement of 2-4% in the accuracy when these context-based representations are used instead of word-based representations.


Proceedings Article
09 Sep 2019
TL;DR: A Convolutional Neural Network based multi-label image classifier to predict relevant concepts present in medical images and achieves an F1-score of 0.1435 on the held-out test set of the 2019 ImageCLEFmed Caption Task.
Abstract: We describe our concept detection system submitted for the ImageCLEFmed Caption task, part of the ImageCLEF 2019 challenge. Advancements in imaging technologies have improved the ability of clinicians to detect, diagnose, and treat diseases. Radiologists routinely interpret medical images and summarise their findings in the form of radiology reports. The mapping of visual information present in medical images to the condensed textual description is a tedious, time-consuming, expensive, and error-prone task. The development of methods that can automatically detect the presence and location of medical concepts in medical images can improve the efficiency of radiologists, reduce the burden of manual interpretation, and also help in reducing diagnostic errors. We propose a Convolutional Neural Network based multi-label image classifier to predict relevant concepts present in medical images. The proposed method achieved an F1-score of 0.1435 on the held-out test set of the 2019 ImageCLEFmed Caption Task. We present our proposed system with data analysis, experimental results, comparison, and discussion.
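
The multi-label output stage such a classifier needs can be sketched minimally: per-concept scores are thresholded into a predicted concept set, which is then compared with the gold set by a set-based F1. The scores, threshold, and concept identifiers below are illustrative assumptions, not the submitted system.

```python
# Hedged sketch of a multi-label decision stage: threshold per-concept
# scores into a concept set, then score it against the gold concepts
# with a set-based F1. All values and IDs here are illustrative.

def predict_concepts(scores, threshold=0.5):
    """Keep every concept whose score clears the threshold."""
    return {c for c, s in scores.items() if s >= threshold}

def f1(predicted, gold):
    """Set-based F1 between predicted and gold concept sets."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

scores = {"concept_a": 0.9, "concept_b": 0.6, "concept_c": 0.2}
pred = predict_concepts(scores)
print(sorted(pred), f1(pred, {"concept_a", "concept_c"}))
```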

01 Jan 2019
TL;DR: While the observations agree with the promise of MTL as compared to single-task learning for health informatics, it is shown that the benefit also comes with caveats in terms of the choice of shared layers and the relatedness between the participating tasks.
Abstract: Multi-Task Learning (MTL) has been an attractive approach to deal with limited labeled datasets or leverage related tasks, for a variety of NLP problems. We examine the benefit of MTL for three specific pairs of health informatics tasks that deal with: (a) overlapping symptoms for the same classification problem (personal health mention classification for influenza and for a set of symptoms); (b) overlapping medical concepts for related classification problems (vaccine usage and drug usage detection); and, (c) related classification problems (vaccination intent and vaccination relevance detection). We experiment with a simple neural architecture: a shared layer followed by task-specific dense layers. The novelty of this work is that it compares alternatives for shared layers for these pairs of tasks. While our observations agree with the promise of MTL as compared to single-task learning, for health informatics, we show that the benefit also comes with caveats in terms of the choice of shared layers and the relatedness between the participating tasks.

01 Jan 2019
TL;DR: This work investigates the effects of a specialised in-domain vocabulary trained from scratch on a biomedical corpus, and suggests that, although the in-domain vocabulary is useful, it is usually constrained by the corpus size because these models need to be trained from scratch.
Abstract: Transformer-based models have been popular recently and have improved performance for many Natural Language Processing (NLP) tasks, including those in the biomedical field. Previous research suggests that, when using these models, an in-domain vocabulary is more suitable than an open-domain vocabulary. We investigate the effects of a specialised in-domain vocabulary trained from scratch on a biomedical corpus. Our research suggests that, although the in-domain vocabulary is useful, it is usually constrained by the corpus size because these models need to be trained from scratch. Instead, it is more useful to have more data and perform additional pretraining steps with a corpus-specific vocabulary.

Proceedings ArticleDOI
01 Aug 2019
TL;DR: A system that incorporates open-domain and biomedical domain approaches to improve semantic understanding and ambiguity resolution in the medical domain for the ACL BioNLP 2019 Shared Task, MEDIQA is proposed.
Abstract: We report on our system for textual inference and question entailment in the medical domain for the ACL BioNLP 2019 Shared Task, MEDIQA. Textual inference is the task of finding the semantic relationships between pairs of text. Question entailment involves identifying pairs of questions which have similar semantic content. To improve upon medical natural language inference and question entailment approaches to further medical question answering, we propose a system that incorporates open-domain and biomedical domain approaches to improve semantic understanding and ambiguity resolution. Our models achieve 80% accuracy on medical natural language inference (6.5% absolute improvement over the original baseline), 48.9% accuracy on recognising medical question entailment, 0.248 Spearman’s rho for question answering ranking and 68.6% accuracy for question answering classification.

Book ChapterDOI
12 Aug 2019
TL;DR: In this paper, the authors focus on monitor plans aimed at the early detection of the increase in the frequency of events and explore monitoring TBE when the daily counts are quite high.
Abstract: This paper focuses on monitor plans aimed at the early detection of the increase in the frequency of events. The literature recommends either monitoring the Time Between Events (TBE), if events are rare, or counting the number of events per unit non-overlapping time intervals, if events are not rare. Recent monitoring work has suggested that monitoring counts in preference to TBE is not recommended even when counts are low (less than 10). Monitoring TBE is the real-time option for outbreak detection, because outbreak information is accumulated when an event occurs. This is preferred to waiting for the end of a period to count events if outbreaks are large and occur in a short time frame. If the TBE reduces significantly, then the incidence of these events increases significantly. This paper explores monitoring TBE when the daily counts are quite high. We consider the case when TBEs are Weibull distributed.
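
The Weibull-distributed TBE setting this chapter considers can be sketched with a simple control rule: signal an increased event rate when an observed time between events falls below a lower control limit set at a small false-alarm probability α (the α-quantile of the Weibull distribution). The parameter values and the specific rule below are illustrative assumptions, not the chapter's monitoring plan.

```python
# Illustrative sketch of TBE monitoring under a Weibull model: signal
# when a TBE falls below the alpha-quantile of Weibull(shape, scale),
# i.e. t_L = scale * (-ln(1 - alpha))^(1/shape). Parameters are made up.
import math

def weibull_lcl(shape, scale, alpha=0.01):
    """Lower control limit: the alpha-quantile of Weibull(shape, scale)."""
    return scale * (-math.log(1 - alpha)) ** (1 / shape)

def signals(tbes, shape, scale, alpha=0.01):
    """Flag each observed TBE that falls below the control limit."""
    lcl = weibull_lcl(shape, scale, alpha)
    return [t < lcl for t in tbes]

# In-control TBEs near the scale parameter, then a burst of rapid events.
tbes = [9.5, 11.0, 8.7, 0.2, 0.1]
print(signals(tbes, shape=1.5, scale=10.0))
```

Because a signal can be raised the moment an event arrives, this is the real-time option the abstract describes, rather than waiting for the end of a counting interval.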