Open Access · Proceedings Article (DOI)

A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics

TL;DR
The authors compare two kinds of representations (word-based versus context-based) for three classification problems: influenza infection classification, drug usage classification, and personal health mention classification, showing that context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR outperform Word2Vec, GloVe and the two variants adapted using the MeSH ontology.
Abstract
Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification, and personal health mention classification. For statistical classifiers trained on each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR are better than Word2Vec, GloVe and the two variants adapted using the MeSH ontology. Accuracy improves by 2-4% when these context-based representations are used instead of word-based representations.
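As an illustration of the word-based composition the abstract describes, a sentence feature vector can be built by averaging the vectors of its words. The sketch below uses a toy 4-dimensional embedding table with made-up values; a real system would substitute pretrained Word2Vec or GloVe vectors of much higher dimension, and the resulting vector would then be fed to a statistical classifier.

```python
# Toy embedding table standing in for pretrained Word2Vec/GloVe
# vectors (illustrative values only, 4 dimensions).
EMBEDDINGS = {
    "i":    [0.1, 0.0, 0.2, 0.3],
    "have": [0.0, 0.4, 0.1, 0.0],
    "flu":  [0.9, 0.8, 0.0, 0.1],
}

def sentence_vector(tokens, embeddings, dim=4):
    """Word-based representation: average the word vectors of the
    in-vocabulary tokens; out-of-vocabulary tokens are skipped."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known words: zero vector
    # Component-wise mean over the collected word vectors.
    return [sum(component) / len(vecs) for component in zip(*vecs)]

feats = sentence_vector(["i", "have", "flu"], EMBEDDINGS)
print(len(feats))  # 4
```

A context-based encoder such as ELMo or the Universal Sentence Encoder would instead map the whole token sequence to a single vector in one step, so the representation of "flu" can differ depending on its neighbours; that contextual sensitivity is what the paper credits for the 2-4% accuracy gain.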


Citations
Proceedings Article (DOI)

Contextual and Non-Contextual Word Embeddings: an in-depth Linguistic Investigation.

TL;DR: It is shown that, although BERT is capable of understanding the full context of each word in an input sequence, the implicit knowledge encoded in its aggregated sentence representations is still comparable to that of a context-independent model.
Journal Article (DOI)

Survey of Text-based Epidemic Intelligence: A Computational Linguistics Perspective

TL;DR: This survey discusses approaches to epidemic intelligence that use textual datasets, referring to them as “text-based epidemic intelligence,” and views past work in terms of two broad categories: health mention classification and health event detection.

To BERT or not to BERT - Comparing Contextual Embeddings in a Deep Learning Architecture for the Automatic Recognition of four Types of Speech, Thought and Writing Representation.

TL;DR: This work evaluates recognizers for four very different types of speech, thought and writing representation (STWR) in German texts, based on deep learning with two different customized contextual embeddings, namely FLAIR and BERT.
Book Chapter (DOI)

End-to-End Fine-Grained Neural Entity Recognition of Patients, Interventions, Outcomes.

TL;DR: This paper uses multitask learning (MTL) with a related auxiliary task to improve fine-grained PICO recognition, compares it with single-task learning (STL), and achieves state-of-the-art performance.
Journal Article (DOI)

Aggregation levels when the time between events is Weibull distributed

TL;DR: In this paper, the authors present an aggregation process that is best for early detection of any outbreak of events, including sales, warranty claims, and disease outbreaks such as those following hurricanes and floods.
References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
Proceedings Article (DOI)

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Journal Article

LIBLINEAR: A Library for Large Linear Classification

TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
Proceedings Article (DOI)

Deep contextualized word representations

TL;DR: This paper introduces a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics) and how these uses vary across linguistic contexts (i.e., to model polysemy).