Open Access Journal Article

A Comparative Analysis of Active Learning for Biomedical Text Mining

TLDR
Experiments show that AL has the potential to significantly reduce the cost of manual labelling, and that AL-assisted pre-annotation accelerates the de novo annotation process, requiring less annotation time.
Abstract
An enormous amount of clinical free-text information, such as pathology reports, progress reports, clinical notes and discharge summaries, has been collected at hospitals and medical care clinics. These data provide an opportunity to develop many useful machine learning (ML) applications if they can be transformed into a learnable structure with appropriate labels for supervised learning. The annotation of these data has to be performed by qualified clinical experts, and the high cost of annotation limits their use. Active learning (AL), an underutilised machine learning technique that can label new data, is a promising candidate for addressing the high cost of labelling. AL has been successfully applied to labelling for speech recognition and text classification; however, there is a lack of literature investigating its use for clinical purposes. We performed a comparative investigation of various AL techniques using ML- and deep learning (DL)-based strategies on three unique biomedical datasets. We investigated the random sampling (RS), least confidence (LC), informative diversity and density (IDD), margin and maximum representativeness-diversity (MRD) AL query strategies. Our experiments show that AL has the potential to significantly reduce the cost of manual labelling. Furthermore, pre-labelling performed using AL expedites the labelling process by reducing the time required for labelling.
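Three of the query strategies named in the abstract (RS, LC and margin) have widely used textbook forms, which can be sketched as below. This is a minimal illustration and not the authors' implementation: the function names, the toy probability table and the batch size are all assumptions made for the example. IDD and MRD are omitted because the abstract does not define them.

```python
import random

# Each row of `probs` holds a model's predicted class probabilities
# for one unlabelled pool item (toy values, assumed for illustration).

def least_confidence(probs):
    """LC: score = 1 - highest class probability (higher = more uncertain)."""
    return [1.0 - max(p) for p in probs]

def margin(probs):
    """Margin: score = negative gap between the two most probable classes."""
    scores = []
    for p in probs:
        top, second = sorted(p, reverse=True)[:2]
        scores.append(-(top - second))
    return scores

def random_scores(probs, seed=0):
    """RS baseline: every pool item receives a random score."""
    rng = random.Random(seed)
    return [rng.random() for _ in probs]

def select_batch(scores, k):
    """Return indices of the k highest-scoring pool items to send for labelling."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

probs = [
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # low top probability
    [0.50, 0.49, 0.01],  # two classes nearly tied
    [0.70, 0.20, 0.10],
]

lc_pick = select_batch(least_confidence(probs), k=2)
margin_pick = select_batch(margin(probs), k=2)
rs_pick = select_batch(random_scores(probs), k=2)
```

On this toy table, LC ranks item 1 first because its top probability is lowest, while margin ranks item 2 first because its two leading classes are nearly tied. The two uncertainty measures can therefore prioritise different items from the same pool.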


Citations
Posted Content

A Comprehensive Survey on Word Representation Models: From Classical to State-Of-The-Art Word Representation Language Models

TL;DR: A variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs, are described; these can transform large volumes of text into effective vector representations capturing the same semantic information.
Journal Article

A Comprehensive Survey on Word Representation Models: From Classical to State-of-the-Art Word Representation Language Models

TL;DR: This paper surveys word representation models and their power of expression, from classical to modern-day state-of-the-art word representation language models (LMs).
Journal Article

A Novel Approach of Transcriptomic microRNA Analysis Using Text Mining Methods: An Early Detection of Multiple Sclerosis Disease

TL;DR: In this article, the authors presented a complete predictive model by combining consecutive transcriptomic data preprocessing procedures, followed by the proposed KmerFIDF method as a feature extraction method and linear discriminant analysis for dimensionality reduction.
Proceedings Article

A Novel Approach for Implementing Conventional LBIST by High Execution Microprocessors

TL;DR: In this article, a lower built-in self-test (LBIST) mechanism is used to design a microprocessor, and the proposed methodology reports performance measures such as power efficiency (97.5%) and area.