Journal ArticleDOI

Transformer-Based Abstract Generation of Medical Case Reports

TL;DR: A deep learning methodology for automatically generating summaries of medical case reports is presented; on the test data set, the proposed fine-tuned summarizer achieved a mean precision of 0.4481 and a ROUGE-1 score of 0.2803.
Abstract: A medical case report gives medical researchers and healthcare providers a thorough account of the symptoms, treatment, and diagnosis of a specific patient. These clinical data are essential because they aid in diagnosing novel or uncommon illnesses, analyzing specific medical occurrences, and enhancing current medical education. A summary of a medical case report is needed so that one can decide whether to read further, as going through the entire contents of a medical case report is time-consuming. In this paper, we present a deep learning methodology for the automatic generation of summaries of medical case reports. The final proposed fine-tuned summarizer generated a mean precision of 0.4481 and a ROUGE-1 score of 0.2803 on the test data set.
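The ROUGE-1 precision reported above is the fraction of unigrams in the generated summary that also appear in the reference summary. A minimal sketch of that computation (the example texts below are hypothetical, not drawn from the paper's data):

```python
# ROUGE-1 precision: overlapping unigrams / unigrams in the generated summary.
from collections import Counter

def rouge1_precision(generated: str, reference: str) -> float:
    gen_tokens = generated.lower().split()
    if not gen_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    # Clip each token's count by its count in the reference, then sum.
    overlap = sum(min(count, ref_counts[token])
                  for token, count in Counter(gen_tokens).items())
    return overlap / len(gen_tokens)

generated = "the patient presented with acute chest pain"
reference = "a patient presented with severe chest pain and dyspnea"
print(round(rouge1_precision(generated, reference), 4))  # -> 0.7143
```

In practice, published ROUGE scores are computed with standard tooling (e.g. the official ROUGE script or a library implementation), which adds stemming and other normalization this sketch omits.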
References
Proceedings ArticleDOI
01 Jul 2020
TL;DR: BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
Abstract: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and other recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 3.5 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also replicate other pretraining schemes within the BART framework, to understand their effect on end-task performance.
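The two best-performing noising operations the abstract describes, sentence shuffling and span in-filling, can be sketched as follows. This is a toy illustration only: in the paper, infilled span lengths are sampled from a Poisson distribution, whereas this version corrupts one caller-specified span for clarity.

```python
# Sketch of BART's two main corruption operations: shuffling sentence
# order, and "text infilling", where a span of tokens is replaced by
# a single <mask> token that the model must learn to reconstruct.
import random

def shuffle_sentences(text: str, rng: random.Random) -> str:
    """Randomly permute the order of period-separated sentences."""
    sentences = [s for s in text.split(". ") if s]
    rng.shuffle(sentences)
    return ". ".join(sentences)

def infill_span(tokens: list[str], start: int, length: int) -> list[str]:
    """Replace tokens[start:start+length] with a single mask token."""
    return tokens[:start] + ["<mask>"] + tokens[start + length:]

tokens = "the model reconstructs the original text".split()
print(infill_span(tokens, 2, 2))
# -> ['the', 'model', '<mask>', 'original', 'text']
```

Because a multi-token span collapses to one mask token, the model must also predict how many tokens are missing, which the authors argue teaches it more about sequence length than token-level masking does.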

4,505 citations

Journal ArticleDOI
TL;DR: The paper discusses thoroughly the promising paths for future research in medical documents summarization, including scaling to large collections of documents in various languages and from different media, personalization, portability to new sub-domains, and the integration of summarization technology in practical applications.

201 citations

Proceedings ArticleDOI
26 Mar 2018
TL;DR: The authors presented a new dataset for machine comprehension in the medical domain using clinical case reports with around 100,000 gap-filling queries about these cases and applied several baselines and state-of-the-art neural readers to the dataset, and observed a considerable gap in performance (20% F1) between the best human and machine readers.
Abstract: We present a new dataset for machine comprehension in the medical domain. Our dataset uses clinical case reports with around 100,000 gap-filling queries about these cases. We apply several baselines and state-of-the-art neural readers to the dataset, and observe a considerable gap in performance (20% F1) between the best human and machine readers. We analyze the skills required for successful answering and show how reader performance varies depending on the applicable skills. We find that inferences using domain knowledge and object tracking are the most frequently required skills, and that recognizing omitted information and spatio-temporal reasoning are the most difficult for the machines.
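The gap-filling queries described above follow the cloze pattern: an entity in a sentence about the case is blanked out and becomes the answer the reader must recover. A toy illustration, with a hypothetical sentence and placeholder token not drawn from the actual dataset:

```python
# Construct a cloze-style (gap-filling) query by blanking out an
# answer span in a sentence and returning (query, answer).
def make_cloze(sentence: str, answer: str,
               blank: str = "@placeholder") -> tuple[str, str]:
    assert answer in sentence, "answer span must occur in the sentence"
    # Replace only the first occurrence so the answer stays unambiguous.
    return sentence.replace(answer, blank, 1), answer

query, answer = make_cloze(
    "The patient was diagnosed with pneumonia after a chest X-ray.",
    "pneumonia",
)
print(query)
# -> The patient was diagnosed with @placeholder after a chest X-ray.
```

A machine reader is then given the surrounding case report as context and asked to fill the blank; the 20% F1 human-machine gap reported above is measured on queries of this form.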

68 citations

Posted Content
TL;DR: This work releases MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature; it facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain.
Abstract: To assess the effectiveness of any medical intervention, researchers must conduct a time-intensive and highly manual literature review. NLP systems can help to automate or assist in parts of this expensive process. In support of this goal, we release MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature. This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain. We experiment with a summarization system based on BART, with promising early results. We formulate our summarization inputs and targets in both free text and structured forms and modify a recently proposed metric to assess the quality of our system's generated summaries. Data and models are available at this https URL

49 citations

Journal ArticleDOI
TL;DR: A survey of the recent work in medical documents summarization can be found in this article, where the authors give a general background on summarization, presenting the factors that summarization depends upon, discussing evaluation issues, and briefly describing the various types of summarization techniques.
Abstract: Objective: The aim of this paper is to survey the recent work in medical documents summarization. Background: During the last decade, documents summarization received increasing attention from the AI research community. More recently it also attracted the interest of the medical research community, due to the enormous growth of information that is available to physicians and researchers in medicine, through the large and growing number of published journals, conference proceedings, medical sites and portals on the World Wide Web, electronic medical records, etc. Methodology: This survey first gives a general background on documents summarization, presenting the factors that summarization depends upon, discussing evaluation issues, and describing briefly the various types of summarization techniques. It then examines the characteristics of the medical domain through the different types of medical documents. Finally, it presents and discusses the summarization techniques used so far in the medical domain, referring to the corresponding systems and their characteristics. Discussion and conclusions: The paper discusses thoroughly the promising paths for future research in medical documents summarization. It mainly focuses on the issue of scaling to large collections of documents in various languages and from different media, on personalization issues, on portability to new sub-domains, and on the integration of summarization technology in practical applications.

16 citations