Open Access · Journal Article · DOI

Automating data extraction in systematic reviews: a systematic review

TLDR
A systematic review of published and unpublished methods to automate data extraction for systematic reviews found no unified information extraction framework tailored to the systematic review process and published reports focused on a limited number of data elements.
Abstract
Automation of parts of the systematic review process, specifically the data extraction step, may be an important strategy for reducing the time needed to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper presents a systematic review of published and unpublished methods to automate data extraction for systematic reviews. We systematically searched PubMed, IEEE Xplore, and the ACM Digital Library to identify potentially relevant articles, and we also reviewed the citations of included reports. We included reports that met both of the following criteria: 1) the methods or results section described which entities were or needed to be extracted, and 2) at least one entity was automatically extracted, with evaluation results presented for that entity. Out of 1190 unique citations retrieved by our search, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of these data elements, researchers had attempted to extract the information automatically from publication text. Of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was seven. Most data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) above 70 %. We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited number (1–7) of data elements. Biomedical natural language processing techniques have not been fully utilized to automate, even partially, the data extraction step of systematic reviews.
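The F-score reported throughout the abstract is the harmonic mean of sensitivity (recall) and positive predictive value (precision). A minimal sketch of the computation, using hypothetical counts of true positives, false positives, and false negatives chosen only for illustration:

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F-score: harmonic mean of precision (PPV) and recall (sensitivity)."""
    precision = tp / (tp + fp)   # positive predictive value
    recall = tp / (tp + fn)      # sensitivity
    return 2 * precision * recall / (precision + recall)

# Hypothetical extraction result: 70 correct extractions,
# 20 spurious ones, 30 missed mentions.
print(round(f_score(70, 20, 30), 4))  # → 0.7368
```

With these illustrative counts the extractor clears the 70 % F-score level that most data elements in the review reached; because the harmonic mean is dominated by the smaller of the two values, both precision and recall must be reasonably high for that to happen.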


Citations
Journal ArticleDOI

Guidelines for including grey literature and conducting multivocal literature reviews in software engineering

TL;DR: The provided MLR guidelines will support researchers in effectively and efficiently conducting new MLRs in any area of software engineering (SE); researchers are encouraged to use them in their MLR studies and then share their lessons learned and experiences.
Journal ArticleDOI

Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry

TL;DR: The logistical aspects of recently completed systematic reviews that were registered in the International Prospective Register of Systematic Reviews (PROSPERO) registry are summarized to quantify the time and resources required to complete such projects.
Journal ArticleDOI

Toward systematic review automation: a practical guide to using machine learning tools in research synthesis

TL;DR: An overview of current machine learning methods that have been proposed to expedite evidence synthesis is provided, including which of these are ready for use, their strengths and weaknesses, and how a systematic review team might go about using them in practice.
Proceedings ArticleDOI

A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature.

TL;DR: This paper presents a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials, including demarcations of text spans that describe the Patient population enrolled, the Interventions studied and what they were Compared to, and the Outcomes measured.
References
Journal ArticleDOI

Latent Dirichlet Allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.