Automating data extraction in systematic reviews: a systematic review
TLDR
A systematic review of published and unpublished methods to automate data extraction for systematic reviews found no unified information extraction framework tailored to the systematic review process, and that published reports focused on a limited number of data elements.

Abstract
Automating parts of the systematic review process, specifically the data extraction step, may be an important strategy for reducing the time needed to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper presents a systematic review of published and unpublished methods to automate data extraction for systematic reviews. We systematically searched PubMed, IEEEXplore, and the ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) the methods or results section described which entities were or needed to be extracted, and 2) at least one entity was automatically extracted, with evaluation results presented for that entity. We also reviewed the citations of included reports. Out of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of the data elements used in systematic reviews, various researchers had attempted to extract information automatically from the publication text. Of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) above 70 %. We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited number (1–7) of data elements. Biomedical natural language processing techniques have not been fully exploited to automate, fully or even partially, the data extraction step of systematic reviews.
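The F-score reported in the abstract is the harmonic mean of sensitivity (recall) and positive predictive value (precision). As a minimal illustration of how that metric is computed (not code from the review itself; the function name and example values are hypothetical):

```python
def f_score(sensitivity: float, ppv: float) -> float:
    """F1-score: harmonic mean of sensitivity (recall) and
    positive predictive value (precision)."""
    if sensitivity + ppv == 0:
        return 0.0
    return 2 * sensitivity * ppv / (sensitivity + ppv)

# Example: a data-element extractor with 0.8 sensitivity and 0.75 PPV
print(round(f_score(0.8, 0.75), 3))  # 0.774
```

Because the harmonic mean is dominated by the smaller of the two values, an F-score above 70 % implies that neither sensitivity nor PPV can be very low.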
Citations
Journal ArticleDOI
PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews.
Matthew J. Page,David Moher,Patrick M.M. Bossuyt,Isabelle Boutron,Tammy Hoffmann,Cynthia D. Mulrow,Larissa Shamseer,Jennifer Tetzlaff,Elie A. Akl,Sue E. Brennan,Roger Chou,Julie Glanville,Jeremy M. Grimshaw,Asbjørn Hróbjartsson,Manoj M. Lalu,Tianjing Li,Elizabeth Loder,Evan Mayo-Wilson,Steve McDonald,Luke A McGuinness,Lesley A. Stewart,James Thomas,Andrea C. Tricco,Vivian Welch,Penny Whiting,Joanne E. McKenzie +25 more
TL;DR: The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA 2020) statement was developed to facilitate transparent and complete reporting of systematic reviews, and has been updated to reflect recent advances in systematic review methodology and terminology.
Journal ArticleDOI
Guidelines for including grey literature and conducting multivocal literature reviews in software engineering
TL;DR: The provided guidelines will support researchers in effectively and efficiently conducting new multivocal literature reviews (MLRs) in any area of software engineering; researchers are encouraged to use them in their MLR studies and then share their lessons learned and experiences.
Journal ArticleDOI
Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry
TL;DR: The logistical aspects of recently completed systematic reviews registered in the International Prospective Register of Systematic Reviews (PROSPERO) are summarized to quantify the time and resources required to complete such projects.
Journal ArticleDOI
Toward systematic review automation: a practical guide to using machine learning tools in research synthesis
TL;DR: An overview of current machine learning methods that have been proposed to expedite evidence synthesis is provided, including which of these are ready for use, their strengths and weaknesses, and how a systematic review team might go about using them in practice.
Proceedings ArticleDOI
A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature.
Benjamin E. Nye,Junyi Jessy Li,Roma Patel,Yinfei Yang,Iain J. Marshall,Ani Nenkova,Byron C. Wallace +6 more
TL;DR: This paper presents a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials, including demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured.
References
Journal ArticleDOI
Latent dirichlet allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article
Cochrane Handbook for Systematic Reviews of Interventions
Proceedings Article
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.