Data reuse and the open data citation advantage
TLDR
After accounting for other citation predictors, a robust citation benefit from open data is found, although a smaller one than previously reported, along with a direct effect of third-party data reuse that persists for years after researchers have published most of the papers reusing their own data.
Abstract:
Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets.
Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: the benefit was clearest for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third-party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed had reused a dataset by year 2, 100 by year 4, and more than 150 by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties.
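The accession-number search described above can be illustrated with a minimal sketch. The abstract does not give the matching rules, so the pattern below is an assumption: it covers GEO identifiers of the form GSE or GDS followed by digits, and ArrayExpress identifiers of the form E-XXXX-digits. The names `ACCESSION_RE` and `find_accessions` are hypothetical and are not taken from the study.

```python
import re

# Assumed identifier shapes (illustrative, not the study's actual rules):
#   GEO:          GSE1234, GDS5678
#   ArrayExpress: E-MEXP-567, E-GEOD-1234
ACCESSION_RE = re.compile(r"\b(?:GSE\d+|GDS\d+|E-[A-Z]{4}-\d+)\b")


def find_accessions(full_text):
    """Return the distinct accession numbers mentioned in a paper's full text."""
    return set(ACCESSION_RE.findall(full_text))


# Hypothetical sentence standing in for a paper's full text.
sample = "We reanalysed GSE1234 and E-MEXP-567; GSE1234 was normalised with RMA."
print(sorted(find_accessions(sample)))
```

Returning a set means a paper that mentions the same dataset several times counts as a single reuse instance for that dataset, which is one plausible way to count; the study may have counted differently.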
Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
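To make concrete how a covariate-adjusted "percent more citations" figure can come out of a regression, here is a rough sketch. It assumes a linear model on log-transformed citation counts, which the abstract does not state; the abstract says only "multivariate regression". All data are synthetic, with a 9% effect planted deliberately, and the variable names and the helper `percent_benefit` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Synthetic stand-ins for a few covariates (NOT the study's data).
open_data = rng.integers(0, 2, size=n)            # 1 if data were deposited
log_impact_factor = rng.normal(1.0, 0.5, size=n)  # journal impact factor, log scale
num_authors = rng.integers(1, 15, size=n)

# Plant a known 9% multiplicative benefit on the log scale.
true_beta = np.log(1.09)
log_citations = (
    1.0
    + true_beta * open_data
    + 0.8 * log_impact_factor
    + 0.02 * num_authors
    + rng.normal(0.0, 0.5, size=n)
)

# Ordinary least squares with an intercept, the open-data indicator,
# and the other covariates as columns of the design matrix.
X = np.column_stack([np.ones(n), open_data, log_impact_factor, num_authors])
coef, *_ = np.linalg.lstsq(X, log_citations, rcond=None)


def percent_benefit(beta):
    """Convert a log-scale coefficient into a percent change in citations."""
    return (np.exp(beta) - 1.0) * 100.0


estimated = percent_benefit(coef[1])
print(f"estimated citation benefit: {estimated:.1f}%")
```

Because the outcome is on the log scale, the coefficient on the open-data indicator is a multiplicative effect once the other covariates are held fixed, so exponentiating it and subtracting one gives the percent difference in citations.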