Journal Article

Data Sharing by Scientists: Practices and Perceptions

PLOS ONE (Public Library of Science), Vol. 6, Issue 6, pp. 1-21, 29 Jun 2011
TL;DR: Large-scale programs, such as the NSF-sponsored DataNET, will both bring attention and resources to the issue and make it easier for scientists to apply sound data management principles.
Abstract: Background: Scientific research in the 21st century is more data intensive and collaborative than in the past. It is important to study the data practices of researchers: data accessibility, discovery, re-use, preservation and, particularly, data sharing. Data sharing is a valuable part of the scientific method, allowing for verification of results and extending research from prior results. Methodology/Principal Findings: A total of 1329 scientists participated in this survey exploring current data sharing practices and perceptions of the barriers and enablers of data sharing. Scientists do not make their data electronically available to others for various reasons, including insufficient time and lack of funding. Most respondents are satisfied with their current processes for the initial and short-term parts of the data or research lifecycle (collecting their research data; searching for, describing or cataloging, analyzing, and short-term storage of their data) but are not satisfied with long-term data preservation. Many organizations do not provide support to their researchers for data management in either the short or the long term. If certain conditions are met (such as formal citation and sharing reprints), respondents agree they are willing to share their data. There are also significant differences in data management practices and approaches based on primary funding agency, subject discipline, age, work focus, and world region. Conclusions/Significance: Barriers to effective data sharing and preservation are deeply rooted in the practices and culture of the research process as well as the researchers themselves. New mandates for data management plans from NSF and other federal agencies and world-wide attention to the need to share and preserve data could lead to changes.
Large-scale programs, such as the NSF-sponsored DataNET (including projects like DataONE), will both bring attention and resources to the issue and make it easier for scientists to apply sound data management principles.

Citations
Journal Article
PeerJ, 01 Oct 2013
TL;DR: There is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data, and a robust citation benefit from open data is found, although a smaller one than previously reported.
Abstract: Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third-party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers.
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
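To make the size of the reported effect concrete, the minimal Python sketch below applies the 9% benefit and its 95% confidence bounds (5% and 13%) to a comparable paper without public data. It assumes the benefit acts as a simple relative increase on a hypothetical baseline of 100 citations; the baseline and the function name `with_open_data` are illustrative and not taken from the study, whose estimate comes from a multivariate regression with many covariates.

```python
# Illustrative only: applies the reported relative citation benefit
# (9%, 95% CI 5% to 13%) to a hypothetical baseline citation count.
# The baseline of 100 citations is an assumption, not a figure from the study.


def with_open_data(baseline_citations: float, benefit: float) -> float:
    """Expected citations if the paper had also shared its data,
    treating the benefit as a simple relative (multiplicative) increase."""
    return baseline_citations * (1.0 + benefit)


baseline = 100  # hypothetical comparable paper without public data
for label, benefit in [("point estimate", 0.09), ("CI lower", 0.05), ("CI upper", 0.13)]:
    print(f"{label}: {with_open_data(baseline, benefit):.0f} citations")
```

Under that simplified reading, a paper expected to collect 100 citations would collect roughly 105 to 113 with public data, around a point estimate of 109.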

423 citations


Cites background from "Data Sharing by Scientists: Practic..."

  • ...Scientists report that receiving additional citations is an important motivator for publicly archiving their data (Tenopir et al., 2011)....

Journal Article
Nature, 10 Jan 2013
TL;DR: A new funding policy by the US National Science Foundation represents a sea-change in how researchers are evaluated, says Heather Piwowar.

383 citations

Journal Article
TL;DR: This review provides an extensive account of the state of the art in both scholarly use of social media and altmetrics, reviewing the various functions these platforms have in the scholarly communication process and the factors that affect this use.
Abstract: Social media has become integrated into the fabric of the scholarly communication system in fundamental ways, principally through scholarly use of social media platforms and the promotion of new indicators on the basis of interactions with these platforms. Research and scholarship in this area has accelerated since the coining and subsequent advocacy for altmetrics, that is, research indicators based on social media activity. This review provides an extensive account of the state of the art in both scholarly use of social media and altmetrics. The review consists of two main parts: the first examines the use of social media in academia, reviewing the various functions these platforms have in the scholarly communication process and the factors that affect this use. The second part reviews empirical studies of altmetrics, discussing the various interpretations of altmetrics, data collection and methodological limitations, and differences according to platform. The review ends with a critical discussion of the implications of this transformation in the scholarly communication system.

380 citations


Cites background from "Data Sharing by Scientists: Practic..."

  • ...Data sharing has become a requirement of several funders and journals (Piwowar & Chapman, 2010; Tenopir et al., 2011) on the basis of enhanced verifiability and replicability in science....

Journal Article
PLOS ONE, 23 Jul 2013
TL;DR: It is found that CENS researchers are willing to share their data, but few are asked to do so, and in only a few domain areas do their funders or journals require them to deposit data.
Abstract: Research on practices to share and reuse data will inform the design of infrastructure to support data collection, management, and discovery in the long tail of science and technology. These are research domains in which data tend to be local in character, minimally structured, and minimally documented. We report on a ten-year study of the Center for Embedded Network Sensing (CENS), a National Science Foundation Science and Technology Center. We found that CENS researchers are willing to share their data, but few are asked to do so, and in only a few domain areas do their funders or journals require them to deposit data. Few repositories exist to accept data in CENS research areas. Data sharing tends to occur only through interpersonal exchanges. CENS researchers obtain data from repositories, and occasionally from registries and individuals, to provide context, calibration, or other forms of background for their studies. Neither CENS researchers nor those who request access to CENS data appear to use external data for primary research questions or for replication of studies. CENS researchers are willing to share data if they receive credit and retain first rights to publish their results. Practices of releasing, sharing, and reusing data in CENS reaffirm the gift culture of scholarship, in which goods are bartered between trusted colleagues rather than treated as commodities.

349 citations


Cites background from "Data Sharing by Scientists: Practic..."

  • ...When asked in hypothetical terms whether they are willing to share their data, most researchers say they will share or that they do share [17,24,59,56]....

  • ...[24] study, only 36% of respondents agree that their data are easy to access, although “easy access” is undefined....

  • ...Tenopir et al. [24] studied general trends in data sharing by conducting an online survey of scientists....

Journal Article
TL;DR: This paper briefly discusses five key values of long-term ecological studies: quantifying ecological responses to drivers of ecosystem change; understanding complex ecosystem processes that occur over prolonged periods; providing core ecological data that may be used to develop theoretical ecological models and to parameterize and validate simulation models; acting as platforms for collaborative studies, thus promoting multidisciplinary research; and providing data and understanding at scales relevant to management, thereby supporting evidence-based policy, decision making and the management of ecosystems.
Abstract: Long-term ecological studies are critical for providing key insights in ecology, environmental change, natural resource management and biodiversity conservation. In this paper, we briefly discuss five key values of such studies. These are: (1) quantifying ecological responses to drivers of ecosystem change; (2) understanding complex ecosystem processes that occur over prolonged periods; (3) providing core ecological data that may be used to develop theoretical ecological models and to parameterize and validate simulation models; (4) acting as platforms for collaborative studies, thus promoting multidisciplinary research; and (5) providing data and understanding at scales relevant to management, and hence critically supporting evidence-based policy, decision making and the management of ecosystems. We suggest that the ecological research community needs to put higher priority on communicating the benefits of long-term ecological studies to resource managers, policy makers and the general public. Long-term research will be especially important for tackling large-scale emerging problems confronting humanity such as resource management for a rapidly increasing human population, mass species extinction, and climate change detection, mitigation and adaptation. While some ecologically relevant, long-term data sets are now becoming more generally available, these are exceptions. This deficiency occurs because ecological studies can be difficult to maintain for long periods as they exceed the length of government administrations and funding cycles. We argue that the ecological research community will need to coordinate ongoing efforts in an open and collaborative way, to ensure that discoverable long-term ecological studies do not become a long-term deficiency. 
It is important to maintain publishing outlets for empirical field-based ecology, while simultaneously developing new systems of recognition that reward ecologists for the use and collaborative sharing of their long-term data sets. Funding schemes must be re-crafted to emphasize collaborative partnerships between field-based ecologists, theoreticians and modellers, and to provide financial support that is committed over commensurate time frames.

328 citations

References
Book
Tony Hey
16 Oct 2009
TL;DR: This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration.
Abstract: This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration. Examples of data intensive science will be used to illustrate the explosion of data and the associated new challenges for data capture, curation, analysis, and sharing. The role of cloud computing, collaboration services, and research repositories will be discussed.

2,171 citations


"Data Sharing by Scientists: Practic..." refers methods in this paper

  • ...Following the previous research paradigms (experimental, theoretical, and computational), this new era has been called “the fourth paradigm: data-intensive scientific discovery” where “all of the science literature is online, all of the science data is online, and they interoperate with each other” [3]....

Journal Article
JAMA, 23 Jan 2002
TL;DR: Data withholding occurs in academic genetics and it affects essential scientific activities such as the ability to confirm published results.
Abstract: Context. The free and open sharing of information, data, and materials regarding published research is vital to the replication of published results, the efficient advancement of science, and the education of students. Yet in daily practice, the ideal of free sharing is often breached. Objective. To understand the nature, extent, and consequences of data withholding in academic genetics. Design, Setting, and Participants. Mailed survey (March-July 2000) of geneticists and other life scientists in the 100 US universities that received the most funding from the National Institutes of Health in 1998. Of a potential 3000 respondents, 2893 were eligible and 1849 responded, yielding an overall response rate of 64%. We analyzed a subsample of 1240 self-identified geneticists and made a limited number of comparisons with 600 self-identified nongeneticists. Main Outcome Measures. Percentage of faculty who made requests for data that were denied; percentage of respondents who denied requests; influences on and consequences of withholding data; and changes over time in perceived willingness to share data. Results. Forty-seven percent of geneticists who asked other faculty for additional information, data, or materials regarding published research reported that at least 1 of their requests had been denied in the preceding 3 years. Ten percent of all postpublication requests for additional information were denied. Because they were denied access to data, 28% of geneticists reported that they had been unable to confirm published research. Twelve percent said that in the previous 3 years, they had denied another academician's request for data concerning published results.
Among geneticists who said they had intentionally withheld data regarding their published work, 80% reported that it required too much effort to produce the materials or information; 64%, that they were protecting the ability of a graduate student, postdoctoral fellow, or junior faculty member to publish; and 53%, that they were protecting their own ability to publish. Thirty-five percent of geneticists said that sharing had decreased during the last decade; 14%, that sharing had increased. Geneticists were as likely as other life scientists to deny others' requests (odds ratio [OR], 1.39; 95% confidence interval [CI], 0.81-2.40) and to have their own requests denied (OR, 0.97; 95% CI, 0.69-1.40). However, other life scientists were less likely to report that withholding had a negative impact on their own research as well as their field of research. Conclusions. Data withholding occurs in academic genetics and it affects essential scientific activities such as the ability to confirm published results. Lack of resources and issues of scientific priority may play an important role in scientists' decisions to withhold data, materials, and information from other academic geneticists.
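The 64% response rate is computed over the 2893 eligible respondents, not over all 3000 potential respondents. The short Python check below makes that distinction explicit; the variable names are illustrative and the figures are the ones quoted in the abstract.

```python
# Recomputes the response rate quoted in the JAMA abstract.
# The reported 64% uses eligible respondents as the denominator,
# not the 3000 potential respondents who were contacted.
contacted = 3000
eligible = 2893
responded = 1849

rate_eligible = responded / eligible
rate_contacted = responded / contacted

print(f"response rate (eligible): {rate_eligible:.1%}")        # 63.9%
print(f"response rate (all contacted): {rate_contacted:.1%}")  # 61.6%
```

Rounded to the nearest whole percent, the eligible-based rate gives the reported 64%.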

372 citations


"Data Sharing by Scientists: Practic..." refers background or result in this paper

  • ...These results do not include other data practices which may also negatively affect the progress of science, such as significant delays in the fulfillment of requests, refusals to publicly present research findings, and the failure to discuss research with others [16]....

  • ...from other researchers in the field were denied [16]....

Journal Article
PLOS ONE, 18 Sep 2009
TL;DR: It is suggested that journal policies requiring data sharing do not lead to authors making their data sets available to independent investigators, as only one of ten raw data sets requested was received.
Abstract: Background. Many journals now require authors to share their data with other investigators, either by depositing the data in a public repository or making it freely available upon request. These policies are explicit, but remain largely untested. We sought to determine how well authors comply with such policies by requesting data from authors who had published in one of two journals with clear data sharing policies. Methods and Findings. We requested data from ten investigators who had published in either PLoS Medicine or PLoS Clinical Trials. All responses were carefully documented. In the event that we were refused data, we reminded authors of the journal's data sharing guidelines. If we did not receive a response to our initial request, a second request was made. Following the ten requests for raw data, three investigators did not respond, four authors responded and refused to share their data, two email addresses were no longer valid, and one author requested further details. A reminder of PLoS's explicit requirement that authors share data did not change the reply from the four authors who initially refused. Only one author sent an original data set. Conclusions. We received only one of ten raw data sets requested. This suggests that journal policies requiring data sharing do not lead to authors making their data sets available to independent investigators.

339 citations


"Data Sharing by Scientists: Practic..." refers background in this paper

  • ...Only one author sent an original dataset [14]....

  • ...Savage and Vickers noted reasons that include concerns about patient privacy (for medical fields), concerns about future publishing opportunities, and the desire to retain exclusive rights to data that had taken many years to produce [14]....

Journal Article
TL;DR: The authors present their research findings, based closely on their report to OECD, on key issues in data access, as well as operating principles and management aspects necessary to successful data access regimes.
Abstract: Access to and sharing of data are essential for the conduct and advancement of science. This article argues that publicly funded research data should be openly available to the maximum extent possible. To seize upon advancements of cyberinfrastructure and the explosion of data in a range of scientific disciplines, this access to and sharing of publicly funded data must be advanced within an international framework, beyond technological solutions. The authors, members of an OECD Follow-up Group, present their research findings, based closely on their report to OECD, on key issues in data access, as well as operating principles and management aspects necessary to successful data access regimes.

274 citations


"Data Sharing by Scientists: Practic..." refers background in this paper

  • ...Several previous surveys have explored the benefits and barriers of sharing data [13] and the extent to which researchers share or withhold data....

Journal Article
Nature, 09 Sep 2009
TL;DR: Although most researchers agree that open access to data is the scientific ideal, many choose not to share; Bryn Nelson investigates why.
Abstract: Most researchers agree that open access to data is the scientific ideal, so what is stopping it happening? Bryn Nelson investigates why many researchers choose not to share.

238 citations
