Author

Christine L Borgman

Bio: Christine L Borgman is an academic researcher. The author has an h-index of 1 and has co-authored 1 publication receiving 575 citations.

Papers

The Conundrum of Sharing Research Data

Christine L Borgman

01 Jan 2013

TL;DR: Four rationales for sharing data are examined, drawing examples from the sciences, social sciences, and humanities: to reproduce or to verify research, to make results of publicly funded research available to the public, to enable others to ask new questions of extant data, and to advance the state of research and innovation.

Abstract: "We must all accept that science is data and that data are science, and thus provide for, and justify the need for the support of, much-improved data curation." (Hanson, Sugden, & Alberts) Researchers are producing an unprecedented deluge of data by using new methods and instrumentation. Others may wish to mine these data for new discoveries and innovations. However, research data are not readily available, as sharing is common in only a few fields such as astronomy and genomics. Data sharing practices in other fields vary widely. Moreover, research data take many forms, are handled in many ways, using many approaches, and often are difficult to interpret once removed from their initial context. Data sharing is thus a conundrum. Four rationales for sharing data are examined, drawing examples from the sciences, social sciences, and humanities: (1) to reproduce or to verify research, (2) to make results of publicly funded research available to the public, (3) to enable others to ask new questions of extant data, and (4) to advance the state of research and innovation. These rationales differ by the arguments for sharing, by beneficiaries, and by the motivations and incentives of the many stakeholders involved. The challenges are to understand which data might be shared, by whom, with whom, under what conditions, why, and to what effects. Answers will inform data policy and practice. © 2012 Wiley Periodicals, Inc.

634 citations

Cited by

Journal Article

YFCC100M: the new data in multimedia research

Bart Thomee¹, David A. Shamma¹, Gerald Friedland², Benjamin Elizalde², Karl Ni³, Douglas N. Poland³, Damian Borth², Li-Jia Li¹

Yahoo!¹, International Computer Science Institute², Lawrence Livermore National Laboratory³

25 Jan 2016, Communications of the ACM

TL;DR: This publicly available curated dataset of almost 100 million photos and videos is free and legal for all.

1,157 citations

Journal Article

MeDShare: Trust-Less Medical Data Sharing Among Cloud Service Providers via Blockchain

Qi Xia¹, Emmanuel Boateng Sifah¹, Kwame Omono Asamoah¹, Jianbin Gao¹, Xiaojiang Du², Mohsen Guizani³

University of Electronic Science and Technology of China¹, Temple University², University of Idaho³

24 Jul 2017, IEEE Access

TL;DR: The proposed MeDShare system is blockchain-based and provides data provenance, auditing, and control for shared medical data in cloud repositories among big data entities and employs smart contracts and an access control mechanism to effectively track the behavior of the data.

Abstract: The dissemination of patients’ medical records poses diverse risks to patients’ privacy, as malicious activities involving these records cause severe damage to the reputation, finances, and other interests of all parties directly or indirectly related to the data. Current methods to effectively manage and protect medical records have proven insufficient. In this paper, we propose MeDShare, a system that addresses the issue of medical data sharing among medical big data custodians in a trust-less environment. The system is blockchain-based and provides data provenance, auditing, and control for shared medical data in cloud repositories among big data entities. MeDShare monitors entities that access data for malicious use from a data custodian system. In MeDShare, data transitions and sharing from one entity to another, along with all actions performed on the MeDShare system, are recorded in a tamper-proof manner. The design employs smart contracts and an access control mechanism to effectively track the behavior of the data and to revoke access from offending entities upon detecting permission violations. The performance of MeDShare is comparable to current cutting-edge solutions for data sharing among cloud service providers. By implementing MeDShare, cloud service providers and other data guardians will be able to achieve data provenance and auditing while sharing medical data with entities such as research and medical institutions with minimal risk to data privacy.

819 citations

Posted Content

Scientific Data Management in the Coming Decade

Jim Gray¹, David T. Liu², Maria Nieto-Santisteban³, Alexander S. Szalay³, David J. DeWitt, Gerd Heber⁴

Microsoft¹, University of California, Berkeley², Johns Hopkins University³, Cornell University⁴

02 Feb 2005, arXiv: Databases

TL;DR: Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can simultaneously deal with huge datasets and that can find very subtle effects: finding both needles in the haystack and finding very small haystacks that were undetected in previous measurements.

Abstract: This is a thought piece on data-intensive science requirements for databases and science centers. It argues that peta-scale datasets will be housed by science centers that provide substantial storage and processing for scientists who access the data via smart notebooks. Next-generation science instruments and simulations will generate these peta-scale datasets. The need to publish and share data and the need for generic analysis and visualization tools will finally create a convergence on common metadata standards. Database systems will be judged by their support of these metadata standards and by their ability to manage and access peta-scale datasets. The procedural stream-of-bytes-file-centric approach to data analysis is both too cumbersome and too serial for such large datasets. Non-procedural query and analysis of schematized self-describing data are both easier to use and allow much more parallelism.

476 citations

Posted Content

The Pushshift Reddit Dataset

Jason Baumgartner, Savvas Zannettou¹, Brian Keegan², Megan Squire³, Jeremy Blackburn⁴

Max Planck Society¹, University of Colorado Boulder², Elon University³, Binghamton University⁴

23 Jan 2020, arXiv: Social and Information Networks

TL;DR: The Pushshift Reddit dataset makes it possible for social media researchers to reduce time spent in the data collection, cleaning, and storage phases of their projects.

Abstract: Social media data has become crucial to the advancement of scientific understanding. However, even though it has become ubiquitous, simply collecting large-scale social media data requires considerable engineering skill and computational resources. In fact, research is often gated by data engineering problems that must be overcome before analysis can proceed. This has resulted in the recognition of datasets as meaningful research contributions in and of themselves. Reddit, the so-called "front page of the Internet," in particular has been the subject of numerous scientific studies. Although Reddit is relatively open to data acquisition compared to social media platforms like Facebook and Twitter, technical barriers to acquisition still remain. Thus, Reddit's millions of subreddits, hundreds of millions of users, and hundreds of billions of comments are at once relatively accessible and time-consuming to collect and analyze systematically. In this paper, we present the Pushshift Reddit dataset. Pushshift is a social media data collection, analysis, and archiving platform that has collected Reddit data since 2015 and made it available to researchers. Pushshift's Reddit dataset is updated in real time and includes historical data back to Reddit's inception. In addition to monthly dumps, Pushshift provides computational tools to aid in searching, aggregating, and performing exploratory analysis on the entirety of the dataset. The Pushshift Reddit dataset makes it possible for social media researchers to reduce time spent in the data collection, cleaning, and storage phases of their projects.

428 citations

Journal Article

If we share data, will anyone use them? Data sharing and reuse in the long tail of science and technology.

Jillian C. Wallis¹, Elizabeth Rolando¹, Christine L. Borgman¹

University of California, Los Angeles¹

23 Jul 2013, PLOS ONE

TL;DR: It is found that CENS researchers are willing to share their data, but few are asked to do so, and in only a few domain areas do their funders or journals require them to deposit data.

Abstract: Research on practices to share and reuse data will inform the design of infrastructure to support data collection, management, and discovery in the long tail of science and technology. These are research domains in which data tend to be local in character, minimally structured, and minimally documented. We report on a ten-year study of the Center for Embedded Network Sensing (CENS), a National Science Foundation Science and Technology Center. We found that CENS researchers are willing to share their data, but few are asked to do so, and in only a few domain areas do their funders or journals require them to deposit data. Few repositories exist to accept data in CENS research areas. Data sharing tends to occur only through interpersonal exchanges. CENS researchers obtain data from repositories, and occasionally from registries and individuals, to provide context, calibration, or other forms of background for their studies. Neither CENS researchers nor those who request access to CENS data appear to use external data for primary research questions or for replication of studies. CENS researchers are willing to share data if they receive credit and retain first rights to publish their results. Practices of releasing, sharing, and reusing data in CENS reaffirm the gift culture of scholarship, in which goods are bartered between trusted colleagues rather than treated as commodities.

349 citations
