Journal ArticleDOI

The central role of metadata in a science data literacy course

20 Oct 2010 - Journal of Library Metadata (Taylor & Francis Group) - Vol. 10, pp. 188–204
TL;DR: An NSF-funded project at the Syracuse University School of Information Studies examined this changing environment, directing specific attention to digital data management practices; metadata proved to have a central role in how scientists operate in the e-science information environment.
Abstract: Science research is increasingly computer and network enabled and referred to as e-science. The change has had an impact on the information environment in which scientists across disciplines operate to conduct their research. This paper reports on an NSF-funded project at the Syracuse University School of Information Studies (2007–2009) that examined this changing environment and directed specific attention to digital data management practices. A local faculty survey of data management practices and attitudes was conducted, as was a scan of related courses at peer institutions. Knowledge about data management in e-science was used to design a new course addressing data-related literacy for science students and teach them skills for managing data created as part of the scientific research process. Throughout the project, metadata proved to have a central role in how scientists operate in the e-science information environment and to be a key component of data literacy in the e-science environment.
Citations
Journal ArticleDOI
TL;DR: This paper articulates the need for a data information literacy (DIL) program to prepare students to engage in an "e-research" environment.
Abstract: Researchers increasingly need to integrate the disposition, management, and curation of their data into their current workflows. However, it is not yet clear to what extent faculty and students are sufficiently prepared to take on these responsibilities. This paper articulates the need for a data information literacy program (DIL) to prepare students to engage in such an "e-research" environment. Assessments of faculty interviews and student performance in a geoinformatics course provide complementary sources of information, which are then filtered through the perspective of ACRL's information literacy competency standards to produce a draft set of outcomes for a data information literacy program.

242 citations

Journal ArticleDOI
01 Jun 2013 - Libri
TL;DR: The present paper aims to contribute to the advancement of data literacy with the proposal of a set of core competencies and contents that can serve as a framework of reference for its inclusion in libraries’ information literacy programs.
Abstract: The growing importance of data in society in general and scientific domains in particular, mirrored in the Open Data initiative and in the advent of eScience, requires public, school and academic libraries to contribute to both data and information literacy, as part of their mission to further knowledge and innovation in their respective fields of action. No specific library standards have been proposed to date, however, and most research studies conducted adopt a partial view of data literacy, stressing only the components needed in any given context. The present paper aims to contribute to the advancement of data literacy with the proposal of a set of core competencies and contents that can serve as a framework of reference for its inclusion in libraries’ information literacy programs. The various definitions of data literacy are discussed, the coverage of the competencies listed in information literacy standards is described, and the competencies considered in the experiments conducted to date in education and libraries are identified. The conclusion drawn is that the model proposed can favour the development of data literacy support resources and services. Topics for further research are also specified.

213 citations


Cites background or result from "The central role of metadata in a s..."



  • ...Along lines similar to those drawn by Qin and D’Ignazio (2010b) and Carlson et al. (2011), the Lamar Soutter Library, University of Massachusetts Medical School and the George C. Gordon Library, Worcester Polytechnic Institute, developed a set of Frameworks for a Data Management Curriculum (2012)…...


  • ...Qin and D’Ignazio (2010b), for instance, stress the importance of scientific data management in future science workforce training and propose a term, science data literacy (SDL), defined to be “the ability to understand, use, and manage science data,” which entails “skills in collecting, processing, managing, evaluating, and using data for scientific inquiry.” Their Science Data Management undergraduate course at Syracuse University, designed from input provided by a group of STEM teaching staff, consists of the following modules and topics (2010a):
    – Fundamentals of science data and data management: science data life cycle, databases, types of data, data sets description, and data management
    – Managing data in aggregation: data collections, data and users, organizational planning
    – Broader issues in science data management: archiving practices, data curation, enabling technologies, data presentation, and data sharing....


References
Posted Content
TL;DR: Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can simultaneously deal with huge datasets and that can find very subtle effects --- finding both needles in the haystack and finding very small haystacks that were undetected in previous measurements.
Abstract: This is a thought piece on data-intensive science requirements for databases and science centers. It argues that peta-scale datasets will be housed by science centers that provide substantial storage and processing for scientists who access the data via smart notebooks. Next-generation science instruments and simulations will generate these peta-scale datasets. The need to publish and share data and the need for generic analysis and visualization tools will finally create a convergence on common metadata standards. Database systems will be judged by their support of these metadata standards and by their ability to manage and access peta-scale datasets. The procedural stream-of-bytes-file-centric approach to data analysis is both too cumbersome and too serial for such large datasets. Non-procedural query and analysis of schematized self-describing data is both easier to use and allows much more parallelism.

476 citations


"The central role of metadata in a s..." refers background in this paper

  • ...One example on the data level is data formats (Gray et al., 2005)....


Journal ArticleDOI
TL;DR: Based on a comprehensive survey of lineage research and previous prototypes, a metamodel is presented to help identify and assess the basic components of systems that provide lineage retrieval for scientific data products.
Abstract: Scientific research relies as much on the dissemination and exchange of data sets as on the publication of conclusions. Accurately tracking the lineage (origin and subsequent processing history) of scientific data sets is thus imperative for the complete documentation of scientific work. Researchers are effectively prevented from determining, preserving, or providing the lineage of the computational data products they use and create, however, because of the lack of a definitive model for lineage retrieval and a poor fit between current data management tools and scientific software. Based on a comprehensive survey of lineage research and previous prototypes, we present a metamodel to help identify and assess the basic components of systems that provide lineage retrieval for scientific data products.

447 citations


"The central role of metadata in a s..." refers background in this paper

  • ...…widely accepted in the scientific research community, where applicable, in which the lowest level (level 0) is defined as “reconstructed unprocessed instrument data at full resolutions” and the highest level (4) is the “model output or results from analyses of lower-level data” (Bose & Frew, 2005)....


Journal ArticleDOI
01 Dec 2005
TL;DR: The authors argue that analyzing these data to find the subtle effects missed by previous studies requires algorithms that can simultaneously handle huge datasets and detect very subtle effects, finding both needles in the haystack and very small haystacks that went undetected in previous measurements.
Abstract: Scientific instruments and computer simulations are creating vast data stores that require new scientific methods to analyze and organize the data. Data volumes are approximately doubling each year. Since these new instruments have extraordinary precision, the data quality is also rapidly improving. Analyzing this data to find the subtle effects missed by previous studies requires algorithms that can simultaneously deal with huge datasets and that can find very subtle effects --- finding both needles in the haystack and finding very small haystacks that were undetected in previous measurements.

432 citations

Book ChapterDOI
01 Jan 2007
TL;DR: This chapter uses LIGO as an application case study in workflow design and implementation, outlines a few directions for future development, and provides some long-term vision for applications related to gravitational wave data analysis.
Abstract: Modern scientific experiments acquire large amounts of data that must be analyzed in subtle and complicated ways to extract the best results. The Laser Interferometer Gravitational Wave Observatory (LIGO) is an ambitious effort to detect gravitational waves produced by violent events in the universe, such as the collision of two black holes or the explosion of supernovae [37,258]. The experiment records approximately 1 TB of data per day, which is analyzed by scientists in a collaboration that spans four continents. LIGO and distributed computing have grown up side by side over the past decade, and the analysis strategies adopted by LIGO scientists have been strongly influenced by the increasing power of tools to manage distributed computing resources and the workflows to run on them. In this chapter, we use LIGO as an application case study in workflow design and implementation. The software architecture outlined here has been used with great efficacy to analyze LIGO data [2–5] using dedicated computing facilities operated by the LIGO Scientific Collaboration, the LIGO Data Grid. It is just the first step, however. Workflow design and implementation lies at the interface between computing and traditional scientific activities. In the conclusion, we outline a few directions for future development and provide some long-term vision for applications related to gravitational wave data analysis.

148 citations


"The central role of metadata in a s..." refers background in this paper

  • ...This type of metadata can alert researchers to data quality or usability issues related to types of instruments used and issues in performance or calibration (Brown & Brady, 2007)....


Journal ArticleDOI
16 Aug 2005
TL;DR: The author argues that the evaluation of information is a key element in information literacy, statistical literacy, and data literacy, and that more attention is needed on how these three literacies relate and how they may be taught synergistically.
Abstract: The evaluation of information is a key element in information literacy, statistical literacy and data literacy. As such, all three literacies are inter-related. It is difficult to promote information literacy or data literacy without promoting statistical literacy. While their relative importance varies with one’s perspective, these three literacies are united in dealing with similar problems that face students in college. More attention is needed on how these three literacies relate and how they may be taught synergistically. All librarians are interested in information literacy; archivists and data librarians are interested in data literacy. Both should consider teaching statistical literacy as a service to students who need to critically evaluate information in arguments.

144 citations


"The central role of metadata in a s..." refers background in this paper

  • ...As Schield points out, data literacy should include understanding a wide variety of tools for accessing, converting, and manipulating data (Schield, 2004)....
