
Showing papers on "Data management published in 2022"


Journal ArticleDOI
TL;DR: A prior-dependent graph (PDG) construction method achieves substantial performance and can be deployed in edge computing modules to provide efficient solutions for massive data management and applications in AIoT.

71 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce multiple perspectives for event detection in the big social data era, and thoroughly investigate and summarize the significant progress in social event detection and visualization techniques, emphasizing crucial challenges ranging from the management, fusion, and mining of big data, to the applicability of these methods to different platforms, to multiple languages and dialects rather than a single language, and to multiple modalities.

20 citations


Journal ArticleDOI
TL;DR: This research proposes a collaborative clustering method that does not require the exchange of raw data and can help reduce communication costs while ensuring privacy.

20 citations


Proceedings ArticleDOI
10 Jun 2022
TL;DR: A new storage engine, called CompressDB, supports data processing for databases without decompression; it utilizes context-free grammar to compress data and supports both data query and data manipulation.
Abstract: In modern data management systems, directly performing operations on compressed data has proven highly effective for big data problems. These systems have demonstrated significant compression benefits and performance improvements for data analytics applications. However, current systems focus only on data queries, while a complete big data system must support both data query and data manipulation. We develop a new storage engine, called CompressDB, which can support data processing for databases without decompression. CompressDB has the following advantages. First, CompressDB utilizes context-free grammar to compress data and supports both data query and data manipulation. Second, for adaptability, we integrate CompressDB into file systems so that a wide range of databases can use CompressDB directly without any change. Third, we enable operation pushdown to storage so that we can perform data query and manipulation in storage systems efficiently, without bringing large volumes of data into memory. We validate the efficacy of CompressDB with various database systems, including SQLite, LevelDB, MongoDB, and ClickHouse. We evaluate our method using six real-world datasets of varying length, structure, and content in both single-node and cluster environments. Experiments show that CompressDB achieves 40% throughput improvement and 44% latency reduction, along with an average compression ratio of 1.81.
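To make the idea of querying compressed data concrete, here is a minimal, assumed sketch (not CompressDB's actual implementation): a toy straight-line grammar compressor in the spirit of RePair, plus a query that counts occurrences of a symbol directly on the grammar rules rather than on the decompressed text. The function names and pairing heuristic are invented for illustration.

```python
# Toy grammar-based compression with a "no decompression" query.
from collections import Counter

def compress(seq):
    """Repeatedly replace the most frequent adjacent pair with a new rule."""
    rules = {}                      # nonterminal id -> (left, right)
    next_id = -1                    # negative ints serve as nonterminals
    seq = list(seq)
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id -= 1
    return seq, rules

def count_symbol(symbol, seq, rules):
    """Count occurrences of `symbol` using per-rule counts, not the full text."""
    memo = {}
    def expand(tok):
        if tok not in rules:                 # terminal character
            return 1 if tok == symbol else 0
        if tok not in memo:
            left, right = rules[tok]
            memo[tok] = expand(left) + expand(right)
        return memo[tok]
    return sum(expand(tok) for tok in seq)

text = "abracadabra abracadabra"
compressed, rules = compress(text)
assert count_symbol("a", compressed, rules) == text.count("a")
```

A real engine such as CompressDB handles far richer operators and data layouts, but the principle sketched here is the same: statistics are derived once per grammar rule and reused, so the query never touches the fully decompressed text.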

18 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a Service Oriented Architecture approach for integrated management and analysis of multi-omics and biomedical imaging data, and exemplify its applicability for basic biology research and clinical studies.
Abstract: As technical developments in omics and biomedical imaging increase the throughput of data generation in life sciences, the need for information systems capable of managing heterogeneous digital assets is increasing; in particular, systems supporting the findability, accessibility, interoperability, and reusability (FAIR) principles of scientific data management are required. We propose a Service Oriented Architecture approach for integrated management and analysis of multi-omics and biomedical imaging data. Our architecture introduces an image management system into a FAIR-supporting, web-based platform for omics data management. Interoperable metadata models and middleware components implement the required data management operations. The resulting architecture allows for FAIR management of omics and imaging data, facilitating metadata queries from software applications. The applicability of the proposed architecture is demonstrated using two technical proofs of concept and a use case, aimed at molecular plant biology and clinical liver cancer research, which integrate various imaging and omics modalities. We describe a data management architecture for integrated, FAIR-supporting management of omics and biomedical imaging data, and exemplify its applicability for basic biology research and clinical studies. We anticipate that FAIR data management systems for multi-modal data repositories will play a pivotal role in data-driven research, including studies which leverage advanced machine learning methods, as the joint analysis of omics and imaging data, in conjunction with phenotypic metadata, becomes not only desirable but necessary to derive novel insights into biological processes.
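As a rough illustration of "metadata queries from software applications", the sketch below queries a hypothetical REST endpoint of such a FAIR platform; the base URL, route, parameters, and field names are assumptions for this sketch, not the paper's actual API.

```python
# Illustrative sketch only: an application asking a FAIR metadata service for
# datasets that match an organism and a modality. Endpoint and fields invented.
import requests

BASE_URL = "https://example.org/fair-platform/api"   # hypothetical service

def find_datasets(organism, modality):
    """Return metadata records for datasets matching organism and modality."""
    resp = requests.get(
        f"{BASE_URL}/datasets",
        params={"organism": organism, "modality": modality},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# e.g. link omics runs with the microscopy images acquired from the same samples
for record in find_datasets("Arabidopsis thaliana", "imaging"):
    print(record.get("identifier"), record.get("sample_id"))
```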

18 citations


Journal ArticleDOI
TL;DR: In this article, a BigKE-based intelligent bridge management and maintenance framework is proposed, consisting of data source, storage and computing, knowledge representation, knowledge computing, and knowledge service layers.

14 citations


Journal ArticleDOI
TL;DR: This study focuses on limiting third-party engagement with medical health data and improving data security by developing a website where both patients and doctors benefit from the use of blockchain technology to secure medical data.
Abstract: Bangladesh should have a decentralized medical record server. We face many issues, such as doctor's appointments, organizing reports in one place, and report follow-ups. People currently bring a large number of papers to the doctor's chamber, carrying prescriptions, reports, and X-ray files, among other things, which complicates everyone's life. All of the reports must be reviewed by doctors on a regular basis; it is difficult to read old reports regularly, and patients do not receive the correct medications or treatment. Doctors also find it extremely difficult to comprehend handwritten prescriptions. Data security, authenticity, time management, and other areas of data administration are dramatically improved when blockchain (smart contract) technology is linked with standard database management solutions. Blockchain is a groundbreaking, decentralized technology that protects data from unauthorized access. Maintaining data privacy and accountability in a conventional system is difficult; with smart contracts, the information is only accessible to those who have been authenticated. This study focuses on limiting third-party engagement with medical health data and improving data security. Throughout the process, this will improve accessibility and time efficiency. The most significant benefit is that people will feel safer during the payment procedure. A smart contract and peer-to-peer encrypted technology were used. A hacker will not be able to gain access to this system because it uses an immutable ledger, and even if they gain access, they will not be able to change any of the data. If items are found to be defective, the transaction is halted. Transaction security based on cryptographic methodologies is a viable way to address these problems. We developed a website where both patients and doctors benefit from the use of blockchain technology to ensure the security of medical data. There are separate profiles for doctors and patients. Patients can create their own account using a unique address, name, and age; this unique address is created from the genesis block and is completely private to the owner, who remains fully secure in our network. After creating an account, the patient can view the list of doctors and upload medical records such as prescriptions and X-rays. All records uploaded by the patient are stored on our local server (Ganache) as hashed strings of the data. Those files also have a unique address, which is shown in the patient profile. Once granted access, doctors can view the patient's records in their respective profiles. To access options such as uploading, viewing, or editing data, a fee in Ethereum currency must be paid to complete the request. Doctors, in turn, enter their profile using their name and unique address; after logging in, they can view their name, unique address, and the list of patients who have granted them access to their files. On our website, the front end is handled by JavaScript, ReactJS, HTML, and CSS; the back end is handled by Solidity; and storage is handled by Ganache as the local host. Finally, this paper shows how to make the procedure as safe as feasible while maintaining transparency and efficiency.
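The workflow described above (hashed records on a local Ganache chain, access grants, per-operation fees paid in Ether) could look roughly like the following web3.py sketch. The contract address, ABI, and the addRecord/grantAccess functions are hypothetical stand-ins for the Solidity backend the paper describes, not the authors' code.

```python
# Hedged sketch: talking to a hypothetical record-registry contract on a local
# Ganache node via web3.py. Only the hash of a record ever goes on-chain.
from web3 import Web3

RECORD_REGISTRY_ABI = [  # hypothetical ABI covering the two calls used below
    {"name": "addRecord", "type": "function", "stateMutability": "nonpayable",
     "inputs": [{"name": "recordHash", "type": "bytes32"}], "outputs": []},
    {"name": "grantAccess", "type": "function", "stateMutability": "nonpayable",
     "inputs": [{"name": "doctor", "type": "address"}], "outputs": []},
]

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:7545"))    # Ganache's default RPC
patient, doctor = w3.eth.accounts[0], w3.eth.accounts[1]

# In practice these bytes would come from the uploaded prescription or X-ray file.
report_bytes = b"%PDF-1.4 ... contents of an uploaded X-ray report ..."
report_hash = Web3.keccak(report_bytes)                   # 32-byte digest

registry = w3.eth.contract(
    address="0x0000000000000000000000000000000000000000",  # placeholder: use the deployed address
    abi=RECORD_REGISTRY_ABI,
)

# Each transaction costs gas paid in Ether, matching the fee model described above.
tx = registry.functions.addRecord(report_hash).transact({"from": patient})
w3.eth.wait_for_transaction_receipt(tx)
tx = registry.functions.grantAccess(doctor).transact({"from": patient})
w3.eth.wait_for_transaction_receipt(tx)
```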

13 citations


Journal ArticleDOI
TL;DR: The authors argue that alternative data are an open-ended placeholder for every data source potentially relevant for investment management purposes, and that harnessing these disparate data sources requires certain standardization efforts by different market participants.
Abstract: Social media commentary, satellite imagery and GPS data are part of 'alternative data', that is, data that originate outside of the standard repertoire of market data but are considered useful for predicting stock prices, detecting different risk exposures and discovering new price movement indicators. With the availability of sophisticated machine-learning analytics tools, alternative data are gaining traction within the investment management and algorithmic trading industries. Drawing on interviews with people working in investment management and algorithmic trading firms utilizing alternative data, as well as firms providing and sourcing such data, we emphasize social media-based sentiment analytics as one manifestation of how alternative data are deployed for stock price prediction purposes. This demonstrates both how sentiment analytics are developed and how they are subsequently utilized by investment management firms. We argue that 'alternative data' are an open-ended placeholder for every data source potentially relevant for investment management purposes, and that harnessing these disparate data sources requires certain standardization efforts by different market participants. Besides showing how market participants understand and use alternative data, we demonstrate that alternative data often undergo processes of (a) prospecting (i.e. rendering such data amenable to processing with the aid of analytics tools) and (b) assetization (i.e. the transformation of data into tradable assets). We further contend that the widespread embracement of alternative data in investment management and trading encourages a financialization process at the data level, which raises new governance issues.

13 citations


Journal ArticleDOI
TL;DR: In this article, a cognitive analytics platform for anomaly detection is presented that is capable of handling, analyzing and resourcefully exploiting machine data from a factory shop floor, so as to support the emerging and growing needs of the manufacturing industry.

12 citations


Journal ArticleDOI
01 Feb 2022
TL;DR: Research progress on multisource heterogeneous urban sensor access and data management technologies is reviewed; these technologies provide strong support for intelligent perception and scientific management at the city scale and can accelerate the construction of smart cities or digital twin cities with virtual reality features.
Abstract: Urban sensors are an important part of urban infrastructures and are usually heterogeneous. Urban sensors with different uses vary greatly in hardware structure, communication protocols, data formats, interaction modes, sampling frequencies, data accuracy and service quality, thus posing an enormous challenge to the unified integration and sharing of massive sensor information resources. Consequently, access and data management methods for these multisource heterogeneous urban sensors are extremely important. Additionally, multisource heterogeneous urban sensor access and data management technologies provide strong support for intelligent perception and scientific management at the city scale and can accelerate the construction of smart cities or digital twin cities with virtual reality features. We systematically summarize the related research on these technologies. First, we present a summary of the concepts and applications of urban sensors. Then, the research progress on multisource heterogeneous urban sensor access technologies is analysed in relation to communication protocols, data transmission formats, access standards, access technologies and data transmission technologies. Subsequently, the data management technologies for urban sensors are reviewed from the perspectives of data cleaning, data compression, data storage, data indexing and data querying. In addition, the challenges faced by the technologies above and corresponding feasible solutions are discussed from three aspects, namely, the integration of massive Internet of Things (IoT) devices, computational burden and energy consumption, and cybersecurity. Finally, a summary of this paper is given, and possible future development directions are analysed and discussed.
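As a toy illustration of the access-and-harmonization problem the review surveys, the following sketch normalizes two heterogeneous sensor payloads (one JSON, one CSV-style) into a common record; the sensor IDs, field names, and units are invented, not taken from the paper.

```python
# Normalize heterogeneous sensor messages into one uniform record structure.
import json
from datetime import datetime, timezone

def from_json_sensor(raw):
    msg = json.loads(raw)
    return {
        "sensor_id": msg["id"],
        "observed_at": datetime.fromtimestamp(msg["ts"], tz=timezone.utc),
        "quantity": "temperature",
        "value": float(msg["temp_c"]),
        "unit": "degC",
    }

def from_csv_sensor(raw):
    sensor_id, iso_time, value = raw.strip().split(",")
    return {
        "sensor_id": sensor_id,
        "observed_at": datetime.fromisoformat(iso_time),
        "quantity": "pm2_5",
        "value": float(value),
        "unit": "ug/m3",
    }

records = [
    from_json_sensor('{"id": "T-042", "ts": 1650000000, "temp_c": 18.4}'),
    from_csv_sensor("AQ-007,2022-04-15T06:40:00+00:00,12.1"),
]
print(records)  # both payloads now share one schema for storage and querying
```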

Book ChapterDOI
18 Jan 2022
TL;DR: The authors argue that linguists should consider how their research affects not only individual research participants, but also the wider community, taking into account the community's cultural norms and values, and should strive to determine what will be constructive for all those involved in a research encounter.
Abstract: While acknowledging that what constitutes the relevant community is a complex issue, we urge linguists to consider how their research affects not only individual research participants, but also the wider community. In general, linguists should strive to determine what will be constructive for all those involved in a research encounter, taking into account the community’s cultural norms and values.

Journal ArticleDOI
TL;DR: In this article, the authors present a detailed overview of the roles of data warehouses and data lakes in modern enterprise data management, and explain the architecture and design considerations of the current state of the art.
Abstract: Data is the lifeblood of any organization. In today's world, organizations recognize the vital role of data in modern business intelligence systems for making meaningful decisions and staying competitive in the field. Efficient and optimal data analytics provides a competitive edge to an organization's performance and services. Major organizations generate, collect and process vast amounts of data, falling under the category of big data. Managing and analyzing the sheer volume and variety of big data is a cumbersome process. At the same time, proper utilization of the vast collection of an organization's information can generate meaningful insights into business tactics. In this regard, two of the popular data management systems in the area of big data analytics (i.e., data warehouse and data lake) act as platforms to accumulate the big data generated and used by organizations. Although seemingly similar, the two differ in their characteristics and applications. This article presents a detailed overview of the roles of data warehouses and data lakes in modern enterprise data management. We detail the definitions, characteristics and related works for the respective data management frameworks. Furthermore, we explain the architecture and design considerations of the current state of the art. Finally, we provide a perspective on the challenges and promising research directions for the future.
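A minimal sketch of the contrast the article draws, under invented paths and columns: the same records land verbatim in a data lake (schema-on-read) and are loaded into a typed table in a warehouse (schema-on-write).

```python
# Contrast: raw, schema-on-read storage vs. curated, schema-on-write loading.
import json
import sqlite3

raw_events = [
    {"order_id": 1, "customer": "acme", "amount": "19.90", "extra": {"coupon": "X1"}},
    {"order_id": 2, "customer": "globex", "amount": "7.50"},
]

# Data lake: store events verbatim; structure is imposed only when read.
with open("lake_orders_2022-04-15.jsonl", "w") as f:
    for event in raw_events:
        f.write(json.dumps(event) + "\n")

# Data warehouse: enforce a schema and types up front, then load.
conn = sqlite3.connect("warehouse.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT OR REPLACE INTO orders VALUES (?, ?, ?)",
    [(e["order_id"], e["customer"], float(e["amount"])) for e in raw_events],
)
conn.commit()
```

The lake copy keeps every field, including the nested "extra" object the warehouse schema does not yet model, which is exactly the flexibility-versus-curation trade-off the article discusses.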

Journal ArticleDOI
TL;DR: This paper describes Oceans 2.0 and Oceans 3.0, the comprehensive Data Management and Archival System that ONC developed to capture all data and associated metadata into an ever-expanding dynamic database.
Abstract: The advent of large-scale cabled ocean observatories brought about the need to handle large amounts of ocean-based data, continuously recorded at a high sampling rate over many years and made accessible in near-real time to the ocean science community and the public. Ocean Networks Canada (ONC) commenced installing and operating two regional cabled observatories on Canada’s Pacific Coast, VENUS inshore and NEPTUNE offshore in the 2000s, and later expanded to include observatories in the Atlantic and Arctic in the 2010s. The first data streams from the cabled instrument nodes started flowing in February 2006. This paper describes Oceans 2.0 and Oceans 3.0, the comprehensive Data Management and Archival System that ONC developed to capture all data and associated metadata into an ever-expanding dynamic database. Oceans 2.0 was the name for this software system from 2006–2021; in 2022, ONC revised this name to Oceans 3.0, reflecting the system’s many new and planned capabilities aligning with Web 3.0 concepts. Oceans 3.0 comprises both tools to manage the data acquisition and archival of all instrumental assets managed by ONC as well as end-user tools to discover, process, visualize and download the data. Oceans 3.0 rests upon ten foundational pillars: (1) A robust and stable system architecture to serve as the backbone within a context of constant technological progress and evolving needs of the operators and end users; (2) a data acquisition and archival framework for infrastructure management and data recording, including instrument drivers and parsers to capture all data and observatory actions, alongside task management options and support for data versioning; (3) a metadata system tracking all the details necessary to archive Findable, Accessible, Interoperable and Reproducible (FAIR) data from all scientific and non-scientific sensors; (4) a data Quality Assurance and Quality Control lifecycle with a consistent workflow and automated testing to detect instrument, data and network issues; (5) a data product pipeline ensuring the data are served in a wide variety of standard formats; (6) data discovery and access tools, both generalized and use-specific, allowing users to find and access data of interest; (7) an Application Programming Interface that enables scripted data discovery and access; (8) capabilities for customized and interactive data handling such as annotating videos or ingesting individual campaign-based data sets; (9) a system for generating persistent data identifiers and data citations, which supports interoperability with external data repositories; (10) capabilities to automatically detect and react to emergent events such as earthquakes. With a growing database and advancing technological capabilities, Oceans 3.0 is evolving toward a future in which the old paradigm of downloading packaged data files transitions to the new paradigm of cloud-based environments for data discovery, processing, analysis, and exchange.

Journal ArticleDOI
TL;DR: In this paper, the authors explore the data management challenges encountered by practitioners developing systems with DL components, identify the potential solutions from the literature and validate the solutions through a multiple case study.

Journal ArticleDOI
TL;DR: Current practices in soil data synthesis are summarized across all stages of database creation (availability, input, harmonization, curation, and publication), and new soil-focused semantic tools to improve existing data pipelines are suggested.
Abstract: In the age of big data, soil data are more available and richer than ever, but – outside of a few large soil survey resources – they remain largely unusable for informing soil management and understanding Earth system processes beyond the original study. Data science has promised a fully reusable research pipeline where data from past studies are used to contextualize new findings and reanalyzed for new insight. Yet synthesis projects encounter challenges at all steps of the data reuse pipeline, including unavailable data, labor-intensive transcription of datasets, incomplete metadata, and a lack of communication between collaborators. Here, using insights from a diversity of soil, data, and climate scientists, we summarize current practices in soil data synthesis across all stages of database creation: availability, input, harmonization, curation, and publication. We then suggest new soil-focused semantic tools to improve existing data pipelines, such as ontologies, vocabulary lists, and community practices. Our goal is to provide the soil data community with an overview of current practices in soil data and where we need to go to fully leverage big data to solve soil problems in the next century.
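As one small, assumed example of the semantic tools suggested above, the sketch below harmonizes columns that are named differently and reported in different units into a shared vocabulary; the column names, target terms, and conversion factors are invented for illustration.

```python
# Toy harmonization step using a vocabulary mapping (invented terms and units).
import pandas as pd

VOCAB = {  # source column -> (standard name, converter to the standard unit)
    "SOC_pct":      ("soil_organic_carbon_g_kg", lambda v: v * 10.0),
    "soc_g_per_kg": ("soil_organic_carbon_g_kg", lambda v: v),
    "depth_cm":     ("depth_m", lambda v: v / 100.0),
}

def harmonize(df):
    out = pd.DataFrame(index=df.index)
    for col in df.columns:
        if col in VOCAB:
            name, convert = VOCAB[col]
            out[name] = df[col].map(convert)
        else:
            out[col] = df[col]   # keep unmapped columns; flag them in later QA
    return out

study_a = pd.DataFrame({"SOC_pct": [1.2, 2.4], "depth_cm": [10, 30]})
print(harmonize(study_a))
```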

Journal ArticleDOI
TL;DR: The AusGeochem platform as discussed by the authors is an open, cloud-based data repository and a data analysis tool for geochemical data, which can be used to preserve, disseminate and collate geochronology and isotopic data.
Abstract: To promote a more efficient and transparent geochemistry data ecosystem, a consortium of Australian university research laboratories called the AuScope Geochemistry Network assembled to build a collaborative platform for the express purpose of preserving, disseminating and collating geochronology and isotopic data. In partnership with geoscience-data-solutions company Lithodat Pty Ltd, the open, cloud-based AusGeochem platform (https://ausgeochem.auscope.org.au) was developed to simultaneously serve as a geosample registry, a geochemical data repository and a data analysis tool. Informed by method-specific groups of geochemistry experts and established international data reporting practices, community-agreed database schemas were developed for rock and mineral geosample metadata and secondary ion mass spectrometry U-Pb analysis, with additional models for laser ablation-inductively coupled plasma-mass spectrometry U-Pb and Lu-Hf, Ar-Ar, fission-track and (U-Th-Sm)/He under development. Collectively, the AusGeochem platform provides the geochemistry community with a new, dynamic resource to help facilitate FAIR (Findable, Accessible, Interoperable, Reusable) data management, streamline data dissemination and advance quantitative investigations of Earth system processes. By systematically archiving detailed geochemical (meta-)data in structured schemas, intractably large datasets comprising thousands of analyses produced by numerous laboratories can be readily interrogated in novel and powerful ways. These include rapid derivation of inter-data relationships, facilitating on-the-fly data compilation, analysis and visualisation.
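To illustrate what a community-agreed geosample metadata record might minimally contain, here is an assumed sketch; the field names are illustrative and are not AusGeochem's actual schema.

```python
# Minimal, invented geosample metadata record of the kind a shared schema fixes.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GeosampleMetadata:
    sample_id: str                  # registry identifier, e.g. an IGSN-style code
    material: str                   # "rock" or "mineral"
    latitude: float
    longitude: float
    elevation_m: Optional[float]
    collection_method: str
    archive_location: str

sample = GeosampleMetadata(
    sample_id="XYZ-000001", material="mineral",
    latitude=-37.8, longitude=144.9, elevation_m=31.0,
    collection_method="outcrop", archive_location="University A collection",
)
print(sample)
```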

Journal ArticleDOI
TL;DR: In this article, the authors present 10 guiding data quality questions to help managers and scientists identify appropriate workflows to improve data quality by describing the data ecosystem, creating a data quality plan, identifying roles and responsibilities, building data collection and data management workflows, training and calibrating data collectors, detecting and correcting errors, and describing sources of variability.

Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, the authors analyzed the scientific production of research data management indexed in Dimensions and found that about 60% of the publications had at least one citation, with 3,598 citations found, indicating a growing academic impact.
Abstract: The study aims to analyze the scientific production of research data management indexed in Dimensions. Using the term "research data management", 677 articles were retrieved and analyzed using output and citation bibliometric indicators. The multidisciplinarity of research data management was demonstrated by publications occurring in different research areas, such as computer science, information systems, library and information science, medicine and health sciences, and history and archeology. The countries with the highest publication rates were the United States, Germany, and the United Kingdom. About 60% of the publications had at least one citation, with 3,598 citations found, indicating a growing academic impact since the volume of production and citations has grown over time. In the Big Data era, data management is a developing topic that ensures data sharing and reuse and, consequently, the advancement of science. This bibliometric study made it possible to monitor the literature performance on research data management.

Journal ArticleDOI
TL;DR: Research data management (RDM) is needed to assist experimental advances and data collection in the chemical sciences as mentioned in this paper, and an agreement is needed on minimum information standards for data handling to support structured approaches to data reporting.
Abstract: Research data management (RDM) is needed to assist experimental advances and data collection in the chemical sciences. Many funders require RDM because experiments are often paid for by taxpayers and the resulting data should be deposited sustainably for posterity. However, paper notebooks are still common in laboratories and research data is often stored in proprietary and/or dead-end file formats without experimental context. Data must mature beyond a mere supplement to a research paper. Electronic lab notebooks (ELN) and laboratory information management systems (LIMS) allow researchers to manage data better and they simplify research and publication. Thus, an agreement is needed on minimum information standards for data handling to support structured approaches to data reporting. As digitalization becomes part of curricular teaching, future generations of digital native chemists will embrace RDM and ELN as an organic part of their research.

Journal ArticleDOI
TL;DR: The International Photoacoustic Standardisation Consortium (IPASC) as discussed by the authors has established a data format with a defined consensus metadata structure and developed an open-source software application programming interface (API) to enable conversion from proprietary file formats into the IPASC format.

Journal ArticleDOI
TL;DR: Research data management (RDM) is the cornerstone of a successful research project, and yet it often remains an underappreciated art that gets overlooked in the hustle and bustle of everyday project management, even when required by funding bodies, as discussed by the authors.
Abstract: Research data management (RDM) is the cornerstone of a successful research project, and yet it often remains an underappreciated art that gets overlooked in the hustle and bustle of everyday project management, even when required by funding bodies. If researchers are to strive for reproducible science that adheres to the FAIR principles, then they need to manage the data associated with their research projects effectively. It is imperative to plan your RDM strategies early on and set up your project organisation before embarking on the work. There are several different factors to consider: data management plans, data organisation and storage, publishing and sharing your data, ensuring reproducibility and adhering to data standards. Additionally, it is important to reflect upon the ethical implications that might need to be planned for, and adverse issues that may need a mitigation strategy. This short article discusses these different areas, noting some best practices and detailing how to incorporate these strategies into your work. Finally, the article ends with a set of top ten tips for effective research data management.

Journal ArticleDOI
TL;DR: A comprehensive review of recent hydroinformatics applications that employ visual computing techniques to support complex data-driven research problems and to support communication and decision-making in the water resources management sector is presented in this article.
Abstract: Recent advances in information, communication, and environmental monitoring technologies have increased the availability, spatiotemporal resolution, and quality of water-related data, thereby leading to the emergence of many innovative big data applications. Among these applications, visualization and visual analytics, also known as visual computing techniques, empower the synergy of computational methods (e.g., machine learning and statistical models) with human reasoning to improve the understanding of, and solutions to, complex science and engineering problems. These approaches are frequently integrated with geographic information systems and cyberinfrastructure to provide new opportunities and methods for enhancing water resources management. In this paper, we present a comprehensive review of recent hydroinformatics applications that employ visual computing techniques to (1) support complex data-driven research problems, and (2) support communication and decision-making in the water resources management sector. Then, we conduct a technical review of the state-of-the-art web-based visualization technologies and libraries to share our experiences in developing shareable, adaptive, and interactive visualizations and visual interfaces for water resources management applications. We close with a vision that applies the emerging visual computing technologies and paradigms to develop the next generation of hydroinformatics applications.
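A minimal, self-contained example of the kind of visual computing the review surveys: plotting a synthetic streamflow series against a flood threshold. The data, threshold, and styling are invented; real applications would pull observations from monitoring networks and serve interactive, web-based views.

```python
# Static sketch of a hydrological overview plot with an exceedance highlight.
import matplotlib.pyplot as plt
import numpy as np

days = np.arange(120)
flow = 30 + 15 * np.sin(days / 10.0) + np.random.default_rng(0).normal(0, 4, 120)
threshold = 45.0    # invented flood threshold for illustration

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(days, flow, label="streamflow (m$^3$/s)")
ax.axhline(threshold, color="red", linestyle="--", label="flood threshold")
ax.fill_between(days, flow, threshold, where=flow > threshold,
                alpha=0.3, color="red")
ax.set_xlabel("day")
ax.set_ylabel("discharge (m$^3$/s)")
ax.legend()
plt.tight_layout()
plt.savefig("streamflow_overview.png")
```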

Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, the authors present a real case from the system design, its birth, and its proper use for damage detection, up to the detection of a structural failure, showing that a trade-off must be sought between the great redundancy offered by current networks and the need for simple and prompt information that guarantees structural safety.
Abstract: Progress in the world of new sensors is running fast, offering good performance and reliable solutions with initial costs orders of magnitude lower than those faced only a few years ago. The spread of new electronic devices, like microcontrollers, the increasing power of networks for data transmission and management, and finally the availability of new data-driven approaches have created a revolution in SHM approaches that is not yet fully mastered. The design of an SHM system is going to be deeply revised from an industrial perspective, within a complex framework in which everything has to be planned in detail from the beginning, including the development of a metrological culture, personnel education, the need for spare parts, re-calibration, …. This also means a revolution in data management: huge data flows not only create hardware problems related to their transfer; the software too requires a great deal of effort to compress data, also because of the cost of cloud resources. All these facts, accounting for the real metrological performance of the best MEMS sensors available at present, also require simplified data analyses, as software complexity is now mainly transferred to network management. A trade-off must be sought between the great redundancy offered by current networks and the need for simple and prompt information that guarantees structural safety: that is why, as data rates increase, the algorithms to be adopted must be simple, reliable, and possibly adapted to edge computing at the sensor level, where hardware power is now present, though at a reduced scale. The chapter shows such an approach in a real case, from the system design, its birth, and its proper use for damage detection, up to the detection of a structural failure.
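In the spirit of the simple, reliable edge-level algorithms argued for above, here is an assumed sketch of a rolling-RMS alarm on an accelerometer channel; the window length, baseline, and threshold factor are placeholders, not values from the chapter.

```python
# Simple edge-deployable check: rolling RMS compared against a baseline band.
import random
from collections import deque
from math import sqrt

class RmsAlarm:
    def __init__(self, window=256, baseline_rms=0.02, factor=3.0):
        self.buf = deque(maxlen=window)
        self.limit = baseline_rms * factor

    def update(self, sample):
        """Feed one acceleration sample; return True once the window looks anomalous."""
        self.buf.append(sample)
        if len(self.buf) < self.buf.maxlen:
            return False
        rms = sqrt(sum(x * x for x in self.buf) / len(self.buf))
        return rms > self.limit

alarm = RmsAlarm()
stream = ([random.gauss(0.0, 0.02) for _ in range(500)] +   # healthy response
          [random.gauss(0.0, 0.20) for _ in range(500)])    # synthetic anomaly
flags = [alarm.update(x) for x in stream]
print("first alarm at sample", flags.index(True) if True in flags else None)
```

The point of such a scheme is that only a flag, not the raw data stream, needs to travel upstream, which matches the chapter's call for prompt, low-bandwidth information from the sensor level.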

Journal ArticleDOI
TL;DR: The authors propose CrowdMed-II, a health data management framework based on blockchain that could address the above-mentioned problems of health data; they study the design of major smart contracts in their framework and propose two smart contract structures.
Abstract: The healthcare industry faces serious problems with health data. Firstly, health data are fragmented and their quality needs to be improved: data fragmentation means that it is difficult to integrate the patient data stored by multiple health service providers, and the quality of these heterogeneous data also needs to be improved for better utilization. Secondly, data sharing among patients, healthcare service providers and medical researchers is inadequate. Thirdly, while sharing health data, patients' right to privacy must be protected, and patients should have authority over who can access their data. In traditional health data sharing systems, because of centralized management, data can easily be stolen or manipulated; these systems also ignore patients' authority and privacy. Researchers have proposed blockchain-based health data sharing solutions in which blockchain is used for consensus management, enabling multiple parties who do not fully trust each other to exchange their data. However, the practice of smart contracts supporting these solutions has not been studied in detail. We propose CrowdMed-II, a health data management framework based on blockchain, which could address the above-mentioned problems of health data. We study the design of major smart contracts in our framework and propose two smart contract structures. We also introduce a novel search contract for searching for patients in the framework. We evaluate their efficiency based on execution costs on Ethereum. Our design improves on those previously proposed, lowering the computational costs of the framework; this allows the framework to operate at scale and makes it more feasible for widespread adoption.

Journal ArticleDOI
TL;DR: In this article, the authors propose strategic guidance for an integrated European exposure data production and management framework for use in science and policy, building on current and future data analysis and digitalization trends.

Journal ArticleDOI
20 Jan 2022-Data
TL;DR: In this paper, the authors evaluate and provide examples of case studies currently using PDI and use its long-term continental US database (18 locations and 24 years) to test cover crop and grazing effects on soil organic carbon (SOC) storage, showing that legume and rye (Secale cereale L.) cover crops increased SOC storage by 36% and 50%, respectively, compared with oat (Avena sativa L.) and rye mixtures, and that low and high grazing intensities improved the upper SOC by 69–72% compared with a medium grazing intensity.
Abstract: Combining data into a centralized, searchable, and linked platform will provide a data exploration platform to agricultural stakeholders and researchers for better agricultural decision making, thus fully utilizing existing data and preventing redundant research. Such a data repository requires readiness to share data, knowledge, and skillsets and to work with Big Data infrastructures. With the adoption of new technologies and increased data collection, agricultural workforces need to update their knowledge, skills, and abilities. The Partnerships for Data Innovation (PDI) effort integrates agricultural data by efficiently capturing them from field, lab, and greenhouse studies using a variety of sensors, tools, and apps, and provides quick visualization and summary statistics for real-time decision making. This paper aims to evaluate and provide examples of case studies currently using PDI and to use its long-term continental US database (18 locations and 24 years) to test cover crop and grazing effects on soil organic carbon (SOC) storage. The results show that legume and rye (Secale cereale L.) cover crops increased SOC storage by 36% and 50%, respectively, compared with oat (Avena sativa L.) and rye mixtures, while low and high grazing intensities improved the upper SOC by 69–72% compared with a medium grazing intensity. This was likely due to legumes providing a more favorable substrate for SOC formation and high grazing intensity systems having continuous manure deposition. Overall, PDI can be used to democratize data regionally and nationally and can therefore address large-scale research questions aimed at addressing agricultural grand challenges.

Journal ArticleDOI
TL;DR: In this paper, the authors used focus groups with scientists from five disciplines (atmospheric and earth science, computer science, chemistry, ecology, and neuroscience) to understand their perceptions of data repositories and their perspectives on data management and sharing practices.
Abstract: Data sharing can accelerate scientific discovery while increasing return on investment beyond the researcher or group that produced them. Data repositories enable data sharing and preservation over the long term, but little is known about scientists' perceptions of them and their perspectives on data management and sharing practices. Using focus groups with scientists from five disciplines (atmospheric and earth science, computer science, chemistry, ecology, and neuroscience), we asked questions about data management to lead into a discussion of what features they think are necessary to include in data repository systems and services to help them implement the data sharing and preservation parts of their data management plans. Participants identified metadata quality control and training as problem areas in data management. Additionally, participants discussed several desired repository features, including: metadata control, data traceability, security, stable infrastructure, and data use restrictions. We present their desired repository features as a rubric for the research community to encourage repository utilization. Future directions for research are discussed.

Journal ArticleDOI
TL;DR: The Catalysis Data Infrastructure (CDI), as mentioned in this paper, is an infrastructure to facilitate the management of research data produced by researchers; the CDI is proposed to encompass the presentation of research outputs (publications and data) in a digital repository that brings together an array of heterogeneous data types.