
Showing papers on "Data management published in 2020"


Journal ArticleDOI
03 Feb 2020-Agronomy
TL;DR: In this article, the authors review the current status of advanced farm management systems by revisiting each crucial step, from data acquisition in crop fields to variable rate applications, so that growers can make optimized decisions to save money while protecting the environment and transforming how food will be produced to sustainably match the forthcoming population growth.
Abstract: The information that crops offer is turned into profitable decisions only when efficiently managed. Current advances in data management are making Smart Farming grow exponentially, as data have become the key element in modern agriculture to help producers with critical decision-making. Valuable advantages appear with objective information acquired through sensors with the aim of maximizing productivity and sustainability. Farms managed on the basis of such data rely on information that can increase efficiency by avoiding the misuse of resources and the pollution of the environment. Data-driven agriculture, with the help of robotic solutions incorporating artificial intelligence techniques, sets the grounds for the sustainable agriculture of the future. This paper reviews the current status of advanced farm management systems by revisiting each crucial step, from data acquisition in crop fields to variable rate applications, so that growers can make optimized decisions to save money while protecting the environment and transforming how food will be produced to sustainably match the forthcoming population growth.

349 citations


Journal ArticleDOI
TL;DR: This review presents an analytical survey of the current and potential applications of the Internet of Things in arable farming, where spatial data, highly varying environments, task diversity, and mobile devices pose unique challenges compared to other agricultural systems.

195 citations


Journal ArticleDOI
TL;DR: This is a foundational study that formalises and categorises the existing usage of AR and VR in the construction industry and provides a roadmap to guide future research efforts.

182 citations


DOI
20 Aug 2020
TL;DR: The China National GeneBank DataBase (CNGBdb) is a data platform that systematically archives and shares multi-omics data in the life sciences to promote their systematic management, application, and industrial utilization.
Abstract: The China National GeneBank DataBase (CNGBdb) is a data platform for systematically archiving and sharing multi-omics data in the life sciences. As the service portal of the Bio-informatics Data Center, part of the "Three Banks and Two Platforms" core structure of the China National GeneBank (CNGB), CNGBdb benefits from rich sample resources, data resources, cooperation projects, and powerful data computation and analysis capabilities. With the advent of high-throughput sequencing technologies, research in the life sciences has entered the big data era, which calls for closer international cooperation and data sharing. With the development of China's economy and the increase of investment in life science research, a national public platform for data archiving and sharing is needed to promote the systematic management, application, and industrial utilization of these data. Currently, CNGBdb provides genomic data archiving, information search engines, data management, and data analysis services. Its data schema covers projects, samples, experiments, runs, assemblies, variations, and sequences. As of May 22, 2020, CNGBdb had archived 2176 research projects and more than 2221 TB of sequencing data submitted by researchers globally. In the future, CNGBdb will continue to be dedicated to promoting data sharing in life science research and improving its service capability. The CNGBdb website is https://db.cngb.org/.
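To make the submission hierarchy mentioned in the schema (projects, samples, experiments, runs) concrete, here is a minimal, hypothetical Python sketch of how such records could be modelled; the class and field names are illustrative assumptions, not CNGBdb's actual data model.

```python
# Illustrative sketch only: hypothetical dataclasses mirroring the submission
# hierarchy described for CNGBdb (project -> sample -> experiment -> run).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Run:
    run_id: str
    file_path: str          # e.g. a FASTQ file produced by a sequencing run
    platform: str           # sequencing platform name

@dataclass
class Experiment:
    experiment_id: str
    library_strategy: str   # e.g. "WGS", "RNA-Seq"
    runs: List[Run] = field(default_factory=list)

@dataclass
class Sample:
    sample_id: str
    organism: str
    experiments: List[Experiment] = field(default_factory=list)

@dataclass
class Project:
    project_id: str
    title: str
    samples: List[Sample] = field(default_factory=list)

# Minimal usage: one project with one sample, experiment and run.
run = Run("RUN0001", "reads/sample1_R1.fastq.gz", "DNBSEQ")
exp = Experiment("EXP0001", "WGS", [run])
sample = Sample("SAMP0001", "Homo sapiens", [exp])
project = Project("PRJ0001", "Demo multi-omics submission", [sample])
print(project.project_id, len(project.samples[0].experiments[0].runs))
```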

180 citations


Journal ArticleDOI
TL;DR: The goals of the platform are to provide decentralised mechanisms to both service providers and data owners for processing personal data and, at the same time, to empower data provenance and transparency by leveraging advanced features of blockchain technology.
Abstract: The General Data Protection Regulation (GDPR) gives control of personal data back to the owners by imposing stricter requirements and obligations on service providers who manage and process personal data. Because verification of GDPR compliance by a supervisory authority is conducted only irregularly, it is challenging to certify that a service provider has been continuously adhering to the GDPR. Furthermore, it is beyond the data owner's capability to perceive whether a service provider complies with the GDPR and effectively protects her personal data. This motivates us to envision a design concept for developing a GDPR-compliant personal data management platform leveraging the emerging blockchain and smart contract technologies. The goals of the platform are to provide decentralised mechanisms to both service providers and data owners for processing personal data and, at the same time, to empower data provenance and transparency by leveraging advanced features of blockchain technology. The platform enables data owners to impose data usage consent, ensures that only designated parties can process personal data, and logs all data activities in an immutable distributed ledger using smart contract and cryptography techniques. By honestly participating in the platform, a service provider can be endorsed by the blockchain network as fully GDPR-compliant; otherwise, any violation is immutably recorded and easily identified by associated parties. We then demonstrate the feasibility and efficiency of the proposed design concept by developing a profile management platform implemented on top of the Hyperledger Fabric permissioned blockchain framework, followed by analysis and discussion.
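As a rough illustration of the consent and logging mechanism described above, the following Python sketch keeps consent records keyed by (owner, processor, purpose) and appends every access attempt to a log. It is a simplified stand-in, not the paper's Hyperledger Fabric smart contracts, and all names are invented for the example.

```python
# Simplified stand-in for consent enforcement plus an append-only access log.
import time

consents = {}    # (owner, processor, purpose) -> bool
access_log = []  # append-only record of every access attempt

def set_consent(owner, processor, purpose, allowed: bool):
    consents[(owner, processor, purpose)] = allowed

def process_data(owner, processor, purpose):
    allowed = consents.get((owner, processor, purpose), False)
    access_log.append({"ts": time.time(), "owner": owner,
                       "processor": processor, "purpose": purpose,
                       "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{processor} has no consent for '{purpose}'")
    return f"processing {owner}'s data for {purpose}"

set_consent("alice", "acme-analytics", "recommendations", True)
print(process_data("alice", "acme-analytics", "recommendations"))
try:
    process_data("alice", "acme-analytics", "marketing")   # no consent given
except PermissionError as e:
    print("denied and logged:", e)
print(len(access_log), "attempts logged")
```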

139 citations


Journal ArticleDOI
TL;DR: A data management method for the digital twin of a product, based on blockchain technology, is proposed; the results show that the proposed method can solve the identified data management problems simultaneously.

135 citations


Journal ArticleDOI
TL;DR: Analysis and evaluations show that the proposed BlockTDM scheme provides a general, flexible, and configurable blockchain-based paradigm for trusted, tamper-resistant data management, suitable for edge computing with high-level security and credibility.
Abstract: With the rapid development of computing technologies, large amounts of data are gathered from edge terminals or Internet of Things (IoT) devices. However, data trust and security in the edge computing environment are critical concerns, especially when the gathered data are fraudulent or dishonest, or when the data are misused or spread without authorization, which may lead to serious problems. In this article, a blockchain-based trusted data management scheme (called BlockTDM) in edge computing is proposed to solve these problems. We propose a flexible and configurable blockchain architecture that includes a mutual authentication protocol, flexible consensus, smart contracts, block and transaction data management, blockchain node management, and deployment. The BlockTDM scheme supports matrix-based multichannel data segmentation and isolation for the protection of sensitive or private data. Moreover, we have designed user-defined encryption of sensitive data before the transaction payload is stored in the blockchain system, and we have implemented conditional access and decryption queries over the protected blockchain data and transactions through smart contracts. Finally, we have evaluated the security, availability, and efficiency of the proposed BlockTDM scheme with extensive experiments. Analysis and evaluations show that BlockTDM provides a general, flexible, and configurable blockchain-based paradigm for trusted, tamper-resistant data management that is well suited for edge computing with high-level security and credibility.
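The following Python sketch illustrates, in simplified form, two ideas from the abstract: encrypting the payload before it is stored and gating decryption behind an access check. It assumes the third-party cryptography package and is not the BlockTDM implementation; the channel list and policy check merely stand in for the blockchain channel and smart contract.

```python
# Illustrative sketch: encrypt before storing, decrypt only if policy allows.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # user-defined symmetric key, kept off-chain
cipher = Fernet(key)

channel = []                         # stands in for one isolated data channel

def submit(payload: bytes):
    channel.append(cipher.encrypt(payload))   # encrypt before storing

def conditional_read(index: int, requester: str, allowed: set):
    # stands in for the smart-contract check that gates decryption queries
    if requester not in allowed:
        raise PermissionError(f"{requester} may not decrypt this channel")
    return cipher.decrypt(channel[index])

submit(b'{"sensor": "s-17", "value": 21.4}')
print(conditional_read(0, "auditor", allowed={"auditor", "owner"}))
```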

131 citations


Journal ArticleDOI
TL;DR: An overview of seven platforms for big Earth observation data management and analysis is presented: Google Earth Engine, Sentinel Hub, Open Data Cube, System for Earth Observation Data Access, Processing and Analysis for Land Monitoring (SEPAL), openEO, JEODPP, and pipsCloud.
Abstract: In recent years, Earth observation (EO) satellites have generated large amounts of geospatial data that are freely available to society and researchers. This scenario brings challenges for traditional spatial data infrastructures (SDI) to properly store, process, disseminate, and analyze these big data sets. To meet these demands, novel technologies based on cloud computing and distributed systems have been proposed and developed, such as array database systems, MapReduce systems, and web services to access and process big Earth observation data. Currently, these technologies are being integrated into cutting-edge platforms in order to support a new generation of SDI for big Earth observation data. This paper presents an overview of seven platforms for big Earth observation data management and analysis: Google Earth Engine (GEE), Sentinel Hub, Open Data Cube (ODC), System for Earth Observation Data Access, Processing and Analysis for Land Monitoring (SEPAL), openEO, JEODPP, and pipsCloud. We also compare these platforms according to criteria that represent capabilities of interest to the EO community.
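For a flavour of how such platforms are used, here is a minimal query against one of the surveyed platforms, Google Earth Engine, via its Python API. It assumes the earthengine-api package is installed and the account has already been authenticated; the location, date range, and cloud threshold are arbitrary examples.

```python
# Minimal, illustrative Google Earth Engine query (server-side processing).
import ee

ee.Initialize()

point = ee.Geometry.Point([12.49, 41.89])           # arbitrary example location
collection = (ee.ImageCollection("COPERNICUS/S2_SR")
              .filterBounds(point)
              .filterDate("2020-06-01", "2020-07-01")
              .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20)))

print("Images found:", collection.size().getInfo())
median = collection.median()                        # server-side composite
print(median.bandNames().getInfo())
```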

123 citations


Book
29 Jan 2020
TL;DR: This book covers the role of case management in the human services, the assessment of client service and support needs, and effective case management guidelines for practice.
Abstract: Contents: The Role of Case Management in the Human Services; Assessment of Client Service and Support Needs; Development of the Client Service and Support Plan; The Direct Service Function of Case Management; The Indirect Service Function of Case Management; The Monitoring Function of Case Management; The Evaluation Function of Case Management; Effective Case Management Guidelines for Practice.

109 citations


Journal ArticleDOI
11 Mar 2020-PLOS ONE
TL;DR: Although attitudes towards data sharing and data use and reuse are mostly positive, practice does not always support data storage, sharing, and future reuse, and assistance through data managers or data librarians is clearly needed.
Abstract: Background With data becoming a centerpiece of modern scientific discovery, data sharing by scientists is now a crucial element of scientific progress. This article aims to provide an in-depth examination of the practices and perceptions of data management, including data storage, data sharing, and data use and reuse by scientists around the world. Methods The Usability and Assessment Working Group of DataONE, an NSF-funded environmental cyberinfrastructure project, distributed a survey to a multinational and multidisciplinary sample of scientific researchers in a two-wave approach in 2017-2018. We focused our analysis on examining the differences across age groups, sub-disciplines of science, and sectors of employment. Findings Most respondents displayed what we describe as high and mediocre risk data practices by storing their data on their personal computers, departmental servers, or USB drives. Respondents appeared to be satisfied with short-term storage solutions; however, only half of them are satisfied with available mechanisms for storing data beyond the life of the process. Data sharing and data reuse were viewed positively: over 85% of respondents said they would be willing to share their data with others and said they would use data collected by others if it could be easily accessed. A vast majority of respondents felt that the lack of access to data generated by other researchers or institutions was a major impediment to progress in science at large, yet only about half thought that it restricted their own ability to answer scientific questions. Although attitudes towards data sharing and data use and reuse are mostly positive, practice does not always support data storage, sharing, and future reuse. Assistance through data managers or data librarians, readily available data repositories for both long-term and short-term storage, and educational programs both to raise awareness and to help engender good data practices are clearly needed.

95 citations


Journal ArticleDOI
TL;DR: A secure data collaboration framework (FDC) based on federated deep-learning technology that can realize the secure collaboration of multiparty data computation on the premise that the data do not need to be transmitted out of their private data center.
Abstract: With the explosive growth of network data due to the rapid development of the Internet of Things (IoT), the demand for multiparty computation is increasing. In addition, with the advent of the future digital society, data have been gradually evolving into an effective virtual asset for sharing and usage. Given the sensitivity, massiveness, fragmentation, and security requirements of multiparty data computation in the IoT environment, we propose a secure data collaboration framework (FDC) based on federated deep-learning technology. The proposed framework realizes secure collaboration on multiparty data computation on the premise that the data never need to be transmitted out of their private data centers. The framework is built on a public data center, private data centers, and blockchain technology. The private data center is responsible for data governance, data registration, and data management. The public data center is used for multiparty secure computation. The blockchain is responsible for ensuring secure data usage and transmission. A real IoT scenario is used to validate the effectiveness of the proposed framework.
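The abstract does not detail the learning algorithm, so the following federated-averaging sketch over a simple linear model is only meant to convey the core idea of collaboration without raw data leaving each private data center; it is an assumption-laden toy, not the FDC implementation.

```python
# Toy FedAvg-style sketch: parties share model weights, never raw data.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One party refines the global model on its private data (linear model)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad
    return w

# Three parties, each with private data that never leaves its data center.
true_w = np.array([2.0, -1.0])
parties = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    parties.append((X, y))

global_w = np.zeros(2)
for _ in range(10):
    local_ws = [local_update(global_w, X, y) for X, y in parties]
    sizes = np.array([len(y) for _, y in parties], dtype=float)
    global_w = np.average(local_ws, axis=0, weights=sizes)  # only weights are shared

print("estimated:", np.round(global_w, 2), "true:", true_w)
```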

Journal ArticleDOI
TL;DR: ACTION-EHR is developed, a system for patient-centric, blockchain-based EHR data sharing and management for patient care, in particular radiation treatment for cancer, built on Hyperledger Fabric, a permissioned blockchain framework.
Abstract: Background: With increased specialization of health care services and high levels of patient mobility, accessing health care services across multiple hospitals or clinics has become very common for diagnosis and treatment, particularly for patients with chronic diseases such as cancer. With informed knowledge of a patient’s history, physicians can make prompt clinical decisions for smarter, safer, and more efficient care. However, due to the privacy and high sensitivity of electronic health records (EHR), most EHR data sharing still happens through fax or mail due to the lack of systematic infrastructure support for secure, trustable health data sharing, which can also cause major delays in patient care. Objective: Our goal was to develop a system that will facilitate secure, trustable management, sharing, and aggregation of EHR data. Our patient-centric system allows patients to manage their own health records across multiple hospitals. The system will ensure patient privacy protection and guarantee security with respect to the requirements for health care data management, including the access control policy specified by the patient. Methods: We propose a permissioned blockchain-based system for EHR data sharing and integration. Each hospital will provide a blockchain node integrated with its own EHR system to form the blockchain network. A web-based interface will be used for patients and doctors to initiate EHR sharing transactions. We take a hybrid data management approach, where only management metadata will be stored on the chain. Actual EHR data, on the other hand, will be encrypted and stored off-chain in Health Insurance Portability and Accountability Act–compliant cloud-based storage. The system uses public key infrastructure–based asymmetric encryption and digital signatures to secure shared EHR data. Results: In collaboration with Stony Brook University Hospital, we developed ACTION-EHR, a system for patient-centric, blockchain-based EHR data sharing and management for patient care, in particular radiation treatment for cancer. The prototype was built on Hyperledger Fabric, an open-source, permissioned blockchain framework. Data sharing transactions were implemented using chaincode and exposed as representational state transfer application programming interfaces used for the web portal for patients and users. The HL7 Fast Healthcare Interoperability Resources standard was adopted to represent shared EHR data, making it easy to interface with hospital EHR systems and integrate a patient’s EHR data. We tested the system in a distributed environment at Stony Brook University using deidentified patient data. Conclusions: We studied and developed the critical technology components to enable patient-centric, blockchain-based EHR sharing to support cancer care. The prototype demonstrated the feasibility of our approach as well as some of the major challenges. The next step will be a pilot study with health care providers in both the United States and Switzerland. Our work provides an exemplar testbed to build next-generation EHR sharing infrastructures.
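The hybrid on-chain/off-chain split described above can be sketched as follows: the encrypted record lives "off-chain" and only signed metadata with a content hash goes "on-chain". Dictionaries stand in for the cloud store and the Fabric ledger, so this is an illustrative simplification of the design, not ACTION-EHR code; it assumes the third-party cryptography package.

```python
# Simplified sketch of hybrid EHR management: encrypted data off-chain,
# signed metadata and content hash on-chain.
import hashlib, json
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives.asymmetric import ed25519

off_chain_store = {}   # stands in for encrypted cloud storage
on_chain_ledger = []   # stands in for the permissioned blockchain

patient_key = Fernet.generate_key()
fernet = Fernet(patient_key)
signing_key = ed25519.Ed25519PrivateKey.generate()

def share_record(patient_id: str, ehr: dict) -> str:
    plaintext = json.dumps(ehr, sort_keys=True).encode()
    ciphertext = fernet.encrypt(plaintext)
    digest = hashlib.sha256(ciphertext).hexdigest()
    off_chain_store[digest] = ciphertext                  # encrypted data off-chain
    metadata = {"patient": patient_id, "hash": digest}    # only metadata on-chain
    signature = signing_key.sign(json.dumps(metadata, sort_keys=True).encode())
    on_chain_ledger.append({"meta": metadata, "sig": signature.hex()})
    return digest

def fetch_record(digest: str) -> dict:
    ciphertext = off_chain_store[digest]
    assert hashlib.sha256(ciphertext).hexdigest() == digest  # integrity check
    return json.loads(fernet.decrypt(ciphertext))

h = share_record("patient-42", {"treatment": "radiation", "fractions": 30})
print(fetch_record(h))
```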

Journal ArticleDOI
01 Oct 2020
TL;DR: This paper presents a blockchain-based scheme for securely sharing information in the pharmaceutical supply chain system with smart contracts and a consensus mechanism, and provides a mechanism to securely distribute the required cryptographic keys to all participants using the smart contract technique.
Abstract: Supply Chain Management (SCM) is critical when moving sensitive products from one entity to the next until they reach the end users without damage. The traditional supply chain management system suffers from several serious problems such as product tampering, delays, and fraud. It also lacks proper authentication among the participants, sound data management, and data integrity. The blockchain mechanism is capable of solving these issues thanks to its important features such as decentralization, transparency, a trust-less environment, anonymity, and immutability. This paper describes how the blockchain mechanism can be combined with the traditional pharmaceutical supply chain system. To achieve a better SCM system, we present a blockchain-based scheme for securely sharing information in the pharmaceutical supply chain with smart contracts and a consensus mechanism. The proposed scheme also provides a mechanism to securely distribute the required cryptographic keys to all participants using the smart contract technique. Further, transaction and block validation protocols have been designed in our scheme. The security analysis ensures that our protocol is robust, and it also achieves reasonable performance in terms of computation and communication overheads.
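A toy hash-chained custody log, sketched below, shows the tamper-evidence property that a blockchain-based SCM scheme builds on: altering an earlier hand-over breaks every later link. It is a plain Python illustration, not the paper's protocol, and the batch and party names are invented.

```python
# Toy hash-chained custody log: tampering with any record is detectable.
import hashlib, json

def entry_hash(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_handover(chain: list, batch_id: str, sender: str, receiver: str):
    prev = chain[-1]["hash"] if chain else "0" * 64
    record = {"batch": batch_id, "from": sender, "to": receiver, "prev": prev}
    record["hash"] = entry_hash(record)
    chain.append(record)

def verify(chain: list) -> bool:
    prev = "0" * 64
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev"] != prev or entry_hash(body) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True

chain = []
append_handover(chain, "BATCH-7", "manufacturer", "distributor")
append_handover(chain, "BATCH-7", "distributor", "pharmacy")
print(verify(chain))            # True
chain[0]["to"] = "grey-market"  # simulate tampering with an earlier hand-over
print(verify(chain))            # False
```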

Journal ArticleDOI
TL;DR: The most relevant concepts of data management in IoT are identified, the current solutions proposed for IoT data management are surveyed, the most promising solutions are discussed, and relevant open research issues on the topic are identified, providing guidelines for further contributions.

Journal ArticleDOI
TL;DR: A novel Digital Twin (DT)-enabled collaborative data management framework for metal additive manufacturing (AM) systems, in which a Cloud DT communicates with distributed Edge DTs in different product lifecycle stages; the framework shows great potential for enhancing fundamental understanding of metal AM processes, developing simulation and prediction models, reducing development times and costs, and improving product quality and production efficiency.

Posted Content
TL;DR: This survey comprehensively reviews recent research trends in trajectory data management, covering trajectory pre-processing, storage, common trajectory analytic tools such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering, and explores four closely related analytical tasks commonly used with trajectory data in interactive or real-time processing.
Abstract: Recent advances in sensor and mobile devices have enabled an unprecedented increase in the availability and collection of urban trajectory data, thus increasing the demand for more efficient ways to manage and analyze the data being produced. In this survey, we comprehensively review recent research trends in trajectory data management, covering trajectory pre-processing, storage, common trajectory analytic tools such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering. We also explore four closely related analytical tasks commonly used with trajectory data in interactive or real-time processing. Deep trajectory learning is also reviewed for the first time. Finally, we outline the essential qualities that a trajectory data management system should possess in order to maximize flexibility.
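Two of the stages surveyed, pre-processing and spatial-only querying, can be illustrated with a few lines of Python: implausible GPS fixes are dropped using a speed threshold, and the cleaned trajectory is filtered with a bounding-box query. This is a teaching sketch only; the thresholds and coordinates are arbitrary examples.

```python
# Toy trajectory pre-processing (speed-based noise filtering) and range query.
import math

def haversine_m(p, q):
    """Great-circle distance in metres between (lat, lon) points."""
    R = 6371000.0
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = math.radians(q[0] - p[0])
    dlmb = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def drop_noise(points, max_speed_mps=50.0):
    """points: list of (lat, lon, t_seconds); drop fixes implying impossible speeds."""
    cleaned = [points[0]]
    for p in points[1:]:
        prev = cleaned[-1]
        dt = max(p[2] - prev[2], 1e-9)
        if haversine_m(prev[:2], p[:2]) / dt <= max_speed_mps:
            cleaned.append(p)
    return cleaned

def range_query(points, lat_min, lat_max, lon_min, lon_max):
    return [p for p in points if lat_min <= p[0] <= lat_max and lon_min <= p[1] <= lon_max]

traj = [(40.0, 116.3, 0), (40.0005, 116.3006, 10), (41.5, 117.0, 20), (40.0010, 116.3012, 30)]
clean = drop_noise(traj)
print(len(traj), "->", len(clean), "points after noise filtering")
print(range_query(clean, 39.99, 40.01, 116.29, 116.31))
```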

Journal ArticleDOI
TL;DR: This work employs blockchain as a distributed ledger to support a decentralized approach to data management in IoT systems, where IoT data are stored in the deployed blockchain for further utilization, e.g., retrieval and auditing.
Abstract: The rapid proliferation of Internet-of-Things (IoT) devices has brought great challenges of data management, i.e., storing, retrieving and manipulating a large volume of IoT data. Conventional IoT systems rely on centralized architectures to manage IoT data, hence suffering from limited scalability, lack of transparency, and single point of failure issues. As such, we employ blockchain as a distributed ledger to support the decentralized approach of data management in IoT systems, where IoT data are stored in the deployed blockchain for further utilization, e.g., retrieve and audit. A general architecture combining blockchain and IoT systems is presented. Nevertheless, as the resource constraints of IoT devices may still exist during the process of data transmissions from IoT devices to the blockchain network, we propose a case study of a learning-assisted resource allocation method to support intelligent data management. The numerical results show that the proposed scheme achieves superior performance compared with baseline solutions.

Journal ArticleDOI
TL;DR: A blockchain-based architecture for IoT applications is presented, which brings distributed data management to support transaction services within a multi-party apparel business supply chain network.

Journal ArticleDOI
26 Mar 2020
TL;DR: The results show that big data can tackle the ever-present issues of customer regret related to poor quality of information or lack of information in smart real estate, increasing customer satisfaction through an intermediate organization that can process and keep a check on the data being provided to customers by sellers and real estate managers.
Abstract: Big data is the concept of enormous amounts of data being generated daily in different fields due to the increased use of technology and internet sources. Despite the various advancements and the hopes of better understanding, big data management and analysis remain a challenge, calling for more rigorous and detailed research, as well as the identification of methods and ways in which big data could be tackled and put to good use. The existing research falls short in discussing and evaluating the pertinent tools and technologies for analyzing big data efficiently, which calls for a comprehensive and holistic analysis of the published articles to summarize the concept of big data and see field-specific applications. To address this gap and keep a recent focus, research articles published in the last decade, belonging to top-tier and high-impact journals, were retrieved using the search engines of Google Scholar, Scopus, and Web of Science and narrowed down to a set of 139 relevant research articles. Different analyses were conducted on the retrieved papers, including bibliometric analysis, keywords analysis, big data search trends, and the authors' names, countries, and affiliated institutes contributing the most to the field of big data. The comparative analyses show that, conceptually, big data lies at the intersection of the storage, statistics, technology, and research fields and emerged as an amalgam of these four fields with interlinked aspects such as data hosting and computing, data management, data refining, data patterns, and machine learning. The results further show that major characteristics of big data can be summarized using the seven Vs, which include variety, volume, variability, value, visualization, veracity, and velocity. Furthermore, the existing methods for big data analysis, their shortcomings, and the possible directions that could be taken to harness technology and ensure data analysis tools can be upgraded to be fast and efficient were also explored. The major challenges in handling big data include efficient storage, retrieval, analysis, and visualization of the large heterogeneous data, which can be tackled through authentication such as Kerberos and encrypted files, logging of attacks, secure communication through Secure Sockets Layer (SSL) and Transport Layer Security (TLS), data imputation, building learning models, dividing computations into sub-tasks, checkpoint applications for recursive tasks, and using Solid State Drives (SSD) and Phase Change Material (PCM) for storage. In terms of frameworks for big data management, two main frameworks exist, Hadoop and Apache Spark, which must be used simultaneously to capture the holistic essence of the data and make the analyses meaningful, swift, and speedy. Further field-specific applications of big data in two promising and integrated fields, i.e., smart real estate and disaster management, were investigated, and a framework for field-specific applications, as well as a merger of the two areas through big data, was highlighted. The proposed frameworks show that big data can tackle the ever-present issues of customer regret related to poor quality of information or lack of information in smart real estate, increasing customer satisfaction through an intermediate organization that can process and keep a check on the data being provided to customers by sellers and real estate managers.
Similarly, for disaster risk management, data from social media, drones, multimedia, and search engines can be used to tackle natural disasters such as floods, bushfires, and earthquakes, as well as to plan emergency responses. In addition, a merger framework for smart real estate and disaster risk management shows that big data generated from smart real estate in the form of occupant data, facilities management, and building integration and maintenance can be shared with disaster risk management and emergency response teams to help prevent, prepare for, respond to, or recover from disasters.
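As a small taste of the Spark side of the Hadoop/Spark pairing mentioned above, the following PySpark snippet aggregates toy listing data; it assumes pyspark is installed, and the column names and values are invented for the example.

```python
# Tiny PySpark illustration: a distributed aggregation over made-up listings.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("listings-demo").getOrCreate()

rows = [("sydney", 750000, 3), ("sydney", 910000, 4),
        ("brisbane", 520000, 3), ("brisbane", 610000, 4)]
df = spark.createDataFrame(rows, ["city", "price", "bedrooms"])

# Average listing price per city: the kind of aggregation a smart real
# estate intermediary might run over much larger, distributed data.
df.groupBy("city").agg(F.avg("price").alias("avg_price")).show()

spark.stop()
```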

Journal ArticleDOI
01 Aug 2020
TL;DR: It is argued that the data management community is uniquely positioned to lead the responsible design, development, use, and oversight of Automated Decision Systems.
Abstract: The need for responsible data management intensifies with the growing impact of data on society. One central locus of the societal impact of data is Automated Decision Systems (ADS), socio-legal-technical systems that are used broadly in industry, non-profits, and government. ADS process data about people, help make decisions that are consequential to people's lives, are designed with the stated goals of improving efficiency and promoting equitable access to opportunity, involve a combination of human and automated decision making, and are subject to auditing for legal compliance and to public disclosure. They may or may not use AI, and may or may not operate with a high degree of autonomy, but they rely heavily on data. In this article, we argue that the data management community is uniquely positioned to lead the responsible design, development, use, and oversight of ADS. We outline a technical research agenda that requires that we step outside our comfort zone of engineering for efficiency and accuracy, to also incorporate reasoning about values and beliefs. This seems high-risk, but one of the upsides is being able to explain to our children what we do and why it matters.

Book
01 Jan 2020
TL;DR: Researchers and students of computer science, information systems, or business management as well as data professionals and practitioners will benefit most from this handbook by not only focusing on the various sections relevant to their research area or particular practical work, but by also studying chapters that they may initially consider not to be directly relevant to them.
Abstract: The issue of data quality is as old as data itself. However, the proliferation of diverse, large-scale and often publicly available data on the Web has increased the risk of poor data quality and misleading data interpretations. On the other hand, data is now exposed at a much more strategic level, e.g. through business intelligence systems, increasing manifold the stakes involved for individuals, corporations as well as government agencies. There, the lack of knowledge about data accuracy, currency or completeness can have erroneous and even catastrophic results. With these changes, traditional approaches to data management in general, and data quality control specifically, are challenged. There is an evident need to incorporate data quality considerations into the whole data cycle, encompassing managerial/governance as well as technical aspects. Data quality experts from research and industry agree that a unified framework for data quality management should bring together organizational, architectural and computational approaches. Accordingly, Sadiq structured this handbook in four parts: Part I is on organizational solutions, i.e. the development of data quality objectives for the organization, and the development of strategies to establish roles, processes, policies, and standards required to manage and ensure data quality. Part II, on architectural solutions, covers the technology landscape required to deploy developed data quality management processes, standards and policies. Part III, on computational solutions, presents effective and efficient tools and techniques related to record linkage, lineage and provenance, data uncertainty, and advanced integrity constraints. Finally, Part IV is devoted to case studies of successful data quality initiatives that highlight the various aspects of data quality in action. The individual chapters present both an overview of the respective topic in terms of historical research and/or practice and state of the art, as well as specific techniques, methodologies and frameworks developed by the individual contributors. Researchers and students of computer science, information systems, or business management as well as data professionals and practitioners will benefit most from this handbook by not only focusing on the various sections relevant to their research area or particular practical work, but by also studying chapters that they may initially consider not to be directly relevant to them, as there they will learn about new perspectives and approaches.

Book
29 Jun 2020
TL;DR: This open access volume analyses and compares data practices across several fields through the analysis of specific cases of data journeys, and provides the necessary ground to examine disciplinary, geographical and historical differences and similarities in data management, processing and interpretation.
Abstract: This groundbreaking, open access volume analyses and compares data practices across several fields through the analysis of specific cases of data journeys. It brings together leading scholars in the philosophy, history and social studies of science to achieve two goals: tracking the travel of data across different spaces, times and domains of research practice; and documenting how such journeys affect the use of data as evidence and the knowledge being produced. The volume captures the opportunities, challenges and concerns involved in making data move from the sites in which they are originally produced to sites where they can be integrated with other data, analysed and re-used for a variety of purposes. The in-depth study of data journeys provides the necessary ground to examine disciplinary, geographical and historical differences and similarities in data management, processing and interpretation, thus identifying the key conditions of possibility for the widespread data sharing associated with Big and Open Data. The chapters are ordered in sections that broadly correspond to different stages of the journeys of data, from their generation to the legitimisation of their use for specific purposes. Additionally, the preface to the volume provides a variety of alternative “roadmaps” aimed to serve the different interests and entry points of readers; and the introduction provides a substantive overview of what data journeys can teach about the methods and epistemology of research.

Journal ArticleDOI
TL;DR: This paper formalizes the system model and the security model for this new primitive, and describes a concrete construction of an attribute-based cloud data integrity auditing protocol, which offers desirable properties, namely attribute privacy-preserving and collusion-resistance.
Abstract: Outsourced storage such as cloud storage can significantly reduce the burden of data management on data owners. Despite its long list of merits, however, cloud storage triggers many security risks at the same time. Data integrity, one of the most pressing challenges in secure cloud storage, is a fundamental and pivotal element in outsourcing services. Outsourced data auditing protocols enable a verifier to efficiently check the integrity of the outsourced files without downloading the entire file from the cloud, which can dramatically reduce the communication overhead between the cloud server and the verifier. Existing protocols are mostly based on public key infrastructure or an exact identity, which lacks flexibility of key management. In this paper, we seek to address the complex key management challenge in cloud data integrity checking by introducing attribute-based cloud data auditing, where users can upload files to the cloud through some customized attribute set and specify some designated auditor set to check the integrity of the outsourced data. We formalize the system model and the security model for this new primitive, and describe a concrete construction of an attribute-based cloud data integrity auditing protocol. The new protocol offers desirable properties, namely attribute privacy-preserving and collusion-resistance. We prove the soundness of our protocol based on the computational Diffie-Hellman assumption and the discrete logarithm assumption. Finally, we develop a prototype of the protocol which demonstrates its practicality.

Journal ArticleDOI
01 Jul 2020
TL;DR: This work considers the problem of learning an index for two-dimensional spatial data and introduces a rank space based ordering technique to establish an ordering of point data and group the points into blocks for index learning, and proposes a recursive strategy that partitions a large point set and learns indices for each partition.
Abstract: Machine learning, especially deep learning, is used increasingly to enable better solutions for data management tasks previously solved by other means, including database indexing. A recent study shows that a neural network can not only learn to predict the disk address of the data value associated with a one-dimensional search key but also outperform B-tree-based indexing, thus promises to speed up a broad range of database queries that rely on B-trees for efficient data access. We consider the problem of learning an index for two-dimensional spatial data. A direct application of a neural network is unattractive because there is no obvious ordering of spatial point data. Instead, we introduce a rank space based ordering technique to establish an ordering of point data and group the points into blocks for index learning. To enable scalability, we propose a recursive strategy that partitions a large point set and learns indices for each partition. Experiments on real and synthetic data sets with more than 100 million points show that our learned indices are highly effective and efficient. Query processing using our indices is more than an order of magnitude faster than the use of R-trees or a recently proposed learned index.
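The following toy sketch mimics the overall recipe: impose an ordering on 2-D points (a Morton/Z-order code is used here as a simple proxy for the paper's rank-space ordering), group the sorted points into fixed-size blocks, and fit a model that predicts a point's block from its code, with an error bound that limits how many candidate blocks must be scanned. It is not the authors' method, only an illustration of the idea.

```python
# Toy "learned spatial index": order 2-D points, block them, learn code -> block.
import numpy as np

def morton_code(x, y, bits=16):
    """Interleave the bits of integer grid coordinates x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return code

rng = np.random.default_rng(1)
pts = rng.integers(0, 2**16, size=(10000, 2))
codes = np.array([morton_code(int(x), int(y)) for x, y in pts])

order = np.argsort(codes)
codes_sorted = codes[order]
block_size = 64
block_ids = np.arange(len(codes_sorted)) // block_size

# Simple learned model: linear fit from (normalised) code to block id, with an
# error bound that tells us how many neighbouring blocks must also be scanned.
scale = float(codes_sorted.max())
xs = codes_sorted / scale
coef = np.polyfit(xs, block_ids.astype(float), deg=1)
pred = np.polyval(coef, xs)
max_err = int(np.ceil(np.max(np.abs(pred - block_ids))))
print("max block prediction error:", max_err)

def lookup(x, y):
    """Return the range of candidate blocks to scan for point (x, y)."""
    c = morton_code(x, y) / scale
    centre = int(round(np.polyval(coef, c)))
    return range(max(0, centre - max_err), centre + max_err + 1)

print(list(lookup(123, 456))[:3], "...")
```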

Journal ArticleDOI
TL;DR: A permissioned blockchain-based decentralized trust management and secure usage control scheme for IoT big data (called BlockBDM), in which all data operations and management, such as data gathering, invoking, transfer, storage, and usage, are processed via blockchain smart contracts.
Abstract: With the fast development of Internet-of-Things (IoT) technologies, IoT big data and its applications are becoming more and more useful. However, traditional IoT data management is fragile and vulnerable: once the gathered data are untrustworthy, or the stored data are deliberately tampered with by internal users or attacked by an external hacker, the tampered data become a serious problem for utilization. To solve the problems of trust and security in IoT big data management, in this article we propose a permissioned blockchain-based decentralized trust management and secure usage control scheme for IoT big data (called BlockBDM), in which all data operations and management, such as data gathering, invoking, transfer, storage, and usage, are processed via blockchain smart contracts. To encourage IoT clients to supply high-quality content, we design a public-blockchain-based token reward mechanism for high-quality data supply contributions. The entire data processing and usage procedure is recorded in cryptographically signed, Merkle tree-based transactions and blocks with high-level security in a global, distributed, tamper-resistant ledger. For data utilization and consumption, we propose secure usage control for digital rights management and a token-based data consumption approach that protects high-value data from being misused or spread without limitation. We implemented the BlockBDM scheme on public and permissioned blockchains for IoT big data management. Finally, extensive evaluation shows that the proposed BlockBDM scheme is feasible, secure, and scalable for decentralized trust management of IoT big data.
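The "Merkle tree-based transactions and blocks" mentioned above rely on the standard Merkle-root construction, sketched below in plain Python: changing any recorded data operation changes the root, which is what makes tampering detectable. This is the textbook construction, not the BlockBDM code, and the transaction payloads are invented.

```python
# Textbook Merkle-root computation over a list of transaction payloads.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(payloads):
    level = [sha256(p) for p in payloads]
    if not level:
        return sha256(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # duplicate the last node if odd
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

txs = [b"gather:sensor-1", b"transfer:gateway-3", b"store:node-9", b"use:analytics"]
root = merkle_root(txs)
print(root.hex())

# Changing any single recorded operation changes the root, which is what
# makes tampering with the ledger detectable.
txs[1] = b"transfer:attacker"
print(merkle_root(txs) != root)   # True
```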

Journal ArticleDOI
TL;DR: This paper presents a secure and efficient data management framework, named "EdgeMediChain", for sharing health data; it leverages both edge computing and blockchain to facilitate and provide the necessary requirements for a healthcare ecosystem in terms of scalability, security, and privacy.
Abstract: Recently, researchers around the world in medical institutions and pharmaceutical companies have been demanding wider access to healthcare data for secondary use in order to provide enhanced and personalized medical services. For this purpose, healthcare information exchange between health authorities can be leveraged as a fundamental concept to meet these demands and enable the discovery of new insights and cures. However, health data are highly sensitive and private information that requires strong authentication and authorization procedures to manage access to them. In this regard, the cloud paradigm has been used in e-healthcare solutions, but these remain inefficient due to their inability to adapt to the expanding volume of data generated by body sensors and their vulnerability to cyberattacks. Hence, collaborative and distributed data governance supported by edge computing and blockchain promises enormous potential for improving the performance and security of the whole system. In this paper, we present a secure and efficient data management framework, named "EdgeMediChain", for sharing health data. The proposed architecture leverages both edge computing and blockchain to facilitate and provide the necessary requirements for a healthcare ecosystem in terms of scalability, security, as well as privacy. The Ethereum-based testbed evaluations show the effectiveness of EdgeMediChain in terms of execution time, with a reduction of nearly 84.75% for 2000 concurrent transactions, higher throughput compared to a traditional blockchain, and scalable ledger storage with a linear growth rate.

Journal ArticleDOI
01 Jul 2020
TL;DR: This paper reports on the experience building Modin, a scaled-up implementation of the most widely-used and complex dataframe API today, Python's pandas, and proposes a simple data model and algebra for dataframes to ground discussion in the field.
Abstract: Dataframes are a popular abstraction to represent, prepare, and analyze data. Despite the remarkable success of dataframe libraries in R and Python, dataframes face performance issues even on moderately large datasets. Moreover, there is significant ambiguity regarding dataframe semantics. In this paper we lay out a vision and roadmap for scalable dataframe systems. To demonstrate the potential in this area, we report on our experience building Modin, a scaled-up implementation of the most widely-used and complex dataframe API today, Python's pandas. With pandas as a reference, we propose a simple data model and algebra for dataframes to ground discussion in the field. Given this foundation, we lay out an agenda of open research opportunities where the distinct features of dataframes will require extending the state of the art in many dimensions of data management. We discuss the implications of signature dataframe features including flexible schemas, ordering, row/column equivalence, and data/metadata fluidity, as well as the piecemeal, trial-and-error-based approach to interacting with dataframes.
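Modin's documented entry point is a drop-in replacement for the pandas import, so existing pandas code can run on the scaled-up engine. The snippet below assumes Modin with a Ray or Dask backend is installed; the CSV file and column names are made up for the example.

```python
# Drop-in Modin usage: swap the pandas import, keep the pandas API.
# import pandas as pd
import modin.pandas as pd

df = pd.read_csv("trips.csv")                    # same pandas API, parallel backend
summary = (df.groupby("vendor_id")["fare_amount"]
             .mean()
             .sort_values(ascending=False))
print(summary.head())
```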

Journal ArticleDOI
Tim Hulsen
TL;DR: An analysis of the current literature around data sharing is shown, and five aspects of data sharing in the medical domain are discussed: publisher requirements, data ownership, growing support for data sharing, data sharing initiatives and how the use of federated data might be a solution.
Abstract: In recent years, more and more health data are being generated. These data come not only from professional health systems, but also from wearable devices. All these ‘big data’ put together can be utilized to optimize treatments for each unique patient (‘precision medicine’). For this to be possible, it is necessary that hospitals, academia and industry work together to bridge the ‘valley of death’ of translational medicine. However, hospitals and academia often are reluctant to share their data with other parties, even though the patient is actually the owner of his/her own health data. Academic hospitals usually invest a lot of time in setting up clinical trials and collecting data, and want to be the first ones to publish papers on this data. There are some publicly available datasets, but these are usually only shared after study (and publication) completion, which means a severe delay of months or even years before others can analyse the data. One solution is to incentivize the hospitals to share their data with (other) academic institutes and the industry. Here, we show an analysis of the current literature around data sharing, and we discuss five aspects of data sharing in the medical domain: publisher requirements, data ownership, growing support for data sharing, data sharing initiatives and how the use of federated data might be a solution. We also discuss some potential future developments around data sharing, such as medical crowdsourcing and data generalists.

Journal ArticleDOI
09 Dec 2020-PLOS ONE
TL;DR: The proposed model shows that the healthcare records are not traceable to unauthorized access, as the model stores only the encrypted hash of the records, which proves effective in terms of data security, enhanced data privacy, improved data scalability, interoperability, and data integrity while sharing and accessing medical records among stakeholders across the Healthchain network.
Abstract: The privacy of Electronic Health Records (EHRs) faces a major hurdle when private health data are outsourced to the cloud, as there is a danger of leaking health information to unauthorized parties. In fact, EHRs are stored in centralized databases that increase the security risk footprint and require trust in a single authority, which cannot effectively protect data from internal attacks. This research focuses on ensuring patient privacy and data security while sensitive data are shared across the same or different organisations as well as healthcare providers in a distributed environment. It develops a privacy-preserving framework, Healthchain, based on blockchain technology that maintains the security, privacy, scalability and integrity of e-health data. The blockchain is built on Hyperledger Fabric, a permissioned distributed ledger solution, using Hyperledger Composer, and stores EHRs by utilizing the InterPlanetary File System (IPFS). Moreover, the data stored in IPFS are encrypted by using a unique cryptographic public key encryption algorithm to create a robust blockchain solution for electronic health data. The objective of the research is to provide a foundation for developing security solutions against cyber-attacks by exploiting the inherent features of the blockchain, and thus contribute to the robustness of healthcare information sharing environments. The results show that the healthcare records are not traceable to unauthorized access, as the proposed model stores only the encrypted hash of the records, which proves effective in terms of data security, enhanced data privacy, improved data scalability, interoperability and data integrity while sharing and accessing medical records among stakeholders across the Healthchain network.

Journal ArticleDOI
TL;DR: An efficient approach for data integrity auditing in cloud computing is proposed; the results obtained have been compared with state-of-the-art protocols and demonstrate the high efficiency and adaptability of the proposed protocol for clients with limited resources.