
Showing papers on "Data access" published in 2019


Book ChapterDOI
01 Jan 2019
TL;DR: The state of the art in push-based data access is summarized and possible next steps in the development of applications and technology in the field of real-time data management are identified.
Abstract: The ability to notify clients of changes to their critical data has become an important feature for both data storage systems and application development frameworks. In the final chapter of this book, we summarize the state of the art in push-based data access and identify possible next steps in the development of applications and technology in the field of real-time data management.

219 citations


Journal ArticleDOI
TL;DR: This work proposes APPA: a device-oriented Anonymous Privacy-Preserving scheme with Authentication for data aggregation applications in fog-enhanced IoT systems, which also supports multi-authority management of smart devices and fog nodes locally.

170 citations


Journal ArticleDOI
TL;DR: The proposed privacy-preserving data aggregation scheme not only guarantees data privacy of the TDs but also provides source authentication and integrity, and is well suited to MEC-assisted IoT applications.
Abstract: With the rapid development of 5G and Internet of Things (IoT) technologies, more and more mobile devices with specific sensing capabilities connect to the network and generate large amounts of data. The traditional cloud computing architecture cannot satisfy requirements such as low latency and fast data access for IoT applications. Mobile edge computing (MEC) can solve these problems and improve the execution efficiency of the system. In this paper, we propose a privacy-preserving data aggregation scheme for MEC-assisted IoT applications. In our model, there are three participants: the terminal device (TD), the edge server (ES), and the public cloud center (PCC). The data generated by the TDs are encrypted and transmitted to the ES; the ES then aggregates the data of the TDs and submits the aggregated data to the PCC. Finally, the PCC recovers the aggregated plaintext data using its private key. Our scheme not only guarantees data privacy of the TDs but also provides source authentication and integrity. Compared with the traditional model, our scheme saves half of the communication cost and is well suited to MEC-assisted IoT applications.
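The TD-to-ES-to-PCC flow described above can be illustrated with an additively homomorphic cipher. The sketch below uses textbook Paillier encryption as a stand-in, which is an assumption for illustration only: the paper's actual construction, parameters, and key sizes differ, and the demo-sized keys here are insecure.

```python
# Toy sketch of the TD -> ES -> PCC aggregation flow, using textbook Paillier
# encryption as a stand-in for the paper's scheme (assumption; not secure).
import math, random

# --- PCC key generation (demo-sized primes; real keys are much larger) ---
p, q = 1000003, 1000033
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
L = lambda x: (x - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m):
    """Terminal device (TD): encrypt a sensor reading under the PCC public key."""
    r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def aggregate(ciphertexts):
    """Edge server (ES): multiplying ciphertexts adds plaintexts, so the ES
    aggregates without learning any individual reading."""
    agg = 1
    for c in ciphertexts:
        agg = (agg * c) % n2
    return agg

def decrypt(c):
    """Public cloud center (PCC): recover the aggregated plaintext with the private key."""
    return (L(pow(c, lam, n2)) * mu) % n

readings = [17, 42, 8, 23]                    # per-TD sensor values
assert decrypt(aggregate(encrypt(m) for m in readings)) == sum(readings)
```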

155 citations


Journal ArticleDOI
TL;DR: GraphH, a PIM architecture for graph processing on the hybrid memory cube array, is proposed to tackle all four problems mentioned above, including random access pattern causing local bandwidth degradation, poor locality leading to unpredictable global data access, heavy conflicts on updating the same vertex, and unbalanced workloads across processing units.
Abstract: Large-scale graph processing requires high-bandwidth data access. However, as graph computing continues to scale, it becomes increasingly challenging to achieve high bandwidth on generic computing architectures. The primary reasons include: the random access pattern causing local bandwidth degradation, the poor locality leading to unpredictable global data access, heavy conflicts on updating the same vertex, and unbalanced workloads across processing units. Processing-in-memory (PIM) has been explored as a promising solution for providing high bandwidth, yet open questions remain for graph processing on PIM devices: 1) how to design hardware specializations and the interconnection scheme to fully utilize the bandwidth of PIM devices and ensure locality, and 2) how to allocate data and schedule the processing flow to avoid conflicts and balance workloads. In this paper, we propose GraphH, a PIM architecture for graph processing on the hybrid memory cube array, to tackle all four problems mentioned above. From the architecture perspective, we integrate SRAM-based on-chip vertex buffers to eliminate local bandwidth degradation. We also introduce a reconfigurable double-mesh connection to provide high global bandwidth. From the algorithm perspective, partitioning and scheduling methods such as index mapping interval-block and round interval pair are introduced to GraphH, so that workloads are balanced and conflicts are avoided. Two optimization methods are further introduced to reduce synchronization overhead and reuse on-chip data. The experimental results on graphs with billions of edges demonstrate that GraphH outperforms DDR-based graph processing systems by up to two orders of magnitude and achieves a 5.12× speedup over the previous PIM design.
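As a rough illustration of the interval-block idea mentioned above, the sketch below splits vertices into equal intervals and groups edges into blocks keyed by (source interval, destination interval). The names and the trivial vertex-to-interval mapping are illustrative assumptions; GraphH's actual index mapping and round-interval-pair scheduling are not reproduced here.

```python
# Minimal sketch of interval-block edge partitioning in the spirit of GraphH.
from collections import defaultdict

def partition(edges, num_vertices, num_cubes):
    """Split vertices into equal intervals and group edges into (src, dst) blocks."""
    size = (num_vertices + num_cubes - 1) // num_cubes
    interval = lambda v: v // size
    blocks = defaultdict(list)
    for src, dst in edges:
        blocks[(interval(src), interval(dst))].append((src, dst))
    return blocks   # each (src_interval, dst_interval) block can then be assigned to a cube

edges = [(0, 5), (1, 7), (6, 2), (3, 4), (7, 0)]
print(partition(edges, num_vertices=8, num_cubes=2))
```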

135 citations


Journal ArticleDOI
TL;DR: The network and threat models of authentication schemes for the cloud-driven IoT-based big data environment, which provides important real-time event processing in critical scenarios like surveillance and monitoring of an industrial plant, are discussed.

122 citations


Journal ArticleDOI
TL;DR: A standardized IoT infrastructure is described where data are stored on a distributed storage service that is fault-tolerant and resistant to distributed denial-of-service (DDoS) attacks, and where data access is managed by a decentralized, trustless blockchain.
Abstract: Today, the number of Internet of Things (IoT) devices in all aspects of life is increasing exponentially. Our cities are getting smarter and informing us about our surroundings in a contextual manner. However, we face significant challenges in deploying, managing, and collecting data from these devices. In addition, we must address the problem of storing and mining that data for higher-quality IoT services. Blockchain technology, even in today's nascent form, has the potential to be the foundation for a common, distributed, trustless, and autonomous infrastructure system. This article describes a standardized IoT infrastructure where data are stored on a distributed storage service that is fault-tolerant and resistant to distributed denial-of-service (DDoS) attacks, and where data access is managed by a decentralized, trustless blockchain. The illustrated system used LoRa as the emerging network technology, Swarm as the distributed data storage platform, and Ethereum as the blockchain platform. Such a data back end will ensure high availability with minimal security risks while replacing traditional back-end systems with a single "smart contract."
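A minimal sketch of the kind of access-control logic such a blockchain back end could enforce is shown below, written in Python rather than Solidity for readability. The class, method names, and Swarm-style references are illustrative assumptions, not the article's contract.

```python
# Illustrative sketch (not the article's contract) of access-control logic a
# single Ethereum smart contract could enforce over Swarm content references:
# device owners register data references and grant or revoke reader addresses.
class AccessContract:
    def __init__(self):
        self.owner_of = {}      # swarm_ref -> owner address
        self.readers = {}       # swarm_ref -> set of permitted addresses

    def register(self, sender, swarm_ref):
        assert swarm_ref not in self.owner_of, "already registered"
        self.owner_of[swarm_ref] = sender
        self.readers[swarm_ref] = {sender}

    def grant(self, sender, swarm_ref, reader):
        assert self.owner_of[swarm_ref] == sender, "only the owner may grant"
        self.readers[swarm_ref].add(reader)

    def revoke(self, sender, swarm_ref, reader):
        assert self.owner_of[swarm_ref] == sender, "only the owner may revoke"
        self.readers[swarm_ref].discard(reader)

    def can_read(self, swarm_ref, reader):
        return reader in self.readers.get(swarm_ref, set())

c = AccessContract()
c.register("0xDevice", "swarm://abc123")
c.grant("0xDevice", "swarm://abc123", "0xAnalyst")
assert c.can_read("swarm://abc123", "0xAnalyst")
```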

112 citations


Journal ArticleDOI
23 May 2019
TL;DR: This paper presents the virtual knowledge graph (VKG) paradigm, which replaces the rigid structure of tables with the flexibility of graphs that are kept virtual and embed domain knowledge.
Abstract: In this paper, we present the virtual knowledge graph (VKG) paradigm for data integration and access, also known in the literature as Ontology-based Data Access. Instead of structuring the integrat...

92 citations


Journal ArticleDOI
TL;DR: This article describes how to replicate data from the cloud to the edge, and then to mobile devices to provide faster data access for users, and shows how services can be composed in crowded environments using service-specific overlays.
Abstract: Densely crowded environments such as stadiums and metro stations have shown shortcomings when users request data and services simultaneously. This is due to the excessive amount of requested and generated traffic from the user side. Given the wide availability of user smart mobile devices, and noting their technological advancements, devices are no longer categorized only as data/service requesters, but are readily being transformed into data/service-providing network-side tools. In essence, to offload some of the workload burden from the cloud, data can be either fully or partially replicated to edge and mobile devices for faster and more efficient data access in such dense environments. Moreover, densely crowded environments provide an opportunity to deliver, in a timely manner through node collaboration, enriched user-specific services using the replicated data and device-specific capabilities. In this article, we first highlight the challenges that arise in densely crowded environments in terms of data/service management and delivery. Then we show how data replication and service composition are considered promising solutions for data and service management in densely crowded environments. Specifically, we describe how to replicate data from the cloud to the edge, and then to mobile devices, to provide faster data access for users. We also discuss how services can be composed in crowded environments using service-specific overlays. We conclude the article with the main open research areas that remain to be investigated.

90 citations


Journal ArticleDOI
TL;DR: By integrating a structured, interoperable design with patient-accumulated and generated data shared through smart contracts into a universally accessible blockchain, HealthChain presents patients and providers with access to consistent and comprehensive medical records.
Abstract: Background: Blockchain has the potential to disrupt the current modes of patient data access, accumulation, contribution, exchange, and control. Using interoperability standards, smart contracts, and cryptographic identities, patients can securely exchange data with providers and regulate access. The resulting comprehensive, longitudinal medical records can significantly improve the cost and quality of patient care for individuals and populations alike. Objective: This work presents HealthChain, a novel patient-centered blockchain framework. The intent is to bolster patient engagement, data curation, and regulated dissemination of accumulated information in a secure, interoperable environment. A mixed-block blockchain is proposed to support immutable logging and redactable patient blocks. Patient data are generated and exchanged through Health Level-7 Fast Healthcare Interoperability Resources, allowing seamless transfer with compliant systems. In addition, patients receive cryptographic identities in the form of public and private key pairs. Public keys are stored in the blockchain and are suitable for securing and verifying transactions. Furthermore, the envisaged system uses proxy re-encryption (PRE) to share information through revocable, smart contracts, ensuring the preservation of privacy and confidentiality. Finally, several PRE improvements are offered to enhance performance and security. Methods: The framework was formulated to address key barriers to blockchain adoption in health care, namely, information security, interoperability, data integrity, identity validation, and scalability. It supports 16 configurations through the manipulation of 4 modes. An open-source, proof-of-concept tool was developed to evaluate the performance of the novel patient block components and system configurations. To demonstrate the utility of the proposed framework and evaluate resource consumption, extensive testing was performed on each of the 16 configurations over a variety of scenarios involving a variable number of existing and imported records. Results: The results indicate several clear high-performing, low-bandwidth configurations, although they are not the strongest cryptographically. Of the strongest models, one’s anticipated cumulative record size is shown to influence the selection. Although the most efficient algorithm is ultimately user specific, Advanced Encryption Standard–encrypted data with static keys, incremental server storage, and no additional server-side encryption are the fastest and least bandwidth intensive, whereas proxy re-encrypted data with dynamic keys, incremental server storage, and additional server-side encryption are the best performing of the strongest configurations. Conclusions: Blockchain is a potent and viable technology for patient-centered access to and exchange of health information. By integrating a structured, interoperable design with patient-accumulated and generated data shared through smart contracts into a universally accessible blockchain, HealthChain presents patients and providers with access to consistent and comprehensive medical records. Challenges addressed include data security, interoperability, block storage, and patient-administered data access, with several configurations emerging for further consideration regarding speed and security.

85 citations


Proceedings ArticleDOI
TL;DR: MIMIC-Extract, an open-source pipeline for transforming the raw electronic health record data of critical care patients from the publicly available MIMIC-III database into data structures that are directly usable in common time-series prediction pipelines, is presented.
Abstract: Robust machine learning relies on access to data that can be used with standardized frameworks in important tasks and the ability to develop models whose performance can be reasonably reproduced. In machine learning for healthcare, the community faces reproducibility challenges due to a lack of publicly accessible data and a lack of standardized data processing frameworks. We present MIMIC-Extract, an open-source pipeline for transforming raw electronic health record (EHR) data for critical care patients contained in the publicly-available MIMIC-III database into dataframes that are directly usable in common machine learning pipelines. MIMIC-Extract addresses three primary challenges in making complex health records data accessible to the broader machine learning community. First, it provides standardized data processing functions, including unit conversion, outlier detection, and aggregating semantically equivalent features, thus accounting for duplication and reducing missingness. Second, it preserves the time series nature of clinical data and can be easily integrated into clinically actionable prediction tasks in machine learning for health. Finally, it is highly extensible so that other researchers with related questions can easily use the same pipeline. We demonstrate the utility of this pipeline by showcasing several benchmark tasks and baseline results.
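The standardization steps described above (unit conversion, outlier filtering, merging semantically equivalent features, and preserving the time-series structure) can be sketched with pandas as below. The assumed input schema (patient_id, charttime, feature, value, unit), the alias table, and the plausibility thresholds are illustrative assumptions, not MIMIC-Extract's actual configuration.

```python
# Hedged sketch of the kind of standardization MIMIC-Extract performs.
import pandas as pd

def standardize(events: pd.DataFrame) -> pd.DataFrame:
    """events: long-format chart events with columns
    patient_id, charttime (datetime), feature, value, unit (assumed schema)."""
    df = events.copy()
    # 1) Unit conversion: temperatures recorded in Fahrenheit -> Celsius.
    f = df["unit"] == "F"
    df.loc[f, "value"] = (df.loc[f, "value"] - 32) * 5 / 9
    df.loc[f, "unit"] = "C"
    # 2) Outlier detection: drop physiologically implausible values.
    plausible = {"temperature": (25, 45), "heart_rate": (0, 300)}
    for feat, (lo, hi) in plausible.items():
        m = df["feature"] == feat
        df = df[~m | df["value"].between(lo, hi)]
    # 3) Merge semantically equivalent item labels into one feature name.
    aliases = {"temp": "temperature", "hr": "heart_rate"}
    df["feature"] = df["feature"].replace(aliases)
    # 4) Preserve the time-series structure: hourly mean per patient and feature.
    return (df.set_index("charttime")
              .groupby(["patient_id", "feature"])["value"]
              .resample("1h").mean()
              .unstack("feature"))
```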

77 citations


Journal ArticleDOI
TL;DR: An analysis of existing blockchain-based health record solutions and a reference architecture for a “Ledger of Me” system that extends PHR to create a new platform combining the collection and access of medical data and digital interventions with smart contracts.
Abstract: Personal Health Records (PHRs) have the potential to give patients fine-grained, personalized and secure access to their own medical data and to enable self-management of care. Emergent trends around the use of Blockchain, or Distributed Ledger Technology, seem to offer solutions to some of the problems faced in enabling these technologies, especially to support issues of consent, data exchange, and data access. We present an analysis of existing blockchain-based health record solutions and a reference architecture for a "Ledger of Me" system that extends the PHR to create a new platform combining the collection and access of medical data and digital interventions with smart contracts. Our intention is to enable patient use of the data in order to support their care and to provide strong consent mechanisms for sharing data between different organizations and apps. Ledger of Me is based around the principle that this combination of event-driven smart contracts, medical record data, and patient control is important for the adoption of blockchain-based solutions for the PHR. The reference architecture we present can serve as the basis of a range of future blockchain-based medical application architectures.

Journal ArticleDOI
TL;DR: The proposed solution based on IOTA Tangle and MAM could overcome many challenges faced by other traditional blockchain-based solutions in terms of cost, efficiency, scalability, and flexibility in data access management.
Abstract: Background: Huge amounts of health-related data are generated every moment with the rapid development of Internet of Things (IoT) and wearable technologies. These big health data contain great value and can bring benefit to all stakeholders in the health care ecosystem. Currently, most of these data are siloed and fragmented in different health care systems or public and private databases. It prevents the fulfillment of intelligent health care inspired by these big data. Security and privacy concerns and the lack of ensured authenticity trails of data bring even more obstacles to health data sharing. With a decentralized and consensus-driven nature, distributed ledger technologies (DLTs) provide reliable solutions such as blockchain, Ethereum, and IOTA Tangle to facilitate the health care data sharing. Objective: This study aimed to develop a health-related data sharing system by integrating IoT and DLT to enable secure, fee-less, tamper-resistant, highly-scalable, and granularly-controllable health data exchange, as well as build a prototype and conduct experiments to verify the feasibility of the proposed solution. Methods: The health-related data are generated by 2 types of IoT devices: wearable devices and stationary air quality sensors. The data sharing mechanism is enabled by IOTA’s distributed ledger, the Tangle, which is a directed acyclic graph. Masked Authenticated Messaging (MAM) is adopted to facilitate data communications among different parties. Merkle Hash Tree is used for data encryption and verification. Results: A prototype system was built according to the proposed solution. It uses a smartwatch and multiple air sensors as the sensing layer; a smartphone and a single-board computer (Raspberry Pi) as the gateway; and a local server for data publishing. The prototype was applied to the remote diagnosis of tremor disease. The results proved that the solution could enable costless data integrity and flexible access management during data sharing. Conclusions: DLT integrated with IoT technologies could greatly improve the health-related data sharing. The proposed solution based on IOTA Tangle and MAM could overcome many challenges faced by other traditional blockchain-based solutions in terms of cost, efficiency, scalability, and flexibility in data access management. This study also showed the possibility of fully decentralized health data sharing by replacing the local server with edge computing devices.
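Since the abstract mentions a Merkle Hash Tree for data verification, here is a minimal, generic sketch of that construction. It is not the authors' implementation, and the IOTA Tangle/MAM channel handling is omitted entirely.

```python
# Minimal Merkle Hash Tree sketch illustrating integrity verification of
# shared health readings (generic construction, not the paper's code).
import hashlib

h = lambda b: hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

readings = [b'{"hr": 72}', b'{"hr": 75}', b'{"tremor": 0.4}']
root = merkle_root(readings)
# A verifier that knows only `root` can detect any tampering with a reading:
assert merkle_root(readings) == root
assert merkle_root([b'{"hr": 99}', readings[1], readings[2]]) != root
```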

Journal ArticleDOI
TL;DR: This survey paper provides a state-of-the-art overview of Cloud-centric Big Data placement together with data storage methodologies, and highlights the correlation between the two in terms of better supporting Big Data management.
Abstract: Currently, the data to be explored and exploited by computing systems increases at an exponential rate. The massive amount of data, the so-called "Big Data", puts pressure on existing technologies to provide scalable, fast and efficient support. Recent applications and current user support from multi-domain computing have assisted in migrating from data-centric to knowledge-centric computing. However, it remains a challenge to optimally store and place or migrate such huge data sets across data centers (DCs). In particular, due to the frequent change of application and DC behaviour (i.e., resources or latencies), data access or usage patterns need to be analyzed as well. The main objective is to find a better data storage location that reduces the overall data placement cost as well as improving application performance (such as throughput). In this survey paper, we provide a state-of-the-art overview of Cloud-centric Big Data placement together with data storage methodologies, in an attempt to highlight the correlation between the two in terms of better supporting Big Data management. Our focus is on management aspects seen under the prism of non-functional properties. In the end, readers can appreciate the deep analysis of the respective technologies related to the management of Big Data and be guided towards their selection in the context of satisfying their non-functional application requirements. Furthermore, challenges are presented that highlight the current gaps in Big Data management and mark the way it needs to evolve in the near future.

Journal ArticleDOI
TL;DR: A data-driven architecture for scheduling, based on the architecture of cyber-physical systems, in which the system has real-time access to data and decisions can be made ahead of time on the basis of more information.

Journal ArticleDOI
15 Nov 2019
TL;DR: The authors theorize that those who create data have intimate and tacit knowledge that can be used as barter to form collaborations for mutual advantage, and propose a typology of data reuses ranging from comparative to integrative.
Abstract: Open access to data, as a core principle of open science, is predicated on assumptions that scientific data can be reused by other researchers. We test those assumptions by asking where scientists find reusable data, how they reuse those data, and how they interpret data they did not collect themselves. By conducting a qualitative meta-analysis of evidence on two long-term, distributed, interdisciplinary consortia, we found that scientists frequently sought data from public collections and from other researchers for comparative purposes such as “ground-truthing” and calibration. When they sought others’ data for reanalysis or for combining with their own data, which was relatively rare, most preferred to collaborate with the data creators. We propose a typology of data reuses ranging from comparative to integrative. Comparative data reuse requires interactional expertise, which involves knowing enough about the data to assess their quality and value for a specific comparison such as calibrating an instrument in a lab experiment. Integrative reuse requires contributory expertise, which involves the ability to perform the action, such as reusing data in a new experiment. Data integration requires more specialized scientific knowledge and deeper levels of epistemic trust in the knowledge products. Metadata, ontologies, and other forms of curation benefit interpretation for any kind of data reuse. Based on these findings, we theorize the data creators’ advantage, that those who create data have intimate and tacit knowledge that can be used as barter to form collaborations for mutual advantage. Data reuse is a process that occurs within knowledge infrastructures that evolve over time, encompassing expertise, trust, communities, technologies, policies, resources, and institutions. Keywords: data, science, reuse, biomedicine, environmental sciences, open science, data practices, science policy

Journal ArticleDOI
TL;DR: Providing capabilities for distributed data stewardship and participatory access control, along with effective ways to enforce data access agreements and data ownership, is among the major promises of blockchain-based platforms.

Proceedings ArticleDOI
Zinan Lin, Alankar Jain, Chen Wang, Giulia Fanti, Vyas Sekar
TL;DR: This work explores if and how generative adversarial networks can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge and designs a custom workflow called DoppelGANger, which achieves up to 43% better fidelity than baseline models.
Abstract: Limited data access is a longstanding barrier to data-driven research and development in the networked systems community. In this work, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge. As a specific target, our focus in this paper is on time series datasets with metadata (e.g., packet loss rate measurements with corresponding ISPs). We identify key challenges of existing GAN approaches for such workloads with respect to fidelity (e.g., long-term dependencies, complex multidimensional relationships, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity). To improve fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DG achieves up to 43% better fidelity than baseline models. Although we do not resolve the privacy problem in this work, we identify fundamental challenges with both classical notions of privacy and recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges. By shedding light on the promise and challenges, we hope our work can rekindle the conversation on workflows for data sharing.

Journal ArticleDOI
TL;DR: In this article, the authors compare data sharing to the exchange of patents based on the FRAND principles, and suggest a possible way for self-regulation to provide more transparency and fairness in the growing markets for data sharing.
Abstract: Data-driven markets depend on access to data as a resource for products and services. Since the quality of information that can be drawn from data increases with the available amount and quality of the data, businesses involved in the data economy have a great interest in accessing data from other market players. However, companies still appear to be reluctant to share their data. Therefore, the key question is how data sharing can be incentivized. This article focuses on data sharing platforms, which are emerging as new intermediaries and can play a vital role in the data economy, as they may increase willingness to share data. By comparing data sharing to the exchange of patents based on the FRAND principles, this article suggests a possible way for self-regulation to provide more transparency and fairness in the growing markets for data sharing.

Proceedings ArticleDOI
09 Dec 2019
TL;DR: A compiler-based tool called DR.SGX is designed and implemented that instruments the enclave code, permuting data locations at fine granularity and periodically re-randomizing all enclave data to break the link between the memory observations by the adversary and the actual data accesses by the victim.
Abstract: Recent research has demonstrated that Intel's SGX is vulnerable to software-based side-channel attacks. In a common attack, the adversary monitors CPU caches to infer secret-dependent data access patterns. Known defenses have major limitations, as they either require error-prone developer assistance, incur extremely high runtime overhead, or prevent only specific attacks. In this paper, we propose data location randomization as a novel defense against side-channel attacks that target data access patterns. Our goal is to break the link between the memory observations by the adversary and the actual data accesses by the victim. We design and implement a compiler-based tool called DR.SGX that instruments the enclave code, permuting data locations at fine granularity. To prevent correlation of repeated memory accesses, we periodically re-randomize all enclave data. Our solution requires no developer assistance and strikes a balance between side-channel protection and performance based on an adjustable security parameter.
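A toy illustration of the permute-and-periodically-re-randomize idea follows. The actual DR.SGX tool works at cache-line granularity inside the enclave via compiler instrumentation; the class below, its window parameter, and the logical/slot mapping are illustrative assumptions only.

```python
# Toy sketch of data-location randomization with periodic re-randomization.
import random

class RandomizedStore:
    def __init__(self, values, window=1000):
        self.window = window          # accesses between re-randomizations
        self.accesses = 0
        self.slots = list(values)
        self.perm = list(range(len(values)))
        self._rerandomize()

    def _rerandomize(self):
        old = [self.slots[self.perm[i]] for i in range(len(self.perm))]
        random.shuffle(self.perm)     # new secret mapping: logical index -> slot
        for logical, value in enumerate(old):
            self.slots[self.perm[logical]] = value

    def read(self, logical_index):
        self.accesses += 1
        if self.accesses % self.window == 0:
            self._rerandomize()       # break correlation across repeated accesses
        return self.slots[self.perm[logical_index]]

store = RandomizedStore(["a", "b", "c", "d"], window=3)
assert [store.read(i) for i in (0, 1, 2, 3, 0)] == ["a", "b", "c", "d", "a"]
```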

Journal ArticleDOI
TL;DR: A novel data replica placement strategy for the coordinated processing of data-intensive IoT workflows in a collaborative edge and cloud computing environment is proposed, and the ITO algorithm, a variant of intelligent swarm optimization, is presented to address this model.

Patent
10 May 2019
TL;DR: A Data Access Webform Crawling System is configured to identify a webform used to collect one or more pieces of personal data; robotically complete the identified webform; and analyze the completed webform to determine one or more processing activities that utilize the one or more pieces of personal data collected by the webform.
Abstract: In particular embodiments, a Data Access Webform Crawling System is configured to: (1) identify a webform used to collect one or more pieces of personal data; (2) robotically complete the identified webform; (3) analyze the completed webform to determine one or more processing activities that utilize the one or more pieces of personal data collected by the webform; (4) identify a first data asset in the data model that is associated with the one or more processing activities; (5) modify a data inventory for the first data asset in the data model to include data associated with the webform; and (6) modify the data model to include the modified data inventory for the first data asset.

Proceedings ArticleDOI
10 Jun 2019
TL;DR: A new data-driven spatial index structure, the learned Z-order Model (ZM) index, which combines the Z-order space-filling curve and a staged learning model, is designed; it significantly reduces memory cost and performs more efficiently than the R-tree in most scenarios.
Abstract: With the pervasiveness of location-based services (LBS), spatial data processing has received considerable attention in the research of database system management. Among various spatial query techniques, index structures play a key role in data access and query processing. However, existing spatial index structures (e.g., R-tree) mainly focus on partitioning data space or data objects. In this paper, we explore the potential to construct the spatial index structure by learning the distribution of the data. We design a new data-driven spatial index structure, namely learned Z-order Model (ZM) index, which combines the Z-order space filling curve and the staged learning model. Experimental results on both real and synthetic datasets show that our learned index significantly reduces the memory cost and performs more efficiently than R-tree in most scenarios.
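A highly simplified sketch of the learned ZM idea follows: map 2-D points to Z-order keys, fit a model that predicts each key's position in the sorted key array, and correct the prediction with a bounded local search. A single least-squares line stands in for the paper's staged learning model, so this is an illustrative assumption rather than the proposed index.

```python
# Simplified learned-ZM sketch: Z-order keys + a learned position predictor.
def z_order(x, y, bits=16):
    key = 0
    for i in range(bits):                       # interleave the bits of x and y
        key |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return key

def build(points):
    keys = sorted(z_order(x, y) for x, y in points)
    n = len(keys)
    # Least-squares fit of position ~ a*key + b (stand-in for a staged model).
    mk = sum(keys) / n
    mp = (n - 1) / 2
    a = sum((k - mk) * (i - mp) for i, k in enumerate(keys)) / sum((k - mk) ** 2 for k in keys)
    b = mp - a * mk
    return keys, a, b

def lookup(keys, a, b, x, y, err=8):
    key = z_order(x, y)
    guess = int(a * key + b)
    lo = max(0, guess - err)
    hi = min(len(keys), guess + err + 1)
    return key in keys[lo:hi]          # a real index widens the window on a miss

keys, a, b = build([(3, 7), (12, 5), (200, 150), (1000, 64)])
assert lookup(keys, a, b, 200, 150)
```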

Book ChapterDOI
26 Oct 2019
TL;DR: VLog, as discussed by the authors, is a rule-based reasoner designed to satisfy the requirements of modern use cases, with a focus on performance and adaptability to different scenarios; its features include fast Datalog materialisation, support for reasoning with existential rules, stratified negation, and data integration from a variety of sources.
Abstract: Knowledge graphs are crucial assets for tasks like query answering or data integration. These tasks can be viewed as reasoning problems, which in turn require efficient reasoning systems to be implemented. To this end, we present VLog, a rule-based reasoner designed to satisfy the requirements of modern use cases, with a focus on performance and adaptability to different scenarios. We address the former with a novel vertical storage layout, and the latter by abstracting the access to data sources and providing a platform-independent Java API. Features of VLog include fast Datalog materialisation, support for reasoning with existential rules, stratified negation, and data integration from a variety of sources, such as high-performance RDF stores, relational databases, CSV files, OWL ontologies, and remote SPARQL endpoints.
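Because Datalog materialisation is the core task the abstract mentions, the following is a tiny naive-evaluation sketch of it. VLog itself uses a vertical storage layout, semi-naive evaluation, and many data-source adapters; the rule encoding below is an ad hoc assumption.

```python
# Tiny naive Datalog materialisation sketch (not VLog's implementation).
def match(body, db, binding):
    """Yield bindings under which all body atoms hold in db.
    Argument names starting with an uppercase letter are variables."""
    if not body:
        yield binding
        return
    pred, args = body[0]
    for fact_pred, fact_args in db:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        b = dict(binding)
        if all((b.setdefault(a, v) == v) if a[0].isupper() else (a == v)
               for a, v in zip(args, fact_args)):
            yield from match(body[1:], db, b)

def materialise(facts, rules):
    """Compute the least fixpoint of the rules over the facts (naive evaluation)."""
    db = set(facts)
    changed = True
    while changed:
        changed = False
        for head_pred, head_args, body in rules:
            derived = {(head_pred, tuple(b.get(a, a) for a in head_args))
                       for b in match(body, db, {})}
            if not derived <= db:
                db |= derived
                changed = True
    return db

# Transitive closure: path(X,Y) :- edge(X,Y).  path(X,Z) :- path(X,Y), edge(Y,Z).
facts = {("edge", ("a", "b")), ("edge", ("b", "c"))}
rules = [("path", ("X", "Y"), [("edge", ("X", "Y"))]),
         ("path", ("X", "Z"), [("path", ("X", "Y")), ("edge", ("Y", "Z"))])]
assert ("path", ("a", "c")) in materialise(facts, rules)
```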

Journal ArticleDOI
TL;DR: A three-year qualitative study of DANS, a digital data archive containing more than 50 years of heterogeneous data types, provides new insights into the uses, users, and roles of these systems and services, as mentioned in this paper.
Abstract: Digital data archives play essential roles in knowledge infrastructures by mediating access to data within and between communities. This three-year qualitative study of DANS, a digital data archive containing more than 50 years of heterogeneous data types, provides new insights into the uses, users, and roles of these systems and services. Consumers are highly diverse, including researchers, students, practitioners in museums and companies, and hobbyists. Contributors are not necessarily consumers of data from the archive, and few users cite data in DANS, even their own data. Academic contributors prefer to maintain control over data after deposit so that they can have personal exchanges with those seeking their data. Staff archivists provide essential mediating roles in identifying, acquiring, curating, and disseminating data. Archivists take the perspective of potential consumers in curating data to be findable and usable. Staff balance competing goals, and competing stakeholders, in time spent acquiring and curating data, in maintaining current data and long-term stewardship, and in providing direct access and interfaces to search engines and harvesters. Data archives are fragile in the long run, due to the competing stakeholders, multiple funding sources, and array of interacting technologies and infrastructures on which they depend.

Journal ArticleDOI
TL;DR: Open Humans highlights how a community-centric ecosystem can be used to aggregate personal data from various sources, as well as how these data can be used by academic and citizen scientists through practical, iterative approaches to sharing that strive to balance considerations with participant autonomy, inclusion, and privacy.
Abstract: Background Many aspects of our lives are now digitized and connected to the internet. As a result, individuals are now creating and collecting more personal data than ever before. This offers an unprecedented chance for human-participant research ranging from the social sciences to precision medicine. With this potential wealth of data comes practical problems (e.g., how to merge data streams from various sources), as well as ethical problems (e.g., how best to balance risks and benefits when enabling personal data sharing by individuals). Results To begin to address these problems in real time, we present Open Humans, a community-based platform that enables personal data collections across data streams, giving individuals more personal data access and control of sharing authorizations, and enabling academic research as well as patient-led projects. We showcase data streams that Open Humans combines (e.g., personal genetic data, wearable activity monitors, GPS location records, and continuous glucose monitor data), along with use cases of how the data facilitate various projects. Conclusions Open Humans highlights how a community-centric ecosystem can be used to aggregate personal data from various sources, as well as how these data can be used by academic and citizen scientists through practical, iterative approaches to sharing that strive to balance considerations with participant autonomy, inclusion, and privacy.

Journal ArticleDOI
TL;DR: This paper proposes a new verifiable outsourced CP-ABE scheme for big data privacy and access control in the cloud that reduces the computational overhead of encryption and decryption by outsourcing the heavy computations to a proxy server, and proves that the scheme is efficient.
Abstract: The foremost security concerns for big data in the cloud are privacy and access control. Ciphertext-policy attribute-based encryption (CP-ABE) is an effective cryptographic solution for the above concerns, but the existing CP-ABE schemes are not suitable for big data in the cloud as they require huge computation time for the encryption and decryption process. In this paper, we propose a new verifiable outsourced CP-ABE scheme for big data privacy and access control in the cloud. Our scheme reduces the computational overhead of encryption and decryption by outsourcing the heavy computations to the proxy server. Our scheme also verifies the correctness of the data along with the outsourced computations. Further, our scheme limits the number of data accesses for a set of users instead of allowing an unlimited number of accesses, which is essential for commercial applications. In the security analysis, we prove that our scheme is secure against chosen-plaintext, collusion, and proxy attacks. Performance analysis shows that our scheme is efficient.

Journal ArticleDOI
TL;DR: This work reviews software platforms for managing, analyzing, and sharing genomic data, with an emphasis on data commons, but also covers data ecosystems and data lakes.

Journal ArticleDOI
TL;DR: This paper proposes an edge computing platform architecture that supports service migration with different options of granularity (either entire service/data migration, or proactive application-aware data migration) across heterogeneous edge devices (either MEC-based servers or resource-poor Fog devices) that host virtualized resources (Docker Containers).
Abstract: The Multi-access Edge Computing (MEC) and Fog Computing paradigms are enabling the opportunity to have middleboxes either statically or dynamically deployed at network edges acting as local proxies with virtualized resources for supporting and enhancing service provisioning in edge localities. However, migration of edge-enabled services poses significant challenges in the edge computing environment. In this paper, we propose an edge computing platform architecture that supports service migration with different options of granularity (either entire service/data migration, or proactive application-aware data migration) across heterogeneous edge devices (either MEC-based servers or resource-poor Fog devices) that host virtualized resources (Docker Containers). The most innovative elements of the technical contribution of our work include i) the possibility to select either an application-agnostic or an application-aware approach, ii) the possibility to choose the appropriate application-aware approach (e.g., based on data access frequencies), iii) an automatic edge services placement support with the aim of finding a more effective placement with low energy consumption, and iv) the in-lab experimentation of the performance achieved over rapidly deployable environments with resource-limited edges such as Raspberry Pi devices.

Proceedings ArticleDOI
01 Jan 2019
TL;DR: This paper formulates the data caching problem as an integer programming problem that maximizes the revenue of the service provider while satisfying a constraint on data access latency, and the results reveal that this approach significantly outperforms the baseline approaches.
Abstract: With the rapid increase in the use of mobile devices in people's daily lives, mobile data traffic has exploded in recent years. In the edge computing environment, where edge servers are deployed around mobile users, caching popular data on edge servers can ensure mobile users' fast access to those data and reduce the data traffic between mobile users and the centralized cloud. Existing studies consider the data caching problem with a focus on the reduction of network delay and the improvement of mobile devices' energy efficiency. In this paper, we attack the data caching problem in the edge computing environment from the service providers' perspective, who would like to maximize their revenue from caching their data. This problem is complicated because data caching produces benefits at a cost and there usually is a trade-off in between. In this paper, we formulate the data caching problem as an integer programming problem that maximizes the revenue of the service provider while satisfying a constraint on data access latency. Extensive experiments are conducted on a real-world dataset that contains the locations of edge servers and mobile users, and the results reveal that our approach significantly outperforms the baseline approaches.
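One plausible reading of such a formulation is sketched below; the symbols and the exact form of the capacity and latency constraints are our assumptions, not necessarily the paper's model.

```latex
% x_{ij} = 1 iff data item j is cached on edge server i, r_{ij} the revenue
% from serving item j at server i, s_j the item size, C_i the cache capacity
% of server i, l_{ij} the access latency, and L the latency bound (assumed symbols).
\begin{align*}
  \max_{x \in \{0,1\}^{I \times J}} \quad & \sum_{i=1}^{I} \sum_{j=1}^{J} r_{ij}\, x_{ij} \\
  \text{subject to} \quad & \sum_{j=1}^{J} s_{j}\, x_{ij} \le C_{i} \quad \forall i \quad \text{(cache capacity)} \\
  & l_{ij}\, x_{ij} \le L \quad \forall i, j \quad \text{(data access latency bound)}
\end{align*}
```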

Book ChapterDOI
26 Sep 2019
TL;DR: The European General Data Protection Regulation (GDPR) came into effect on May 25, 2018 and introduced new rights for users to access data collected about them.
Abstract: Online tracking has mostly been studied by passively measuring the presence of tracking services on websites, (i) without knowing what data these services collect, (ii) for which specific purposes the data are collected, or (iii) whether the practices used are disclosed in privacy policies. The European General Data Protection Regulation (GDPR) came into effect on May 25, 2018 and introduced new rights for users to access data collected about them.