
Showing papers on "Data access" published in 2018


Journal ArticleDOI
TL;DR: The Online Labour Index (OLI), as discussed by the authors, is an economic indicator of the online gig economy, accompanied by an open data repository providing the data underlying the index.

268 citations


Journal ArticleDOI
TL;DR: A secure system featuring a novel two-fold access control mechanism, self-adaptive for both normal and emergency situations, is formally proved secure, and extensive comparisons and simulations demonstrate its efficiency.

267 citations


Proceedings ArticleDOI
01 Jul 2018
TL;DR: The framework of ontology-based data access is presented, a semantic paradigm for providing convenient and user-friendly access to data repositories, which has been actively developed and studied in the past decade.
Abstract: We present the framework of ontology-based data access, a semantic paradigm for providing convenient and user-friendly access to data repositories, which has been actively developed and studied in the past decade. Focusing on relational data sources, we discuss the main ingredients of ontology-based data access, key theoretical results, techniques, applications and future challenges.
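
To make the rewriting idea concrete, here is a minimal sketch (in Python, with entirely hypothetical mappings and schema, and single-level subclass reasoning only) of how a query over an ontology class can be unfolded into SQL over the underlying relational source:

```python
# Toy sketch of OBDA query unfolding (mappings and schema are hypothetical).
# An ontology class is mapped to SQL fragments over the relational source;
# a query over the class is answered by unioning the unfolded fragments.

# Mappings: ontology class -> SQL queries producing its instances.
MAPPINGS = {
    "ex:Employee": [
        "SELECT id AS subject, name FROM staff",
        "SELECT id AS subject, name FROM contractors WHERE active = 1",
    ],
    "ex:Manager": [
        "SELECT id AS subject, name FROM staff WHERE role = 'manager'",
    ],
}

# A tiny ontology: ex:Manager is a subclass of ex:Employee.
SUBCLASS_OF = {"ex:Manager": "ex:Employee"}

def subclasses(cls):
    """All classes whose instances must be included when querying `cls`."""
    return [c for c in MAPPINGS if c == cls or SUBCLASS_OF.get(c) == cls]

def unfold(cls):
    """Rewrite a query over an ontology class into a UNION of source SQL."""
    fragments = [sql for c in subclasses(cls) for sql in MAPPINGS[c]]
    return "\nUNION\n".join(fragments)

# Real systems also prune redundant fragments; this sketch does not.
print(unfold("ex:Employee"))
```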

251 citations


Proceedings ArticleDOI
15 Aug 2018
TL;DR: The NSRR provides a single point of access to analysis-ready physiological signals from polysomnography obtained from multiple sources, and a wide variety of clinical data to facilitate sleep research, and provides the design of a functional architecture for implementing a Sleep Data Commons.
Abstract: Objective: The gold standard for diagnosing sleep disorders is polysomnography, which generates extensive data about biophysical changes occurring during sleep. We developed the National Sleep Research Resource (NSRR), a comprehensive system for sharing sleep data. The NSRR embodies elements of a data commons aimed at accelerating research to address critical questions about the impact of sleep disorders on important health outcomes.

Approach: We used a metadata-guided approach, with a set of common sleep-specific terms enforcing uniform semantic interpretation of data elements across three main components: (1) annotated datasets; (2) user interfaces for accessing data; and (3) computational tools for the analysis of polysomnography recordings. We incorporated the process for managing dataset-specific data use agreements, evidence of Institutional Review Board review, and the corresponding access control in the NSRR web portal. The metadata-guided approach facilitates structural and semantic interoperability, ultimately leading to enhanced data reusability and scientific rigor.

Results: The authors curated and deposited retrospective data from 10 large, NIH-funded sleep cohort studies, including several from the Trans-Omics for Precision Medicine (TOPMed) program, into the NSRR. The NSRR currently contains data on 26,808 subjects and 31,166 signal files in European Data Format. Since its launch in April 2014, over 3000 registered users have downloaded over 130 terabytes of data.

Conclusions: The NSRR offers a use case and an example for creating a full-fledged data commons. It provides a single point of access to analysis-ready physiological signals from polysomnography obtained from multiple sources, and a wide variety of clinical data to facilitate sleep research. The NIH Data Commons (or Commons) is an ambitious vision for a shared virtual space to allow digital objects to be stored and computed upon by the scientific community. The Commons would allow investigators to find, manage, share, use and reuse data, software, metadata and workflows. It imagines an ecosystem that makes digital objects Findable, Accessible, Interoperable and Reusable (FAIR). Four components are considered integral parts of the Commons: a computing resource for accessing and processing of digital objects; a "digital object compliance model" that describes the properties of digital objects that enable them to be FAIR; datasets that adhere to the digital object compliance model; and software and services to facilitate access to and use of data. This paper describes the contributions of NSRR along several aspects of the Commons vision: metadata for sleep research digital objects; a collection of annotated sleep data sets; and interfaces and tools for accessing and analyzing such data. More importantly, the NSRR provides the design of a functional architecture for implementing a Sleep Data Commons. The NSRR also reveals complexities and challenges involved in making clinical sleep data conform to the FAIR principles.

Future directions: Shared resources offered by emerging technologies such as cloud instances provide promising platforms for the Data Commons. However, simply expanding storage or adding compute power may not allow us to cope with the rapidly expanding volume and increasing complexity of biomedical data. Concurrent efforts must be spent to address digital object organization challenges.
To make our approach future-proof, we need to continue advancing research in data representation and interfaces for human-data interaction. A possible next phase of NSRR is the creation of a universal self-descriptive sequential data format. The idea is to break large, unstructured, sequential data files into minimal, semantically meaningful, fragments. Such fragments can be indexed, assembled, retrieved, rendered, or repackaged on-the-fly, for multitudes of application scenarios. Data points in such a fragment will be locally embedded with relevant metadata labels, governed by terminology and ontology. Potential benefits of such an approach may include precise levels of data access, increased analysis readiness with on-the-fly data conversion, multi-level data discovery and support for effective web-based visualization of contents in large sequential files.
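
As an illustration of the proposed fragment idea, the following sketch (hypothetical layout and field names, not an NSRR specification) splits a signal into minimal fragments that each carry their own metadata labels:

```python
# Sketch of a self-descriptive sequential data format (hypothetical layout):
# a long physiological signal is split into small fragments, each carrying
# its own metadata so it can be indexed, retrieved, and rendered on its own.
import json

def fragment_signal(samples, rate_hz, channel, terms, fragment_len=4096):
    """Split a sample sequence into self-describing fragments."""
    fragments = []
    for i in range(0, len(samples), fragment_len):
        fragments.append({
            "channel": channel,          # e.g. "EEG-C3"
            "start_sample": i,
            "rate_hz": rate_hz,
            "terms": terms,              # ontology terms governing the labels
            "data": samples[i:i + fragment_len],
        })
    return fragments

signal = [0.1 * (i % 50) for i in range(10000)]  # fake EEG-like samples
frags = fragment_signal(signal, rate_hz=256, channel="EEG-C3",
                        terms=["NSRR:eeg_signal"])
# Each fragment is independently addressable, e.g. for on-the-fly retrieval:
print(json.dumps({k: frags[0][k] for k in frags[0] if k != "data"}, indent=2))
```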

173 citations


Proceedings ArticleDOI
06 Jul 2018
TL;DR: This work proposes a decentralized system of data management for IoT devices where all data access permission is enforced using smart contracts and the audit trail of data access is stored in the blockchain.
Abstract: Due to the centralization of authority in the management of data generated by IoT devices, there is a lack of transparency in how user data is being shared among third party entities. With the growing adoption of blockchain technology, which provides decentralized management of assets such as currency, as seen in Bitcoin, we propose a decentralized system of data management for IoT devices where all data access permission is enforced using smart contracts and the audit trail of data access is stored in the blockchain. With smart contract applications, multiple parties can specify rules to govern their interactions, which are independently enforced in the blockchain without the need for a centralized system. We provide a framework that stores the hash of the data in the blockchain and stores the raw data in a secure storage platform using a trusted execution environment (TEE). In particular, we consider Intel SGX as a part of the TEE that ensures data security and privacy for sensitive parts of the application (code and data).
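
A minimal simulation of the two mechanisms described above, with all names hypothetical: permissions enforced by a contract-style object that keeps an audit trail, and only the hash of the data anchored on chain while the raw data stays in (here simulated) secure storage:

```python
# Minimal simulation of the paper's two ideas (all names hypothetical):
# (1) access permissions enforced by contract-style rules, and
# (2) only the data hash anchored on the "chain"; raw data stays off-chain.
import hashlib

class AccessContract:
    """Stand-in for an on-chain smart contract holding permission rules."""
    def __init__(self, owner):
        self.owner = owner
        self.allowed = set()
        self.audit_log = []                  # audit trail of access requests

    def grant(self, caller, party):
        assert caller == self.owner, "only the owner may grant access"
        self.allowed.add(party)

    def request(self, party, data_hash):
        ok = party in self.allowed or party == self.owner
        self.audit_log.append((party, data_hash, ok))
        return ok

secure_store = {}                            # stand-in for TEE-backed storage

def put(data: bytes):
    h = hashlib.sha256(data).hexdigest()
    secure_store[h] = data                   # raw data kept off-chain
    return h                                 # only the hash goes on-chain

contract = AccessContract(owner="alice")
h = put(b"temperature=21.5C")
contract.grant("alice", "bob")
if contract.request("bob", h):
    # Anyone can verify integrity against the on-chain hash:
    assert hashlib.sha256(secure_store[h]).hexdigest() == h
print(contract.audit_log)
```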

119 citations


Journal ArticleDOI
TL;DR: An IoT-oriented Data Placement method with privacy preservation, named IDP, is designed in this paper to achieve high resource usage, energy saving and efficient data access, while realizing privacy preservation of the IoT data.

117 citations


Proceedings ArticleDOI
01 Dec 2018
TL;DR: A Federated learning based Proactive Content Caching (FPCC) scheme, which does not require gathering users' data centrally for training, and which outperforms other learning-based caching algorithms such as m-epsilon-greedy and Thompson sampling in terms of cache efficiency.
Abstract: Content caching is a promising approach in edge computing to cope with the explosive growth of mobile data on 5G networks, where contents are typically placed on local caches for fast and repetitive data access. Due to the capacity limit of caches, it is essential to predict the popularity of files and cache those popular ones. However, the fluctuating popularity of files makes the prediction a highly challenging task. To tackle this challenge, many recent works propose learning-based approaches which gather the users' data centrally for training, but they bring a significant issue: users may not trust the central server and thus hesitate to upload their private data. In order to address this issue, we propose a Federated learning based Proactive Content Caching (FPCC) scheme, which does not require gathering users' data centrally for training. The FPCC is based on a hierarchical architecture in which the server aggregates the users' updates using federated averaging, and each user performs training on its local data using hybrid filtering on stacked autoencoders. The experimental results demonstrate that, without gathering users' private data, our scheme still outperforms other learning-based caching algorithms such as m-epsilon-greedy and Thompson sampling in terms of cache efficiency.
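
The federated-averaging step at the core of FPCC can be sketched as follows. Note that the real scheme trains stacked autoencoders with hybrid filtering; this sketch substitutes a simple linear popularity predictor to keep the aggregation pattern visible (shapes and learning rate are made up):

```python
# Sketch of federated averaging (the FPCC server-side aggregation step).
# A linear popularity predictor stands in for the paper's autoencoders.
import numpy as np

def local_update(w, X, y, lr=0.01):
    """One local gradient step on a client's private data."""
    grad = 2 * X.T @ (X @ w - y) / len(X)
    return w - lr * grad

def federated_average(weights, sizes):
    """Server step: average client models weighted by local dataset size."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(weights, sizes))

rng = np.random.default_rng(0)
w_global = np.zeros(8)                          # 8 content features (made up)
clients = [(rng.normal(size=(n, 8)), rng.random(n)) for n in (50, 120, 80)]

for _ in range(10):                             # communication rounds
    updates = [local_update(w_global, X, y) for X, y in clients]
    w_global = federated_average(updates, [len(X) for X, _ in clients])

# w_global now predicts content popularity; raw data never left a client.
print(w_global.round(3))
```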

116 citations


Journal ArticleDOI
TL;DR: Signac as discussed by the authors is a framework designed to assist in the integration of various specialized data formats, tools and workflows, simplifying data access and modification through a homogeneous data interface that is largely agnostic to the data source.

92 citations


Journal ArticleDOI
TL;DR: BioStudies offers a simple way to describe the study structure, and provides flexible data deposition tools and data access interfaces, and is a resource for authors and publishers for packaging data during the manuscript preparation process.
Abstract: BioStudies (www.ebi.ac.uk/biostudies) is a new public database that organizes data from biological studies. Typically, but not exclusively, a study is associated with a publication. BioStudies offers a simple way to describe the study structure, and provides flexible data deposition tools and data access interfaces. The actual data can be stored either in BioStudies or remotely, or both. BioStudies imports supplementary data from Europe PMC, and is a resource for authors and publishers for packaging data during the manuscript preparation process. It can also support the data management needs of collaborative projects. The growth in multiomics experiments and other multi-faceted approaches to life sciences research means that studies result in a diversity of data outputs in multiple locations. BioStudies presents a solution to ensuring that all these data and the associated publication(s) can be found coherently in the longer term.

85 citations


Proceedings ArticleDOI
11 Jul 2018
TL;DR: This study implements the AT&T scheme for managing the access control mechanism of patients' data using the XACML access model, which provides hierarchically satisfied access to various data resources.
Abstract: Mobile medicine and health care have been adopted on a very large scale with support from the influx of medical devices and increased usability of remote health services. These are combined with patients' growing interest in, and awareness of, their own healthcare. This leads to a huge volume of medical data, which requires secure transfer, archival and access. In this study, we have proposed an efficient approach to preserve the identity and also protect the privacy of clinical data using a highly effective encryption scheme. Moreover, we have also discussed an authorization framework using access of varying degrees, since medical records are often accessed by various entities with varying degrees of authorization. In this study, we have implemented the AT&T scheme for managing the access control mechanism of patients' data. Further, encryption is undertaken using ARCANA, which provides hierarchically satisfied access to various data resources. It utilizes the XACML access model to formulate the access control framework. The primary reason for using this model is that data access through AT&T is regulated by XACML policies. In addition, encrypting medical data under various authorization techniques is required for proper data access regulation, which may strengthen users' trust in the e-health paradigm and in turn increase large-scale usability.

76 citations


Journal ArticleDOI
TL;DR: This work presents OptiqueVQS, a query formulation tool designed based on the experience with OBDA applications in Statoil and Siemens and on best HCI practices for interdisciplinary engineering environments, which implements a number of unique techniques distinguishing it from analogous query formulation systems.
Abstract: An important application of semantic technologies in industry has been the formalisation of information models using OWL 2 ontologies and the use of RDF for storing and exchanging application data. Moreover, legacy data can be virtualised as RDF using ontologies following the ontology-based data access (OBDA) approach. In all these applications, it is important to provide domain experts with query formulation tools for expressing their information needs in terms of queries over ontologies. In this work, we present such a tool, OptiqueVQS, which is designed based on our experience with OBDA applications in Statoil and Siemens and on best HCI practices for interdisciplinary engineering environments. OptiqueVQS implements a number of unique techniques distinguishing it from analogous query formulation systems. First, it exploits ontology projection techniques to enable graph-based navigation over an ontology during query construction. Second, while OptiqueVQS is primarily ontology driven, it exploits sampled data to enhance selection of data values for some data attributes. Finally, OptiqueVQS is built on well-grounded requirements, design rationale, and quality attributes. We evaluated OptiqueVQS with both domain experts and casual users and qualitatively compared our system against prominent visual systems for ontology-driven query formulation and exploration of semantic data. OptiqueVQS is available online and can be downloaded together with an example OBDA scenario.

Journal ArticleDOI
01 Mar 2018
TL;DR: This work extends Apache Spark with respect to both data storage and computing by seamlessly integrating a key-value store, and enhances the MapReduce paradigm to allow flexible optimizations based on random data access to achieve scalability, efficiency, persistence, and flexibility.
Abstract: Massive trajectory data is being generated by GPS-equipped devices, such as cars and mobile phones, and is used increasingly in transportation, location-based services, and urban computing. As a result, a variety of methods have been proposed for trajectory data management and analytics. However, traditional systems and methods are usually designed for very specific data management or analytics needs, which forces users to stitch together heterogeneous systems to analyze trajectory data in an inefficient manner. Targeting the overall data pipeline of big trajectory data management and analytics, we present a unified platform, termed UlTraMan. In order to achieve scalability, efficiency, persistence, and flexibility, (i) we extend Apache Spark with respect to both data storage and computing by seamlessly integrating a key-value store, and (ii) we enhance the MapReduce paradigm to allow flexible optimizations based on random data access. We study the resulting system's flexibility using case studies on data retrieval, aggregation analyses, and pattern mining. Extensive experiments on real and synthetic trajectory data are reported to offer insight into the scalability and performance of UlTraMan.
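
The second extension, random data access inside MapReduce-style tasks, can be sketched as follows (the API, and the dict standing in for the embedded key-value store, are hypothetical):

```python
# Sketch of the enhanced-MapReduce idea in UlTraMan (all APIs hypothetical):
# tasks may randomly probe an embedded key-value store (here a dict stand-in)
# instead of only streaming over their own partition.

kv_store = {}   # embedded KV store: trajectory_id -> list of (t, x, y) points

def load(trajectories):
    for tid, points in trajectories.items():
        kv_store[tid] = points

def map_task(candidate_ids, query_window):
    """Instead of scanning all data, randomly access only candidate ids."""
    t0, t1 = query_window
    out = []
    for tid in candidate_ids:                 # random access by key
        pts = kv_store.get(tid, [])
        hits = [(t, x, y) for (t, x, y) in pts if t0 <= t <= t1]
        if hits:
            out.append((tid, len(hits)))
    return out

load({"car-1": [(0, 1.0, 2.0), (5, 1.5, 2.2)],
      "car-2": [(3, 9.0, 4.0)]})
# An index (e.g. spatial) would supply candidate ids; here we pass them in:
print(map_task(["car-1", "car-2"], query_window=(0, 4)))
```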

Proceedings ArticleDOI
01 Dec 2018
TL;DR: This research proposes a blockchain based secure and efficient data accessibility mechanism for the patient and the doctor in a given healthcare system that can resist well-known attacks while maintaining the integrity of the system.
Abstract: The healthcare industry is constantly reforming and adopting new shapes with respect to technological evolutions and transitions. One of the crucial requirements in current smart healthcare systems is the protection of patients' sensitive data against potential adversaries. Therefore, it is vital to have secure data access mechanisms that can ensure only authorized entities access patients' medical information. Hence, this paper considers blockchain technology as a distributed approach to protect the data in healthcare systems. This research proposes a blockchain based secure and efficient data accessibility mechanism for the patient and the doctor in a given healthcare system. The proposed system is able to protect the privacy of the patients as well. The security analysis of our scheme shows that it can resist well-known attacks while maintaining the integrity of the system. Moreover, an Ethereum based implementation has been used to verify the feasibility of our proposed system.

Proceedings ArticleDOI
27 May 2018
TL;DR: The Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions in a few seconds or minutes, and can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.
Abstract: Data structures are critical in any data-driven scenario, but they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of fundamental data access primitives (e.g., random access). With these models, we synthesize the performance cost of complex operations on arbitrary data structure designs without having to: 1) implement the data structure, 2) run the workload, or even 3) access the target hardware. We demonstrate that the Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, i.e., computing how the performance (response time) of a given data structure design is impacted by variations in the: 1) design, 2) hardware, 3) data, and 4) query workloads. This makes it effortless to test numerous designs and ideas before embarking on lengthy implementation, deployment, and hardware acquisition steps. We also demonstrate that the Data Calculator can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.
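
The cost-synthesis idea can be sketched as follows; the per-primitive costs below are invented placeholders, whereas the Data Calculator learns such models from benchmarks on the target hardware:

```python
# Sketch of cost synthesis from per-primitive cost models (numbers made up):
# the cost of a complex operation is composed from the learned costs of
# fundamental data access primitives, so no implementation is needed.
import math

# Learned per-primitive costs in microseconds for some hardware profile.
PRIMITIVE_COST_US = {
    "binary_search_step": 0.02,
    "sequential_scan_per_row": 0.001,
    "random_access": 0.1,
}

def lookup_cost(design, n_rows):
    """Compose the cost of a point lookup from primitive costs."""
    if design == "sorted_array":
        steps = math.ceil(math.log2(max(n_rows, 2)))
        return steps * PRIMITIVE_COST_US["binary_search_step"]
    if design == "unsorted_array":
        return n_rows / 2 * PRIMITIVE_COST_US["sequential_scan_per_row"]
    if design == "hash_table":
        return PRIMITIVE_COST_US["random_access"]
    raise ValueError(design)

# What-if question: how do the designs compare as the data grows?
for n in (100, 10_000, 1_000_000):
    print(n, {d: round(lookup_cost(d, n), 3)
              for d in ("sorted_array", "unsorted_array", "hash_table")})
```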

Journal ArticleDOI
TL;DR: The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians by innovative views on integrated data as well as by decision support components for personalized treatments, with a specific focus on data integration and sharing.
Abstract: Introduction: This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Future medicine will be predictive, preventive, personalized, participatory and digital. Data and knowledge at comprehensive depth and breadth need to be available for research and at the point of care as a basis for targeted diagnosis and therapy. Data integration and data sharing will be essential to achieve these goals. For this purpose, the consortium Data Integration for Future Medicine (DIFUTURE) will establish Data Integration Centers (DICs) at university medical centers.

Objectives: The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians by innovative views on integrated data as well as by decision support components for personalized treatments. The aim of our use cases is to show that this accelerates innovation, improves health care processes and results in tangible benefits for our patients. To realize our vision, numerous challenges have to be addressed. The objective of this article is to describe our concepts and solutions on the technical and the organizational level with a specific focus on data integration and sharing.

Governance and Policies: Data sharing implies significant security and privacy challenges. Therefore, state-of-the-art data protection, modern IT security concepts and patient trust play a central role in our approach. We have established governance structures and policies safeguarding data use and sharing by technical and organizational measures providing highest levels of data protection. One of our central policies is that adequate methods of data sharing for each use case and project will be selected based on rigorous risk and threat analyses. Interdisciplinary groups have been installed in order to manage change.

Architectural Framework and Methodology: The DIFUTURE Data Integration Centers will implement a three-step approach to integrating, harmonizing and sharing structured, unstructured and omics data as well as images from clinical and research environments. First, data is imported and technically harmonized using common data and interface standards (including various IHE profiles, DICOM and HL7 FHIR). Second, data is preprocessed, transformed, harmonized and enriched within a staging and working environment. Third, data is imported into common analytics platforms and data models (including i2b2 and tranSMART) and made accessible in a form compliant with the interoperability requirements defined on the national level. Secure data access and sharing will be implemented with innovative combinations of privacy-enhancing technologies (safe data, safe settings, safe outputs) and methods of distributed computing.

Use Cases: From the perspective of health care and medical research, our approach is disease-oriented and use-case driven, i.e. following the needs of physicians and researchers and aiming at measurable benefits for our patients. We will work on early diagnosis, tailored therapies and therapy decision tools with focuses on neurology, oncology and further disease entities. Our early use cases will serve as blueprints for the following ones, verifying that the infrastructure developed by DIFUTURE is able to support a variety of application scenarios.

Discussion: Our own previous work, the use of internationally successful open source systems and a state-of-the-art software architecture are cornerstones of our approach. In the conceptual phase of the initiative, we have already prototypically implemented and tested the most important components of our architecture.

Journal ArticleDOI
01 Jan 2018-Database
TL;DR: The current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation is presented.
Abstract: The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.

Journal ArticleDOI
TL;DR: It is proved that the P2Q scheme achieves data confidentiality and preserves the data owner’s privacy in a semi-trusted cloud and can significantly reduce response time and provide high search efficiency without compromising on search quality.

Proceedings ArticleDOI
02 Jul 2018
TL;DR: A novel, centralized, attribute-based authorization mechanism is developed that uses Attribute Based Encryption (ABE), allows for delegated secure access to patient records, and enables easy delegation of cloud-based EHRs' access authority to medical providers.
Abstract: Medical organizations find it challenging to adopt cloud-based electronic medical records services, due to the risk of data breaches and the resulting compromise of patient data. Existing authorization models follow a patient-centric approach for EHR management, where the responsibility of authorizing data access is handled at the patients' end. This, however, creates a significant overhead for the patient, who has to authorize every access of their health record. This is not practical given the multiple personnel involved in providing care, and at times the patient may not be in a state to provide this authorization. Hence there is a need to develop a proper authorization delegation mechanism for safe, secure and easy cloud-based EHR management. We have developed a novel, centralized, attribute based authorization mechanism that uses Attribute Based Encryption (ABE) and allows for delegated secure access of patient records. This mechanism transfers the service management overhead from the patient to the medical organization and allows easy delegation of cloud-based EHR's access authority to the medical providers. In this paper, we describe this novel ABE approach as well as the prototype system that we have created to illustrate it.
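
A simplified sketch of attribute-based, delegable access follows. Real ABE enforces the policy cryptographically at decryption time; this sketch substitutes a plain policy check, and the policy format and attribute names are hypothetical:

```python
# Sketch of attribute-based, delegable access (hypothetical policy format).
# In real ABE, decryption succeeds only when the key's attributes satisfy
# the ciphertext policy; a plain policy check stands in for that here.

def satisfies(attrs, policy):
    """policy: attribute string, or ('and'|'or', [subpolicies])."""
    if isinstance(policy, str):
        return policy in attrs
    op, parts = policy
    combine = all if op == "and" else any
    return combine(satisfies(attrs, p) for p in parts)

# Record "encrypted" under a policy chosen by the organization, not the patient.
record_policy = ("or", [
    ("and", ["role:doctor", "dept:cardiology"]),
    "role:emergency_physician",
])

def delegate(attrs, extra):
    """Organization issues additional attributes to a provider's key."""
    return attrs | extra

nurse = {"role:nurse", "dept:cardiology"}
print(satisfies(nurse, record_policy))                  # False: access denied
doctor = delegate({"dept:cardiology"}, {"role:doctor"}) # delegated authority
print(satisfies(doctor, record_policy))                 # True: access granted
```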

Journal ArticleDOI
20 Oct 2018-Sensors
TL;DR: This work proposes an authorization system to facilitate access to consumer information and resource trading, based on blockchain technology, oriented to the Smart communities, an evolution of Community Energy Management Systems.
Abstract: Resource consumption in residential areas requires novel contributions in the field of consumer information management and collaborative mechanisms for the exchange of resources, in order to optimize the overall consumption of the community. We propose an authorization system to facilitate access to consumer information and resource trading, based on blockchain technology. Our proposal is oriented to Smart communities, an evolution of Community Energy Management Systems, in which communities are involved in the monitoring and coordination of resource consumption. The proposed environment allows a more reliable management of monitoring and authorization functions, with secure data access and storage and delegation of controller functions among householders. We provide the definition of virtual assets for energy and water resource sharing as an auction, which encourages the optimization of global consumption and saves resources. The proposed solution is implemented and validated in application scenarios that demonstrate the suitability of the defined consensus mechanism, the trustworthiness of the provided level of security for resource monitoring and delegation, and the reduction in resource consumption achieved by the resource-trading contribution.

Journal ArticleDOI
TL;DR: A proposed face identification and resolution scheme based on cloud computing makes full use of the advantages of cloud computing to effectively improve computation power and storage capacity, and the experimental results of a prototype system indicate that the proposed scheme is practically feasible and can provide an efficient face identification and resolution service.

Journal ArticleDOI
TL;DR: In this paper, the authors describe a standardized IoT infrastructure where data is stored on a DDoS-resistant, fault-tolerant, distributed storage service and data access is managed by a decentralized, trustless blockchain.
Abstract: Today, the number of IoT devices in all aspects of life is exponentially increasing. The cities we are living in are getting smarter and informing us about our surroundings in a contextual manner. However, there lay significant challenges in deploying, managing and collecting data from these devices, in addition to the problem of storing and mining that data for higher-quality IoT services. Blockchain technology, even in today's nascent form, contains the pillars to create a common, distributed, trustless and autonomous infrastructure system. This paper describes a standardized IoT infrastructure where data is stored on a DDoS-resistant, fault-tolerant, distributed storage service and data access is managed by a decentralized, trustless blockchain. The illustrated system uses LoRa as the emerging network technology, Swarm as the distributed data storage and Ethereum as the blockchain platform. Such a data backend will ensure high availability with minimal security risks while replacing traditional backend systems with a single "smart contract".

Patent
01 Jun 2018
TL;DR: In this article, a data processing data inventory generation system is configured to: (1) generate a data model (e.g., a data inventory) for one or more data assets utilized by a particular organization; (2) generate a respective data inventory for each of the one or more data assets; and (3) map relationships between aspects of the data inventory, the data assets, etc. within the data model.
Abstract: In particular embodiments, a data processing data inventory generation system is configured to: (1) generate a data model (e.g., a data inventory) for one or more data assets utilized by a particular organization; (2) generate a respective data inventory for each of the one or more data assets; and (3) map one or more relationships between one or more aspects of the data inventory, the one or more data assets, etc. within the data model. In particular embodiments, a data asset (e.g., data system, software application, etc.) may include any entity that collects, processes, contains, and/or transfers personal data (e.g., a software application, database, website, server, etc.). A data asset may include any software or device (e.g., server or servers) utilized by a particular entity for such data collection, processing, transfer, storage, etc. The system may then utilize the generated model to fulfil a data subject access request.

Journal ArticleDOI
TL;DR: This paper argues for adapting and extending methods from related work in the field of big data software, using the Hadoop and Spark frameworks to provide an optimal and efficient architecture for biomedical image analysis.

Journal ArticleDOI
01 Jun 2018
TL;DR: Sundial is presented, an in-memory distributed optimistic concurrency control protocol that dynamically determines the logical order among transactions at runtime, based on their data access patterns, to reduce the transaction abort rate and reduce the overhead of remote data accesses.
Abstract: Distributed transactions suffer from poor performance due to two major limiting factors. First, distributed transactions suffer from high latency because each of their accesses to remote data incurs a long network delay. Second, this high latency increases the likelihood of contention among distributed transactions, leading to high abort rates and low performance. We present Sundial, an in-memory distributed optimistic concurrency control protocol that addresses these two limitations. First, to reduce the transaction abort rate, Sundial dynamically determines the logical order among transactions at runtime, based on their data access patterns. Sundial achieves this by applying logical leases to each data element, which allows the database to dynamically calculate a transaction's logical commit timestamp. Second, to reduce the overhead of remote data accesses, Sundial allows the database to cache remote data in a server's local main memory and maintains cache coherence. With logical leases, Sundial integrates concurrency control and cache coherence into a simple unified protocol. We evaluate Sundial against state-of-the-art distributed concurrency control protocols. Sundial outperforms the next-best protocol by up to 57% under high contention. Sundial's caching scheme improves performance by up to 4.6× in workloads with high access skew.
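
The lease arithmetic can be sketched as follows (a simplified, single-node model with hypothetical structures; the real protocol additionally handles distribution, caching, and lease extension across servers):

```python
# Sketch of Sundial-style logical leases (simplified single-node model):
# each tuple carries a lease [wts, rts]; a transaction's commit timestamp
# must fall inside the leases of everything it read or wrote.

class Tup:
    def __init__(self, value):
        self.value = value
        self.wts = 0      # logical time of last write
        self.rts = 0      # lease end: reads of this version valid through rts

def try_commit(reads, writes):
    """reads: {tup: wts seen at read time}; writes: {tup: new value}."""
    # Commit after every version read, and past every reader of a write.
    commit_ts = max([t.wts for t in reads] +
                    [t.rts + 1 for t in writes] + [0])
    for t, seen_wts in reads.items():
        if commit_ts > t.rts:
            if t.wts != seen_wts:     # version changed: lease cannot extend
                return None           # abort
            t.rts = commit_ts         # otherwise extend the read lease
    for t, v in writes.items():
        t.wts = t.rts = commit_ts
        t.value = v
    return commit_ts

x, y = Tup("a"), Tup("b")
print(try_commit(reads={x: x.wts}, writes={y: "b2"}))  # commits at ts 1
```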

Journal ArticleDOI
TL;DR: An overview of the landscape of online data infrastructures in ecology and evolutionary biology is provided, and an online collaborative platform to keep a community-driven, updated list of the best sources that enable search for data in one interface is introduced.
Abstract: Open access to data is revolutionizing the sciences. To allow ecologists and evolutionary biologists to confidently find and use the existing data, we provide an overview of the landscape of online data infrastructures, and highlight the key points to consider when using open data. We introduce an online collaborative platform to keep a community-driven, updated list of the best sources that enable search for data in one interface. In doing so, our aim is to lower the barrier to accessing open data, and encourage its use by researchers hoping to increase the scope, reliability and value of their findings.

Posted Content
TL;DR: Private Data Objects are presented, a technology that enables mutually untrusted parties to run smart contracts over private data through the integration of a distributed ledger and Intel Software Guard Extensions (SGX).
Abstract: We present Private Data Objects (PDOs), a technology that enables mutually untrusted parties to run smart contracts over private data. PDOs result from the integration of a distributed ledger and Intel Software Guard Extensions (SGX). In particular, contracts run off-ledger in secure enclaves using Intel SGX, which preserves data confidentiality, execution integrity and enforces data access policies (as opposed to raw data access). A distributed ledger verifies and records transactions produced by PDOs, in order to provide a single authoritative instance of such objects. This allows contracting parties to retrieve and check data related to contract and enclave instances, as well as to serialize and commit contract state updates. The design and the development of PDOs is an ongoing research effort, and open source code is available and hosted by Hyperledger Labs [5, 7].

Journal ArticleDOI
TL;DR: Widespread use of ADA-M will aid researchers in globally searching and prescreening potential data and/or biospecimen resources for compatibility with their research plans in a responsible and efficient manner, increasing likelihood of timely DAC approvals while also significantly reducing time and effort DACs, RECs, and IRBs spend evaluating resource requests and research proposals.
Abstract: Given the data-rich nature of modern biomedical research, there is a pressing need for a systematic, structured, computer-readable way to capture, communicate, and manage sharing rules that apply to biomedical resources. This is essential for responsible recording, versioning, communication, querying, and actioning of resource sharing plans. However, lack of a common “information model” for rules and conditions that govern the sharing of materials, methods, software, data, and knowledge creates a fundamental barrier. Without this, it can be virtually impossible for Research Ethics Committees (RECs), Institutional Review Boards (IRBs), Data Access Committees (DACs), biobanks, and end users to confidently track, manage, and interpret applicable legal and ethical requirements. This raises costs and burdens of data stewardship and decreases efficient and responsible access to data, biospecimens, and other resources. To address this, the GA4GH and IRDiRC organizations sponsored the creation of the Automatable Discovery and Access Matrix (ADA-M, read simply as “Adam”). ADA-M is a comprehensive information model that provides the basis for producing structured metadata “Profiles” of regulatory conditions, thereby enabling efficient application of those conditions across regulatory spheres. Widespread use of ADA-M will aid researchers in globally searching and prescreening potential data and/or biospecimen resources for compatibility with their research plans in a responsible and efficient manner, increasing likelihood of timely DAC approvals while also significantly reducing time and effort DACs, RECs, and IRBs spend evaluating resource requests and research proposals. Extensive online documentation, software support, video guides, and an Application Programming Interface (API) for ADA-M have been made available.
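
A sketch of what a machine-actionable profile and an automated prescreen might look like follows; the field names are illustrative only and do not reproduce the official ADA-M schema:

```python
# Hypothetical sketch of an ADA-M-style machine-readable profile. The field
# names are illustrative and do not reproduce the official schema.

profile = {
    "header": {"resourceName": "ExampleBiobank", "profileVersion": "1.0"},
    "permissions": {
        "generalResearchUse": True,
        "diseaseSpecificUse": {"allowed": True, "disease": "ICD-10:G47"},
        "commercialUse": False,
    },
    "termsOfUse": {"ethicsApprovalRequired": True, "retentionYears": 10},
}

def prescreen(profile, plan):
    """Automated check of a research plan against a resource profile."""
    perms = profile["permissions"]
    if plan.get("commercial") and not perms["commercialUse"]:
        return "incompatible: commercial use not permitted"
    if not perms["generalResearchUse"] and not plan.get("diseaseSpecific"):
        return "incompatible: general research use not permitted"
    return "candidate: forward to the DAC for approval"

print(prescreen(profile, {"commercial": True}))   # incompatible
print(prescreen(profile, {"commercial": False}))  # candidate
```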

Journal ArticleDOI
TL;DR: This paper proposes a novel efficient privacy-preserving kNN classification protocol over semantically secure hybrid encrypted cloud database using Paillier and ElGamal cryptosystems and shows that the computation cost of the protocol is about two orders of magnitude lower than that of the state-of-the-art protocol while achieving the same security and privacy properties.
Abstract: Nowadays, individuals and companies increasingly tend to outsource their databases and further data operations to cloud service providers. However, utilizing the cost-saving advantages of cloud computing brings about the risk of violating database security and users' privacy. In this paper, we focus on the problem of privacy-preserving k-nearest neighbor (kNN) classification, in which a query user (QU) submits an encrypted query point to a cloud server (CS) and asks for the kNN classification labels based on the encrypted cloud database outsourced by a data owner (DO), without disclosing any privacy of DO or QU to CS. Previous secure kNN query schemes either cannot fully achieve required security properties or introduce heavy computation costs, making them not practical in real-world applications. To better solve this problem, we propose a novel efficient privacy-preserving kNN classification protocol over a semantically secure hybrid encrypted cloud database using Paillier and ElGamal cryptosystems. The proposed protocol protects both database security and query privacy and also hides data access patterns from CS. We formally analyze the security of our protocol and evaluate the performance through extensive experiments. The experiment results show that the computation cost of our protocol is about two orders of magnitude lower than that of the state-of-the-art protocol while achieving the same security and privacy properties.
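
The additively homomorphic building block behind such protocols can be sketched with the third-party python-paillier (phe) package. This is not the paper's full protocol (which also uses ElGamal and hides access patterns); it only shows how a server can compute an encrypted squared distance between an encrypted database point and a plaintext query:

```python
# Sketch of the additive-HE building block of encrypted kNN (uses the
# third-party `phe` package: pip install phe). Not the paper's protocol.
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)

x = [3, 7, 2]                               # data owner's point, kept encrypted
enc_x = [pub.encrypt(v) for v in x]
enc_x2 = [pub.encrypt(v * v) for v in x]    # owner also uploads Enc(x_i^2)

q = [1, 5, 2]                               # query point, plaintext at server

# Server side: Enc(|x-q|^2) = sum Enc(x_i^2) - 2*q_i*Enc(x_i) + q_i^2,
# using only ciphertext addition and plaintext scalar multiplication.
enc_dist2 = sum(ex2 + ex * (-2 * qi) + qi * qi
                for ex2, ex, qi in zip(enc_x2, enc_x, q))

assert priv.decrypt(enc_dist2) == sum((a - b) ** 2 for a, b in zip(x, q))
print(priv.decrypt(enc_dist2))              # 8
```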

Proceedings ArticleDOI
15 Feb 2018
TL;DR: An architecture-aware graph clustering algorithm is developed that exploits the FPGA-HMC platform's capability to improve data locality and memory access efficiency, and the graph processor architecture is further improved with a memory request merging unit that takes advantage of the increased data locality resulting from graph clustering.
Abstract: Graph analytics, which explores the relationships among interconnected entities, is becoming increasingly important due to its broad applicability, from machine learning to social sciences. However, due to the irregular data access patterns in graph computations, one major challenge for graph processing systems is performance. The algorithms, software, and hardware that have been tailored for mainstream parallel applications are generally not effective for massive, sparse graphs from real-world problems, due to their complex and irregular structures. To address the performance issues in large-scale graph analytics, we leverage the exceptional random access performance of the emerging Hybrid Memory Cube (HMC) combined with the flexibility and efficiency of modern FPGAs. In particular, we develop a collaborative software/hardware technique to perform a level-synchronized Breadth First Search (BFS) on a FPGA-HMC platform. From the software perspective, we develop an architecture-aware graph clustering algorithm that exploits the FPGA-HMC platform's capability to improve data locality and memory access efficiency. From the hardware perspective, we further improve the FPGA-HMC graph processor architecture by designing a memory request merging unit to take advantage of the increased data locality resulting from graph clustering. We evaluate the performance of our BFS implementation using the AC-510 development kit from Micron and achieve 2.8× average performance improvement compared to the latest FPGA-HMC based graph processing system over a set of benchmarks from a wide range of applications.
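
For reference, the traversal pattern in question, a level-synchronized BFS, can be sketched in a few lines; the per-level neighbor lookups are the irregular random accesses that the HMC and the request-merging unit are designed to serve:

```python
# Sketch of a level-synchronized BFS: the frontier is expanded one level at
# a time, which is what makes batching/merging of memory requests possible.
from collections import defaultdict

def level_synchronized_bfs(edges, source):
    graph = defaultdict(list)
    for u, v in edges:                   # undirected adjacency lists
        graph[u].append(v)
        graph[v].append(u)
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []               # all of level `depth`, built in bulk
        for u in frontier:               # these neighbor lookups are the
            for v in graph[u]:           # irregular random accesses
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
print(level_synchronized_bfs(edges, source=0))  # {0:0, 1:1, 2:1, 3:2, 4:3}
```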

Posted Content
TL;DR: A three-year qualitative study of DANS, a digital data archive containing more than 50 years of heterogeneous data types, provides new insights into the uses, users, and roles of these systems and services, as discussed by the authors.
Abstract: Digital data archives play essential roles in knowledge infrastructures by mediating access to data within and between communities. This three-year qualitative study of DANS, a digital data archive containing more than 50 years of heterogeneous data types, provides new insights into the uses, users, and roles of these systems and services. Consumers are highly diverse, including researchers, students, practitioners in museums and companies, and hobbyists. Contributors are not necessarily consumers of data from the archive, and few users cite data in DANS, even their own data. Academic contributors prefer to maintain control over data after deposit so that they can have personal exchanges with those seeking their data. Staff archivists provide essential mediating roles in identifying, acquiring, curating, and disseminating data. Archivists take the perspective of potential consumers in curating data to be findable and usable. Staff balance competing goals, and competing stakeholders, in time spent acquiring and curating data, in maintaining current data and long-term stewardship, and in providing direct access and interfaces to search engines and harvesters. Data archives are fragile in the long run, due to the competing stakeholders, multiple funding sources, and array of interacting technologies and infrastructures on which they depend.