
Showing papers on "Data access published in 2013"


Proceedings ArticleDOI
22 Jun 2013
TL;DR: Trinity is introduced, a general purpose graph engine over a distributed memory cloud that leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance, supporting fast graph exploration as well as efficient parallel computing.
Abstract: Computations performed by graph algorithms are data driven, and require a high degree of random data access. Despite the great progress made in disk technology, it still cannot provide the level of efficient random access required by graph computation. On the other hand, memory-based approaches usually do not scale due to the capacity limit of single machines. In this paper, we introduce Trinity, a general purpose graph engine over a distributed memory cloud. Through optimized memory management and network communication, Trinity supports fast graph exploration as well as efficient parallel computing. In particular, Trinity leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance. These enable Trinity to support efficient online query processing and offline analytics on large graphs with just a few commodity machines. Furthermore, Trinity provides a high level specification language called TSL for users to declare data schema and communication protocols, which brings great ease-of-use for general purpose graph management and computing. Our experiments show Trinity's performance in both low latency graph queries as well as high throughput graph analytics on web-scale, billion-node graphs.
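To make the access pattern concrete, the following is a minimal Python sketch (not Trinity's memory cloud or TSL) of online graph exploration over a partitioned in-memory key-value store: adjacency lists are hashed across partitions, and a BFS issues random lookups against whichever partition owns each vertex. The partition count, vertices, and edges are invented for illustration.

from collections import deque

N_PARTS = 3
partitions = [dict() for _ in range(N_PARTS)]   # stand-in for a distributed "memory cloud"

def put(vertex, neighbors):
    partitions[hash(vertex) % N_PARTS][vertex] = neighbors

def get(vertex):
    return partitions[hash(vertex) % N_PARTS].get(vertex, [])

for v, nbrs in {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}.items():
    put(v, nbrs)

def bfs(start):
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for n in get(v):                        # each hop is a random access into the store
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return order

print(bfs("a"))   # ['a', 'b', 'c', 'd']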

468 citations


Journal ArticleDOI
TL;DR: This paper proposes a role-based encryption (RBE) scheme that integrates the cryptographic techniques with RBAC, and presents a secure RBE-based hybrid cloud storage architecture that allows an organization to store data securely in a public cloud, while maintaining the sensitive information related to the organization's structure in a private cloud.
Abstract: With the rapid developments occurring in cloud computing and services, there has been a growing trend to use the cloud for large-scale data storage. This has raised the important security issue of how to control and prevent unauthorized access to data stored in the cloud. One well known access control model is the role-based access control (RBAC), which provides flexible controls and management by having two mappings, users to roles and roles to privileges on data objects. In this paper, we propose a role-based encryption (RBE) scheme that integrates the cryptographic techniques with RBAC. Our RBE scheme allows RBAC policies to be enforced for the encrypted data stored in public clouds. Based on the proposed scheme, we present a secure RBE-based hybrid cloud storage architecture that allows an organization to store data securely in a public cloud, while maintaining the sensitive information related to the organization's structure in a private cloud. We describe a practical implementation of the proposed RBE-based architecture and discuss the performance results. We demonstrate that users only need to keep a single key for decryption, and system operations are efficient regardless of the complexity of the role hierarchy and user membership in the system.
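For readers unfamiliar with RBAC's two mappings, here is a plain, non-cryptographic Python sketch of the access check that the paper's RBE scheme enforces with encryption instead of a trusted reference monitor. The users, roles, and permissions below are made up, and the paper's actual contribution (deriving decryption ability from role membership with a single user key) is not shown.

# Plain RBAC: users -> roles, roles -> permissions on data objects.
USER_ROLES = {"alice": {"manager"}, "bob": {"staff"}}
ROLE_PERMS = {"manager": {("payroll.db", "read"), ("payroll.db", "write")},
              "staff": {("payroll.db", "read")}}

def allowed(user, obj, action):
    # A user may act on an object if any of the user's roles carries that permission.
    return any((obj, action) in ROLE_PERMS.get(role, set())
               for role in USER_ROLES.get(user, set()))

print(allowed("alice", "payroll.db", "write"))   # True
print(allowed("bob", "payroll.db", "write"))     # False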

353 citations


Journal Article
TL;DR: In this article, the importance of providing individuals with access to their data in usable format is emphasized, which will let individuals share the wealth created by their information and incentivize developers to offer user-side features and applications harnessing the value of big data.
Abstract: We live in an age of “big data.” Data have become the raw material of production, a new source for immense economic and social value. Advances in data mining and analytics and the massive increase in computing power and data storage capacity have expanded by orders of magnitude the scope of information available for businesses and government. Data are now available for analysis in raw form, escaping the confines of structured databases and enhancing researchers’ abilities to identify correlations and conceive of new, unanticipated uses for existing information. In addition, the increasing number of people, devices, and sensors that are now connected by digital networks has revolutionized the ability to generate, communicate, share, and access data. Data creates enormous value for the world economy, driving innovation, productivity, efficiency and growth. At the same time, the “data deluge” presents privacy concerns which could stir a regulatory backlash dampening the data economy and stifling innovation. In order to craft a balance between beneficial uses of data and individual privacy, policymakers must address some of the most fundamental concepts of privacy law, including the definition of “personally identifiable information”, the role of individual control, and the principles of data minimization and purpose limitation. This article emphasizes the importance of providing individuals with access to their data in usable format. This will let individuals share the wealth created by their information and incentivize developers to offer user-side features and applications harnessing the value of big data. Where individual access to data is impracticable, data are likely to be de-identified to an extent sufficient to diminish privacy concerns. In addition, organizations should be required to disclose their decisional criteria, since in a big data world it is often not the data but rather the inferences drawn from them that give cause for concern.

276 citations


Journal ArticleDOI
TL;DR: A novel hierarchical data structure for the efficient representation of sparse, time-varying volumetric data discretized on a 3D grid that facilitates adaptive grid sampling, and the inherent acceleration structure leads to fast algorithms that are well-suited for simulations.
Abstract: We have developed a novel hierarchical data structure for the efficient representation of sparse, time-varying volumetric data discretized on a 3D grid. Our “VDB”, so named because it is a Volumetric, Dynamic grid that shares several characteristics with B+trees, exploits spatial coherency of time-varying data to separately and compactly encode data values and grid topology. VDB models a virtually infinite 3D index space that allows for cache-coherent and fast data access into sparse volumes of high resolution. It imposes no topology restrictions on the sparsity of the volumetric data, and it supports fast (average O(1)) random access patterns when the data are inserted, retrieved, or deleted. This is in contrast to most existing sparse volumetric data structures, which assume either static or manifold topology and require specific data access patterns to compensate for slow random access. Since the VDB data structure is fundamentally hierarchical, it also facilitates adaptive grid sampling, and the inherent acceleration structure leads to fast algorithms that are well-suited for simulations. As such, VDB has proven useful for several applications that call for large, sparse, animated volumes, for example, level set dynamics and cloud modeling. In this article, we showcase some of these algorithms and compare VDB with existing, state-of-the-art data structures.
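As a rough intuition for the hierarchical design (and only that: real VDB uses several fixed-branching internal levels, bit masks, and cache-aware node layouts), the Python sketch below implements a two-level sparse grid: a hash map from leaf-tile origins to small dense tiles, giving average O(1) random access over a virtually unbounded index space while storing nothing for empty regions.

import numpy as np

LEAF = 8  # each leaf tile covers LEAF^3 voxels

class SparseGrid:
    def __init__(self, background=0.0):
        self.background = background
        self.leaves = {}  # (i, j, k) leaf origin -> dense LEAF^3 tile

    def _split(self, x, y, z):
        key = (x // LEAF, y // LEAF, z // LEAF)   # which leaf tile
        off = (x % LEAF, y % LEAF, z % LEAF)      # offset inside the tile
        return key, off

    def set(self, x, y, z, value):
        key, off = self._split(x, y, z)
        tile = self.leaves.get(key)
        if tile is None:                          # allocate leaves lazily: empty space costs nothing
            tile = np.full((LEAF, LEAF, LEAF), self.background)
            self.leaves[key] = tile
        tile[off] = value

    def get(self, x, y, z):
        key, off = self._split(x, y, z)
        tile = self.leaves.get(key)               # average O(1) hash lookup
        return self.background if tile is None else tile[off]

grid = SparseGrid()
grid.set(1_000_000, -42, 7, 3.5)                  # virtually unbounded index space
print(grid.get(1_000_000, -42, 7))                # 3.5
print(grid.get(0, 0, 0))                          # background value 0.0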

263 citations


Posted Content
TL;DR: This paper focuses on solving the k-nearest neighbor (kNN) query problem over an encrypted database outsourced to a cloud: a user issues an encrypted query record to the cloud, and the cloud returns the k closest records to the user.
Abstract: For the past decade, query processing on relational data has been studied extensively, and many theoretical and practical solutions to query processing have been proposed under various scenarios. With the recent popularity of cloud computing, users now have the opportunity to outsource their data as well as the data management tasks to the cloud. However, due to the rise of various privacy issues, sensitive data (e.g., medical records) need to be encrypted before outsourcing to the cloud. In addition, query processing tasks should be handled by the cloud; otherwise, there would be no point in outsourcing the data in the first place. To process queries over encrypted data without the cloud ever decrypting the data is a very challenging task. In this paper, we focus on solving the k-nearest neighbor (kNN) query problem over an encrypted database outsourced to a cloud: a user issues an encrypted query record to the cloud, and the cloud returns the k closest records to the user. We first present a basic scheme and demonstrate that such a naive solution is not secure. To provide better security, we propose a secure kNN protocol that protects the confidentiality of the data, user's input query, and data access patterns. Also, we empirically analyze the efficiency of our protocols through various experiments. These results indicate that our secure protocol is very efficient on the user end, and this lightweight scheme allows a user to use any mobile device to perform the kNN query.

250 citations


Book ChapterDOI
21 Oct 2013
TL;DR: The architecture and technologies underpinning the OBDA system Ontop are presented and it is demonstrated that, for standard ontologies, queries and data stored in relational databases, Ontop is fast, efficient and produces SQL rewritings of high quality.
Abstract: We present the architecture and technologies underpinning the OBDA system Ontop, which takes full advantage of storing data in relational databases. We discuss the theoretical foundations of Ontop: the tree-witness query rewriting, T-mappings and optimisations based on database integrity constraints and SQL features. We analyse the performance of Ontop in a series of experiments and demonstrate that, for standard ontologies, queries and data stored in relational databases, Ontop is fast, efficient and produces SQL rewritings of high quality.
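The toy Python fragment below only illustrates the general OBDA idea of rewriting a query over the ontology vocabulary into SQL over the underlying database; it is not Ontop's tree-witness rewriting or its T-mappings, and all table, column, and predicate names are hypothetical.

# Each ontology class or property is mapped to a SQL query that produces its instances.
MAPPINGS = {
    ":Employee": "SELECT id FROM employees",
    ":worksFor": "SELECT id, dept_id FROM employees WHERE dept_id IS NOT NULL",
}

def rewrite(triple_pattern):
    """Rewrite one triple pattern, (?x rdf:type :C) or (?x :p ?y), into SQL."""
    s, p, o = triple_pattern
    if p == "rdf:type":
        return MAPPINGS[o]      # class membership -> one-column query
    return MAPPINGS[p]          # object property -> two-column query

print(rewrite(("?x", "rdf:type", ":Employee")))
print(rewrite(("?x", ":worksFor", "?y")))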

175 citations


Proceedings ArticleDOI
17 Jul 2013
TL;DR: By somewhat overcoming data quality issues with data quantity, data access restrictions with on-demand cloud computing, causative analysis with correlative data analytics, and model-driven with evidence-driven applications, appropriate actions can be undertaken with the obtained information.
Abstract: Summary form only given. At present, it is projected that about 4 zettabytes (or 10^21 bytes) of electronic data are being generated per year by everything from underground physics experiments to retail transactions to security cameras to global positioning systems. In the U. S., major research programs are being funded to deal with big data in all five economic sectors (i.e., services, manufacturing, construction, agriculture and mining) of the economy. Big Data is a term applied to data sets whose size is beyond the ability of available tools to undertake their acquisition, access, analytics and/or application in a reasonable amount of time. Whereas Tien (2003) forewarned about the data rich, information poor (DRIP) problems that have been pervasive since the advent of large-scale data collections or warehouses, the DRIP conundrum has been somewhat mitigated by the Big Data approach which has unleashed information in a manner that can support informed - yet, not necessarily defensible or knowledgeable - decisions or choices. Thus, by somewhat overcoming data quality issues with data quantity, data access restrictions with on-demand cloud computing, causative analysis with correlative data analytics, and model-driven with evidence-driven applications, appropriate actions can be undertaken with the obtained information. New acquisition, access, analytics and application technologies are being developed to further Big Data as it is being employed to help resolve the 14 grand challenges (identified by the National Academy of Engineering in 2008), underpin the 10 breakthrough technologies (compiled by the Massachusetts Institute of Technology in 2013) and support the Third Industrial Revolution of mass customization.

173 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a solution to the data rich, information poor (DRIP) problem that has been pervasive since the advent of large-scale data collections or warehouses.
Abstract: At present, it is projected that about 4 zettabytes (or 10^21 bytes) of digital data are being generated per year by everything from underground physics experiments to retail transactions to security cameras to global positioning systems. In the U. S., major research programs are being funded to deal with big data in all five sectors (i.e., services, manufacturing, construction, agriculture and mining) of the economy. Big Data is a term applied to data sets whose size is beyond the ability of available tools to undertake their acquisition, access, analytics and/or application in a reasonable amount of time. Whereas Tien (2003) forewarned about the data rich, information poor (DRIP) problems that have been pervasive since the advent of large-scale data collections or warehouses, the DRIP conundrum has been somewhat mitigated by the Big Data approach which has unleashed information in a manner that can support informed — yet, not necessarily defensible or valid — decisions or choices. Thus, by somewhat overcoming data quality issues with data quantity, data access restrictions with on-demand cloud computing, causative analysis with correlative data analytics, and model-driven with evidence-driven applications, appropriate actions can be undertaken with the obtained information. New acquisition, access, analytics and application technologies are being developed to further Big Data as it is being employed to help resolve the 14 grand challenges (identified by the National Academy of Engineering in 2008), underpin the 10 breakthrough technologies (compiled by the Massachusetts Institute of Technology in 2013) and support the Third Industrial Revolution of mass customization.

158 citations


Journal ArticleDOI
01 Sep 2013
TL;DR: A novel semantic hash partitioning approach is presented and a Semantic HAsh Partitioning-Enabled distributed RDF data management system is implemented, called Shape, which scales well and can process big RDF datasets more efficiently than existing approaches.
Abstract: Massive volumes of big RDF data are growing beyond the performance capacity of conventional RDF data management systems operating on a single node. Applications using large RDF data demand efficient data partitioning solutions for supporting RDF data access on a cluster of compute nodes. In this paper we present a novel semantic hash partitioning approach and implement a Semantic HAsh Partitioning-Enabled distributed RDF data management system, called Shape. This paper makes three original contributions. First, the semantic hash partitioning approach we propose extends the simple hash partitioning method through direction-based triple groups and direction-based triple replications. The latter enhances the former by controlled data replication through intelligent utilization of data access locality, such that queries over big RDF graphs can be processed with zero or a very small amount of inter-machine communication cost. Second, we generate locality-optimized query execution plans that are more efficient than popular multi-node RDF data management systems by effectively minimizing the inter-machine communication cost for query processing. Third but not least, we provide a suite of locality-aware optimization techniques to further reduce the partition size and cut down on the inter-machine communication cost during distributed query processing. Experimental results show that our system scales well and can process big RDF datasets more efficiently than existing approaches.
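A minimal Python sketch of the baseline idea is shown below: hash partition triples by subject, plus a simple object-side replication pass so that some joins can be answered without inter-machine communication. This is only a simplification; the paper's direction-based triple groups, hop-based expansion, and locality-aware optimizations go well beyond it, and the worker count and triples are made up.

N_WORKERS = 4

def partition(triples, replicate_by_object=True):
    parts = {w: [] for w in range(N_WORKERS)}
    for s, p, o in triples:
        parts[hash(s) % N_WORKERS].append((s, p, o))        # baseline: hash on subject
        if replicate_by_object:
            w = hash(o) % N_WORKERS
            if (s, p, o) not in parts[w]:
                parts[w].append((s, p, o))                   # replica co-located with the object
    return parts

triples = [(":alice", ":knows", ":bob"), (":bob", ":worksAt", ":acme")]
for worker, chunk in sorted(partition(triples).items()):
    print(worker, chunk)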

142 citations


Proceedings ArticleDOI
08 May 2013
TL;DR: An access control framework for cloud storage systems that achieves fine-grained access control based on an adapted Ciphertext-Policy Attribute-based Encryption (CP-ABE) approach is designed and an efficient attribute revocation method is proposed to cope with the dynamic changes of users' access privileges in large-scale systems.
Abstract: A cloud storage service allows data owners to outsource their data to the cloud and, through the cloud, provide data access to users. Because the cloud server and the data owner are not in the same trust domain, the semi-trusted cloud server cannot be relied upon to enforce the access policy. To address this challenge, traditional methods usually require the data owner to encrypt the data and deliver decryption keys to authorized users. These methods, however, normally involve complicated key management and high overhead on the data owner. In this paper, we design an access control framework for cloud storage systems that achieves fine-grained access control based on an adapted Ciphertext-Policy Attribute-based Encryption (CP-ABE) approach. In the proposed scheme, an efficient attribute revocation method is proposed to cope with the dynamic changes of users' access privileges in large-scale systems. The analysis shows that the proposed access control scheme is provably secure in the random oracle model and efficient enough to be applied in practice.

131 citations


Proceedings ArticleDOI
12 Feb 2013
TL;DR: This work designed a novel workload-independent data structure called the VT-tree which extends the LSM-tree to efficiently handle sequential and file-system workloads and provides efficient and scalable access to both large and small data items regardless of the access pattern.
Abstract: As the Internet and the amount of data grows, the variability of data sizes grows too--from small MP3 tags to large VM images. With applications using increasingly more complex queries and larger data-sets, data access patterns have become more complex and randomized. Current storage systems focus on optimizing for one band of workloads at the expense of other workloads due to limitations in existing storage system data structures. We designed a novel workload-independent data structure called the VT-tree which extends the LSM-tree to efficiently handle sequential and file-system workloads. We designed a system based solely on VT-trees which offers concurrent access to data via file system and database APIs, transactional guarantees, and consequently provides efficient and scalable access to both large and small data items regardless of the access pattern. Our evaluation shows that our user-level system has 2-6.6× better performance for random-write workloads and only a small average overhead for other workloads.
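To ground the comparison, here is a very small LSM-style store in Python (memtable plus immutable sorted runs). The VT-tree described above extends this kind of design, for example to avoid re-copying already sorted data during merges; that extension is not reproduced here, and the flush threshold is arbitrary.

import bisect

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []                 # immutable sorted [(key, value), ...] runs, newest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.runs.insert(0, sorted(self.memtable.items()))   # flush as a sorted run
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:       # freshest data first
            return self.memtable[key]
        for run in self.runs:          # then newest run to oldest
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(f"k{i:02d}", i)
print(db.get("k03"), db.get("k09"))    # 3 9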

Proceedings ArticleDOI
14 Apr 2013
TL;DR: This work addresses the problem of optimized VM placement: given the location of the data sets, the locations for placing the VMs must be determined so as to minimize data access latencies while satisfying system constraints.
Abstract: Many cloud applications are data intensive requiring the processing of large data sets and the MapReduce/Hadoop architecture has become the de facto processing framework for these applications. Large data sets are stored in data nodes in the cloud which are typically SAN or NAS devices. Cloud applications process these data sets using a large number of application virtual machines (VMs), with the total completion time being an important performance metric. There are many factors that affect the total completion time of the processing task such as the load on the individual servers, the task scheduling mechanism, communication and data access bottlenecks, etc. One dominating factor that affects completion times for data intensive applications is the access latencies from processing nodes to data nodes. Ideally, one would like to keep all data access local to minimize access latency but this is often not possible due to the size of the data sets, capacity constraints in processing nodes which constrain VMs from being placed in their ideal location and so on. When it is not possible to keep all data access local, one would like to optimize the placement of VMs so that the impact of data access latencies on completion times is minimized. We address this problem of optimized VM placement - given the location of the data sets, we need to determine the locations for placing the VMs so as to minimize data access latencies while satisfying system constraints. We present optimal algorithms for determining the VM locations satisfying various constraints and with objectives that capture natural tradeoffs between minimizing latencies and incurring bandwidth costs. We also consider the problem of incorporating inter-VM latency constraints. In this case, the associated location problem is NP-hard with no effective approximation within a factor of 2 - ϵ for any ϵ > 0. We discuss an effective heuristic for this case and evaluate by simulation the impact of the various tradeoffs in the optimization objectives.
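As a baseline for the optimization problem stated above (and not the paper's optimal algorithms or its bandwidth-cost tradeoffs), the Python sketch below greedily places each VM on the feasible host with the lowest latency to that VM's data node, subject to a simple slot capacity constraint. Hosts, VMs, and latencies are invented for illustration.

def place_vms(vms, hosts, latency, capacity):
    """vms: {vm: data_node}; latency: {(host, data_node): ms}; capacity: {host: free slots}."""
    placement = {}
    for vm, data_node in vms.items():
        feasible = [h for h in hosts if capacity[h] > 0]           # respect capacity constraints
        best = min(feasible, key=lambda h: latency[(h, data_node)])  # lowest data access latency
        placement[vm] = best
        capacity[best] -= 1
    return placement

hosts = ["h1", "h2"]
vms = {"vm1": "d1", "vm2": "d1", "vm3": "d2"}
latency = {("h1", "d1"): 1, ("h1", "d2"): 5, ("h2", "d1"): 4, ("h2", "d2"): 1}
print(place_vms(vms, hosts, latency, {"h1": 2, "h2": 2}))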

Patent
08 Mar 2013
TL;DR: Software, firmware, and systems that migrate functionality of a source physical computing device to a destination virtual machine are described; a non-production copy of data associated with the source physical computing device is created.
Abstract: Software, firmware, and systems are described herein that migrate functionality of a source physical computing device to a destination virtual machine. A non-production copy of data associated with a source physical computing device is created. A configuration of the source physical computing device is determined. A configuration for a destination virtual machine is determined based at least in part on the configuration of the source physical computing device. The destination virtual machine is provided access to data and metadata associated with the source physical computing device using the non-production copy of data associated with the source physical computing device.

Proceedings ArticleDOI
12 Feb 2013
TL;DR: Shroud is presented, a general storage system that hides data access patterns from the servers running it, protecting user privacy, and shows, via new techniques such as oblivious aggregation, how to securely use many inexpensive secure coprocessors acting in parallel to improve request latency.
Abstract: Recent events have shown online service providers the perils of possessing private information about users. Encrypting data mitigates but does not eliminate this threat: the pattern of data accesses still reveals information. Thus, we present Shroud, a general storage system that hides data access patterns from the servers running it, protecting user privacy. Shroud functions as a virtual disk with a new privacy guarantee: the user can look up a block without revealing the block's address. Such a virtual disk can be used for many purposes, including map lookup, microblog search, and social networking. Shroud aggressively targets hiding accesses among hundreds of terabytes of data. We achieve our goals by adapting oblivious RAM algorithms to enable large-scale parallelization. Specifically, we show, via new techniques such as oblivious aggregation, how to securely use many inexpensive secure coprocessors acting in parallel to improve request latency. Our evaluation combines large-scale emulation with an implementation on secure coprocessors and suggests that these adaptations bring private data access closer to practicality.
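To make the privacy goal concrete, the toy Python function below performs the simplest possible oblivious lookup, reading every block so that an observer of the access pattern learns nothing about the requested address. Shroud's actual approach, parallelized oblivious RAM on secure coprocessors, avoids this linear cost; the sketch only illustrates what "hiding the access pattern" means.

def oblivious_read(blocks, wanted_index):
    result = None
    for i, block in enumerate(blocks):      # every block is read on every query
        if i == wanted_index:
            result = block
    return result

store = ["blockA", "blockB", "blockC", "blockD"]
print(oblivious_read(store, 2))   # the access pattern is identical for any requested index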

Patent
11 Mar 2013
TL;DR: In this article, the authors present a data aggregation framework that can be configured to optimize aggregate operations over non-relational distributed databases, including data access, data retrieval, data writes, indexing, etc.
Abstract: Database systems and methods that implement a data aggregation framework are provided. The framework can be configured to optimize aggregate operations over non-relational distributed databases, including, for example, data access, data retrieval, data writes, indexing, etc. Various embodiments are configured to aggregate multiple operations and/or commands, where the results (e.g., database documents and computations) captured from the distributed database are transformed as they pass through an aggregation operation. The aggregation operation can be defined as a pipeline which enables the results from a first operation to be redirected into the input of a subsequent operation, which output can be redirected into further subsequent operations. Computations may also be executed at each stage of the pipeline, where each result at each stage can be evaluated by the computation to return a result. Execution of the pipeline can be optimized based on data dependencies and re-ordering of the pipeline operations.
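The fragment below is a toy Python rendering of the pipeline idea described above: each stage consumes the documents produced by the previous stage, so results flow from one operation into the next. The stage names and the grouping computation are illustrative and not tied to any particular database's operator set.

def match(pred):
    return lambda docs: [d for d in docs if pred(d)]

def project(fields):
    return lambda docs: [{f: d[f] for f in fields if f in d} for d in docs]

def group_sum(key, field):
    def stage(docs):
        totals = {}
        for d in docs:
            totals[d[key]] = totals.get(d[key], 0) + d[field]
        return [{key: k, "total": v} for k, v in totals.items()]
    return stage

def run_pipeline(docs, stages):
    for stage in stages:            # the output of each stage feeds the next
        docs = stage(docs)
    return docs

orders = [{"cust": "a", "amt": 5}, {"cust": "b", "amt": 7}, {"cust": "a", "amt": 3}]
print(run_pipeline(orders, [match(lambda d: d["amt"] > 2),
                            group_sum("cust", "amt")]))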

Patent
08 Mar 2013
TL;DR: In this article, software, firmware, and systems that migrate functionality of a source physical computing device to a destination physical device are described and a non-production copy of data associated with the source physical device is created.
Abstract: Software, firmware, and systems are described herein that migrate functionality of a source physical computing device to a destination physical computing device. A non-production copy of data associated with a source physical computing device is created. A configuration of the source physical computing device is determined. A configuration for a destination physical computing device is determined based at least in part on the configuration of the source physical computing device. The destination physical computing device is provided access to data and metadata associated with the source physical computing device using the non-production copy of data associated with the source physical computing device.

Patent
10 May 2013
TL;DR: In this paper, the authors present various embodiments for controlling access to data on a network upon receiving a request comprising a device identifier and at least one user credential to access a remote resource.
Abstract: Disclosed are various embodiments for controlling access to data on a network. Upon receiving a request comprising a device identifier and at least one user credential to access a remote resource, the request may be authenticated according to at least one compliance policy. If the request is authenticated, a resource credential associated with the remote resource may be provided.

Journal ArticleDOI
TL;DR: An easy-to-use yet powerful Web API is described, enabling fast and convenient access to MSI data, metadata, and derived analysis results stored remotely to facilitate high-performance data analysis and enable implementation of Web based data sharing, visualization, and analysis.
Abstract: Mass spectrometry imaging (MSI) enables researchers to probe endogenous molecules directly within the architecture of the biological matrix. Unfortunately, efficient access, management, and analysis of the data generated by MSI approaches remain major challenges to this rapidly developing field. Despite the availability of numerous dedicated file formats and software packages, it is a widely held viewpoint that the biggest challenge is simply opening, sharing, and analyzing a file without loss of information. Here we present OpenMSI, a software framework and platform that addresses these challenges via an advanced, high-performance, extensible file format and Web API for remote data access (http://openmsi.nersc.gov). The OpenMSI file format supports storage of raw MSI data, metadata, and derived analyses in a single, self-describing format based on HDF5 and is supported by a large range of analysis software (e.g., Matlab and R) and programming languages (e.g., C++, Fortran, and Python). Careful optimization of the storage layout of MSI data sets using chunking, compression, and data replication accelerates common, selective data access operations while minimizing data storage requirements, and is a critical enabler of rapid data I/O. The OpenMSI file format has been shown to provide a >2000-fold improvement for image access operations, enabling spectrum and image retrieval in less than 0.3 s across the Internet even for 50 GB MSI data sets. To make remote high-performance compute resources accessible for analysis and to facilitate data sharing and collaboration, we describe an easy-to-use yet powerful Web API, enabling fast and convenient access to MSI data, metadata, and derived analysis results stored remotely to facilitate high-performance data analysis and enable implementation of Web based data sharing, visualization, and analysis.
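A minimal h5py sketch of the underlying storage idea is given below: an MSI cube stored as a chunked, compressed HDF5 dataset so that a selective read (a single spectrum here) touches only a few chunks. The dataset name, cube size, and chunk shape are invented for illustration and are not the actual OpenMSI schema; the paper additionally replicates the data under different chunkings so that both spectrum and image access are fast.

import numpy as np
import h5py

x, y, mz = 64, 64, 1000                           # small synthetic MSI cube
data = np.random.rand(x, y, mz).astype("float32")

with h5py.File("msi_demo.h5", "w") as f:
    f.create_dataset("msi/data", data=data,
                     chunks=(4, 4, mz),           # whole spectra per chunk
                     compression="gzip")          # transparent compression

with h5py.File("msi_demo.h5", "r") as f:
    spectrum = f["msi/data"][10, 20, :]           # one spectrum: a single chunk read
    image = f["msi/data"][:, :, 500]              # one m/z image: cuts across many chunks
    print(spectrum.shape, image.shape)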

Patent
13 May 2013
TL;DR: In this paper, an integrated environment is provided for accessing structured data (e.g., data of a relational database) and unstructured data (i.e., data stored in a text or binary file).
Abstract: Methods, program products, and systems implementing integrated repository of structured and unstructured data are disclosed. An integrated environment is provided for accessing structured data (e.g., data of a relational database) and unstructured data (e.g., data stored in a text or binary file), including creating, managing, modifying, and searching the structured data and unstructured data. The integrated environment can include an integrated user interface, a set of commands and application programming interface (API), and storage for a relational database and a document repository. The integrated environment can include a database abstraction layer that allows database operations on both the structured data and the unstructured data.

Journal ArticleDOI
TL;DR: This paper proposes the use of an enterprise service bus (ESB) as a bridge for guaranteeing interoperability and integration of the different environments, thus introducing a semantic added value needed in the world of IoT-based systems.
Abstract: The Internet of Things (IoT) is growing at a fast pace with new devices getting connected all the time. A new emerging group of these devices is the wearable devices, and wireless sensor networks are a good way to integrate them in the IoT concept and bring new experiences to daily life activities. In this paper, we present an everyday life application involving a WSN as the base of a novel context-aware sports scenario, where physiological parameters are measured and sent to the WSN by wearable devices. Applications with several hardware components introduce the problem of heterogeneity in the network. In order to integrate different hardware platforms and to introduce a service-oriented semantic middleware solution into a single application, we propose the use of an enterprise service bus (ESB) as a bridge for guaranteeing interoperability and integration of the different environments, thus introducing a semantic added value needed in the world of IoT-based systems. This approach places all the data acquired (e.g., via internet data access) at application developers' disposal, opening the system to new user applications. The user can then access the data through a wide variety of devices (smartphones, tablets, and computers) and operating systems (Android, iOS, Windows, Linux, etc.).

Posted Content
TL;DR: A temporal description logic, TQL, is designed that extends the standard ontology language OWL 2 QL, provides basic means for temporal conceptual modelling and ensures first-order rewritability of conjunctive queries for suitably defined data instances with validity time.
Abstract: Our aim is to investigate ontology-based data access over temporal data with validity time and ontologies capable of temporal conceptual modelling. To this end, we design a temporal description logic, TQL, that extends the standard ontology language OWL 2 QL, provides basic means for temporal conceptual modelling and ensures first-order rewritability of conjunctive queries for suitably defined data instances with validity time.

Journal ArticleDOI
TL;DR: This paper has used REST based Web services as an interoperable application layer that can be directly integrated into other application domains for remote monitoring such as e-health care services, smart homes, or even vehicular area networks (VAN).
Abstract: Cloud computing provides great benefits for applications hosted on the Web that also have special computational and storage requirements. This paper proposes an extensible and flexible architecture for integrating Wireless Sensor Networks with the Cloud. We have used REST based Web services as an interoperable application layer that can be directly integrated into other application domains for remote monitoring such as e-health care services, smart homes, or even vehicular area networks (VAN). For proof of concept, we have implemented REST based Web services on an IP based low power WSN test bed, which enables data access from anywhere. An alert feature has also been implemented to notify users via email or tweets when monitored data exceed values and events of interest.

Proceedings ArticleDOI
15 Apr 2013
TL;DR: MeT is a prototype for a Cloud-enabled framework that can be used alone or in conjunction with OpenStack for the automatic and heterogeneous reconfiguration of an HBase deployment; it is able to autonomously achieve the performance of a manually configured cluster and quickly reconfigure the cluster according to unpredicted workload changes.
Abstract: NoSQL databases manage the bulk of data produced by modern Web applications such as social networks. This stems from their ability to partition and spread data to all available nodes, allowing NoSQL systems to scale. Unfortunately, current solutions' scale out is oblivious to the underlying data access patterns, resulting in both highly skewed load across nodes and suboptimal node configurations. In this paper, we first show that judicious placement of HBase partitions taking into account data access patterns can improve overall throughput by 35%. Next, we go beyond current state of the art elastic systems limited to uninformed replica addition and removal by: i) reconfiguring existing replicas according to access patterns and ii) adding replicas specifically configured to the expected access pattern. MeT is a prototype for a Cloud-enabled framework that can be used alone or in conjunction with OpenStack for the automatic and heterogeneous reconfiguration of an HBase deployment. Our evaluation, conducted using the YCSB workload generator and a TPC-C workload, shows that MeT is able to i) autonomously achieve the performance of a manually configured cluster and ii) quickly reconfigure the cluster according to unpredicted workload changes.

Proceedings Article
03 Aug 2013
TL;DR: In this article, the authors investigate ontology-based data access over temporal data with validity time and ontologies capable of temporal conceptual modelling, and design a temporal description logic, TQL, that extends the standard ontology language OWL 2 QL, providing basic means for temporal conceptual modeling and ensuring first-order rewritability of conjunctive queries for suitably defined data instances with validity times.
Abstract: Our aim is to investigate ontology-based data access over temporal data with validity time and ontologies capable of temporal conceptual modelling. To this end, we design a temporal description logic, TQL, that extends the standard ontology language OWL 2 QL, provides basic means for temporal conceptual modelling and ensures first-order rewritability of conjunctive queries for suitably defined data instances with validity time.

Patent
27 Jun 2013
TL;DR: In this paper, a data access pattern is identified for accessing a first set of data portions of a first logical device, wherein the access pattern includes a time-ordered list of consecutively accessed logical addresses of the first logical devices.
Abstract: Described are techniques for storing data. A data access pattern is identified for accessing a first set of data portions of a first logical device, wherein the data access pattern includes a time-ordered list of consecutively accessed logical addresses of the first logical device. The first set of data portions are arranged on a second logical device. The first set of data portions have corresponding logical addresses on the second logical device and such corresponding logical addresses have a consecutive sequential ordering based on the data access pattern. The first set of data portions are stored at physical device locations mapped to the corresponding logical addresses of the second logical device.
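The Python sketch below shows one way to read the core idea: take a time-ordered access pattern over the first logical device and assign each distinct address the next consecutive logical address on the second device, so that replaying the same pattern later becomes largely sequential I/O. This is a simplified reading of the claim, not the patented mechanism, and the addresses are made up.

def build_remap(access_pattern):
    """access_pattern: time-ordered list of logical addresses (repeats allowed)."""
    remap, next_addr = {}, 0
    for addr in access_pattern:
        if addr not in remap:          # first touch fixes the next consecutive slot
            remap[addr] = next_addr
            next_addr += 1
    return remap

pattern = [907, 12, 512, 12, 4096, 907]
print(build_remap(pattern))            # {907: 0, 12: 1, 512: 2, 4096: 3}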

Patent
25 Jun 2013
TL;DR: In this article, the authors proposed techniques for scheduling data access jobs based on a job dependency analysis, where a requested primary data access job is analyzed to determine one or more preliminary data access tasks on which it depends, and an execution duration of each data access task is predicted based on historical data or other factors.
Abstract: Techniques are described for scheduling data access jobs based on a job dependency analysis. A requested primary data access job is analyzed to determine one or more preliminary data access jobs on which it depends, and an execution duration of each data access job is predicted based on historical data or other factors. A time-sensitive subset of the preliminary data access jobs is determined as the subset of those serially dependent preliminary data access jobs for which there is a minimum time difference between the total predicted execution duration and a requested target completion time. Data access jobs are scheduled with priority given to those preliminary data access jobs in the time-sensitive subset, to enable the primary data access jobs to be completed by the requested target completion times.
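A simplified Python reading of the scheduling idea is sketched below: sum the predicted durations along each primary job's chain of serially dependent preliminary jobs and give priority to the chain with the least slack against its requested target completion time. Job names, durations, and targets are invented, and the real method also folds in historical prediction and richer dependency graphs.

def most_time_sensitive(chains, predicted, targets, now=0):
    """chains: {primary: [serially dependent prelim jobs]}; returns (primary, slack)."""
    slacks = {}
    for primary, prelims in chains.items():
        total = sum(predicted[j] for j in prelims) + predicted[primary]
        slacks[primary] = targets[primary] - (now + total)   # remaining slack for this chain
    tight = min(slacks, key=slacks.get)
    return tight, slacks[tight]

chains = {"reportA": ["extract", "clean"], "reportB": ["extract"]}
predicted = {"extract": 30, "clean": 20, "reportA": 10, "reportB": 5}
targets = {"reportA": 70, "reportB": 90}
print(most_time_sensitive(chains, predicted, targets))   # reportA has the least slack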

Journal ArticleDOI
TL;DR: This case study focuses on how implementing the Privacy by Design model protects privacy while supporting access to individual-level data for research in the public interest and demonstrates how PopData achieves both operational efficiencies and due diligence.

Journal Article
Jan Lindström, Vilho Raatikka, Jarmo K Ruuth, Petri Uolevi Soini, Katriina Vakkila
TL;DR: The structural differences between in-memory and disk-based databases, and how solidDB works to deliver extreme speed are explored.
Abstract: A relational in-memory database, IBM solidDB is used worldwide for its ability to deliver extreme speed and availability. As the name implies, an in-memory database resides entirely in main memory rather than on disk, making data access several orders of magnitude faster than with conventional, disk-based databases. Part of that leap is due to the fact that RAM simply provides faster data access than hard disk drives. But solidDB also has data structures and access methods specifically designed for storing, searching, and processing data in main memory. As a result, it outperforms ordinary disk-based databases even when the latter have data fully cached in memory. Some databases deliver low latency but cannot handle large numbers of transactions or concurrent sessions. IBM solidDB provides throughput measured in the range of hundreds of thousands to millions of transactions per second while consistently achieving response times (or latency) measured in microseconds. This article explores the structural differences between in-memory and disk-based databases, and how solidDB works to deliver extreme speed.

Journal ArticleDOI
TL;DR: Sharing spatially specific data, which includes the characteristics and behaviors of individuals, households, or communities in geographical space, raises distinct technical and ethical challenges.
Abstract: Scholarly communication is at an unprecedented turning point created in part by the increasing saliency of data stewardship and data sharing. Formal data management plans represent a new emphasis in research, enabling access to data at higher volumes and more quickly, and the potential for replication and augmentation of existing research. Data sharing has recently transformed the practice, scope, content, and applicability of research in several disciplines, in particular in relation to spatially specific data. This lends exciting potential, but the most effective ways in which to implement such changes, particularly for disciplines involving human subjects and other sensitive information, demand consideration. Data management plans, stewardship, and sharing impart distinctive technical, sociological, and ethical challenges that remain to be adequately identified and remedied. Here, we consider these and propose potential solutions for their amelioration.

Book ChapterDOI
23 Oct 2013
TL;DR: CloudFence is proposed, a framework for cloud hosting environments that provides transparent, fine-grained data tracking capabilities to both service providers and their users, and allows users to independently audit the treatment of their data by third-party services through the intervention of the infrastructure provider that hosts these services.
Abstract: The risk of unauthorized private data access is among the primary concerns for users of cloud-based services. For the common setting in which the infrastructure provider and the service provider are different, users have to trust their data to both parties, although they interact solely with the latter. In this paper we propose CloudFence, a framework for cloud hosting environments that provides transparent, fine-grained data tracking capabilities to both service providers and their users. CloudFence allows users to independently audit the treatment of their data by third-party services, through the intervention of the infrastructure provider that hosts these services. CloudFence also enables service providers to confine the use of sensitive data in well-defined domains, offering additional protection against inadvertent information leakage and unauthorized access. The results of our evaluation demonstrate the ease of incorporating CloudFence on existing real-world applications, its effectiveness in preventing a wide range of security breaches, and its modest performance overhead on real settings.