Showing papers on "Data access published in 2012"

PDF

Open Access

Journal Article•DOI•

The 1000 Genomes Project: data management and community access

[...]

Laura Clarke¹, Xiangqun Zheng-Bradley¹, Richard J.H. Smith¹, Eugene Kulesha¹, Chunlin Xiao², Iliana Toneva¹, Brendan Vaughan¹, Don Preuss², Rasko Leinonen¹, Martin Shumway², Stephen T. Sherry², Paul Flicek¹ - Show less +8 more•Institutions (2)

European Bioinformatics Institute¹, National Institutes of Health²

01 May 2012-Nature Methods

TL;DR: Members of the project data coordination center have developed and deployed several tools to enable widespread data access and to create a deep catalog of human genetic variation.

...read moreread less

Abstract: The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.

...read moreread less

298 citations

Proceedings Article•DOI•

Fog Computing: Mitigating Insider Data Theft Attacks in the Cloud

[...]

Salvatore J. Stolfo¹, Malek Ben Salem², Angelos D. Keromytis•Institutions (2)

Columbia University¹, Accenture²

24 May 2012

TL;DR: Experiments conducted in a local file setting provide evidence that this approach to securing data in the cloud using offensive decoy technology may provide unprecedented levels of user data security in a Cloud environment.

...read moreread less

Abstract: Cloud computing promises to significantly change the way we use computers and access and store our personal and business information. With these new computing and communications paradigms arise new data security challenges. Existing data protection mechanisms such as encryption have failed in preventing data theft attacks, especially those perpetrated by an insider to the cloud provider. We propose a different approach for securing data in the cloud using offensive decoy technology. We monitor data access in the cloud and detect abnormal data access patterns. When unauthorized access is suspected and then verified using challenge questions, we launch a disinformation attack by returning large amounts of decoy information to the attacker. This protects against the misuse of the user's real data. Experiments conducted in a local file setting provide evidence that this approach may provide unprecedented levels of user data security in a Cloud environment.

...read moreread less

249 citations

Patent•

Encryption-based data access management

[...]

Joseph Nord¹, Benjamin Elliott Tucker¹, Timothy R. Gaylor¹•Institutions (1)

Citrix Systems¹

12 Dec 2012

TL;DR: In this article, the authors propose a scheme in which a device transmits a user authentication request for decrypting encrypted data to a data storage server storing the encrypted data, and receives a validation token associated with the user's authentication request, indicating that the user is authenticated to a domain.

...read moreread less

Abstract: Encryption-based data access management may include a variety of processes. In one example, a device may transmit a user authentication request for decrypting encrypted data to a data storage server storing the encrypted data. The computing device may then receive a validation token associated with the user's authentication request, the validation token indicating that the user is authenticated to a domain. Subsequently, the computing device may transmit the validation token to a first key server different from the data storage server. Then, in response to transmitting the validation token the computing device may receive, from the first key server, a key required for decrypting the encrypted data. The device may then decrypt at least a portion of the encrypted data using the key.

...read moreread less

235 citations

Journal Article•DOI•

SABIO-RK—database for biochemical reaction kinetics

[...]

Ulrike Wittig¹, Renate Kania¹, Martin Golebiewski¹, Maja Rey¹, Lei Shi¹, Lenneke M. Jong¹, Enkhjargal Algaa¹, Andreas Weidemann¹, Heidrun Sauer-Danzwith¹, Saqib Mir¹, Olga Krebs¹, Meik Bittkowski¹, Elina Wetsch¹, Isabel Rojas¹, Wolfgang Müller¹ - Show less +11 more•Institutions (1)

Heidelberg Institute for Theoretical Studies¹

01 Jan 2012-Nucleic Acids Research

TL;DR: SABIO-RK (http://sabio.h-its.org/) is a web-accessible database storing comprehensive information about biochemical reactions and their kinetic properties, supported by automated consistency checks.

...read moreread less

Abstract: SABIO-RK (http://sabio.h-its.org/) is a web-accessible database storing comprehensive information about biochemical reactions and their kinetic properties. SABIO-RK offers standardized data manually extracted from the literature and data directly submitted from lab experiments. The database content includes kinetic parameters in relation to biochemical reactions and their biological sources with no restriction on any particular set of organisms. Additionally, kinetic rate laws and corresponding equations as well as experimental conditions are represented. All the data are manually curated and annotated by biological experts, supported by automated consistency checks. SABIO-RK can be accessed via web-based user interfaces or automatically via web services that allow direct data access by other tools. Both interfaces support the export of the data together with its annotations in SBML (Systems Biology Markup Language), e.g. for import in modelling tools.

...read moreread less

231 citations

Proceedings Article•DOI•

Investigation of Data Locality in MapReduce

[...]

Zhenhua Guo¹, Geoffrey C. Fox¹, Mo Zhou¹•Institutions (1)

Indiana University¹

13 May 2012

TL;DR: This paper builds a mathematical model of scheduling in MapReduce and proposes an algorithm that schedules multiple tasks simultaneously rather than one by one to give optimal data locality and runs extensive experiments to quantify performance improvement of the proposed algorithms and measure how different factors impact data locality.

...read moreread less

Abstract: Traditional HPC architectures separate compute nodes and storage nodes, which are interconnected with high speed links to satisfy data access requirements in multi-user environments. However, the capacity of those high speed links is still much less than the aggregate bandwidth of all compute nodes. In Data Parallel Systems such as GFS/MapReduce, clusters are built with commodity hardware and each node takes the roles of both computation and storage, which makes it possible to bring compute to data. Data locality is a significant advantage of data parallel systems over traditional HPC systems. Good data locality reduces cross-switch network traffic - one of the bottlenecks in data-intensive computing. In this paper, we investigate data locality in depth. Firstly, we build a mathematical model of scheduling in MapReduce and theoretically analyze the impact on data locality of configuration factors, such as the numbers of nodes and tasks. Secondly, we find the default Hadoop scheduling is non-optimal and propose an algorithm that schedules multiple tasks simultaneously rather than one by one to give optimal data locality. Thirdly, we run extensive experiments to quantify performance improvement of our proposed algorithms, measure how different factors impact data locality, and investigate how data locality influences job execution time in both single-cluster and cross-cluster environments.

...read moreread less

151 citations

Journal Article•DOI•

iDASH: integrating data for analysis, anonymization, and sharing

[...]

Lucila Ohno-Machado¹, Vineet Bafna¹, Aziz A. Boxwala¹, Brian E. Chapman¹, Wendy W. Chapman¹, Kamalika Chaudhuri¹, Michele E. Day¹, Claudiu Farcas¹, Nathaniel D. Heintzman¹, Xiaoqian Jiang¹, Hyeoneui Kim¹, Jihoon Kim¹, Michael E. Matheny², Frederic S. Resnic³, Staal A. Vinterbo¹ - Show less +11 more•Institutions (3)

University of California, San Diego¹, Vanderbilt University², Brigham and Women's Hospital³

01 Mar 2012-Journal of the American Medical Informatics Association

TL;DR: Through these various mechanisms, iDASH implements its goal of providing biomedical and behavioral researchers with access to data, software, and a high-performance computing environment, thus enabling them to generate and test new hypotheses.

...read moreread less

148 citations

Proceedings Article•DOI•

Hails: protecting data privacy in untrusted web applications

[...]

Daniel B. Giffin¹, Amit Levy¹, Deian Stefan¹, David Terei¹, David Mazières¹, John C. Mitchell¹, Alejandro Russo² - Show less +3 more•Institutions (2)

Stanford University¹, Chalmers University of Technology²

08 Oct 2012

TL;DR: A new web framework, Hails, is presented that adds mandatory access control and a declarative policy language to the familiar MVC architecture and is demonstrated through GitStar.com, a code-hosting website that enforces robust privacy policies on user data even while allowing untrusted apps to deliver extended features to users.

...read moreread less

Abstract: Modern extensible web platforms like Facebook and Yammer depend on third-party software to offer a rich experience to their users. Unfortunately, users running a third-party "app" have little control over what it does with their private data. Today's platforms offer only ad-hoc constraints on app behavior, leaving users an unfortunate trade-off between convenience and privacy. A principled approach to code confinement could allow the integration of untrusted codewhile enforcing flexible, end-to-end policies on data access. This paper presents a new web framework, Hails, that adds mandatory access control and a declarative policy language to the familiar MVC architecture. We demonstrate the flexibility of Hails through GitStar.com, a code-hosting website that enforces robust privacy policies on user data even while allowing untrusted apps to deliver extended features to users.

...read moreread less

146 citations

Journal Article•DOI•

ImgLib2—generic image processing in Java

[...]

Tobias Pietzsch¹, Stephan Preibisch¹, Pavel Tomancak¹, Stephan Saalfeld¹•Institutions (1)

Max Planck Society¹

15 Nov 2012-Bioinformatics

TL;DR: ImgLib2 is an open-source Java library for n-dimensional data representation and manipulation with focus on image processing that aims at minimizing code duplication by cleanly separating pixel-algebra, data access and data representation in memory.

...read moreread less

Abstract: Summary: ImgLib2 is an open-source Java library for n-dimensional data representation and manipulation with focus on image processing. It aims at minimizing code duplication by cleanly separating pixel-algebra, data access and data representation in memory. Algorithms can be implemented for classes of pixel types and generic access patterns by which they become independent of the specific dimensionality, pixel type and data representation. ImgLib2 illustrates that an elegant high-level programming interface can be achieved without sacrificing performance. It provides efficient implementations of common data types, storage layouts and algorithms. It is the data model underlying ImageJ2, the KNIME Image Processing toolbox and an increasing number of Fiji-Plugins. Availability: ImgLib2 is licensed under BSD. Documentation and source code are available at http://imglib2.net and in a public repository at https://github.com/imagej/imglib. Supplementary Information: Supplementary data are available at Bioinformatics Online. Contact: ed.gbc-ipm@dleflaas

...read moreread less

133 citations

Proceedings Article•DOI•

Cachet: a decentralized architecture for privacy preserving social networking with caching

[...]

Shirin Nilizadeh¹, Sonia Jahid², Prateek Mittal³, Nikita Borisov², Apu Kapadia¹ - Show less +1 more•Institutions (3)

Indiana University¹, University of Illinois at Urbana–Champaign², University of California, Berkeley³

10 Dec 2012

TL;DR: This paper proposes Cachet, an architecture that provides strong security and privacy guarantees while preserving the main functionality of online social networks, and demonstrates that decentralized architectures for privacy preserving social networking are feasible, and use of social contacts for object caching results in significant performance improvements.

...read moreread less

Abstract: Online social networks (OSNs) such as Facebook and Google+ have transformed the way our society communicates However, this success has come at the cost of user privacy; in today's OSNs, users are not in control of their own data, and depend on OSN operators to enforce access control policies A multitude of privacy breaches has spurred research into privacy-preserving alternatives for social networking, exploring a number of techniques for storing, disseminating, and controlling access to data in a decentralized fashion In this paper, we argue that a combination of techniques is necessary to efficiently support the complex functionality requirements of OSNsWe propose Cachet, an architecture that provides strong security and privacy guarantees while preserving the main functionality of online social networks In particular, Cachet protects the confidentiality, integrity and availability of user content, as well as the privacy of user relationships Cachet uses a distributed pool of nodes to store user data and ensure availability Storage nodes in Cachet are untrusted; we leverage cryptographic techniques such as attribute based encryption to protect the confidentiality of data For efficient dissemination and retrieval of data, Cachet uses a hybrid structured-unstructured overlay paradigm in which a conventional distributed hash table is augmented with social links between users Social contacts in our system act as caches to store recent updates in the social network, and help reduce the cryptographic as well as the communication overhead in the networkWe built a prototype implementation of Cachet in the FreePastry simulator To demonstrate the functionality of existing OSNs we implemented the "newsfeed" application Our evaluation demonstrates that (a) decentralized architectures for privacy preserving social networking are feasible, and (b) use of social contacts for object caching results in significant performance improvements

...read moreread less

128 citations

Journal Article•DOI•

Yabi: An online research environment for grid, high performance and cloud computing

[...]

Adam Hunter, A. Macgregor, T. Szabo, C. Wellington, Matthew I. Bellgard - Show less +1 more

15 Feb 2012

TL;DR: The Yabi system encapsulates considered design of both execution and data models, while abstracting technical details away from users who are not skilled in HPC and providing an intuitive drag-and-drop scalable web-based workflow environment where the same tools can also be accessed via a command line.

...read moreread less

Abstract: Background There is a significant demand for creating pipelines or workflows in the life science discipline that chain a number of discrete compute and data intensive analysis tasks into sophisticated analysis procedures. This need has led to the development of general as well as domain-specific workflow environments that are either complex desktop applications or Internet-based applications. Complexities can arise when configuring these applications in heterogeneous compute and storage environments if the execution and data access models are not designed appropriately. These complexities manifest themselves through limited access to available HPC resources, significant overhead required to configure tools and inability for users to simply manage files across heterogenous HPC storage infrastructure.

...read moreread less

125 citations

Patent•

Systems and methods for gesture interaction with cloud-based applications

[...]

Shuki Binyamin, Jay Zaveri, Assaf Kamil, Tzahi Glik, Doron Yaacoby, Meir Morgenstern - Show less +2 more

10 Sep 2012

TL;DR: In this article, the authors describe a set of server-based systems and methods that enable a remotely executed application to receive gesture user input, even when the remote executed application is not natively configured to receive such user input.

...read moreread less

Abstract: Various systems and methods described herein relate to server-based computing, where the systems and methods provide a client with access to an application executing remotely from the client device and having access to data (e.g., one or more files) residing on a cloud-based storage (e.g., provided by a third-party cloud-based storage service, such as Dropbox or Box). For some systems and methods, the application may be remotely executed and provided to the client such that the application has in-application/embedded access (hereafter, referred to as “native access”) to the cloud-based storage and files residing on the cloud-based storage. Additionally, some systems and methods may enable a remotely executed application to receive a gesture user input, even when the remotely executed application is not natively configured to receive such user input.

...read moreread less

Journal Article•DOI•

Data Sharing in the Post-Genomic World: The Experience of the International Cancer Genome Consortium (ICGC) Data Access Compliance Office (DACO)

[...]

Yann Joly, Edward S. Dove¹, Bartha Maria Knoppers¹, Martin Bobrow², Don Chalmers³ - Show less +1 more•Institutions (3)

McGill University¹, University of Cambridge², University of Tasmania³

12 Jul 2012-PLOS Computational Biology

TL;DR: The experience of the Data Access Compliance Office (DACO) of the International Cancer Genome Consortium (ICGC) is presented to provide information on this increasingly important type of database governance body.

...read moreread less

Abstract: The scientific community, research funders, and governments have repeatedly recognized the importance of open access to genomic data for scientific research and medical progress [1]–[4]. Open access is becoming a well-established practice for large-scale, publicly funded, data-intensive community science projects, particularly in the field of genomics. Given this consensus, restrictions to open access should be regarded as exceptional and treated with caution. Yet, several developments [5] have led scientists and policymakers to investigate and implement open access restrictions [5]–[9]. Notably, there are privacy concerns within the genomics community and critiques from some researchers that open access, if left completely unregulated, could raise significant scientific, ethical, and legal issues (e.g., quality of the data, appropriate credit to data generators, relevance of the system for small and medium projects, etc.) [1]–[10]. A recent paper by Greenbaum and colleagues in this journal [11] identified protecting the privacy of study participants as the main challenge to open genomic data sharing. One possible way to reconcile open data sharing with privacy concerns is to use a tiered access system to separate access into “open” and “controlled.” Open access remains the norm for data that cannot be linked with other data to generate a dataset that would uniquely identify an individual. A controlled access mechanism, on the other hand, regulates access to certain, more sensitive data (e.g., detailed phenotype and outcome data, genome sequences files, raw genotype calls) by requiring third parties to apply to a body (e.g., custodian, original data collectors, independent body, or data access committee) and complete an access application that contains privacy safeguards. This mechanism, while primarily designed to protect study participants, can also be used to protect investigators, database hosting institutions, and funders from perceptions or acts of favoritism or impropriety. The experience of controlled access bodies to date has been only minimally documented in the literature [9], [12]. To address this lacuna, we present the experience of the Data Access Compliance Office (DACO) of the International Cancer Genome Consortium (ICGC). The goal is to provide information on this increasingly important type of database governance body.

...read moreread less

Proceedings Article•DOI•

Towards temporal access control in cloud computing

[...]

Yan Zhu¹, Hongxin Hu², Gail-Joon Ahn², Dijiang Huang², Shanbiao Wang¹ - Show less +1 more•Institutions (2)

Peking University¹, Arizona State University²

25 Mar 2012

TL;DR: This paper presents an efficient temporal access control encryption scheme for cloud services with the help of cryptographic integer comparisons and a proxy-based re-encryption mechanism on the current time and provides a dual comparative expression of integer ranges to extend the power of attribute expression for implementing various temporal constraints.

...read moreread less

Abstract: Access control is one of the most important security mechanisms in cloud computing. Attribute-based access control provides a flexible approach that allows data owners to integrate data access policies within the encrypted data. However, little work has been done to explore temporal attributes in specifying and enforcing the data owner's policy and the data user's privileges in cloud-based environments. In this paper, we present an efficient temporal access control encryption scheme for cloud services with the help of cryptographic integer comparisons and a proxy-based re-encryption mechanism on the current time. We also provide a dual comparative expression of integer ranges to extend the power of attribute expression for implementing various temporal constraints. We prove the security strength of the proposed scheme and our experimental results not only validate the effectiveness of our scheme, but also show that the proposed integer comparison scheme performs significantly better than previous bitwise comparison scheme.

...read moreread less

Journal Article•DOI•

LogBase: a scalable log-structured database system in the cloud

[...]

Hoang Tam Vo¹, Sheng Wang¹, Divyakant Agrawal², Gang Chen³, Beng Chin Ooi¹ - Show less +1 more•Institutions (3)

National University of Singapore¹, University of California, Santa Barbara², Zhejiang University³

01 Jun 2012

TL;DR: This paper introduces LogBase -- a scalable log-structured database system that adopts log-only storage for removing the write bottleneck and supporting fast system recovery and is designed to be dynamically deployed on commodity clusters to take advantage of elastic scaling property of cloud environments.

...read moreread less

Abstract: Numerous applications such as financial transactions (e.g., stock trading) are write-heavy in nature. The shift from reads to writes in web applications has also been accelerating in recent years. Write-ahead-logging is a common approach for providing recovery capability while improving performance in most storage systems. However, the separation of log and application data incurs write overheads observed in write-heavy environments and hence adversely affects the write throughput and recovery time in the system.In this paper, we introduce LogBase -- a scalable log-structured database system that adopts log-only storage for removing the write bottleneck and supporting fast system recovery. It is designed to be dynamically deployed on commodity clusters to take advantage of elastic scaling property of cloud environments. LogBase provides in-memory multiversion indexes for supporting efficient access to data maintained in the log. LogBase also supports transactions that bundle read and write operations spanning across multiple records. We implemented the proposed system and compared it with HBase and a disk-based log-structured record-oriented system modeled after RAMCloud. The experimental results show that LogBase is able to provide sustained write throughput, efficient data access out of the cache, and effective system recovery.

...read moreread less

Journal Article•DOI•

Distributed Privacy-Preserving Access Control in Sensor Networks

[...]

Rui Zhang¹, Yanchao Zhang¹, Kui Ren²•Institutions (2)

Arizona State University¹, Illinois Institute of Technology²

01 Aug 2012-IEEE Transactions on Parallel and Distributed Systems

TL;DR: In this paper, the authors proposed a distributed privacy-preserving access control (DP2AC) scheme for sensor networks, which is the first work of its kind, where users in DP2AC purchase tokens from the network owner whereby to query data from sensor nodes which will reply only after validating the tokens.

...read moreread less

Abstract: The owner and users of a sensor network may be different, which necessitates privacy-preserving access control. On the one hand, the network owner need enforce strict access control so that the sensed data are only accessible to users willing to pay. On the other hand, users wish to protect their respective data access patterns whose disclosure may be used against their interests. This paper presents DP2AC, a Distributed Privacy-Preserving Access Control scheme for sensor networks, which is the first work of its kind. Users in DP2AC purchase tokens from the network owner whereby to query data from sensor nodes which will reply only after validating the tokens. The use of blind signatures in token generation ensures that tokens are publicly verifiable yet unlinkable to user identities, so privacy-preserving access control is achieved. A central component in DP2AC is to prevent malicious users from reusing tokens, for which we propose a suite of distributed token reuse detection (DTRD) schemes without involving the base station. These schemes share the essential idea that a sensor node checks with some other nodes (called witnesses) whether a token has been used, but they differ in how the witnesses are chosen. We thoroughly compare their performance with regard to TRD capability, communication overhead, storage overhead, and attack resilience. The efficacy and efficiency of DP2AC are confirmed by detailed performance evaluations.

...read moreread less

Journal Article•DOI•

A survey of dynamic replication strategies for improving data availability in data grids

[...]

Tehmina Amjad¹, Muhammad Sher¹, Ali Daud¹•Institutions (1)

International Islamic University, Islamabad¹

01 Feb 2012-Future Generation Computer Systems

TL;DR: Different issues involved in data replication are identified and different replication techniques are studied to find out which attributes are addressed in a given technique and which are ignored to facilitate the future comparison of dynamic replication techniques.

...read moreread less

Patent•

Specializing I/O access patterns for flash storage

[...]

Christopher Small¹, Stephen M. Byan¹, James Lentini¹•Institutions (1)

NetApp¹

22 May 2012

TL;DR: In this paper, a data processing system that uses a non-volatile solid-state device as a circular log, with the goal of aligning data access patterns to the underlying, hidden device implementation, is described.

...read moreread less

Abstract: Systems and methods for efficiently using solid-state devices are provided. Some embodiments provide for a data processing system that uses a non-volatile solid-state device as a circular log, with the goal of aligning data access patterns to the underlying, hidden device implementation, in order to maximize performance. In addition, metadata can be interspersed with data in order to align data access patterns to the underlying device implementation. Multiple input/output (I/O) buffers can also be used to pipeline insertions of metadata and data into a linear log. The observed queuing behavior of the multiple I/O buffers can be used to determine when the utilization of the storage device is approaching saturation (e.g., in order to predict excessively-long response times). Then, the I/O load on the storage device may be shed when utilization approaches saturation. As a result, the overall response time of the system is improved.

...read moreread less

The Trinity Graph Engine

[...]

Bin Shao, Haixun Wang, Yatao Li

01 Mar 2012

TL;DR: Trinity as discussed by the authors is a general purpose graph engine over a distributed memory cloud, which leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance.

...read moreread less

Abstract: Computations performed by graph algorithms are data driven, and require a high degree of random data access. Despite the great progresses made in disk technology, it still cannot provide the level of efficient random access required by graph computation. On the other hand, memory-based approaches usually do not scale due to the capacity limit of single machines. In this paper, we introduce Trinity, a general purpose graph engine over a distributed memory cloud. Through optimized memory management and network communication, Trinity supports fast graph exploration as well as efficient parallel computing. In particular, Trinity leverages graph access patterns in both online and offline computation to optimize memory and communication for best performance. These enable Trinity to support efficient online query processing and offline analytics on large graphs with just a few commodity machines. Furthermore, Trinity provides a high level specification language called TSL for users to declare data schema and communication protocols, which brings great ease-of-use for general purpose graph management and computing. Our experiments show Trinity’s performance in both low latency graph queries as well as high throughput graph analytics on web-scale, billion-node graphs.

...read moreread less

Patent•

System and method for location-based protection of mobile data

[...]

Parameshwaran Krishnan¹, Navjot Singh¹•Institutions (1)

Avaya¹

12 Sep 2012

TL;DR: In this paper, a system and method to provide location-based levels of data protection is described, which includes: receiving, by a receiver, login credentials of a user of a mobile device; authenticating, by use of a policy server, a credentials-based level of data access as configured by a policy; retrieving, by using a geo-location module, a location of the mobile device.

...read moreread less

Abstract: System and method to provide location-based levels of data protection, the method including: receiving, by a receiver, login credentials of a user of a mobile device; authenticating, by use of a policy server, a credentials-based level of data access as configured by a policy; retrieving, by a geo-location module, a location of the mobile device; determining, by use of the policy server, a location-based level of data access as configured by the policy; and granting sensitive data access based upon a more restrictive limitation of the credentials-based level of data access and the location-based level of data access.

...read moreread less

Journal Article•DOI•

A novel dynamic network data replication scheme based on historical access record and proactive deletion

[...]

Zhe Wang¹, Tao Li¹, Naixue Xiong², Yi Pan²•Institutions (2)

Sichuan University¹, Georgia State University²

01 Oct 2012-The Journal of Supercomputing

TL;DR: A proactive deletion method is applied to control the replica number to reach an optimal balance between the read access time and the write update overhead, and the results indicate that the new algorithm performs much better than those algorithms.

...read moreread less

Abstract: Data replication is becoming a popular technology in many fields such as cloud storage, Data grids and P2P systems. By replicating files to other servers/nodes, we can reduce network traffic and file access time and increase data availability to react natural and man-made disasters. However, it does not mean that more replicas can always have a better system performance. Replicas indeed decrease read access time and provide better fault-tolerance, but if we consider write access, maintaining a large number of replications will result in a huge update overhead. Hence, a trade-off between read access time and write updating cost is needed. File popularity is an important factor in making decisions about data replication. To avoid data access fluctuations, historical file popularity can be used for selecting really popular files. In this research, a dynamic data replication strategy is proposed based on two ideas. The first one employs historical access records which are useful for picking up a file to replicate. The second one is a proactive deletion method, which is applied to control the replica number to reach an optimal balance between the read access time and the write update overhead. A unified cost model is used as a means to measure and compare the performance of our data replication algorithm and other existing algorithms. The results indicate that our new algorithm performs much better than those algorithms.

...read moreread less

Patent•

Recording, monitoring, and analyzing driver behavior

[...]

Pavel Stankoulov

11 Dec 2012

TL;DR: In this article, a system adapted to monitor, record and analyze driver performance is described, which includes a vehicle sensor module adapted to receive data from a set of sensors that each measure a driving characteristic associated with a vehicle.

...read moreread less

Abstract: A system adapted to monitor, record and analyze driver performance is described. The system includes: a vehicle sensor module adapted to receive data from a set of sensors that each measure a driving characteristic associated with a vehicle; a map data access module adapted to retrieve, from a map database, map data elements indicating various features associated with at least one path of the vehicle; and a driver behavior engine adapted to receive information from the vehicle sensor module and the map data access module, and to monitor and evaluate driver performance based on the received information.

...read moreread less

Proceedings Article•DOI•

Scalia: an adaptive scheme for efficient multi-cloud storage

[...]

Thanasis G. Papaioannou¹, Nicolas Bonvin¹, Karl Aberer¹•Institutions (1)

École Polytechnique Fédérale de Lausanne¹

10 Nov 2012

TL;DR: The cost-effectiveness of Scalia is proved against static placements and its proximity to the ideal data placement in various scenarios of data access patterns, of available cloud storage solutions and of failures.

...read moreread less

Abstract: A growing amount of data is produced daily resulting in a growing demand for storage solutions. While cloud storage providers offer a virtually infinite storage capacity, data owners seek geographical and provider diversity in data placement, in order to avoid vendor lock-in and to increase availability and durability. Moreover, depending on the customer data access pattern, a certain cloud provider may be cheaper than another. In this paper, we introduce Scalia, a cloud storage brokerage solution that continuously adapts the placement of data based on its access pattern and subject to optimization objectives, such as storage costs. Scalia efficiently considers repositioning of only selected objects that may significantly lower the storage cost. By extensive simulation experiments, we prove the cost-effectiveness of Scalia against static placements and its proximity to the ideal data placement in various scenarios of data access patterns, of available cloud storage solutions and of failures.

...read moreread less

Journal Article•DOI•

PDDRA: A new pre-fetching based dynamic data replication algorithm in data grids

[...]

Nazanin Saadat¹, Amir Masoud Rahmani¹•Institutions (1)

Islamic Azad University¹

01 Apr 2012-Future Generation Computer Systems

TL;DR: A new dynamic data replication algorithm named PDDRA is proposed that optimizes the traditional algorithms and has better performance in comparison with other algorithms in terms of job execution time, effective network usage, total number of replications, hit ratio and percentage of storage filled.

...read moreread less

Patent•

Information retrieval and navigation using a semantic layer and dynamic objects

[...]

Adam Ferrari¹, Joshua William Kapell¹, Jason Furtado¹, Matthew L. Brandwein¹, Spiro Michaylov¹, Omri Traub¹, Vladimir Zelevinsky¹, John Huffaker¹ - Show less +4 more•Institutions (1)

Business International Corporation¹

13 Jun 2012

TL;DR: In this article, a system and methods for information retrieval can return results from the one or more collections of information based not only on the data stored, but also on the virtual data generated from interpretation of the stored data.

...read moreread less

Abstract: Systems and methods for information retrieval are provided that permit users and/or processing entities to access and define synthetic data, synthetic objects, and/or synthetic groupings of data in one or more collections of information. In one embodiment, data access on an information retrieval system can occur through an interpretation layer which interprets any synthetic data against data physically stored in the collection. Synthetic data can define virtual data objects, virtual data elements, virtual data attributes, virtual data groupings, and/or data entities that can be interpreted against data that may be stored physically in the collection of information. The system and methods for information retrieval can return results from the one or more collections of information based not only on the data stored, but also on the virtual data generated from interpretation of the stored data.

...read moreread less

Journal Article•DOI•

CANDIShare: A Resource for Pediatric Neuroimaging Data

[...]

David N. Kennedy¹, Christian Haselgrove¹, Steven M. Hodge¹, Pallavi Rane¹, Nikos Makris², Jean A. Frazier¹ - Show less +2 more•Institutions (2)

University of Massachusetts Medical School¹, Harvard University²

01 Jul 2012-Neuroinformatics

TL;DR: The Child and Adolescent NeuroDevelopment Initiative at University of Massachusetts Medical School is making available a series of structural brain images, as well as their anatomic segmentations and demographic data, as a coordinated set of related morphometric resources.

...read moreread less

Abstract: There are numerous psychiatric disorders that can plague the development of children. Each of these disorders manifests as a distinct pattern of clinical, behavioral, etiological, neuroanatomic and neurofunctional characteristics that challenge the management of the individual patient, as well as the development of successful intervention and prevention strategies. In the area of neuroimaging, a substantial number of studies have been performed to date; and while much has been learned from this investment, this represents only the tip-of-the-iceberg of the information that can be gleaned from the data. Unfortunately, most of this additional, untapped information resource is lost due to ineffective sharing of the data that is possible by application of current technologies. The Child and Adolescent NeuroDevelopment Initiative (CANDI) at University of Massachusetts Medical School is making available a series of structural brain images, as well as their anatomic segmentations and demographic data, as a coordinated set of related morphometric resources. The initial data set is a release of 103 subjects (T1-weighted MRI scans and anatomic segmentation) that comprised the neuroanatomic data published in 2008 in an article by Frazier, et al.1 The subjects include 57 males and 46 females, aged 4–17, and come from four diagnostic groups: Healthy Controls (N=29), Schizophrenia Spectrum (N=20), Bipolar Disorder with Psychosis (N=19), and Bipolar Disorder without Psychosis (N=35). Both the McLean Hospital and Cambridge Health Alliance Institutional Review Boards approved the original data acquisition research protocols. All subjects signed assent forms, and their parents/legal guardians signed informed consent forms. The University of Massachusetts Medical School Institutional Review Board approved the data sharing protocol. Images were acquired from 1996–2006 at the McLean Hospital Brain Imaging Center on a 1.5 Tesla General Electric Signa Scanner. Structural imaging was performed using a three-dimensional inversion recovery-prepared spoiled gradient recalled echo in the coronal plane with 124 1.5 mm thick slices, repetition time 10 ms, TE 3, flip angle 25°, field of view 24 cm, acquisition matrix 256×192 and 2 excitations.2 All scans were reviewed by a clinical neuroradiologist to rule out gross pathology. The MR images released have undergone analysis at the Center for Morphometric Analysis (CMA) at the Massachusetts General Hospital which includes: preprocessing (‘positional normalization’ to put the image into the standard orientation of the Talairach coordinate space,3 and bias field correction4) and ‘general segmentation’ following the CMA segmentation protocol.5 Segmented regions include: Cerebral Cortex and White Matter, Cerebellum Cortex and White Matter, Lateral Ventricle, Thalamus, Ventral Diencephalon, Caudate, Putamen, Pallidum, Accumbens, Hippocampus and Amygdala, bilaterally; as well as Brain Stem, 3rd and 4th ventricles. Basic demographic details for each subject include diagnosis, age, gender, and handedness (Fig. 1). Fig. 1 The figure shows the data and resource relationships accessible from the CANDIShare data release. From the CANDIShare NITRC download page, both version V1.0 and V1.1 can be accessed. Accessing V1.1 for the healthy control group (HC) gains the user access ... The release is provided under the Creative Commons: Attribute license,6 and is structured as four bundles (tar) of imaging data, one for each diagnostic group. Within each bundle, there are separate directories for each subject that includes the data. Release version V1.0 includes only the imaging data and basic demographics and is accessible with no limitations. Version V1.1 adds the segmentation data as an indexed label file in register with the MR image for each subject, and requires provision of a contact email address (via the NITRC registration process) and requires a ‘clickthrough’ acceptance of the licensing terms. We do not perceive these extra access ‘hoops’ as burdensome, but they are necessary. While we hope that the data provided are perfect and free of any errors, experience has taught that some unforeseen errors may be present. If found, we will correct any error, and re-release a new version of the data and are obligated to inform all prior users of the data of any errors and corrections that were necessitated in order to limit the impact of any errors. In addition to the image releases, the CANDIShare portal begins to establish a more richly interconnected set of related informatics resources. The first example of this is the inclusion of linkages between each data release and their representation in the Internet Brain Volume Database (IBVD).7 The IBVD is a web-based database of published brain neuroanatomic volumetric observations. As the images and segmentation in this data release supports the volumetric observations in a specific publication, this linkage completes the connection between the high-level, group volumetric observation and the detailed individualized data that supports it. As the IBVD is also interoperable with PubMed via the ‘Link-Out’ function, a PubMed user can get from the publication listing at PubMed, to the summary volumetric observations reported in IBVD, to the raw data that supports the observation in the CANDIShare data release. Each of the diagnostic group bundles are linked to their respective group pages in the IBVD via the link exposed at the download page of the NITRC project. This image and segmentation data release for a specific publication represents the first in a series of data representing the pediatric brain in health and disease from data collected in our lab over the past 15 years. Additional publications will be added, and more resource interconnectivity will be included to maximize the utility of the released data with other classes of available data. This release of information is designed to be dramatically greater than merely ‘making the images available’: each image is associated with substantial analytic results, many of which have been utilized in the preparation of various publications and comparisons. Moreover, these data will be most effectively shared with the research community when shared in a way that preserves the linkages between the images, the resultant analytic data and meta-data, and its relationships to other public sources of related information. The utility of this shared data includes potentially expanding the numbers of subjects available for other studies (when the imaging protocols are comparable), facilitating additional research findings in this specific data set through reuse by other laboratories, and additional community access to accurate, manually-labeled datasets to promote development and testing of data analysis software. Access to data of this sort has already been valuable in development of numerous analysis techniques.8 In short, this represents a ‘Knowledge Management’ environment that will facilitate traversal of these data and linkages.

...read moreread less

Proceedings Article•DOI•

Multi-version Concurrency via Timestamp Range Conflict Management

[...]

David B. Lomet¹, Alan Fekete², Rui Wang¹, Peter Ward²•Institutions (2)

Microsoft¹, University of Sydney²

01 Apr 2012

TL;DR: This work introduces a new concurrency control approach that enables all SQL isolation levels including serializability to utilize multiple versions to increase concurrency while also supporting transaction time database functionality.

...read moreread less

Abstract: A database supporting multiple versions of records may use the versions to support queries of the past or to increase concurrency by enabling reads and writes to be concurrent. We introduce a new concurrency control approach that enables all SQL isolation levels including serializability to utilize multiple versions to increase concurrency while also supporting transaction time database functionality. The key insight is to manage a range of possible timestamps for each transaction that captures the impact of conflicts that have occurred. Using these ranges as constraints often permits concurrent access where lock based concurrency control would block. This can also allow blocking instead of some aborts that are common in earlier multi-version concurrency techniques. Also, timestamp ranges can be used to conservatively find deadlocks without graph based cycle detection. Thus, our multi-version support can enhance performance of current time data access via improved concurrency, while supporting transaction time functionality.

...read moreread less

Patent•

Managing Patient Consent in a Master Patient Index

[...]

Jared B. Crapo, David M. Coyle, Carol Lynn Owen, Preston Pearson, Kristen McRae - Show less +1 more

09 Jan 2012

TL;DR: In this article, a system and method for managing patient consent is presented, which includes a controller, a lookup module, a clinical authorization engine, a logging/auditing unit, a user profile engine, and a report module and a user interface engine.

...read moreread less

Abstract: A system and method for managing patient consent. A data access manager includes a controller, a lookup module, a clinical authorization engine, a logging/auditing unit, a user profile engine, a report module and a user interface engine. The controller manages the core functions and the transmission of data between the data access manager components. The lookup module enables a user to query patient data. The clinical authorization engine authorizes access to patient data. The logging/auditing unit logs and monitors user activity. The user profile engine accesses and updates user profile information. The patient profile engine accesses and updates patient profile information. The report module generates reports related to the user activity. The user interface engine generates user interfaces for displaying the user profiles and patient information data.

...read moreread less

Patent•

System and method for importing and merging content items from different sources

[...]

Chris Beckmann, Ramesh Balakrishnan, Rajeev Nayak, Yi Wei

21 Dec 2012

TL;DR: In this paper, a system for importing and merging photos from different sources is described, where the system receives credentials from a user, who has an account with a content management system, associated with content item storage entities such as photo repositories.

...read moreread less

Abstract: Systems, methods, and computer-readable storage media for importing and merging photos from different sources are disclosed. The system receives credentials from a user, who has an account with a content management system. The credentials are associated with content item storage entities such as photo repositories. The system accesses the photo repositories, using the plurality of credentials if authorization is required for data access. The system identifies source photo data in each of the photo repositories, and duplicates the source photo data in the content management system account to create consolidated photo data.

...read moreread less

Journal Article•DOI•

The ABCD of primary biodiversity data access

[...]

Jörg Holetschek, Gabriele Dröge, Anton Güntsch, Walter G. Berendsohn

16 Oct 2012-Plant Biosystems

TL;DR: It is posited that the XML-based networking approach using a highly standardised data definition such as ABCD continues to be a valuable approach towards mobilising natural history information.

...read moreread less

Abstract: Within the context of the Global Biodiversity Information Facility (GBIF), the Biological Collections Access Service (BioCASe) has been set up to foment data provision by natural history content providers. Products include the BioCASe Protocol and the PyWrapper software, a web service allowing to access rich natural history data using complex schemas like ABCD (Access to Biological Collection Data). New developments include the possibility to produce DarwinCore-Archive files using PyWrapper, in order to facilitate the indexing of large datasets by aggregators such as GBIF. However, BioCASe continues to be committed to distributed data access and continues to provide the possibility to directly query the web service for up-to-date data directly from the provider's database. ABCD provides comprehensive coverage of natural history data, and has been extended to cover DNA collections (ABCD-DNA) and geosciences (ABCD-EFG, the extension for geosciences). BioCASe also developed web portal software that ...

...read moreread less

Patent•

System and method for efficiently securing enterprise data resources

[...]

Dmitri Korablev, Gregory Danforth

13 Mar 2012

TL;DR: In this paper, the authors present a system and method that secures access to data objects of an enterprise that includes multiple data objects and multiple user applications that access data attributes of the data objects.

...read moreread less

Abstract: Some embodiments provide a system and method that secures access to data objects of an enterprise that includes multiple data objects and multiple user applications that access data attributes of the data objects. In some embodiments, secure access is provided via a secure resource that secures access to data attributes of at least two objects by defining access control permissions for the secure resource and applying the defined access control permissions to the data attributes of the secure resource.

...read moreread less

Collapse