
Showing papers on "Data access published in 2017"


Proceedings ArticleDOI
05 Jun 2017
TL;DR: This paper proposes a blockchain platform architecture for clinical trials and precision medicine, discusses various design aspects, and provides some insights into the technology requirements and challenges.
Abstract: This paper proposes a blockchain platform architecture for clinical trials and precision medicine, discusses various design aspects, and provides some insights into the technology requirements and challenges. We identify four new system architecture components that must be built on top of a traditional blockchain and discuss their technology challenges in our blockchain platform: (a) a new blockchain-based general distributed and parallel computing paradigm component to devise and study parallel computing methodology for big data analytics, (b) a blockchain application data management component for data integrity, big data integration, and integration of disparate medical-related data, (c) a verifiable anonymous identity management component providing identity privacy for both persons and Internet of Things (IoT) devices and secure data access, making patient-centric medicine possible, and (d) a trusted data sharing management component to enable a trusted medical data ecosystem for collaborative research.

172 citations


Proceedings ArticleDOI
22 Oct 2017
TL;DR: The proposed architecture facilitates IoT communications on top of a software stack of blockchains and peer-to-peer data storage mechanisms, is designed with privacy built in, and is adaptable to various IoT use cases.
Abstract: Blockchain, the underlying technology of cryptocurrency networks like Bitcoin, can prove to be essential towards realizing the vision of a decentralized, secure, and open Internet of Things (IoT) revolution. There is growing interest among research groups in leveraging blockchains to provide IoT data privacy without the need for a centralized data access model. This paper proposes a decentralized access model for IoT data, using a network architecture that we call a modular consortium architecture for IoT and blockchains. The proposed architecture facilitates IoT communications on top of a software stack of blockchains and peer-to-peer data storage mechanisms. The architecture is designed with privacy built in and is adaptable to various IoT use cases. To understand the feasibility and deployment considerations of the proposed architecture, we conduct a performance analysis of existing blockchain development platforms, Ethereum and Monax.

145 citations


Journal ArticleDOI
01 Jul 2017
TL;DR: A scheme is proposed to control data access in cloud computing based on trust evaluated by the data owner and/or reputations generated by a number of reputation centers, in a flexible manner, by applying Attribute-Based Encryption and Proxy Re-Encryption.
Abstract: Cloud computing offers a new way of delivering services and has become a popular service platform. Storing user data at a cloud data center greatly relieves the storage burden of user devices and brings access convenience. Due to distrust in cloud service providers, users generally store their crucial data in an encrypted form. But in many cases, the data need to be accessed by other entities to fulfill an expected service, e.g., an eHealth service. How to control personal data access at the cloud is a critical issue. Various application scenarios call for flexible control of cloud data access based on data owner policies and application demands. Either data owners or some trusted third parties or both should flexibly participate in this control. However, existing work has not yet produced an effective and flexible solution that satisfies this demand. On the other hand, trust plays an important role in data sharing: it helps overcome uncertainty and avoid potential risks. But the literature still lacks a practical solution for controlling cloud data access based on trust and reputation. In this paper, we propose a scheme to control data access in cloud computing based on trust evaluated by the data owner and/or reputations generated by a number of reputation centers, in a flexible manner, by applying Attribute-Based Encryption and Proxy Re-Encryption. We integrate the concept of context-aware trust and reputation evaluation into a cryptographic system in order to support various control scenarios and strategies. The security and performance of our scheme are evaluated and justified through extensive analysis, security proof, comparison, and implementation. The results show the efficiency, flexibility, and effectiveness of our scheme for data access control in cloud computing.
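A minimal sketch of the access-decision logic such a trust/reputation scheme implies. All weights, thresholds, and function names here are invented for illustration; the actual scheme enforces decisions cryptographically via Attribute-Based Encryption and Proxy Re-Encryption rather than with a plain policy check:

```python
# Toy model of trust/reputation-gated access control (illustrative only;
# the real scheme enforces this cryptographically with ABE + proxy re-encryption).

def aggregate_reputation(scores):
    """Average the reputation values issued by independent reputation centers."""
    return sum(scores) / len(scores)

def access_decision(owner_trust, reputation_scores, policy):
    """Grant access if owner trust and/or aggregated reputation meets the policy.

    policy: dict with 'mode' in {'owner', 'reputation', 'both', 'either'}
    plus thresholds 'min_trust' and 'min_reputation'.
    """
    rep = aggregate_reputation(reputation_scores)
    trust_ok = owner_trust >= policy["min_trust"]
    rep_ok = rep >= policy["min_reputation"]
    return {
        "owner": trust_ok,            # data owner evaluates trust directly
        "reputation": rep_ok,         # trusted third parties (reputation centers)
        "both": trust_ok and rep_ok,  # both must agree
        "either": trust_ok or rep_ok, # flexible: one suffices
    }[policy["mode"]]

policy = {"mode": "both", "min_trust": 0.6, "min_reputation": 0.7}
print(access_decision(0.8, [0.65, 0.75, 0.8], policy))  # True
```

The "mode" switch mirrors the flexibility the abstract emphasises: data owners, reputation centers, or both can participate in the control decision.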

124 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper illustrates the specific problems and the benefits of blockchain technology for deploying a secure and scalable solution for medical data exchange with the best possible performance.
Abstract: eHealth is a technology that is growing in importance over time, varying from remote access to medical records, such as Electronic Health Records (EHR) or Electronic Medical Records (EMR), to real-time exchange of data from the on-body sensors of different patients. With this huge amount of critical data being exchanged, problems and challenges arise. Privacy and confidentiality of this critical medical data are of high concern to patients and to the persons authorized to use the data. On the other hand, scalability and interoperability are also important problems that should be considered in the final solution. This paper illustrates the specific problems and highlights the benefits of blockchain technology for the deployment of a secure and scalable solution for medical data exchange that achieves the best performance possible.

115 citations


Journal ArticleDOI
TL;DR: The Water Quality Portal (WQP) as mentioned in this paper is the largest standardized water quality data set available at the time of this writing, with more than 290 million records from more than 2.7 million sites in groundwater, inland, and coastal waters.
Abstract: Aquatic systems are critical to food, security, and society. But water data are collected by hundreds of research groups and organizations, many of which use nonstandard or inconsistent data descriptions and dissemination, and disparities across different types of water observation systems represent a major challenge for freshwater research. To address this issue, the Water Quality Portal (WQP) was developed by the U.S. Environmental Protection Agency, the U.S. Geological Survey, and the National Water Quality Monitoring Council to be a single point of access for water quality data dating back more than a century. The WQP is the largest standardized water quality data set available at the time of this writing, with more than 290 million records from more than 2.7 million sites in groundwater, inland, and coastal waters. The number of data contributors, data consumers, and third-party application developers making use of the WQP is growing rapidly. Here we introduce the WQP, including an overview of the data, the standardized data model, and data access and services, and we describe challenges and opportunities associated with using WQP data. We also demonstrate the value of the WQP data through an example, characterizing seasonal variation in lake water clarity for regions of the continental U.S. The code used to access, download, analyze, and display these WQP data as shown in the figures is included as supporting information.
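The paper's own access-and-analysis code ships as supporting information; as an independent illustration of programmatic access, here is a hedged Python sketch against the WQP's public REST service. The endpoint path and parameter names follow WQP web-service conventions, but treat the exact values as illustrative assumptions:

```python
# Sketch: download water quality monitoring sites from the Water Quality Portal
# web service as CSV (parameter values are illustrative; see the WQP service docs).
import requests

BASE = "https://www.waterqualitydata.us/data/Station/search"
params = {
    "statecode": "US:55",                        # Wisconsin (example)
    "characteristicName": "Secchi disk depth",   # a water-clarity measure
    "startDateLo": "01-01-2010",                 # MM-DD-YYYY per WQP conventions
    "mimeType": "csv",
}
resp = requests.get(BASE, params=params, timeout=120)
resp.raise_for_status()
with open("wqp_sites.csv", "wb") as f:
    f.write(resp.content)
print("saved", len(resp.content), "bytes")
```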

115 citations


Journal ArticleDOI
TL;DR: This work has developed a deployment module to create ontologies and mappings from relational databases in a semi-automatic fashion; a query processing module to perform and optimise the translation of ontological queries into data queries and their execution over either a single DB or federated DBs; and a query formulation module to support query construction for engineers with a limited IT background.

105 citations


Journal ArticleDOI
TL;DR: A novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications is presented, and 15 popular workflow management systems are classified in terms of workflow execution models, heterogeneous computing environments, and data access methods.

100 citations


Journal ArticleDOI
TL;DR: A data-oriented M2M messaging mechanism based on ZeroMQ is presented for ubiquitous data access in rich-sensing pervasive industrial applications, and the results demonstrate the feasibility of the proposed messaging mechanism.
Abstract: Machine-to-machine (M2M) communication is a key enabling technology for future industrial Internet of Things applications. It plays an important role in the connectivity and integration of computerized machines, such as sensors, actuators, controllers, and robots. The requirements in flexibility, efficiency, and cross-platform compatibility of the intermodule communication between connected machines raise challenges for the M2M messaging mechanism with respect to ubiquitous data access and event notification. This investigation identifies the challenges facing M2M communication in industrial systems and presents a data-oriented M2M messaging mechanism based on ZeroMQ for ubiquitous data access in rich-sensing pervasive industrial applications. To prove the feasibility of the proposed solution, the EU-funded PickNPack production line with a reference industrial network architecture is presented, and the communication between a microwave sensor device and the quality assessment and sensing module controller of the PickNPack line is illustrated as a case study. The evaluation is carried out through qualitative analysis and experimental studies, and the results demonstrate the feasibility of the proposed messaging mechanism. Owing to its flexibility in dealing with hierarchical system architectures and the cross-platform heterogeneity of industrial applications, this messaging mechanism merits extensive investigation and further evaluation.
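As a flavour of the data-oriented messaging style the abstract describes, here is a minimal ZeroMQ publish/subscribe sketch in Python using pyzmq. The topic names, port, and payloads are invented for illustration; the paper's PickNPack deployment is far richer:

```python
# Minimal ZeroMQ pub/sub sketch of data-oriented M2M messaging.
import threading, time
import zmq

def sensor_publisher(ctx):
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://127.0.0.1:5556")
    time.sleep(0.2)  # allow the subscriber to connect (slow-joiner mitigation)
    for i in range(3):
        # topic frame + payload frame: subscribers filter by topic prefix
        pub.send_multipart([b"microwave.temperature", f"{20 + i}".encode()])
        time.sleep(0.05)
    pub.close()

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5556")
sub.setsockopt(zmq.SUBSCRIBE, b"microwave.")  # subscribe by topic prefix

t = threading.Thread(target=sensor_publisher, args=(ctx,))
t.start()
for _ in range(3):
    topic, payload = sub.recv_multipart()
    print(topic.decode(), payload.decode())
t.join()
sub.close()
ctx.term()
```

Topic-prefix filtering is what makes the mechanism "data-oriented": consumers subscribe to the data they need rather than to a specific machine.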

96 citations


Journal ArticleDOI
TL;DR: The Optique platform is introduced as a suitable OBDA solution for Siemens, with a number of novel techniques and components including a deployment module, BootOX, for ontology and mapping bootstrapping; a query language, STARQL, that allows uniform querying of both streaming and static data; and a query formulation interface, OptiqueVQS, that allows users to formulate STARQL queries without prior knowledge of the language's formal syntax.

84 citations


Proceedings ArticleDOI
Kedar Dhamdhere, Kevin Snow McCurley, Ralfi Nahmias, Mukund Sundararajan, Qiqi Yan
07 Mar 2017
TL;DR: Analyza, a system that helps lay users explore data and discuss the key design decisions in implementing this system, including how to mix structured and natural language modalities, how to use conversation to disambiguate and simplify querying, and how to efficiently curate the data.
Abstract: We describe Analyza, a system that helps lay users explore data. Analyza has been used within two large real world systems. The first is a question-and-answer feature in a spreadsheet product. The second provides convenient access to a revenue/inventory database for a large sales force. Both user bases consist of users who do not necessarily have coding skills, demonstrating Analyza's ability to democratize access to data. We discuss the key design decisions in implementing this system. For instance, how to mix structured and natural language modalities, how to use conversation to disambiguate and simplify querying, how to rely on the "semantics" of the data to compensate for the lack of syntactic structure, and how to efficiently curate the data.

78 citations


Journal ArticleDOI
TL;DR: This paper employs attribute-based encryption with decryption outsourcing to encrypt the published data, so that publishers can control data access themselves and the major decryption overhead can be shifted from the subscribers' devices to the cloud server.

Proceedings ArticleDOI
12 Sep 2017
TL;DR: This paper illustrates an architecture based on blockchain technology, and a protocol for data access, using smart contracts and a publisher-subscriber mechanism.
Abstract: In the past few years, the number of wireless devices connected to the Internet has increased to a number that could reach billions in the next few years. While cloud computing is seen as the solution for processing this data, security challenges cannot be addressed solely with this technology. Security problems will continue to increase with such a model, especially for private and sensitive data such as the personal and medical data collected by increasingly sophisticated connected devices (forming the IoT). Hence the need for a fully decentralized, peer-to-peer, and secure technology to overcome these problems. Blockchain technology is a promising approach, given the properties it brings to the field. This paper illustrates an architecture based on blockchain technology, and a protocol for data access, using smart contracts and a publisher-subscriber mechanism.
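As a rough illustration of the protocol shape, here is a toy in-memory stand-in for an on-chain access-control contract with a publisher-subscriber flow. Every name is invented; the paper's actual contract code is not reproduced here:

```python
# Toy stand-in for an on-chain access-control contract with a
# publisher-subscriber flow (all names invented for illustration).
class AccessContract:
    def __init__(self, owner):
        self.owner = owner
        self.allowed = set()   # subscribers granted access by the owner
        self.log = []          # would be an immutable event log on a real chain

    def grant(self, caller, subscriber):
        assert caller == self.owner, "only the data owner may grant access"
        self.allowed.add(subscriber)
        self.log.append(("Granted", subscriber))

    def publish(self, caller, data_ref):
        assert caller == self.owner
        self.log.append(("Published", data_ref))
        # notify subscribers; off-chain they would fetch data_ref (e.g. a
        # pointer into peer-to-peer storage) and decrypt with their own key
        return [(s, data_ref) for s in self.allowed]

c = AccessContract(owner="alice")
c.grant("alice", "dr_bob")
print(c.publish("alice", "p2p://record-42"))  # [('dr_bob', 'p2p://record-42')]
```

On a real chain the `assert` guards become contract-enforced permissions and the log becomes tamper-evident by construction.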

Proceedings ArticleDOI
01 Jun 2017
TL;DR: A risk-based access control model for IoT technology that takes real-time contextual information about access requests into account, gives dynamic feedback, and uses smart contracts to provide adaptive features in which user behaviour is monitored to detect any abnormal actions from authorized users.
Abstract: The Internet of Things (IoT) is creating a revolution in the number of connected devices. Cisco reported that there were 25 billion IoT devices in 2015, and modest estimates suggest that this number will almost double by 2020. Society has become dependent on these billions of devices, which are connected and communicating with each other all the time, with information constantly shared between users, services, and internet providers. As a technology, emerging IoT devices are creating a huge security rift between users and usability; sacrificing usability for security has created a number of major issues. First, IoT devices are classified under Bring Your Own Device (BYOD), which dissolves any organization's security boundary and makes them a target for espionage or tracking. Second, the size of the data generated by the IoT makes big data problems pale in comparison, not to mention that IoT devices need real-time responses. Third, incorporating secure access and control for IoT devices, ranging from edge nodes to the application level (business intelligence reporting tools), is a challenge because it has to account for several hardware and application levels. Establishing a secure access control model between different IoT devices and services is a major milestone for the IoT. This is important because data leakage and unauthorized access to data have a high impact on IoT devices. However, traditional access control models, with their static and rigid infrastructure, cannot provide the required security for the IoT infrastructure. Therefore, this paper proposes a risk-based access control model for IoT technology that takes real-time contextual information about access requests into account and gives dynamic feedback. The proposed model uses IoT environment features to estimate the security risk associated with each access request, using user context, resource sensitivity, action severity, and risk history as inputs to a security risk estimation algorithm that is responsible for the access decision. The model then uses smart contracts to provide adaptive features in which user behaviour is monitored to detect any abnormal actions from authorized users.
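A minimal sketch of the risk-estimation step using the four inputs the abstract names. The weights, normalisation, and threshold are invented for illustration; the paper's estimation algorithm is not specified here:

```python
# Sketch of risk estimation for a risk-based access decision, using the four
# inputs named in the abstract. Weights and threshold are illustrative only.
WEIGHTS = {
    "user_context": 0.25,          # e.g. unusual location/time of the request
    "resource_sensitivity": 0.35,
    "action_severity": 0.25,       # read < write < actuate
    "risk_history": 0.15,          # past abnormal behaviour of this user
}
RISK_THRESHOLD = 0.5

def estimate_risk(factors):
    """Weighted sum of normalised risk factors, each in [0, 1]."""
    return sum(WEIGHTS[name] * value for name, value in factors.items())

def access_decision(factors):
    risk = estimate_risk(factors)
    return ("deny" if risk > RISK_THRESHOLD else "permit"), round(risk, 3)

request = {"user_context": 0.2, "resource_sensitivity": 0.9,
           "action_severity": 0.3, "risk_history": 0.1}
print(access_decision(request))  # ('permit', 0.455) — then keep monitoring
```

The "dynamic feedback" the paper describes would feed observed behaviour back into `risk_history`, so the same user's later requests score differently.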

Journal ArticleDOI
11 Sep 2017
TL;DR: This paper presents the design and implementation of ProtectMyPrivacy (PmP) for Android, which can detect critical contextual information at runtime when privacy-sensitive data accesses occur and infers the purpose of the data access, i.e., whether the data access is made by a third-party library or by the app itself for its functionality.
Abstract: The enormous popularity of smartphones, their rich sensing capabilities, and the data they have about their users have led to millions of apps being developed and used. However, these capabilities have also led to numerous privacy concerns. Platform manufacturers, as well as researchers, have proposed numerous ways of mitigating these concerns, primarily by providing fine-grained visibility and privacy controls to the user on a per-app basis. In this paper, we show that this per-app permission approach is suboptimal for many apps, primarily because most data accesses occur due to a small set of popular third-party libraries which are common across multiple apps. To address this problem, we present the design and implementation of ProtectMyPrivacy (PmP) for Android, which can detect critical contextual information at runtime when privacy-sensitive data accesses occur. In particular, PmP infers the purpose of the data access, i.e., whether the data access is made by a third-party library or by the app itself for its functionality. Based on crowdsourced data, we show that a set of 30 libraries is in fact responsible for more than half of private data accesses. Controlling the sensitive data accessed by these libraries can therefore be an effective mechanism for managing user privacy. We deployed our PmP app to 1,321 real users, showing that the number of privacy decisions that users have to make is significantly reduced. In addition, we show that our users are better protected against data leakage when using our new library-based blocking mechanism as compared to traditional app-level permission mechanisms.
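PmP itself instruments Android apps, but the attribution idea — walk the call stack at the moment of a sensitive access and check whether a known third-party library owns a frame — can be sketched in a few lines of Python. The library prefixes and function names are invented:

```python
# Sketch of purpose inference by attributing a sensitive data access to the
# caller on the stack: app code vs. a known third-party library.
import inspect

THIRD_PARTY_PREFIXES = ("ads_sdk", "analytics_lib")  # illustrative list

def attribute_access():
    """Walk the call stack; report the first frame owned by a third party."""
    for frame_info in inspect.stack()[1:]:
        module = inspect.getmodule(frame_info.frame)
        name = module.__name__ if module else \
            frame_info.frame.f_globals.get("__name__", "")
        if name.startswith(THIRD_PARTY_PREFIXES):
            return ("third_party", name)
    return ("app", "__main__")

def get_location():
    # a privacy-sensitive access point would call the attributor like this
    origin = attribute_access()
    print("location accessed by:", origin)
    return (51.5, -0.12)

get_location()  # ('app', '__main__') when called directly from app code
```

If `get_location` were instead invoked from inside an `ads_sdk` module, the walk would report `('third_party', 'ads_sdk...')`, which is exactly the signal a library-based blocking policy needs.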

Proceedings ArticleDOI
06 Nov 2017
TL;DR: This work proposes the first end-to-end framework to build an NL2API for a given web API, applies it to real-world APIs, and shows that it can collect high-quality training data at low cost and build NL2APIs with good performance from scratch.
Abstract: As the Web evolves towards a service-oriented architecture, application program interfaces (APIs) are becoming an increasingly important way to provide access to data, services, and devices. We study the problem of natural language interface to APIs (NL2APIs), with a focus on web APIs for web services. Such NL2APIs have many potential benefits, for example, facilitating the integration of web services into virtual assistants. We propose the first end-to-end framework to build an NL2API for a given web API. A key challenge is to collect training data, i.e., NL command-API call pairs, from which an NL2API can learn the semantic mapping from ambiguous, informal NL commands to formal API calls. We propose a novel approach to collect training data for NL2API via crowdsourcing, where crowd workers are employed to generate diversified NL commands. We optimize the crowdsourcing process to further reduce the cost. More specifically, we propose a novel hierarchical probabilistic model for the crowdsourcing process, which guides us to allocate budget to those API calls that have a high value for training NL2APIs. We apply our framework to real-world APIs, and show that it can collect high-quality training data at a low cost, and build NL2APIs with good performance from scratch. We also show that our modeling of the crowdsourcing process can improve its effectiveness, such that the training data collected via our approach leads to better performance of NL2APIs than a strong baseline.

Journal ArticleDOI
TL;DR: In this article, a behavioral biometric signature-based authentication mechanism is proposed to ensure the security of e-medical data access in a cloud-based healthcare management system, achieving a high accuracy rate for secure data access and retrieval.

Proceedings Article
01 Jan 2017
TL;DR: It is demonstrated by two real-world use cases that nonrecursive datalogMTL programs can express complex temporal concepts from typical user queries and thereby facilitate access to log data.
Abstract: We advocate datalogMTL, a datalog extension of a Horn fragment of the metric temporal logic MTL, as a language for ontology-based access to temporal log data. We show that datalogMTL is EXPSPACE-complete even with punctual intervals, in which case MTL is known to be undecidable. Nonrecursive datalogMTL turns out to be PSPACE-complete for combined complexity and in AC0 for data complexity. We demonstrate by two real-world use cases that nonrecursive datalogMTL programs can express complex temporal concepts from typical user queries and thereby facilitate access to log data. Our experiments with Siemens turbine data and MesoWest weather data show that datalogMTL ontology-mediated queries are efficient and scale on large datasets of up to 11GB.
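One way to read the temporal operators involved: a nonrecursive datalogMTL rule can say, for instance, "raise an alert if a reading stayed above a threshold throughout the last 10 minutes" (an "always in the past interval" condition). A hedged Python sketch of evaluating such a window condition over timestamped log data; the rule and data are invented, not from the paper's use cases:

```python
# Evaluate an "always within the past 10 minutes" temporal condition over
# timestamped sensor readings, in the spirit of a nonrecursive datalogMTL
# rule such as:  Alert(t) :- BOXMINUS[0,10m] HighTemp(t).  (Invented example.)
from datetime import datetime, timedelta

readings = [  # (timestamp, temperature) — invented turbine-style log data
    (datetime(2017, 1, 1, 12, 0), 96),
    (datetime(2017, 1, 1, 12, 4), 97),
    (datetime(2017, 1, 1, 12, 9), 98),
    (datetime(2017, 1, 1, 12, 12), 99),
]

def always_in_window(log, now, window, pred):
    """True iff pred holds for every reading in [now - window, now]
    and the window contains at least one reading."""
    in_window = [v for ts, v in log if now - window <= ts <= now]
    return bool(in_window) and all(pred(v) for v in in_window)

now = datetime(2017, 1, 1, 12, 12)
alert = always_in_window(readings, now, timedelta(minutes=10), lambda t: t > 95)
print("Alert:", alert)  # True — HighTemp held throughout the last 10 minutes
```

An ontology-mediated query engine generalises this idea: the temporal rule is part of the ontology, and evaluation is pushed down to the log data rather than hand-coded per query.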

Proceedings Article
Qingda Hu, Jinglei Ren, Anirudh Badam, Jiwu Shu, Thomas Moscibroda
12 Jul 2017
TL;DR: This paper presents a log-structured NVMM system that not only maintains NVMM in a compact manner but also reduces the write traffic and the number of persist barriers needed for executing transactions.
Abstract: Emerging non-volatile main memory (NVMM) unlocks the performance potential of applications by storing persistent data in the main memory. Such applications require a lightweight persistent transactional memory (PTM) system, instead of a heavyweight filesystem or database, to have fast access to data. In a PTM system, the memory usage, both capacity and bandwidth, plays a key role in dictating performance and efficiency. Existing memory management mechanisms for PTMs generate high memory fragmentation, high write traffic and a large number of persist barriers, since data is first written to a log and then to the main data store. In this paper, we present a log-structured NVMM system that not only maintains NVMM in a compact manner but also reduces the write traffic and the number of persist barriers needed for executing transactions. All data allocations and modifications are appended to the log which becomes the location of the data. Further, we address a unique challenge of log-structured memory management by designing a tree-based address translation mechanism where access granularities are flexible and different from allocation granularities. Our results show that the new system enjoys up to 89.9% higher transaction throughput and up to 82.8% lower write traffic than a traditional PTM system.
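The core memory-management idea can be sketched in a few lines: every allocation or update is an append to the log, and a translation structure redirects reads to the latest version. In this illustrative toy a plain dict stands in for the paper's tree-based address translation:

```python
# Toy log-structured store: all allocations and updates append to a log,
# and a translation table maps object ids to their latest log offset.
class LogStructuredStore:
    def __init__(self):
        self.log = bytearray()  # stands in for persistent NVMM
        self.translate = {}     # object id -> (offset, length) of latest version

    def write(self, obj_id, data: bytes):
        off = len(self.log)
        self.log += data                            # append: the log IS the data store
        self.translate[obj_id] = (off, len(data))   # redirect reads to the new version
        # On real NVMM a persist barrier would be issued here, once, after the
        # append — not once for a log entry and again for a separate home copy,
        # which is the double write traffic the paper is eliminating.

    def read(self, obj_id) -> bytes:
        off, length = self.translate[obj_id]
        return bytes(self.log[off:off + length])

s = LogStructuredStore()
s.write("row7", b"v1")
s.write("row7", b"v2-updated")  # old bytes become garbage for later compaction
print(s.read("row7"))           # b'v2-updated'
```

The paper's tree-based translation additionally lets read granularities differ from allocation granularities, which a flat dict cannot express.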

Proceedings ArticleDOI
01 Oct 2017
TL;DR: A blockchain-based data usage auditing architecture that ensures availability and accountability in a privacy-preserving fashion and, based on cryptographic mechanisms, preserves the privacy of data owners and ensures secrecy for data shared with multiple service providers.
Abstract: Recent years have witnessed the trend of increasingly relying on distributed infrastructures. This has increased the number of reported incidents of security breaches compromising users' privacy, where third parties massively collect, process, and manage users' personal data. To address these security and privacy challenges, we combine hierarchical identity-based cryptographic mechanisms with emerging blockchain infrastructures and propose a blockchain-based data usage auditing architecture that ensures availability and accountability in a privacy-preserving fashion. Our approach relies on the use of auditable contracts deployed in blockchain infrastructures. Thus, it offers transparent and controlled data access, sharing, and processing, so that unauthorized users or untrusted servers cannot process data without the client's authorization. Moreover, based on cryptographic mechanisms, our solution preserves the privacy of data owners and ensures secrecy for data shared with multiple service providers. It also provides auditing authorities with tamper-proof evidence of data usage compliance.

Journal ArticleDOI
TL;DR: A platform for sharing medical imaging data between clinicians and researchers that automates anonymisation of pixel data and metadata at the clinical site and maintains subject data groupings while preserving anonymity.

Journal ArticleDOI
TL;DR: The efficient cataloguing approach of the federated query processing system BioFed, its triple-pattern-wise source selection, and its semantic source normalisation form the core of the solution, which facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data.
Abstract: Biomedical data, e.g. from knowledge bases and ontologies, is increasingly made available following open linked data principles, at best as RDF triple data. This is a necessary step towards unified access to biological data sets, but it still requires solutions for querying multiple endpoints for their heterogeneous data to eventually retrieve all the meaningful information. Suggested solutions are based on query federation approaches, which require the submission of SPARQL queries to endpoints. Due to the size and complexity of the available data, these solutions have to be optimised for efficient retrieval times and for users in life sciences research. Moreover, over time, the reliability of data resources in terms of access and quality has to be monitored. Our solution (BioFed) federates data over 130 SPARQL endpoints in life sciences and tailors query submission according to the provenance information. BioFed has been evaluated against the state-of-the-art solution FedX and forms an important benchmark for the life science domain. The efficient cataloguing approach of the federated query processing system BioFed, the triple-pattern-wise source selection, and the semantic source normalisation form the core of our solution. It gathers and integrates data from newly identified public endpoints for federated access, and basic provenance information is linked to the retrieved data. Finally, BioFed makes use of the latest SPARQL standard (i.e., 1.1) to leverage the full benefits of query federation. The evaluation is based on 10 simple and 10 complex queries, which address data in 10 major and very popular data sources (e.g., DrugBank, SIDER). BioFed is a solution for a single point of access to a large number of SPARQL endpoints providing life science data. It facilitates efficient query generation for data access and provides basic provenance information in combination with the retrieved data. BioFed fully supports SPARQL 1.1 and gives access to each endpoint's availability based on the EndpointData graph. Our evaluation of BioFed against FedX is based on 20 heterogeneous federated SPARQL queries and shows competitive execution performance in comparison to FedX, which can be attributed to the provision of provenance information for the source selection. Developing and testing federated query engines for life sciences data is still a challenging task. According to our findings, it is advantageous to optimise the source selection. The cataloguing of SPARQL endpoints, including type and property indexing, leads to efficient querying of data resources over the Web of Data. This could be further improved through the use of ontologies, e.g., for abstract normalisation of query terms.
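A hedged sketch of the central idea: triple-pattern-wise source selection against a precomputed catalogue, followed by direct endpoint queries. The catalogue contents and endpoint URL are invented placeholders; BioFed's real index and optimiser are far more elaborate:

```python
# Sketch of triple-pattern-wise source selection against a catalogue, then
# querying only the selected endpoint(s) with SPARQLWrapper.
from SPARQLWrapper import SPARQLWrapper, JSON

# catalogue: which endpoints can answer which predicates (toy index;
# the endpoint URL below is a placeholder, not a real service)
CATALOGUE = {
    "http://purl.org/dc/terms/title": ["https://example.org/sparql/drugbank"],
}

def select_sources(predicate):
    """Consult the catalogue instead of broadcasting ASK probes
    to all ~130 endpoints for every triple pattern."""
    return CATALOGUE.get(predicate, [])

def run(predicate, query):
    results = []
    for endpoint in select_sources(predicate):
        sw = SPARQLWrapper(endpoint)
        sw.setQuery(query)
        sw.setReturnFormat(JSON)
        # keep the endpoint alongside the bindings: basic provenance
        results.append((endpoint, sw.query().convert()))
    return results

query = "SELECT ?s ?t WHERE { ?s <http://purl.org/dc/terms/title> ?t } LIMIT 5"
# run("http://purl.org/dc/terms/title", query)  # requires a live endpoint
```

Keeping the source endpoint attached to each result set is the cheap form of the provenance information the abstract credits for the competitive source selection.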

Book ChapterDOI
28 Jun 2017
TL;DR: This work exploits a recently introduced framework and associated methodology for the extraction of XES event logs from relational data sources, and builds on the ontology-based data access (OBDA) paradigm for the actual log extraction.
Abstract: Process mining aims at discovering, monitoring, and improving business processes by extracting knowledge from event logs. In this respect, process mining can be applied only if there are proper event logs that are compatible with accepted standards, such as extensible event stream (XES). Unfortunately, in many real world set-ups, such event logs are not explicitly given, but instead are implicitly represented in legacy information systems. In this work, we exploit a framework and associated methodology for the extraction of XES event logs from relational data sources that we have recently introduced. Our approach is based on describing logs by means of suitable annotations of a conceptual model of the available data, and builds on the ontology-based data access (OBDA) paradigm for the actual log extraction. Making use of a real-world case study in the services domain, we compare our novel approach with a more traditional extract-transform-load based one, and are able to illustrate its added value. We also present a set of tools that we have developed and that support the OBDA-based log extraction framework. The tools are integrated as plugins of the ProM process mining suite.
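To make the extraction step concrete: the annotations over the conceptual model effectively define how query results over the legacy relational data populate XES traces and events. A hedged Python sketch of that final materialization step; the schema, rows, and case identifiers are invented, and real OBDA tooling sits between the database and this step:

```python
# Sketch: relational rows (as an OBDA query might return them) become
# XES events grouped into traces, using standard XES attribute keys.
import xml.etree.ElementTree as ET

rows = [  # (case_id, activity, timestamp) — invented example data
    ("order-1", "Create", "2017-06-28T09:00:00"),
    ("order-1", "Approve", "2017-06-28T10:30:00"),
]

log = ET.Element("log", {"xes.version": "1.0"})
traces = {}
for case_id, activity, ts in rows:
    trace = traces.get(case_id)
    if trace is None:  # one <trace> per process instance (case)
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "string", {"key": "concept:name", "value": case_id})
        traces[case_id] = trace
    event = ET.SubElement(trace, "event")
    ET.SubElement(event, "string", {"key": "concept:name", "value": activity})
    ET.SubElement(event, "date", {"key": "time:timestamp", "value": ts})

print(ET.tostring(log, encoding="unicode"))
```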

Proceedings ArticleDOI
04 Oct 2017
TL;DR: DPCM is designed to reduce data access latency through parallel processing approaches and by exploiting device-side state replicas; it is implemented and validated with extensive evaluations.
Abstract: Control-plane operations are indispensable to providing data access to mobile devices in the 4G LTE networks. They provision necessary control states at the device and network nodes to enable data access. However, the current design may suffer from long data access latency even under good radio conditions. The fundamental problem is that, data-plane packet delivery cannot start or resume until all control-plane procedures are completed, and these control procedures run sequentially by design. We show both are more than necessary under popular use cases. We design DPCM, which reduces data access latency through parallel processing approaches and exploiting device-side state replica. We implement DPCM and validate its effectiveness with extensive evaluations.

Book ChapterDOI
06 Dec 2017
TL;DR: This paper proposes using the trusted execution platform enabled by Intel SGX to provide accountability for data access, and a decentralized approach with blockchain technology to address the privacy concern.
Abstract: With the increasing development and adoption of wearable devices, people care more about their health conditions than ever before. Patients and doctors, as well as insurance agencies, benefit from this advanced technology. However, emerging wearable devices create a major concern over health data privacy, as the data collected from these devices can reflect patients' health conditions and habits and could increase data disclosure risks among healthcare providers and application vendors. In this paper, we propose using the trusted execution platform enabled by Intel SGX to provide accountability for data access, and we propose a decentralized approach with blockchain technology to address the privacy concern. By developing a web application for personal health data management (PHDM) systems, individuals can synchronize sensor data from wearable devices with an online account and control data access by any third parties. The protected personal health data and data access records are hashed and anchored to a permanent but secure ledger with platform dependency, ensuring data integrity and accountability. Analysis shows that our approach provides user privacy and accountability with acceptable overhead.
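The anchoring idea can be sketched directly: hash-chain the access records so that, once the head hash is written to a ledger, any later tampering with a record is detectable. The record format and ledger are invented placeholders:

```python
# Sketch: hash-chain access records; anchoring the head hash on a blockchain
# makes the whole history tamper-evident.
import hashlib, json

def record_hash(record, prev_hash):
    payload = json.dumps(record, sort_keys=True).encode() + prev_hash
    return hashlib.sha256(payload).hexdigest().encode()

chain, head = [], b"genesis"
for rec in [
    {"who": "dr_bob", "what": "heart_rate", "when": "2017-12-06T10:00Z"},
    {"who": "insurer_x", "what": "step_count", "when": "2017-12-06T11:30Z"},
]:
    head = record_hash(rec, head)  # each hash covers the record AND its predecessor
    chain.append((rec, head))

print("anchor this on-chain:", head.decode())
# Verification: recompute the chain from "genesis"; editing any record
# changes every later hash, so the anchored head no longer matches.
```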

Journal ArticleDOI
01 Jan 2017
TL;DR: In this article, the authors review the research challenges in building personal Databoxes that hold personal data and enable data access by, and thus potentially data sharing with, other parties.
Abstract: The Internet of Things is expected to generate large amounts of heterogeneous data from diverse sources, including physical sensors, user devices, and social media platforms. Over the last few years, significant attention has been focused on personal data, particularly data generated by smart wearable and smart home devices. Making personal data available for access and trade is expected to become a part of the data-driven digital economy. In this position paper, we review the research challenges in building personal Databoxes that hold personal data and enable data access by, and thus potentially data sharing with, other parties. These Databoxes are expected to become a core part of future data marketplaces.

Journal ArticleDOI
TL;DR: A text mining based method to infer the purpose of sensitive data access by Android apps is proposed; the key idea is to extract multiple features from app code and then use those features to train a machine learning classifier for purpose inference.
Abstract: Mobile apps frequently request access to sensitive data, such as location and contacts. Understanding the purpose of why sensitive data is accessed could help improve privacy as well as enable new kinds of access control. In this article, we propose a text mining based method to infer the purpose of sensitive data access by Android apps. The key idea we propose is to extract multiple features from app code and then use those features to train a machine learning classifier for purpose inference. We present the design, implementation, and evaluation of two complementary approaches to infer the purpose of permission use, first using purely static analysis, and then using primarily dynamic analysis. We also discuss the pros and cons of both approaches and the trade-offs involved.
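A toy version of the pipeline the article describes — identifier-style text features extracted from app code, fed to a classifier — using scikit-learn. The snippets, purpose labels, and feature choice are invented and far smaller than any real training set:

```python
# Sketch: identifier strings harvested from app code around a location access
# become text features for a purpose classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

code_snippets = [  # invented identifier bags around a location API call
    "getLastKnownLocation showNearbyRestaurants mapView",
    "getLastKnownLocation adRequest setTargetingParams bannerAd",
    "requestLocationUpdates geofence arrivalReminder notify",
    "getLatitude getLongitude trackEvent analyticsSession upload",
]
purposes = ["search_nearby", "advertising", "geofencing", "analytics"]

clf = make_pipeline(TfidfVectorizer(token_pattern=r"[A-Za-z]+"),
                    LogisticRegression())
clf.fit(code_snippets, purposes)
print(clf.predict(["getLastKnownLocation interstitialAd loadAd"]))
```

The article's static and dynamic approaches differ in *where* these features come from (decompiled code vs. runtime traces), but both feed a classifier of this general shape.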

Journal ArticleDOI
TL;DR: An overview of WebMeV is provided and two simple use cases are demonstrated that illustrate the value of putting data analysis in the hands of those looking to explore the underlying biology of the systems being studied.
Abstract: Although large, complex genomic datasets are increasingly easy to generate, and the number of publicly available datasets in cancer and other diseases is rapidly growing, the lack of intuitive, easy-to-use analysis tools has remained a barrier to the effective use of such data. WebMeV (http://mev.tm4.org) is an open-source, web-based tool that gives users access to sophisticated tools for analysis of RNA-Seq and other data in an interface designed to democratize data access. WebMeV combines cloud-based technologies with a simple user interface to allow users to access large public datasets, such as that from The Cancer Genome Atlas or to upload their own. The interface allows users to visualize data and to apply advanced data mining analysis methods to explore the data and draw biologically meaningful conclusions. We provide an overview of WebMeV and demonstrate two simple use cases that illustrate the value of putting data analysis in the hands of those looking to explore the underlying biology of the systems being studied.

Journal ArticleDOI
TL;DR: The increased availability of large remote sensing datasets is generating heightened interest within the geoscience community, and more generally within human society.

Journal ArticleDOI
01 Apr 2017
TL;DR: In this article, a framework for modular data access is presented, in which individual data accessors for simple data structures may be freely combined to obtain more complex data accessors for compound data structures.
Abstract: CONTEXT: Data accessors allow one to read and write components of a data structure, such as the fields of a record, the variants of a union, or the elements of a container. These data accessors are collectively known as optics; they are fundamental to programs that manipulate complex data. INQUIRY: Individual data accessors for simple data structures are easy to write, for example as pairs of "getter" and "setter" methods. However, it is not obvious how to combine data accessors, in such a way that data accessors for a compound data structure are composed out of smaller data accessors for the parts of that structure. Generally, one has to write a sequence of statements or declarations that navigate step by step through the data structure, accessing one level at a time - which is to say, data accessors are traditionally not first-class citizens, combinable in their own right. APPROACH: We present a framework for modular data access, in which individual data accessors for simple data structures may be freely combined to obtain more complex data accessors for compound data structures. Data accessors become first-class citizens. The framework is based around the notion of profunctors, a flexible generalization of functions. KNOWLEDGE: The language features required are higher-order functions ("lambdas" or "closures"), parametrized types ("generics" or "abstract types"), and some mechanism for separating interfaces from implementations ("abstract classes" or "modules"). We use Haskell as a vehicle in which to present our constructions, but languages such as Java, C#, or Scala that provide the necessary features should work just as well. GROUNDING: We provide implementations of all our constructions, in the form of a literate program: the manuscript file for the paper is also the source code for the program, and the extracted code is available separately for evaluation. We also prove the essential properties demonstrating that our profunctor-based representations are precisely equivalent to the more familiar concrete representations. IMPORTANCE: Our results should pave the way to simpler ways of writing programs that access the components of compound data structures.
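As a taste of what "first-class, combinable accessors" means in practice, here is a minimal get/put lens in Python with a composition operator. The paper's actual framework is profunctor-based and presented in Haskell; this sketch only mirrors the composition idea, and all names are invented:

```python
# A minimal get/put lens: data accessors as first-class, composable values.
class Lens:
    def __init__(self, get, put):
        self.get = get            # whole -> part
        self.put = put            # (whole, new part) -> new whole

    def __matmul__(self, inner):  # compose: outer @ inner focuses deeper
        return Lens(
            get=lambda w: inner.get(self.get(w)),
            put=lambda w, p: self.put(w, inner.put(self.get(w), p)),
        )

def key(k):
    """Lens onto one entry of a dict, updating immutably."""
    return Lens(get=lambda d: d[k],
                put=lambda d, v: {**d, k: v})

address_city = key("address") @ key("city")   # composed accessor, no plumbing
person = {"name": "Ada", "address": {"city": "London", "zip": "N1"}}
print(address_city.get(person))               # London
print(address_city.put(person, "Zurich"))     # new nested dict, original intact
```

The point of the paper is that this composition works uniformly across lenses, prisms, and traversals once accessors are represented profunctorially, rather than needing ad hoc navigation code per data structure.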

Journal ArticleDOI
TL;DR: A web-based application is developed that convincingly confirms the usefulness of the novel data integration methodology, based on a metamodel approach, for querying data individually from different relational and NoSQL database systems.