
Showing papers on "Distributed database published in 2016"


Proceedings ArticleDOI
01 Dec 2016
TL;DR: This paper explains the concept, characteristics, and need for Blockchain, describes how Bitcoin works, and highlights the role of Blockchain in shaping the future of banking, financial institutions, and the adoption of the Internet of Things (IoT).
Abstract: Blockchain is a decentralized ledger used to securely exchange digital currency and perform deals and transactions. Each member of the network has access to the latest copy of the encrypted ledger so that they can validate a new transaction. The Blockchain ledger is a collection of all Bitcoin transactions executed in the past. Basically, it is a distributed database that maintains a continuously growing, tamper-proof data structure of blocks holding batches of individual transactions. Completed blocks are added in a linear, chronological order. Each block contains a timestamp and an information link that points to the previous block. Bitcoin is a peer-to-peer, permissionless network that allows every user to connect to the network, send new transactions, verify them, and create new blocks. Satoshi Nakamoto described the design of the Bitcoin digital currency in a research paper posted to a cryptography listserv in 2008. Nakamoto's proposal solved a long-standing problem of cryptographers and laid the foundation stone for digital currency. This paper explains the concept, characteristics, and need for Blockchain, and how Bitcoin works. It attempts to highlight the role of Blockchain in shaping the future of banking, financial institutions, and the adoption of the Internet of Things (IoT).
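The hash-linked structure the abstract describes (each block committing to a timestamp, a batch of transactions, and the previous block's hash) can be sketched in a few lines. This is a toy illustration only, not Bitcoin's actual block format, and it omits proof-of-work; `make_block` and `verify_chain` are hypothetical helpers:

```python
import hashlib
import json
import time

def make_block(transactions, prev_hash):
    """Assemble a block whose hash commits to its payload and predecessor."""
    block = {
        "timestamp": time.time(),
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

def verify_chain(chain):
    """Valid iff every block's stored hash matches its recomputed digest and
    every prev_hash points at the previous block's hash (tamper evidence)."""
    for i, block in enumerate(chain):
        header = {k: block[k] for k in ("timestamp", "transactions", "prev_hash")}
        digest = hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()
        if block["hash"] != digest:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block(["coinbase -> alice: 50"], prev_hash="0" * 64)
block1 = make_block(["alice -> bob: 10"], prev_hash=genesis["hash"])
chain = [genesis, block1]
```

Changing any past transaction invalidates that block's digest and, transitively, every later link, which is the tamper-proof property the abstract refers to.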

149 citations


Journal ArticleDOI
TL;DR: This paper proposes a sensor-integrated radio frequency identification (RFID) data repository-implementation model using MongoDB, currently the most popular document-oriented database system for big data, and devises a data repository schema that can effectively integrate and store the heterogeneous IoT data sources.
Abstract: Internet of Things (IoT)-generated data are characterized by their continuous generation, large volume, and unstructured format. Existing relational database technologies are inadequate to handle such IoT-generated data due to limited processing speed and significant storage-expansion cost. Thus, big data processing technologies, which are normally based on distributed file systems, distributed database management, and parallel processing technologies, have arisen as a core technology to implement IoT-generated data repositories. In this paper, we propose a sensor-integrated radio frequency identification (RFID) data repository-implementation model using MongoDB, currently the most popular document-oriented database system for big data. First, we devise a data repository schema that can effectively integrate and store heterogeneous IoT data sources, such as RFID, sensor, and GPS data, by extending the event data types in the Electronic Product Code Information Services (EPCIS) standard, a de facto standard for information exchange services for RFID-based traceability. Second, we propose an effective shard key to maximize query speed and achieve uniform data distribution over data servers. Finally, through a series of experiments measuring query speed and the level of data distribution, we show that the proposed design strategy, which is based on horizontal data partitioning and a compound shard key, is effective and efficient for IoT-generated RFID/sensor big data.
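The compound-shard-key idea can be illustrated with a toy hash-based router that spreads events across servers. The field names (`epc`, `event_time`), the MD5-based routing, and the shard count are illustrative assumptions, not the paper's schema or MongoDB's exact mechanism:

```python
import hashlib

NUM_SHARDS = 4

def shard_for(event, key_fields=("epc", "event_time")):
    """Route an event to a shard by hashing a compound key, so writes and
    reads spread across servers rather than piling onto one."""
    compound = "|".join(str(event[f]) for f in key_fields)
    digest = hashlib.md5(compound.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Synthetic RFID/sensor events: few distinct tags, but the compound key
# (tag, time) is unique per event, which is what keeps the spread uniform.
events = [{"epc": f"urn:epc:id:sgtin:{i % 5}", "event_time": 1000 + i,
           "temp_c": 20 + (i % 3)} for i in range(1000)]

counts = [0] * NUM_SHARDS
for e in events:
    counts[shard_for(e)] += 1
```

Sharding on `epc` alone would send everything for a hot tag to one server; adding `event_time` to the compound key is what levels the distribution.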

128 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper presents the protocols for highly available transactions and an experimental evaluation showing that Cure is able to achieve scalability similar to eventually consistent NoSQL databases while providing stronger guarantees.
Abstract: Developers of cloud-scale applications face a difficult decision of which kind of storage to use, summarised by the CAP theorem. Currently the choice is between classical CP databases, which provide strong guarantees but are slow, expensive, and unavailable under partition, and NoSQL-style AP databases, which are fast and available, but too hard to program against. We present an alternative: Cure provides the highest level of guarantees that remains compatible with availability. These guarantees include: causal consistency (no ordering anomalies), atomicity (consistent multi-key updates), and support for high-level data types (developer friendly API) with safe resolution of concurrent updates (guaranteeing convergence). These guarantees minimise the anomalies caused by parallelism and distribution, thus facilitating the development of applications. This paper presents the protocols for highly available transactions, and an experimental evaluation showing that Cure is able to achieve scalability similar to eventually-consistent NoSQL databases, while providing stronger guarantees.
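Causal consistency of the kind Cure guarantees means an update is only made visible once everything it causally depends on is visible. The vector-clock buffering scheme below is a generic illustration of that rule, not Cure's actual protocol; the `Replica` class and update tuple format are assumptions for the sketch:

```python
def causally_ready(deps, local_clock):
    """An update may apply only once all updates it causally depends on
    are already reflected in the local vector clock."""
    return all(local_clock.get(site, 0) >= ver for site, ver in deps.items())

class Replica:
    def __init__(self, name, sites):
        self.name = name
        self.clock = {s: 0 for s in sites}   # versions applied per origin site
        self.store = {}
        self.pending = []                    # updates buffered until ready

    def apply(self, update):
        # update = (origin, version, deps, key, value)
        self.pending.append(update)
        self._drain()

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for u in list(self.pending):
                origin, version, deps, key, value = u
                if causally_ready(deps, self.clock) and self.clock[origin] == version - 1:
                    self.store[key] = value
                    self.clock[origin] = version
                    self.pending.remove(u)
                    progress = True

r = Replica("B", sites=["A", "B"])
# The reply (A's update 2) depends on the post (A's update 1) but arrives first:
r.apply(("A", 2, {"A": 1}, "reply", "hi back"))
r.apply(("A", 1, {}, "post", "hi"))
```

The reply is buffered until the post it depends on arrives, so no reader at replica B can ever see a reply without its post — the "no ordering anomalies" guarantee.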

125 citations


Journal ArticleDOI
TL;DR: An analysis of cloud K-SVD is provided that gives insights into its properties as well as deviations of the dictionaries learned at individual sites from a centralized solution in terms of different measures of local/global data and topology of interconnections.
Abstract: This paper studies the problem of data-adaptive representations for big, distributed data. It is assumed that a number of geographically-distributed, interconnected sites have massive local data and they are interested in collaboratively learning a low-dimensional geometric structure underlying these data. In contrast with previous works on subspace-based data representations, this paper focuses on the geometric structure of a union of subspaces (UoS). In this regard, it proposes a distributed algorithm—termed cloud K-SVD—for collaborative learning of a UoS structure underlying distributed data of interest. The goal of cloud K-SVD is to learn a common overcomplete dictionary at each individual site such that every sample in the distributed data can be represented through a small number of atoms of the learned dictionary. Cloud K-SVD accomplishes this goal without requiring exchange of individual samples between sites. This makes it suitable for applications where sharing of raw data is discouraged due to either privacy concerns or large volumes of data. This paper also provides an analysis of cloud K-SVD that gives insights into its properties as well as deviations of the dictionaries learned at individual sites from a centralized solution in terms of different measures of local/global data and topology of interconnections. Finally, the paper numerically illustrates the efficacy of cloud K-SVD on real and synthetic distributed data.

99 citations


Proceedings ArticleDOI
19 Dec 2016
TL;DR: This paper presents blockchain, discusses its application to systems other than cryptocurrency, and surveys key applications to network systems in the literature.
Abstract: Recent interest in blockchain technology has brought questions about its application to systems other than cryptocurrency. In this paper we present blockchain and discuss key applications to network systems in the literature.

95 citations


Journal ArticleDOI
TL;DR: This paper develops a parallel and distributed implementation of a widely used technique for hyperspectral dimensionality reduction: principal component analysis (PCA), based on cloud computing architectures, taking full advantage of the high throughput access and high performance distributed computing capabilities of cloud computing environments.
Abstract: Cloud computing offers the possibility to store and process massive amounts of remotely sensed hyperspectral data in a distributed way. Dimensionality reduction is an important task in hyperspectral imaging, as hyperspectral data often contains redundancy that can be removed prior to analysis of the data in repositories. In this regard, the development of dimensionality reduction techniques in cloud computing environments can provide both efficient storage and preprocessing of the data. In this paper, we develop a parallel and distributed implementation of a widely used technique for hyperspectral dimensionality reduction: principal component analysis (PCA), based on cloud computing architectures. Our implementation utilizes Hadoop’s distributed file system (HDFS) to realize distributed storage, uses Apache Spark as the computing engine, and is developed based on the map-reduce parallel model, taking full advantage of the high throughput access and high performance distributed computing capabilities of cloud computing environments. We first optimized the traditional PCA algorithm to be well suited for parallel and distributed computing, and then we implemented it on a real cloud computing architecture. Our experimental results, conducted using several hyperspectral datasets, reveal very high performance for the proposed distributed parallel method.
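The map-reduce decomposition of PCA works because the covariance matrix is a sum of per-partition sufficient statistics (count, feature sums, and Gram matrix), so workers never need to exchange raw pixels. A minimal single-machine sketch, with plain NumPy standing in for Spark and HDFS:

```python
import numpy as np

def map_partition(part):
    """Map step: summarize one partition with the sufficient statistics
    for covariance: sample count, per-feature sums, and X^T X."""
    X = np.asarray(part, dtype=float)
    return X.shape[0], X.sum(axis=0), X.T @ X

def reduce_stats(stats):
    """Reduce step: sum the per-partition statistics, form the global
    covariance, and return its eigenvectors sorted by eigenvalue."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    gram = sum(s[2] for s in stats)
    mean = total / n
    cov = gram / n - np.outer(mean, mean)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]           # descending variance
    return vecs[:, order]

rng = np.random.default_rng(0)
data = rng.normal(size=(300, 4))             # rows = pixels, cols = bands
parts = np.array_split(data, 3)              # pretend these live on 3 workers
components = reduce_stats([map_partition(p) for p in parts])

# Reference: centralized PCA on the pooled data gives the same components.
centralized = np.linalg.eigh(np.cov(data.T, bias=True))[1][:, ::-1]
```

Eigenvectors are only defined up to sign, so the distributed and centralized results agree up to a sign flip per component.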

94 citations


Proceedings ArticleDOI
01 Nov 2016
TL;DR: The proposed approach, called the Optimal Telehealth Data Sharing Model (OTDSM), considers transmission probabilities, network capacity maximization, and timing constraints; the experimental results demonstrate the flexibility and adaptability of the proposed method.
Abstract: The rapid development of tele-health systems has been driven by various emerging techniques, such as big data and cloud computing. Sharing data among multiple tele-health systems is an adaptive approach for improving service quality via network-based technologies. However, current implementations of data sharing in cloud computing still face restrictions caused by networking capacities and virtual machine switches. In this paper, we focus on the problem of data sharing obstacles in cloud computing and propose an approach that uses dynamic programming to produce optimal solutions for data sharing mechanisms. The proposed approach, called the Optimal Telehealth Data Sharing Model (OTDSM), considers transmission probabilities, network capacity maximization, and timing constraints. Our experimental results demonstrate the flexibility and adaptability of the proposed method.

90 citations


Patent
07 Jan 2016
TL;DR: In this article, an efficient large scale search system for video and multi-media content using a distributed database and search, and tiered search servers is described, where content is classified using feature descriptors and geographical aspects, at feature level and in time segments.
Abstract: An efficient large scale search system for video and multi-media content using a distributed database and search, and tiered search servers is described. Selected content is stored at the distributed local database and tier1 search server(s). Content matching frequent queries, and frequent unidentified queries are cached at various levels in the search system. Content is classified using feature descriptors and geographical aspects, at feature level and in time segments. Queries not identified at clients and tier1 search server(s) are queried against tier2 or lower search server(s). Search servers use classification and geographical partitioning to reduce search cost. Methods for content tracking and local content searching are executed on clients. The client performs local search, monitoring and/or tracking of the query content with the reference content and local search with a database of reference fingerprints. This shifts the content search workload from central servers to the distributed monitoring clients.

89 citations


Journal ArticleDOI
TL;DR: An information-weighted, consensus-based, distributed multi-target tracking algorithm, referred to as the Multi-target Information Consensus (MTIC) algorithm, is designed to address both the naivety and data association problems and converges to the centralized minimum mean square error estimate.
Abstract: Distributed algorithms have recently gained immense popularity. With regard to computer vision applications, distributed multi-target tracking in a camera network is a fundamental problem. The goal is for all cameras to have accurate state estimates for all targets. Distributed estimation algorithms work by exchanging information between sensors that are communication neighbors. Vision-based distributed multi-target state estimation has at least two characteristics that distinguish it from other applications. First, cameras are directional sensors, and neighboring sensors often may not be sensing the same targets, i.e., they are naive with respect to those targets. Second, in the presence of clutter and multiple targets, each camera must solve a data association problem. This paper presents an information-weighted, consensus-based, distributed multi-target tracking algorithm, referred to as the Multi-target Information Consensus (MTIC) algorithm, that is designed to address both the naivety and the data association problems. It converges to the centralized minimum mean square error estimate. The proposed MTIC algorithm and its extensions to non-linear camera models, termed the Extended MTIC (EMTIC), are robust to false measurements and to limited resources such as power and bandwidth, as well as to real-time operational requirements. Simulation and experimental analysis are provided to support the theoretical results.
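The building block underneath such distributed estimators is average consensus: each node repeatedly nudges its estimate toward its communication neighbors' values, and all nodes converge to the network-wide average without any central fusion node. The ring topology, step size, and scalar estimates below are illustrative assumptions; this shows plain average consensus, not the full information-weighted MTIC update:

```python
def consensus_step(estimates, neighbors, rate=0.3):
    """One consensus iteration: each node moves toward the average of its
    communication neighbors' estimates (symmetric weights preserve the mean)."""
    new = {}
    for node, x in estimates.items():
        diff = sum(estimates[m] - x for m in neighbors[node])
        new[node] = x + rate * diff / max(len(neighbors[node]), 1)
    return new

# Ring of 4 cameras, each with a noisy local estimate of a target coordinate.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
estimates = {0: 9.0, 1: 11.0, 2: 10.5, 3: 9.5}
global_mean = sum(estimates.values()) / 4    # 10.0, what all nodes should reach

for _ in range(200):
    estimates = consensus_step(estimates, neighbors)
```

Because the update matrix is symmetric and doubly stochastic, the network average is preserved at every step and each node's value converges to it geometrically.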

87 citations


Journal ArticleDOI
01 Nov 2016
TL;DR: A new on-line partitioning approach, called Clay, that supports both tree-based schemas and more complex "general" schemas with arbitrary foreign key relationships is presented and it is shown that it can generate partitioning schemes that enable the system to achieve up to 15× better throughput and 99% lower latency than existing approaches.
Abstract: Transaction processing database management systems (DBMSs) are critical for today's data-intensive applications because they enable an organization to quickly ingest and query new information. Many of these applications exceed the capabilities of a single server, and thus their database has to be deployed in a distributed DBMS. The key factor affecting such a system's performance is how the database is partitioned. If the database is partitioned incorrectly, the number of distributed transactions can be high. These transactions have to synchronize their operations over the network, which is considerably slower and leads to poor performance. Previous work on elastic database repartitioning has focused on a certain class of applications whose database schema can be represented in a hierarchical tree structure. But many applications cannot be partitioned in this manner, and thus are subject to distributed transactions that impede their performance and scalability. In this paper, we present a new on-line partitioning approach, called Clay, that supports both tree-based schemas and more complex "general" schemas with arbitrary foreign key relationships. Clay dynamically creates blocks of tuples to migrate among servers during repartitioning, placing no constraints on the schema but taking care to balance load and reduce the amount of data migrated. Clay achieves this goal by including in each block a set of hot tuples and other tuples co-accessed with these hot tuples. To evaluate our approach, we integrate Clay in a distributed, main-memory DBMS and show that it can generate partitioning schemes that enable the system to achieve up to 15× better throughput and 99% lower latency than existing approaches.
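Clay's core move, migrating a hot tuple together with the tuples most often co-accessed with it so their transactions become single-server, can be sketched roughly as below. The helper names, the greedy selection, and the least-loaded placement rule are assumptions made for illustration, not Clay's actual algorithm:

```python
from collections import Counter

def build_block(hot_tuple, co_access, max_size=4):
    """Grow a migration block around a hot tuple by pulling in the tuples
    most frequently co-accessed with it (touched by the same transactions)."""
    block = [hot_tuple]
    for t, _count in co_access.get(hot_tuple, Counter()).most_common():
        if len(block) >= max_size:
            break
        block.append(t)
    return block

def place_block(block, load):
    """Move the whole block to the least-loaded server, so transactions
    over these tuples no longer span servers."""
    target = min(load, key=load.get)
    load[target] += len(block)
    return target

# Access log: each transaction lists the tuple keys it touched.
txns = [("w1", "o7"), ("w1", "o7", "c3"), ("w1", "c3"), ("w2", "o9")]
freq = Counter(t for txn in txns for t in txn)
co_access = {}
for txn in txns:
    for t in txn:
        for u in txn:
            if u != t:
                co_access.setdefault(t, Counter())[u] += 1

hot = freq.most_common(1)[0][0]           # "w1": touched by 3 transactions
block = build_block(hot, co_access)
server = place_block(block, load={"s1": 10, "s2": 2})
```

Migrating "w1" alone would leave its transactions distributed; shipping "o7" and "c3" along with it is what removes the cross-server coordination.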

82 citations


Patent
22 Nov 2016
TL;DR: In this paper, the authors present a system and a method for creating a holistic, flexible, scalable, confidential, low-latency, high-volume, immutable distributed ledger for financial services and other industries.
Abstract: A system and a method for creating a holistic, flexible, scalable, confidential, low-latency, high-volume, immutable distributed ledger for the financial services and other industries. The system allows a scalable blockchain solution with respect to accessible memory requirements of distributed ledgers or distributed databases, with confidentiality in the shared records, as well as accommodating low-latency, high-capacity transaction capabilities. The method includes a fundamental, generic, logical representation of financial services life-cycle transactions in terms of variable sets of four simple, sequential components. The optimal process generates a self-validating, variable n-dimensional, multi-hash-linked, interdependent distributed ledger that allows the individual network participants to recreate the ledger without having to refer to or confirm with other network participants.

Journal ArticleDOI
Ningnan Zhou, Wayne Xin Zhao, Xiao Zhang, Ji-Rong Wen, Shan Wang
TL;DR: This paper proposes a Multi-Context Trajectory Embedding Model, called MC-TEM, to explore contexts in a systematic way; to the best of the authors' knowledge, this is the first time that distributed representation learning methods have been applied to trajectory data.
Abstract: The proliferation of location-based social networks, such as Foursquare and Facebook Places, offers a variety of ways to record human mobility, including user-generated geo-tagged contents, check-in services, and mobile apps. Although trajectory data is of great value to many applications, it is challenging to analyze and mine trajectory data due to the complex characteristics reflected in human mobility, which is affected by multiple kinds of contextual information. In this paper, we propose a Multi-Context Trajectory Embedding Model, called MC-TEM, to explore contexts in a systematic way. MC-TEM is developed in the distributed representation learning framework, and it is flexible enough to characterize various kinds of useful contexts for different applications. To the best of our knowledge, this is the first time that distributed representation learning methods have been applied to trajectory data. We formally incorporate multiple kinds of context information of trajectory data into the proposed model, including user-level, trajectory-level, location-level, and temporal contexts. All the context information is represented in the same embedding space. We apply MC-TEM to two challenging tasks, namely location recommendation and social link prediction. We conduct extensive experiments on three real-world datasets. The experimental results demonstrate the superiority of our MC-TEM model over several state-of-the-art methods.

Journal ArticleDOI
TL;DR: A new architecture, DaVe, is introduced to efficiently utilize the potential resources of connected vehicles and to mitigate the congestion problem in other data networks.
Abstract: The promising connected vehicle technologies will enable a huge network of roadside units (RSUs) and vehicles equipped with communication, computing, storage, and positioning devices. Current research on connected vehicle networks focuses on delivering the data generated from or required by the vehicle networks themselves, of which the data traffic is light; thus, the vehicle-network resource utilization efficiency is low. On the other hand, a large amount of delay-tolerant traffic in other data networks consumes significant communication resources. In this paper, we introduce a new architecture, DaVe, to efficiently utilize the potential resources of connected vehicles and to mitigate the congestion problem in other data networks. Delay-tolerant data traffic is offloaded from the data networks to the connected vehicle networks without extra infrastructure/hardware deployment. An optimal distributed data hopping mechanism is also proposed to enable delay-tolerant data routing over connected vehicle networks. We formulate the next-hop decision optimization problem as a partially observable Markov decision process (POMDP) and propose a heuristic algorithm to reduce computational complexity. Extensive simulation results are also presented to demonstrate the significant performance improvement of the proposed scheme.

Journal ArticleDOI
TL;DR: This paper proposes a Selective Data replication mechanism in Distributed Datacenters (SD3), in which a datacenter jointly considers update rates and visit rates to select user data for replication, and further atomizes a user's different types of data for replication, ensuring that a replica always reduces inter-datacenter communication.
Abstract: Though the new online social network (OSN) model, which deploys datacenters globally, helps reduce service latency, it causes higher inter-datacenter communication load. In Facebook, each datacenter has a full copy of all data, and the master datacenter updates all other datacenters, generating tremendous load in this new model. Distributed data storage, which only stores a user's data in his/her geographically closest datacenters, mitigates the problem. However, frequent interactions between distant users lead to frequent inter-datacenter communication and hence long service latencies. In this paper, we aim to reduce inter-datacenter communications while still achieving low service latency. We first verify the benefits of the new model and present typical OSN properties that underlie the basis of our design. We then propose the Selective Data replication mechanism in Distributed Datacenters (SD3). Since replicas need inter-datacenter data updates, datacenters in SD3 jointly consider update rates and visit rates to select user data for replication; furthermore, SD3 atomizes users' different types of data (e.g., status updates, friend posts, music) for replication, ensuring that a replica always reduces inter-datacenter communication. SD3 also incorporates three strategies to further enhance its performance: a locality-aware multicast update tree, replica deactivation, and datacenter congestion control. The results of trace-driven experiments on the real-world PlanetLab testbed demonstrate the higher efficiency and effectiveness of SD3 in comparison to other replication methods and the effectiveness of its three schemes.
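The replicate-or-not trade-off at the heart of SD3 can be expressed as a rate comparison: a remote replica saves one inter-datacenter round trip per visit but costs one update push per write. The cost model and data types below are a hypothetical simplification for illustration, not the paper's exact mechanism:

```python
def should_replicate(visit_rate, update_rate, read_cost=1.0, update_cost=1.0):
    """Replicate only when the inter-datacenter reads saved outweigh the
    extra update traffic the replica would generate."""
    saved = visit_rate * read_cost      # remote reads that become local
    added = update_rate * update_cost   # writes now pushed to the replica
    return saved > added

def select_data_for_replication(user_data, **costs):
    """Atomize a user's data: decide per data type, so a replica is only
    created where it actually reduces traffic."""
    return [d["type"] for d in user_data
            if should_replicate(d["visits"], d["updates"], **costs)]

# One user's per-type rates as seen from a remote datacenter (made-up numbers):
profile = [
    {"type": "status", "visits": 50,  "updates": 40},
    {"type": "photos", "visits": 200, "updates": 5},
    {"type": "music",  "visits": 3,   "updates": 10},
]
replicated = select_data_for_replication(profile)
```

Replicating the whole profile as one unit would force the write-heavy music data along with the read-heavy photos; per-type decisions are what guarantee each replica is a net win.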

Journal ArticleDOI
01 Sep 2016
TL;DR: The architecture of the MemSQL Query Optimizer is described and the design choices and innovations which enable it to quickly produce highly efficient execution plans for complex distributed queries are described.
Abstract: Real-time analytics on massive datasets has become a very common need in many enterprises. These applications require not only rapid data ingest, but also quick answers to analytical queries operating on the latest data. MemSQL is a distributed SQL database designed to exploit a memory-optimized, scale-out architecture to enable real-time transactional and analytical workloads which are fast, highly concurrent, and extremely scalable. Many analytical queries in MemSQL's customer workloads are complex queries involving joins, aggregations, sub-queries, etc. over star and snowflake schemas, often ad-hoc or produced interactively by business intelligence tools. These queries often require latencies of seconds or less, and therefore require the optimizer not only to produce a high-quality distributed execution plan, but also to produce it fast enough so that optimization time does not become a bottleneck. In this paper, we describe the architecture of the MemSQL Query Optimizer and the design choices and innovations which enable it to quickly produce highly efficient execution plans for complex distributed queries. We discuss how query rewrite decisions oblivious of distribution cost can lead to poor distributed execution plans, and argue that to choose high-quality plans in a distributed database, the optimizer needs to be distribution-aware in choosing join plans, applying query rewrites, and costing plans. We discuss methods to make join enumeration faster and more effective, such as a rewrite-based approach to exploit bushy joins in queries involving multiple star schemas without sacrificing optimization time. We demonstrate the effectiveness of the MemSQL optimizer over queries from the TPC-H benchmark and a real customer workload.
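Distribution-aware costing of join plans amounts to comparing the data movement each strategy implies. The broadcast-vs-shuffle model below is a generic illustrative sketch with made-up cost formulas, not MemSQL's actual cost model:

```python
def broadcast_cost(small_rows, num_nodes, row_bytes=100):
    """Broadcast join: ship the entire smaller table to every node."""
    return small_rows * row_bytes * num_nodes

def shuffle_cost(left_rows, right_rows, num_nodes, row_bytes=100):
    """Shuffle (repartition) join: rehash both inputs on the join key;
    on average (n-1)/n of all rows move to a different node."""
    moved = (left_rows + right_rows) * (num_nodes - 1) / num_nodes
    return moved * row_bytes

def choose_join_strategy(left_rows, right_rows, num_nodes):
    small = min(left_rows, right_rows)
    if broadcast_cost(small, num_nodes) <= shuffle_cost(left_rows, right_rows, num_nodes):
        return "broadcast"
    return "shuffle"
```

A tiny dimension table joined to a huge fact table favors broadcasting the dimension; two similarly large tables favor shuffling, since broadcasting either one would replicate it to every node.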

Journal ArticleDOI
TL;DR: This article presents a distributed communication architecture that implements smart grid communications in an efficient and cost-effective way and can manage and analyze data locally, rather than backhauling all raw data to the central operation center, leading to reduced cost and burden on communication resources.
Abstract: One big challenge in building a smart grid arises from the fast growing amount of data and limited communication resources. The traditional centralized communication architecture does not scale well with the explosive increase of data and has a high probability of encountering communication bottlenecks due to long communication paths. To address this challenging issue, this article presents a distributed communication architecture that implements smart grid communications in an efficient and cost-effective way. This distributed architecture consists of multiple distributed operation centers, each of which is connected to several data concentrators serving one local area and only sends summary or required integrated information to a central operation center. Using this distributed architecture, communication distance is much shortened, and thus data will be delivered more efficiently and reliably. In addition, such a distributed architecture can manage and analyze data locally, rather than backhauling all raw data to the central operation center, leading to reduced cost and burden on communication resources. Advanced metering infrastructure is chosen as an example to demonstrate benefits of this architecture on improving communication performance. The distributed communication architecture is also readily applicable to other smart grid applications, for example, demand response management systems.
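The local manage-and-summarize idea can be sketched in a few lines: each distributed operation center reduces its area's raw readings to a summary before anything crosses the backhaul. The specific summary fields below are illustrative assumptions, not drawn from the article:

```python
def concentrator_summary(readings):
    """Each data concentrator reduces its area's raw meter readings to a
    small summary instead of backhauling every sample to the center."""
    return {
        "count": len(readings),
        "total_kwh": sum(readings),
        "peak_kwh": max(readings),
    }

# Raw smart-meter samples per local area (made-up values, kWh):
areas = {
    "north": [1.2, 0.8, 2.5, 1.9],
    "south": [0.5, 3.1, 1.0],
}

to_central = {area: concentrator_summary(r) for area, r in areas.items()}
raw_messages = sum(len(r) for r in areas.values())   # what centralized backhaul sends
summary_messages = len(to_central)                   # what the distributed design sends
```

The communication saving is exactly the ratio of raw samples to summaries, which is what lets the architecture scale as metering data grows.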

Journal ArticleDOI
TL;DR: The architecture of DiploCloud, its main data structures, as well as the new algorithms the authors use to partition and distribute data are described, showing that the system is often two orders of magnitude faster than state-of-the-art systems on standard workloads.
Abstract: Despite recent advances in distributed RDF data management, processing large-amounts of RDF data in the cloud is still very challenging. In spite of its seemingly simple data model, RDF actually encodes rich and complex graphs mixing both instance and schema-level data. Sharding such data using classical techniques or partitioning the graph using traditional min-cut algorithms leads to very inefficient distributed operations and to a high number of joins. In this paper, we describe DiploCloud, an efficient and scalable distributed RDF data management system for the cloud. Contrary to previous approaches, DiploCloud runs a physiological analysis of both instance and schema information prior to partitioning the data. In this paper, we describe the architecture of DiploCloud, its main data structures, as well as the new algorithms we use to partition and distribute data. We also present an extensive evaluation of DiploCloud showing that our system is often two orders of magnitude faster than state-of-the-art systems on standard workloads.

Proceedings ArticleDOI
26 Jun 2016
TL;DR: Results of an extensive experimental evaluation show that the LEAP-based engines are superior to H-Store by a wide margin, especially for workloads that exhibit locality-based data accesses.
Abstract: Shared-nothing architecture has been widely used in distributed databases to achieve good scalability. While it offers superior performance for local transactions, the overhead of processing distributed transactions can degrade the system performance significantly. The key contributor to the degradation is the expensive two-phase commit (2PC) protocol used to ensure atomic commitment of distributed transactions. In this paper, we propose a transaction management scheme called LEAP to avoid the 2PC protocol within distributed transaction processing. Instead of processing a distributed transaction across multiple nodes, LEAP converts the distributed transaction into a local transaction. This benefits the processing locality and facilitates adaptive data repartitioning when there is a change in data access pattern. Based on LEAP, we develop an online transaction processing (OLTP) system, L-Store, and compare it with the state-of-the-art distributed in-memory OLTP system, H-Store, which relies on the 2PC protocol for distributed transaction processing, and H^L-Store, an H-Store that has been modified to make use of LEAP. Results of an extensive experimental evaluation show that our LEAP-based engines are superior to H-Store by a wide margin, especially for workloads that exhibit locality-based data accesses.
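LEAP's conversion of a distributed transaction into a local one can be caricatured as ownership migration: before the transaction runs, every record it touches is pulled to the coordinating node, so the commit needs no 2PC round trips. This toy `Cluster` class is an illustration of the idea only, not L-Store's implementation:

```python
class Cluster:
    """Toy LEAP-style executor: migrate record ownership to one node,
    then run and commit the transaction entirely locally (no 2PC)."""
    def __init__(self, placement, values):
        self.owner = dict(placement)   # key -> owning node
        self.values = dict(values)
        self.migrations = 0            # cross-node record transfers performed

    def execute(self, node, keys, fn):
        for k in keys:
            if self.owner[k] != node:
                self.owner[k] = node   # pull the record (and ownership) over
                self.migrations += 1
        updates = fn({k: self.values[k] for k in keys})
        self.values.update(updates)    # single-node commit

c = Cluster(placement={"a": "n1", "b": "n2"}, values={"a": 100, "b": 0})
# Transfer 30 from a to b, coordinated at n1; "b" is migrated from n2 first.
c.execute("n1", ["a", "b"], lambda r: {"a": r["a"] - 30, "b": r["b"] + 30})
```

The adaptive-repartitioning benefit falls out for free: once "b" has been pulled to n1, a repeat of the same transaction is fully local and triggers no further migrations.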

Journal ArticleDOI
TL;DR: The proposed ROSCC can be applied to enhance EOD sharing in a cloud computing context, so as to achieve soil moisture mapping via the modified perpendicular drought index in an efficient way to better serve precision agriculture.
Abstract: The inversion of remote sensing images is crucial for soil moisture mapping in precision agriculture. However, the large size of remote sensing images complicates their management. Therefore, this study proposes a remote sensing observation sharing method based on cloud computing (ROSCC) to enhance remote sensing observation storage, processing, and service capability. The ROSCC framework consists of a cloud computing-enabled sensor observation service tier, a web processing service tier, and a distributed database tier. Using MongoDB as the distributed database and Apache Hadoop as the cloud computing service, this study achieves a high-throughput method for remote sensing observation storage and distribution. The map and reduce algorithms and the table structure design in the distributed database are then explained. Along the Yangtze River, the longest river in China, Hubei Province was selected as the study area to test the proposed framework. Using GF-1 as a data source, an experiment was performed to enhance earth observation data (EOD) storage and achieve large-scale soil moisture mapping. The proposed ROSCC can be applied to enhance EOD sharing in a cloud computing context, so as to achieve soil moisture mapping via the modified perpendicular drought index in an efficient way to better serve precision agriculture.

Journal ArticleDOI
TL;DR: This paper presents Map-Reduce implementations of two well-known process mining algorithms to take advantage of the scalability of the Map-Reduce approach, and presents the design of a series of mappers and reducers to compute the log-based ordering relations from distributed event logs.
Abstract: Process discovery is an approach to extract process models from event logs. Given the distributed nature of modern information systems, event logs are likely to be distributed across different physical machines. Map-Reduce is a scalable approach for efficient computations on distributed data. In this paper we present Map-Reduce implementations of two well-known process mining algorithms to take advantage of the scalability of the Map-Reduce approach. We present the design of a series of mappers and reducers to compute the log-based ordering relations from distributed event logs. These can then be used to discover a process model. We provide experimental results that show the performance and scalability of our implementations.
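The log-based ordering relations decompose naturally into map and reduce steps: each mapper emits the directly-follows pairs it sees in its traces, and a reducer sums the counts. A minimal sketch with plain Python standing in for the Map-Reduce framework (the directly-follows relation shown here is the building block used by discovery algorithms such as the alpha miner; the trace data is made up):

```python
from collections import Counter
from functools import reduce

def mapper(trace):
    """Emit each directly-follows pair (a, b) observed in one trace,
    i.e. activity b immediately follows activity a."""
    return Counter(zip(trace, trace[1:]))

def reducer(c1, c2):
    """Combine partial counts from different mappers."""
    return c1 + c2

# Event log split across two machines; each trace is one case's ordered events.
partition1 = [["register", "check", "pay"], ["register", "pay"]]
partition2 = [["register", "check", "check", "pay"]]

follows = reduce(reducer, (mapper(t) for t in partition1 + partition2), Counter())
```

Because the pair counts are purely additive, partitions can be processed on different machines and merged in any order, which is exactly what makes the computation Map-Reduce friendly.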

Journal ArticleDOI
TL;DR: Theoretical analysis shows that the proposed set of secure building blocks and outsourced collaborative kNN protocol not only preserves the privacy of distributed databases and kNN query but also hides access patterns in the semi-honest model.
Abstract: With the advent of the big data era, clients lacking computational and storage resources tend to outsource data mining tasks to cloud computing providers in order to improve efficiency and save costs. Generally, different clients choose different cloud companies for the sake of security, business cooperation, location, and so on. However, due to the rise of privacy leakage issues, the data contributed by clients should be encrypted under their own keys. This paper focuses on privacy-preserving k-nearest neighbor (kNN) computation over databases distributed among multiple cloud environments. Unfortunately, existing secure outsourcing protocols are either restricted to a single-key setting or quite inefficient because of frequent client-to-server interactions, making them impractical for wide application. To address these issues, we propose a set of secure building blocks and an outsourced collaborative kNN protocol. Theoretical analysis shows that our scheme not only preserves the privacy of distributed databases and kNN queries but also hides access patterns in the semi-honest model. Experimental evaluation demonstrates its significant efficiency improvements compared with existing methods.

Proceedings ArticleDOI
23 May 2016
TL;DR: This paper presents parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures and outperforms earlier implementations by more than an order of magnitude, thereby radically improving the applicability of the implementation to state-of-the-art Big Data analytics problems.
Abstract: Computing k-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based O(log n) algorithms have been proposed for computing KNN, due to their inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning search space and improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications: astrophysics, plasma physics, and particle physics, we show that our implementation can construct a kd-tree of 189 billion particles in 48 seconds utilizing ~50,000 cores. We also demonstrate computation of KNN for 19 billion queries in 12 seconds. We demonstrate almost linear speedup for both shared and distributed memory computers. Our algorithms outperform earlier implementations by more than an order of magnitude, thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems.
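A minimal sequential kd-tree (construction and nearest-neighbor query) illustrates the O(log n) search the paper parallelizes; the pruning test against the splitting plane is the serial analogue of the paper's distributed search-space pruning:

```python
def build(points, depth=0):
    """Build a kd-tree by splitting on the median along alternating axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    """Return (squared_distance, point) of the nearest neighbor to query."""
    if node is None:
        return best
    d2 = sum((a - b) ** 2 for a, b in zip(node["point"], query))
    if best is None or d2 < best[0]:
        best = (d2, node["point"])
    axis = node["axis"]
    diff = query[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    if diff * diff < best[0]:  # prune: visit far subtree only if the plane is closer than best
        best = nearest(far, query, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2))[1])  # (8, 1)
```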

Patent
21 Dec 2016
TL;DR: In this article, a distributed database at a first compute device configured to be included within a set of compute devices that implement the distributed database is described, and a processor is configured to define a first event linked to a first set of events.
Abstract: In some embodiments, an apparatus includes an instance of a distributed database at a first compute device configured to be included within a set of compute devices that implement the distributed database. The apparatus also includes a processor configured to define a first event linked to a first set of events. The processor is configured to receive, from a second compute device from the set of compute devices, a signal representing a second event (1) defined by the second compute device and (2) linked to a second set of events. The processor is configured to identify an order associated with a third set of events based at least in part on a result of a protocol. The processor is configured to store in the instance of the distributed database the order associated with the third set of events.
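The hash-linked event structure the patent describes can be sketched as follows; the field names are illustrative, and the deterministic sort at the end is only a stand-in for the consensus-ordering protocol, which the patent does not reduce to a simple rule:

```python
import hashlib
import json

def make_event(payload, self_parent, other_parent):
    """An event links to this device's previous event (self-parent) and to
    an event received from another device (other-parent)."""
    body = json.dumps([payload, self_parent, other_parent], sort_keys=True)
    return {"id": hashlib.sha256(body.encode()).hexdigest(),
            "payload": payload,
            "self_parent": self_parent,
            "other_parent": other_parent}

e1 = make_event("tx1", None, None)           # first event at device A
e2 = make_event("tx2", None, None)           # first event at device B
e3 = make_event("tx3", e1["id"], e2["id"])   # A's next event, linking both

# Placeholder for the protocol's consensus order: any deterministic rule
# over the event graph yields the same order at every device.
order = sorted([e1, e2, e3], key=lambda e: e["id"])
print([e["self_parent"] is not None for e in order].count(True))  # 1
```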

Journal ArticleDOI
01 Jan 2016
TL;DR: OverFlow is introduced, a uniform data management system for scientific workflows running across geographically distributed sites, aiming to reap economic benefits from this geo-diversity, as it monitors and models the global cloud infrastructure, offering high and predictable data handling performance for transfer cost and time, within and across sites.
Abstract: The global deployment of cloud datacenters is enabling large-scale scientific workflows to improve performance and deliver fast responses. This unprecedented geographical distribution of the computation is accompanied by an increase in the scale of the data handled by such applications, bringing new challenges related to efficient data management across sites. High throughput, low latencies and cost-related trade-offs are just a few concerns for both cloud providers and users when it comes to handling data across datacenters. Existing solutions are limited to cloud-provided storage, which offers low performance based on rigid cost schemes. In turn, workflow engines need to improvise substitutes, achieving performance at the cost of complex system configurations, maintenance overheads, and reduced reliability and reusability. In this paper, we introduce OverFlow, a uniform data management system for scientific workflows running across geographically distributed sites, aiming to reap economic benefits from this geo-diversity. Our solution is environment-aware: it monitors and models the global cloud infrastructure, offering high and predictable data handling performance for transfer cost and time, within and across sites. OverFlow proposes a set of pluggable services, grouped in a data scientist cloud kit. They provide applications with the possibility to monitor the underlying infrastructure, to exploit smart data compression, deduplication and geo-replication, to evaluate data management costs, to set a trade-off between money and time, and to optimize the transfer strategy accordingly. The system was validated on the Microsoft Azure cloud across its 6 EU and US datacenters. The experiments were conducted on hundreds of nodes using synthetic benchmarks and real-life bio-informatics applications (A-Brain, BLAST).
The results show that our system is able to accurately model cloud performance and to leverage this for efficient data dissemination, reducing monetary costs and transfer time by up to three times.
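The money/time trade-off the abstract mentions can be sketched as a simple feasibility-then-cost selection; the option names, throughputs and prices below are hypothetical, not OverFlow's actual cost model:

```python
def pick_transfer(options, size_gb, deadline_s):
    """Pick the cheapest transfer option that still meets the deadline.
    Transfer time = size_gb * 8 gigabits / throughput in Gbps."""
    feasible = [o for o in options
                if size_gb * 8 / o["gbps"] <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o["usd_per_gb"] * size_gb)

options = [
    {"name": "direct",     "gbps": 1.0, "usd_per_gb": 0.12},
    {"name": "multi-path", "gbps": 4.0, "usd_per_gb": 0.19},
]
# 100 GB with a 15-minute deadline: "direct" takes 800 s and is cheaper
print(pick_transfer(options, size_gb=100, deadline_s=900)["name"])  # direct
```

Tightening the deadline flips the choice to the faster, more expensive path, which is the essence of the cost/time trade-off the system exposes.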

Posted Content
TL;DR: In this article, the authors present the design of a scalable database system NAM-DB and show that distributed transactions with the very common Snapshot Isolation guarantee can indeed scale using the next generation of RDMA-enabled network technology without any inherent bottlenecks.
Abstract: The common wisdom is that distributed transactions do not scale. But what if distributed transactions could be made scalable using the next generation of networks and a redesign of distributed databases? Developers would no longer need to worry about co-partitioning schemes to achieve decent performance. Application development would become easier as data placement would no longer determine how scalable an application is. Hardware provisioning would be simplified as the system administrator could expect a linear scale-out when adding more machines rather than some complex sub-linear function, which is highly application specific. In this paper, we present the design of our novel scalable database system NAM-DB and show that distributed transactions with the very common Snapshot Isolation guarantee can indeed scale using the next generation of RDMA-enabled network technology without any inherent bottlenecks. Our experiments with the TPC-C benchmark show that our system scales linearly to over 6.5 million new-order (14.5 million total) distributed transactions per second on 56 machines.
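Snapshot Isolation itself is easy to state in code. The single-process sketch below shows the abstraction NAM-DB implements over RDMA: a timestamp oracle, versioned reads as of the transaction's start timestamp, and an abort on write-write conflict; all names and the in-memory design are illustrative:

```python
class SnapshotStore:
    """Toy multi-version store with Snapshot Isolation semantics."""

    def __init__(self):
        self.clock = 0       # the "timestamp oracle"
        self.versions = {}   # key -> [(commit_ts, value), ...] ascending

    def begin(self):
        return {"start": self.clock, "writes": {}}

    def read(self, txn, key):
        """Read the newest version committed at or before the snapshot."""
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= txn["start"]:
                return value
        return None

    def commit(self, txn):
        """Abort on write-write conflict with a concurrently committed txn."""
        for key in txn["writes"]:
            versions = self.versions.get(key, [])
            if versions and versions[-1][0] > txn["start"]:
                return False  # someone committed this key after we started
        self.clock += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True

db = SnapshotStore()
t1, t2 = db.begin(), db.begin()
t1["writes"]["x"] = 1
t2["writes"]["x"] = 2
ok1, ok2 = db.commit(t1), db.commit(t2)
print(ok1, ok2)  # True False  (second concurrent writer aborts)
```

The paper's contribution is making the oracle and version reads scale over RDMA, not the semantics themselves.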

Journal ArticleDOI
TL;DR: A novel collaboration- and fairness-aware big data management problem in distributed cloud environments is studied, which aims to maximize the system throughput while minimizing the operational cost of service providers to achieve that throughput, subject to resource capacity and user fairness constraints.
Abstract: With the advancement of information and communication technology, data are being generated at an exponential rate via various instruments and collected at an unprecedented scale. Such large volumes of generated data are referred to as big data, which is now revolutionizing all aspects of our life ranging from enterprises to individuals, from science communities to governments, as it exhibits great potential to improve the efficiency of enterprises and the quality of life. To obtain nontrivial patterns and derive valuable information from big data, a fundamental problem is how to properly place the data collected by different users onto distributed clouds and to efficiently analyze the collected data to save user costs in data storage and processing, particularly the cost savings of users who share data. Doing so requires close collaboration among users in sharing and utilizing the big data in distributed clouds, due to the complexity and volume of big data. Since computing, storage and bandwidth resources in a distributed cloud are usually limited, and such resource provisioning is typically expensive, collaborating users must make use of the resources fairly. In this paper, we study a novel collaboration- and fairness-aware big data management problem in distributed cloud environments that aims to maximize the system throughput, while minimizing the operational cost of service providers to achieve that throughput, subject to resource capacity and user fairness constraints. We first propose a novel optimization framework for the problem. We then devise a fast yet scalable approximation algorithm based on this optimization framework. We also analyze the time complexity and approximation ratio of the proposed algorithm. We finally conduct experiments through simulations to evaluate the performance of the proposed algorithm. Experimental results demonstrate that the proposed algorithm is promising, and outperforms other heuristics.

Proceedings ArticleDOI
01 Dec 2016
TL;DR: In this paper, the tradeoff between storage (per worker) and worst-case communication overhead for the data shuffling problem is studied, and it is shown that increasing the storage across workers can reduce the communication overhead by leveraging coding.
Abstract: Data shuffling is one of the fundamental building blocks of distributed learning algorithms, as it increases the statistical gain of each step of the learning process. In each iteration, different shuffled data points are assigned by a central node to a distributed set of workers to perform local computation, which leads to communication bottlenecks. The focus of this paper is on formalizing and understanding the fundamental information-theoretic tradeoff between storage (per worker) and the worst-case communication overhead for the data shuffling problem. We completely characterize the information-theoretic tradeoff for K = 2 and K = 3 workers, for any value of storage capacity, and show that increasing the storage across workers can reduce the communication overhead by leveraging coding. We propose a novel and systematic data delivery and storage update strategy for each data shuffle iteration, which preserves the structural properties of the storage across the workers, and aids in minimizing the communication overhead in subsequent data shuffling iterations.
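The coding gain can be seen in the smallest instance, K = 2 workers. Suppose worker 1 caches block B, worker 2 caches block A, and after the shuffle worker 1 needs A and worker 2 needs B; broadcasting the single coded block A XOR B serves both workers, halving the communication versus sending A and B separately (a sketch of the idea, not the paper's general scheme):

```python
def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

A, B = b"block-A!", b"block-B!"
coded = xor(A, B)                  # one broadcast instead of two unicasts

recovered_by_w1 = xor(coded, B)    # worker 1 decodes with its cached B
recovered_by_w2 = xor(coded, A)    # worker 2 decodes with its cached A
print(recovered_by_w1, recovered_by_w2)  # b'block-A!' b'block-B!'
```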

Journal ArticleDOI
01 Sep 2016
TL;DR: This paper describes schemes for efficient clustering and data partitioning for the automatic scale-out of processing across multiple nodes, and for optimizing the usage of CPUs, DRAM, SSDs and networks to efficiently scale up performance on a single node.
Abstract: In this paper, we describe the solutions developed to address key technical challenges encountered while building a distributed database system that can smoothly handle demanding real-time workloads and provide a high level of fault tolerance. Specifically, we describe schemes for efficient clustering and data partitioning for the automatic scale-out of processing across multiple nodes, and for optimizing the usage of CPUs, DRAM, SSDs and networks to efficiently scale up performance on a single node. The techniques described here were used to develop Aerospike (formerly Citrusleaf), a high-performance distributed database system built to handle the needs of today's interactive online services. Most real-time decision systems that use Aerospike require very high scale and need to make decisions within a strict SLA by reading from, and writing to, a database containing billions of data items, at a rate of millions of operations per second with sub-millisecond latency. For over five years, Aerospike has been continuously used in over a hundred successful production deployments, as many enterprises have discovered that it can substantially enhance their user experience.
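The data-partitioning scheme for automatic scale-out can be sketched as hashing keys into a fixed set of partitions and mapping partitions to nodes via a partition map (Aerospike itself uses 4096 partitions and a RIPEMD-160 key hash; the constants and round-robin map here are illustrative):

```python
import hashlib

N_PARTITIONS = 16
NODES = ["node-0", "node-1", "node-2"]

def partition_of(key):
    """Hash a key into a fixed partition, independent of cluster size."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS

# Partition map: which node owns each partition; only this map changes
# when nodes join or leave, so keys never need to be re-hashed.
partition_map = {p: NODES[p % len(NODES)] for p in range(N_PARTITIONS)}

def node_for(key):
    return partition_map[partition_of(key)]

print(node_for("user:42") in NODES)  # True
```

Keeping the key-to-partition mapping fixed and only reassigning partitions to nodes is what makes rebalancing cheap when the cluster changes.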

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This paper proposes a complete solution called AutoReplica — a replica manager in distributed caching and data processing systems with SSD-HDD tier storage, and proposes a migrate-on-write technique called "fusion cache" to seamlessly migrate and prefetch among local and remote replicas without pausing the subsystem.
Abstract: Nowadays, replication is widely used in data center storage systems for large-scale Cyber-Physical Systems (CPS) to prevent data loss. However, the main side-effect of replication is the overhead of extra network and I/O traffic, which inevitably degrades the overall I/O performance of the cluster. To effectively balance the trade-off between I/O performance and fault tolerance, in this paper we propose a complete solution called "AutoReplica" — a replica manager in distributed caching and data processing systems with SSD-HDD tier storage. In detail, AutoReplica utilizes remote SSDs (connected by high-speed fibers) to replicate local SSD caches to protect data. In order to conduct load balancing among nodes and reduce the network overhead, we propose three approaches (i.e., ring, network, and multiple-SLA network) to automatically set up the cross-node replica structure with consideration of network traffic, I/O speed and SLAs. To improve performance during migrations triggered by load balancing and failure recovery, we propose a migrate-on-write technique called "fusion cache" to seamlessly migrate and prefetch among local and remote replicas without pausing the subsystem. Moreover, AutoReplica can recover from different failure scenarios while limiting the degree of performance degradation. Lastly, AutoReplica supports parallel prefetching from multiple nodes with a new dynamic optimizing streaming technique to improve I/O performance. We are currently in the process of implementing AutoReplica to be easily plugged into commonly used distributed caching systems, and solidifying our design and implementation details.
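The simplest of the three replica structures, the ring, can be sketched as each node replicating its local SSD cache to its successors on a ring; the network- and multiple-SLA variants would weight this assignment by measured traffic and SLAs, which this minimal sketch omits:

```python
def ring_replicas(nodes, copies=1):
    """Assign each node's replicas to its next `copies` successors on a ring."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i + j) % n] for j in range(1, copies + 1)]
            for i in range(n)}

layout = ring_replicas(["n0", "n1", "n2", "n3"], copies=2)
print(layout["n2"])  # ['n3', 'n0']  (wraps around the ring)
```

The ring spreads replica traffic evenly (every node hosts exactly `copies` remote replicas) without any global coordination.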

Proceedings ArticleDOI
23 Aug 2016
TL;DR: This work proposes a novel privacy-preserving k-means algorithm based on a simple yet secure and efficient multiparty additive scheme that is cryptography-free and designed for horizontally partitioned data.
Abstract: Recent advances in sensing and storage technologies have led to the big data age, where huge amounts of data are distributed across sites to be stored and analysed. Cluster analysis is one of the data mining tasks that aim to discover patterns and knowledge through algorithmic techniques such as k-means. Nevertheless, running k-means over distributed big data stores has given rise to serious privacy issues. Accordingly, many proposed works have attempted to tackle this concern using cryptographic protocols. However, these cryptographic solutions introduce performance degradation in analysis tasks, which is at odds with big data requirements. In this work, we propose a novel privacy-preserving k-means algorithm based on a simple yet secure and efficient multiparty additive scheme that is cryptography-free. We designed our solution for horizontally partitioned data. Moreover, we demonstrate that our scheme resists adversaries in the passive model.
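The core of a cryptography-free multiparty additive scheme is a secure sum: each site splits its private partial sum (e.g., the sum of points it assigned to a cluster) into random additive shares, one per site, so only the global sum needed for the centroid update is ever revealed. A toy version under that standard construction (not necessarily the paper's exact protocol):

```python
import random

def share(value, n_parties, modulus=2**31):
    """Split a value into n_parties random shares that sum to it mod modulus."""
    shares = [random.randrange(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares

def secure_sum(private_values, modulus=2**31):
    """Each party shares its value; partial sums of shares reveal only the total."""
    n = len(private_values)
    all_shares = [share(v, n, modulus) for v in private_values]
    # Party i sums the i-th share it received from every party, then the
    # partials are combined; no single partial leaks any party's input.
    partials = [sum(s[i] for s in all_shares) % modulus for i in range(n)]
    return sum(partials) % modulus

sites = [120, 45, 300]   # per-site partial sums for one centroid coordinate
print(secure_sum(sites))  # 465, computed without any site revealing its input
```

Running this per cluster (for both coordinate sums and counts) yields exact global centroids while each site's contribution stays hidden from passive adversaries.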