Open Access, Proceedings Article

A Scalable Data Platform for a Large Number of Small Applications.

TLDR
This work explores a new point in the design space whereby commodity hardware and free software are used to scale to a large number of applications while still supporting full SQL functionality, transactional guarantees, high availability and Service Level Agreements (SLAs).
Abstract
As a growing number of websites open up their APIs to external application developers (e.g., Facebook, Yahoo! Widgets, Google Gadgets), these websites are facing an intriguing scalability problem: while each user-generated application is by itself quite small (in terms of size and throughput requirements), there are many, many such applications. Unfortunately, existing data-management solutions are not designed to handle this form of scalability in a cost-effective, manageable and/or flexible manner. For instance, large installations of commercial database systems such as Oracle, DB2 and SQL Server are usually very expensive and difficult to manage. At the other extreme, low-cost hosted data-management solutions such as Amazon's SimpleDB do not support sophisticated data-manipulation primitives such as joins that are necessary for developing most Web applications. To address this issue, we explore a new point in the design space whereby we use commodity hardware and free software (MySQL) to scale to a large number of applications while still supporting full SQL functionality, transactional guarantees, high availability and Service Level Agreements (SLAs). We do so by exploiting the key property that each application is "small" and can fit in a single machine (which can possibly be shared with other applications). Using this property, we design replication strategies, data migration techniques and load balancing operations that automate the tasks that would otherwise contribute to the operational and management complexity of dealing with a large number of applications. Our experiments based on the TPC-W benchmark suggest that the proposed system can scale to a large number of small applications.


Citations
Proceedings Article

Megastore: Providing Scalable, Highly Available Storage for Interactive Services

TL;DR: Megastore provides fully serializable ACID semantics within fine-grained partitions of data, which allows it to synchronously replicate each write across a wide area network with reasonable latency and to support seamless failover between datacenters.
Proceedings Article

Big data and cloud computing: current state and future opportunities

TL;DR: This tutorial presents an organized picture of the challenges faced by application developers and DBMS designers in developing and deploying internet-scale applications. It crystallizes the design choices made by some successful large-scale database management systems, analyzes the application demands and access patterns, and enumerates the desiderata for a cloud-bound DBMS.
Proceedings Article

Mirror mirror on the ceiling: flexible wireless links for data centers

TL;DR: 3D beamforming is proposed and evaluated, where 60 GHz signals bounce off data center ceilings, establishing indirect line-of-sight between any two racks and improving link range and the number of concurrent transmissions in the data center.
Proceedings Article

G-Store: a scalable data store for transactional multi key access in the cloud

TL;DR: G-Store is designed and implemented; it uses a key-value store as an underlying substrate to provide efficient, scalable, and transactional multi-key access while preserving the desired properties of key-value stores.
Proceedings Article

Zephyr: live migration in shared nothing databases for elastic cloud platforms

TL;DR: Zephyr is proposed, a technique to efficiently migrate a live database in a shared-nothing transactional database architecture. It uses phases of on-demand pull and asynchronous push of data, requires minimal synchronization, provides ACID guarantees during migration, and ensures correctness in the presence of failures.