scispace - formally typeset
Search or ask a question
Author

Ju Wang

Bio: Ju Wang is an academic researcher from Microsoft. The author has contributed to research in topics: Virtual machine & Scalability. The author has an hindex of 20, co-authored 44 publications receiving 1944 citations. Previous affiliations of Ju Wang include University of California, San Diego.

Papers
More filters
Proceedings ArticleDOI
23 Oct 2011
TL;DR: The WAS architecture, global namespace, and data model is described, as well as its resource provisioning, load balancing, and replication systems.
Abstract: Windows Azure Storage (WAS) is a cloud storage system that provides customers the ability to store seemingly limitless amounts of data for any duration of time. WAS customers have access to their data from anywhere at any time and only pay for what they use and store. In WAS, data is stored durably using both local and geographic replication to facilitate disaster recovery. Currently, WAS storage comes in the form of Blobs (files), Tables (structured storage), and Queues (message delivery). In this paper, we describe the WAS architecture, global namespace, and data model, as well as its resource provisioning, load balancing, and replication systems.

871 citations

Patent
23 Oct 2009
TL;DR: In this article, a table is partitioned into a number of partitions, each partition including a contiguous range of rows, and the partitions are served by table servers and managed by a table master.
Abstract: Partition management for a scalable, structured storage system is provided. The storage system provides storage represented by one or more tables, each of which includes rows that represent data entities. A table is partitioned into a number of partitions, each partition including a contiguous range of rows. The partitions are served by table servers and managed by a table master. Load distribution information for the table servers and partitions is tracked, and the table master determines to split and/or merge partitions based on the load distribution information.

126 citations

Patent
28 Dec 2012
TL;DR: In this paper, a platform as a service (PaaS) is used to provide resources by way of a distributed computing environment to perform a job, where the system may be comprised of a number of components, such as a task machine, a task location service machine, and a high level location service machines.
Abstract: In various embodiments, systems and methods are presented for providing resources by way of a platform as a service in a distributed computing environment to perform a job. The system may be comprised of a number of components, such as a task machine, a task location service machine, and a high-level location service machines that in combination are useable to accomplish functions provided herein. It is contemplated that the system performs methods for providing resources by determining resources of the system, such as virtual machines, and applying auto-scaling rules to the system to scale those resources. Based on the determination of the auto-scaling rules, the resources may be allocated to achieve a desired result.

109 citations

Patent
07 Jan 2013
TL;DR: In this article, the authors present a platform as a service (PaaS) based approach for providing resources by way of a platform-as-a-service in a distributed computing environment to perform a job.
Abstract: Systems and methods are presented for providing resources by way of a platform as a service in a distributed computing environment to perform a job. Resources of the system, job performing on the system, and schedulers of the jobs performing on the system are decoupled in a manner that allows a job to easily migrate among resources. It is contemplated that the migration of jobs from a first pool of resource to a second pool of resource is performed by the system without human intervention. The migration of a job may utilize different schedulers for the different resources. Further, it is contemplated that a pool of resources may automatically allocate additional or fewer resources in response to a migration of a job.

96 citations

Patent
09 Jan 2012
TL;DR: In this paper, the authors present a system for providing resources by way of a platform as a service (PaaS) in a distributed computing environment to perform a job, where a user may submit a work item to the system that results in a job being processed on a pool of virtual machines.
Abstract: Systems and methods are presented for providing resources by way of a platform as a service in a distributed computing environment to perform a job. A user may submit a work item to the system that results in a job being processed on a pool of virtual machines. The pool may be automatically established by the system in response to the work item and other information associated with the work item, the user, and/or the account. Further, it is contemplated that resources associated with the pool, such as virtual machines, may be automatically allocated based, at least in part, on information associated with the work item, the user, the account, the pool, and/or the system.

91 citations


Cited by
More filters
Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Proceedings Article
13 Jun 2012
TL;DR: This paper describes how LRC is used in WAS to provide low overhead durable storage with consistently low read latencies, and introduces a new set of codes for erasure coding called Local Reconstruction Codes (LRC).
Abstract: Windows Azure Storage (WAS) is a cloud storage system that provides customers the ability to store seemingly limitless amounts of data for any duration of time WAS customers have access to their data from anywhere, at any time, and only pay for what they use and store To provide durability for that data and to keep the cost of storage low, WAS uses erasure coding In this paper we introduce a new set of codes for erasure coding called Local Reconstruction Codes (LRC) LRC reduces the number of erasure coding fragments that need to be read when reconstructing data fragments that are offline, while still keeping the storage overhead low The important benefits of LRC are that it reduces the bandwidth and I/Os required for repair reads over prior codes, while still allowing a significant reduction in storage overhead We describe how LRC is used in WAS to provide low overhead durable storage with consistently low read latencies

1,002 citations

Proceedings ArticleDOI
03 Nov 2013
TL;DR: It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.
Abstract: Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, applications that require all three have relied on multiple platforms, at the expense of efficiency, maintainability, and simplicity. Naiad resolves the complexities of combining these features in one framework.A new computational model, timely dataflow, underlies Naiad and captures opportunities for parallelism across a wide class of algorithms. This model enriches dataflow computation with timestamps that represent logical points in the computation and provide the basis for an efficient, lightweight coordination mechanism.We show that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining. Naiad outperforms specialized systems in their target application domains, and its unique features enable the development of new high-performance applications.

779 citations

Journal ArticleDOI
TL;DR: This paper discusses approaches and environments for carrying out analytics on Clouds for Big Data applications, and identifies possible gaps in technology and provides recommendations for the research community on future directions on Cloud-supported Big Data computing and analytics solutions.

773 citations

Journal ArticleDOI
01 Mar 2013
TL;DR: In this article, the authors present a family of erasure codes that are efficient repairable and offer higher reliability compared to Reed-Solomon codes, which is the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability.
Abstract: Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability.This paper shows how to overcome this limitation. We present a novel family of erasure codes that are efficiently repairable and offer higher reliability compared to Reed-Solomon codes. We show analytically that our codes are optimal on a recently identified tradeoff between locality and minimum distance.We implement our new codes in Hadoop HDFS and compare to a currently deployed HDFS module that uses Reed-Solomon codes. Our modified HDFS implementation shows a reduction of approximately 2× on the repair disk I/O and repair network traffic. The disadvantage of the new coding scheme is that it requires 14% more storage compared to Reed-Solomon codes, an overhead shown to be information theoretically optimal to obtain locality. Because the new codes repair failures faster, this provides higher reliability, which is orders of magnitude higher compared to replication.

742 citations