Journal ArticleDOI

Network Coding for Distributed Storage Systems

01 Sep 2010 - IEEE Transactions on Information Theory (Institute of Electrical and Electronics Engineers Inc.) - Vol. 56, Iss. 9, pp. 4539-4551
TL;DR: It is shown that there is a fundamental tradeoff between storage and repair bandwidth, which is theoretically characterized using flow arguments on an appropriately constructed graph, and regenerating codes are introduced that can achieve any point on this optimal tradeoff.
Abstract: Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes. Application scenarios include data centers, peer-to-peer storage systems, and storage in wireless networks. Storing data using an erasure code, in fragments spread across nodes, requires less redundancy than simple replication for the same level of reliability. However, since fragments must be periodically replaced as nodes fail, a key question is how to generate encoded fragments in a distributed way while transferring as little data as possible across the network. For an erasure coded system, a common practice to repair from a single node failure is for a new node to reconstruct the whole encoded data object to generate just one encoded block. We show that this procedure is sub-optimal. We introduce the notion of regenerating codes, which allow a new node to communicate functions of the stored data from the surviving nodes. We show that regenerating codes can significantly reduce the repair bandwidth. Further, we show that there is a fundamental tradeoff between storage and repair bandwidth which we theoretically characterize using flow arguments on an appropriately constructed graph. By invoking constructive results in network coding, we introduce regenerating codes that can achieve any point in this optimal tradeoff.
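A minimal sketch of the two extreme points of the storage versus repair-bandwidth tradeoff described in the abstract. The closed-form expressions and the symbols M (file size), k (nodes sufficient to reconstruct the file), d (surviving nodes contacted during repair), alpha (storage per node), beta (download per helper) and gamma = d * beta (total repair download) follow the usual statement of this result; none of them appear in the abstract itself, so treat them as assumptions here.

```python
from fractions import Fraction


def msr_point(M, k, d):
    """Minimum-storage regenerating point: returns (alpha, gamma)."""
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(M, k, d):
    """Minimum-bandwidth regenerating point: returns (alpha, gamma)."""
    alpha = Fraction(2 * M * d, 2 * k * d - k * k + k)
    return alpha, alpha  # at this point storage equals repair download


def cut_set_capacity(alpha, beta, k, d):
    """Min-cut value of the flow argument: sum over i < k of min(alpha, (d - i) * beta).

    A file of size M can be recovered from any k nodes only if this is >= M.
    """
    return sum(min(alpha, (d - i) * beta) for i in range(k))


if __name__ == "__main__":
    M, k, d = 1, 2, 3
    for name, point in (("MSR", msr_point), ("MBR", mbr_point)):
        alpha, gamma = point(M, k, d)
        print(name, "alpha =", alpha, "gamma =", gamma,
              "min-cut =", cut_set_capacity(alpha, gamma / d, k, d))
```

For M = 1, k = 2, d = 3, the minimum-storage point needs a repair download of 3/4 of the file, whereas a new node that first reconstructs the whole object downloads the full file.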
Citations
Journal ArticleDOI
TL;DR: A comprehensive review of the domain of physical layer security in multiuser wireless networks, with an overview of the foundations dating back to the pioneering work of Shannon and Wyner on information-theoretic security and observations on potential research directions in this area.
Abstract: This paper provides a comprehensive review of the domain of physical layer security in multiuser wireless networks. The essential premise of physical layer security is to enable the exchange of confidential messages over a wireless medium in the presence of unauthorized eavesdroppers, without relying on higher-layer encryption. This can be achieved primarily in two ways: without the need for a secret key by intelligently designing transmit coding strategies, or by exploiting the wireless communication medium to develop secret keys over public channels. The survey begins with an overview of the foundations dating back to the pioneering work of Shannon and Wyner on information-theoretic security. We then describe the evolution of secure transmission strategies from point-to-point channels to multiple-antenna systems, followed by generalizations to multiuser broadcast, multiple-access, interference, and relay networks. Secret-key generation and establishment protocols based on physical layer mechanisms are subsequently covered. Approaches for secrecy based on channel coding design are then examined, along with a description of inter-disciplinary approaches based on game theory and stochastic geometry. The associated problem of physical layer message authentication is also briefly introduced. The survey concludes with observations on potential research directions in this area.

1,294 citations

Proceedings Article
13 Jun 2012
TL;DR: This paper describes how LRC is used in WAS to provide low overhead durable storage with consistently low read latencies, and introduces a new set of codes for erasure coding called Local Reconstruction Codes (LRC).
Abstract: Windows Azure Storage (WAS) is a cloud storage system that provides customers the ability to store seemingly limitless amounts of data for any duration of time. WAS customers have access to their data from anywhere, at any time, and only pay for what they use and store. To provide durability for that data and to keep the cost of storage low, WAS uses erasure coding. In this paper we introduce a new set of codes for erasure coding called Local Reconstruction Codes (LRC). LRC reduces the number of erasure coding fragments that need to be read when reconstructing data fragments that are offline, while still keeping the storage overhead low. The important benefits of LRC are that it reduces the bandwidth and I/Os required for repair reads over prior codes, while still allowing a significant reduction in storage overhead. We describe how LRC is used in WAS to provide low overhead durable storage with consistently low read latencies.

1,002 citations

Journal ArticleDOI
TL;DR: In this paper, it was shown that there is a tradeoff between having good locality and the ability to correct erasures beyond the minimum distance for linear [n,k,d]q codes.
Abstract: Consider a linear [n,k,d]q code C. We say that the ith coordinate of C has locality r if the value at this coordinate can be recovered from accessing some other r coordinates of C. Data storage applications require codes with small redundancy, low locality for information coordinates, large distance, and low locality for parity coordinates. In this paper, we carry out an in-depth study of the relations between these parameters. We establish a tight bound for the redundancy n-k in terms of the message length, the distance, and the locality of information coordinates. We refer to codes attaining the bound as optimal. We prove some structure theorems about optimal codes, which are particularly strong for small distances. This gives a fairly complete picture of the tradeoffs between codeword length, worst case distance, and locality of information symbols. We then consider the locality of parity check symbols and erasure correction beyond worst case distance for optimal codes. Using our structure theorem, we obtain a tight bound for the locality of parity symbols possible in such codes for a broad class of parameter settings. We prove that there is a tradeoff between having good locality and the ability to correct erasures beyond the minimum distance.
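The redundancy bound mentioned in the abstract can be evaluated with a few lines of code. The sketch below assumes the form in which this bound is usually quoted, n - k >= ceil(k / r) + d - 2; the formula is not spelled out in the abstract, and the function name is purely illustrative.

```python
def min_redundancy(k, d, r):
    """Smallest n - k allowed by the locality bound n - k >= ceil(k / r) + d - 2.

    k: message length, d: minimum distance, r: locality of information symbols.
    """
    if not 1 <= r <= k:
        raise ValueError("locality r must satisfy 1 <= r <= k")
    ceil_k_over_r = -(-k // r)  # integer ceiling of k / r
    return ceil_k_over_r + d - 2


if __name__ == "__main__":
    # With r = k there is no locality constraint and the bound reduces to
    # the Singleton bound n - k >= d - 1.
    print(min_redundancy(12, 4, 12))
    # Halving the locality to r = 6 costs one extra redundant symbol.
    print(min_redundancy(12, 4, 6))
```

Comparing r = k with a smaller r makes the cost of locality explicit: each reduction in r that raises ceil(k / r) adds redundant symbols at the same distance.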

793 citations

Journal ArticleDOI
01 Mar 2013
TL;DR: This paper presents a novel family of erasure codes that are efficiently repairable and offer higher reliability than Reed-Solomon codes, the standard design choice, whose high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability.
Abstract: Distributed storage systems for large clusters typically use replication to provide reliability. Recently, erasure codes have been used to reduce the large storage overhead of three-replicated systems. Reed-Solomon codes are the standard design choice and their high repair cost is often considered an unavoidable price to pay for high storage efficiency and high reliability. This paper shows how to overcome this limitation. We present a novel family of erasure codes that are efficiently repairable and offer higher reliability compared to Reed-Solomon codes. We show analytically that our codes are optimal on a recently identified tradeoff between locality and minimum distance. We implement our new codes in Hadoop HDFS and compare to a currently deployed HDFS module that uses Reed-Solomon codes. Our modified HDFS implementation shows a reduction of approximately 2× on the repair disk I/O and repair network traffic. The disadvantage of the new coding scheme is that it requires 14% more storage compared to Reed-Solomon codes, an overhead shown to be information theoretically optimal to obtain locality. Because the new codes repair failures faster, this provides higher reliability, which is orders of magnitude higher compared to replication.

742 citations

Journal ArticleDOI
04 Feb 2011
TL;DR: This paper provides an overview of research results on network coding for distributed storage systems, which establish that maintenance bandwidth can be reduced by orders of magnitude compared to standard erasure codes.
Abstract: Distributed storage systems often introduce redundancy to increase reliability. When coding is used, the repair problem arises: if a node storing encoded information fails, in order to maintain the same level of reliability we need to create encoded information at a new node. This amounts to a partial recovery of the code, whereas conventional erasure coding focuses on the complete recovery of the information from a subset of encoded packets. The consideration of the repair network traffic gives rise to new design challenges. Recently, network coding techniques have been instrumental in addressing these challenges, establishing that maintenance bandwidth can be reduced by orders of magnitude compared to standard erasure codes. This paper provides an overview of the research results on this topic.

738 citations

References
Journal ArticleDOI
TL;DR: This work reveals that it is in general not optimal to regard the information to be multicast as a "fluid" which can simply be routed or replicated, and by employing coding at the nodes, which the work refers to as network coding, bandwidth can in general be saved.
Abstract: We introduce a new class of problems called network information flow which is inspired by computer network applications. Consider a point-to-point communication network on which a number of information sources are to be multicast to certain sets of destinations. We assume that the information sources are mutually independent. The problem is to characterize the admissible coding rate region. This model subsumes all previously studied models along the same line. We study the problem with one information source, and we have obtained a simple characterization of the admissible coding rate region. Our result can be regarded as the max-flow min-cut theorem for network information flow. Contrary to one's intuition, our work reveals that it is in general not optimal to regard the information to be multicast as a "fluid" which can simply be routed or replicated. Rather, by employing coding at the nodes, which we refer to as network coding, bandwidth can in general be saved. This finding may have significant impact on future design of switching systems.
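The claim that coding at the nodes saves bandwidth is usually illustrated with the butterfly network, which the abstract does not describe; the sketch below assumes that textbook topology. Two source bits must reach two sinks, and the single shared bottleneck link carries their XOR instead of either bit alone.

```python
def butterfly(a, b):
    """Multicast bits a and b to two sinks over the butterfly network.

    Sink 1 hears `a` on a direct link, sink 2 hears `b` on a direct link,
    and both hear the bottleneck link, which carries the coded bit a XOR b.
    Returns the (a, b) pair as decoded by each sink.
    """
    coded = a ^ b                  # the only symbol sent on the bottleneck
    sink1 = (a, a ^ coded)         # sink 1 recovers b = a XOR coded
    sink2 = (b ^ coded, b)         # sink 2 recovers a = b XOR coded
    return sink1, sink2


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print((a, b), "->", butterfly(a, b))
```

With plain routing the bottleneck could forward only one of the two bits per use, so one sink would miss a bit; forwarding the XOR lets both sinks decode both bits in a single use.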

8,533 citations


"Network Coding for Distributed Stor..." refers result in this paper

  • ...As shown by the pioneering work of Ahlswede et al. [19], network coding can achieve the cut-set bound throughput for the multicasting case....


Journal ArticleDOI
TL;DR: This work formulates the multicast problem and proves that linear coding suffices to achieve the optimum, which is the max-flow from the source to each receiving node.
Abstract: Consider a communication network in which certain source nodes multicast information to other nodes on the network in the multihop fashion where every node can pass on any of its received data to others. We are interested in how fast each node can receive the complete information, or equivalently, what the information rate arriving at each node is. Allowing a node to encode its received data before passing it on, the question involves optimization of the multicast mechanisms at the nodes. Among the simplest coding schemes is linear coding, which regards a block of data as a vector over a certain base field and allows a node to apply a linear transformation to a vector before passing it on. We formulate this multicast problem and prove that linear coding suffices to achieve the optimum, which is the max-flow from the source to each receiving node.

3,660 citations


"Network Coding for Distributed Stor..." refers background in this paper

  • ...Subsequent work [20], [21] showed that linear network codes suffice for the multicasting problem....


Proceedings Article
16 Nov 2002
TL;DR: LT codes are introduced, the first rateless erasure codes that are very efficient as the data length grows.
Abstract: We introduce LT codes, the first rateless erasure codes that are very efficient as the data length grows.

2,970 citations

Journal ArticleDOI
TL;DR: This work presents a distributed random linear network coding approach for transmission and compression of information in general multisource multicast networks, and shows that this approach can take advantage of redundant network capacity for improved success probability and robustness.
Abstract: We present a distributed random linear network coding approach for transmission and compression of information in general multisource multicast networks. Network nodes independently and randomly select linear mappings from inputs onto output links over some field. We show that this achieves capacity with probability exponentially approaching 1 with the code length. We also demonstrate that random linear coding performs compression when necessary in a network, generalizing error exponents for linear Slepian-Wolf coding in a natural way. Benefits of this approach are decentralized operation and robustness to network changes or link failures. We show that this approach can take advantage of redundant network capacity for improved success probability and robustness. We illustrate some potential advantages of random linear network coding over routing in two examples of practical scenarios: distributed network operation and networks with dynamically varying connections. Our derivation of these results also yields a new bound on required field size for centralized network coding on general multicast networks.
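A rough numerical illustration of why random linear combinations succeed more often over larger fields. This is a simplified stand-in rather than the paper's network model: it assumes a sink that collects k random combinations of k source symbols over a prime field GF(p) and can decode exactly when the k-by-k coefficient matrix is invertible. The helper names and the seeded trial loop are hypothetical.

```python
import random


def rank_mod_p(rows, p):
    """Rank of a matrix over GF(p), p prime, by Gauss-Jordan elimination."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def decodable_fraction(k, p, trials, seed=0):
    """Fraction of random k-by-k coefficient matrices over GF(p) with full rank."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        coeffs = [[rng.randrange(p) for _ in range(k)] for _ in range(k)]
        ok += rank_mod_p(coeffs, p) == k
    return ok / trials


if __name__ == "__main__":
    for p in (2, 17, 257):
        print("GF(%d):" % p, decodable_fraction(4, p, trials=500))
```

Running it for a few primes shows the decodable fraction rising toward 1 as p grows, which is the qualitative behaviour the abstract and the citing passages describe.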

2,806 citations


"Network Coding for Distributed Stor..." refers background in this paper

  • ...The studies by Ho et al. [22] and Sanders et al. [23] further showed that random linear network coding over a sufficiently large finite field can (asymptotically) achieve the multicast capacity....


  • ...Further, simple random linear combinations will suffice with high probability as the field size over which coding is performed grows, as shown by Ho et al. [22]....


Journal ArticleDOI
TL;DR: For the multicast setup it is proved that there exist coding strategies that provide maximally robust networks and that do not require adaptation of the network interior to the failure pattern in question.
Abstract: We take a new look at the issue of network capacity. It is shown that network coding is an essential ingredient in achieving the capacity of a network. Building on recent work by Li et al. (see Proc. 2001 IEEE Int. Symp. Information Theory, p. 102), who examined the network capacity of multicast networks, we extend the network coding framework to arbitrary networks and robust networking. For networks which are restricted to using linear network codes, we find necessary and sufficient conditions for the feasibility of any given set of connections over a given network. We also consider the problem of network recovery for nonergodic link failures. For the multicast setup we prove that there exist coding strategies that provide maximally robust networks and that do not require adaptation of the network interior to the failure pattern in question. The results are derived for both delay-free networks and networks with delays.

2,628 citations


"Network Coding for Distributed Stor..." refers background in this paper

  • ...Subsequent work [20], [21] showed that linear network codes suffice for the multicasting problem....
