How many bytes does a leaf node send?

For each leaf node in the regeneration tree, the size of the data it sends is $M/k$ bytes, because the size of the data it stores is $M/k$ bytes.

When is a node considered to be up in the trace?

In the trace file, a node is considered to be up at time $t$ if and only if at least half of the pings in the batch of pings immediately prior to $t$ were sent to the node successfully.

How is the performance of a regeneration scheme measured?

The authors measure the performance of regeneration schemes from three aspects: (i) regeneration time, the time from the start of a regeneration to its end; (ii) probability of successful regeneration, the probability that a regeneration finishes without being interrupted by node departures; (iii) data availability, the probability that a file is available.

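How does the proof of Lemma 2 show that the traffic on each edge is uniform?

A leaf provider sends its own $M/k$-byte block, and a non-leaf provider sends a linear combination of $M/k$-byte blocks, which is again $M/k$ bytes long. Since the outgoing edges of all the providers therefore carry the same traffic, the authors conclude that the traffic on each edge is uniform and equal to $M/k$ bytes.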

What does the Y-axis show when the regeneration schemes are compared?

The Y-axis shows $E(G_k)$, the expected available bandwidth capacity of the corresponding regeneration scheme in $G_k$. Because $E_{star}(G_k) = \frac{b-a}{k+1} + a$, the expected available bandwidth capacity of the star-structured regeneration scheme decreases as $k$ increases and converges to $a$, the lower bound of the uniform distribution $U[a, b]$.
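The limit behaviour of $E_{star}(G_k)$ can be checked numerically. The sketch below makes the same assumption as the formula: the $k$ newcomer-provider bandwidths are i.i.d. $U[a, b]$, and the star scheme runs at the slowest of them. The values $a = 10$ and $b = 100$ and both function names are illustrative only.

```python
import random

def expected_star_capacity(a, b, k):
    """Closed form from the text: E_star(G_k) = (b - a) / (k + 1) + a."""
    return (b - a) / (k + 1) + a

def simulate_star_capacity(a, b, k, trials=20_000, seed=1):
    """Monte Carlo estimate of E[min of k edge bandwidths], each drawn from U[a, b].
    The star scheme is limited by its slowest newcomer-provider edge."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.uniform(a, b) for _ in range(k))
    return total / trials

for k in (1, 4, 10, 20):
    print(k, expected_star_capacity(10, 100, k), round(simulate_star_capacity(10, 100, k), 2))
```

As $k$ grows, both columns approach $a = 10$. Adding providers makes the star scheme slower, not faster.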

What is the data availability of the tree-structured regeneration scheme?

When $k \ge 10$, the data availability of the tree-structured regeneration scheme is always more than 90%, while that of the star-structured scheme is less than 60%.

What is the probability that $MST(G_k) = i$?

According to Lemma 5, the probability density function of $\omega(e_i)$ is
$$f_{(i:M_{k+1})}(x) = \frac{M_{k+1}!\,F^{M_{k+1}-i}(x)\,(1-F(x))^{i-1}\,f(x)}{(i-1)!\,(M_{k+1}-i)!}. \tag{7}$$
Let $E_{(i:M_{k+1})}$ be the expected value of $\omega(e_i)$:
$$E_{(i:M_{k+1})} = \int_0^{+\infty} x\, f_{(i:M_{k+1})}(x)\,dx. \tag{8}$$
Let $p(k+1, i)$ be the probability that $MST(G_k) = i$.

How can tree-structured regeneration improve bandwidth capacity?

Their mathematical analysis shows that the tree-structured regeneration scheme can improve the available bandwidth capacity and the adaptability to the bandwidth heterogeneity, compared with the conventional star-structured regeneration scheme.

What is the probability density function of the distribution of the weight of the edge in E?

Assume the probability density function of the edge weights in $E$ is $f(x)$, and $F(x)$ is the corresponding cumulative distribution function.

How much regeneration time can the tree-structured scheme save?

In Fig. 5, the authors show that the tree-structured regeneration scheme saves at least 75% of the regeneration time when $k \ge 4$, and up to 82% when $k = 20$.

(Open Access) Tree-structured data regeneration with network coding in distributed storage systems (2009) | Jun Li

Q: What are the contributions in "Tree-structured data regeneration with network coding in distributed storage systems" ?

Previous regeneration schemes are all star-structured regeneration schemes, in which data are transferred directly from existing storage nodes, referred to as providers, to the newcomer. Because of bandwidth heterogeneity, the regeneration time is therefore always limited by the path with the narrowest bandwidth between the newcomer and a provider. In this paper, the authors exploit the bandwidth between providers and propose a tree-structured regeneration scheme using linear network coding. In their scheme, data can be transferred from providers to the newcomer through a regeneration tree, defined as a spanning tree covering the newcomer and all the providers. In a regeneration tree, a provider can receive data from other providers, encode the received data with the data it stores, and finally send the encoded data to another provider or to the newcomer. The authors prove that a maximum spanning tree is an optimal regeneration tree and analyze its performance.

Q: What is the incoming edge of a node?

The incoming edge of a node is the edge whose other endpoint is the child of this node, and the outgoing edge of a node is the edge whose other endpoint is its parent node.

Q: What is the expected value of the available bandwidth capacity of the tree-structured regeneration scheme?

The authors notice that when the bandwidth heterogeneity (i.e. the variance of the bandwidth distribution) increases, the expected available bandwidth capacity of the tree-structured regeneration scheme increases, while that of the star-structured regeneration scheme decreases.

Q: What is the definition of a maximum spanning tree?

In a weighted undirected graph, a minimum (maximum) spanning tree is a bottleneck spanning tree, i.e. the weight of its largest (smallest) edge is the minimum (maximum) over all spanning trees in the graph.

Tree-structured Data Regeneration with Network Coding in Distributed Storage Systems

Jun Li, Shuang Yang, Xin Wang, Xiangyang Xue

School of Computer Science

Fudan University, China

{0572222, 06300720227, xinw, xyxue}@fudan.edu.cn

Baochun Li

Department of Electrical and Computer Engineering

University of Toronto, Canada

bli@eecg.toronto.edu

Abstract—Distributed storage systems, built on peer-to-peer networks, can provide large-scale data storage and high data reliability through redundancy schemes such as replica, erasure codes and linear network coding. Redundant data may get lost due to the instability of distributed systems, such as permanent node departures, hardware failures, and accidental deletions. In order to maintain data availability, it is necessary to regenerate new redundant data in another node, referred to as a newcomer. Regeneration is expected to finish as soon as possible, because the regeneration time can influence the data reliability and availability of distributed storage systems. It has been acknowledged that linear network coding can regenerate redundant data with less network traffic than replica and erasure codes. However, previous regeneration schemes are all star-structured regeneration schemes, in which data are transferred directly from existing storage nodes, referred to as providers, to the newcomer. Because of bandwidth heterogeneity, the regeneration time is therefore always limited by the path with the narrowest bandwidth between the newcomer and a provider.

In this paper, we exploit the bandwidth between providers and propose a tree-structured regeneration scheme using linear network coding. In our scheme, data can be transferred from providers to the newcomer through a regeneration tree, defined as a spanning tree covering the newcomer and all the providers. In a regeneration tree, a provider can receive data from other providers, encode the received data with the data it stores, and finally send the encoded data to another provider or to the newcomer. We prove that a maximum spanning tree is an optimal regeneration tree and analyze its performance. In a trace-based simulation, the tree-structured scheme reduces the regeneration time by 75%-82% and improves data availability by 73%-124%.

Index Terms—Distributed Storage System, Linear Network Coding, Maximum Spanning Tree.

I. INTRODUCTION

Distributed storage systems store data in a large number of storage nodes, either in the context of data centers in cloud computing systems or in the context of peer-assisted online storage systems, e.g., [1]. Due to the inherent lack of reliability caused by node departures and hardware failures, data may become temporarily or permanently unavailable in such systems. Concerns about Quality of Service (QoS) in storage systems hinge upon two aspects: the reliability and the availability of data. Data are reliable when the data saved in the distributed storage system are sufficient to recover the original data. Data are available when there are enough active nodes in the distributed storage system that the original data can be recovered at once. In order to provide high data reliability and availability, distributed storage systems usually use redundant data. The forms of redundant data include replica, erasure codes and linear network coding.

Redundant data can provide higher availability, because more active storage nodes remain available for data recovery when some nodes are temporarily unavailable. However, when data are lost permanently in the distributed storage system, the number of storage nodes will decrease gradually. It is therefore necessary to regenerate new redundant data to maintain data availability. Regeneration is the process in which a node in the distributed storage system, referred to as a newcomer, receives data from active storage nodes, referred to as providers, and finally becomes a new storage node, so that the lost redundant data are regenerated.

To ensure data reliability and availability, we want the regeneration time to be as short as possible. The less time regeneration takes, the more redundant data can be preserved in a distributed storage system that suffers data loss. The newcomer or a provider may also leave the system during the regeneration process. A shorter regeneration time therefore gives a higher probability that the regeneration finishes before any node (newcomer or provider) leaves the system. The simplest way to reduce the regeneration time is to reduce the network traffic in the regeneration. Dimakis et al. [2] showed that linear network coding can incur less regeneration traffic, and the corresponding encoding scheme is given in [3].

To our knowledge, previous regeneration schemes mainly focused on how to generate redundant data that reduce the regeneration traffic, and they did not take the bandwidth capacity between nodes into account. In this paper, we propose a tree-structured regeneration scheme based on linear network coding from the perspective of bandwidth capacity. Conventional regeneration is a star-structured scheme, i.e. the newcomer downloads data directly from the providers. If the network of the storage system suffers from bandwidth heterogeneity, the regeneration time is thus limited by the path between the newcomer and the provider with the narrowest bandwidth. In our tree-structured scheme, we define a regeneration tree as a spanning tree covering the newcomer and all the providers. In the regeneration tree, a child node sends data to its parent node, and the parent node encodes the received data with the data it stores and then sends the encoded data to its own parent node. If the transmission is pipelined, the bandwidth bottleneck is the edge with the narrowest bandwidth in the tree. We prove that a maximum spanning tree is an optimal regeneration tree.

In this paper, we present the tree-structured regeneration scheme and analyze its performance mathematically. We first show how the tree-structured scheme regenerates redundant data at the newcomer. Then we prove that a maximum spanning tree is an optimal regeneration tree. Using probability theory and order statistics, we show that our scheme reduces the regeneration time by improving the transmission rate and improves the adaptability to bandwidth heterogeneity, without increasing the regeneration traffic. We evaluate our scheme by a trace-based simulation. The simulation results show that our scheme can reduce the regeneration time by 75%-82% and improve data availability by 73%-124%.

The remainder of the paper is organized as follows. In Section II we introduce the related work. We introduce some basic concepts of distributed storage systems using linear network coding and present the network model in Section III. In Section IV, we present the tree-structured regeneration scheme and analyze its performance. We show the simulation results in Section V. Finally, Section VI concludes this paper.

II. RELATED WORK

Many papers have discussed how to improve data reliability from the perspective of redundant data. The forms of redundant data include replica, erasure codes and linear network coding. Some distributed storage systems, such as BitVault [4], use replica. In OceanStore [1], however, the original data are encoded at the source node by erasure codes. Lin et al. [5] investigated and compared several decentralized replication algorithms for improving file availability in P2P networks. Compared with replica, erasure codes provide higher data availability, because in storage systems using $(n, k)$-erasure codes, any $k$ of the $n$ storage nodes are sufficient to recover the original data. However, erasure codes incur more storage space at the source node than replica when disseminating the encoded data [6]. Moreover, Rodrigues et al. [7] pointed out that in some cases the benefits of erasure codes might not be worth their disadvantages.

Ahlswede et al. [8] introduced the idea of network coding, in which intermediate nodes can encode the data they have received and send out the encoded data. It has been proved that network coding can utilize network resources optimally. Yang et al. [9] presented a file sharing scheme based on network coding, which used the combination network as the network topology. Take $(n, k)$-linear network coding as an example. The data are divided into $k$ blocks, $F_i$, $i = 1, 2, \ldots, k$, and $n$ encoded blocks, $B_1, \ldots, B_n$, $n > k$, are generated as linear combinations of $F_1, \ldots, F_k$ over the Galois field $\mathbb{F}_q$, where $q$ is the size of the Galois field:
$$B_j = \sum_{i=1}^{k} \alpha_{j,i} F_i, \quad \alpha_{j,i} \in \mathbb{F}_q,$$
where $(\alpha_{j,1}, \alpha_{j,2}, \ldots, \alpha_{j,k})$ is the coefficient vector of $B_j$, $j = 1, 2, \ldots, n$. A node that wishes to access the original data has to receive $m$ encoded blocks, $m \ge k$. Decoding then amounts to solving a linear system with $k$ unknowns and $m$ equations. The $m$ encoded blocks can be decoded if and only if the linear system is solvable, i.e. $k$ of the $m$ coefficient vectors are linearly independent. Random linear coding [10] is a form of linear network coding in which the intermediate nodes encode data linearly using randomly generated coefficient vectors. If all the coefficient vectors are randomly generated, more than $k$ encoded blocks may be required to decode. However, when $q$ is large enough, any $k$ encoded blocks are sufficient to decode with high probability [11].
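The claim that large $q$ makes any $k$ randomly coded blocks decodable can be made quantitative with a standard counting argument. Assume each coefficient is drawn independently and uniformly from $\mathbb{F}_q$. The $(i+1)$-th vector must then avoid the $q^i$-element span of the earlier vectors. The helper name `prob_decodable` is hypothetical, and the sketch computes the exact probability with rational arithmetic.

```python
from fractions import Fraction

def prob_decodable(q, k):
    """Probability that k coefficient vectors drawn uniformly from F_q^k are linearly
    independent, i.e. that exactly k randomly coded blocks suffice to decode.
    Row i (0-based) must avoid the span of the i earlier rows, which has q**i elements."""
    p = Fraction(1)
    for i in range(k):
        p *= 1 - Fraction(q**i, q**k)
    return p

for q in (2, 2**8, 2**16):
    print(q, float(prob_decodable(q, 10)))
```

Over $\mathbb{F}_2$, ten random blocks are decodable less than 30% of the time. Over $\mathbb{F}_{2^8}$ the probability already exceeds 99%.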

Accendanski et al. [6] compared the performance of different forms of redundant data, including replica, erasure codes and random linear coding. They showed that random linear coding provides data availability no worse than erasure codes, while saving storage cost at the source node when disseminating data into the network.

Different forms of redundant data use different regeneration mechanisms. For replica, the newcomer only needs to download one replica from one active storage node. Chun et al. [12] proposed the Carbonite replication algorithm to schedule the regeneration of new replicas. For erasure codes and linear network coding, every bit of new data is encoded from the data stored in the providers, so regeneration incurs more network traffic than with replica. The simplest way is to recover the original data from the providers and encode the original data into a new block. Duminuco et al. [13] proposed a new class of erasure codes, aiming to achieve a tradeoff between regeneration traffic and data reliability. Dimakis et al. [2] showed that linear network coding incurs less network traffic in the regeneration than erasure codes. They proposed Regeneration Codes, a new form of linear network coding that achieves the optimal tradeoff between storage cost and network traffic. Wu et al. [3] further analyzed the relation between storage cost and network traffic, and presented a construction method for Regeneration Codes.

Previous works mainly considered the form of redundant data and tried to reduce the regeneration traffic, but did not take the bandwidth capacity between two nodes into account. Lee et al. [14] proposed a bandwidth-aware routing scheme for overlay networks, which measures the bandwidth capacity between hosts in the overlay network and selects the best paths so as to bypass problematic paths. In this paper, we consider bandwidth heterogeneity and propose a tree-structured regeneration scheme to reduce the regeneration time and hence improve data availability. A preliminary version of our work can be found in [15].

III. PRELIMINARIES

A. Node and Redundant Data

A distributed storage system provides its service on top of a distributed network containing a large number of nodes, which may play different roles in the system. A source node is a node that sends data to other nodes, and a storage node is a node that stores data for source nodes. In some distributed storage systems, one node may function as a source node and a storage node at the same time. When a source node wishes to save data in the storage system, it generates redundant data and sends them to one or more storage nodes. For replica, each storage node stores one replica of the original file. For erasure codes, the original file is divided into a number of blocks. Redundant blocks are generated at the source node by erasure codes, such as Reed-Solomon codes and fountain codes. Each storage node stores one redundant block.

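In a distributed storage system using linear network coding, each block saved in the distributed storage system is generated by linear network coding. We take $(n, k)$-linear network coding as an example. The source node divides the original data into $k$ blocks, $F_1, \ldots, F_k$, and encodes them into $n$ encoded blocks, $B_1, \ldots, B_n$, which are all linear combinations of $F_1, \ldots, F_k$. The coefficient vector of $B_i$ is $(a_{i,1}, \ldots, a_{i,k})$, $a_{i,j} \in \mathbb{F}_q$, $j = 1, 2, \ldots, k$, $i = 1, 2, \ldots, n$, where $q$ is the size of the Galois field $\mathbb{F}_q$. Thus we get
$$\begin{pmatrix} B_1 \\ \vdots \\ B_n \end{pmatrix} = \begin{pmatrix} a_{1,1} & \cdots & a_{1,k} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,k} \end{pmatrix} \begin{pmatrix} F_1 \\ \vdots \\ F_k \end{pmatrix}. \tag{1}$$
The coefficient vectors form an encoding matrix $C$,
$$C = \begin{pmatrix} a_{1,1} & \cdots & a_{1,k} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,k} \end{pmatrix}. \tag{2}$$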

A download node is a node that wishes to access data saved in the distributed storage system. For replica, the download node needs to download data from only one storage node. For $(n, k)$-erasure codes or $(n, k)$-linear network coding, the download node can recover the data as soon as it has received $k$ redundant blocks or $k$ linearly independent encoded blocks, respectively. For linear network coding, we assume the $k$ encoded blocks are $B_{i_1}, \ldots, B_{i_k}$, $\{B_{i_1}, \ldots, B_{i_k}\} \subset \{B_1, \ldots, B_n\}$. Let $C'$ be the encoding matrix formed by the coefficient vectors of $B_{i_1}, \ldots, B_{i_k}$. Decoding then becomes the following linear transformation:
$$\begin{pmatrix} F_1 \\ \vdots \\ F_k \end{pmatrix} = C'^{-1} \begin{pmatrix} B_{i_1} \\ \vdots \\ B_{i_k} \end{pmatrix}. \tag{3}$$

If the coefficient vectors are randomly generated, i.e. the system uses random linear coding, the encoding matrix $C'$ is non-singular with high probability when $q$ is large enough [11]. Conventionally $q = 2^8$, so that the encoded block can be generated byte by byte, and any $k$ encoded blocks are sufficient to decode with very high probability.
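Eqs. (1)-(3) can be exercised end to end in a few lines. This is a minimal sketch that swaps in the prime field $\mathbb{F}_{257}$ for $\mathbb{F}_{2^8}$, an assumption made so that field arithmetic is plain modular arithmetic. Byte values fit, but an encoded symbol may equal 256, so a real system would use $\mathbb{F}_{2^8}$ as the text says. The helpers `encode` and `decode` are hypothetical names. `decode` inverts $C'$ implicitly by Gauss-Jordan elimination on $[C' \mid B']$.

```python
import random

P = 257  # prime field F_P, an illustrative stand-in for GF(2^8) (assumption)

def encode(blocks, n, rng):
    """Eq. (1): produce n encoded blocks B_i = sum_j a_ij F_j with random rows of C."""
    k, size = len(blocks), len(blocks[0])
    coded = []
    for _ in range(n):
        coeffs = [rng.randrange(P) for _ in range(k)]
        payload = [sum(c * blk[pos] for c, blk in zip(coeffs, blocks)) % P
                   for pos in range(size)]
        coded.append((coeffs, payload))
    return coded

def decode(coded, k):
    """Eq. (3): solve C' F = B' by Gauss-Jordan elimination mod P.
    Returns [F_1, ..., F_k], or None when C' is singular."""
    rows = [list(c) + list(p) for c, p in coded[:k]]
    for col in range(k):
        pivot = next((r for r in range(col, k) if rows[r][col] != 0), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # Fermat inverse; valid because P is prime
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % P for x, y in zip(rows[r], rows[col])]
    return [row[k:] for row in rows]

rng = random.Random(7)
original = [list(b"tree"), list(b"-str"), list(b"uctu"), list(b"red!")]  # k = 4 blocks
stored = encode(original, n=6, rng=rng)
print(decode(rng.sample(stored, 4), 4) == original)
```

Any 4 of the 6 stored blocks usually suffice. When the sampled $C'$ happens to be singular, `decode` returns `None` instead of guessing.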

B. Regeneration

In distributed storage systems, redundant data can improve data availability and provide data reliability. However, redundancy cannot guarantee data reliability and availability forever. Data saved in a storage node may get lost due to accidental deletions, hardware failures, or permanent node departures. Therefore, if a data loss detection mechanism, such as the Carbonite algorithm proposed in [12], detects data loss in the storage system, the distributed storage system will generate new redundant data and save them in another node, referred to as a newcomer.

During the regeneration, the newcomer must receive data from one or more existing storage nodes to become a new storage node. We define providers as the storage nodes that provide data for the newcomer in the regeneration. For replica, the newcomer needs only one provider. For erasure codes, the newcomer must recover the original data from the providers and then encode the original data into a new redundant block. For $(n, k)$-linear network coding, the newcomer also needs to receive data from at least $k$ providers. Unlike with erasure codes, however, it can generate a new encoded block directly, without first recovering the original data. We assume there are $k$ providers. The $k$ encoded blocks they store are $B_{i_1}, \ldots, B_{i_k}$, and the encoding matrix formed by their coefficient vectors is $C'$. As with decoding, generating a new encoded block is a linear transformation if $C'$ is non-singular. Let the coefficient vector of the new encoded block be $(\sigma_1, \sigma_2, \ldots, \sigma_k)$, $\sigma_j \in \mathbb{F}_q$, $j = 1, 2, \ldots, k$. We assume that
$$(\sigma_1 \ \cdots \ \sigma_k) \begin{pmatrix} F_1 \\ \vdots \\ F_k \end{pmatrix} = (r_1 \ \cdots \ r_k) \begin{pmatrix} B_{i_1} \\ \vdots \\ B_{i_k} \end{pmatrix}, \tag{4}$$
where $r_j \in \mathbb{F}_q$, $j = 1, 2, \ldots, k$. According to Eq. (3),
$$(r_1 \ \cdots \ r_k) = (\sigma_1 \ \cdots \ \sigma_k)\, C'^{-1}. \tag{5}$$
If $C'$ is non-singular, $(r_1 \ \cdots \ r_k)$ is a uniformly random vector if and only if $(\sigma_1 \ \cdots \ \sigma_k)$ is. For random linear coding, $(r_1 \ \cdots \ r_k)$ can thus be generated randomly rather than computed according to Eq. (5). The newcomer can therefore encode $B_{i_1}, \ldots, B_{i_k}$ directly into a new encoded block using the coefficient vector $(r_1 \ \cdots \ r_k)$.
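The shortcut behind Eqs. (4)-(5) can be illustrated as follows: draw $r$ at random, combine the stored blocks, and check that the result is a valid encoded block with coefficient vector $\sigma = r\,C'$. The sketch again uses the prime field $\mathbb{F}_{257}$ as an assumed stand-in for $\mathbb{F}_{2^8}$. `combine` and `regenerate` are hypothetical helper names. The code never forms $C'^{-1}$.

```python
import random

P = 257  # prime field used as a stand-in for GF(2^8) (assumption)

def combine(coeffs, vectors):
    """sum_j coeffs[j] * vectors[j] over F_P, element-wise on equal-length lists."""
    return [sum(c * v[pos] for c, v in zip(coeffs, vectors)) % P
            for pos in range(len(vectors[0]))]

def regenerate(provider_blocks, rng):
    """Newcomer side of Eq. (4): draw (r_1..r_k) at random and combine the stored blocks
    directly; C'^{-1} is never computed. provider_blocks: list of (coeff_vector, payload).
    Returns the new block's coefficient vector sigma = r C' and its payload r B'."""
    r = [rng.randrange(P) for _ in provider_blocks]
    sigma = combine(r, [cv for cv, _ in provider_blocks])
    payload = combine(r, [pl for _, pl in provider_blocks])
    return sigma, payload

rng = random.Random(3)
F = [list(b"abcd"), list(b"efgh"), list(b"ijkl")]  # original blocks F_1..F_3 (k = 3)
providers = []
for _ in range(3):
    a = [rng.randrange(P) for _ in range(3)]  # coefficient vector of a stored block
    providers.append((a, combine(a, F)))
sigma, new_block = regenerate(providers, rng)
print(new_block == combine(sigma, F))  # True
```

The check always succeeds, because $\sum_j r_j \sum_i a_{j,i} F_i = \sum_i \big(\sum_j r_j a_{j,i}\big) F_i$. Each provider contributes exactly one block of the original block length.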

Dimakis et al. [2] and Wu et al. [3] analyzed the lower bound of network traffic in the regeneration for distributed storage systems using linear network coding. Suppose the original data are $M$ bytes and each storage node stores $M/k$ bytes. If $d$ providers are involved, the minimal regeneration traffic is
$$\frac{Md}{k(d-k+1)} \text{ bytes};$$
with less traffic, the new encoded block would not be equivalent to the other encoded blocks in decodability. The regeneration scheme in Eq. (4) and Eq. (5) clearly achieves the optimal regeneration traffic when the number of providers is $k$: each of the $k$ providers sends one $M/k$-byte block, $M$ bytes in total, which equals the bound at $d = k$.
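The bound is easy to tabulate. The sketch below evaluates $Md/(k(d-k+1))$ exactly with `fractions.Fraction` to avoid rounding. The function name and the sample values $M = 10^6$ bytes and $k = 4$ are illustrative assumptions. At $d = k$ the bound equals $M$, the traffic of the $k$-provider scheme above. Contacting more providers lowers the bound, at the cost of needing more live nodes.

```python
from fractions import Fraction

def min_regeneration_traffic(M, k, d):
    """Lower bound on total regeneration traffic (bytes) when every node stores M/k bytes
    and the newcomer downloads from d >= k providers: M*d / (k*(d - k + 1))."""
    if d < k:
        raise ValueError("the newcomer needs at least k providers")
    return Fraction(M * d, k * (d - k + 1))

M, k = 1_000_000, 4
for d in range(k, k + 4):
    print(d, float(min_regeneration_traffic(M, k, d)))
```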

C. Network Model

In this paper, we focus on how to regenerate new redundant data quickly using linear network coding. The distributed storage system uses $(n, k)$-linear network coding, $n > k$, so the newcomer requires at least $k$ providers in the regeneration. The more providers a regeneration requires, the harder it is to find them before the regeneration can start, and the more likely the regeneration is to be interrupted by node departures. In this paper we therefore only discuss the case in which the regeneration scheme requires $k$ providers. A regeneration scheme transfers data from $k$ providers to the newcomer, regenerates new redundant data and saves them at the newcomer. Unlike conventional schemes, we also consider bandwidth heterogeneity in the network.

Assume that one file has been saved in the distributed storage system. The size of the original file is $M$ bytes. Each storage node stores an encoded block of $M/k$ bytes. Our network model focuses on the regeneration in the system. In one regeneration, we assume $k$ active storage nodes are required as providers. The node set is $V = \{V_0, \ldots, V_k\}$, where $V_0$ is the newcomer and $V_1, \ldots, V_k$ are providers. $V_0$ receives data from the $k$ providers. Node departures are ignored, i.e. the newcomer and the providers are assumed to be stable during the regeneration process. The edge set is $E = \{(V_i, V_j) \mid i, j = 0, 1, \ldots, k,\ i < j\}$, and $\omega(V_i, V_j)$ denotes the bandwidth capacity between $V_i$ and $V_j$. Thus the weighted undirected complete graph $G_k = (V, E, \omega)$ denotes the network model of the regeneration, where $k$ is the number of providers, i.e. $k = |V| - 1$.

[Figure: the same three-node network drawn once for each scheme, with edge bandwidths of 30KB/s, 40KB/s and 50KB/s.]

Fig. 1. Comparison between the star-structured and the tree-structured regeneration scheme in an example of the network model containing 3 nodes.

Fig. 1 shows an example of the network model described above. $(n, 2)$-linear network coding is employed in this model, $n > 2$. When a regeneration starts, the newcomer receives data from 2 storage nodes. Conventionally, the newcomer receives data from each provider directly. Fig. 1(a) illustrates this conventional regeneration scheme. Because the newcomer and the providers form a star topology, we refer to this scheme as a star-structured regeneration scheme. In Fig. 1(a), the newcomer receives encoded blocks directly from $V_1$ and $V_2$ and then encodes them again to obtain an encoded block with a new coefficient vector. In the star-structured regeneration scheme, the regeneration time depends on the minimal edge connected to the newcomer $V_0$. In Fig. 1, $\omega(V_0, V_1)$, $\omega(V_0, V_2)$ and $\omega(V_1, V_2)$ are 30KB/s, 50KB/s and 40KB/s respectively. The bandwidth bottleneck is therefore $(V_0, V_1)$, and the available bandwidth capacity, i.e. the actual transmission rate during the regeneration process, is 30KB/s.

In this paper, we propose a tree-structured regeneration scheme, which constructs a spanning tree in the network model $G_k$. Our regeneration scheme does not incur more regeneration traffic than the star-structured scheme, but it can improve the available bandwidth capacity and thus reduce the regeneration time. Each node in the model, whether newcomer or provider, can receive data from other nodes. To avoid increasing the regeneration traffic and thereby aggravating the bandwidth bottleneck, we assume each node can receive data from multiple nodes but can send data to only one node. Encoding operations can be executed on the newcomer and the providers, and the encoding delay is ignored, since the transmission delay is usually much more critical. Fig. 1(b) is an example of the tree-structured regeneration scheme. $V_1$ sends its data to $V_2$. $V_2$ encodes the received data with the data it stores and sends the encoded data to $V_0$. As we will show in Section IV, the bandwidth bottleneck is $(V_1, V_2)$, and the available bandwidth capacity is $\omega(V_1, V_2) = 40$KB/s. The tree-structured regeneration scheme thus regenerates new redundant data faster.
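The Fig. 1 comparison reduces to two bottleneck computations. In the sketch, the assignment of 30, 50 and 40KB/s to the three edges is an assumed labelling. Any labelling with the 30KB/s edge at the newcomer gives the same answer. The function names are hypothetical. The tree capacity is found by brute force over all 2-edge spanning trees of the triangle.

```python
from itertools import combinations

# Hypothetical labelling consistent with the Fig. 1 example: V0 is the newcomer.
WEIGHTS = {(0, 1): 30, (0, 2): 50, (1, 2): 40}  # bandwidth in KB/s

def star_capacity(w, k):
    """Star scheme: V0 downloads from every provider directly, so its slowest edge limits it."""
    return min(w[(0, j)] for j in range(1, k + 1))

def connects_all(edges, n):
    """True if the undirected edges connect nodes 0..n-1 (depth-first search from node 0)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n

def best_tree_capacity(w, k):
    """Brute force: largest bottleneck over all spanning trees (k edges on k + 1 nodes)."""
    return max(min(w[e] for e in edges)
               for edges in combinations(w, k) if connects_all(edges, k + 1))

print(star_capacity(WEIGHTS, 2), best_tree_capacity(WEIGHTS, 2))  # 30 40
```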

IV. TREE-STRUCTURED REGENERATION SCHEME

In this section, we present our tree-structured regeneration scheme, based on the network model above. First, we show how the tree-structured scheme regenerates new redundant data at the newcomer and prove that a maximum spanning tree is an optimal regeneration tree. Then we give the encoding scheme for linear network coding, especially for random linear coding. We analyze the available bandwidth capacity of the tree-structured and star-structured regeneration schemes using probability theory and order statistics. Finally, we compare the available bandwidth capacity of the two schemes.

A. Regeneration Tree

Lemma 1: Any spanning tree $T$ in $G_k = (V, E, \omega)$ whose root is $V_0$ corresponds to one and only one regeneration scheme in which $V_0$ is the newcomer.

Proof: Given a spanning tree $T$, we can build a regeneration scheme as follows. Every node in $T$ receives data from its children if it is not a leaf node, encodes the received data with the data it stores, and sends the encoded data to its parent node if it is not the newcomer. In this way, the newcomer obtains the providers' data, or linear combinations of them, and then becomes a new storage node.

Conversely, given a regeneration scheme of $G_k = (V, E, \omega)$, we can build a graph $T = (V, E')$, where $(V_i, V_j) \in E'$ when data are transferred on $(V_i, V_j)$, $i, j = 0, 1, 2, \ldots, k$, $i < j$. Each edge in $T$ can be mapped to exactly one provider, namely the provider that sends data on this edge, since a node can send data to only one node and the newcomer does not send data to other nodes. Because there are $k$ providers, $|E'| \le k$. On the other hand, because the newcomer receives encoded blocks or their linear combinations from all providers, $T$ is a connected graph on $k+1$ nodes, so $|E'| \ge k$. Because $|E'| = k$ and $T$ is connected, $T$ is a spanning tree of $G_k$.
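The correspondence in Lemma 1 suggests a simple validity check for a proposed scheme. Describe the scheme by which node each provider sends to (a hypothetical `sends_to` mapping). The scheme is valid exactly when every provider's data reaches $V_0$ without looping. The $k$ resulting edges then form a spanning tree rooted at $V_0$. The sketch assumes providers are numbered $1..k$ and the newcomer is $0$.

```python
def is_regeneration_tree(sends_to, k):
    """sends_to[i] is the one node that provider V_i (1 <= i <= k) sends to; the newcomer
    V_0 sends nothing. The scheme delivers every provider's data to V_0 exactly when
    following sends_to from any provider reaches V_0 without revisiting a node; the
    k edges (i, sends_to[i]) then form a spanning tree rooted at V_0 (Lemma 1)."""
    if set(sends_to) != set(range(1, k + 1)):
        return False  # every provider must send to exactly one node
    for start in sends_to:
        node, hops = start, 0
        while node != 0:
            node = sends_to[node]
            hops += 1
            if hops > k or not 0 <= node <= k:
                return False  # a cycle, or a destination outside V_0..V_k
    return True

print(is_regeneration_tree({1: 2, 2: 0}, 2))  # True: a chain V_1 -> V_2 -> V_0
print(is_regeneration_tree({1: 0, 2: 0}, 2))  # True: the star scheme
print(is_regeneration_tree({1: 2, 2: 1}, 2))  # False: V_1 and V_2 only feed each other
```

The hop limit works because, in a tree with $k$ edges, no path from a provider to the root can be longer than $k$.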

Notice that $T$ and $G_k$ are both undirected graphs, but the transmission is always directed. The proof of Lemma 1 shows that on each edge in $T$, data flow from the child node to the parent node. In this sense, all edges in $T$ can be regarded as “directed”. An incoming edge of a node is an edge whose other endpoint is a child of this node, and the outgoing edge of a node is the edge whose other endpoint is its parent node.

Lemma 1 shows that we can use a spanning tree to represent a regeneration scheme of $G_k$. However, it does not show how to encode the data at each provider. We discuss this question in Section IV-B.

Definition 1: A regeneration tree is a spanning tree in $G_k$.

Lemma 2: For each edge in the regeneration tree $T$, the amount of data transferred on it is $M/k$ bytes, where $M$ is the size of the original file.

Proof: According to the proof of Lemma 1, each edge in $T$ corresponds to exactly one provider, so we give the proof from the perspective of the providers.

For each leaf node in the regeneration tree, the size of the data it sends is $M/k$ bytes, because the size of the data it stores is $M/k$ bytes.

For each non-leaf node except the newcomer, assume the traffic on each incoming edge is $M/k$ bytes. The amount of data it stores is also $M/k$ bytes. Since a linear combination of $M/k$-byte blocks is itself $M/k$ bytes long, the traffic on its outgoing edge after linear encoding is $M/k$ bytes.

By induction from the leaves toward the root, the outgoing edges of all the providers carry the same traffic, so the traffic on each edge is uniform and equal to $M/k$ bytes.

Since $T$ has $k$ edges, each carrying $M/k$ bytes by Lemma 2, the regeneration traffic of a tree-structured regeneration scheme is $M$ bytes. The scheme therefore achieves the optimal regeneration traffic shown in Section III-B.

Lemma 3: For each regeneration tree $T$ in $G_k$, the regeneration time depends on the edge with the minimal weight.

Proof: The weight of each edge in $T$, $\omega(V_i, V_j)$, $i, j = 0, 1, 2, \ldots, k$, $i < j$, denotes the bandwidth capacity between $V_i$ and $V_j$. According to Lemma 2, the traffic on each edge in $T$ is uniform. A node that waits until it has received all the data from its children before sending wastes a substantial amount of time. The optimal transmission method is pipelining: a node encodes and sends data to its parent node as soon as it has received one byte/packet from each of its children. The bandwidth bottleneck is therefore the minimal edge in the regeneration tree.
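A small timing model makes the pipelining argument concrete. The sketch assumes per-packet store-and-forward: a node forwards packet $j$ once it holds packet $j$ from every child and its outgoing link is free. It also assumes no propagation or encoding delay and independent edges. These are modelling assumptions, and `pipelined_finish_time` is a hypothetical helper. With the Fig. 1 rates, the finish time approaches $(M/k)/\min_e \omega(e)$ as packets get smaller.

```python
def pipelined_finish_time(parent, bw, block_bytes, packet_bytes):
    """Time until newcomer V0 has all data when each provider forwards packet j as soon as
    it holds packet j from every child and its outgoing link is free (store-and-forward
    per packet). parent[v]: node v sends to (0 = newcomer); bw[v]: rate of v's outgoing
    edge in bytes/s. Assumes packet_bytes divides block_bytes (the M/k-byte block)."""
    packets = block_bytes // packet_bytes
    children = {v: [] for v in [0, *parent]}
    for v, p in parent.items():
        children[p].append(v)
    sent = {}  # sent[v][j]: time at which v finishes sending packet j

    def finish_times(v):
        if v not in sent:
            kids = [finish_times(c) for c in children[v]]
            t, times = 0.0, []
            for j in range(packets):
                ready = max((kid[j] for kid in kids), default=0.0)
                t = max(t, ready) + packet_bytes / bw[v]
                times.append(t)
            sent[v] = times
        return sent[v]

    return max((finish_times(c)[-1] for c in children[0]), default=0.0)

block = 400_000  # M/k bytes per edge (Lemma 2); illustrative value
tree = pipelined_finish_time({1: 2, 2: 0}, {1: 40_000, 2: 50_000}, block, 1_000)
star = pipelined_finish_time({1: 0, 2: 0}, {1: 30_000, 2: 50_000}, block, 1_000)
print(round(tree, 3), round(star, 3))  # about 10.02 s (tree) and 13.333 s (star)
```

The extra 0.02 s in the tree case is the time for the last packet to cross the faster edge $(V_2, V_0)$. It shrinks with the packet size, which leaves only the bottleneck term.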

From the proof of Lemma 3, we derive the definition of the available bandwidth capacity of a regeneration tree.

Definition 2: The available bandwidth capacity of a regeneration tree $T$ in $G_k$ is the weight of the minimum edge in $T$.

Lemma 4: [16] In a weighted undirected graph, a minimum (maximum) spanning tree is a bottleneck spanning tree, i.e. the weight of its largest (smallest) edge is the minimum (maximum) over all spanning trees in the graph.

Theorem 1: A maximum spanning tree in $G_k$ is a regeneration tree with the maximal available bandwidth capacity.

Proof: The proof follows directly from Lemma 3 and Lemma 4.
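Theorem 1 turns regeneration-tree construction into a standard graph problem. The sketch below uses Kruskal's algorithm with edges scanned from largest to smallest weight. It is one possible way to obtain a maximum spanning tree, not necessarily the authors' implementation. For a small random graph, the sketch checks the tree's bottleneck against an exhaustive search over all spanning trees, as Lemma 4 predicts. The uniform weights in $[10, 100]$ and all helper names are illustrative.

```python
import random
from itertools import combinations

def maximum_spanning_tree(n, weights):
    """Kruskal's algorithm, scanning edges from largest to smallest weight.
    weights maps (u, v) -> bandwidth on nodes 0..n-1; returns the tree's edges."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    tree = []
    for u, v in sorted(weights, key=weights.get, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components: keep it
            root[ru] = rv
            tree.append((u, v))
    return tree

def is_spanning_tree(n, edges):
    """n - 1 edges without a cycle always span n nodes."""
    root = list(range(n))

    def find(x):
        while root[x] != x:
            x = root[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        root[ru] = rv
    return len(edges) == n - 1

def best_bottleneck(n, weights):
    """Exhaustive maximum, over all spanning trees, of the tree's smallest edge weight."""
    return max(min(weights[e] for e in edges)
               for edges in combinations(weights, n - 1) if is_spanning_tree(n, edges))

rng = random.Random(0)
n = 5  # newcomer plus k = 4 providers
weights = {e: rng.uniform(10, 100) for e in combinations(range(n), 2)}
tree = maximum_spanning_tree(n, weights)
print(min(weights[e] for e in tree) == best_bottleneck(n, weights))  # True
```

Kruskal's algorithm runs in $O(|E| \log |E|)$ on $G_k$. That cost is negligible next to the bandwidth measurements the tree depends on.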

Theorem 1 shows how to find an optimal regeneration tree in $G_k$. The star-structured regeneration scheme is a special case of the tree-structured regeneration scheme and is sometimes optimal. However, the tree-structured regeneration scheme is never worse than the star-structured scheme.

Since the bandwidth capacity is time-sensitive, it should be measured before each regeneration, after which the regeneration tree can be constructed. Because the regeneration tree spans only the newcomer and the providers, bandwidth needs to be measured only between these nodes rather than between all nodes in the network, so the measurement overhead is limited.

Theorem 2: Let $B(T)$ be the available bandwidth capacity of a regeneration tree $T$ in $G_k$. Then the regeneration time is $\frac{M}{kB(T)}$, where $M$ is the size of the original file.

Proof: According to Lemma 2, the amount of traffic on each edge in $T$ is $M/k$ bytes. From Lemma 3, the regeneration time depends on the minimal weighted edge in $T$. According to the definition of $B(T)$, the regeneration time is
$$\frac{M/k}{B(T)} = \frac{M}{kB(T)}.$$
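As a worked instance of Theorem 2, the two bottlenecks from the Fig. 1 example (30 and 40KB/s) can be plugged in with $k = 2$. The file size $M = 800$KB and the function name are assumptions made for illustration.

```python
def regeneration_time(M, k, capacity):
    """Theorem 2: each of the k tree edges carries M/k bytes, pipelined at rate B(T)."""
    return M / (k * capacity)

# Illustrative numbers: M = 800 KB, k = 2 providers, bottlenecks of 30 and 40 KB/s.
t_star = regeneration_time(800, 2, 30)
t_tree = regeneration_time(800, 2, 40)
print(round(t_star, 3), t_tree, f"{1 - t_tree / t_star:.0%} saved")
```

The saving depends only on the ratio of the two bottlenecks, here $1 - 30/40 = 25\%$, and not on $M$.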

B. Encoding Scheme

In $G_k$, assume that an encoded block is stored in $V_i$, $i = 1, 2, \ldots, k$. According to Eq. (4) and Eq. (5), if $(\sigma_1, \sigma_2, \ldots, \sigma_k)$ is a random vector, $(r_1, \ldots, r_k)$ is also a random vector. Thus $(r_1, \ldots, r_k)$ can be generated randomly on the encoding nodes in a distributed fashion: $V_i$ is responsible for generating $r_i$ randomly, $i = 1, 2, \ldots, k$. In a regeneration tree, if $V_i$ does not receive data from other nodes, it sends $r_i$ times its stored block. If $V_i$ receives data from $V_{i_1}, \ldots, V_{i_{in(V_i)}}$, where $in(V_i)$ is the indegree of $V_i$, and the data received from $V_{i_j}$ is $B_{i_j}$, it sends $r_i$ times its stored block plus $\sum_{j=1}^{in(V_i)} B_{i_j}$. Therefore, the newcomer obtains the sum, over $i = 1, \ldots, k$, of the block stored in $V_i$ multiplied by $r_i$.
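The distributed encoding step can be sketched as follows. This is a minimal Python illustration under stated assumptions: arithmetic is done in the prime field GF(257) for simplicity (the field used by the scheme is not specified in this section), the names `tree_encode`, `parent`, `blocks`, and `coeffs` are hypothetical, and the relation between the $\sigma_i$ and $r_i$ given by Eq. (4) and Eq. (5) is not modeled. The point is only that in-network aggregation along any tree delivers to the newcomer the same linear combination of provider blocks, each scaled by its locally drawn $r_i$, as the star does.

```python
import random

P = 257  # hypothetical prime field GF(257), chosen only for illustration


def scale(c, block):
    return [(c * x) % P for x in block]


def add(a, b):
    return [(x + y) % P for x, y in zip(a, b)]


def tree_encode(parent, blocks, coeffs):
    """Aggregate r_i * block_i up a regeneration tree rooted at the newcomer (node 0).

    parent: dict provider -> parent node (0 is the newcomer)
    blocks: dict provider -> stored block (list of field elements)
    coeffs: dict provider -> coefficient r_i drawn locally by that provider
    Returns the sum of everything the newcomer receives from its children.
    """
    children = {}
    for v, p in parent.items():
        children.setdefault(p, []).append(v)

    def send(v):
        # V_i sends r_i * (its block) plus everything received from its children.
        out = scale(coeffs[v], blocks[v])
        for c in children.get(v, []):
            out = add(out, send(c))
        return out

    received = [0] * len(next(iter(blocks.values())))
    for c in children.get(0, []):
        received = add(received, send(c))
    return received


rng = random.Random(1)
blocks = {i: [rng.randrange(P) for _ in range(4)] for i in range(1, 5)}
coeffs = {i: rng.randrange(1, P) for i in blocks}  # each V_i draws its own r_i
tree = {1: 0, 2: 1, 3: 1, 4: 2}                    # hypothetical tree shape
star = {i: 0 for i in blocks}                      # every provider sends directly

expected = [0] * 4
for i in blocks:
    expected = add(expected, scale(coeffs[i], blocks[i]))
print(tree_encode(tree, blocks, coeffs) == expected,
      tree_encode(star, blocks, coeffs) == expected)
```

Because addition is associative and each node only needs its own coefficient, no node has to know the tree shape beyond its own children and parent.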

C. Available Bandwidth Capacity

We analyze the available bandwidth capacity of the tree-structured and star-structured regeneration schemes using order statistics. First, we introduce a basic theorem of order statistics in Lemma 5.

Lemma 5: [17] Assume $X_1, \ldots, X_n$ are $n$ independent random variables, each with cumulative distribution function $F(x)$ and probability density function $f(x)$. Let $f_{(r:n)}(x)$ denote the probability density function of the $r$-th variable $X_{(r:n)}$ in the ordering $X_{(1:n)} \ge X_{(2:n)} \ge \cdots \ge X_{(n:n)}$. If $X_i$ has a continuous distribution,

$$f_{(r:n)}(x) = \frac{n!\,F^{n-r}(x)\,[1 - F(x)]^{r-1}\,f(x)}{(n-r)!\,(r-1)!}. \quad (6)$$
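Eq. (6) can be checked numerically. The sketch below assumes, purely for illustration, i.i.d. edge bandwidths uniform on $[a, b]$; in that case the expectation of the $r$-th largest of $n$ values has the standard closed form $a + (b - a)\frac{n - r + 1}{n + 1}$, which a numerical integral of $x f_{(r:n)}(x)$ should reproduce. The function names are hypothetical.

```python
from math import factorial


def order_stat_pdf(x, r, n, cdf, pdf):
    """Eq. (6): density of X_(r:n), the r-th largest of n i.i.d. continuous variables."""
    F = cdf(x)
    return (factorial(n) * F ** (n - r) * (1.0 - F) ** (r - 1) * pdf(x)
            / (factorial(n - r) * factorial(r - 1)))


def expected_order_stat_uniform(r, n, a, b, steps=20000):
    """E[X_(r:n)] for i.i.d. U[a, b] variables, via the midpoint rule on x * f_(r:n)(x)."""
    def cdf(x):
        return (x - a) / (b - a)

    def pdf(x):
        return 1.0 / (b - a)

    h = (b - a) / steps
    total = 0.0
    for j in range(steps):
        x = a + (j + 0.5) * h
        total += x * order_stat_pdf(x, r, n, cdf, pdf) * h
    return total


a, b = 1.0, 9.0
for n, r in [(5, 1), (5, 3), (4, 4)]:
    closed_form = a + (b - a) * (n - r + 1) / (n + 1)
    print(n, r, round(expected_order_stat_uniform(r, n, a, b), 4), closed_form)
```

With $r = n = k$ the result is the expected minimum of $k$ such bandwidths, i.e., the expected bottleneck of a star whose provider-to-newcomer edges are independent $U[a, b]$ draws.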

Let $E = \{e_1, e_2, \ldots, e_{k(k+1)/2}\}$ in $G_k = (V, E, \omega)$, where $\omega(e_1) > \omega(e_2) > \cdots > \omega(e_{k(k+1)/2})$. The bandwidth capacities of the edges are assumed to be pairwise distinct, which holds with high probability in real-world networks.

Definition 3: $MST(G_k) = r$ if and only if the minimal edge in the maximum spanning tree of $G_k = (V, E, \omega)$ is the $r$-th maximal edge of $E$.
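Definition 3 can be explored by simulation. The sketch below uses a hypothetical helper `mst_rank` and assumes edge weights are i.i.d. continuous random variables (as in Lemma 5), so every ranking of the edges is equally likely and a random shuffle stands in for sorting sampled weights. It runs Kruskal's algorithm on the edges in decreasing weight order and reports the rank of the last edge added, which is the minimal edge of the maximum spanning tree. The sampled ranks should respect the bounds stated in Property 1.

```python
import random
from collections import Counter
from itertools import combinations


def mst_rank(k, rng):
    """Sample MST(G_k) for a complete graph on k + 1 nodes (newcomer + k providers)
    whose edge weights are i.i.d. continuous, hence distinct with probability 1."""
    if k < 1:
        raise ValueError("need at least one provider")
    n = k + 1
    edges = list(combinations(range(n), 2))  # k(k+1)/2 edges
    rng.shuffle(edges)                       # edges[0] plays the role of e_1
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    added = 0
    for rank, (u, v) in enumerate(edges, start=1):
        ru, rv = find(u), find(v)
        if ru != rv:        # Kruskal on decreasing weights
            parent[ru] = rv
            added += 1
            if added == k:  # the k-th tree edge is the tree's minimal edge
                return rank
    raise RuntimeError("unreachable: a complete graph is connected")


rng = random.Random(7)
k = 5
counts = Counter(mst_rank(k, rng) for _ in range(10000))
m = k * (k + 1) // 2
print(sorted(counts.items()))
print(k <= min(counts) and max(counts) <= m - k + 1)  # Property 1 bounds
```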

Property 1: $k \le MST(G_k) \le M_{k+1} - k + 1$, where $M_k = \frac{k(k-1)}{2}$.

Proof: Let $E' = \{e_1, \ldots, e_k\}$. If $(V, E')$ is connected, it must be a maximum spanning tree of $G_k$. Since every spanning tree of $G_k$ has $k$ edges, no spanning tree in $G_k$ has a minimal edge larger than that of $(V, E')$; hence $r \ge k$. Because $G_k$ is a complete graph on $k + 1$ nodes, it is $k$-edge-connected. Thus it always remains connected after removing any $k - 1$ edges. Let

Tree-structured data regeneration with network coding in distributed storage systems


References

Network information flow

OceanStore: an architecture for global-scale persistent storage

Order Statistics (H. A. David)

Network Coding for Distributed Storage Systems

The Algorithm Design Manual
