Tree-structured data regeneration with network coding in distributed storage systems
Frequently Asked Questions (17)
Q2. How much data does a leaf node send?
For each leaf node in the regeneration tree, the data it sends is M/k bytes, because the data it stores is M/k bytes.
Q3. When is a node considered up in the trace?
In the trace file, a node is considered up at time t if and only if at least half of the pings in the batch immediately preceding t reached the node successfully.
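A minimal sketch of this rule, assuming a hypothetical trace format in which each batch is a `(timestamp, results)` pair and `results` holds one boolean per ping (True if that ping reached the node). The helper names `is_node_up` and `node_up_at` are illustrative, and reading "immediately prior to t" as "the last batch strictly before t" is our interpretation.

```python
from bisect import bisect_left

def is_node_up(results):
    """True if at least half of the pings in one batch reached the node."""
    if not results:
        return False  # assumption: an empty batch gives no evidence that the node is up
    successes = sum(1 for ok in results if ok)
    return 2 * successes >= len(results)

def node_up_at(t, batches):
    """Apply the rule to the last batch whose timestamp is strictly before t.

    batches: list of (timestamp, results) pairs sorted by timestamp (hypothetical format).
    """
    times = [ts for ts, _ in batches]
    idx = bisect_left(times, t)  # index of the first batch with timestamp >= t
    if idx == 0:
        return False  # assumption: with no earlier batch, the node is treated as down
    return is_node_up(batches[idx - 1][1])

trace = [(0, [True, False, True, False]), (10, [False, False, True, False])]
print(node_up_at(5, trace), node_up_at(15, trace))  # True False
```

At t = 5 the preceding batch has 2 of 4 successful pings, which meets the "at least half" threshold; at t = 15 only 1 of 4 succeeded.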
Q4. What are the incoming and outgoing edges of a node?
An incoming edge of a node is an edge whose other endpoint is a child of this node; the outgoing edge of a node is the edge whose other endpoint is its parent.
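The definitions can be illustrated with a regeneration tree stored as a child-to-parent map. This representation, the node names, and rooting the tree at the newcomer are assumptions made for the sketch; a node can have several incoming edges (one per child) but at most one outgoing edge.

```python
def incoming_edges(node, parent):
    """Edges (child, node), one per child, since data flows from children toward the root."""
    return [(child, node) for child, p in parent.items() if p == node]

def outgoing_edge(node, parent):
    """Edge (node, parent), or None for the root, which has no parent."""
    p = parent.get(node)
    return None if p is None else (node, p)

# Hypothetical regeneration tree rooted at the newcomer.
parent = {"p1": "newcomer", "p2": "newcomer", "p3": "p1", "p4": "p1"}
print(incoming_edges("p1", parent))       # [('p3', 'p1'), ('p4', 'p1')]
print(outgoing_edge("p1", parent))        # ('p1', 'newcomer')
print(incoming_edges("p3", parent))       # []  (a leaf has no incoming edge)
print(outgoing_edge("newcomer", parent))  # None
```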
Q5. How does bandwidth heterogeneity affect the expected available bandwidth capacity?
The authors observe that as the bandwidth heterogeneity (i.e., the variance of the bandwidth distribution) increases, the expected available bandwidth capacity of the tree-structured regeneration scheme increases, whereas that of the star-structured regeneration scheme decreases.
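The comparison can be explored with a small Monte Carlo sketch. Its assumptions, none of which come from the text: G_k is a complete graph on the newcomer (node 0) and k providers, every link bandwidth is drawn i.i.d. from U[a, b], the star scheme's capacity is its slowest newcomer link, and the tree scheme's capacity is the smallest edge of a maximum spanning tree built with Kruskal's algorithm. Function names are illustrative. The sketch only prints estimates; it does not reproduce the authors' results.

```python
import random

def max_tree_bottleneck(n, weights):
    """Smallest edge weight in a maximum spanning tree (Kruskal on edges in decreasing weight order)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    added, bottleneck = 0, None
    for (u, v), w in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            added += 1
            bottleneck = w  # weights arrive in decreasing order, so the last edge added is the smallest
            if added == n - 1:
                break
    return bottleneck

def sample_capacities(k, a, b, rng):
    """One random G_k: node 0 is the newcomer, nodes 1..k are providers."""
    n = k + 1
    weights = {(u, v): rng.uniform(a, b) for u in range(n) for v in range(u + 1, n)}
    star = min(weights[(0, v)] for v in range(1, n))
    tree = max_tree_bottleneck(n, weights)
    return star, tree

def estimate(k, a, b, trials=2000, seed=1):
    """Average star and tree capacities over random graphs."""
    rng = random.Random(seed)
    star_total = tree_total = 0.0
    for _ in range(trials):
        star, tree = sample_capacities(k, a, b, rng)
        star_total += star
        tree_total += tree
    return star_total / trials, tree_total / trials

# Same mean bandwidth (5.0) with an increasing spread, hence an increasing variance.
for half_width in (1.0, 2.0, 4.0):
    star_avg, tree_avg = estimate(8, 5.0 - half_width, 5.0 + half_width)
    print(f"spread ±{half_width}: star {star_avg:.3f}, tree {tree_avg:.3f}")
```

Because the star is itself a spanning tree of G_k, the maximum spanning tree's bottleneck can never be smaller than the star's in any single sample.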
Q6. How do the authors measure the performance of regeneration schemes?
The authors measure the performance of regeneration schemes from three aspects: (i) regeneration time: the time from the start of a regeneration to its end; (ii) probability of successful regeneration: the probability that a regeneration finishes without being interrupted by node departures; (iii) data availability: the probability that a file is available.
Q7. How does the proof of Lemma 1 show that the traffic on each edge is uniform?
Since the outgoing edges of all the providers carry the same traffic, the traffic on each edge is uniform and equal to M/k bytes.
Q8. When can a download node recover the data?
With (n, k) erasure codes or (n, k) linear network coding, the download node can recover the data as soon as it has received k redundant blocks or k linearly independent encoded blocks, respectively.
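The decoding condition for linear network coding can be sketched with Gaussian elimination: the download node keeps collecting encoded blocks until their coefficient vectors reach rank k. This sketch works over the prime field GF(257) for simplicity (an assumption; the text does not name a field, and practical systems often use GF(2^8)), and `encode`/`decode` are hypothetical helpers, not an API from the paper.

```python
import random

P = 257  # prime field GF(257), chosen for illustration only

def encode(blocks, rng):
    """Return (coefficients, payload): a random linear combination of the k source blocks."""
    coeffs = [rng.randrange(P) for _ in blocks]
    payload = [sum(c * blk[j] for c, blk in zip(coeffs, blocks)) % P
               for j in range(len(blocks[0]))]
    return coeffs, payload

def decode(received, k):
    """Gauss-Jordan elimination mod P over [coefficients | payload] rows.

    Returns the k source blocks, or None while fewer than k independent blocks have arrived.
    """
    rows = [list(c) + list(p) for c, p in received]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            return None  # rank < k: keep downloading
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # Fermat inverse, valid because P is prime
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % P for x, y in zip(rows[r], rows[col])]
    return [row[k:] for row in rows[:k]]

rng = random.Random(7)
k = 3
source = [[rng.randrange(P) for _ in range(4)] for _ in range(k)]  # k source blocks of 4 symbols
received = []
recovered = decode(received, k)
while recovered is None:
    received.append(encode(source, rng))  # one more encoded block arrives
    recovered = decode(received, k)
print(len(received), recovered == source)  # at least k blocks were needed, then True
```

Each row carries its coefficient vector in front of its payload, so the row operations that reduce the coefficients to the identity matrix also turn the payloads back into the original blocks.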
Q9. What does the Y-axis show?
The Y-axis shows $E(G_k)$, the expected available bandwidth capacity of the corresponding regeneration scheme in $G_k$. Because $E_{\text{star}}(G_k) = \frac{b-a}{k+1} + a$, the expected available bandwidth capacity of the star-structured regeneration scheme decreases and converges to $a$, the lower bound of the uniform distribution $U[a, b]$, as $k$ increases.
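The closed form for the star scheme can be checked by simulation, assuming (as the formula implies) that the star's capacity is the minimum of k i.i.d. U[a, b] link bandwidths from the providers to the newcomer. Function names are illustrative.

```python
import random

def expected_star_capacity(k, a, b):
    """Closed form from the text: E_star(G_k) = (b - a)/(k + 1) + a."""
    return (b - a) / (k + 1) + a

def simulated_star_capacity(k, a, b, trials=20000, seed=0):
    """Monte Carlo estimate: the star is limited by the slowest of its k provider links."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.uniform(a, b) for _ in range(k))
    return total / trials

for k in (1, 4, 16, 64):
    print(k, round(expected_star_capacity(k, 1.0, 9.0), 3),
          round(simulated_star_capacity(k, 1.0, 9.0), 3))
```

With a = 1 and b = 9, the closed-form column shrinks toward a = 1.0 (5.0, 2.6, about 1.47, about 1.12), and the simulated column should track it closely.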
Q10. How is a maximum spanning tree related to a bottleneck spanning tree?
In a weighted undirected graph, a minimum (maximum) spanning tree is also a bottleneck spanning tree: its largest (smallest) edge weight is the minimum (maximum) over all spanning trees of the graph.
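The bottleneck property can be checked on a small example: build a maximum spanning tree with Kruskal's algorithm (edges taken in decreasing weight order; the text does not say which algorithm the authors use) and compare its smallest edge with a brute-force search over all spanning trees. The graph, and treating node 0 as the newcomer for the star comparison, are hypothetical.

```python
from itertools import combinations

def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def maximum_spanning_tree(n, edges):
    """Kruskal's algorithm on (u, v, w) edges taken in decreasing order of weight."""
    parent = list(range(n))
    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:  # skip edges that would close a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

def is_spanning_tree(n, subset):
    """n - 1 edges without a cycle always connect all n nodes."""
    parent = list(range(n))
    for u, v, _ in subset:
        ru, rv = _find(parent, u), _find(parent, v)
        if ru == rv:
            return False
        parent[ru] = rv
    return len(subset) == n - 1

def best_bottleneck_brute_force(n, edges):
    """Largest achievable smallest-edge weight over every spanning tree."""
    return max(min(w for _, _, w in subset)
               for subset in combinations(edges, n - 1) if is_spanning_tree(n, subset))

edges = [(0, 1, 4.0), (0, 2, 1.0), (0, 3, 2.5), (1, 2, 3.0), (1, 3, 0.5), (2, 3, 6.0)]
tree = maximum_spanning_tree(4, edges)
tree_bottleneck = min(w for _, _, w in tree)
star_bottleneck = min(w for u, _, w in edges if u == 0)  # star centered at node 0
print(tree_bottleneck, best_bottleneck_brute_force(4, edges), star_bottleneck)  # 3.0 3.0 1.0
```

Here the maximum spanning tree reaches the best possible bottleneck (3.0), while the star around node 0 is limited by its 1.0 link.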
Q11. What is the data availability of the tree-structured regeneration scheme?
When k ≥ 10, the data availability of the tree-structured regeneration scheme is always above 90%, while that of the star-structured scheme is below 60%.
Q12. What is the probability that $\mathrm{MST}(G_k) = i$?
According to Lemma 5, the probability density function of $\omega(e_i)$ is
$$f_{(i:M_{k+1})}(x) = \frac{M_{k+1}!}{(i-1)!\,(M_{k+1}-i)!}\,F^{M_{k+1}-i}(x)\,\bigl(1-F(x)\bigr)^{i-1} f(x). \tag{7}$$
Let $E_{(i:M_{k+1})}$ be the expected value of $\omega(e_i)$:
$$E_{(i:M_{k+1})} = \int_{0}^{+\infty} x\, f_{(i:M_{k+1})}(x)\,dx. \tag{8}$$
Let $p(k+1, i)$ be the probability that $\mathrm{MST}(G_k) = i$.
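Equation (8) can be evaluated numerically. The sketch below assumes edge weights drawn from U[A, B] (the uniform case used for the star analysis), integrates Eq. (7) over the support with the midpoint rule (the density is zero outside [A, B], and A ≥ 0), and compares the result with the known closed form $A + (B-A)\frac{n-i+1}{n+1}$ for the i-th largest of n uniform samples. Here n stands in for $M_{k+1}$, and all names are illustrative.

```python
import math

def order_stat_pdf(x, i, n, f, F):
    """Eq. (7): density of the i-th largest of n i.i.d. weights (n plays the role of M_{k+1})."""
    coeff = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    return coeff * F(x) ** (n - i) * (1.0 - F(x)) ** (i - 1) * f(x)

def expected_order_stat(i, n, f, F, lo, hi, steps=20000):
    """Eq. (8) by the midpoint rule; [lo, hi] must cover the support of f."""
    h = (hi - lo) / steps
    return h * sum((lo + (s + 0.5) * h) * order_stat_pdf(lo + (s + 0.5) * h, i, n, f, F)
                   for s in range(steps))

A, B = 1.0, 5.0  # edge weights ~ U[A, B], an illustrative assumption

def uniform_pdf(x):
    return 1.0 / (B - A) if A <= x <= B else 0.0

def uniform_cdf(x):
    return min(max((x - A) / (B - A), 0.0), 1.0)

n = 6
for i in (1, 2, 6):
    numeric = expected_order_stat(i, n, uniform_pdf, uniform_cdf, A, B)
    closed_form = A + (B - A) * (n - i + 1) / (n + 1)  # known result for uniform order statistics
    print(i, round(numeric, 4), round(closed_form, 4))
```

Index i = 1 gives the largest weight and i = n the smallest, matching the $F^{M_{k+1}-i}(1-F)^{i-1}$ form of Eq. (7).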
Q13. How does the tree-structured scheme regenerate redundant data?
The authors show how the tree-structured scheme regenerates new redundant data at the newcomer and prove that a maximum spanning tree is an optimal regeneration tree.
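The exact coding procedure is not given here, so the following is only a sketch under stated assumptions: every node applies random linear network coding over GF(257) (a field swapped in for illustration), combining its own stored block, if it is a provider, with the single block received from each child, and forwards one block to its parent. Under these assumptions every tree edge carries exactly one block of M/k symbols, consistent with Lemma 1. The `regenerate` helper and the tree are hypothetical.

```python
import random

P = 257  # prime field for illustration (assumption); one symbol stands in for one byte

def regenerate(node, children, stored, rng, traffic):
    """Return the coded block `node` sends to its parent (the newcomer's new block at the root).

    Each node mixes its own stored block (if it is a provider) with the one block received
    from each child, so every tree edge carries exactly one block.
    """
    length = len(next(iter(stored.values())))
    inputs = []
    if node in stored:  # the newcomer (root) stores nothing before regeneration
        inputs.append(stored[node])
    for child in children.get(node, []):
        block = regenerate(child, children, stored, rng, traffic)
        traffic.append((child, node, len(block)))  # one block of M/k symbols on edge child -> node
        inputs.append(block)
    out = [0] * length
    for block in inputs:
        c = rng.randrange(1, P)  # nonzero random coefficient
        out = [(o + c * x) % P for o, x in zip(out, block)]
    return out

# Hypothetical tree: newcomer <- p1, p2; p1 <- p3, p4 (k = 4 providers).
children = {"newcomer": ["p1", "p2"], "p1": ["p3", "p4"]}
k = 4
# Give provider i the unit vector e_i, so the output equals the effective coefficient vector.
stored = {f"p{i + 1}": [1 if j == i else 0 for j in range(k)] for i in range(k)}
traffic = []
new_block = regenerate("newcomer", children, stored, random.Random(3), traffic)
print(new_block)  # every provider contributes a nonzero coefficient
print(len(traffic), sum(size for _, _, size in traffic))  # 4 16: k edges, one M/k-symbol block each
```

Because all coefficients are nonzero and P is prime, each provider's contribution survives every hop, so the newcomer's block is a combination of all k stored blocks.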
Q14. Does the tree-structured scheme achieve the optimal regeneration traffic of Section III-B?
By Lemma 2, the regeneration traffic of a tree-structured regeneration scheme is M bytes, so it achieves the optimal regeneration traffic shown in Section III-B.
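As a worked check, assuming the regeneration tree spans the newcomer and k providers (so it has k edges) and each edge carries M/k bytes as in Lemma 1:

$$\text{traffic}_{\text{tree}} = \underbrace{k}_{\text{edges}} \cdot \frac{M}{k} = M \text{ bytes}.$$

For example, with M = 1024 MB and k = 8, each edge carries 128 MB and the total is 1024 MB.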
Q15. How can tree-structured regeneration improve bandwidth capacity?
Their mathematical analysis shows that the tree-structured regeneration scheme can improve the available bandwidth capacity and the adaptability to the bandwidth heterogeneity, compared with the conventional star-structured regeneration scheme.
Q16. What distribution do the edge weights in E follow?
Assume the weights of the edges in E follow a distribution with probability density function f(x) and cumulative distribution function F(x).
Q17. How much regeneration time can the tree-structured scheme save?
In Fig. 5, the authors show that the tree-structured regeneration scheme saves at least 75% of the regeneration time when k ≥ 4, with a maximum saving of 82% at k = 20.
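Since a saving of s means $T_{\text{tree}} = (1-s)\,T_{\text{star}}$, the reported savings translate into speedup factors:

$$\frac{T_{\text{star}}}{T_{\text{tree}}} = \frac{1}{1-s}, \qquad s = 0.75 \;\Rightarrow\; 4, \qquad s = 0.82 \;\Rightarrow\; \frac{1}{0.18} \approx 5.6.$$

So the tree-structured scheme regenerates at least 4 times as fast when k ≥ 4, and about 5.6 times as fast at k = 20.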