Showing papers by "Lars Lundberg" published in 2003


Proceedings Article
22 Apr 2003
TL;DR: This paper defines recovery schemes, which are optimal for a number of important cases, and shows that the problem of finding optimal recovery schemes corresponds to the mathematical problem called Golomb rulers.
Abstract: Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down, the load on these computers must be redistributed to other computers in the cluster. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down, i.e., we want to optimize the worst-case behavior. In this paper we define recovery schemes that are optimal for a number of important cases. We also show that the problem of finding optimal recovery schemes corresponds to the mathematical problem called Golomb rulers. Golomb rulers provide optimal recovery schemes for up to 373 computers in the cluster.

12 citations
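
The abstract above relates optimal recovery schemes to Golomb rulers. As a reminder of what that mathematical object is, here is a minimal Python sketch of the standard definition: a set of integer marks in which every pairwise difference is distinct. The function name `is_golomb_ruler` is a hypothetical helper chosen for illustration; the sketch only checks the ruler property and does not reproduce the paper's construction of recovery schemes.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """Return True if all pairwise differences between marks are distinct.

    This is the textbook Golomb ruler property, not the paper's algorithm.
    """
    ordered = sorted(marks)
    diffs = [b - a for a, b in combinations(ordered, 2)]
    return len(diffs) == len(set(diffs))


if __name__ == "__main__":
    print(is_golomb_ruler([0, 1, 4, 6]))   # differences 1..6 each occur once
    print(is_golomb_ruler([0, 1, 2, 4]))   # difference 1 occurs twice
```
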



Proceedings Article
22 Apr 2003
TL;DR: This paper defines recovery schemes, which are optimal for a number of important cases, and shows that the problem of finding optimal recovery schemes corresponds to the mathematical problem of finding sequences of integers with minimal sum and for which all sums of subsequences are unique.
Abstract: Clusters and distributed systems offer fault tolerance and high performance through load sharing, and are thus attractive in real-time applications. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers fail, the load must be redistributed. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down, i.e., we want to optimize the worst-case behavior. In this paper we define recovery schemes that are optimal for a number of important cases. We also show that the problem of finding optimal recovery schemes corresponds to the mathematical problem of finding sequences of integers with minimal sum and for which all sums of subsequences are unique.

8 citations
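
The abstract above mentions integer sequences for which all sums of subsequences are unique. The following Python sketch checks that property under one stated assumption: "subsequence" is read as a contiguous run of the sequence, which is the reading under which the property coincides with the Golomb ruler property of the prefix sums. The abstract itself does not spell this out, and the name `contiguous_sums_unique` is a hypothetical helper, not taken from the paper.

```python
from itertools import accumulate


def contiguous_sums_unique(seq):
    """Return True if every contiguous run of seq has a distinct sum.

    Assumption: "subsequence" means a contiguous run. The prefix sums
    then act as ruler marks, and each run sum is a difference of two marks.
    """
    marks = [0, *accumulate(seq)]
    sums = [
        marks[j] - marks[i]
        for i in range(len(marks))
        for j in range(i + 1, len(marks))
    ]
    return len(sums) == len(set(sums))


if __name__ == "__main__":
    print(contiguous_sums_unique([1, 3, 2]))  # run sums: 1, 4, 6, 3, 5, 2
    print(contiguous_sums_unique([1, 2, 3]))  # 1 + 2 collides with 3
```
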




Journal Article
TL;DR: This paper gives an optimal upper bound on the loss of normal-case performance when optimizing for worst-case performance, and provides a heuristic algorithm for making engineering trade-offs between worst-case and normal-case performance.
Abstract: Clusters and distributed systems offer fault tolerance and high performance. When all computers are up and running, we would like the load to be evenly distributed among the computers. When a computer breaks down, the load on this computer must be redistributed to the other computers in the cluster. Most cluster systems are designed to tolerate a single fault, and one can thus distinguish between two modes of operation: normal operation, when all computers are up and running, and worst-case operation, when one computer is down. The performance during these two modes of operation is determined by the way work is allocated to the computers in the cluster or distributed system. It turns out that the same allocation cannot, in general, achieve both optimal normal-case and optimal worst-case performance, i.e., there is a trade-off. In this paper we put an optimal upper bound on the loss of normal-case performance when optimizing for worst-case performance, and an optimal upper bound on the loss of worst-case performance when optimizing for normal-case performance. We also provide a heuristic algorithm for making engineering trade-offs between worst-case and normal-case performance.

1 citation
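
The abstract above distinguishes normal operation from worst-case operation with one computer down. The Python sketch below illustrates how the two figures might be computed for a given allocation. The model is an assumption made for illustration only: each process has a fixed load, a primary computer, and a different secondary computer that takes over if the primary fails. The paper's actual model, bounds, and heuristic are not described in the abstract and are not reproduced here; `max_load` and `normal_and_worst_case` are hypothetical names.

```python
def max_load(assignment, loads, down=None):
    """Highest per-computer load, optionally with one computer down.

    assignment: list of (primary, secondary) pairs, one per process
                (assumed model; primary != secondary).
    loads:      list of process loads, same order as assignment.
    """
    totals = {}
    for (primary, secondary), load in zip(assignment, loads):
        host = secondary if primary == down else primary
        totals[host] = totals.get(host, 0) + load
    return max(totals.values())


def normal_and_worst_case(assignment, loads, computers):
    """Return (normal-case max load, worst single-fault max load)."""
    normal = max_load(assignment, loads)
    worst = max(max_load(assignment, loads, down=c) for c in computers)
    return normal, worst


if __name__ == "__main__":
    computers = ["A", "B", "C"]
    loads = [2, 2, 2, 2, 2, 2]
    # Each computer's processes fail over to two different computers.
    spread = [("A", "B"), ("A", "C"), ("B", "A"),
              ("B", "C"), ("C", "A"), ("C", "B")]
    # Each computer's processes all fail over to the same computer.
    clustered = [("A", "B"), ("A", "B"), ("B", "C"),
                 ("B", "C"), ("C", "A"), ("C", "A")]
    print(normal_and_worst_case(spread, loads, computers))
    print(normal_and_worst_case(clustered, loads, computers))
```

In this toy model both allocations are equally balanced in normal operation, but they differ once a computer fails, which is why the choice of allocation matters for the worst case.
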