# Parallelism versus Memory Allocation in Pipelined Router Forwarding Engines

## Summary

### 1. Introduction

- Parallel processors are often used to solve time-consuming problems.
- To the best of the authors' knowledge, this problem was first raised, and left as an open problem, in [14].
- Given that minimizing memory is required to minimize cost and that pipelining is required for speed, one way out of the dilemma is to change the underlying model.
- It is difficult today to imagine a very high speed design with more than, say, b = 100 banks of memory connected via a crossbar.
- The authors say that an allocation is feasible if every processor’s request is satisfied and no more than two processors are allocated to any one memory.
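The feasibility condition in the last bullet can be checked mechanically. Below is a minimal sketch; the dictionary-based representation and the names `requests`, `allocation`, and `capacity` are illustrative choices, not from the paper:

```python
def is_feasible(requests, allocation, capacity):
    """requests:   {processor: words needed}
    allocation: {processor: {bank: words assigned}}
    capacity:   words available per memory bank."""
    # Every processor's request must be met in full.
    for p, need in requests.items():
        if sum(allocation.get(p, {}).values()) != need:
            return False
    # No bank may serve more than two processors or exceed its capacity.
    users, load = {}, {}
    for p, banks in allocation.items():
        for b, amount in banks.items():
            users.setdefault(b, set()).add(p)
            load[b] = load.get(b, 0) + amount
    return (all(len(s) <= 2 for s in users.values())
            and all(v <= capacity for v in load.values()))
```

For example, an allocation in which bank `m2` is shared by exactly two processors is feasible, while one with three sharers on a single bank is not.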

### 3. Our Bin Packing Problem Is NP-Complete

- The authors will prove the NP-completeness of the bin packing problem with the constraint that each bin can have at most two types.
- In fact, it was shown in [8] that the 3-PARTITION problem is NP-complete in the strong sense.
- Determine if W can be packed into 2m bins such that no bin contains more than two types.
- First, the authors observe that a weight type cannot be partitioned into more than two parts.
- Hence, the authors have shown Theorem 1.
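Membership in NP is the easy half: a proposed packing is a polynomial-size certificate that can be verified quickly. A hedged sketch of such a check, assuming unit-capacity bins and fractional splitting of each type's total weight (the representation is illustrative):

```python
def verify_packing(weights, bins, capacity=1.0, eps=1e-9):
    """Certificate check: `bins` is a list of {type: amount} dicts.
    Valid iff each bin holds at most two types, no bin exceeds
    `capacity`, and the amounts per type sum to `weights`."""
    placed = {t: 0.0 for t in weights}
    for b in bins:
        if len(b) > 2 or sum(b.values()) > capacity + eps:
            return False
        for t, amt in b.items():
            if t not in placed:          # unknown weight type
                return False
            placed[t] += amt
    return all(abs(placed[t] - w) < eps for t, w in weights.items())
```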

### 4. A Graph Representation

- Before the authors discuss approximation algorithms for their bin packing problem and their worst-case analysis, they consider a graph representation of a packing.
- If the bin is partially filled with only one type, the authors say the corresponding loop is weak.
- If the bin is completely filled with two types, the authors say the corresponding edge is strong.
- There are, of course, different ways to pack W into two bins so that each bin contains at most two types.
- In this paper the authors use the convention that a cycle must have at least two vertices.
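The construction in this section can be sketched directly: one vertex per weight type, an edge per two-type bin, a loop per one-type bin, with "strong" marking completely filled bins and "weak" partially filled ones. The edge-list representation below is an illustrative choice:

```python
def associated_graph(bins, capacity=1.0, eps=1e-9):
    """Build the associated graph of a packing: a bin with two types
    becomes an edge between those type-vertices, a bin with one type
    becomes a loop on that vertex. Completely filled bins are 'strong',
    partially filled ones 'weak'."""
    edges = []
    for b in bins:
        types = sorted(b)
        kind = "strong" if abs(sum(b.values()) - capacity) < eps else "weak"
        if len(types) == 1:
            edges.append((types[0], types[0], kind))   # loop
        elif len(types) == 2:
            edges.append((types[0], types[1], kind))   # edge
    return edges
```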

### 5. Approximation Algorithms

- The authors now describe a simple algorithm for bin packing subject to the constraint that no bin contains weights of more than two types.
- Each connected component except the last has at most one weak edge, which can only appear at the end of the component.
- Let OPT denote the number of bins needed in the optimum packing.
- It is worth noting that if the associated graph of the resulting packing does not have any weak loop, Algorithm A is at most a factor of 3/2 from optimal.
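The summary does not spell out Algorithm A itself; the following is a simplified greedy sketch in the same spirit, not the paper's algorithm. It respects the two-types-per-bin constraint by closing a bin as soon as it either becomes full or holds two types (unit capacity and this splitting rule are assumptions):

```python
def greedy_pack(weights, capacity=1.0, eps=1e-12):
    """Simplified greedy sketch (NOT the paper's Algorithm A): pour each
    type into the currently open one-type bin, close the bin once it
    holds two types, then fill whole bins with the remainder. No bin
    ever contains more than two types."""
    bins, open_bin, space = [], None, capacity
    for t, w in sorted(weights.items()):
        if w <= eps:
            continue
        if open_bin is not None:           # open bin already has one type
            take = min(w, space)
            open_bin[t] = take
            w -= take
            bins.append(open_bin)          # two types now: close it
            open_bin, space = None, capacity
        while w >= capacity - eps:         # whole bins of this type alone
            bins.append({t: capacity})
            w -= capacity
        if w > eps:                        # remainder opens a fresh bin
            open_bin, space = {t: w}, capacity - w
    if open_bin is not None:
        bins.append(open_bin)
    return bins
```

On the example weights {a: 1.5, b: 0.8, c: 0.7} this uses three unit bins, which is optimal since the total weight is 3.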

### 6. Some Properties of the Associated Graphs

- These properties provide the foundation for the reduction steps in the approximation algorithm to be discussed in the next section.
- During the moving process, the authors might split the original component into two, but the total number of bins will never increase.
- If the authors successfully carry this on until the two weak edges become adjacent, they then use Operation 2 to eliminate one weak edge or split the component into two.
- In the latter case, the authors need to check whether there are two weak loops in the entire graph.
- If the associated graph G_P is stable and contains a strong loop in one connected component X and a weak edge in another connected component Y, then the authors can find another stable packing P′ that uses no more than b bins and whose associated graph has one fewer strong loop than that of P.
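The simplest of these reduction steps can be illustrated concretely: if the graph contains two weak loops, i.e. two partially filled single-type bins, and their contents fit together, merging them saves a bin. This is an illustrative sketch of the flavour of the operations above, not the paper's exact statement:

```python
def merge_weak_loops(bins, capacity=1.0, eps=1e-9):
    """If two bins are weak loops (each partially filled with a single
    type) and their contents fit in one bin, merge them into a single
    two-type bin, reducing the bin count by one."""
    weak = [i for i, b in enumerate(bins)
            if len(b) == 1 and sum(b.values()) < capacity - eps]
    for x in range(len(weak)):
        for y in range(x + 1, len(weak)):
            i, j = weak[x], weak[y]
            if sum(bins[i].values()) + sum(bins[j].values()) <= capacity + eps:
                merged = dict(bins[i])
                for t, a in bins[j].items():
                    merged[t] = merged.get(t, 0.0) + a
                return [b for k, b in enumerate(bins) if k not in (i, j)] + [merged]
    return bins                            # no mergeable pair of weak loops
```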

### 7. An Improved Algorithm

- The authors will show that the modified algorithm gives an optimal solution when the total weight is greater than or equal to the number of types.
- Every time the authors eliminate one strong loop, at most a linear number of atomic operations are involved, each taking constant time.
- Now the authors have their linear-time Algorithm B.
- While there exists a component X containing a strong loop and another component Y containing a weak edge, the authors use only the first step described in the proof of Lemma 2 to merge these two components into one, without handling possible multiple edges within any one connected component.

### 8. Dynamic Memory Allocation

- So far the authors have only dealt with approximation and exact algorithms for static memory allocation.
- In this situation the authors have a tradeoff between memory utilization and cost of repacking or compaction [14].
- Call any memory piece that has not been swapped clean and otherwise dirty.
- The authors then allocate one more weight of size α, and follow this with a deallocation of one of the paired weights.
- Observing the allocation assignment made by the online Algorithm D, the authors then issue a list of k/2 deallocation requests which remove exactly one weight from every shared bin.
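The adversarial pattern in the last two bullets can be replayed in a toy simulation. This is a hedged sketch, not the paper's exact argument; the value of α and the assumption that the online algorithm pairs equal weights two per bin are illustrative:

```python
def adversary_utilization(k, alpha):
    """Toy replay of the adversary: an online packer places k equal
    weights of size alpha two per unit bin; the adversary then
    deallocates one weight from every shared bin, stranding the other
    half of each bin. Returns the fraction of each bin still in use."""
    bins = [[alpha, alpha] for _ in range(k // 2)]   # online pairing
    for b in bins:
        b.pop()                                      # free one weight per pair
    used = sum(sum(b) for b in bins)
    return used / len(bins)
```

With α = 1/2, every one of the k/2 bins ends up exactly half full, so without repacking the online algorithm's memory utilization drops to 50%.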

### 9. Conclusions

- In practice, one would simply choose the parameters such that the number of memories is larger than the number of processor stages.
- In that case, the approximation algorithm the authors presented will provide 100% efficiency.
- The authors know of at least one implementation of one of their models that scales to multiple OC-768 speeds.
- On the theoretical front, the paper also poses an interesting open problem: the general case of packing bins so that each bin contains at most r types, for some fixed integer r.


