# Parallel QR Factorization of Block-Tridiagonal Matrices

## Summary

### 2. Algorithm.

- The authors' description and analysis rely on the theory of sparse matrix factorizations.
- The necessary theoretical background is briefly described in the next section; the authors refer the reader to the cited works for a detailed discussion of this topic.

### 2.1. Preliminaries.

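- Nested dissection applies this procedure recursively until subgraphs of a given minimum size are obtained. The procedure splits the graph with a separator S into two subgraphs G1 and G2, and the nodes of S are eliminated last.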
- The resulting tree of separators matches the assembly tree.
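The recursion can be illustrated on a chain of blocks. The sketch below builds a separator tree and lists blocks in postorder (children before parents), which is an elimination order consistent with the tree. It is minimal and rests on three assumptions: block i is coupled only to blocks within distance `sep_width`, the separator is `sep_width` consecutive blocks taken from the middle, and recursion stops at a `min_size` of at least `sep_width + 1` blocks. `sep_width = 1` fits the block pattern of a block-tridiagonal matrix. The block-pentadiagonal pattern of AᵀA needs `sep_width = 2`. The names `Node`, `nested_dissection` and `postorder` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of the separator tree (hypothetical structure)."""
    blocks: List[int]                                   # block indices owned by the node
    children: List["Node"] = field(default_factory=list)


def nested_dissection(first: int, last: int, min_size: int, sep_width: int = 1) -> Node:
    """Recursively bisect the chain of blocks first..last (inclusive).

    Assumes block i is coupled only to blocks within distance `sep_width`,
    so `sep_width` consecutive blocks in the middle separate the two halves.
    """
    assert min_size >= sep_width + 1, "both halves must be non-empty"
    size = last - first + 1
    if size <= min_size:
        # Leaf: a small chain of blocks, eliminated before any separator.
        return Node(list(range(first, last + 1)))
    start = first + (size - sep_width) // 2             # first separator block
    stop = start + sep_width - 1                        # last separator block
    left = nested_dissection(first, start - 1, min_size, sep_width)
    right = nested_dissection(stop + 1, last, min_size, sep_width)
    # The separator is the parent: it is eliminated after both halves.
    return Node(list(range(start, stop + 1)), [left, right])


def postorder(node: Node) -> List[int]:
    """Children before parents: an elimination order consistent with the tree."""
    order: List[int] = []
    for child in node.children:
        order += postorder(child)
    return order + node.blocks


print(postorder(nested_dissection(0, 9, 3)))   # [0, 2, 3, 1, 5, 6, 8, 9, 7, 4]
```

The two subtrees under each separator share no blocks. This is the independence that later allows them to be processed concurrently.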

### 2.3. Node elimination.

- This process closely resembles the multifrontal QR method [2, 13, 10].
- It must be noted, however, that in the multifrontal method frontal matrices are explicitly assembled by copying coefficients into dense matrix data structures that are allocated in memory and initialized to zero beforehand; these copies can be extremely costly because of the large size of the blocks and the heavy use of indirect addressing.
- In their approach, by contrast, the frontal matrices need not be formed explicitly: the constituent blocks can be easily accessed through pointers (see Section 3 for further details).
- Additionally, the multifrontal method can only take advantage of the zeroes in the bottom-left part of frontal matrices (referred to as "Strategy 3" in the work of Amestoy, Duff, and Puglisi [2]), whereas, through the use of variable pivoting, their approach can avoid a larger share of unnecessary computations.
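The simplest sequential form of block-column elimination is the classical block QR of a block-tridiagonal matrix, sketched below. At step i, a Householder QR of the 2b-by-b stack made of the current diagonal block and the sub-diagonal block annihilates block column i. Its Q is then applied to the remainder of the two affected block rows, which creates fill in block column i+2. This minimal NumPy sketch assumes square b-by-b blocks and a natural top-to-bottom order. It does not model the authors' variable pivoting or their pointer-based access to blocks: `np.vstack` and `np.block` copy the small stacks. The function names are hypothetical.

```python
import numpy as np


def block_tridiag_qr(D, L, U):
    """R factor of a block-tridiagonal matrix, returned densely for checking.

    D[i] is block (i, i), L[i] is block (i+1, i), U[i] is block (i, i+1);
    all blocks are b-by-b. Block columns are eliminated top to bottom.
    """
    N, b = len(D), D[0].shape[0]
    R = np.zeros((N * b, N * b))
    X = D[0].copy()                                  # current block (i, i)
    Y = U[0].copy() if N > 1 else None               # current block (i, i+1)
    for i in range(N - 1):
        # Householder QR of the 2b-by-b stack annihilates L[i].
        Q, Rc = np.linalg.qr(np.vstack([X, L[i]]), mode="complete")
        # Remaining parts of block rows i and i+1 (block columns i+1, i+2).
        if i + 1 < N - 1:
            right = np.block([[Y, np.zeros((b, b))], [D[i + 1], U[i + 1]]])
        else:
            right = np.vstack([Y, D[i + 1]])
        rest = Q.T @ right
        R[i*b:(i+1)*b, i*b:(i+1)*b] = Rc[:b]
        # Final row i of R, including fill in block column i+2.
        R[i*b:(i+1)*b, (i+1)*b:(i+1)*b + rest.shape[1]] = rest[:b]
        X, Y = rest[b:, :b], rest[b:, b:]            # updated block row i+1
    R[(N-1)*b:, (N-1)*b:] = np.linalg.qr(X, mode="r")
    return R


def assemble(D, L, U):
    """Dense copy of the block-tridiagonal matrix, used only for the check."""
    N, b = len(D), D[0].shape[0]
    A = np.zeros((N * b, N * b))
    for i in range(N):
        A[i*b:(i+1)*b, i*b:(i+1)*b] = D[i]
        if i < N - 1:
            A[(i+1)*b:(i+2)*b, i*b:(i+1)*b] = L[i]
            A[i*b:(i+1)*b, (i+1)*b:(i+2)*b] = U[i]
    return A


rng = np.random.default_rng(0)
N, b = 5, 3
D = [rng.standard_normal((b, b)) for _ in range(N)]
L = [rng.standard_normal((b, b)) for _ in range(N - 1)]
U = [rng.standard_normal((b, b)) for _ in range(N - 1)]
A, R = assemble(D, L, U), block_tridiag_qr(D, L, U)
print(np.allclose(R.T @ R, A.T @ A))   # True: R = Q^T A with Q orthogonal
```

R is stored densely only to make the check easy. Each step touches just two block rows and at most three block columns, so R has three nonzero block diagonals.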

### 2.4. Complexity.

- By the same token, it is possible to compute the amount of memory (in number of coefficients) consumed at each node type, which is reported in the bottom part of Table 2; it consists of the blocks coming from the original matrix (underlined in the table) plus the fill-in generated while the node is processed.
- Summing up the values in Table 2 for all the nodes of the tree leads to the overall cost of the factorization.

- In the resulting formula, C can be replaced with either F or M, yielding, respectively, the overall flop count or the overall memory consumption.
- This manuscript is for review purposes only.
- Finding an optimal pivotal sequence is an extremely challenging task due to its combinatorial nature.
- Figure 7 shows the fraction of the total floating-point operations performed within chain or leaf nodes, demonstrating that even for matrices of relatively small size it is possible to achieve high levels of parallelism (by increasing l) without incurring an excessive volume of communication.
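The overall cost is a sum of a per-node quantity C over all nodes of the tree. Plugging in the flop table F or the memory table M gives the two totals, and grouping the same sum by node type gives fractions like those in Figure 7. The minimal sketch below assumes each node is summarized by a type label and one integer parameter. The per-type formulas of Table 2 are not reproduced here, so the lambdas in the usage example are toy placeholders rather than values from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Assumed encoding: a tree node is (node_type, size parameter).
TreeNode = Tuple[str, int]
CostTable = Dict[str, Callable[[int], float]]


def total_cost(nodes: List[TreeNode], C: CostTable) -> float:
    """Sum the per-node cost C over all nodes of the tree."""
    return sum(C[kind](param) for kind, param in nodes)


def fraction_by_type(nodes: List[TreeNode], C: CostTable) -> Dict[str, float]:
    """Share of the total cost spent in each node type."""
    per_type: Dict[str, float] = {}
    for kind, param in nodes:
        per_type[kind] = per_type.get(kind, 0.0) + C[kind](param)
    grand = sum(per_type.values())
    return {kind: value / grand for kind, value in per_type.items()}


# Toy placeholder tables (NOT the formulas of Table 2): F for flops, M for memory.
F: CostTable = {"leaf": lambda k: 3.0 * k, "chain": lambda k: 5.0 * k}
M: CostTable = {"leaf": lambda k: 2.0 * k, "chain": lambda k: 1.0 * k}
nodes: List[TreeNode] = [("leaf", 4), ("leaf", 4), ("chain", 2)]

print(total_cost(nodes, F), total_cost(nodes, M))   # 34.0 18.0
print(fraction_by_type(nodes, F))
```

Passing the cost table as a parameter mirrors the remark that C can stand for either F or M. The same traversal serves both.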

### 4. Experimental results.

- As discussed above, some concurrency is available within each branch or node of the tree; this parallelism, however, involves communication because multiple processes share the same data.
- On shared memory systems, this communication takes the form of memory traffic and synchronizations.
- In distributed memory systems, it amounts to transferring data through the comparatively slow network interconnect and is, therefore, much more penalizing.
- The use of nested dissection introduces embarrassing parallelism because each process may potentially work on a different branch of the tree without communicating with the others.
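A thread pool can sketch the embarrassingly parallel pattern. Each worker touches only its own branch's data, and only the parent step waits for all results. To keep the example small and checkable, it substitutes a different technique for the paper's column-wise tree: a TSQR-style reduction over row blocks. Each worker computes the R factor of its own row block, and the parent computes the R factor of the stacked child R factors. `factor_branch` and `factor_tree` are hypothetical names. The threads only illustrate the task structure; any actual speedup depends on the NumPy/LAPACK build.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

import numpy as np


def factor_branch(blocks: List[np.ndarray]) -> np.ndarray:
    """Work on one branch: touches only this branch's blocks (no communication)."""
    return np.linalg.qr(np.vstack(blocks), mode="r")


def factor_tree(branches: List[List[np.ndarray]], workers: int = 4) -> np.ndarray:
    """Process independent branches concurrently, then the parent step."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        child_R = list(pool.map(factor_branch, branches))
    # Parent (TSQR-style combine): R factor of the stacked child R factors.
    return np.linalg.qr(np.vstack(child_R), mode="r")


rng = np.random.default_rng(0)
branches = [[rng.standard_normal((3, 3)) for _ in range(2)] for _ in range(4)]
R = factor_tree(branches)
A = np.vstack([blk for branch in branches for blk in branch])
print(np.allclose(R.T @ R, A.T @ A))   # True
```

The combine step is valid because AᵀA is the sum of the per-branch Gram matrices, and each of those equals RⱼᵀRⱼ.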


##### References

### "Parallel QR Factorization of Block-..." refers background in this paper

..., estimate) a representation of data that is suitable for a given task [5]....

### "Parallel QR Factorization of Block-..." refers methods in this paper

...All of the methods referenced, including ours, are, in essence, specialized multifrontal methods [13]....

### "Parallel QR Factorization of Block-..." refers methods in this paper

...As it is commonly done in the literature [10, 8], our symbolic analysis of the QR factorization of a sparse matrix relies on the equivalence between the R factor of a real matrix A and the Cholesky factor of the normal equations B = AᵀA....
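The quoted equivalence can be checked numerically. For a full-column-rank A, the R factor of A = QR coincides with the upper Cholesky factor of AᵀA once the rows of R are scaled to give a positive diagonal, because the Cholesky factor with positive diagonal is unique. This small NumPy illustration uses a random dense matrix and does not implement the symbolic analysis itself.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))          # full column rank with probability 1

R = np.linalg.qr(A, mode="r")            # R factor of A = QR (upper triangular)
Lc = np.linalg.cholesky(A.T @ A)         # lower-triangular, positive diagonal
C = Lc.T                                 # upper Cholesky factor of B = A^T A

# Flip the sign of each row of R so that its diagonal is positive;
# uniqueness of the Cholesky factor then forces equality with C.
R_pos = np.sign(np.diag(R))[:, None] * R
print(np.allclose(R_pos, C))             # True
```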

### "Parallel QR Factorization of Block-..." refers background in this paper

...If the nodes in S are eliminated last, no fill-in is possible between the nodes of G1 and those of G2 because of the Rose, Tarjan and Lueker theorem [22]....
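The separator property can be observed with the classical elimination game. Eliminating a node connects all of its not-yet-eliminated neighbours, and every new edge is fill-in. The sketch below uses a hypothetical 5-node path graph with G1 = {0, 1}, S = {2} and G2 = {3, 4}. Eliminating S last produces no fill between G1 and G2, while eliminating S first does. For a QR factorization, the graph to play the game on would be that of AᵀA. `fill_edges` and `crosses` are illustrative names.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def fill_edges(adj: Dict[int, Set[int]], order: Iterable[int]) -> Set[Tuple[int, int]]:
    """Elimination game: return the fill edges created by eliminating in `order`."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    eliminated: Set[int] = set()
    fill: Set[Tuple[int, int]] = set()
    for v in order:
        # Neighbours of v that are still in the graph become a clique.
        remaining = [u for u in graph[v] if u not in eliminated]
        for a, b in combinations(remaining, 2):
            if b not in graph[a]:
                graph[a].add(b)
                graph[b].add(a)
                fill.add((min(a, b), max(a, b)))
        eliminated.add(v)
    return fill


# Path graph 0-1-2-3-4 with G1 = {0, 1}, separator S = {2}, G2 = {3, 4}.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
G1, G2 = {0, 1}, {3, 4}


def crosses(edges: Set[Tuple[int, int]]) -> bool:
    """True if some fill edge joins G1 and G2."""
    return any((a in G1 and b in G2) or (a in G2 and b in G1) for a, b in edges)


print(fill_edges(adj, [0, 1, 3, 4, 2]))   # {(2, 4)}: S last, no G1-G2 fill
print(fill_edges(adj, [2, 0, 1, 3, 4]))   # {(1, 3)}: S first, fill crosses
```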