Matching-based preprocessing algorithms to the solution of saddle-point problems in large-scale nonconvex interior-point optimization
Summary
1 Introduction
- In recent years, a large amount of work has been devoted to the problem of solving large symmetric indefinite systems in saddle-point form efficiently.
- For simplicity in the notation the authors assume without loss of generality that all variables have only a lower bound.
- For a detailed survey on solution techniques for large linear saddle-point systems the interested reader should consult [2].
- Unfortunately, until recently, sparse symmetric indefinite factorization methods were relatively inefficient compared to symmetric positive definite solvers [22].
- The matching algorithms used to reorder the matrix before the factorization are discussed in Sect. 4.
2 Optimization algorithm
- Over the past ten years, a number of primal–dual interior-point methods for general nonlinear continuous optimization problems, such as (1.1), have been proposed (see [19] for a comprehensive survey).
- Therefore, at least close to the solution, the algorithms derived from both points of view are very similar, and fast local convergence can be achieved by either.
- To ensure this, SQP-type line search methods usually require that the projection of W̃k (or a modification thereof) onto the null space of Aᵀk is positive definite, or, equivalently, that the matrix in (2.4) is non-singular and has exactly m (the number of constraints) negative eigenvalues [35].
- The perturbation parameter for the Hessian, δx ≥ 0, is chosen to ensure that the matrix in (2.6) has the correct inertia, and that therefore the search direction has the required descent properties.
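The inertia-correction loop sketched in these bullets can be illustrated with a minimal dense example. This is only a sketch under simplifying assumptions: the helper names, the update rule for δx, and the use of dense eigenvalue counts are illustrative, not the actual IPOPT heuristic of [41], and δc is kept at zero.

```python
import numpy as np

def inertia(K, tol=1e-10):
    """Return (n_pos, n_neg, n_zero) eigenvalue counts of a symmetric matrix."""
    eig = np.linalg.eigvalsh(K)
    return (int((eig > tol).sum()),
            int((eig < -tol).sum()),
            int((np.abs(eig) <= tol).sum()))

def corrected_kkt(W, A, delta0=1e-4, factor=10.0, max_tries=20):
    """Perturb the Hessian block by delta_x * I until the saddle-point matrix
    [[W + delta_x I, A], [A.T, 0]] has the required inertia (n, m, 0)."""
    n, m = W.shape[0], A.shape[1]
    delta = 0.0
    for _ in range(max_tries):
        K = np.block([[W + delta * np.eye(n), A],
                      [A.T, np.zeros((m, m))]])
        if inertia(K) == (n, m, 0):
            return K, delta
        # first try the pure Newton system (delta = 0), then increase
        delta = delta0 if delta == 0.0 else delta * factor
    raise RuntimeError("no acceptable perturbation found")

# toy nonconvex problem: W has a negative curvature direction (e2)
# lying in the null space of A.T, so delta_x = 0 gives the wrong inertia
W = np.diag([1.0, -2.0])
A = np.array([[1.0], [0.0]])
K, delta = corrected_kkt(W, A)
print(delta)
```

For this example the loop must raise δx above 2 before the reduced Hessian becomes positive definite, so several "trial" factorizations are needed, mirroring the heuristic described above.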
3 Supernodal factorization methods
- The authors use a Level-3 BLAS supernodal left-looking factorization as described in [27, 33, 38] and discuss two additional pivoting techniques for symmetric indefinite systems that are described below.
- The first pivoting technique has also been used in [38]—the second variant is novel and is used to improve the accuracy.
- The element growth in the diagonal block after k steps is bounded by the factor (2.57)^(k−1).
- In the extreme case, the supernode consists of only one column and the SBK pivoting can degenerate to diagonal pivoting.
- In Sect. 4, the authors will discuss these permutations which are based on combinatorial graph algorithms.
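A much-simplified sketch of the restricted-pivoting idea: pivots are searched only inside the current diagonal panel (the supernode), and pivots that fall below a threshold are statically perturbed rather than delayed. This toy version uses only 1 × 1 pivots on a dense matrix; the actual SBK strategy also selects 2 × 2 Bunch–Kaufman pivots and works inside a Level-3 BLAS supernodal factorization.

```python
import numpy as np

def ldl_restricted(A, panels, eps=1e-8):
    """Dense LDL^T with pivot search restricted to each diagonal panel
    ("supernode") and static perturbation of tiny pivots. 1x1 pivots only."""
    A = A.astype(float).copy()
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    perm = np.arange(n)
    threshold = eps * np.abs(A).max()
    for start, stop in panels:                 # panels partition 0..n-1
        for k in range(start, stop):
            # restricted pivot search: diagonal entries of this panel only
            j = k + int(np.argmax(np.abs(np.diag(A)[k:stop])))
            if j != k:                         # symmetric interchange
                A[[k, j], :] = A[[j, k], :]
                A[:, [k, j]] = A[:, [j, k]]
                L[[k, j], :k] = L[[j, k], :k]
                perm[[k, j]] = perm[[j, k]]
            piv = A[k, k]
            if abs(piv) < threshold:           # static pivot perturbation
                piv = threshold if piv >= 0 else -threshold
            d[k] = piv
            L[k + 1:, k] = A[k + 1:, k] / piv
            A[k + 1:, k + 1:] -= np.outer(L[k + 1:, k], A[k + 1:, k])
    return L, d, perm

A = np.array([[4.0, 1.0, 2.0, 0.0],
              [1.0, 0.5, 0.0, 1.0],
              [2.0, 0.0, 3.0, 1.0],
              [0.0, 1.0, 1.0, 2.0]])
L, d, perm = ldl_restricted(A, panels=[(0, 2), (2, 4)])
err = np.abs(A[np.ix_(perm, perm)] - L @ np.diag(d) @ L.T).max()
print(err)   # tiny: no perturbation was triggered for this matrix
```

When a perturbation does fire, the factorization is that of a nearby matrix, which is why static pivoting is usually paired with iterative refinement of the solution.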
4 Symmetric matchings for a-priori pivoting
- Matching algorithms work on the associated graph representations of the matrices.
- The use of combinatorial techniques in the solution of indefinite linear systems goes back to the early eighties.
- Duff and Pralet [13] showed that the problem of finding symmetric maximum weight matchings in a symmetric matrix is equivalent to finding a non-bipartite matching.
- The matching algorithm also provides, as a byproduct, a row scaling Dr and a column scaling Dc.
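The bipartite matching and its cycle-based symmetrization can be illustrated on a small dense matrix. The weight choice log|a_ij| and the rule of splitting the matching permutation's cycles into 1- and 2-cycles follow the ideas cited above; the function name, the dense formulation, and the use of a generic assignment solver in place of a specialized sparse matching code are illustrative simplifications.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def two_by_two_blocks(A):
    """Maximum-weight bipartite matching on log|a_ij|, symmetrized by
    splitting the cycles of the matching permutation into 1- and 2-cycles;
    the 2-cycles give the candidate 2x2 diagonal pivot blocks."""
    W = np.full(A.shape, -1e9)                  # structural zeros: unmatched
    nz = A != 0
    W[nz] = np.log(np.abs(A[nz]))
    rows, cols = linear_sum_assignment(W, maximize=True)
    sigma = np.empty(len(rows), dtype=int)
    sigma[rows] = cols                          # row i matched to col sigma[i]
    blocks, seen = [], np.zeros(len(sigma), dtype=bool)
    for i in range(len(sigma)):
        cycle = []
        j = i
        while not seen[j]:                      # walk one cycle of sigma
            seen[j] = True
            cycle.append(j)
            j = int(sigma[j])
        while len(cycle) >= 2:                  # pair entries into 2x2 blocks
            blocks.append((cycle.pop(0), cycle.pop(0)))
        if cycle:
            blocks.append((cycle[0],))          # leftover 1x1 pivot
    return blocks

# symmetric matrix with a tiny diagonal entry but a large off-diagonal
# pair (0,1): the matching proposes (0,1) as a 2x2 pivot block
A = np.array([[0.0, 8.0, 0.0],
              [8.0, 1e-6, 0.0],
              [0.0, 0.0, 3.0]])
print(two_by_two_blocks(A))
```

Here the matching pairs rows 0 and 1 through the large off-diagonal entries, so (0, 1) becomes a 2 × 2 pivot block, while index 2 stays a 1 × 1 pivot.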
5 Numerical experiments
- In this section the authors present a number of numerical results to explore the robustness and efficiency of different direct linear solvers within the interior point code IPOPT.
- The solvers used for the comparisons are the PARDISO solver with the different ordering strategies, as well as the Harwell routines MA27 and MA57 (Version 3.0) [10].
- Further to the right the authors can see how far behind an option is (e.g., in about 55% of the problems, PARDISO1 was not more than 2¹ = 2 times worse than the best solver for each individual instance).
- The second test set consists of optimal control problems with discretized PDEs.
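The reading "not more than 2 times worse than the best solver" refers to a Dolan–Moré performance profile [8]: ρ_s(τ) is the fraction of problems on which solver s is within a factor τ of the best solver. A minimal sketch, with a hypothetical timing table:

```python
import numpy as np

def performance_profile(T, tau):
    """T[i, s] = cost of solver s on problem i (np.inf marks a failure).
    Returns, per solver, the fraction of problems solved within a factor
    tau of the best solver on each problem."""
    best = T.min(axis=1, keepdims=True)
    ratios = T / best
    return (ratios <= tau).mean(axis=0)

# hypothetical times for 4 problems x 2 solvers
T = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [2.0, 8.0],
              [np.inf, 1.0]])   # solver 0 fails on the last problem
print(performance_profile(T, 1.0))  # fraction of "wins" per solver
print(performance_profile(T, 2.0))  # within a factor of 2 of the best
```

Reading the profile at τ = 1 gives the fraction of instances each solver wins outright; as τ grows, the curves approach each solver's overall robustness (failures never count).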
5.2.1 Parallel solution
- In their final experiments the authors explore the parallel scale-up of the PARDISO code (only using the PARDISO2 method), solving the discretized PDE optimal control problems for instances with several million variables.
- From each problem class the authors picked the first instance, and used the same options for IPOPT as before.
- The table shows the number of grid points N for each problem, the fraction of runtime for the linear solver PARDISO compared to IPOPT, and the number of processors available to the direct solver.
6 Conclusion
- The authors presented an analysis of direct solution methods based on restricted pivoting and combinatorial preprocessing for the solution of large-scale saddle-point problems stemming from interior-point optimization methods.
- The results indicate that the combination of the preprocessing step based on weighted symmetric matchings and the static factorization with supernodal Bunch–Kaufman pivoting provides sufficient accuracy, both in terms of inertia information (an important requirement for solving nonconvex nonlinear optimization problems) and in terms of the accuracy of the solution.
- Among the main ingredients necessary for this progress, the approximate constraint matching approach is also of importance for the computation of orderings and scalings at a manageable cost.
- The authors thank Andrew Conn for valuable suggestions and Iain Duff for a temporary license from HSL.
Frequently Asked Questions (20)
Q2. What is the basic idea of the symmetric a-priori pivoting?
The basic idea of the symmetric a-priori pivoting is to form 2 × 2 diagonal blocks based on the matched off-diagonal entries of the matrix.
Q3. How do the authors maintain the diagonal block structure?
In order to maintain the diagonal block structure, the authors compress the matrix by merging the structure of the rows and columns belonging to a 2 × 2 diagonal block.
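This compression can be sketched as follows; the function and the group representation are illustrative (production codes work on sparse adjacency structures, and the compressed graph is then passed to a fill-reducing ordering), but the rule is the one stated above: each 1 × 1 or 2 × 2 pivot block becomes one node whose pattern is the union of its members' patterns.

```python
import numpy as np

def compress_pattern(pattern, groups):
    """Merge the rows/columns of each group (a 1x1 or 2x2 pivot block) into
    a single node; the compressed pattern entry (a, b) is set whenever any
    member of group a is connected to any member of group b."""
    m = len(groups)
    C = np.zeros((m, m), dtype=bool)
    for a, ga in enumerate(groups):
        for b, gb in enumerate(groups):
            C[a, b] = pattern[np.ix_(list(ga), list(gb))].any()
    return C

# 4x4 sparsity pattern; rows/cols 0 and 1 form a 2x2 pivot block
pattern = np.array([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 1, 1, 0],
                    [0, 0, 0, 1]], dtype=bool)
C = compress_pattern(pattern, [(0, 1), (2,), (3,)])
print(C.astype(int))
```

The fill-reducing ordering computed on the 3 × 3 compressed graph then keeps each 2 × 2 block together, so the preselected pivots survive the reordering.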
Q4. What is the idea of a bipartite maximum weight matching?
The idea introduced in [11] is to determine a bipartite maximum weight matching of the matrix, and to symmetrize it based on the cycle structure of the matrix.
Q5. What are the vectors for the equality and bound constraints?
The vectors λ and z are the Lagrangian multipliers for the equality and bound constraints, and X and Z denote the diagonal matrices with the vector elements of x and z on the diagonal.
Q6. How do the authors enforce the use of the preselected pivot blocks?
In pivoting variant II the authors identify 2 × 2 pivot blocks and enforce the use of these preselected blocks by adding additional zero elements to the structure of K.
Q7. How did Duff and Pralet find a symmetric maximum weight matching?
Duff and Pralet [13] showed that the problem of finding symmetric maximum weight matchings in a symmetric matrix is equivalent to finding a non-bipartite matching.
Q8. What is the simplest way to detect structural rank deficiencies?
By matching in AT it is possible to detect certain structural rank deficiencies, which can, for example, stem from redundant constraints in the optimization problem formulation.
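This detection can be demonstrated with SciPy's matching-based structural rank routine. Note the caveat: a structural rank computed from the sparsity pattern only catches deficiencies visible in the pattern (as in the example below, where two constraints touch the same single variable); numerically redundant constraints with distinct patterns are not detected this way.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import structural_rank

# A^T for three constraints in three variables; the first two constraints
# involve only variable 0, so no row-perfect matching exists
At = csr_matrix(np.array([[1.0, 0.0, 0.0],
                          [2.0, 0.0, 0.0],
                          [0.0, 3.0, 4.0]]))
print(structural_rank(At))   # 2 < 3: structural rank deficiency
```

In the optimization context, an unmatched constraint row flags a (structurally) redundant constraint before any numerical factorization is attempted.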
Q9. What solvers are used for the comparisons?
The solvers used for the comparisons are the PARDISO solver with the different ordering strategies, as well as the Harwell routines MA27 and MA57 (Version 3.0) [10].
Q10. What is the simplest way to find a global symmetric maximum weight matching?
If the authors focus on finding favorable 2 × 2 diagonal pivots and assume that only diagonal entries are matched in H, they can approximate a global symmetric maximum weight matching by determining a row-perfect maximum weight matching in the constraint block Aᵀ only.
Q11. What is the heuristic for choosing δx and δc in IPOPT?
The heuristic for choosing δx and δc implemented in IPOPT is described in [41]; it usually attempts first to work with δx = δc = 0 to obtain the pure primal–dual Newton direction for fast convergence, and only if the inertia is not correct with this choice are other values tried.
Q12. What is the case for a perfect maximum weight matching?
In their case the authors are interested in a perfect maximum weight matching, i.e., a matching with maximum weight under the condition that all rows and columns are matched.
Q13. What is the Jacobian of the system of Eq. (2.1)?
Using the fact that the Jacobian of the system of Eq. (2.1) is non-singular at a non-degenerate local solution (x∗, λ∗, z∗) of (1.1), it is possible to derive methods that converge superlinearly toward (x∗, λ∗, z∗); see, e.g., [23].
Q14. Why do the authors not compare the runtimes of the different linear solvers?
Since more than 60% of the CUTE problems are solved in less than 0.1 CPU seconds, the authors do not present a comparison of the runtimes.
Q15. Why is there a surge in interest in solving large linear systems in saddle-point form?
One reason for this surge in interest is due to the success of interior-point methods in nonlinear programming, which at their core require the solution of a series of linear systems in saddle-point form.
Q16. What is the heuristic for determining the descent properties of the matrix?
In the context of an SQP-type line search optimization method it is therefore possible to ensure the required descent properties by performing a “trial” factorization for systems of the form (2.4), where the matrix W̃k is modified according to some heuristics, until the inertia of the matrix is correct.
Q17. What is the way to determine the heuristics used in IPOPT?
A closer examination of the results shows that NLPs which are degenerate, so that the iteration matrix is singular in every iteration, are not always handled well, because the heuristics implemented in IPOPT to detect such structural degeneracy rely on the detection of singularity of the matrix by the linear solver.
Q18. What is the fastest linear solver for the discretized PDE?
In their final experiments the authors explore the parallel scale-up of the PARDISO code (only using the PARDISO2 method), solving the discretized PDE optimal control problems for instances with several million variables.
Q19. What is the simplest way to enforce the use of the preselected pivot blocks?
In order to enforce the use of these preselected pivot blocks during the LDLᵀ factorization, the authors merge the 2 × 2 column structure in L in such a way that the initial 2 × 2 pivot structure is maintained in the factor L. This is illustrated in Fig.
Q20. Why are the symmetric maximum weight matchings only approximated?
Since these general matching algorithms are much more expensive than bipartite matching algorithms, the symmetric maximum weight matchings are typically only approximated.