Journal ArticleDOI

Convergence analysis of direct minimization and self-consistent iterations

02 Mar 2021-SIAM Journal on Matrix Analysis and Applications (Society for Industrial and Applied Mathematics)-Vol. 42, Iss: 1, pp 243-274
TL;DR: In this article, the numerical solution of subspace optimization problems, consisting of minimizing a smooth functional over the set of orthogonal projectors of fixed rank, is studied. Two simple representatives of the competing classes of methods, the damped self-consistent field (SCF) iteration and the gradient descent algorithm, are compared and their asymptotic convergence rates analyzed.
Abstract: This article is concerned with the numerical solution of subspace optimization problems, consisting of minimizing a smooth functional over the set of orthogonal projectors of fixed rank. Such probl...

Summary (2 min read)

1. Introduction

  • This problem is of interest in a number of contexts, such as matrix approximation, computer vision [1], and electronic structure theory [12, 28, 39, 40, 45, 56], the latter being the main motivation for this work.
  • In the case when E(P ) = Tr(H0P ) for a fixed symmetric matrix H0, one recovers the classical eigenvalue problem H0φi = εiφi.
  • While the convergence of several SCF and direct minimization algorithms has been analyzed from a mathematical point of view (see e.g. [16, 38, 42, 53, 60, 64] and references therein), the two approaches have not been compared in a systematic way to their knowledge.
  • Rather, in this paper, the authors aim to focus on the very simplest representative of each general strategy (SCF and direct minimization).

2. Optimization on Grassmann manifolds

  • The authors focus in this paper on the case of real symmetric matrices, but the study can be easily extended to complex hermitian matrices.
  • Other models from electronic structure can be considered, such as the discretized Hartree-Fock or Kohn-Sham models, where the energy is of the form E(P ) := Tr(H0P ) + Enl(P ) with H0 being the core Hamiltonian (representing the kinetic energy and the external potential) and Enl a nonlinear energy functional depending on the model (representing the interaction between electrons).
  • Assumption 2.2 (the energy functional E is of class C^2) is true for Hartree-Fock models.
  • To study the first-order optimality conditions, the authors start by recalling some classical results about the geometry of the manifold MN .
  • Remark 2.6 relates the operator Ω∗ to the Liouvillian.

3. Algorithms and analysis of convergence

  • The gradient descent algorithm consists in following the steepest descent direction with a fixed step β at each iteration point.
  • Note that, by the use of the retraction R and Assumption 3.1, the projection step has no influence on the convergence of the algorithm for β small.
  • Since Ω∗ + K∗ is positive definite on TP∗MN, for β small enough the spectral radius r(df(P∗)) of the derivative df(P∗) is less than 1, which concludes the proof.
  • Second, the larger ‖Ω∗‖op = εNb − ε1, the more difficult the convergence.
  • The same analysis holds true for the preconditioned SCF algorithm.

4. Numerical tests

  • The authors present here some numerical experiments to illustrate their theoretical results, explore their limits and investigate the global behavior of the algorithms.
  • The results confirm the theory the authors developed in Section 3.3: the gap has a strong influence on the convergence behavior of the SCF algorithm.
  • For larger values of α, the two alternatives of Lemma 4.1 appear.
  • In the first case, with a = 10.26 Bohrs, the simple SCF method appears to be converging for almost 20 iterations, but then diverges, until the density residual stabilizes at a positive value, as predicted in [14] for the Hartree-Fock model.
  • Note that this phenomenon is reminiscent of that observed in Figure 7, where not all the modes were fully excited, making the convergence faster than expected.

5. Conclusion

  • The authors examined the convergence of two simple representatives in the class of direct minimization and SCF algorithms.
  • Accelerated algorithms are generally found to follow the trend suggested by their theoretical results, although the authors showed that the Anderson-accelerated SCF algorithm was able to converge quickly even in the presence of a single very small gap.
  • In quantum chemistry using Gaussian basis sets to solve the Hartree-Fock model or Kohn-Sham density functional theory using hybrid functionals, the rate-limiting step is often the computation of the Fock matrix H(P ).
  • Direct minimization algorithms effectively merge the two loops of the SCF and linear eigensolver, and should therefore be more efficient.
  • Another advantage of direct minimization algorithms is their robustness: the stepsize can be chosen so as to minimize the energy, unlike the damped SCF algorithm, where choosing an appropriate damping parameter is often done empirically.


HAL Id: hal-02546060
https://hal.inria.fr/hal-02546060v2
Submitted on 27 Oct 2020
Convergence analysis of direct minimization and self-consistent iterations
Eric Cancès, Gaspard Kemlin, Antoine Levitt
To cite this version:
Eric Cancès, Gaspard Kemlin, Antoine Levitt. Convergence analysis of direct minimization and self-consistent iterations. SIAM Journal on Matrix Analysis and Applications, Society for Industrial and Applied Mathematics, 2021, 42 (1), 243-274 (32 p.). doi:10.1137/20M1332864. hal-02546060v2.

CONVERGENCE ANALYSIS OF DIRECT MINIMIZATION AND
SELF-CONSISTENT ITERATIONS
ERIC CANCÈS, GASPARD KEMLIN, ANTOINE LEVITT
Abstract. This article is concerned with the numerical solution of subspace optimization problems,
consisting of minimizing a smooth functional over the set of orthogonal projectors of fixed rank. Such
problems are encountered in particular in electronic structure calculation (Hartree-Fock and Kohn-Sham density functional theory (DFT) models). We compare from a numerical analysis perspective
two simple representatives, the damped self-consistent field (SCF) iterations and the gradient descent
algorithm, of the two classes of methods competing in the field: SCF and direct minimization methods.
We derive asymptotic rates of convergence for these algorithms and analyze their dependence on the
spectral gap and other properties of the problem. Our theoretical results are complemented by numerical
simulations on a variety of examples, from toy models with tunable parameters to realistic Kohn-Sham
computations. We also provide an example of chaotic behavior of the simple SCF iterations for a
nonquadratic functional.
1. Introduction
This paper is concerned with the convergence behavior of algorithms to solve the subspace optimization
problem

    min { E(P) | P ∈ R^{N_b × N_b}, P^2 = P = P^*, Tr(P) = N }    (1.1)

consisting of optimizing a C^2 function E : R^{N_b × N_b} → R over the set of rank-N orthogonal projectors P.
Here P^* denotes the adjoint (transpose) of P. This problem can also be reformulated as

    min { E( Σ_{i=1}^{N} φ_i φ_i^* ) | φ_i ∈ R^{N_b}, φ_i^* φ_j = δ_ij, ∀ i, j ∈ {1, ..., N} },    (1.2)

using an orthonormal basis (φ_i)_{i=1,...,N} for the subspace Ran(P). This problem is of interest in a number
of contexts, such as matrix approximation, computer vision [1], and electronic structure theory [12, 28,
39, 40, 45, 56], the latter being the main motivation for this work.
Let H(P) = ∇E(P). The first-order conditions for problem (1.1) are

    P H(P)(1 − P) = (1 − P)H(P)P = 0.

Up to an appropriate choice for the orthonormal basis (φ_i)_{i=1,...,N} of Ran(P), this yields

    H(P)φ_i = ε_i φ_i,    (1.3)

which reveals an alternative interpretation of this problem as a nonlinear eigenvector problem (to be
distinguished from nonlinear eigenvalue problems of the form A(ε)φ = 0, where A : R → R^{N_b × N_b}). In
the case when E(P) = Tr(H_0 P) for a fixed symmetric matrix H_0, one recovers the classical eigenvalue
problem H_0 φ_i = ε_i φ_i. At a minimizer of (1.1), the (ε_i)_{i=1,...,N} are the lowest eigenvalues of H_0, counting
multiplicities.
Problems of the form (1.1) are found in the Hartree-Fock and Kohn-Sham theories of electronic structure [28, 45], both approximations of the many-body Schrödinger equation. In this context, the φ_i are
(discretized) orbitals, the projector P is the density matrix, and the energy E(P) includes linear contributions from the kinetic and external potential energy of the electrons, as well as nonlinear terms arising
from electron-electron interaction. Another notable problem of this form is the nonlinear Schrödinger
or Gross-Pitaevskii equation for Bose-Einstein condensates [6], where N = 1. In all these cases, the
first-order condition (1.3) is interpreted as a self-consistent or mean-field equation: the particles behave
as independent particles in an effective Hamiltonian H(P) (also known as the Fock matrix) involving the
mean-field they create. In the rest of this paper, we will work on the formulation (1.1) without specifying
E for generality.
The minimization problem (1.1) is compact but nonconvex: there exists at least one minimizer, but
the minimizer might not be unique, and local minima might not be global ones. Solving this optimization
problem is of considerable practical interest, and algorithms for doing so date back to the early days of
quantum mechanics [26]. The first introduced and still most popular approach is the self-consistent field
(SCF) method, which, in its original version [48, 54], works as follows: if P_k is the current iterate of the
algorithm, P_{k+1} is found by solving (1.3) for the fixed matrix H(P_k):

    H(P_k) φ_i^k = ε_i^k φ_i^k,    (φ_i^k)^* φ_j^k = δ_ij,

with the ε_i^k sorted in non-decreasing order, and building P_{k+1} as

    P_{k+1} = Σ_{i=1}^{N} φ_i^k (φ_i^k)^*.

This algorithm assumes the Aufbau property, which is that at a minimum P∗ we have P∗ = Σ_{i=1}^{N} φ_i φ_i^*
with φ_i a system of orthogonal eigenvectors associated with the lowest N eigenvalues of H(P∗). This
property holds for the (spin-unconstrained) Hartree-Fock model [4] and the Gross-Pitaevskii models
without magnetic field [10], usually holds for molecular systems in the Kohn-Sham model, but does not
hold in general for Gross-Pitaevskii models with strong magnetic fields.
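To make the iteration concrete, here is a minimal NumPy sketch of this plain (undamped) SCF loop: diagonalize H(P_k) and occupy the N lowest eigenvectors (Aufbau). It is only an illustration under simplifying assumptions; hamiltonian(P) is a placeholder for the model's map P ↦ H(P) (for a linear energy E(P) = Tr(H_0 P) it simply returns the fixed matrix H_0), and the stopping test and tolerance are arbitrary choices, not those used in the paper.

    import numpy as np

    def scf(hamiltonian, P0, N, maxiter=100, tol=1e-10):
        """Plain SCF: diagonalize H(P_k) and occupy the N lowest eigenvectors (Aufbau)."""
        P = P0
        for _ in range(maxiter):
            H = hamiltonian(P)                  # effective Hamiltonian H(P_k)
            _eps, phi = np.linalg.eigh(H)       # eigenvalues in ascending order
            P_new = phi[:, :N] @ phi[:, :N].T   # P_{k+1} = sum_i phi_i^k (phi_i^k)^T
            if np.linalg.norm(P_new - P) < tol:
                return P_new
            P = P_new
        return P

    # With a linear model E(P) = Tr(H0 P), one diagonalization already gives the minimizer.
    Nb, N = 6, 2
    rng = np.random.default_rng(0)
    H0 = rng.standard_normal((Nb, Nb)); H0 = (H0 + H0.T) / 2
    P0 = np.diag([1.0] * N + [0.0] * (Nb - N))
    P = scf(lambda _: H0, P0, N)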
This basic procedure converges for systems where the nonlinearity is weak, but fails to converge
otherwise (see [14] for a comprehensive mathematical analysis of this behavior when the functional E is
a sum of a linear and a quadratic term in P , which is the case for the Hartree-Fock model). A solution
is to damp this procedure, and mix the iterates to accelerate convergence. This gives rise to a variety
of SCF algorithms, among which Broyden-like and Anderson-like mixing algorithms [33, 44, 52, 57], the
Direct Inversion in the Iterative Subspace (DIIS) algorithm [36, 50, 51], the Optimal Damping Algorithm
[13] (ODA), and the Energy-DIIS (EDIIS) algorithm combining the latter two approaches [37].
A second class of algorithms solves the minimization problem (1.1) directly. The minimization set
{P ∈ R^{N_b × N_b}, P^2 = P^* = P, Tr P = N} is diffeomorphic to the Grassmann manifold of the
N-dimensional vector subspaces of R^{N_b}. This set is naturally equipped with the structure of a Riemannian
manifold, and this allows the use of Riemannian optimization algorithms [1, 21]. Direct minimization
algorithms are preferred for the Gross-Pitaevskii model with magnetic fields [3, 18, 27, 29], for which the
Aufbau principle is not satisfied in general. Gradient-type [2, 17, 49, 61, 66], Newton-type [5, 15, 68], and
trust-region methods have also been designed to solve (1.1) for larger values of N. At the time of writing,
direct minimization algorithms are less popular than SCF algorithms in electronic structure calculation,
where N can be very large, but it is not clear whether this is for sound scientific reasons or because SCF
algorithms have been implemented and optimized for decades in the main production codes, which has
not been the case for direct minimization algorithms.
While the convergence of several SCF and direct minimization algorithms has been analyzed from a
mathematical point of view (see e.g. [16, 38, 42, 53, 60, 64] and references therein), the two approaches
have not been compared in a systematic way to our knowledge. The purpose of this paper is to contribute
to fill this gap, by focusing on very simple representatives of each class, namely the damped SCF iteration
and the gradient descent. We emphasize that neither of these two algorithms is a practical choice as
is. The SCF iteration should be accelerated (for instance using the Anderson acceleration technique),
and the gradient information in direct minimization methods should rather be used as part of a quasi-
Newton method (such as the limited-memory BFGS algorithm [1]). Depending on the exact problem
at hand, all these methods should be preconditioned to avoid issues related to small mesh sizes (which
leads to a divergence of the kinetic energy term) and/or large computational domains (which can lead
to a divergence of the Coulomb energy, or the confining potential). We refer to [63] for a recent review
in the context of the Kohn-Sham equations for solids. Rather, in this paper, we aim to focus on the
very simplest representative of each general strategy (SCF and direct minimization). The investigation
of these two basic algorithms is informative on the strengths and weaknesses of the two classes, and is a
first step in the analysis of more complex methods.
The paper is organized as follows. In Section 2, we recall some results about optimization on Grass-
mann manifolds, in particular the first and second order optimality conditions, and prove preparatory
lemmas. In Section 3, we present the two algorithms that are in the scope of this paper: a fixed-step
gradient descent and a damped SCF algorithm. We prove their local convergence as long as the step is
small enough and we derive convergence rates. We find that the convergence rates depend on the spectral
radius of operators (acting on R^{N_b × N_b}) of the form 1 − βJ, with β the fixed step and J = Ω∗ + K∗ for the
gradient descent, J = 1 + Ω∗^{−1} K∗ for the SCF algorithm, where the operators Ω∗ and K∗ are specified
in the next section. Let us just mention at this stage that the lowest eigenvalue of Ω∗ is equal to the
spectral gap between the N-th and (N+1)-st eigenvalues of H(P∗), allowing us to analyze the convergence
rates of the algorithms in terms of natural quantities of the problem. This also shows that the damped
SCF algorithm can be seen as a matrix splitting of the fixed-step gradient descent algorithm.
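For comparison with the SCF sketch above, here is a minimal NumPy sketch of a fixed-step gradient descent on the set of rank-N projectors. It is an illustration under stated assumptions, not the paper's exact implementation: grad_E(P) is a placeholder for ∇E(P), the tangent-space projection is the one given later in (2.2), and the retraction chosen here (projection onto a nearby rank-N projector through an eigendecomposition) is one common choice among several.

    import numpy as np

    def tangent_projection(P, G):
        """Pi_P(G) = P G (1 - P) + (1 - P) G P  (projection on the tangent space at P)."""
        Id = np.eye(len(P))
        return P @ G @ (Id - P) + (Id - P) @ G @ P

    def retract(X, N):
        """Map a symmetric matrix to a rank-N orthogonal projector (spectral retraction)."""
        _, V = np.linalg.eigh(X)
        occ = V[:, -N:]                 # eigenvectors of the N largest eigenvalues
        return occ @ occ.T

    def gradient_descent(grad_E, P0, N, beta=0.1, maxiter=1000, tol=1e-10):
        """Fixed step: P_{k+1} = R(P_k - beta * Pi_{P_k}(grad E(P_k)))."""
        P = P0
        for _ in range(maxiter):
            D = tangent_projection(P, grad_E(P))
            P_new = retract(P - beta * D, N)
            if np.linalg.norm(P_new - P) < tol:
                return P_new
            P = P_new
        return P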
In Section 4, we compare the two algorithms on several test problems. First, we focus on a toy model
for which we can easily tune the gap and observe some fundamental differences between SCF and direct
minimization algorithms, in agreement with the mathematical results established in Section 3. We also
provide an example of chaos in SCF iterations, complementing the results of [14, 38] in the case of a
non-quadratic objective functional E. Then, we analyse a 1D Gross-Pitaevskii model (N = 1) and its
fermionic counterpart for N = 2, for which we investigate the behavior of the algorithms when the gap
closes. We conclude with an example from electronic structure calculation: a Silicon crystal, in the
framework of Kohn-Sham DFT, where we show in particular that accelerated SCF algorithms are less
sensitive to small gaps than the simple damped SCF. Finally, in Section 5 we draw conclusions and
outline perspectives for future work.
2. Optimization on Grassmann manifolds
We focus in this paper on the case of real symmetric matrices, but the study can be easily extended to
complex hermitian matrices. Let H := R^{N_b × N_b}_sym be the vector space of N_b × N_b real symmetric matrices
endowed with the Frobenius inner product ⟨A, B⟩_F := Tr(AB). Let

    M := { P ∈ H | P^2 = P }   and   M_N := { P ∈ H | P^2 = P, Tr(P) = N }.

From a geometrical point of view, M is a compact subset of H with N_b + 1 connected components
M_N, N = 0, ..., N_b, each of them being characterized by the value of Tr(P), namely the rank of the
orthogonal projector P, and being diffeomorphic to the Grassmann manifold Grass(N, N_b) [1]. From
now on, we fix the number of electrons N and we seek the local minimizers of the problem

    min_{P ∈ M_N} E(P),    (2.1)

where E : H → R is a discretized energy functional, for which some examples are given below.
Example 2.1. As an example, we study a discrete Gross-Pitaevskii model in Section 4.4. Other models
from electronic structure can be considered, such as the discretized Hartree-Fock or Kohn-Sham models,
where the energy is of the form

    E(P) := Tr(H_0 P) + E_nl(P)

with H_0 being the core Hamiltonian (representing the kinetic energy and the external potential) and E_nl
a nonlinear energy functional depending on the model (representing the interaction between electrons).
For instance, for the Hartree-Fock model,

    E_nl(P) := (1/2) Tr(G(P) P)   where   (G(P))_ij := Σ_{k,l=1}^{N_b} A_ijkl P_kl,   i, j = 1, ..., N_b,

with A a symmetric tensor of order 4. For more details on these models or electronic structure in general,
we refer to [12, 40, 56].
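To make the structure of this example concrete, here is a small NumPy sketch of such a Hartree-Fock-like energy and of its gradient, the Fock matrix H(P) = H_0 + G(P). The tensor A and the matrix H0 below are random toy data (symmetrized so that the gradient formula holds); they are assumptions for illustration, not a physical model.

    import numpy as np

    Nb = 4
    rng = np.random.default_rng(1)

    # Toy core Hamiltonian and symmetric fourth-order tensor (assumed data).
    H0 = rng.standard_normal((Nb, Nb)); H0 = (H0 + H0.T) / 2
    A = rng.standard_normal((Nb, Nb, Nb, Nb))
    A = (A + A.transpose(1, 0, 2, 3)) / 2   # symmetry in (i, j)
    A = (A + A.transpose(0, 1, 3, 2)) / 2   # symmetry in (k, l)
    A = (A + A.transpose(2, 3, 0, 1)) / 2   # symmetry under (i, j) <-> (k, l)

    def G(P):
        # (G(P))_ij = sum_{k,l} A_ijkl P_kl
        return np.einsum("ijkl,kl->ij", A, P)

    def energy(P):
        # E(P) = Tr(H0 P) + (1/2) Tr(G(P) P)
        return np.trace(H0 @ P) + 0.5 * np.trace(G(P) @ P)

    def fock(P):
        # H(P) = grad E(P) = H0 + G(P), thanks to the symmetries of A
        return H0 + G(P)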
In plane-wave, finite differences, finite elements or wavelets electronic structure calculation codes, the
size N_b of the discretized space is in practice much larger than the number N of electrons. Therefore, it
is not practical to store and manipulate the (dense) matrix P. Instead, algorithms work on the orbitals
(φ_i)_{i=1,...,N} introduced in (1.3). The density matrix P is then recovered as

    P = Σ_{i=1}^{N} φ_i φ_i^*.

All the results in this article are presented in the density matrix framework. However, the algorithms we
study can be expressed in a way that avoids ever forming the density matrix. We refer to [63] for details.
We will need two assumptions for our results.
Assumption 2.2. The energy functional E : H → R is of class C^2 (twice continuously differentiable).

Assumption 2.2 is true for Hartree-Fock models. For Kohn-Sham models, it is true when the density
ρ = Σ_{i=1}^{N} |φ_i|^2 is uniformly bounded away from zero, which is the case for instance in condensed phase
systems. Most of the results presented in this article are local in nature, and therefore this assumption
can be relaxed to local regularity.
Assumption 2.3. P∗ ∈ M_N is a nondegenerate local minimizer of (2.1) in the sense that there exists
some η > 0 such that, for P ∈ M_N in a neighborhood of P∗, we have

    E(P) ⩾ E(P∗) + η ‖P − P∗‖_F^2.

It is very hard in most practical situations to check this assumption, but it seems to be verified in
practice. Notable exceptions are systems invariant with respect to continuous symmetry groups, in which
case E(P) = E(P∗) for all P in the orbit of P∗ along the symmetry group. In this case, the assumption
cannot be true, and ‖P − P∗‖_F must be replaced by the distance from P to the orbit of P∗. Our results
can be extended to this case up to quotienting H by the symmetry group.
Throughout the paper, we will use the following notation:
  • H(P) := ∇E(P) is the gradient, and H∗ := H(P∗);
  • K(P) := Π_P ∇^2 E(P) Π_P is the Hessian projected onto the tangent space at P, and K∗ := K(P∗)
    (the projection Π_P is defined below in Proposition 2.4).
2.1. First-order condition. To study the first-order optimality conditions, we start by recalling some
classical results about the geometry of the manifold M_N.

Proposition 2.4. M_N is a smooth real manifold and its tangent space T_P M_N at P ∈ M_N is given by

    T_P M_N = { X ∈ H | PX + XP = X, Tr(X) = 0 } = { X ∈ H | PXP = (1 − P)X(1 − P) = 0 }.

The orthogonal projection Π_P on T_P M_N for the Frobenius inner product is

    ∀ X ∈ H,   Π_P(X) = PX(1 − P) + (1 − P)XP = [P, [P, X]],    (2.2)

where [A, B] := AB − BA.
This classical result is proved in e.g. [1, Section 3.4]. Using the fact that R^{N_b} = Ran(P) ⊕ Ran(1 − P)
and the induced decomposition of P ∈ M_N and X ∈ H as

    P = [ I_N  0 ; 0  0 ],    X = [ (X)_oo  (X)_ov ; (X)_vo  (X)_vv ],    (2.3)

the projection Π_P is given by

    Π_P(X) = [ 0  (X)_ov ; (X)_vo  0 ].

Here the subscript "o" (resp. "v") stands for occupied (resp. virtual). The first-order optimality condition
at P∗ is Π_{P∗}(H∗) = 0, which can be formulated as follows:

    First-order optimality condition:   P∗ H∗ (1 − P∗) = (1 − P∗) H∗ P∗ = 0.    (2.4)

Note that this condition can be rewritten as [H∗, P∗] = 0, showing that H∗ and P∗ can be codiagonalized.
Let (φ_k)_{1⩽k⩽N_b} be an orthonormal basis of eigenvectors of H∗ associated with the eigenvalues (ε_k)_{1⩽k⩽N_b}
sorted in ascending order. Then P∗ = Σ_{i∈I} φ_i φ_i^*, where I ⊂ {1, ..., N_b}, |I| = N, is the set of occupied
orbitals. The minimizer P∗ is said to satisfy
  • the Aufbau principle if I = {1, ..., N};
  • the strong Aufbau principle if I = {1, ..., N} and if in addition ε_N < ε_{N+1}, in which case
    P∗ = Σ_{i=1}^{N} φ_i φ_i^*.
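As a quick numerical illustration of the first-order condition (2.4) and of the Aufbau property defined above, the sketch below builds a toy symmetric matrix playing the role of H∗, takes P∗ as the projector on its N lowest eigenvectors, and checks that P∗H∗(1 − P∗) = 0, that [H∗, P∗] = 0, and that the occupied set is I = {1, ..., N}. The data are random and the helper names are ours, not the paper's; degenerate eigenvalues are not handled.

    import numpy as np

    Nb, N = 6, 2
    rng = np.random.default_rng(3)

    # Toy symmetric "Hamiltonian" H_star and the projector on its N lowest eigenvectors.
    H_star = rng.standard_normal((Nb, Nb)); H_star = (H_star + H_star.T) / 2
    eps, phi = np.linalg.eigh(H_star)           # ascending eigenvalues
    P_star = phi[:, :N] @ phi[:, :N].T

    Id = np.eye(Nb)
    # First-order condition (2.4): P* H* (1 - P*) = (1 - P*) H* P* = 0, i.e. [H*, P*] = 0.
    print(np.allclose(P_star @ H_star @ (Id - P_star), 0))
    print(np.allclose(H_star @ P_star - P_star @ H_star, 0))

    # Occupied set and (strong) Aufbau property.
    occ = np.array([phi[:, k] @ P_star @ phi[:, k] for k in range(Nb)])
    occupied = np.where(occ > 0.5)[0]
    print(set(occupied) == set(range(N)))       # Aufbau: I = {1, ..., N} (0-based here)
    print(eps[N] > eps[N - 1])                  # strong Aufbau: eps_N < eps_{N+1}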

Citations
Journal ArticleDOI
07 May 2021
TL;DR: In this paper, the authors present a high-throughput screening approach to identify promising novel materials for targeted follow-up investigation using density-functional theory (DFT) codes; DFT is a widely used method for simulating the quantum-chemical behavior of electrons in matter.
Abstract: Density-functional theory (DFT) is a widespread method for simulating the quantum-chemical behaviour of electrons in matter. It provides a first-principles description of many optical, mechanical and chemical properties at an acceptable computational cost [16, 2, 3]. For a wide range of systems the obtained predictions are accurate and shortcomings of the theory are by now well understood [2, 3]. The desire to tackle even bigger systems and more involved materials, however, keeps posing novel challenges that require methods to constantly improve. One example is so-called high-throughput screening approaches, which are becoming prominent in recent years. In these techniques one wishes to systematically scan over huge design spaces of compounds in order to identify promising novel materials for targeted follow-up investigation. This has already led to many success stories [14], such as the discovery of novel earth-abundant semiconductors [11], novel light-absorbing materials [20], electrocatalysts [8], materials for hydrogen storage [13] or for Li-ion batteries [1]. Keeping in mind the large range of physics that needs to be covered in these studies as well as the typical number of calculations (up to the order of millions), a bottleneck in these studies is the reliability and performance of the underlying DFT codes.

25 citations

Journal ArticleDOI
TL;DR: In this paper, the authors prove the existence of the reduced Hartree-Fock equations at finite temperature for a periodic crystal with a small defect, and show total screening of the defect charge by the electrons.
Abstract: We prove the existence of solutions of the reduced Hartree-Fock equations at finite temperature for a periodic crystal with a small defect, and show total screening of the defect charge by the electrons. We also show the convergence of the damped self-consistent field iteration using Kerker preconditioning to remove charge sloshing. As a crucial step of the proof, we define and study the properties of the dielectric operator.

13 citations

Posted Content
TL;DR: Using the Łojasiewicz inequality, it is shown that a Sobolev gradient descent method with adaptive inner product converges exponentially fast to the ground state for the Gross-Pitaevskii eigenproblem.
Abstract: We propose to use the Łojasiewicz inequality as a general tool for analyzing the convergence rate of gradient descent on a Hilbert manifold, without resorting to the continuous gradient flow. Using this tool, we show that a Sobolev gradient descent method with adaptive inner product converges exponentially fast to the ground state for the Gross-Pitaevskii eigenproblem. This method can be extended to a class of general high-degree optimizations or nonlinear eigenproblems under certain conditions. We demonstrate this generalization by several examples, in particular a nonlinear Schrodinger eigenproblem with an extra high-order interaction term. Numerical experiments are presented for these problems.

10 citations

Journal ArticleDOI
22 Jul 2022
TL;DR: In this article , an extension to the atomic cluster expansion (ACE) descriptor was proposed to represent Hamiltonian matrix blocks that transform equivariantly with respect to the full rotation group.
Abstract: Abstract We propose a scheme to construct predictive models for Hamiltonian matrices in atomic orbital representation from ab initio data as a function of atomic and bond environments. The scheme goes beyond conventional tight binding descriptions as it represents the ab initio model to full order, rather than in two-centre or three-centre approximations. We achieve this by introducing an extension to the atomic cluster expansion (ACE) descriptor that represents Hamiltonian matrix blocks that transform equivariantly with respect to the full rotation group. The approach produces analytical linear models for the Hamiltonian and overlap matrices. Through an application to aluminium, we demonstrate that it is possible to train models from a handful of structures computed with density functional theory, and apply them to produce accurate predictions for the electronic structure. The model generalises well and is able to predict defects accurately from only bulk training data.

8 citations

Journal ArticleDOI
TL;DR: In this article , it was shown that the local density of states (LDOS) of a wide class of tight-binding models has a weak body-order expansion, and that the resulting bodyorder expansion for analytic observables such as the electron density or the energy has an exponential rate of convergence both at finite Fermi-temperature as well as for insulators at zero-Fermi temperature.
Abstract: We show that the local density of states (LDOS) of a wide class of tight-binding models has a weak body-order expansion. Specifically, we prove that the resulting body-order expansion for analytic observables such as the electron density or the energy has an exponential rate of convergence both at finite Fermi-temperature as well as for insulators at zero Fermi-temperature. We discuss potential consequences of this observation for modelling the potential energy landscape, as well as for solving the electronic structure problem.

4 citations

References
Journal ArticleDOI
TL;DR: An efficient scheme for calculating the Kohn-Sham ground state of metallic systems using pseudopotentials and a plane-wave basis set is presented and the application of Pulay's DIIS method to the iterative diagonalization of large matrices will be discussed.
Abstract: We present an efficient scheme for calculating the Kohn-Sham ground state of metallic systems using pseudopotentials and a plane-wave basis set. In the first part the application of Pulay's DIIS method (direct inversion in the iterative subspace) to the iterative diagonalization of large matrices will be discussed. Our approach is stable, reliable, and minimizes the number of order N_atoms^3 operations. In the second part, we will discuss an efficient mixing scheme also based on Pulay's scheme. A special "metric" and a special "preconditioning" optimized for a plane-wave basis set will be introduced. Scaling of the method will be discussed in detail for non-self-consistent and self-consistent calculations. It will be shown that the number of iterations required to obtain a specific precision is almost independent of the system size. Altogether an order N_atoms^2 scaling is found for systems containing up to 1000 electrons. If we take into account that the number of k points can be decreased linearly with the system size, the overall scaling can approach N_atoms. We have implemented these algorithms within a powerful package called VASP (Vienna ab initio simulation package). The program and the techniques have been used successfully for a large number of different systems (liquid and amorphous semiconductors, liquid simple and transition metals, metallic and semiconducting surfaces, phonons in simple metals, transition metals, and semiconductors) and turned out to be very reliable. © 1996 The American Physical Society.

81,985 citations

Journal ArticleDOI
TL;DR: In this paper, the Hartree and Hartree-Fock equations are applied to a uniform electron gas, where the exchange and correlation portions of the chemical potential of the gas are used as additional effective potentials.
Abstract: From a theory of Hohenberg and Kohn, approximation methods for treating an inhomogeneous system of interacting electrons are developed. These methods are exact for systems of slowly varying or high density. For the ground state, they lead to self-consistent equations analogous to the Hartree and Hartree-Fock equations, respectively. In these equations the exchange and correlation portions of the chemical potential of a uniform electron gas appear as additional effective potentials. (The exchange portion of our effective potential differs from that due to Slater by a factor of 2/3.) Electronic systems at finite temperatures and in magnetic fields are also treated by similar methods. An appendix deals with a further correction for systems with short-wavelength density oscillations.

47,477 citations

Journal ArticleDOI
TL;DR: The pseudopotential is of an analytic form that gives optimal efficiency in numerical calculations using plane waves as a basis set and is separable and has optimal decay properties in both real and Fourier space.
Abstract: We present pseudopotential coefficients for the first two rows of the Periodic Table. The pseudopotential is of an analytic form that gives optimal efficiency in numerical calculations using plane waves as a basis set. At most, seven coefficients are necessary to specify its analytic form. It is separable and has optimal decay properties in both real and Fourier space. Because of this property, the application of the nonlocal part of the pseudopotential to a wave function can be done efficiently on a grid in real space. Real space integration is much faster for large systems than ordinary multiplication in Fourier space, since it shows only quadratic scaling with respect to the size of the system. We systematically verify the high accuracy of these pseudopotentials by extensive atomic and molecular test calculations. © 1996 The American Physical Society.

5,009 citations

Journal ArticleDOI

4,691 citations

Journal ArticleDOI
TL;DR: The theory proposed here provides a taxonomy for numerical linear algebra algorithms that provide a top level mathematical view of previously unrelated algorithms and developers of new algorithms and perturbation theories will benefit from the theory.
Abstract: In this paper we develop new Newton and conjugate gradient algorithms on the Grassmann and Stiefel manifolds. These manifolds represent the constraints that arise in such areas as the symmetric eigenvalue problem, nonlinear eigenvalue problems, electronic structures computations, and signal processing. In addition to the new algorithms, we show how the geometrical framework gives penetrating new insights allowing us to create, understand, and compare algorithms. The theory proposed here provides a taxonomy for numerical linear algebra algorithms that provide a top level mathematical view of previously unrelated algorithms. It is our hope that developers of new algorithms and perturbation theories will benefit from the theory, methods, and examples in this paper.

2,686 citations

Frequently Asked Questions (14)
Q1. What contributions have the authors mentioned in the paper "Convergence analysis of direct minimization and self-consistent iterations" ?

This article is concerned with the numerical solution of subspace optimization problems, consisting of minimizing a smooth functional over the set of orthogonal projectors of fixed rank. The authors compare from a numerical analysis perspective two simple representatives, the damped self-consistent field ( SCF ) iterations and the gradient descent algorithm, of the two classes of methods competing in the field: SCF and direct minimization methods. The authors derive asymptotic rates of convergence for these algorithms and analyze their dependence on the spectral gap and other properties of the problem. The authors also provide an example of chaotic behavior of the simple SCF iterations for a nonquadratic functional. 

In particular, this is necessary to extend the convergence theory presented in this paper to infinite-dimensional settings. 

For instance, for β = 1 at a = 10.26 Bohrs, the method appears to be converging for almost 20 iterations, up to a reduction in residual of a factor 10^−8.

Solving the linear eigenproblem is then done using iterative block eigensolvers, which can be understood as specialized direct minimization algorithms in the case of a linear energy functional E(P ) = Tr(H0P ). 

In quantum chemistry using Gaussian basis sets to solve the Hartree-Fock model or Kohn-Sham density functional theory using hybrid functionals, the rate-limiting step is often the computation of the Fock matrix H(P ). 

In condensed-matter physics using plane-wave basis sets to solve the Kohn-Sham density functional theory with local or semilocal functionals, the matrices P and H are not stored explicitly. 

Since the second smallest eigenvalue of the matrix h is strictly lower than the third one, for α small enough, the unique minimizer P∗ of (4.8) is onM2 and satisfies the strong Aufbau principle, and both the gradient descent and SCF algorithm locally converge to P∗. 

Another interest of direct minimization algorithms is their robustness, as the choice of a stepsize can be made in order to minimize the energy, unlike the damped SCF algorithm where choosing an appropriate damping parameter is often done empirically. 

The cause of this effect appears to be that the divergent modes break the natural inversion symmetry of the crystal in this particular case: the authors have checked that the divergence occurs much sooner if the authors break this symmetry by perturbing the positions of the atoms around their symmetric positions (at 9 iterations by perturbing the position of one atom by 10%). 

In the first case, with a = 10.26 Bohrs, the simple (undamped) SCF method appears to be converging for almost 20 iterations, but then diverges, until the density residual stabilizes at a positive value, as predicted in [14] for the Hartree-Fock model. 

For the damped SCF algorithm with the ground state of the core Hamiltonian as starting point, surprisingly, the authors observe an asymptotic convergence rate slightly faster than that expected from the spectral radius of the Jacobian matrix 1− βJSCF. 

The authors plot the convergence of the density residual ‖ρΦ(Pn) − ρPn‖2 as a function of the iterations for three values of a, with decreasing gaps. 

Thus the authors expect convergence for β < 4ε^2, and therefore a critical εc of ≈ 0.158 for β = 10^−1 and 0.0158 for β = 10^−3, with a number of iterations proportional to 1/(ε − εc) when ε > εc.
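(Indeed, setting β = 4εc^2 and solving for εc gives εc = √β/2, i.e. εc ≈ 0.158 for β = 10^−1 and εc ≈ 0.0158 for β = 10^−3.)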

In practice, this issue is solved by preconditioning (see Remark 3.7).
• for the damped SCF algorithm, a naive bound would be κ(J_SCF) ⩽ ‖Ω∗‖op (1 + ν^−1 ‖K∗‖op) / η.