Introduction to Numerical Analysis 2

Part b–Lecture 3

Preconditioned conjugate gradient method

Unfortunately, the basic CG method often converges too slowly to be practical for real problems.

We saw this in Lecture 2, Exercise 3 when taking \gamma = 2, an important case when solving differential equations with the Laplacian operator. When taking a larger \gamma, such as \gamma = 3, the convergence was much faster.

The reason is that the speed of convergence of the CG method for a system Ax = b depends on the condition number of the matrix A, that is, the ratio between the largest and the smallest eigenvalue of A (see the convergence theorem for the CG method on Wikipedia):

The larger the condition number, the slower the convergence speed (the more iterations the CG method needs).
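Quantitatively, if \kappa denotes the condition number and x the exact solution, the convergence theorem gives the bound

\|x^{(k)} - x\|_A \leq 2 \left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{k} \|x^{(0)} - x\|_A, \qquad \text{where } \|v\|_A := \sqrt{v^T A v}.

The per-iteration factor (\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1) is about 0.997 for \kappa = 4 \times 10^5, but only about 0.38 for \kappa = 5; these are exactly the two condition numbers appearing in Example 1 below.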

Example 1. It is not difficult to check that the n\times n matrix A with entries A_{ij} := \begin{cases} \gamma, & i =j,\\ -1, & |i - j| = 1,\\ 0 & \text{otherwise}, \end{cases} has eigenvectors (固有ベクトル) of the form w^{(\ell)}_i = \sin(\tfrac{\pi\ell}{n + 1} i), \qquad i = 1, \ldots, n, with a corresponding eigenvalue (固有値) \lambda_\ell = \gamma - 2 \cos(\tfrac{\pi \ell}{n + 1}), for \ell = 1, \ldots, n.
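As a quick sanity check, the following standalone snippet (separate from the lecture's main program below) builds A for a small n and prints, for each \ell, the eigenvalue \lambda_\ell and the maximum entry of the residual A w^{(\ell)} - \lambda_\ell w^{(\ell)}, which should vanish up to rounding:

implicit none
integer, parameter :: n = 8
real, parameter :: pi = 4 * atan(1.), gam = 2
real a(n, n), w(n), lam
integer i, l

! build the tridiagonal matrix of Example 1
a = 0
do i = 1, n
    a(i, i) = gam
end do
do i = 1, n - 1
    a(i, i + 1) = -1
    a(i + 1, i) = -1
end do

! for each l, print the eigenvalue and the residual max |A w - lambda w|
do l = 1, n
    do i = 1, n
        w(i) = sin(pi * l / (n + 1) * i)
    end do
    lam = gam - 2 * cos(pi * l / (n + 1))
    print *, l, lam, maxval(abs(matmul(a, w) - lam * w))
end do
end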

The largest eigenvalue is \lambda_n = \gamma - 2 \cos(\tfrac{\pi n}{n + 1}) = \gamma + 2 \cos(\tfrac{\pi}{n + 1}), and the smallest one is \lambda_1 = \gamma - 2 \cos(\tfrac{\pi}{n + 1}). Their ratio, the condition number, is \kappa = \frac{\lambda_n}{\lambda_1} = \frac{\gamma + 2 \cos(\tfrac{\pi}{n + 1})}{\gamma - 2 \cos(\tfrac{\pi}{n + 1})}.

If n = 1000, we see that the condition number for \gamma = 2 is \kappa \approx 4\times 10^5, while for \gamma = 3 it is only \kappa \approx 5.
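We can check these values directly from the formula with a tiny standalone program (in double precision, since for \gamma = 2 the denominator \gamma - 2 \cos(\tfrac{\pi}{n + 1}) is of order 10^{-5}):

implicit none
real(8), parameter :: pi = 4 * atan(1d0)
real(8) gam, c, kappa
integer n

read *, n, gam                  ! e.g. "1000 2" or "1000 3"
c = 2 * cos(pi / (n + 1))
kappa = (gam + c) / (gam - c)   ! condition number from Example 1
print *, 'condition number:', kappa
end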

In practice, a preconditioned conjugate gradient method (前処理共役勾配法) is therefore usually used to speed up the convergence dramatically. We modify the linear system so that its matrix has a smaller condition number while its solution remains the same as that of the original system.

Here is the preconditioned conjugate gradient algorithm:
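In the notation of Lecture 2, with (\cdot\,, \cdot) denoting the Euclidean inner product, the iteration reads as follows (this listing is written to match the Fortran implementation later in this lecture):

x^{(0)} := 0, \quad r^{(0)} := b, \quad z^{(0)} := M^{-1} r^{(0)}, \quad p^{(0)} := z^{(0)}, \quad \rho_0 := (r^{(0)}, z^{(0)})

For k = 0, 1, 2, \ldots, repeat:
\alpha_k := \rho_k / (A p^{(k)}, p^{(k)})
x^{(k+1)} := x^{(k)} + \alpha_k p^{(k)}
r^{(k+1)} := r^{(k)} - \alpha_k A p^{(k)}
z^{(k+1)} := M^{-1} r^{(k+1)}
\rho_{k+1} := (r^{(k+1)}, z^{(k+1)})
stop if \sqrt{(r^{(k+1)}, r^{(k+1)})} < \varepsilon
p^{(k+1)} := z^{(k+1)} + (\rho_{k+1} / \rho_k)\, p^{(k)}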

Note the new n-dimensional vector z and the change in computing \rho and p. The n\times n matrix M is a preconditioner matrix (前処理行列).

It has to be of the form M = BB^T for some invertible matrix B, and we need to be able to compute M^{-1} r efficiently. Here B^T is the transpose (転置行列) of B.

The trick is to choose a suitable matrix B so that the matrix \hat A := B^{-1} A (B^{-1})^T has a smaller condition number than A. The preconditioned CG algorithm is basically the CG algorithm applied to the system \hat A \hat x = B^{-1} b, whose solution \hat x satisfies \hat x = B^T x, where x is the solution of the original system Ax = b.
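Indeed, substituting \hat x = B^T x gives

\hat A \hat x = B^{-1} A (B^{-1})^T B^T x = B^{-1} A x = B^{-1} b,

and \hat A is again symmetric and positive definite, so the CG method applies to the transformed system.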

Finding a suitable M is usually rather difficult and depends on A; this is still an active area of research.

SSOR preconditioner

One possible simple preconditioner matrix is provided by the SSOR method. If we write A = D + L + L^T, where D is a diagonal matrix and L is lower triangular with zeros on the main diagonal, M is given as M = \tfrac{\omega}{2 - \omega}(\tfrac 1\omega D + L) D^{-1} (\tfrac 1 \omega D + L)^T, where \omega \in (0, 2) is a carefully chosen constant.
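Note that this M has the required form M = BB^T: since A is positive definite, the diagonal entries of D are positive, and for \omega \in (0, 2) we have \tfrac{\omega}{2 - \omega} > 0, so we may take B = \sqrt{\tfrac{\omega}{2 - \omega}}\, (\tfrac 1\omega D + L)\, D^{-1/2}, which is lower triangular with positive diagonal entries, hence invertible.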

Since M is the product of a lower triangular, a diagonal, and an upper triangular matrix, z = M^{-1} r can be computed very quickly using forward and back substitution as in Lecture a4. Indeed, solving M z = r is equivalent to solving \tfrac{\omega}{2 - \omega}(\tfrac 1\omega D + L) y = r for y \in \mathbb{R}^n and then solving D^{-1} (\tfrac 1 \omega D + L)^T z = y for z \in \mathbb{R}^n.

Let us consider the first system. Rearranging terms, it is equivalent to y = D^{-1}\big((2 - \omega) r - \omega L y\big). Since L_{ij} = 0 for j \geq i, the term (Ly)_i contains only y_1, \ldots, y_{i - 1}, so we can find y_i in a simple loop over i = 1, \ldots, n.

Similarly, the second system can be written as z = \omega y - \omega D^{-1} L^T z. Since (L^T)_{ij} = 0 for j \leq i, the term (L^T z)_i contains only z_{i+1}, \ldots, z_n, so we can find z_i in a loop with i going down from n to 1.

Fortran implementation

Here is one possible implementation of the preconditioned CG method with the SSOR preconditioner in Fortran. The lines with important changes from the previous CG method are highlighted by 👈.

The matrix A is as before A_{ij} := \begin{cases} \gamma, & i =j,\\ -1, & |i - j| = 1,\\ 0 & \text{otherwise}, \end{cases}

implicit none
real, parameter :: eps = 1e-6
real, dimension(:, :), allocatable :: a
real, dimension(:), allocatable :: b, x, r, p, z, Ap
real rho, rho_old, alpha, gam, om
integer k, i, n

read *, n, gam, om    ! matrix size n, diagonal entry gamma, SSOR parameter omega

allocate(a(n, n))
allocate(b(n))
allocate(x(n))
allocate(r(n))
allocate(p(n))
allocate(z(n))
allocate(Ap(n))

! build the tridiagonal matrix A of Example 1
a = 0
do i = 1, n
    a(i, i) = gam
end do
do i = 1, n - 1
    a(i, i + 1) = -1
    a(i + 1, i) = -1
end do

b = 1.

x = 0                                   ! initial guess
r = b                                   ! residual r = b - Ax = b since x = 0
z = precond(a, r, om)                   !  👈
rho = inner(r, z)                       !  👈
p = z                                   !  👈

do k = 1, n
    Ap = mv_mul(a, p)
    alpha = rho  / inner(Ap, p)
    x = x + alpha * p
    r = r - alpha * Ap
    z = precond(a, r, om)               !  👈
    rho_old = rho
    rho = inner(r, z)                   !  👈
    print *, k, sqrt(inner(r, r))
    if (sqrt(inner(r, r)) < eps) exit
    p = z + (rho / rho_old) * p         !  👈
end do

contains 
    ! SSOR preconditioner: computes z = M⁻¹r where M is the SSOR matrix
    ! for matrix A with parameter ω
    function precond(a, r, om) result(z)
        real, intent(in) :: a(:, :), r(:)
        real, intent(in) :: om
        real, dimension(:), allocatable :: z
        integer i
        allocate(z(size(r)))
        ! forward substitution for (om/(2 - om)) (D/om + L) y = r,
        ! i.e. y(i) = ((2 - om) * r(i) - om * (L y)(i)) / a(i, i);
        ! y is stored in z
        do i = 1, size(r)
            z(i) = ((2 - om) * r(i) - om * inner(a(i, :i - 1), z(:i - 1))) / a(i, i)
        end do
        ! back substitution for D^(-1) (D/om + L)^T z = y,
        ! i.e. z(i) = om * y(i) - om * (L^T z)(i) / a(i, i)
        do i = size(r), 1, -1
            z(i) = om * z(i) - om * inner(a(i, i + 1:), z(i + 1:)) / a(i, i)
        end do
    end

    function inner(x, y)
        real x(:), y(:), inner
        integer i
        if (size(x) /= size(y)) then
            stop 'arrays x and y must have the same size'
        end if

        inner = 0
        do i = 1, size(x)
            inner = inner + x(i) * y(i)
        end do  
    end

    function mv_mul(a, x)
        real a(:,:), x(:), mv_mul(size(a, 1))
        integer i, j

        if (size(a, 2) /= size(x)) then
            stop 'the number of columns of a and the size of x are different'
        end if

        do i = 1, size(a, 1)
            mv_mul(i) = inner(a(i, :), x)
        end do

    end
end
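The program reads n, \gamma and \omega from standard input; for example, entering 1000 2 1.9 corresponds to the \omega = 1.9 row of the first table below.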

Performance

Using a good preconditioner can lead to a greatly improved performance of the CG method.

Here is an example with n = 1000 and \gamma = 2. The table gives the number of iterations required for convergence with \varepsilon = 10^{-6}, together with the actual execution time on my computer.

Method                       No. of iterations   Time
CG                           500                 1.8 s
Prec. CG, \omega = 1.9       59                  0.6 s
Prec. CG, \omega = 1.9999    7                   0.2 s

And here is an example with n = 10000 and \gamma = 3.

Method                       No. of iterations   Time
CG                           16                  11 s
Prec. CG, \omega = 1.2       5                   7.5 s
Prec. CG, \omega = 1.9       14                  20 s

We need to choose \omega carefully for the SSOR preconditioner to be useful.

Even if the required number of iterations decreases, the execution time might be worse for the preconditioned CG method. The reason is that applying the preconditioner is additional work: the work required to run the SSOR preconditioner (function precond) is comparable to the work required to multiply a vector by the matrix A, so each iteration becomes roughly twice as slow.

The choice of preconditioner is an important problem in applications.

Exercise 1. Run the above program with n = 1000, \gamma = 2, and each of \omega \in \left\{1, 1.1, 1.2, \ldots, 1.9\right\}, and find the number of iterations required to converge for each \omega. Plot the number of iterations as a function of \omega. Submit the image file to LMS with your student ID as its title.