A PARALLELIZABLE PRECONDITIONER FOR THE ITERATIVE SOLUTION OF IMPLICIT RUNGE-KUTTA TYPE METHODS LAURENT O. JAY  AND THIERRY BRACONNIER y Abstract. The main diculty in the implementation of most standard implicit Runge-Kutta (IRK) methods applied to (sti ) ordinary di erential equations (ODE's) is to eciently solve the nonlinear system of equations. In this article we propose the use of a preconditioner whose decompo- sition cost for a parallel implementation is equivalent to the cost for the implicit Euler method. The preconditioner is based on the W-transformation of the RK coecients matrices discovered by Hairer and Wanner. For sti ODE's the preconditioner is by construction asymptotically exact for methods with an invertible RK coecients matrix. The methodology is particularly useful when applied to super partitioned additive Runge-Kutta (SPARK) methods. The nonlinear system can be solved by inexact Newton iterations: at each simpli ed Newton step the linear system can be approximately solved by an iterative method applied to the preconditioned linear system. Key words. Iterative methods, linear system, Newton iterations, nonlinear system, parallel implementation, preconditioning, Runge-Kutta methods, sti ness, W-transformation AMS subject classi cations. 34A65, 65F10, 65L05, 65L06, 65Y05 1. Introduction. This article is concerned with the implementation of implicit Runge-Kutta (IRK) methods such as those based on Gauss, Radau, and Lobatto points [20] applied to (sti ) ordinary di erential equations (ODE's). The main diculty is to eciently solve the nonlinear system of equations. For an s-stage IRK method and a di erential system of dimension n, the nonlinear system is of size s n and it is usually solved by simpli ed Newton iterations. A direct decomposition of the s  n  s  n simpli ed Jacobian matrix is generally inecient when s  2. The diagonalization of the RK coecients matrix can drastically reduce the number of operations and it also allows for parallelism. Nevertheless, the presence in general of pairs of complex eigenvalues in the RK coecients matrix for most standard IRK methods not only impairs their parallelism, but also signi cantly increases the decomposition cost of the Jacobian. Moreover, if several distinct IRK methods are used in a partitioned and/or additive way, this diagonalization procedure cannot be applied since the di erent RK matrices generally possess distinct eigenvectors. In this article we propose the use of a preconditioner requiring s independent decompositions of submatrices of dimension n, i.e., whose decomposition cost for a parallel implementation is equivalent to the cost for the implicit Euler method. The preconditioner is based on the W-transformation of the RK coecients matrices discovered by Hairer and Wanner [20]. For sti ODE's the preconditioner is by construction asymptotically exact for methods with an in- vertible RK coecients matrix. The methodology is particularly useful when applied to super partitioned additive Runge-Kutta (SPARK) methods [22]. The nonlinear system can be solved by inexact Newton iterations: at each simpli ed Newton step the linear system can be approximately solved by an iterative method applied to the preconditioned linear system.  Institute for Mathematics and its Applications, University of Minnesota, 514 Vincent Hall, 206 Church Street S.E., Minneapolis, MN 55455, USA (na.ljay@na-net.ornl.gov). The work of this author was supported in part by a research grant of the Holderbank Foundation, Switzerland (Holderbank-Stiftung zur Forderung der wissenschaftlichen Fortbildung). y Department of Computer Science and Engineering, University of Minnesota, 4-192 EE/CS Bldg., 200 Union Street S.E., Minneapolis, MN 55455-0159, USA (braconni@cs.umn.edu). 1 2 L. O. Jay and T. Braconnier In section 2 we give the de nition of IRK methods, we discuss the approximate Jacobian matrix used in the simpli ed Newton iterations, and we succinctly describe the W-transformation. In section 3 we present the preconditioner used for the solution of the linear systems occuring in the simpli ed Newton iterations. The preconditioner is analyzed on the scalar linear test equation y 0 = y in section 4. In section 5 we show how to extend the preconditioner from IRK to SPARK methods. In section 6 we present some numerical results illustrating the behaviour of the considered precon- ditioner using di erent iterative methods. 2. IRK methods, approximate Jacobian, and W-transformation. We consider the system of (sti ) ODE's y 0 = f(t; y)(1) where y = (y 1 ; : : : ; y n ) T 2 R n . The de nition of IRK methods is as follows. Definition 2.1. One step of an s-stage implicit Runge-Kutta (IRK) method applied to the system (1) with initial values y 0 at t 0 and stepsize h reads Y i y 0 h s X j=1 a ij f(t 0 + c j h; Y j ) = 0 for i = 1; : : : ; s;(2) y 1 = y 0 + h s X i=1 b i f(t 0 + c i h; Y i ): The equations (2) de ne a nonlinear system of dimension s  n to be solved for the s internal stages Y i . The numerical approximation at t 0 + h is then given by y 1 . The RK coecients are usually expressed using a Butcher-tableau notation c 1 a 11    a 1s . . . . . . . . . . . . c s a s1    a ss b 1    b s where b = (b 1 ; : : : ; b s ) T is the weight vector, c = (c 1 ; : : : ; c s ) T is the node vector, and A = (a ij ) ij=1;:::;s is the RK coecients matrix. A detailed presentation of the construction of IRK methods can be found in [20, Section IV.5]. We will assume s  2 for the remainder of the article. The nonlinear system can be solved by simpli ed Newton iterations with approximate Jacobian matrix I s I n hA J with J := @f @y (t 0 ; y 0 )(3) where is the tensor product and I m denotes the identity matrix in R m . Each iteration requires the solution of an s  n-dimensional linear system with matrix (3) whose direct decomposition is generally inecient for s  2 as it can be drastically improved by exploiting its special structure. By diagonalizing the RK coecients matrix A S 1 AS =  = diag( 1 ; : : : ;  s ) A preconditioner for the solution of IRK methods 3 the linear system can be tranformed into a decoupled linear system with block-diagonal matrix S 1 I n (I s I n hA J)S I n = I s I n h J = 0 B @ I n  1 hJ O . . . O I n  s hJ 1 C A :(4) Nevertheless, the presence in general of pairs of complex eigenvalues in the RK co- ecients matrix for most standard IRK methods, not only impairs the parallelism, but also signi cantly increases the decomposition cost of (4) [20, Section IV.8]. This cost for such methods should be ideally equivalent to s independent decompositions of submatrices of dimension n, as for diagonally implicit Runge-Kutta (DIRK) methods [1, 10, 23] and multi-implicit Runge-Kutta (MIRK) methods [2, 5, 24] for which the eigenvalues are real. Various iterations schemes have been proposed, some of them requiring the decomposition of only one submatrix of dimension n [8, 9, 16, 17] or of s submatrices of dimension n [21, 27]. These methods do not usually iterate at the linear algebra level and they can be considered as modi ed Newton iterations. Unfor- tunately, none of these methods is asymptotically exact for sti systems, whereas the method presented in this article gives by construction an asymptotically exact result in this situation. In this article we propose a di erent approach aimed at reducing the amount of computations. Instead of solving exactly a linear system at each simpli ed Newton step, we apply an iterative method to a corresponding preconditioned linear system. The use of iterative methods for the numerical solution of sti ODE's were already considered in [3, 7, 12], with an emphasis on preconditioning in [4]. Such inexact Newton methods are generally considered to be among the most ecient ways to solve nonlinear system of equations [11, 25]. We construct a preconditioner requiring s independent decompositions of matrices of dimension n, i.e., whose decomposition cost for a parallel implementation is equivalent to the cost for the implicit Euler method. The preconditioner is based on the W-transformation of the RK coecients matrices discovered by Hairer and Wanner [18, 19]. This transformation is given by X := W T BAW(5) where B = diag(b 1 ; : : : ; b s ) and the coecients of the matrix W are given by w ij = P j1 (c i ) with P k (x) being the k-th shifted Legendre polynomial P k (x) = p 2k + 1 k!  d k dx k  x k (x 1) k  = p 2k + 1 k X j=0 (1) j+k k j ! j + k j ! x j : For recent references and more details about the W-transformation we refer the reader to [20, Section IV.5] and [6, 13]. In the remainder of the article we will assume that X := W T BAW is tridiagonal and D :=W T BW is diagonal and nonsingular; two conditions which are satis ed for most IRK methods of interest, such as Gauss, Radau IA & IIA, Lobatto IIIA & IIIB & IIIC & IIIC  schemes [6, 13, 20, 22]. For 4 L. O. Jay and T. Braconnier these IRK methods the transformed matrix X and the matrix D read X = 0 B B B B B B B @ 1=2  1 O  1 0 . . . . . . . . .  s2  s2 0 s1;s O s;s1 ss 1 C C C C C C C A ; D = diag(1; 1; : : : ; 1; d s );(6) where  k = 1=  2 p 4k 2 1  and the missing coecients s;s1 ; s1;s ; ss ; d s are given in Table 1. Note that the inverse of W is simply given by W 1 = D 1 W T B. We will actually assume the speci c forms (6) in the remainder of the article. Table 1 Missing coecients of the transformed matrix X for some IRK methods,  = 2s1 s1 . IRK method s;s1 s1;s ss d s Gauss  s1  s1 0 1 Radau IA  s1  s1 1 4s1 1 Radau IIA  s1  s1 1 4s1 1 Lobatto IIIA  s1  0 0  Lobatto IIIB 0  s1  0  Lobatto IIIC  s1   s1   2s2  Lobatto IIIC   s1   s1   2s2  3. Preconditioning the linear system. Using the W-transformation (5) in (3), at each simpli ed Newton step we should solve a linear system Kx = b(7) for a block-tridiagonal matrix K = D I n hX J = 0 B B B B B B @ E 1 F 1 O G 1 E 2 F 2 . . . . . . . . . G s2 E s1 F s1 O G s1 E s 1 C C C C C C A (8) with n n blocks given as follows E 1 = I n 1 2 hJ; E i = I n for i = 2; : : : ; s 1; E s = d s I n ss hJ;(9a) F i =  i hJ for i = 1; : : : ; s 2; F s1 = s1;s hJ;(9b) G i =  i hJ for i = 1; : : : ; s 2; G s1 = s;s1 hJ:(9c) A way to solve (7) would be to use the block-LU decomposition [14, 15] of (8) K = 0 B B B B B B @ I n O G 1 H 1 1 I n . . . . . . G s2 H 1 s2 I n O G s1 H 1 s1 I n 1 C C C C C C A 0 B B B B B B @ H 1 F 1 O H 2 F 2 . . . . . . H s1 F s1 O H s 1 C C C C C C A A preconditioner for the solution of IRK methods 5 where the blocks H i are recursively given by H 1 = E 1 ; H i = E i G i1 H 1 i1 F i1 for i = 2; : : : ; s(10) and are assumed to be nonsingular. Subdividing the solution vector x, the right-hand side b of (7), and an intermediate vector y into s n-dimensional subvectors x = 0 B B B B B B @ x 1 x 2 . . . x s1 x s 1 C C C C C C A ; b = 0 B B B B B B @ b 1 b 2 . . . b s1 b s 1 C C C C C C A ; y = 0 B B B B B B @ y 1 y 2 . . . y s1 y s 1 C C C C C C A ; x i ; b i ; y i 2 R n for i = 1; : : : ; s; the linear system (7) can be solved using block forward and backward substitutions: y 1 = b 1 ; y i = b i G i1 H 1 i1 y i1 for i = 2; : : : ; s; x s = H 1 s y s ; x i = H 1 i (y i F i x i+1 ) for i = s 1; : : : ; 1: From (9) and (10) the blocks H i are given in our situation by H 1 = I n 1 2 hJ; H i = I n +  2 i1 h 2 JH 1 i1 J for i = 2; : : : ; s 1; H s = d s I n ss hJ s;s1 s1;s h 2 JH 1 s1 J: Since each block H i for i  2 depends on H 1 i1 the above recursion is not parallelizable. Moreover, we should also assume that all blocks H i are nonsingular, a condition which can actually be violated even if I n hJ is supposed to be invertible for all h  0. In terms of computational cost at each step i for i  2 we should compute JH 1 i1 J . For example if the LU decomposition ofH i1 would be performed, this would require 7n 3 =3 operations. Thus, the total block-LU decomposition of K would require (7s 6)n 3 =3 operations. This is clearly inecient as it would still be a factor from 3:5 to 7 more costly than if the block-diagonal-LU decomposition of (4) would be used (3:5 at best if all the eigenvalues of the RK coecients matrix consist only of conjugate complex pairs, 7 at worst if all those eigenvalues are real). We now present the main idea of this article. Instead of solving (7) directly, we apply an iterative method for example to the left-preconditioned linear system P 1 Kx = P 1 b:(11) We choose the preconditioner P to be given by the approximate block-LU decompo- sition of K based on independent approximations e H i of H i , i.e., we set P := 0 B B B B B B @ I n O G 1 e H 1 1 I n . . . . . . G s2 e H 1 s2 I n O G s1 e H 1 s1 I n 1 C C C C C C A 0 B B B B B B B @ e H 1 F 1 O e H 2 F 2 . . . . . . e H s1 F s1 O e H s 1 C C C C C C C A (12) with e H i := I n i hJ for i = 1; : : : ; s 1; e H s := d s I n s hJ;(13) 6 L. O. Jay and T. Braconnier where 1 = 1 2 ; i =  2 i1 i1 for i = 2; : : : ; s 1; s = ss s;s1 s1;s s1 :(14) Each e H i can be formed and decomposed independently, making these operations fully parallelizable. The coecients i have been chosen so that e H 1 i H i  I n when (I n hJ) 1 (hJ)  I n for all h  h 0 > 0. Hence, if the RK coecients matrix is invertible, i.e., if s 6= 0, the preconditioner is asymptotically exact for sti systems. Note that for the Lobatto IIIA, IIIB, and IIIC  coecients we have s = 0. Since the Lobatto IIIC  methods behave like explicit RK methods, they should not be considered to treat sti terms. For the Lobatto IIIA and IIIB methods the last block is e H s = d s I n and needs not be decomposed. We can interpret the above preconditioner P by obtaining an explicit expression from the above formulas and we get P = 0 B B B B B B B @ e E 1 F 1 O G 1 e E 2 F 2 . . . . . . . . . G s2 e E s1 F s1 O G s1 e E s 1 C C C C C C C A where the blocks e E i are recursively given by e E 1 = e H 1 = H 1 = E 1 ; e E i = e H i +G i1 e H 1 i1 F i1 for i = 2; : : : ; s: We note that the above preconditioner is consistent in the sense that for h = 0 we have P = K = I s I n . It is also asymptotically exact for y 0 = y when jhj ! 1 if s 6= 0 (see also next section). We would like to stress the point that this preconditioner cannot be interpreted as being of the form I s I n h e A J for a modi ed coecients matrix e A as considered in [21, 27]. Note that the linear systems involving the matrices e H i may also be approximately solved by iterative tech- niques, especially if a good preconditioner to I n hJ is available. 4. Linear analysis of the preconditioner. We consider the scalar linear test equation y 0 = y and we denote z := h. The preconditioner presented in the previous section is by construction exact for z = 0 and asymptotically exact for large jzj if s 6= 0, i.e., P 1 (z)K(z)! I s when jzj ! 1, with P (z) and K(z) given below. Here, we consider intermediate values Re(z)  0. The matrix K(z) (8) is given by K(z) = 0 B B B B B B B @ 1 z=2  1 z O  1 z 1 . . . . . . . . .  s2 z  s2 z 1 s1;s z O s;s1 z d s ss z 1 C C C C C C C A : A preconditioner for the solution of IRK methods 7 The left-preconditioned linear system (11) reads P 1 (z)K(z)x = P 1 (z)b where P (z) = 0 B B B B B B @ 1 O g 1 (z) e h 1 1 (z) 1 . . . . . . g s2 (z) e h 1 s2 (z) 1 O g s1 (z) e h 1 s1 (z) 1 1 C C C C C C A  0 B B B B B B B @ e h 1 (z) f 1 (z) O e h 2 (z) f 2 (z) . . . . . . e h s1 (z) f s1 (z) O e h s (z) 1 C C C C C C C A = 0 B B B B B B B @ e 1 (z) f 1 (z) O g 1 (z) e 2 (z) . . . . . . . . . f s2 (z) g s2 (z) e s1 (z) f s1 (z) O g s1 (z) e s (z) 1 C C C C C C C A with function coecients given by f i (z) =  i z for i = 1; : : : ; s 2; f s1 (z) = s1;s z; g i (z) =  i z for i = 1; : : : ; s 2; g s1 (z) = s;s1 z; e h i (z) = 1 i z for i = 1; : : : ; s 1; e h s (z) = d s s z; e 1 = e h 1 (z); e i (z) = e h i (z) + g i1 (z) e h 1 i1 (z)f i1 (z) for i = 2; : : : ; s: A good measure for the quality of the preconditioner is given by (P 1 (z)K(z)) where  denotes the condition number of a matrix. The closer to one is this quantity, the better is the preconditioner. For the 2-stage Lobatto IIIA and IIIB methods we trivially have (P 1 (z)K(z)) = 1 since the preconditioner is exact for those two methods, i.e., P = K for any J . For the 2-stage Lobatto IIIC method we have P 1 (z)K(z) = 1 p 3z 2 (z1)(z2) 2 0 1 + z (z1)(z2) ! : Hence, we get  1 (P 1 (z)K(z)) = max 1 + p 3jzj 2 jz 1j  jz 2j 2 ; 1 + z (z 1)(z 2) !  max 1 + p 3jzj 2 jz 2j  jz 2 2z + 2j ; jz 1j  jz 2j jz 2 2z + 2j ! : For jzj ! 1 we thus have  1 (P 1 (z)K(z)) = 1 + 2 p 3=jzj + O(1=jzj 2 ). In Fig. 1 we have plotted this condition number for purely negative values z = r and purely imaginary values z = ir with r = jzj. In Fig. 2 we give a similar plot for the 5-stage Lobatto IIIB method. 8 L. O. Jay and T. Braconnier 0 10 20 30 40 50 60 70 80 90 100 1 1.5 2 r=|z| c o n di tio n nu m be r z=−r z=ir Fig. 1. Condition number  1 (P 1 (z)K(z)) for the 2-stage Lobatto IIIC method. 5. A preconditioner for SPARK methods. The methodology described in this article to solve the nonlinear system of equations for IRK methods is particularly useful when considering SPARK methods such as the Lobatto IIIA-B-C-C  methods [22]. In this section we consider the following system of (sti ) ODE's y 0 = f(t; y) = M X m=1 f m (t; y)(15) where y = (y 1 ; : : : ; y n ) T 2 R n . Such a decomposition P M m=1 f m (t; y) may come from a splitting and/or a partitioning of f(t; y) into di erent terms. The functions f m (t; y) are supposed to have distinct properties and may therefore be numerically treated di erently. Several motivations were given in [22] to introduce a more general class of methods than IRK methods. The de nition of SPARK methods applied to (15) is as follows. A preconditioner for the solution of IRK methods 9 0 50 100 150 200 250 300 350 400 450 500 1 1.5 2 2.5 3 3.5 r=|z| c o n di tio n nu m be r z=−r z=ir Fig. 2. Condition number  1 (P 1 (z)K(z)) for the 5-stage Lobatto IIIB method. Definition 5.1. One step of an s-stage super partitioned and additive Runge- Kutta (SPARK) method, based on the same underlying quadrature formula (b i ; c i ) s i=1 , applied to the system (15) with initial value y 0 at t 0 and stepsize h reads Y i y 0 h M X m=1 s X j=1 a (m) ij f m (t 0 + c j h; Y j ) = 0 for i = 1; : : : ; s;(16) y 1 = y 0 + h s X i=1 b i f(t 0 + c i h; Y i ): The equations (16) de ne a nonlinear system of dimension s  n to be solved for the s internal stages Y i . The numerical approximation at t 0 + h is then given by y 1 . This system can be solved by simpli ed Newton iterations with approximate Jacobian 10 L. O. Jay and T. Braconnier matrix I s I n h L X m=1 A (m) J m with J m := @f m @y (t 0 ; y 0 ) for m = 1; : : : ; L(17) where L  M and the sti terms are treated with the L rst RK methods. Since in general the RK coecients matrices A (m) possess distinct eigenvectors the diagonal- ization procedure cannot be applied. However, using the same W-transformation for all RK coecients matrices we may assume that X (m) :=W T BA (m) W = 0 B B B B B B B B @ 1=2  1 O  1 0 . . . . . . . . .  s2  s2 0 (m) s1;s O (m) s;s1 (m) ss 1 C C C C C C C C A for possibly distinct (m) s;s1 ; (m) s1;s ; (m) ss , an assumption which is satis ed for the Lo- batto IIIA, IIIB, and IIIC coecients (L = 3) of the Lobatto IIIA-B-C-C  methods (M = 4). We thus obtain a block-tridiagonal matrix (8) with n  n blocks given by (9) for J := P L m=1 J m except for the following blocks E s = d s I n L X m=1 (m) ss hJ m ; F s1 = L X m=1 (m) s1;s hJ m ; G s1 = L X m=1 (m) s;s1 hJ m : As described in sections 3 for IRK methods we can solve (7) using an iterative method on the left-preconditioned linear system (11) with the preconditioner P given by (12). The blocks e H i can be chosen as in (13) except for the last block e H s := d s I n L X m=1 (m) s hJ m with (m) s = (m) ss (m) s;s1 (m) s1;s s1 : As mentioned before, note that we have (m) s = 0 for the Lobatto IIIA and IIIB coecients. 6. Numerical results. The linear system (7) can be solved by an iterative method applied to the left-preconditioned system (11) using the preconditioner de- scribed in section 3. Starting from x 0 := 0, the simplest iterative method is given by the Richardson iterations (PRI) x k+1 := x k + (P 1 b P 1 Kx k ) = (I P 1 K)x k + P 1 b for k = 0; 1; 2; : : :(18) If (I P 1 K) < 1 where  denotes the spectral radius of a matrix then the method converges linearly, otherwise the method diverges [15]. Another possibility is to use iterative Krylov-type methods such as the GMRES method [26]. Note that for such methods, convergence is ensured and the convergence behaviour greatly depends on the spectral distribution of the matrix K or P 1 K depending on whether the pre- conditioner is applied or not. In this section we illustrate the good quality of the A preconditioner for the solution of IRK methods 11 preconditioner. All experiments in this section are done for Lobatto IIIC methods. The matrix J is set as follows J := 0 B B B B B @ 1 : : : 1 2 . . . . . . . . . 1 O n 1 C C C C C A ; where the parameter allows to tune the size of the eigenvalues of J , hence to increase sti ness. Note that the eigenvalues of the matrix I n hJ are all comprised in the interval [ min ;  max ] = [1 + h ; 1 +nh ]. All computations have been done on an SGI IRIX 5.3 workstation. For a block size n = 25 and s = 4 blocks, Table 2 shows some results for h = 10 2 and increasing values of . When increases, the eigenvalues of I n hJ become larger and far apart from 1. Hence, the approximate submatrices e H i become closer to the exact submatrices H i , and therefore P 1 becomes a better approximation to K 1 . This is rst illustrated in the columns labelled  2 (K) and  2 (P 1 K) showing that the condition number of the preconditioned matrix P 1 K becomes closer to 1 as the parameter increases, whereas the condition number of the original matrix K remains large. In the column labelled kK 1 P 1 k 2 we see that P 1 tends to K 1 as increases. In the last column labelled k (PRI) we give the number k of PRI iterations (18) to solve the system (7) for x = (1; : : : ; 1) T , the right-hand side being given by b = Kx. The error tolerance is set to 100    kbk where  is the machine precision. The error is measured by k e x xk 1 where e x is the computed solution. We observe that the number k of PRI iterations decreases as the sti ness parameter increases. Table 2 Some measures of the quality of the preconditioner.  2 (K)  2 (P 1 K) kK 1 P 1 k 2 k (PRI) 10 1 5:40 3:40 9:5  10 1 83 10 2 28:90 10:62 9:8  10 1 77 10 3 156:80 9:53 6:1  10 1 44 10 4 191:35 2:49 3:0  10 2 13 10 6 194:38 1:59 1:6  10 4 5 10 8 194:41 1:58 1:6  10 6 3 To illustrate the improvement that the use of the preconditioner can provide, we have run the non-preconditioned GMRES(m) method and the preconditioned PGMRES(m) method wherem is the size of the Krylov subspace for which the method is restarted. As before the exact solution is chosen to be x = (1; : : : ; 1) T . We have used the same type of matrix J of size n = 200, with s = 10 and parameters h = 10 4 and = 10 3 . As shown in Table 3 for this example, using the GMRES(5) method with the preconditioner P 1 roughly divides the total running time T tot by a third and takes about ten times less iterations (see column labelled k) than the unprecon- ditioned method. In Table 3 T dec corresponds to the time in seconds to compute the 12 L. O. Jay and T. Braconnier decomposition of the preconditioner P , T sol corresponds to the time in seconds for the resolution by the (P)GMRES(5) methods, and T tot is the total computational time. Table 3 GMRES versus PGMRES. T dec T sol T tot k k e x xk 1 GMRES(5) 147:77 147:77 325 2:1  10 13 PGMRES(5) 15:13 37:01 52:14 30 3:9  10 14 Finally, we have applied the direct block-LU method, PRI, and PGMRES(2) with the matrix J of size n = 500, with s = 4 and parameters h = 10 4 and = 10 9 , in order to compare the eciency of the preconditioned iterative methods toward the direct block-LU method. Some results are shown in Table 4. T dec corresponds to Table 4 Comparison between block-LU, PRI, and PGMRES. T dec T sol T tot k k e x xk 1 block-LU 1391:64 3:92 1395:56 2:9  10 14 PRI 156:00 11:96 167:96 3 2:3  10 13 PGMRES(2) 156:00 17:32 173:32 4 2:9  10 13 the time in seconds of the factorization for the direct block-LU method whereas for PRI and PGMRES(2) this corresponds to the time to decompose the preconditioner P . T sol corresponds to the time in seconds for the resolution by the three di erent methods. The preconditioner is good since the eigenvalues of I n hJ are large. Thus only a few iterations (see column labelled k) are needed by the two preconditioned iterative methods to solve the linear system. We can see that for the same level of accuracy, the preconditioned iterative methods take much less time than the direct method (see column labelled T tot ), this is simply because the computational e ort needed to decompose P is much smaller than for the block-LU factorization of K. Obviously, the block-LU decomposition of the block-tridiagonal matrix K is not the optimal way to solve the system, since by diagonalizing the RK coecients matrix we could improve the cost of the decomposition by a factor close to 4. Nevertheless, this is an interesting measure in our context since for the implementation of SPARK methods this diagonalization procedure cannot be applied. It is interesting to note that the simple PRI method is as ecient as the PGMRES(2) method. Since the direct block-LU method provides a residual kK e x bk close to C    kbk for a constant C, for comparison reasons we have set the stopping criterion for PRI and PGMRES(2) to a similar level. For the numerical solution of (sti ) ODE's such an accuracy is not needed, because the stopping criterion within the simpli ed Newton iterations can be relaxed and be based on the preconditioned residual error kP 1 (K e x b)k [4]. Moreover, nonsti components need not be solved very accurately since the Newton iterations on top of the iterative linear solver make these components to converge suciently rapidly. A preconditioner for the solution of IRK methods 13 REFERENCES [1] R. Alexander, Diagonally implicit Runge-Kutta methods for sti ODE's, SIAM J. Numer. Anal., 14 (1977), pp. 1006{1021. [2] C. Bendtsen, Highly stable parallel Runge-Kutta methods, Appl. Numer. Math., 21 (1996), pp. 1{8. [3] P. N. Brown and A. C. Hindmarsh, Matrix-free methods for sti systems of ODE's, SIAM J. Numer. Anal., 23 (1986), pp. 610{638. [4] P. N. Brown, A. C. Hindmarsh, and L. R. Petzold, Using Krylov methods in the solution of large-scale di erential-algebraic systems, SIAM J. Sci. Comput., 15 (1994), pp. 1467{1488. [5] K. Burrage, A special family of Runge-Kutta methods for solving sti di erential equations, BIT, 18 (1978), pp. 22{41. [6] R. P. K. Chan, On symmetric Runge-Kutta methods of high order, Computing, 45 (1990), pp. 301{309. [7] T. F. Chan and K. R. Jackson, The use of iterative linear equation solvers in codes for large systems of sti IVPs for ODEs, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 378{417. [8] G. J. Cooper and J. C. Butcher, An iteration scheme for implicit Runge-Kutta methods, IMA J. Numer. Anal., 3 (1983), pp. 127{140. [9] G. J. Cooper and R. Vignesvaran, A scheme for the implementation of implicit Runge-Kutta methods, Computing, 45 (1990), pp. 321{332. [10] M. Crouzeix, Sur l'approximation des equations di erentielles operationnelles lineaires par des methodes de Runge-Kutta, PhD thesis, Univ. Paris VI, France, 1975. [11] R. S. Dembo, S. C. Eisenstat, and T. Steihaug, Inexact Newton methods, SIAM J. Numer. Anal., 19 (1982), pp. 400{408. [12] C. W. Gear and Y. Saad, Iterative solution of linear equations in ODE codes, SIAM J. Sci. Stat. Comput., 4 (1983), pp. 583{601. [13] S. Geng, Construction of high order symplectic Runge-Kutta methods, J. Comput. Math., 11 (1993), pp. 250{260. [14] J. A. George, On block elimination for sparse linear systems, SIAM J. Numer. Anal., 11 (1974), pp. 585{603. [15] G. Golub and C. F. van Loan, Matrix computations, Johns Hopkins Studies in the Math. Sci., The Johns Hopkins Univ. Press, Baltimore and London, Third Edition, 1996. [16] S. Gonz  alez-Pinto, C. Gonz  alez-Concepci  on, and J. I. Montijano, Iterative schemes for Gauss methods, Computers Math. Applic., 27 (1994), pp. 67{81. [17] S. Gonz  alez-Pinto, J. I. Montijano, and L. R  andez, Iterative schemes for three-stage implicit Runge-Kutta methods, Appl. Numer. Math., 17 (1995), pp. 363{382. [18] E. Hairer and G. Wanner, Algebraically stable and implementable Runge-Kutta methods of high order, SIAM J. Numer. Anal., 18 (1981), pp. 1098{1108. [19] , Characterization of non-linearly stable implicit Runge-Kutta methods, in Numerical inte- gration of di erential equations, vol. 968 of Lect. Notes in Math., 1982, pp. 207{219. [20] , Solving ordinary di erential equations II. Sti and di erential-algebraic problems, vol. 14 of Comput. Math., Springer, Berlin, Second Revised Edition, 1996. [21] W. Hoffmann and J. J. B. de Swart, Approximating Runge-Kutta matrices by triangular matrices, BIT, 37 (1997), pp. 346{354. [22] L. O. Jay, Structure-preservation for constrained dynamics with super partitioned additive Runge-Kutta methods, SIAM J. Sci. Comput., (1995). To appear. [23] S. P. Nrsett, Semi-explicit Runge-Kutta methods, Tech. Report 6/74, Dept. of Math., Univ. of Trondheim, Norway, 1974. [24] B. Orel, Parallel Runge-Kutta methods with real eigenvalues, APNUM, 11 (1993), pp. 241{250. [25] W. Rheinboldt, Methods for solving systems of nonlinear equations, SIAM Classics in Appl. Math., SIAM, Philadelphia, Second Revised and Expanded Edition, 1998. To appear. [26] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 856{869. [27] P. J. van der Houwen and J. J. B. de Swart, Triangularly implicit iteration methods for ODE-IVP solvers, SIAM J. Sci. Comput., 18 (1997), pp. 41{55.