Affiliation: | Air Pollution Laboratory, Danish Agency of Environmental Protection, Risø National Laboratory, DK-4000, Roskilde, Denmark; Mathematical Software Group, CRAY Research Inc., 1345 Northland Drive, Mendota Heights, MN 55120, U.S.A.; Danish Computing Centre for Research and Education, UNI•C, Region Lyngby, DK-2800, Lyngby, Denmark; Department of Chemical Physics, The H.C. Ørsted Institute, University of Copenhagen, DK-2100, Copenhagen Ø, Denmark |
Abstract: | Computations involving symmetric positive definite band matrices are kernel operations in the numerical treatment of many models arising in science and engineering. It is desirable to achieve high performance when such operations are carried out on a vector processor. If the operations are performed by rows or by columns (as in the EXTENDED BLAS subroutines), the loops are vectorized, but the speed of computation, measured in Mflops, is not very high, because the arrays involved (rows or columns within the band, whose length is bounded by the bandwidth) are normally short. Therefore the computations should be organized by diagonals. Furthermore, special devices must be applied in order to unroll the loops. Finally, one should be careful with the storage scheme. It is demonstrated that if (i) the computations are organized by diagonals, (ii) the main loops are unrolled, and (iii) the storage scheme is such that work with zero elements is avoided, then the speed of computation is nearly the same as that obtained in computations with dense matrices. If a particular vector machine is in use (in our case a CRAY X-MP computer), the speed can be increased further by (iv) coding some basic operations in machine language and (v) using the different processors of the vector computer in parallel. The efficiency gained by exploiting the special features of the particular computer is also illustrated by numerical examples. Kernel subroutines performing matrix-vector multiplications are described, and representative tests are used to demonstrate their efficiency. |
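Points (i)-(iii) can be illustrated with a minimal sketch in C of the product y = Ax for a symmetric band matrix of half-bandwidth m. This is not the authors' CRAY X-MP kernel (which also uses machine-language coding and multiple processors); the storage layout and the names `sbmv_by_diagonals`, `add_diag` and `diag` are hypothetical. The sketch assumes that only the main diagonal and the m upper diagonals are stored, each with its exact length n-k, so no zero elements are stored or touched, and it assumes m < n. Each inner loop runs over a whole diagonal (length close to n rather than close to m) and has no loop-carried dependence. The main loop is unrolled by two diagonals per sweep over y.

```c
#include <stddef.h>

/* Hypothetical storage: diag[k][i] = A(i, i+k), for k = 0..m, i = 0..n-k-1.
   By symmetry A(i+k, i) = A(i, i+k), so the lower diagonals are not stored.
   Assumes m < n. */

/* Add the contribution of a single off-diagonal k (length n-k) to y. */
static void add_diag(size_t n, size_t k, const double *d,
                     const double *restrict x, double *restrict y)
{
    /* upper triangle: y(i) += A(i, i+k) * x(i+k) */
    for (size_t i = 0; i < n - k; ++i)
        y[i] += d[i] * x[i + k];
    /* lower triangle (symmetry): y(i+k) += A(i, i+k) * x(i) */
    for (size_t i = 0; i < n - k; ++i)
        y[i + k] += d[i] * x[i];
}

/* y = A*x, with every inner loop running along a diagonal. */
void sbmv_by_diagonals(size_t n, size_t m, const double *const *diag,
                       const double *restrict x, double *restrict y)
{
    /* main diagonal initializes y */
    for (size_t i = 0; i < n; ++i)
        y[i] = diag[0][i] * x[i];

    size_t k = 1;
    /* Unrolled by two: each pass handles diagonals k and k+1 together,
       halving the number of sweeps over y. */
    for (; k + 1 <= m; k += 2) {
        const double *d1 = diag[k], *d2 = diag[k + 1];
        size_t len = n - k - 1;              /* length of diagonal k+1 */

        /* upper part: rows 0..len-1 receive terms from both diagonals */
        for (size_t i = 0; i < len; ++i)
            y[i] += d1[i] * x[i + k] + d2[i] * x[i + k + 1];
        y[len] += d1[len] * x[len + k];      /* last entry of diagonal k */

        /* lower part, indexed by the updated row j so that each
           iteration writes a distinct y(j) */
        y[k] += d1[0] * x[0];                /* row k: only diagonal k */
        for (size_t j = k + 1; j < n; ++j)
            y[j] += d1[j - k] * x[j - k] + d2[j - k - 1] * x[j - k - 1];
    }
    if (k == m)                              /* odd m: one diagonal left */
        add_diag(n, k, diag[k], x, y);
}
```

The lower-triangle loop is written in terms of the destination row j rather than the diagonal index i. That way, when two diagonals are combined, no element of y is written in two different iterations, and the loop stays free of dependences.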