Applications of Computers, University of Nottingham, 15-19 September 1958

13. Linear Algebra, M Woodger B Sc

SYNOPSIS

1 - INTRODUCTION

1.1 In numerical work linear algebra means any calculations restricted to operations upon arrays of numbers like

a11         a12        a13        a14
a21         a22        a23        a24
a31         a32        a33        a34

in which multiples of one row or column are added to others. Such arrays are called matrices, and a single row or column is often referred to as a vector. We may use symbols A, x for matrices and vectors respectively. Linear algebra arises naturally in a very wide field of applications, nearly always through consideration of sets of simultaneous linear algebraic equations, that is to say equations such as

a11 u  +  a12 v  +  a13 w  =  b1
a21 u  +  a22 v  +  a23 w  =  b2       (1)
a31 u  +  a32 v  +  a33 w  =  b3

where one knows the coefficients aij and right-hand sides bi and requires the solution u,v,w. The term linear is borrowed from geometry, in which such equations describe lines, planes, etc.

1.2 The problem of inversion is to determine a matrix

c11   c12   c13
c21   c22   c23
c31   c32   c33

such that the solution of the equations (1) can be written explicitly as

u  =  c11 b1  +  c12 b2  +  c13 b3
v  =  c21 b1  +  c22 b2  +  c23 b3       (2)
w  =  c31 b1  +  c32 b2  +  c33 b3

This is useful when a large number of sets of equations such as (1) have to be solved, in which only the bi are changed between one set and the next.

The symbol A-1 is used for the inverse matrix of A, which has to be square.

1.3 In the analysis of oscillatory systems, whether electrical or mechanical, one is led to the more difficult latent root or eigenvalue problem.

Here one has to determine a quantity λ such that

a11 u  +  a12 v  +  a13 w  =  λ u
a21 u  +  a22 v  +  a23 w  =  λ v       (3)
a31 u  +  a32 v  +  a33 w  =  λ w

and for n equations there will in general be just n values of λ, real or complex, for which there is a corresponding solution u,v,w, called the latent vector or eigenvector. λ will be related to a natural frequency of oscillation of the system and u,v, w will represent the corresponding mode of oscillation. The matrix A of coefficients has to be square.

1.4 What makes linear algebra so ideally suited to automatic calculation and so tedious by hand is the uniformity of the arithmetical manipulations required - the same simple sequence of additions and multiplications is constantly repeated.

One can store the elements of a row or a column of a matrix in consecutive storage locations and have a relatively small and fast loop of instructions for dealing with each element as it is selected from store.

1.5 Another important advantage is the availability of a very satisfactory programmed arithmetic check, known as the distributive check. This is achieved by storing the sum s1=a11+a12+a13+b1 together with the numbers a11, a12, a13, b1 and treating it in the same way. Then, in the simplest case, after forming the multiples k a11, k a12, k a13, k b1, k s1 in the course of a calculation, the machine checks that the last one still equals the sum of the preceding numbers.
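
By way of illustration, a minimal sketch of this check in Python (the routine names are invented, and the tolerance allows for rounding error): a row is stored together with its check sum, both are scaled by the same factor, and the scaled check sum is compared with the sum of the scaled elements.

# Sketch of the distributive check: the check sum travels with the row
# and is treated exactly like the other elements.

def with_check_sum(row):
    """Append the check sum s = sum of the elements to the row."""
    return row + [sum(row)]

def scale_and_check(row_with_sum, k, tol=1e-10):
    """Scale every element (check sum included) by k and verify that the
    scaled check sum still equals the sum of the scaled elements."""
    scaled = [k * x for x in row_with_sum]
    assert abs(scaled[-1] - sum(scaled[:-1])) <= tol * max(1.0, abs(scaled[-1]))
    return scaled

row = with_check_sum([2.0, -1.0, 0.5, 3.0])   # a11, a12, a13, b1
scaled = scale_and_check(row, k=-0.75)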

1.6 Taking all digital computers in the country together, it is probably true to say that for about half of their operating time on real problems they are doing linear algebra. Almost the first general purpose programme to be prepared for a new computer is one for solving sets of simultaneous linear algebraic equations, and there is very soon a demand for matrix inversion and latent root programmes as well.

As instances of the origins of such sets of equations may be mentioned the following:

(i) Analysis of elastic structures, such as steel frames of buildings or aircraft wings and fuselages. Given the elastic constants (the aij) and the loading (the bi ) it is required to calculate the consequent deflections (u,v,w).

(ii) Steady flow of heat, fluids, electrical fields, and a wide range of other physical problems are expressible as the solution of partial differential equations of elliptic type.

As far as the computer is concerned these amount to very large sets of linear algebraic equations, usually of rather simple form. The development of very large storage capacity for future computers is stimulated by the need to solve these very large sets of equations, especially in connection with military applications, shock waves, nuclear explosions and the like.

The above two categories have the common property that they arise from the idealisation of a continuous physical system to a discrete mathematical model. Thus in studying the deflection (and vibration) of a beam one replaces it by the simplified model of a number of point masses, elastically interconnected, each representing a finite strip of the beam. The larger the number of representative points the more accurate the idealisation, although engineering accuracy is obtained from surprisingly few points.

Likewise in the heat conduction problem one works with the values of the temperature at the corners of the squares of a regular network covering the region of interest, and there is one equation for each point of the net. Again, the finer the net the more accurate are the results, although adequate estimates of thermal gradients, for example in assessing structural safety factors, may be obtained with a fairly coarse net.

(iii) A further common origin of sets of simultaneous linear algebraic equations is statistics. Any regression analysis, and in particular curve-fitting of data at unequally spaced intervals by the method of least squares, leads to such equations.

2 - METHODS

2.1 Gaussian Elimination

This is the method one learns at school. Starting from equations (1) we eliminate u from the second and third equations by adding -a21/a11 times the first row of the matrix to its second row, and also -a31/a11 times the first row to the last row. This gives a matrix of the form

a11   a12   a13   b1
 0    a22'  a23'  b2'
 0    a32'  a33'  b3'

We now eliminate the second variable v from the third equation by adding -a32'/a22' times the second row to the third row and end up with the matrix reduced to triangular form:

a11   a12   a13   b1
 0    a22'  a23'  b2'
 0     0    a33"  b3"

Finally we determine w = b3"/a33" from the last equation, use this to substitute in the second equation to get

v  =  (b2' -a23' w)/a22'

and use v and w to substitute in the first equation to get

u  =  (b1 - a12 v - a13 w) / a11

This is the second half of the process and is called back-substitution.

It is obvious that the method would break down if the so-called pivots a11 or a22' or a33" were zero, and would cause trouble if any of these were small. A simple precaution, which renders Gaussian Elimination the reliable and universally used numerical procedure that it has become, is to rearrange the equations at each step in the reduction so that the pivot is always the largest number in the first column of the partially reduced matrix. Thus we arrange to begin with that a21 and a31 are not larger (in absolute magnitude) than a11, and at the next step that a32' is not larger than a22' - otherwise we interchange the last two equations. If small pivots still appear the solution is poorly determined and the equations are said to be ill-conditioned.
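
The whole procedure of 2.1, partial pivoting included, can be sketched in a few lines of Python (an illustrative modern sketch, not the programme of any particular machine); the augmented matrix carries the right-hand side as its last column.

# Gaussian elimination with partial pivoting, followed by back-substitution.

def solve(a, b):
    """Solve A x = b for a square matrix a (list of rows) and vector b."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix [A | b]

    for k in range(n - 1):
        # Pivoting: bring the largest element of the column to the pivot row.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        # Eliminate the k-th variable from the rows below.
        for i in range(k + 1, n):
            factor = -m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] += factor * m[k][j]

    # Back-substitution, starting from the last (triangular) equation.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

print(solve([[2.0, 1.0, 1.0],
             [4.0, -6.0, 0.0],
             [-2.0, 7.0, 2.0]], [5.0, -2.0, 9.0]))   # -> [1.0, 1.0, 2.0]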

Other methods, suitable especially for equations of particular forms, will be dealt with if time permits.

2.2 Inversion of matrices.

This is simply an extension of 2.1 in which one deals simultaneously with right-hand sides

1   0   0
0   1   0
0   0   1

The three solutions so obtained are the three columns of the inverse matrix. During the reduction one keeps all right-hand sides throughout the working, except where storage space can be saved by not storing zeros. The back-substitution is carried out separately for each column of the inverse.
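
A sketch on the same lines (illustrative only): the unit matrix is appended as the simultaneous right-hand sides, and back-substitution is repeated once for each column of the inverse.

# Matrix inversion as in 2.2: eliminate with the unit matrix as the
# right-hand sides, then back-substitute once per column of the inverse.

def invert(a):
    n = len(a)
    # Augment A with the unit matrix: [A | I].
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]

    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))   # partial pivoting
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = -m[i][k] / m[k][k]
            for j in range(k, 2 * n):
                m[i][j] += f * m[k][j]

    # One back-substitution for each column of the inverse.
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for i in range(n - 1, -1, -1):
            s = m[i][n + c] - sum(m[i][j] * inv[j][c] for j in range(i + 1, n))
            inv[i][c] = s / m[i][i]
    return inv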

2.3 Latent roots and vectors of matrices.

The variety of methods available here is due chiefly to the very varied requirements for specific purposes.

In vibration problems one has essentially a symmetric matrix (a21 = a12, a31 = a13, a32 = a23) and the roots must all be real; one wants only the largest few roots and corresponding vectors and the roots are not often very close together. In this case the standard method is to start with any set of numbers

u1
v1
w1

and multiply by the matrix to get a second set

u2  =  a11 u1  +  a12 v1  +  a13 w1
v2  =  a21 u1  +  a22 v1  +  a23 w1
w2  =  a31 u1  +  a32 v1  +  a33 w1

normalise this set by dividing through by its largest element (to make the largest one 1), and again multiply by the matrix. This process is repeated until the numbers converge - the result is the latent vector x corresponding to the largest root λ, and the root itself appears as the largest vector element after the matrix multiplication.
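
A sketch of this iteration, with an invented starting vector and tolerance; the example matrix is symmetric with a well separated dominant root.

# Power method as described above: multiply by the matrix, normalise by the
# largest element, and repeat until the numbers settle down.

def largest_root(a, iterations=100, tol=1e-12):
    n = len(a)
    x = [1.0] * n                         # any starting vector will do
    lam = 0.0
    for _ in range(iterations):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        new_lam = max(y, key=abs)         # largest element approximates the root
        x_new = [yi / new_lam for yi in y]
        if abs(new_lam - lam) <= tol * abs(new_lam):
            return new_lam, x_new
        lam, x = new_lam, x_new
    return lam, x

a = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
lam, x = largest_root(a)                  # dominant root and its vector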

One has then to remove this root, i.e. to calculate a new matrix B whose roots other than λ are the same as those of the original matrix A, and either (a): B is smaller than A by one row and column, or (b): B is the same size as A and has a zero root in place of λ.

Method (a) is perhaps commoner; it reduces the volume of computation a little and is applicable directly to unsymmetric matrices A. If the normalised latent vector of A is

1
v
w

then one forms B by subtracting from the second and third rows of A the multiples v and w respectively of the first row, and then omitting the first row and column. The latent vectors of B have now only two elements and need to be combined with the vector x by a back substitution process to obtain the corresponding latent vectors of A.
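
A sketch of method (a), assuming the latent vector has already been normalised so that its first element is 1: multiples of the first row are subtracted from the remaining rows and the first row and column are then dropped.

# Deflation by method (a): the (n-1) x (n-1) matrix returned carries the
# remaining roots of A (valid for unsymmetric A as well).

def deflate_a(a, x):
    n = len(a)
    assert abs(x[0] - 1.0) < 1e-12        # vector normalised on its first element
    b = []
    for i in range(1, n):
        # Subtract x_i times the first row from row i, then drop column 0.
        b.append([a[i][j] - x[i] * a[0][j] for j in range(1, n)])
    return b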

Method (b) for symmetric matrices avoids a back substitution step since the latent roots and vectors of B are already appropriate to A. B is formed from A by subtracting λ times 1,v,w from the first row of A, λ v times this from the second row of A, and λ w times this from the last row.
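
A sketch of method (b); here the vector is first rescaled to unit Euclidean length, an assumption made explicit so that the subtraction removes exactly the root λ (the text leaves the normalisation implicit).

# Deflation by method (b) for a symmetric matrix: B = A - lambda * u u',
# where u is the latent vector rescaled to unit length.

def deflate_b(a, lam, x):
    n = len(a)
    norm = sum(xi * xi for xi in x) ** 0.5
    u = [xi / norm for xi in x]
    return [[a[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]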

Among other methods of value according to the requirements of the case are those of Jacobi, Lanczos and Givens which will be described if time permits.

3 - MATRIX ALGEBRA IN GENERAL

Occasionally one has already available for the computer the coefficients aij and bi, but it is more usual to find that a variety of simple preliminary calculations are necessary in order to obtain them. These calculations generally fall under the heading of linear algebra, and can be expressed in terms of matrix addition or multiplication, or simplified forms of these operations.

3.1 If the matrix

A  =  a11         a12        a13        a14
      a21         a22        a23        a24
      a31         a32        a33        a34
is added to another one B, which must be of the same shape and size, i.e. of three rows and four columns, the result is got by adding together corresponding elements in the two matrices, thus:

A + B  =  a11 + b11     a12 + b12     a13 + b13     a14 + b14
          a21 + b21     a22 + b22     a23 + b23     a24 + b24
          a31 + b31     a32 + b32     a33 + b33     a34 + b34

If A is multiplied in front by a row vector [p q r], which must have the same number of columns as A has rows, the result is another row vector which is got by adding together p times the first row of A, q times the second row, and r times the third, thus:

[p q r]A = [pa11+qa21+ra31, pa12+qa22+ra32, pa13+qa23+ra33, pa14+qa24+ra34]

We call this row a linear combination of the rows of A with coefficients p, q, r.

If however A is multiplied behind by a column vector

d
e
f
g

this must have the same number of rows as A has columns, and the result is another column vector which is got by adding the multiples of the four columns of A given by d,e,f,g respectively, thus:

A d  =    a11d + a12e + a13f + a14g
  e       a21d + a22e + a23f + a24g
  f       a31d + a32e + a33f + a34g
  g

This column is a linear combination of the columns of A.

The general case of pre-multiplication of A by a matrix B is just the same, but instead of a single row p q r producing a single row product we have each row of B treated in the same way to produce the same number of rows in the product matrix. This can be expressed by saying that BA is a matrix of as many rows as B, each row being a linear combination of the rows of A, the coefficients in the combination being the elements of the corresponding row of B. For this to be possible B must have as many columns as A has rows.

Likewise post-multiplication of A by B can be regarded as the formation of a matrix by columns, each of which is obtained as a linear combination of the columns of A using coefficients from a column of B. B must have as many rows as A has columns (which is the same condition as before).
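
The row-by-row description of pre-multiplication translates directly into a loop, sketched below: each row of the product BA is a linear combination of the rows of A, with coefficients taken from the corresponding row of B.

# Pre-multiplication BA formed row by row, as described above.

def premultiply(b, a):
    rows_a, cols_a = len(a), len(a[0])
    assert len(b[0]) == rows_a            # B must have as many columns as A has rows
    product = []
    for row_b in b:
        combo = [0.0] * cols_a
        for coeff, row_a in zip(row_b, a):
            for j in range(cols_a):
                combo[j] += coeff * row_a[j]
        product.append(combo)
    return product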

When A is pre-multiplied by the square matrix

p  0  0
0  q  0
0  0  r

(called a diagonal matrix) the result is

pa11  pa12  pa13  pa14
qa21  qa22  qa23  qa24
ra31  ra32  ra33  ra34

i.e. the rows of A are multiplied by the diagonal elements.

Likewise post-multiplication of A by the diagonal matrix

d  0  0  0
0  e  0  0
0  0  f  0
0  0  0  g

produces

a11d  a12e  a13f  a14g
a21d  a22e  a23f  a24g
a31d  a32e  a33f  a34g

the columns of A being multiplied by the diagonal elements.

In the special case p=q=r we have all elements of A multiplied by p and call this scalar multiplication of A by p. Finally the matrix may be written transposed, i.e. with columns and rows interchanged thus:

a11  a21  a31
a12  a22  a32
a13  a23  a33
a14  a24  a34

This is often denoted by A'.
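
The diagonal and transposition rules can be written out as simple routines (an illustrative sketch):

# Pre-multiplication by a diagonal matrix scales the rows, post-multiplication
# scales the columns, and transposition interchanges rows and columns.

def scale_rows(d, a):
    return [[d[i] * a[i][j] for j in range(len(a[0]))] for i in range(len(a))]

def scale_columns(a, d):
    return [[a[i][j] * d[j] for j in range(len(a[0]))] for i in range(len(a))]

def transpose(a):
    return [[a[i][j] for i in range(len(a))] for j in range(len(a[0]))]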

3.2 With the aid of the above ideas much of the systematic preliminary processing of numerical data can be effected by linear algebra.

Scaling the columns of a matrix, which corresponds to scaling the variables u, v, w separately in the case of the linear equations (1), is simply post-multiplication by a diagonal matrix composed of the scaling factors. Scaling the rows, which does not affect the result in (1), may be desirable if some rows of coefficients are much smaller than the others, and can be achieved by diagonal matrix pre-multiplication.

Change of origins of u, v, w to p, q, r respectively amounts to subtracting

A  p
   q
   r

from the right-hand side

b1
b2
b3
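
These preliminary operations can be sketched together as follows (the routine name and argument order are invented for illustration): the origin is shifted first, then the rows and columns are scaled by the given diagonal factors.

# Preliminary processing from 3.2: change of origin, then row and column scaling.

def preprocess(a, b, origin, row_scale, col_scale):
    n = len(a)
    # Change of origin: subtract A.(p, q, r) from the right-hand side b.
    b = [b[i] - sum(a[i][j] * origin[j] for j in range(n)) for i in range(n)]
    # Row scaling (pre-multiplication by a diagonal matrix) scales b as well
    # and does not affect the solution.
    a = [[row_scale[i] * a[i][j] for j in range(n)] for i in range(n)]
    b = [row_scale[i] * b[i] for i in range(n)]
    # Column scaling (post-multiplication) rescales the shifted unknowns.
    a = [[a[i][j] * col_scale[j] for j in range(n)] for i in range(n)]
    return a, b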

3.3 As a result of the relatively few kinds of operations involved in the manipulation of matrices the subject lends itself particularly well to the use of interpretive programmes. One can do most of the necessary calculations with a small repertoire of sub-programmes for the matrix operations described above.

If the dimensions (numbers of rows and columns) of the matrices are stored with them, each sub-programme called into use by the master programme when interpreting a codeword needs only to be given the addresses of the matrices to be operated on and the address to which to send the resultant matrix. Thus a three address code is convenient, and we can arrange for the addresses to refer to track numbers on an auxiliary magnetic drum store, as with DEUCE. A codeword (a, b, c, r) can mean carry out operation number r using matrix addresses a, b, c.
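
A toy sketch of such an interpretive scheme: a dictionary stands in for the drum store, the operation numbers are invented for illustration, and each codeword (a, b, c, r) is dispatched by the master loop to the appropriate sub-programme.

# Toy interpreter for three-address codewords (a, b, c, r): operand
# addresses a and b, result address c, operation number r.

def mat_add(x, y):
    return [[x[i][j] + y[i][j] for j in range(len(x[0]))] for i in range(len(x))]

def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

OPERATIONS = {1: mat_add, 2: mat_mul}     # hypothetical operation numbers

def interpret(programme, store):
    """Execute a list of codewords against the matrix store."""
    for a, b, c, r in programme:
        store[c] = OPERATIONS[r](store[a], store[b])

store = {10: [[1.0, 2.0], [3.0, 4.0]],
         11: [[0.0, 1.0], [1.0, 0.0]]}
interpret([(10, 11, 12, 2),               # track 12 := track 10 x track 11
           (12, 10, 13, 1)], store)       # track 13 := track 12 + track 10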

The distributive checks can be incorporated in the component programmes, but are only possible with precision where fixed point working is used. However, the device of block-floating arithmetic can be used, in which a common exponent is associated with each matrix, the exponents of two matrices being taken into account in deciding the exponent of their sum or product. This preserves the automatic scaling advantages of ordinary floating-point arithmetic. Where multiplication is involved the products are accumulated to double-length (two word) accuracy before shifting and rounding off to single length. This means that each component programme (when multiplication is used) has to carry out the operation twice - once to find how big the largest element of the resultant matrix is and hence the shift after multiplication (and the exponent to be associated with the result), and then again to form and store the single-length matrix.
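
The two-pass idea might be sketched as follows, as a simulation rather than a faithful fixed-point implementation: Python floats stand in for the double-length accumulators, the word length is an assumed parameter, the largest element of the product fixes the common exponent and shift, and the matrix is then rounded to single length.

# Sketch of block-floating matrix multiplication: each matrix carries one
# common exponent; the largest element of the product fixes the block
# exponent, after which the result is rounded to single length.

import math

WORD_BITS = 30                            # assumed single-length fraction size

def block_float_multiply(a, ea, b, eb):
    """Multiply (a, 2**ea) by (b, 2**eb); return (product, exponent)."""
    n, m, p = len(a), len(b), len(b[0])
    raw = [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
           for i in range(n)]             # full-precision accumulation
    largest = max(abs(x) for row in raw for x in row) or 1.0
    shift = max(0, int(math.log2(largest)) + 1)   # keep fractions below 1
    scale = 2.0 ** shift
    # Round each element to single length after shifting.
    result = [[round(x / scale * 2 ** WORD_BITS) / 2 ** WORD_BITS for x in row]
              for row in raw]
    return result, ea + eb + shift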

Because each component operation takes some time, the time of interpretation of each codeword is not so important as with other interpretive schemes. It is at present of the order of one second for DEUCE.
