Chilton::ACL::High Speed Automatic Calculating-Machines

8 Coding on Automatic Digital Computing Machines: J H Wilkinson

Although many different types of code for Automatic Computing Machines have been proposed, those which have been adopted fall into two main groups, the one address codes and the three address codes. In this note a brief comparison is given between a typical one address code and the variant of the three address code which is used on the A.C.E. at the National Physical Laboratory, Teddington. One address codes have the following characteristics in common. Their instructions assume the use of a central arithmetic unit (usually called the Accumulator) and refer to one storage position and one operation. Typical instructions are the following:

Take number in memory position S and add it to the number in Accumulator.
Take number in memory position S and subtract it from the number in Accumulator.
Take number in Accumulator and send it to memory position S.

In addition to instructions of this type they have an instruction by means of which a discrimination may be performed. This is usually of the type:

If number in Accumulator is negative obey one instruction, but
If number in Accumulator is positive obey a different instruction.

The basic instructions in the three address codes refer to three storage positions, and consist of instructions of the type:

Take the number in storage position A, add it to the number in storage position B and send the result to storage position C.

This is usually denoted by

      A + B → C

Although there may be an accumulator, instructions do not, in general, implicitly imply its use. A crude comparison of the two types of code might lead one to conclude that the three address code required one instruction only, for every three used by a one address code, since in order to carry out an instruction of the type A + B → C with a one address code, the three instructions:

      A → Accumulator (cleared)
      B → Accumulator ( not cleared)
    Acc → C

are required. This is, however, a false comparison and the example coded below has been specially chosen to show how completely it may break down. The problem which has been chosen is the multiplication of two forty digit binary numbers regarded as integers with signs, (according to the usual module convention) by means of a table of instructions. For convenience a specific one address code has been used, namely that adopted for the machine at Manchester University. For the sake of brevity only these sections of this code and of the A.C.E. code which are relevant to this particular problem will be described.

The Manchester Code. The instructions are based on the use of an Accumulator which can store numbers two words long. In what follows, A is used for the accumulator, regarded as a storage location, while a_l and a_m are used to denote the numbers which are represented by the least significant and the most significant forty digits respectively of the number in A. S is used to denote a general storage position and s the content of that storage position.

Instructions used are:

      S' → A

This sends the number in storage position S, extended by forty copies of its most significant digit, to A.

      2S' → A

This sends twice the extended version of the number in S to A.

      s' + a → A

This adds the extended version of the number in S to the number in A.

      a - s' → A

This subtracts the extended version of the number in S from the number in A.

      a + s → A

This adds the unextended version of the number in S to the number in A.

      a_m + s → A

This adds the unextended version of the number in S to the most significant half of A.

      a_l → S

This sends the least significant forty digits of A to memory position S.

      a_m → A

This sends the most significant forty digits of A to S.

Instructions are normally placed in consecutive positions in the memory, but there are two exceptions to this. The first is the instruction called TEST. When this instruction is reached, the instruction to be obeyed next is the one in the following position, if the number in the accumulator is positive, but is the one in the next but one position if the number in the accumulator is negative. The second of these is the instruction called UNCOND. When this instruction is reached the next instruction to be obeyed occupies the memory position specified by the number in the memory position mentioned in the instruction UNCOND. E.G. the instruction 17 UHCOND would mean, look at the number in memory position 17; suppose it is x; then the next instruction to be obeyed is that in memory position x. The use of the instruction UNCOND therefore requires the use of a further position for the numbers x. Each instruction occupies half a word of storage.

The A.C.E. Code. The variant of the three address code adopted on the A.C.E. was designed partly to overcome the fundamental weakness of the mercury delay line type of storage, namely the long access time. The greater part of the storage of the A.C.E. will consist of long mercury delay lines each of which will store thirty-two forty digit words. Since the pulse repetition frequency is 1 megacycle the access time for this part of the storage is 1280µs, usually called a major cycle. There are in addition a small number of short delay lines each of which stores one word of forty digits; the access time for this part of the storage is 40µs, usually called a minor cycle. The main difference between the A.C.E. and other machines using delay line type of storage is that instructions are not stored in consecutive positions in the long delay lines, but in such relative positions that the next instruction to be obeyed is emerging from the store during the final minor cycle occupied in carrying out the current instruction. The effect of this, as will be seen below, is that instructions may be carried out at a speed of up to one every two minor cycles instead of one per major cycle, which is the maximum speed obtainable with a more conventional code. The inclusion of a small number of short delay lines makes is possible to reap the benefit of the extra available speed. It is found that in most tables of instructions the majority of the operations are concerned with a small group of numbers, and it is these numbers which are stored in the short delay lines during the operation of the table. A unique feature of the A.C.E. coding is that, associated with each instruction, is a timing number t; all instructions belong to one of two types. The first type are known as immediate instructions; for this type the instruction is carried out continuously for t consecutive minor cycles after the instruction has been set up. The second type of instructions are known as deferred instructions; for these there is a waiting period of t minor cycles after the instruction is set up and then the required operation takes place for one minor cycle.

Examples of instructions are:

      A + B → C i 2 (i is for immediate)

If this instruction occupied position m in a long storage tank it would be set up in minor cycle m + 1 and then carried out for two consecutive minor cycles m + 2, m + 3. If the storage positions A, B and C referred to long delay lines, then the effect would be to add the (m + 2)th word of D.L.A. to the (m + 2)th word of D.L.B. and send the answer to the (m+2)th word of D.L.V. in minor cycle (m + 2), and then to add the (m + 3)th word of D.L.A. to the (m + 3)th word of D.L.B. and send the answer to the (m + 3)th position of D.L.C. in minor cycle (m + 3). Since this instruction is completed in minor cycle (m + 3) it means that the next instruction would have to occupy position (m + 3). The instruction

      A + B → d    3 (d is for deferred)

however, would mean the following. If the instruction were in position m, then, as before, the instruction would be set up in minor cycle (m + 1), there would then be a wait for 3 minor cycles and finally in minor cycle (m + 5) the (m + 5)th the word of delay line A would be added to the (m + 5)th word of delay line B and the result sent to the (m + 5)th position of delay line C. Since the instruction is completed in minor cycle (m + 5) the next instruction would have to occupy position (m + 5). Although the timing consideration is of the utmost importance in this type of coding it does not call for much thought in the actual process of coding. Coding is done first, almost without reference to timing, and the addition of this element is performed purely automatically in the detailed numerical coding. The example below should make this clearer. The A.C.E. has a double length accumulator although, of course, it is used in comparatively few instructions compared with the use in one address codes. One other point must be mentioned. In addition to the memory positions, instructions may call on a number of special numbers which are always available; these are the numbers zero, one, 2³⁹ and 20⁴⁰-1; these are usually referred to these as zeros, P1, P40 and ones respectively. Their use is so frequent that it is convenient to have them readily available. Their use is also very common in all other computers, but there these numbers are usually stored specially for each specific problem in which they are required. Discrimination is performed in the following manner. Any instruction may call on the control to send the result of its operation to a special location called DISC (abbreviation for discrimination) by an instruction of the type:

      A + B → DISC     (any type of timing)

The effect of this is, that if the result of the operation A + B is zero then the next instruction is the one it would have been in the usual way, but if the result is not zero then the next instruction is one position further on than it would otherwise have been. The functions which are used in the problem chosen are:

      A + B → C     
      A - B → C     
      A & B → C     (i.e. the logical operation and)

The accumulator may be used in two ways; either the number sent there may replace the number which was there before. This is written

      A + B → Acc     (Acc stands for Accumulator cleared)

Alternatively the number sent there may be added to that already there. This is written

      A + B → Ach     (Acc stands for Accumulator held)

These two ways of using the accumulator are of course analogous to these in one address code.

It should be pointed out that much of the notation used in this note is of a much more extended character than would be used by one who is familiar with the code, when coding for his own convenience.

Coding of problem on the Manchester machine. The multiplication will be performed in the following manner. The multiplicand will always be used in its extended form and therefore its sign will be taken care of automatically. For the multiplier, however, all digits except the first will be treated normally but the most significant digit will, if it is a one, be regarded as though it were a bar one. This clearly takes care of the sign of the multiplier. The general plan will be to build up the product stage by stage using the successive digits of the multiplier, beginning with the most significant. This may be described in a form more in keeping with machine instructions as follows.

R is used to denote the multiplier, D the multiplicand and D' the extended form of the multiplicand.

          Examine most significant digit of R.

     (Digit is 1)                     (Digit is 0)
    Send -D' to A                     Send   0 to A      This deals with the most significant digit of R 
(X)	                  Double A         

                      Double R      (i.e shift R one place to left) 
          Examine most significant digit of R. 
          
     (Digit is 1)                     (Digit is 0) 
     Add D' to A                       Add   0 to A
                 Go back to position (X)

It is clear that the second section of the table will have to be repeated thirty nine times to deal with all the digits of R and therefore we must include some counting process in this section. The process used is to store the number -2 in a given position when this section is started and to double this number for each performance of the section. The table has been devised so that this gives exactly 39 repetitions by appropriate use of TEST. Even with the inclusion of the counting process, however, the coding is far from immediate. Further thought is necessary because of the fact that almost all operations require the use of the Accumulator, including in particular the operation TEST. This means that the accumulating product cannot remain in the accumulator but has to be put out into another part of the store during each performance of the second section. Moreover we must use only those operations which the machine has available. The coding is given below.

The following storage positions are allocated for storing quantities occurring during the performance of the table.

1      R (This is used to store the multiplier in the first place but at subsequent stages 2R, 4R, etc.)
2      D
3     -Pn (it is storing -Pn at the beginning of the (n - l)th 
            performance of the second section).
4     -P1
5      Least significant portion of partial product.
6      Most  significant portion of partial product.
7      Number specifying memory position of first instruction of second section.

INSTRUCTIONS

 1    2x - 4' → A     Puts initial value of -Pn in 3
 2    a_l → 3
 3    R' → A          Instructions 3 to 7 deal with the most significant digit of R.
                      Slight complication arises because R' is in the accumulator
                      immediately after TEST
 4    TEST
 5    (+) A + D'      The notation (+) after an instruction means that it is
 6    (-+) A - D'     obeyed if TEST has shown positive result. (-+) means that
 7    (-+) A - R'     this instruction will be obeyed both for negative and
                      positive results from TEST.
 8    a_l → 5          Stores partial product temporarily in 5 and 6
 9    a_m → 6
10    2 x 3' → A      This does counting process and tests for completion of product
11    TEST
12    (+) FINIS
13    (-) al → 3      Replaces new -Pn in 3. Note -Pn was used instead of +Pn
                      so that FINIS would come immediately after TEST. This
                      saves an instruction.
14    2R' → A         Prepare to examine next digit of R and replaces new R
                      for next time
15    a_l → R
16    R' → A
17    TEST
18    (+) A - D'      Gives contribution  to product from a digit of R. There is
                      the same complication as in the preliminary section.
19    (-+) A + D'
20    (-+) A - R'
21    A + 5           Adds in double the present value of the partial product to
                      the latest contribution.
22    A + 5
23    A_m + 6
24    A_m + 6
25    UNCOND          Return to instruction number 8. This last instruction
                      requires that the number 8 be stored in a memory position

A close inspection of this table will show that its derivation from the first strategic plan is far from immediate, though it is true to say that if one had been content to code it in a more carefree manner i.e. without much consideration to the number of instructions used, it would have been possible to derive the table much more quickly. The total storage space used is 25 × ½ words for the instruction themselves and 7 words for storage of numbers. However, of these the number -P1 might well be used so often in other tables that it would have a special storage position. The total storage would therefore amount to 12½ + 6 i.e. 18½ words. The Manchester machine takes about 1 millisec. for the performance of each instruction and therefore the total time taken would be 7 + 39 × 18 millisecs. - 709 millisecs. approximately. It would therefore scarcely be a practical proposition to code multiplication. If the code adopted on the A.C.E. had been the conventional one address code with instructions stored consecutively the time taken would have been (7 + 39 × 18) major cycles i.e. (709) × 1.28 millisecs. i.e. 900 milliseconds approximately.

Coding for A.C.E. Since the A.C.E.. has no facility for producing, in one instruction, the extended form of a number it is simpler to perform the multiplication another way. The easiest method is to multiply the two numbers together as forty digit numbers without reference to sign and then make the following corrections. If either of the multiplicand or multiplier is negative then add minus the other into the most significant half of the product. The correction should be done first since, as before, we propose to pick out the successive digits of the multiplier by repeated multiplication by two, and therefore we would have lost its most significant digit if we tried to make the corrections last. The general plan may be described as follows:

                    Clear Acc
          Examine most significant digit of R.

     (Digit is 1)                     (Digit is 0)
    Send -D to Acc                   Add   0 to Acc     
    
          Examine most significant digit of D. 
          
     (Digit is 1)                     (Digit is 0) 
     Add -R to Acc                     Add   0 to Acc
(X)                Double Acc
          Examine most significant digit of R.
     (Digit is 1)                     (Digit is 0) 
     Add D to Acc                      Add   0 to Acc
           Double R and return to (X)

As before the table consists of a preliminary part to do the corrections and then a second part which must in this case be repeated 40 times. It is therefore necessary to do a counting process and here it is most convenient to count by starting with a P1 and doubling each time until the P1 becomes a P40. The coding is almost immediate since the three address code can carry out all the operations required quite simply. The numbers which are being used will all be stored in short delay lines and storage for them may be allocated as follows:

1.  R     the multiplier itself in the first place and later its various modified forms.
2.  D     the multiplicand
3.  Pn    storing Pn at the beginning of the nth repetition of the cycle

There is no need to allot extra storage for the partial product because these intermediate values can remain in the accumulator throughout the operation. The coding is given in the first place without reference to time, in order to demonstrate how easily the element is added subsequently. In order to make the instructions simpler to understand they are written in considerable detail.

timing for last instruction to be such that next instruction is X i.e. the first of the repeated cycle.

Only one or two points need to be noticed before the detailed coding is written down. First, in the initial part after discriminating for sign of R we have two alternatives. If R is negative we obey a certain instruction but if R is positive we have in fact nothing to do. If we wish to save time we could use a dummy instruction to take us immediately to that instruction which will follow the negative case. Since the first section of the table is not repeated it is hardly worth saving time here and we merely arrange that in the negative case the next instruction after subtracting D from the accumulator is the same instruction as in the positive case. This wastes 1 major cycle i.e. 1 millisec. Similar remarks apply when discriminating on sign of D in initial part. In the repetitive cycle we are faced with the same problem when discriminating on the sign of R. Here the decision is to use a dummy instruction because the alternative is to wait for 1 major cycle and this wait is repeated 40 times. Nearly all the instructions are of 1 minor cycle duration except those concerned with clearing Accumulator and doubling its contents and these must be 2 minor cycles long.

Detailed Coding.

The numbers refer to the positions occupied in the long delay lines. They have been written in the order in which the coder will derive them.

 1     0's + 0's  →  Acc   i   2     Puts initial value of -Pn in 3
 4     R & P40    → DISC   i   1     go to 6 if R positive, to 7 if R negative
 7 (-) 0's - D    →  Ach   d  29     timing chosen to reach same position as
                                     positive case to avoid dummy order
 6(+-) D & P40    → DISC   i   1
 9 (-) 0's - R    →  Ach   d  29
 8(+-) P1 + zeros →    3   i   1
10     Acc + zero →  Ach   i   2
13     R & P40    → DISC   i   1
15 (+) 0's + 0's  →  Ach   i   2     timing chosen to arrive at same position as in negative 
                                     case. Instruction adds zero to Acc and is therefore a dummy.
                                     It is the equivalent of UNCOND in Manchester code
16 (-) 0's + D    →  Ach   i   1

18(+-) 3 & P40    → DISC   i   1     go to 20 if Pn is not P40.
                                     Go to 21 if Pn is P40. 21 is FINIS
20 (+) 3 + 3      →    3   i   1
22 (+) R + R      →    R   d  18     timing chosen so that next instruction is No. 10
                                     ie beginning of cycle

It will be seen from the above that the table takes about 2 major cycles for the corrections and one major cycle for each performance of the repeated cycle i.e. 42 major cycles or about 50 millisecs. This compares with the 900 millisecs. which would have been taken on the same machine using the same pulse repetition frequency if the coding had been of the conventional 1 address code type with instructions in consecutive positions. Suppose, however, we wished to dispense with an automatic multiplier and were therefore desirous of speeding up coded multiplication even further. If we look at the repetitive cycle in the above coding we find that it only takes a small part of the major cycle; the greater part of the time is wasted in instruction 22 with the express purpose of getting back to instruction number 10. We could, however, make the instruction 22 have the usually timing i 1 which would make the next instruction number 24. Suppose now we put a duplicate copy of instructions 10 to 22 in positions 24 to 36 (i.e. 4 because we would have to come round to the beginning of the delay line again) we would then perform two steps of the multiplication in each major cycle. We observe now that quite a number of the instructions in the repetitive cycle are concerned with counting to see if the process is complete. If the instructions are duplicated there is no need to do the counting in each set because, since the number of digits in the multiplier is determined as forty, we shall know in which copy the process will stop. If we take advantage of this we can get three copies of the repetitive cycle in one major cycle and we can see that discrimination need only be done in the first set because 40 = 3 x 13 + 1. All that is necessary is to start with P27 and advance it one stage in the first set only, there being no counting in the other duplicate sets. This would reduce the time for multiplication to

2 + 40/3 major cycles i.e. 15 major cycles

Although this is a remarkably fast time for a programmed multiplication on a delay line machine we have not regarded it as sufficiently fast to consider the omission of an automatic multiplier. It is interesting to note that even the last mentioned programme for multiplication (i.e. with three copies of the repeated cycle) only needs 24 instructions and therefore only ties up 2/3 of one long delay line. A similar table may be devised for signed division using the ordinary long division method and again 3 copies of the repetitive type will fit into a major cycle giving, as for multiplication, a division time of 15 major cycles. The normal square rooting method has a slightly longer repetitive cycle which can only be repeated twice per major cycle. The best time for square roots is therefore 20 major cycles.

A comparison of the storage required for the table in the A.C.E. code compared with that in the Manchester codes shows that the A.C.E. code uses 17 memory positions (including the short delay lines 1, 2 and 3). While the Manchester machine uses 18½. If we ignore the space occupied in storing numbers the figures are 14 and 12½ respectively. The 3 address code has therefore fallen well below the rather crude efficiency factor of 1.5 mentioned earlier. It is easy to see why this has happened so conspicuously in this example. In the first place we have the rather trivial reason that the most suitable methods on the two machines were different, and the one on the A.C.E. coding had two preliminary corrections instead of one (this gave two extra instructions); the second reason is that hardly any of the instructions in the A.C.E. coding are really making full use of the 3 addresses. Simple problems are rather exceptional in this respect, but from coding a wider range of problems it seems likely that this efficiency factor is little better than 1.2.