
Comments on the Atlas Order Code

A R Curtis

27/7/1960

INTRODUCTION

The comments which follow have been compiled fairly hastily, which, it is hoped, may excuse the somewhat dogmatic style in places. However, this is not intended as a watering-down of the opinions expressed.

They reveal a rather different attitude to the organisation of computing on machines of this class than that implicit in the Ferranti approach. It is claimed:

that this attitude is equally valid,
that so far as possible the customer should be given the choice, and
that this attitude would be more suitable for A.E.A. applications.

There seems to be fairly wide-spread agreement within the A.E.A. about this. Undoubtedly this attitude is conditioned by familiarity with IBM equipment, but that does not detract from its validity; IBM equipment is often extremely convenient to use.

If the A.E.A. were to purchase an ATLAS with an inadequate core store, the Ferranti concept of time-sharing would have to be looked into. However, there are indications that even with time-sharing the efficiency measured as (time spent doing arithmetic) / (total time occupied with programs) would be low for the type of work envisaged. It is hoped that a later paper might throw some light on this.

This paper is intended to serve as a basis for discussion. It is regretted that it has been thought necessary to go into so much detail, but the author believes, and experience with Mercury has shown, that this is essential if misunderstanding is to be avoided and a reasonable compromise reached between what users would like and what the manufacturers can supply.

The version of the order code referred to is that dated 4.7.60.

A ACCUMULATOR FUNCTIONS

1. It is stated that some Extracode routines may address individual characters of a word. No basic instructions for doing so have been given. The programmer may wish to use them; and in any case we should like to know the complete basic code.

2. If al had its own exponent:

double precision operations would be easier to program (in extracode) (Y_al would be set to (Y_am - 13) at end of floating instrs.);
division giving quotient and remainder would be basic (I know there is one now, but the remainder does not have an exponent);
the very useful operation (basic)
```
am' = al
al' = am
```
could be provided;
the basic operation (5) s' = am; am' = l could become s' = am; am' = al (there should be an unstandardised as well as a standardised form);
when storing the long accumulator
```
s:' = a
```
would give
```
s' = am; (s + 1)' = al
```
so that (s + 1) was a proper floating number with the correct exponent Y_al = Y_am - 13. This form would be assumed by double precision operations;
am, al could be used as separate stores for two of the arguments of a subroutine.

3. The operation (24) should be basic. There should be am' = - am, also a' = - a (both basic).

4. Division. Operation (21) is fine for what it is intended to do. Operation (22) is not very carefully defined. We feel that the microprogram for this should be (omitting complications due to signs; the following assumes a > 0, s > 0):

[Figure: suggested microprogram for operation (22); image not reproduced.]

The complications to take account of signs are, of course, considerable; this is not intended to be a complete specification, but the above principle should be adhered to.

Operations (103) and (104) are not very precisely defined, either. Whether (104) is what is wanted depends on the exact definition. (103) should be (with the same provisions as above):

[Figure: suggested microprogram for operation (103); image not reproduced.]

The interruption on divide check, in both cases, should be (under program control) either as for Eo or a jump to a standard location. The divide check interruptions will presumably also be necessary for operations (21) and (103).

Operation (104) should be basic.

Floating arithmetic needs explaining in at least this much detail.

5. Why should the following operations be Eo?

24, 27, 44-46

It would be better to leave the wrong answer (with Ao set, of course).

6. Will operations 38-43 really apply to Ya, Ys as well as m, xs? This is what is wanted, of course. It would be nice if they could be basic - 38-40 certainly should be.

7. Fixed point multiply and divide. What the programmer really wants is to be able to multiply m.8^Ya by xs.8^Ys and then force the result to have a given exponent (say ba), shifting as necessary:

xa' = m . xs . 8 ^{Ya + Ys - ba} ; Ya' = ba

unstandardised and unrounded. Also to be able to divide xa . 8^Ya by xs . 8^Ys and then force the result to have exponent ba shifting as necessary:

m' = (xa / xs) . 8^{Ya - Ys - ba}; Ya' = ba

rounded but unstandardised. These would both be Eo, Ao.
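The scaling in the two formulae of comment 7 can be checked with a minimal sketch in Python using exact rationals. The function names are hypothetical, register lengths are ignored, and the rounding of the quotient is not modelled; the sketch only confirms that forcing the exponent to ba leaves the represented value unchanged.

```python
from fractions import Fraction


def value(mantissa, exponent):
    """Number represented by a mantissa and an octal exponent."""
    return Fraction(mantissa) * Fraction(8) ** exponent


def mul_forced_exponent(m, ya, xs, ys, ba):
    """xa' = m . xs . 8^(Ya + Ys - ba); Ya' = ba (exact, no truncation)."""
    xa = Fraction(m) * Fraction(xs) * Fraction(8) ** (ya + ys - ba)
    return xa, ba


def div_forced_exponent(xa, ya, xs, ys, ba):
    """m' = (xa / xs) . 8^(Ya - Ys - ba); Ya' = ba (rounding omitted)."""
    m = Fraction(xa) / Fraction(xs) * Fraction(8) ** (ya - ys - ba)
    return m, ba
```

In both cases the result mantissa absorbs the shift, so that the programmer can choose where the octal point lies in the answer.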

8. There should be a round instruction: add one to the m.s. bit of l (unstandardised, Ao) and possibly the same thing standardised, Eo. The first should be basic.

9. Operations 50-51, which will mainly be used for adding integers to A, should be available unrounded. The purpose of 52-53 is obscure, but perhaps the same comment applies to them. Perhaps all of them would be more useful as e.g. a' = a + s(13) (q).

10. Should (63) be (qr)?

11. Operations 71, 73, 75 would not in general be as useful as

a' = a + n (q) etc.

Note: Easy if Ya < 13; operation not wanted if Ya > 13.

12. How are accumulator shifts accomplished? With given basic codes they are next to impossible (in the sense that almost any operation is next to impossible on a Turing machine). Since the accumulator is used for floating arithmetic, presumably shifts are something fairly basic. Why cannot there be basic operations, unrounded unsigned left and right shifts, with variants extracoded? If these would be only by multiples of 3 places, they would still be useful.

It is difficult to escape the conclusion that there are in fact far more basic operations than we have been told about, and that some of the extracode operations are microprogrammed in terms of them, rather than programmed in terms of the stated basic operations. If so, we should like to know the full basic set and have our choice of them available.

Also, we should like to have programs and estimated timings - quite rough - for extracode operations. The important point is this: no single operation (except perhaps some floating arithmetic operation) is so frequent that it greatly matters whether it is fast or slow (compared with the store cycle time). But on this computer so many operations are extracoded that, if extracoded operations are slow, the computer might be seriously slowed down when doing anything other than rounded single precision floating arithmetic. So far as shifts are concerned, for instance, it seems quite possible that they are programmed one position per cycle through a loop, in which case they must be slow; then one would normally avoid using shifts (at great inconvenience) as far as possible. But if a similar conclusion applies to all extracoded operations, the power of the computer is considerably reduced.

It should be emphasised that one frequently wants to do other than floating arithmetic operations. Some work, such as Monte Carlo, involves very little floating arithmetic but plenty of manipulation of bit patterns: for example, take the logarithm of the number represented by 10 bits from somewhere in the middle of a word, float it, truncate the number part to 10 bits plus sign and the exponent to 5 bits, and store three such 16-bit groups in one 48-bit word. Later unpack them and recover the floating form for use in arithmetic. We have no information to enable us to estimate how long this would take on ATLAS, or even whether it is practicable. It is not sufficient to say that this could be provided as an extracode operation if required - one does not know exactly what is required, until the stage of detailed coding of a problem is reached, and then one has to use the order code of the machine as given. What the programmer needs in the way of logical operations is flexibility.
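To make the packing step of this example concrete, here is a minimal sketch in Python of storing three 16-bit groups in one 48-bit word and recovering them. The function names and the field order within a group (sign in the top bit, then the 10-bit number part, then a 5-bit exponent held as an unsigned pattern) are assumptions made for illustration; the logarithm and floating steps are left out.

```python
def pack_group(sign, number, exponent):
    """Build one 16-bit group: 1 sign bit, 10-bit number part, 5-bit exponent."""
    if sign not in (0, 1) or not 0 <= number < 1 << 10 or not 0 <= exponent < 1 << 5:
        raise ValueError("field out of range")
    return (sign << 15) | (number << 5) | exponent


def unpack_group(group):
    """Recover (sign, number, exponent) from a 16-bit group."""
    return (group >> 15) & 1, (group >> 5) & 0x3FF, group & 0x1F


def pack_word(groups):
    """Pack three 16-bit groups into one 48-bit word, first group uppermost."""
    g0, g1, g2 = groups
    return (g0 << 32) | (g1 << 16) | g2


def unpack_word(word):
    """Split a 48-bit word back into its three 16-bit groups."""
    return [(word >> 32) & 0xFFFF, (word >> 16) & 0xFFFF, word & 0xFFFF]
```

Each step here is a shift followed by a mask or a logical sum, which is why the speed of shifts and logical operations matters so much for work of this kind.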

To comment on the actual shift operations suggested: The purpose of 87,89 is obscure; they produce a number whose value depends on n, the operand, and on m . 8^-Ya. The latter seems to have no particular meaning. It is suggested that these two operations should be

m' = m . 8^{(n + Ya)}, Ya' = -n;

and further that this and the other operation in this group

(88, 90) m' = m . 8^(Ya-n), Ya' = n

should be available

unstandardised, unrounded, on xa instead of m;
standardised, unrounded, on xa instead of m;
unstandardised, rounded;
standardised, rounded.

It is thought that the unstandardised forms would be the more useful.

13. Operation (98) - presumably a minus sign has been omitted.

14. The fixed point operation of adding to and subtracting from l, with carry into m, and of setting in l with sign propagation up through m, should be available, and if not actually basic should not take unduly long to execute.

15. The extracode operation

a' = z->' = am . s + z-> (q)

is of extreme importance and should be provided (it would be nice also to have

a' = z->' = s(ba + bm + n) . s(bm + n) + z-> (q),

where it is hoped the notation is self-explanatory, as this is what is really wanted). If this and no other double precision operation is available, linear algebra can get by very happily most of the time.

B ACCUMULATOR TESTING FUNCTIONS

16. It is not known what is meant by e in operations 8-13. If it is a misprint for l, it is felt that m needs the most intensive treatment.

17. There should be instructions for comparing am with a store register. It is suggested that the most useful basic form is

ba' = ba, am > s
ba' = ba + 1, am = s
ba' = ba + 2, am < s

followed by normal augmentation of c. The comparison to be fixed point basically (i.e. on m - xs), with an extracode operation which first tests Ya - Ys and only tests the fractions if Ya = Ys; thus the basic form would apply to fixed numbers and the extracode to standardised floating numbers. No attempt need be made to cope with the possibility of unstandardised floating numbers. Of course, if both were basic, there would be no objection.

It would be convenient also to have a similar operation but with the skips the other way round, i.e. leave ba unchanged for am < s and add two to it for am > s; but it is not really necessary to have both.
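The three-way comparison of comment 17 can be sketched in Python as a function returning the amount added to ba. The names are hypothetical. The floating form follows the suggestion of testing Ya - Ys first and the fractions only when the exponents are equal; the use of the sign of the larger-exponent operand to settle the unequal-exponent case, and the exclusion of zero, are assumptions of this sketch rather than anything stated in the order code.

```python
from fractions import Fraction


def skip_fixed(m, xs):
    """Basic form on the fractions: 0 if am > s, 1 if am = s, 2 if am < s."""
    if m > xs:
        return 0
    if m == xs:
        return 1
    return 2


def skip_floating(m, ya, xs, ys):
    """Extracode form for non-zero standardised numbers m.8^Ya and xs.8^Ys."""
    if ya > ys:
        # am has the larger magnitude, so its sign decides the comparison.
        return 0 if m > 0 else 2
    if ya < ys:
        # s has the larger magnitude, so its sign decides the comparison.
        return 2 if xs > 0 else 0
    return skip_fixed(m, xs)
```

The variant with the skips the other way round is simply 2 minus the value returned here.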

18. Provision should be made for program control of Eo (and divide check) action. There should be "disable" instructions and "enable" instructions:

Disable Eo - jump to (bm + n) storing c in ba if Eo occurs.
Enable Eo - interrupt as described if Eo occurs.

Similarly for divide check, which should be provided. These would, of course, be extracoded. It is not considered that they should necessarily be compatible with time-shared operation, but no fixed store program should upset the record of the jump addresses and ba's.

C B FUNCTIONS

19. It is not clear why operations 0105 and 0125 should give circular shifts, but surely ordinary shifts should also be available? It seems it should only be a matter of whether a gate is open or closed. The non-circular shift seems simpler as it is a pure copying operation with no arithmetic. Similar remarks apply to 0143 and 0163.

20. The group 0164, 0165, 0166 seems strangely incomplete. If it were being completed by Extracode this would be understandable, but it does not seem to be.

21. Shifting. Shifts are frequently wanted, and should not take unduly long to execute. The vast number of basic operations ATLAS takes to simulate them is an indication that the wrong ones have been taken as basic. (Mercury is seriously deficient in this respect, also). The ones needed by the programmer are mainly the non-circular shifts, both arithmetic and logical, but as the circular ones lose no information and as it is not too difficult to simulate the others from them, it might be acceptable to have circular shifts as basic. But they should be by n places, not by one; and, if F1 means "shift ba up n places circularly" and F2 down, then the extracode operations for non-circular logical shifts should be no slower than the combinations of basic operations

F1   ba   0   n
0127 ba   0   -2ⁿ
and
F2   ba   0   n
0127 ba   0   2²⁴⁻ⁿ - 1

Otherwise, the effect of these combinations should be achieved by basic operations. It seems likely to be a close decision whether there should not be a set of 6 basic shift operations, up or down with circular, logical or arithmetic.
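The two combinations above can be checked with a short sketch in Python, on the assumptions that the B registers hold 24 bits and that 0127 forms the logical product (AND) of ba with its operand. The function names are hypothetical, and n is taken to lie between 0 and 23.

```python
BITS = 24
MASK = (1 << BITS) - 1


def rotate_up(ba, n):
    """F1: shift ba up n places circularly."""
    n %= BITS
    return ((ba << n) | (ba >> (BITS - n))) & MASK


def rotate_down(ba, n):
    """F2: shift ba down n places circularly."""
    n %= BITS
    return ((ba >> n) | (ba << (BITS - n))) & MASK


def logical_up(ba, n):
    """F1 followed by AND with -2^n, which clears the n bits carried round."""
    return rotate_up(ba, n) & (-(1 << n) & MASK)


def logical_down(ba, n):
    """F2 followed by AND with 2^(24-n) - 1, which clears the top n bits."""
    return rotate_down(ba, n) & ((1 << (BITS - n)) - 1)
```

In each case the mask removes exactly the bits that the circular shift carried round the end of the register, leaving the non-circular logical shift.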

It is not necessary that, e.g., an instruction to left-shift -3 places should result in a right-shift of 3; it is equally satisfactory that it should result in a left-shift of (2²⁴ - 3), giving zero answer for a non-circular shift and (in this example) 13 shifts for a circular shift (2²⁴ - 3 = 13, mod 24); or even that some other convention hold, so long as it is sensible and definite. For example it might be that the l.s. 7 bits of n would be interpreted as a positive integer s (n = s, mod 128) and that the shift would be by s places. Then -3 would be interpreted as 125 = 5 mod 24, so that a non-circular shift would give zero and a circular shift would give a net shift of 5 places. Behaviour when n > 23 must be specified precisely.
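The arithmetic behind the two conventions just described can be reproduced by a small sketch in Python. The function names are hypothetical; each returns the net number of places for a circular shift in a 24-bit register.

```python
def count_as_unsigned_24(n, bits=24):
    """Read n as an unsigned 24-bit integer; a circular shift is by that value mod 24."""
    return (n % (1 << bits)) % bits


def count_low_seven_bits(n, bits=24):
    """Read only the l.s. 7 bits of n as a positive integer s; return s and s mod 24."""
    s = n % 128
    return s, s % bits
```

Under the first convention a count of -3 gives a net circular shift of 13 places, and under the second it gives s = 125 and a net circular shift of 5 places, in agreement with the figures quoted above.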

There would seem to be no reason why a shift by a number of places comparable with the length of a register should take greatly longer than an addition in the same length register.

0125   122   0   0       Shift Ba 6 bits up
0122   119   0   6       decrease n by 6
0214   126 119   5*      Exit if n = 0
0216   126 119  -3*      Jump if n > 0
0163   122   0   0       Shift Ba down one bit
0124   119   0   1       Increase n by 1
0215   126 119  -2*      Jump until n = 0
1121     0   0   0       Exit.

The above program is not, of course, very fast; but this is because the operation ought to be basic.

22. The operations E4, E5 and perhaps E1 could very well be basic.

23. Multiplication and Division

How are these accomplished? By use of the Acc. or by program in the B registers? How long will they take?

E10:ba' = ba . n in 20 basic instructions looks pretty slow - can we be given timings for this (and all extracode operations)? Is the lack of proper shifts in B registers an obstacle to making this basic? Or is it really done by a microprogram, perhaps quite fast?

For division, one assumes a program, since if it was done via the Acc. there is no reason why it should take many more instructions than multiplication. The single instruction given is not very adequate. One would like to have the remainder available, either as part of the answer to this (in B121 or B119 perhaps) or from a separate instruction - indeed, the remainder would probably be more frequently wanted than the quotient if the instruction was reasonably fast. Also, division from a store register should be possible.

A point not always appreciated about operations of this kind is worth making here: If the operation is available, programmers will use it for all sorts of unexpected purposes, e.g. the remainder from a division can be tested to perform some extra operation every nth time round a loop. The conscientious programmer will avoid uses like these if the operation is slow, preferring less convenient but faster means (he knows that this particular point will not make much difference, but he has a habit of avoiding all wasteful procedures). The less conscientious will just use all the available operations, fast or slow, as most convenient, and will produce slower programs. Either way, the effective performance of the machine, measured by speed or programmability, is less than it seems.

24. The meaning of E13-15 is not clear - is aml the integral part of am? If so, why not of a? The comparable operations

ba' = (l.s. half of m) + n

etc, while easy to code, might be worth providing.

25. It is assumed that E16 produces the position of the m.s. non-zero bit in m, but this is not made clear. Does the sign bit count? Does E17 produce the number of ones in am, or in m? While this operation is not likely to be used much, the latter form is more likely to be what is wanted.

The operations

ba' = position of m.s. 1 in n
ba' = number of 1's in n

might be useful.
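As an illustration of what these two operations would deliver, here is a minimal sketch in Python for a non-negative operand n. The function names are hypothetical, and numbering the l.s. bit as position 0 (with -1 returned when n is zero) is an assumption, since the order code does not say how positions are counted.

```python
def ms_one_position(n):
    """Position of the m.s. 1 in n, counting the l.s. bit as 0; -1 if n is zero."""
    return n.bit_length() - 1


def ones_count(n):
    """Number of 1's in the binary representation of n."""
    return bin(n).count("1")
```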

26. Why do E18, E19 take so many instructions? Surely all that is needed is to follow E10, E11 with

0163    122    0    0
0163    122    0    0
0163    122    0    0
0127    122    0    2²¹-1
0113    122    0    (z)
0127    122    0    2²⁰
0122    122    0    0
1146    122    0    (z)

If proper shifts were basic, it would be much simpler.

27. An operation for entering a closed subroutine should be provided. The operation

ba' = c + 8; c' = n

mentioned in earlier versions would do, although

ba' = c; c' = n

would be just as good. It is very well worth while making this only one instruction to the programmer, even if it may be felt to be hardly worth including in the extracode list from a speed point of view. The principle of including a jump might be usefully adopted in other cases where the address is only made relevant rather artificially, e.g. as well as

ba' = aml + n and ba' = Rba - n
there might be
ba' = aml; c' = n and ba' = Rba; c' = n
where
Rba = ba shift right one position.

28. It is noted that

0121    0    0    0

is intended for use as a dummy instruction. There should be a proper dummy, the address and b digits of which are irrelevant.

D B TESTING FUNCTIONS

29. These are fine as far as they go. However, a somewhat wider range is desirable, allowing for a decrement part so that counting need not be by successive registers. Let e.g. functions 0488, 0688 not interpret their last 6 bits, nor the 7 ba bits, but instead treat these 13 positions as a positive decrement d:

04 d  bm  n: c' = n, bm > 4d; decrease bm by 4d
06 d  bm  n: c' = n, bm < -4d; increase bm by 4d.

Another useful addition to this range would be instructions of the type

c' = n, bm θ d, where θ is >, <, =, ≠

But this would mean allocating more bits to distinguish this special type of instruction, thus decreasing the size permitted for d; this would probably be acceptable - 10 bits for d might well be enough.

E B MISCELLANEOUS FUNCTIONS

30. It is agreed that these can all be extracode. It would be nice to have some clarification of what they do.

31. Functions of the type

c' = c + 1 if (handswitches) ∧ n ≠ 0,
otherwise
c' = c + 2

(enabling a following jump instruction to branch if a particular handswitch, for example, is down) would be extremely valuable.
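The effect of the suggested test on the control number c can be stated as a one-line sketch in Python. The function name is hypothetical, and the handswitches are represented simply as an integer bit pattern.

```python
def next_control(c, handswitches, n):
    """c' = c + 1 if (handswitches AND n) is non-zero, otherwise c' = c + 2."""
    return c + 1 if handswitches & n else c + 2
```

With a jump instruction placed at c + 1, the jump is obeyed only when one of the switches selected by the mask n is down; otherwise it is skipped.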

F NOTES ON SOME BASIC ACCUMULATOR FUNCTIONS

32. It is not correct to treat the smaller of two numbers whose exponents differ by 16 (or 32) or more as zero in floating addition. Consider e.g.

½ . 8⁰ - ½ . 8⁻²⁰ (qr)

In octal the fractions, appropriately shifted, are

.4000 0000 0000 0000 0000 0000 0000 .......
.0000 0000 0000 0000 0000 4000 0000 .......
.3777 7777 7777 7777 7777 4000 0000 .......

and the correct result should be

.3777 7777 7777 7

after ATLAS style rounding

The instruction (8) will give

.4000 0000 0000 0

which is in error by 1 in the last place.
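The octal figures in this example can be reproduced with exact rational arithmetic. The sketch below, in Python, expands the exact difference in octal and keeps the first 13 digits; the function name is hypothetical, and the ATLAS rounding rule itself is not modelled (plain truncation is used, which here gives the same 13 digits as quoted).

```python
from fractions import Fraction


def octal_digits(x, count):
    """First `count` octal digits of a fraction 0 <= x < 1, truncated."""
    digits = []
    for _ in range(count):
        x *= 8
        d = int(x)  # integer part is the next octal digit
        digits.append(str(d))
        x -= d
    return "".join(digits)


large = Fraction(1, 2)                      # 1/2 . 8^0
small = Fraction(1, 2) * Fraction(1, 8 ** 20)  # 1/2 . 8^-20

exact_difference = octal_digits(large - small, 21)
kept = octal_digits(large - small, 13)
small_ignored = octal_digits(large, 13)
```

The exact difference is 3 followed by nineteen 7's and a 4, so the 13 digits retained are 3777777777777, whereas ignoring the smaller operand leaves 4000000000000.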

This design fault on Mercury had to be put right so that the normal method of taking the integral part of a floating number would work. But in any case it should not be allowed in a new machine.

Even if one number would be shifted right out of the register for floating addition, nevertheless the computer should be made to behave as though its sign bit was repeated all the way down, and also if the result is to be subtracted the ones complement should then be used. Only if one argument is zero should no effect be obtained both in adding and subtracting it.

33. It is probably necessary, but is felt to be a pity, that l had to be cleared before operations 9-14, especially 12-14. Note that the remarks above apply to these operations too, with 32 in place of 16.

34. In (18) what happens if negating am necessitates re-standardisation - is the l.s. octal digit which is shifted down into l used in the multiplication?

Again, in (20), may not Ac occur in the initial negating? Or is the description of these functions inexact; is it really a case of forming the negative product directly?

If am is negated initially, surely we could have

am' = - am ( and perhaps a' = -a?) as basic.

35. See comment (4) re. operations (21), (22).

36. Accumulator test - should there not be basic jumps on whether m is or is not zero, as well as xa?

G BASIC B FUNCTIONS

37. It is noted that some function codes are not assigned, while others are shown as same as ..., where usually it is clear that the 0040 bit is not decoded. It is hoped that this is not going to lead to a shortage of available function codes for suggested extensions to the basic list - since this bit is sometimes decoded, could these functions not become unassigned?

H INPUT - OUTPUT

38. Drum. The arrangements for simulating a one-level store can be positively disadvantageous. It is hoped to produce shortly a detailed analysis of a three-dimensional diffusion program which illustrates this. Meanwhile we should like the following operations to be at the disposal of the programmer.

a. Bring block m now on drum to core store over-writing block n. Do not release sector containing block m.
b. Bring block m now on drum to core store page n. Do not release sector containing block m.
c. As a. but releasing the sector.
d. As b. but releasing the sector.
e. As a. but first saving present contents of page on first available sector.
f. As b. but first saving present contents of page on first available sector.
g. As c. but first saving present contents of page on first available sector.
h. As d. but first saving present contents of page on first available sector.

In all cases, core page to be locked out but return to program for any waiting time. Drum transfer routine to be capable of stacking these instructions and carrying out transfers in most efficient manner. If the program addressed a locked out sector it would hold up the computer (no time-sharing) apart from interrupts for input-output equipment.

Can computation continue during the 2 msec needed for the transfer?

39. Tape. The variable block length facility described sounds extremely complicated. What is wanted are instructions of the type:

Read (Write) next m words on tape mechanism t to (from) cores starting at register n; with action on completing a choice of:
take another such order (stacked) or
stop tape at end of next block.
Reading transfers to allow a non-transmitting mode for spacing over information.

It is suggested that the arrangement described, in which the parameters are put in Ba, Ba+1, etc., is clumsy, and that a more useful arrangement would be an instruction:

"Interpret s as a tape instruction for mechanism Ba" where location s would contain (a) n in the address part, (b) m in bits 44-24, (c) a code for read/write, continue/stop, transmit/non-transmit, in bits 47-45. The tape program would interrupt on completion, and if the continue bit was set would interpret (s + 1) similarly without stopping the tape. Note that the Ba bits in the instruction, not the contents ba of Ba, would be equal to t. It is not thought likely that reference to tapes by reel number would be so convenient as by number of the mechanism on which they are mounted, as the reel number will not usually be known. The logical mechanism number t would be converted to the actual mechanism number t* by table lookup, and the table entries will be available for alteration by program. This will not lead to trouble as we would not expect to time-share (if off-line input/output is to be simulated by time-sharing it can use conventional logical mechanism numbers ≥ 32 which would not be used by other programs). There should be an instruction "Is mechanism t in operation?"
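The layout proposed for the control word in location s can be sketched in Python as follows. The function and constant names are hypothetical. Taking the address part to be bits 23-0, numbering bit 0 as the least significant, and assigning the three code bits in the order shown are all assumptions of this sketch; only the positions of m (bits 44-24) and of the code (bits 47-45) come from the text.

```python
# Hypothetical assignment of the three code bits held in bits 47-45.
WRITE, CONTINUE, NON_TRANSMIT = 0b100, 0b010, 0b001


def encode_tape_word(n, m, code):
    """Build the 48-bit word: code in bits 47-45, m in bits 44-24, n in bits 23-0."""
    if not (0 <= n < 1 << 24 and 0 <= m < 1 << 21 and 0 <= code < 1 << 3):
        raise ValueError("field out of range")
    return (code << 45) | (m << 24) | n


def decode_tape_word(word):
    """Recover (n, m, code) from a 48-bit tape control word."""
    return word & 0xFFFFFF, (word >> 24) & 0x1FFFFF, (word >> 45) & 0x7
```

A chain of such words in (s), (s + 1), and so on, each with the continue bit set except the last, would then describe a sequence of transfers carried out without stopping the tape.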

The program should assign core store blocks to be used as buffers for variable length transfers. Fixed (single block) transfers, which are carried out directly, should be as specified, except that a mechanism number should be specified rather than a reel number, and should be of the form

(Operation) t   bm    n

where bm is a modifier for n, the core page number, and t is the mechanism number. Instructions, numbers 9-14, would be replaced by those suggested above.

The meaning of instructions, numbers 15-16, 19-22 is not entirely clear. First, what is a "code word"? There is need for a recognisable special type of record on tape (corresponding to the IBM "end-of-file"), and it should be recognisable by any off-line apparatus and should cause a halt when printing or punching from tape, and should be produced when card reader runs out when writing tape from cards (more precisely, when reader stops because of empty hopper, and START is pressed to run out remaining cards from feed instead of loading more cards first to continue). Simulated off-line input/output should do this, and it would be nice if agreement could be reached with ICT that their 1301 would do it. What the mark should be is difficult to say - IBM solve this problem by having distinguishable Hollerith and binary modes on tape, and using an illegitimate Hollerith character. But any character could occur in binary information. This is a serious problem to which a solution is highly desirable. When read, the special mark should cause immediate cessation of reading, with an indicator set that can be tested by program. It should therefore be simple, so as to be easy to recognise without elaborate testing on everything read. Possibly the answer might be to write the ones complement of the check sum instead of the check sum itself. For input/output also, an end-of-record (end-of-card, end-of-printed line) might be desirable, as a precaution against getting out of step in the long blocks (512 words holds enough information for about a printed page). This is easily provided, as the information would be Hollerith, so an illegal character can be used. However, agreement with the ICT 1301 would be nice.

Secondly, what does "read until code word" mean?

It is not clear exactly what "the complete label stored in block 0" of a tape is. Instructions to the operator can usefully be of the kind:

"Remove the reel now on tape mechanism t, label it ....... and file it."
"Load the reel labelled ... on tape mechanism t."
"Load a common tape on tape mechanism t"
"Remove the reel now on tape mechanism t, label it ......, print it offline, and file it."
"Remove the (already labelled) reel now on tape mechanism t, and file it."
"Return the reel now on tape mechanism t to the common pool."

with, obviously, a few variants. But here a label means what is written on a piece of paper stuck to the reel; it is for operator's use, not computer use, in identifying the tape. It would normally be 8 Hollerith characters, and might well also be written in block 0 of the tape for computer use but then it would occupy only one word. If this is understood, then 20, 21, 22 and perhaps 19 become comprehensible, but others would be needed.

File protection: How are tapes protected against writing? An over-riding protection consisting of some physical operation on the reel (as with IBM tapes), not merely the mechanism, is desirable; and if the reel is not itself protected the program should be capable of setting and unsetting protection for the mechanism.

Instructions such as "space tape on mechanism t n blocks forwards/backwards" are desirable.

Is the program interrupted each time a word is to be transferred to or from tape, or is a store cycle "sneaked" by hardware?

I DISTRIBUTION LIST

Ferranti: Dr S Gill; Mr P D Hal; Mr G G Scarrott
Harwell - Theoretical Physics: Dr W M Lomer; Dr J Howlett; Mr A R Curtis; Mr E York; Dr I C Pyle; Dr K W Moreton; Mr E B Fossey; Mr J Gabriel; Dr K J Roberts
Harwell-Electronics: Mr E H Cooke-Yarborough
Winfrith: Dr G N Lance; Mr I C Pull
Aldermaston: Dr Corner
Risley: Dr G Black
Manchester University: Prof T Kilburn; Mr R A Brooker; Mr R B Payne