Chilton::ACL::Famous Bryant Baylis Comparative Fortran Report

FORTRAN - A Comparative Study

P E Bryant and M J Baylis

September 1968

1. Summary
2. Introduction
3. The Machines
4. The Compilers
5. The Programs and Findings
6. Conclusions

1. SUMMARY

This paper discusses the code generated by the Fortran compilers on various computers. It shows how compiling strategy and machine architecture (the two are interactive) can have significant effects on machine performance, and is based on some rather crude experiments. The reader will no doubt discount those parts he finds to be too outrageous. The underlying reason for attempting this study is the authors' wish for well defined criteria in the comparison of computer systems. To put this as basically as possible, when a manufacturer says his next computer is 10 x× Atlas, we would like this to be so well defined that it could be a contract statement in the conditions of sale. Clearly the quality of Fortran object code is only one of many evaluation parameters; all we claim is that this is a small objective study leading some short way towards a major objective.

2. INTRODUCTION

This study arose from thoughts about how to assess computer performance, particularly in a large scientific batch processing environment. In such an environment the important fact is how many jobs can get done in a given time. This throughput figure does not correlate well with add times, Gibson mix times, Post Office Work Unit times or any other existing criteria. The absence of such a standard is surprising, especially so when the published work load profiles from such places as CERN (CDC6600), Science Research Council Chilton, London University, Manchester University (all ICL Atlas), University College, London (IBM 360/65) and many others show a large amount of consistency in the type of jobs done. See for example (1), (2) and (3).

Various attempts to run benchmarks consisting of sets of Fortran programs have been made fairly recently, (4), (5). We consider that some effort should be made to standardise on a comprehensive evaluation technique for such an environment; this work could well be managed by the Ministry of Technology Technical Support Unit.

The main factors affecting throughput of a computer system can be briefly summarised as the architecture and its implementation in terms of hardware and software. Basic weaknesses in the architecture of a computer system are much more apparent in large machines where one expects a sophisticated operating system and powerful compilers. If the hardware provided is not very suitable for these software requirements then such a machine is unlikely to be very competitive or economically viable in a free market.

We do not intend to discuss this further here.

The history of this study is as follows. The eleven Fortran jobs of the CERN mix (4) were obtained and the intention was to run these singly and as a batch on all machines we had access to. It turned out that only four of these jobs were small enough to run unaltered on the machine we were most interested in. These four jobs were run on the ICL Atlas, ICL 1900, CDC6600, Univac 1108, IBM 360/65, ICL System 4/50, and the DEC PDP-10.

After thinking about the results, we concluded that publication of the running times in isolation is at present misleading and of little rational use. What seemed of more immediate interest was a comparison of the compiler techniques and performance. The latter can be measured in terms of how many orders are generated and obeyed to execute the programs. We believe that the effect of machine architecture on Fortran compiler performance can be judged in this way, and although we have only examined four short programs we feel the outcome to be of general interest.

It is worth restating that the report deals with a comparison of only one aspect of Fortran compilers. How this aspect affects machine performance depends strongly on the user environment. Other Fortran characteristics which are clearly important include compiling speed, store and other system component requirements, diagnostic facilities, A.S.A. compatibility and so on. It is also worth emphasising that the most desirable Fortrans can be rendered impotent by badly designed operating systems, and that a good comparative study of operating systems would be very useful at the present time, (6), (3).

The paper was previously issued in draft form; in view of the interest and criticism it has been altered extensively. We would like to record our grateful thanks to the staff at those installations which have tolerated and co-operated with us. Also to state that there may be occasional errors on points of detail in the following discussion and that any opinions specific or implied are the authors personal ones.

3. THE MACHINES

The computers looked at are tabulated below together with some of their characteristics: it is assumed that the reader is roughly acquainted with them.

Word Length	No. Fl Pt Accs	No. Index Registers	Index Registers per Instruction	Address Field (K)	Indirect Addressing	Automatic Index Incrementing	Character Sizes (bits)
Atlas	1	about 90	2	Whole core			6
1900	1	3	1	4			6
360	4	15	2	1			8
1108	16	15	1	64	yes	yes	6
PDP10	16	=15	1	256	yes		6
6600	8	7	1	256			6

Floating point instruction formats

Notes

B: Index Register. In the 1900, the first B is effectively part of the function for floating point instructions.
H: Automatic Index Incrementation bit i.e. lower half of B incremented by top half.
P: Partial word designator. Can be looked at as part of F.
360: short instructions allow Acc to Acc arithmetic
6600: Short instructions are 3 address accumulator, index or address register instructions
6600: 6600 has no store-acc arithmetic

All of the machine have other types of instructions, for short integer arithmetic, testing purposes, input/output etc., in some cases with different formats.
Atlas, 360 and 1900 have a quite sophisticated separate unit for manipulating the indexes, in fact the machines can survive without the floating point unit. The machines have no X ® B or vice versa instructions, which is a handicap.
The 1108 has only simple index arithmetic instructions; however, the X and B registers are the same length (36 bits) and overlap by 4.
On the PDP10 the X and B registers are identical.
The 6600 has little index ari themtic but has X ® B transfers.
On the 1108, 1900 and PDP10 the index registers are also addressed as the first few words of store, as are the accumulators on 1108 and PDP10.
On all the machines the A field can also be used to load literals directly into the index registers.
(Criticism of the hardware here is from a users point of view and bears no relation to the difficulty of hardware implementation).
The great weakness of the 1900 and 360 is the short address field which prevents ease of access to large arrays of data.
The 1900 in particular, and Atlas to a lesser extent, is hampered by having a single accumulator. The result is that some types of program optimisation are impossible.
The 1900 is further hampered by lack of index registers which causes excessive loading of the existing ones.
Atlas, 1900 and 360 would be improved by X ® B instructions. Absence of these instructions, especially in the case of Atlas, prevents the easy use of accumulator results for indexing. The 1108 and PDP10 have very good facilities in this respect.
360 and 6600 are improved by their X ® X instructions which are inherently fast as there are no operand store accesses involved.
The 6600 is severely handicapped by the lack of store into X arithmetic and also the fact that five X registers can only receive information from the store and the other two can only write. Hence statements of the form A = B take 3 instructions.
The Atlas instruction set is hampered by the lack of good index instructions such as shifting, multiplication and division, and also because the accumulator division is poor.
The PDP instructions for clearing store, setting store to 1 or incrementing store by 1 without using the accumulators are powerful, and especially useful in the context of a FORTRAN compiler.
The 360 architecture lends itself to the production of re-entrant code and hence to real time computing.

There are many other facilities on the machines that are not exercised by FORTRAN and perhaps it should be borne in mind that other compilers and especially assembly code can benefit tremendously from some of them.

4. THE COMPILERS

4.1 The Atlas Compiler (London Version) with some notes on Hartran

The Atlas Compiler treats dummy arguments by planting the address in the code. The Hartran version uses index registers to hold the addresses of dummy arguments. Thus the London version can use double indexing for subscripting whereas Hartran is limited to single indexing.

Atlas optimises indexes throughout a statement but does not extract common sub expressions (Hartran does).

No optimisation over statements is done except in the case of DO loops where the DO indexes are kept in index registers and can be used directly. (Not done in Hartran).

Immediate IF successors are not recognised (they are in Hartran).

Integer arithmetic is hampered by lack of good index register arithmetic; hence integer arithmetic is performed in the accumulator and has to be fixed afterwards giving a four instruction overhead.

Lack of accumulator to index register instructions is also notable as it hampers the use of expressions in sub scripting.

In general the code produced by Hartran is comparable to the code produced by the London Compiler except in the case of multiple subscripted arrays, which is achieved by dope vectors in the latter case, saving a lot of effort.

The loading system surrounding the London version is superior to HARTRAN in so far as fewer machine resources are used.

The compiler reflects the small number of man years involved in its implementation and can be rated as reasonably good. The index registers could be used to much better advantage. for optimising the use of small integers in many cases. The machine is hampered in having only one accumulator so that it is difficult to keep useful quantities available. The lack of Ac ® index register instructions is limiting. Any worthwhile optimisation would be expensive in that the design of the machine is more suited to producing good straight-forward compilers.

The object code listing is poor at present in that cross referencing is tedious, and is currently being improved. The Hartran object listing is excellent, but only obtainable as an option which precludes execution.

There are about 3 other Atlas Compilers, one written at Manchester by ICL which is similar in performance to the ones considered, and another which has been written at AWRE for the Atlas II which is thought to have much more optimisation than the others. Cambridge have a version based on the AWRE compiler and there is a little used FORTRAN II compiler written at Manchester.

4.2 The PDP10 Compiler (F4 V002)

The principal criticism of the PDP10 compiler lies in the treatment of dummy arguments. This compiler treats scalar dummies by actually transferring the arguments from the calling routine and planting them in the called routine's private area, with a similar sequence in the epilogue. For scalars this is good, as dummy scalars get treated exactly as private ones. Dummy arrays are treated by transferring the address of the array between the routines and treating the accessing of the array differently. In fact before each array access, the subscript must either be set up in X 2 and the array accessed indirectly (which of course is no hardship if the subscript is a variable) or 2 instructions are required to compute the element address in an index register. The latter approach is taken in the case of array subscripts being loop indexes and hence causes an overhead of 2 instructions. This causes considerable overheads in some types of matrix work. In either case a prologue to plant the addresses of arrays directly in the code would be far preferable. It would of course mean a substantially longer prologue but this would be more than counterbalanced by the shorter code. A subsidiary benefit would be that less special action would be required by the compiler in the treatment of dunmy arrays.

The large order code of the PDP10 is well exercised by the compiler in that statements of the form Y = 0.0 and I = -1 for example are recognised and take only 1 instruction and moreover do not destroy the accumulators.

The treatment of statements takes advantage of the multiple accumulators, in general it prefers accwnulators or indexes in the order 0 upwards. Acc 16 is reserved for the link and Acc 15 is used for DO loops.

There is no optimisation over statements except in the case of the inner loop of any DO loops, in which case the DO loop index is preserved in Acc 15 and used in subscripting. No attempt is made to remove code to outside DO loops.

The compiler represents a fairly simple minded approach to the task. It does take advantage of some of the good hardware features but has a noticeable inefficiency on dummy array arguments as described above.

The hardware makes optimisation over statements an attractive proposition, by preserving useful quantities in the Accumulators and using them more fully. Better use of the accumulators in DO loops would also yield useful savings.

The compiler is good but could be better. The machine is very easy to write good code for. The object listings produced by the compiler are moderate in that the listing is only the symbolic assembly listing and the actual bit patterns are absent. DEC informs us that the bit patterns can be obtained by another listing option but the writers have not investigated this. The compiler operates in 9K of store which is quite an achievement and explains its lack of sophisticated optimisation. We are informed that a version called BASIC which differs only in the restricted source language and produces the same quality object code operates in only 5.5K!

4.3 The 1108 Compiler (IV level F5004.C, and V level F4014)

The 1108 hardware is somewhat similar to the PDP10. The compilers on the machines are however very different in that considerably more effort has been put into the 1108 one. The treatment of dummy arguments is by planting the aJdress in the routines' prologue. Hence the compiler treats all dummy arguments identically. All variables whether dummy, private or common are treated similarly in the code produced. The compiler might be improved if scalar arguments were accessed by indirect addressing, however indirect addressing tends to slow a machine down so some slight speed increase has been achieved and simplicity of approach made at the expense of code in the prologue, and the trade off is not clear. Other prologue options are said to exist.

The treatment of statements in isolation is good and the multiple accumulators are used to advantage.

Optimisation over statements is carried out in some cases. The accumulators tend to be used cyclically and the contents remembered for possible future use. This can easily be frustrated by certain branching instructions. Also, previous statements are not affected by the following ones to be executed thus not deliberately leaving registers set up.

DO loops are particularly well done. The indexes are never actually created unless used in the loop and a counter is kept in fast store (the accumulators and index registers are in fast store and there are also a few extra fast registers accessed as main store which can be used). Redundant DO loops (i.e. two DO loops nested entirely and with no references to the indexes) were observed to be amalgamated into one DO loop. A superb piece of optimisation but of doubtful value. DO loops with indexes used as subscripts only keep their values in index registers to advantage. The compiler also tries to set up complicated subscripts outside DO loops, again to great advantage. Both compilers attempt to reorder code to remove it from DO loops, but the V version appears to be much the more successful in this.

The compiler is an excellent piece of work and has no serious drawbacks. Optimisation to any further depth would probably not be worth the effort. The machine is very easy to compile good code for. The object listings produced by the compiler are excellent. Nothing is known about any other FORTRAN compilers which may exist.

4.4 The 1900 compilers (XFAM and XFAT)

XFAT is a later version of XFAM; we describe XFAM and its limitations first (most of which are due to the machine architecture) and then discuss how XFAT differs.

The greatest drawbacks in XFAM are that constant subscripts are not recognised and that multiple subscripting is done by subroutine, by special purpose subroutines in the case of double subscripts.

The machine being able only to directly address the first 4K of 24 bit words has faced the compiler writers with serious difficulties. The writers have opted for utilising the first 4K of store for holding constants, scalars and pointers to arrays. Since this region is also used for many other purposes, the user is limited to some 1 to 1.5K of floating point scalars (each two words long). The pointers to arrays, also held in the first 4K of store, are doubly indirect in that the pointers are really pointers to array information which in turn points to the array. This has been done to preserve some sort of compatibility with the Algol and other compilers and also to relieve the first 4K of store from being excessively loaded. This approach and the non-recognition of constant subscripts entails an overhead of 5 instructions on each floating point array access. The double subscripting process involves an overhead of 15 instructions. The overhead on more than 2 subscripts is unknown.

The compiler does no optimisation over a statement or over groups of statements in any way. In fact statements are compiled rather primitively in that many preservings of both the accumulator and index registers could be prevented by re-ordering the compiled code. The preserving of computed addresses of array elements would be very valuable, considering the cost of computing them. It is difficult to hold these in the index registers themselves since there are only 3 of them which can be used as modifiers and it would probably be wise to preserve them in main store. This would be particularly valuable in the case of DO loops where the cost of computing the addresses could be taken out of the loops at the expense of some code at the end of the loop for incrementing the addresses, as done for example on the System 4.

Since an instruction address field cannot access array storage directly it is impossible to use the techniques of Atlas and 1108 for passing dummy array arguments. In fact the addresses of the pointers to the pointers to arrays (held in the 1st 4K of store) are passed across and access proceeds as for private arrays. A similar technique to Atlas for scalars is difficult as the program can only be overwritten by indexing. This increases the difficulty of dummy scalar access and presents a one instruction overhead.

Integers are limited to 24 bit accuracy which can cause difficulty in some programs. The XFAM compiler is limited to work in 32K; it is unlikely that extension to larger cores will effect the compiled code adversely.

Optimisation of the compiled code is difficult due to the fact that there is only one accumulator (which prevents the preservation of useful information) and because there are only 3 index registers (which are overworked due to the short address field). A further complication is that a subscript in an index register is effectively destroyed by adding an array base to it for accessing the array so any attempt to keep DO loop indexes in index registers is almost impossible.

The compiler generates a call to a prologue routine before each subroutine which exits via a corresponding epilogue.

The machine has a poor architecture for producing compiled code and makes code optimisation an extremely difficult task. The three main factors influencing this are the short address field, the small number of index registers, and the single accumulator.

The XFAT compiler is a development from XFAM and improves the position in a number of ways. The main improvement is in the technique of array accesses. Double subscripting is managed by direct compiled code. Constant subscripts are recognised and in favourable circumstances double arrays can be treated as single ones, with obvious benefit. This is particularly noticeable in the job GAMMA described later.

Single subscripting of private arrays is reduced by two instructions, or three in the case of constant subscripts. The method used here is to split the base address into two; that is into a line part which fits into the address field of one instruction and a shifted block part which is planted in another; the latter accessing the top part of the array base address. This software paging technique cannot be used for dummy arrays because the base address is not known at load time. It is also not used if the array is indexed by a variable loop index, possibly because there is a danger of overflow if the index is too large.

Forward jumps in XFAT are direct rather than via a jump table as in XFAM.

The routine prologue has been shortened, and the code to deal with dummy arrays removed to subroutines.

There may also be other changes not shown up by our four programs.

The object code listings can be obtained, although the way of doing this is not publicized; they are poor and have no cross-references to the source code, perhaps because they are not intended for the general user.

The 6600 Compiler

The 6600 is unique in there being no accumulator - store arithmetic and in the means of transferring to and from the store. The result is that in general the 6600 requires relatively more instructions to perform its arithmetic, though the extra instructions involve only the accumulators and are thus fast. The machine has ten autonomous arithmetic units and can proceed independently on many instructions. This feature of the machine tends to more than mask any increase in the actual number of instructions obeyed. This machine is difficult to assess by merely studying the code produced and comparisons of this are not always meaningful. The net effect is difficult to analyse especially as the machine is highly sensitive to the order of instructions.

The transmission of arguments between routines is by passing across the addresses of the scalars or arrays in index registers. The small number of index registers forces the compiler to find other means of transferring arguments in excess of 6. This is achieved by the calling routine planting the addresses of the arguments in a table in the called routine. Since a strenuous attempt is made to keep dummy argument addresses in index registers this is quite reasonable but one could envisage that some routines could be written which run inefficiently.

The compiler writer has grave difficulties in obtaining the best from this machine. The compiler has to try to re-arrange the code to an optimal order to avoid clashes in the arithmetic units. Since instructions are of two lengths, 15 and 30 bits, and words are 60 bits, (and 30 bits instructions must not overlap a word) the compiler has to arrange its code to fit in with this restriction and with a minimum of null instructions. The compiler studied achieves both the above objectives.

As for optimisation; there is only optimisation of DO indexes, which are preserved in an accumulator where possible. Optimisation over a statement is done quite well with common indexes and common subexpressions being recognised.

The compiler produces only average listings. It is difficult to tie the various instructions to statements and instruction addresses to identifiers.

The compiler represents a very good attempt at FORTRAN for a difficult machine. The compiler could easily produce disastrous code. Remembering the novel design of the machine and its sensitivity to code compiled for it, it remains to be seen whether this type of architecture is a pattern for the future or not. There is a later version, reputed to have good optimisation, which we have not investigated.

4.6 The 360 compilers H00 and H02

This computer has one similarity with the 1900 in that it has a short address field. The machine has three lengths of instructions, of 48, 32 and 16 bits. The 16 bit instructions are used mainly for accumulator to accumulator arithmetic.

The H02 produces an optimised form of the H00 code. The two compilers are necessary to give bad code fast or good code slowly and it is profitable to look at the H00 compiler first.

Since the 360 can only address 1K of 32 bit words the 1900 solution of reserving the first 4K of store for scalars and constants is impossible, and the 360 overcomes this problem by setting a number of base registers. The base registers include the base of the subroutine, the array of scalars and constants, the link, the argument list etc.

In view of the short address field limitation, arguments cannot be planted in the compiled code by a prologue; instead the address of the argument list is passed across. Since, before any store access, an index register has to be set up to assist the access this is no drawback. The called routine transfers scalars into the routine's private store and returns them in the epilogue.

The code produced by the level H00 compiler is exceedingly simple minded. In fact little or no optimisation is attempted. The only optimisation observed was the recognition of identical IF successors. The code produces 3 instructions before each array access to load the base and load the subscript and adjust it. Advantage is taken of the double indexing and the base is placed in one index and the subscript in the other. The compiler does go to the extent of keeping the few bases permanently loaded (in fact the set mentioned above). This means that all scalars and constants are accessed by one instruction. GO TO statements take 2 instructions as they are achieved indirectly to allow for the case of jumps over 1K words. The great failing of the compiler is the array accessing. Little or no exploitation of the multiple accumulators is made though the multiple index registers are reasonably well used.

The level H02 compiler optimises the code produced by the H00 version in four ways. Firstly the index registers are loaded as early as possible and kept loaded if their contents are useful. In practice this means that array indexes are loaded outside DO loops and an attempt is made to preserve them throughout the loops' duration. In addition to this, such quantities as loop indexes are kept in index registers if required as loop subscripts. The second degree of optimisation is achieved by keeping useful quantities computed in the accumulators for future use. This has the side effect of decreasing the number of store-acc instructions at the expense of acc-acc instructions with the net effect of shortening the program and increasing the speed. The third type of optimisation is that statements or parts of statements in DO loops which are not dependent on the loop index are pulled out of the DO loop. Fourthly, statements are heavily optimised to recognise common subscripts and common subexpressions. A statement in this context is taken as the statements between labels and jumps.

The effect of the optimisation is that the number of instructions obeyed in any piece of code is reduced by an average of 50%. The final code produced takes about the same number of instructions to execute as that on Atlas.

This compiler is the most highly optimised looked at. In fact the machine's hardware gives scope for optimisation and needs it. However, optimisation increases the compile time by a factor of three. It is doubtful whether any further optimisation would obtain much noticeable improvement.

The object code listing is good but is marred by the fact that the source code appears as a block before the object code, thus increasing the difficulty of cross-referencing.

The machine is a difficult one to produce good code for, again handicapped by a short address field.

There are other 360 Fortran compilers that we did not investigate. The G compiler is the most used one and appears to be similar to the system 4 one, that is, rather better than H00. There is now an H03 with even more optimisation than H02. Mention should also be made of Watfor, a customer written core resident compiler designed to produce poor code very quickly and give rapid throughput in a University environment.

The System 4 compiler (Fortran IVL01)

The ICL system 4 is inherited from EEC, design bought from RCA and copied from IBM360, so comments about the 360 architecture apply here. The compiler produces code which is better than the 360 H00 but not as good as the 360 H02. A number of base registers are permanently set up to point to the program, data, constant and temporary storage bases. The program base is changed at convenient intervals through routines rather than only when necessary. The compiler attempts to keep bases of arrays in index registers. Addresses of dummy arguments are passed across in index registers, as on the CDC6600, which eases the prologues task.

Optimisation is good over a statement but rudimentary over groups of statements in so far as only scalar results of previous statements are remembered in the accumulators for possible future use. No re-ordering of code is attempted, and no removal of code from loops. DO loops are well done in that the loop keeps the index both as an integer count and as an integer for subscripting where possible; this saves quite a few instructions in complicated loops. No other Fortran compilers are known to exist for the System 4.

5. THE PROGRAMS AND FINDINGS

We describe here each job and outline points of interest in the compiled code. The number of instructions needed to execute the program is given, parametrically in some cases. This number does not include the master routine where this exists separately as in general it is simply a device for input/output of data/results. Taking the 1108 V number of orders as 1.0, the other compilers are shown as ratios to this. The code space in words and bits is also shown for the programs, again excluding the master routines. Finally, the program is listed statement by statement and the number of instructions to execute each statement is shown for all the compilers. A plus sign against the number of instructions indicates a subroutine entry; the number after a plus, if present, shows how many subroutines are entered.

5.1 IF

      I=-1
      J=0
      K=1
      DO 300 L=1,1
      GOTO 200
100   IF (I)101,200,201
101   IF (J)200,102,201
102   IF (K)200,201,103
103   IF (I)104,200,200
104   IF (J)200,105,200
105   IF (K)200,200,106
106   IF (I)107,107,200
107   IF (J)200,108,108
108   IF (K)109,200,109
109   IF (I)110,200,110
110   IF (J)111,111,200
111   IF (K)200,112,112
112   CONTINUE
      GO TO 300
200   IF(I)201,100,101
202   IF(K)100,101,203
201   IF(J)100,202,101
203   IF(I)204,100,100
205   IF(K)100,100,206
204   IF(J)100,205,100
206   IF(I)207,207,100
208   IF(K)209,100,209
207   IF(J)100,208,208
209   IF(I)210,100,210
211   IF(K)100,212,212
210   IF(J)211,211,100
212   CONTINUE
      GOTO 100
300   CONTINE

This job is an artificial test program, and part of it has been extracted here to illustrate various points. The job consists entirely of IF and GO TO statements.

The 360 H00 compiler produces poor code for this job. Jumps take two instructions, as the compiler is unaware initially of whether the target address in the jump is or is not within the IF range imposed by the base register setting. In the H02, the code optimisation reduces the jumps to single instructions on recognition that they are in range. The system 4 compiler recognises the distance of jumps. The H00 and System 4 load and test a test register for IF statements. H02 keeps I, J and K permanently in index registers.

The code produced by XFAM and XFAT is identical except that XFAM does forward jumps indirectly via a jump table and XFAT does them directly. For jump instructions, the 1900 has a 32K address field, and although this cannot be modified the extra bits in the address field give it good performance for this program.

All compilers except the 1900 and London Atlas recognise immediate IF successors and plant code appropriately. In this example nothing is gained from such recognition except some space. Where overlapped instruction execution is important (e.g. 6600, Atlas) there is advantage to be gained from avoiding jumps.

Multiple accumulators and index registers are not utilized for this program. It is also noticeable that the 6600 is unable to employ its short instructions.

Compiler	Instructions obeyed		Ratio to 1108 V	Code space
Compiler	long	short	Ratio to 1108 V	words	bits × 10²
London Atlas	90		1.4	100	48
Hartran	69		1.0	82	39
PDP-10	69		1.0	75	27
1108 IV	66		1.0	83	30
1108 V	66		1.0	82	30
1900 XFAM	105		1.6	99	24
1900 XFAT	80		1.2	99	24
6600	83	0	1.3	48	29
360 H00	80	62	2.2	116	37
360 H02	44	25	1.0	63	20
System 4	98	15	1.7	111	36

Instruction	London Atlas	Hartran	PDP 10	1108 IV and V	1900 XFAM	1900 XFAT	6600	360 H00	360 H02	System 4
I=-1	2	2	1	2	2	2	2	3	2	2
J=0	2	2	1	1	2	2	2	2	1	2
K=1	2	2	2	2	2	2	2	2	1	2
DO 300 L=1,1	3	6	2	4	2	2	1	1	1	6
GOTO 200	1	1	1	1	2	1	1	2	1	1
100 IF (I)101,200,201	2,4,3	3,2,3	3,2,3	3,2,3	3,4,5	2,3,4	3,2,3	6,4,6	3,2,3	4,3,4
101 IF (J)200,102,201	2,4,3	2,3,3	2,3,3	3,2,4	3,4,5	2,3,4	3,2,4	4,6,6	2,3,3	3,4,4
102 IF (K)200,201,103	2,4,3	2,3,3	2,3,3	3,2,3	3,4,5	2,3,4	3,2,3	4,6,6	2,3,3	3,4,4
103 IF (I)104,200,200	2,3,3	2,2,2	2,2,2	3,2,3	3,4,4	2,3,3	3,2,3	4,4,4	2,2,2	3,3,3
104 IF (J)200,105,200	3,2,3	2,2,2	2,2,2	1,1,1	4,3,4	3,2,3	2,2,2	4,4,4	2,2,2	3,3,3
105 IF (K)200,200,106	4,4,3	2,2,2	2,2,2	2,1,2	3,4,5	2,3,4	3,2,3	4,4,4	2,2,2	3,3,3
106 IF (I)107,107,200	4,4,3	2,2,2	2,2,2	2,1,2	3,4,5	2,3,4	3,2,3	4,4,4	2,2,2	3,3,3
107 IF (J)200,108,108	2,3,3	2,2,2	2,2,2	2,1,2	3,4,4	2,3,3	3,2,3	4,4,4	2,2,2	3,3,3
108 IF (K)109,200,109	3,2,3	2,2,2	2,2,2	1,1,1	4,3,4	3,2,3	2,2,2	4,4,4	2,2,2	3,3,3
109 IF (I)110,200,110	3,2,3	2,2,2	2,2,2	1,1,1	4,3,4	3,2,3	2,2,2	4,4,4	2,2,2	3,3,3
110 IF (J)111,111,200	4,4,3	2,2,2	2,2,2	2,1,2	3,4,5	2,3,4	3,2,3	4,4,4	2,2,2	3,3,3
111 IF (K)200,112,112	2,3,3	2,2,2	2,2,2	2,1,2	3,4,4	2,3,3	3,2,3	4,4,4	2,2,2	3,3,3
112 CONTINUE	0	0	0	0	0	0	0	0	0	0
GO TO 300	1	1	1	1	2	11		2	1	2
200 IF(I)201,100,101	2,4,3	2,4,3	2,3,4	3,2,4	3,3,4	2,3,4	3,2,4	4,6,8	2,3,4	3,4,5
202 IF(K)100,101,203	2,4,3	2,4,3	2,3,4	3,2,4	2,3,5	2,3,4	3,2,4	4,6,8	2,3,4	3,4,6
201 IF(J)100,202,101	2,4,3	2,4,3	2,3,4	3,2,4	2,3,4	2,3,4	3,2,4	4,6,8	2,3,4	3,4,5
203 IF(I)204,100,100	2,3,3	2,3,3	3,2,2	3,2,4	3,3,3	2,3,3	3,2,4	4,6,6	2,3,3	4,6,6
205 IF(K)100,100,206	4,4,3	3,3,2	3,2,2	3,2,4	2,3,5	2,3,4	3,2,4	6,6,4	3,3,2	5,5,3
204 IF(J)100,205,100	3,2,3	2,3,2	3,2,3	3,2,3	3,2,3	3,2,3	2,3,2	6,4,6	3,2,3	5,3,5
206 IF(I)207,207,100	4,4,3	3,3,2	2,2,3	3,2,4	3,4,4	2,3,4	3,2,4	6,6,4	3,3,2	3,3,5
208 IF(K)209,100,209	3,2,3	2,3,2	3,2,3	3,2,3	4,2,4	3,2,3	2,3,2	6,4,6	3,2,3	3,5,3
207 IF(J)100,208,208	2,3,3	2,3,3	3,2,2	3,2,4	2,3,3	2,3,3	3,2,4	4,6,6	2,3,3	5,3,3
209 IF(I)210,100,210	3,2,3	2,3,2	3,2,3	3,2,3	5,3,5	4,3,4	2,3,2	6,4,6	3,2,3	3,5,3
211 IF(K)100,212,212	2,3,3	2,3,3	3,2,2	3,2,4	2,4,4	2,3,3	3,2,4	4,6,6	2,3,3	5,3,3
210 IF(J)211,211,100	4,4,3	3,3,2	2,2,3	3,2,4	2,3,4	2,3,4	3,2,4	6,6,4	3,3,2	3,3,5
212 CONTINUE	0	0	0	0	0	0	0	0	0	0
GOTO 100	1	1	1	1	1	1	1	2	1	2
300 CONTINE	4	4	3	1	6	6	5	6	1	5

ODE:

      SUBROUTINE DYBDX(Y,F,N,X)
      DIMENSION Y(1),F(1)
      DO 1 I=1,N
      F(I)=X*(1.0+X*(0.1+X*0.01))
 1    CONTINUE
      RETURN
      END
      SUBROUTINE DA01A(Y,E,A,B,C,D,N0,X0,DX)
CA RUNGUE-CUTTA-MERSON
C FEB 1963 MCVICAR HARWELL
      DIMENSION Y(1),E(1),A(1),B(1),C(1),D(1)
      N=N0
      Z=X0
      H=DX/3.
      DO 1 I=1,N
      D(I)=Y(I)
 1    CONTINUE
      X=Z
      DO 41 J=1,5
      CALL DYBDX(Y,E,N,X)
      DO 21 I=1,N
      GOTO (11,12,13,14,15),J
 11   A(I)=H*E(I)
      Y(I)=D(I)+A(I)
      GOTO 21
 12   B(I)=H*E(I)
      Y(I)=D(I)+(A(I)*B(I))*0.5
      GOTO 21
 13   B(I)=H*E(I)
      Y(I)=D(I)+(A(I)+B(I)*3.)*0.375
      GOTO 21
 14   C(I)=H*E(I)
      Y(I)=D(I)+(A(I)-B(I)*3.+C(I)*4.)*1.5
      GOTO 21
 15   Z=D(I)
      D(I)=H*E(I)
      Y(I)=Z+(A(I)+C(I)*4.+D(I))*0.5
      E(I)=(A(I)*2.-B(I)*9.+C(I)*8.-D(I))/10
 21   CONTINUE
      GOTO (31,41,33,34,41),J
 31   X=Z+H
      GOTO 41
 33   X=Z+DX*0.5
      GOTO 41
 34   X=Z+DX
 41   CONTINUE
      X0=X
      RETURN
      END

This is a Runge-Kutta type integration program. It consists of a master routine, the main subroutine D0A1A and a small subroutine DYBDX.

DYBDX is interesting in that the function of X in the DO loop can be removed to outside the loop. The 1108 V and 360 H02 do this, with the H02 also finding that it can keep an accumulator loaded with this function for the entirety of the loop.

The 1900 code from XFAM and XFAT is almost identical. There is some space saving in XFAT as it does some of the work in passing arguments by subroutine, whereas XFAM does it directly. XFAT cannot use its improved indexing scheme as the arrays are arguments and the scheme relies on knowing absolute addresses at load time.

The usual factor of around two in improvement by H02 over H00 is noticed. Here this is obtained mainly by optimisation of the loop indices; they are kept in index registers in the correct form for accessing arrays.

The differences between London and Hartran are due to Hartran's better method of passing arguments.

The 1108 V usually passes arguments in the same way as 1108 IV. Here it does not do so, for undiscovered reasons, and its performance suffers correspondingly.

Compiler	Instructions obeyed		Ratio to 1108 V	Code space
Compiler	long	short	Ratio to 1108 V	words	bits × 10²
London Atlas	247+156N		1.4	294	141
Hartran	271+179N		1.6	225	108
PDP-10	238+233N		2.1	262	94
1108 IV	209+163N		1.5	221	79
1108 V	323+112N		1.0	273	98
1900 XFAM	312+451N+12S		4.0	458	109
1900 XFAT	214+446N+34S		4.0	412	98
6600	111+141N+6S	98+140N	2.5	161	96
360 H00	356+365N	86+27N	3.5	507	162
360 H02	413+134N	72+24N	1.4	288	92
System 4	358+226N	96+11N	2.1	252	80

(S = unknown subroutines), (N = program parameter)

Instruction	London Atlas	Hartran	PDP 10	1108 IV	1108 V	1900 XFAM	1900 XFAT	6600	360 H00	360 H02	System 4
SUBROUTINE DYBDX(Y,F,N,X)	13	12	9	10	13	22+	14+	310+	29	28	35
DIMENSION Y(1),F(1)
DO 1 I=1,N	3	6	2	4	12	5	5	1	2	14	6
F(I)=X(1.0+X(0.1+X*0.01))	7	8	9	7	2	17	17	21	12	1	8
1 CONTINUE	4	4	2	1	1	5	5	5	6	1	9
RETURN	0	1	1	0	2	3	2	1	3	3	2
END	2	9	7	3	5	+	+	0	13	13	9
SUBROUTINE DA01A(Y,E,A,B,C,D,N0,X0,DX)	65	25	14	64	92	64+	32+	516+	49	48	43
CA RUNGUE-CUTTA-MERSON
C FEB 1963 MCVICAR HARWELL
DIMENSION Y(1),E(1),A(1),B(1),C(1),D(1)
N=N0	2	2	2	2	2	3	3	4	2	2	2
Z=X0	2	2	2	2	2	3	3	4	2	2	2
H=DX/3.	3	3	3	3	3	4	4	5	3	3	3
DO 1 I=1,N	3	6	2	3	6	4	2	1	2	7	6
D(I)=Y(I)	2	2	6	2	2	14	14	9	10	2	3
1 CONTINUE	4	4	2	1	1	5	9	4	6	1	9
X=Z	2	2	2	2	2	2	2	3	2	2	3
DO 41 J=1,5	3	4	2	2	2	2	2	1	2	8	4
CALL DYBDX(Y,E,N,X)	3	2	5	2	1	1	1	16	4	4	11
DO 21 I=1,N	3	6	2	4	4	4	2	1	2	10	7
GOTO (11,12,13,14,15),J	1	3	4	4	4	2	2	5	8	7	4
11 A(I)=H*E(I)	3	4	7	3	3	15	15	9	11	3	4
Y(I)=D(I)+A(I)	3	4	9	2	2	22	22	11	16	4	3
GOTO 21	1	1	1	1	1	2	1	1	2	1	2
12 B(I)=H*E(I)	3	4	7	3	3	15	15	9	11	3	3
Y(I)=D(I)+(A(I)B(I))0.5	5	6	13	4	4	31	31	21	23	6	5
GOTO 21	1	1	1	1	1	2	1	1	2	1	2
13 B(I)=H*E(I)	3	4	7	3	3	15	15	9	11	3	4
Y(I)=D(I)+(A(I)+B(I)3.)0.375	6	7	14	5	5	32	32	24	24	8	6
GOTO 21	1	1	1	1	1	2	1	1	2	1	1
14 C(I)=H*E(I)	3	4	7	3	3	15	15	9	11	4	4
Y(I)=D(I)+(A(I)-B(I)3.+C(I)4.)*1.5	10	11	19	8	8	41	41	32	32	11	10
GOTO 21	1	1	1	1	1	2	1	1	2	1	1
15 Z=D(I)	2	3	4	2	2	7	7	6	5	2	3
D(I)=H*E(I)	3	4	7	3	3	15	15	8	11	3	3
Y(I)=Z+(A(I)+C(I)4.+D(I))0.5	7	8	16	7	7	33	33	26	25	13	7
E(I)=(A(I)2.-B(I)9.+C(I)*8.-D(I))/10	13	14	21	11	11	42	42	37	33	10	11
21 CONTINUE	4	4	3	3	3	5	5	7	6	7	10
GOTO (31,41,33,34,41),J	1	3	4	4	4	2	2	8	8	7	4
31 X=Z+H	3	3	3	3	3	3	3	5	3	3	3
GOTO 41	1	1	1	1	1	2	1	1	2	1	1
33 X=Z+DX*0.5	4	4	4	4	3	5	5	9	4	3	4
GOTO 41	1	1	1	1	1	2	1	1	2	1	1
34 X=Z+DX	3	1	3	3	3	4	4	6	3	3	3
41 CONTINUE	4	4	3	3	3	5	5	5	6	5	7
X0=X	2	4	2	2	2	3	3	4	2	2	3
RETURN	0	1	1	0	0	3	2	1	2	3	3
END	3	16	8	9	10	+	+	0	16	16	10

5.3 GAMMA

      FUNCTION GAMMA(U)
      DIMENSION CHEBY (2,7)
      X=U
      DIVI=1.0
      FAKTOR = 1.0
 1    IF (X .LE. 2.) GO TO 2
      X=X-1.
      FAKTOR = FAKTOR * X
      GO TO 1
 2    IF (X .GE. 1.) GO TO 3
      DIVI = DIVI * X
      X=X+1.
      GO TO 2
 3    X=X-1.
      CHEBY(1,1)=  0. 00053 96989 58808 2
      CHEBY(1,2)=  0. 00261 93072 82745 8
      CHEBY(1,3)=  0. 02044 96308 23590 4
      CHEBY(1,4)=  0. 07309 48364 14369 7
      CHEBY(1,5)=  0. 27964 36915 78537 8  
      CHEBY(1,6)=  0. 55338 76923 85768 9
      CHEBY(1,7)=  0. 99999 99999 99998 1
      CHEBY(2,1)= -0. 00083 27247 08683 7
      CHEBY(2,2)=  0. 00469 86580 79622 0
      CHEBY(2,3)=  0. 02252 38347 47260 3  
      CHEBY(2,4)= -0. 17044 79328 74746 0
      CHEBY(2,5)= -0. 05681 03350 86193 8
      CHEBY(2,6)=  1. 13060 33572 86556 2
      CHEBY(2,7)=  1. 00000 00000 00000 0
      DO 7 I=1,2
      DO 4 J=2,7
      CHEBY(I,1)= CHEBY(I,1) * X + CHEBY(I,J)
 4    CONTINUE
 7    CONTINUE
      IF ( DIVI .EQ. 0.) GO TO 5
      GAMMA=(FAKTOR/DIVI)*(CHEBY(1,1)/CHEBY(2,1))
      GO TO 6
 5    GAMMA = 1.0
 6    RETURN
      END

This program computes Gamma functions. It does not appear well-written; in particular the iterative loops at the start could be better programmed. A DATA statement would be useful to the routine. We have taken X = 2.5.

The improvement between XFAM and XFAT is due to the better treatment of double subscripted arrays. Constant subscripts are recognised and treated well. On a statement of the form CHEBY(1 ,6) = X, XFAM takes 14 orders, XFAT takes 4. However, machines with full address fields do this in two orders. Improvement on the statement labelled 4 is obtained by using open code in XFAT as opposed to XFAM's subroutines.

London performs better than Hartran because London uses dope vectors for sub scripting and manages to keep its indices in index registers. The differences are almost entirely in the statement labelled 4.

The H02 optimisation is mainly achieved on this statement and the containing loops.

The prologue and epilogue routines of XFAM and XFAT account for about 40 orders; they are omitted here together with the argument routine in XFAT and the 6600 prologue, both of unknown length.

Compiler	Instructions obeyed		Ratio to 1108 V	Code space
Compiler	long	short	Ratio to 1108 V	words	bits × 10²
London Atlas	201		1.2	104	141
Hartran	265		1.6	117	108
PDP-10	200		1.2	92	94
1108 IV	169		1.0	115	79
1108 V	169		1.0	122	98
1900 XFAM	950		5.6	202	109
1900 XFAT	518		3.1	160	98
6600	133	119	1.5	64	96
360 H00	376	43	2.5	132	162
360 H02	139	51	1.1	108	92
System 4	217	13	1.4	160	80

Instruction	London Atlas	Hartran	PDP 10	1108 IV	1108 V	1900 XFAM	1900 XFAT	6600	360 H00	360 H02	System 4
FUNCTION GAMMA(U)	6	8	6	17	20	4+	4+	2+	14	14	27
DIMENSION CHEBY (2,7)
X=U	2	2	2	2	2	3	3	3	2	6	2
DIVI=1.0	2	2	2	2	2	2	2	3	3	2	2
FAKTOR = 1.0	2	2	2	1	1	2	2	2	4	4	2
1 IF (X .LE. 2.) GO TO 2	3,4	3,4	2,3	4,4	2,3	7,9	7,8	6,6	3,3	2,6	3,4
X=X-1.	3	3	2	3	3	3	3	4	3	1	3
FAKTOR = FAKTOR * X	3	3	2	2	2	3	3	4	2	1	2
GO TO 1	1	1	1	1	1	1	1	1	2	1	1
2 IF (X .GE. 1.) GO TO 3	3,4	3,4	2,3	4,4	2,3	7,9	7,9	6,6	4,4	3,3	3,4
DIVI = DIVI * X	3	3	2	3	3	3	3	3	3	3	3
X=X+1.	3	3	2	3	3	3	3	3	3	3
GO TO 2	1	1	1	1	1	1	1	1	2	1	1
3 X=X-1.	3	3	2	3	3	3	3	3	3	3	3
CHEBY(1,1)= 0. 00053 96989 58808 2	2	2	2	2	2	15	4	3	2	2	2
CHEBY(2,1)= -0. 00083 27247 08683 7	2	2	2	2	2	17	4	3	3	3	2
DO 7 I=1,2	3	4	1	4	3	2	2	1	2	2	6
DO 4 J=2,7	3	4	2	1	1	2	2	12	2	6	6
CHEBY(I,1)= CHEBY(I,1) * X + CHEBY(I,J)	5	10	8	4	4	47	19		19	5	6
4 CONTINUE	4	4	2	1	1	5	5	4	6	1	9
7 CONTINUE	4	4	3	3	3	5	5	4	6	3	9
IF ( DIVI .EQ. 0.) GO TO 5	3,4	2,3	2,2	2,2	2,3	7,10	7,9	2,2	4,4	3,3	3,4
GAMMA=(FAKTOR/DIVI)*(CHEBY(1,1)/CHEBY(2,1))	7	7	6	6	6	45	10	8	9	6	6
GO TO 6	1	1	1	1	1	1	1	1	2	1	1
5 GAMMA = 1.0	2	2	2	2	2	2	2	3	2	2	2
6 RETURN	4	10	5	16	20	4	4	6	14	14	10
END

5.4 BESSEL

      SUBROUTINE BESSEL (H,X,CONT)
      DIMENSION G(9),V(13),H(9)
      DO 1 I=1,9
      G(I)=H(I)
 1    CONTINUE
      IF(CONT) 501,111,500
 500  IF(X-0.0015707) 1000,503,503
 503  IF(X-206.5335) 1001,1001,1003
 501  IF(X-0.0016422) 1000,504,504
 504  IF(X-27.4773) 1001,1001,1003
1003  G(1)=0.0
      G(2)=0.0
      GOTO 1002
1000  G(1)=1.0-CONT*X**2/4.
      G(2)=X/2.-CONT*X**3/16.
1002  G(3)=0.0
      G(4)=0.0
      GOTO 111
1001  V(1)=1.2533141373
      V(2)=-0.1566641816
      V(3)=0.0881112782
      V(4)=-0.0913909546
      V(5)=0.1344596228
      V(6)=-0.2299850328
      V(7)=0.379240973
      V(8)=-0.5247277331
      V(9)=0.5575368367
      V(10)=-0.4262632912
      V(11)=0.2184518096
      V(12)=-0.0668097672
      V(13)=0.009189383
      G(9)=CONT
 101  G(4)=2.*AINT(-0.007222*X*X+1.47222*X+5.
      Y=2./X
      G(3)=0.0
      GEE=G(3)
      G(6)=G(3)
      G(2)=1.0
      G(7)=-G(2)
      G(8)=G(2)
 106  G(1)=G(2)*G(4)*Y-G(3)*G(9)
      G(3)=G(2)
      G(2)=G(1)
      IF(G(8))102,112,112
 112  G(7)=-G(7)*G(9)
      G(6)=G(7)*G(3)/G(4)+G(6)
 102  G(5)=2.*G(9)+G(8)
      IF(ABS(G(5)-1.)-0.000001)103,113,113
 113  GEE=G(3)+GEE
 103  G(8)=-G(8)
      G(4)=G(4)-1.0
      IF(G(4)-1.0)116,106,106
 116  GEE=2.*GEE+G(1)
      G(6)=G(6)*G(7)
      G(5)=1.0
      IF(G(9))117,117,107
 117  G(5)=EXP(X)
 107  G(7)=G(5)/GEE
      G(8)=-.7267605*G(9)+3.2732395
      G(4)=ALOG(X)
      GEE=G(4)-0.1159315155
      GEE=0.25*G(1)*GEE+G(6)*G(9)
      G(6)=GEE*G(8)*G(9)
      G(1)=G(2)*G(7)
      G(2)=G(3)*G(7)
      G(3)=G(6)*G(7)
      G(5)=X*G(9)
      IF(G(5)+1.)104,104,105
 105  G(4)=G(3)*G(2)-0.125*Y*G(8)
      G(4)=G(4)*G(9)/G(1)
      GOTO 111
 104  G(6)=V(1)
      DO 120 I=2,13
      G(6)=G(6)+(1.0/X)**(I-1)*V(I)
 120  CONTINUE
      G(4)=EXP(-0.5*G(4)-X)
      G(3)=G(4)*G(6)
      GOTO 105
 111  CONTINUE
      DO 2 I=1,9
      H(I)=G(I)
 2    CONTINUE
      RETURN
      END

This routine computes Bessel functions. We have not counted the total number of instructions obeyed, though this could have been done by corrupting the program to record the passage of events. We did not find time for this.

The program uses single subscripting extensively, with some constant subscripts and private arrays.

The differences between XFAM and XFAT are mainly due to the latter's better single indexing.

H02 does not find so much scope for optimisation with code containing many IF statements.

Compiler	Code space
Compiler	words	bits × 10²
London Atlas	304	146
Hartran	296	142
PDP-10	255	92
1108 IV	294	106
1108 V	306	110
1900 XFAM	1001	240
1900 XFAT	652	156
6600	257	154
360 H00	382	122
360 H02	302	97
System 4	378	121

Instruction	London Atlas	Hartran	PDP 10	1108 IV	1108 V	1900 XFAM	1900 XFAT	6600	360 H00	360 H02	System 4
SUBROUTINE BESSEL (H,X,CONT)	30	12	7	39	44	20+	9+2	9+	24	22	33
DIMENSION G(9),V(13),H(9)
DO 1 I=1,9	3	4	2	2	2	2	2	1	2	9	6
G(I)=H(I)	2	3	4	2	2	14	12	7	9	2	3
1 CONTINUE	4	4	2	1	1	5	5	3	7	1	9
IF(CONT) 501,111,500	2,4,3	2,3,3	2,3,3	4,3,4	4,3,3	5,6,7	4,5,6	3,2,3	4,6,6	3,6,6	3,6,4
500 IF(X-0.0015707) 1000,503,503	2,3,3	3,3,3	3,3,3	4,4,4	4,4,4	6,7,7	5,6,6	6,5,6	5,5,5	4,4,4	3,3,3
503 IF(X-206.5335) 1001,1001,1003	4,4,3	3,4,4	4,3,3	5,4,6	5,4,6	6,7,8	5,6,7	6,5,7	6,6,4	4,4,3	3,3,4
501 IF(X-0.0016422) 1000,504,504	3,4,4	3,3,3	3,3,3	4,4,4	4,4,4	6,7,7	5,6,6	6,5,6	4,4,4	3,3,3	3,3,3
504 IF(X-27.4773) 1001,1001,1003	4,4,5	3,3,3	3,3,3	5,4,5	5,4,5	6,7,8	5,6,7	6,5,6	4,4,4	3,3,3	3,3,3
1003 G(1)=0.0	2	2	1	1	1	7	4	2	2	2	2
G(2)=0.0	2	2	1	1	1	7	4	2	2	2	2
GOTO 1002	1	1	1	1	1	2	1	1	2	1	1
1000 G(1)=1.0-CONTX*2/4.	6	6	6	7	7	15	12	11	7	8	8+
G(2)=X/2.-CONTX*3/16.	11+	10	9	8	8	19+	16+	10	11	9	11+
1002 G(3)=0.0	2	2	1	1	1	7	4	2	2	2	2
G(4)=0.0	2	2	1	1	1	7	4	2	2	2	2
GOTO 111	1	1	1	1	1	2	1	1	2	1	1
1001 V(1)=1.2533141373	2	2	2	2	2	7	4	3	2	2	3
V(2)=-0.1566641816	2	2	2	2	2	9	4	3	3	3	2
G(9)=CONT	2	2	2	2	2	8	5	3	2	2	2
101 G(4)=2.AINT(-0.007222XX+1.47222X+5.	12+	15+	11+	10	10	25+	19+	18	11	10	11
Y=2./X	3	3	3	3	3	4	4	2	3	3	3
G(3)=0.0	2	2	1	1	1	7	4	2	2	2	2
GEE=G(3)	2	2	2	2	2	7	4	3	2	2	2
G(6)=G(3)	2	2	2	1	1	14	8	3	2	1	2
G(2)=1.0	2	2	2	2	2	7	4	3	2	2	2
G(7)=-G(2)	2	2	2	1	1	16	10	3	3	3	3
G(8)=G(2)	2	2	2	2	2	14	8	3	2	3	2
106 G(1)=G(2)G(4)Y-G(3)*G(9)	8	8	7	7	7	39	23	11	13	9	7
G(3)=G(2)	2	2	2	2	2	4	8	3	2	2	2
G(2)=G(1)	2	2	2	2	2	4	8	3	2	2	2
IF(G(8))102,112,112	2,3,3	2,2,2	2,2,2	3,3,3	3,3,3	10,11,11	5,6,6	3,2,3	4,4,4	3,3,3	5,3,3
112 G(7)=-G(7)*G(9)	4	3	3	3	3	24	15	5	6	4	4
G(6)=G(7)*G(3)/G(4)+G(6)	5	5	4	4	4	39	23	9	11	5	6
102 G(5)=2.*G(9)+G(8)	4	4	4	4	4	22	14	7	6	4	4
IF(ABS(G(5)-1.)-0.000001)103,113,113	5,6,6	7+	7+	5,5,5	5,5,5	16,17,17+	12,13,13+	12,11,12	6,6,6	5,5,5	5,5,5
117 G(5)=EXP(X)	2+	5+	2+	2+	2+	10+	7+	10+	7+	5+	3+
RETURN	0	1	1	2	2	3	2	1	3	3	3
END	2	8	7	14	15	+	+	0	13	13	9

Some lines of code have been omitted where they can be deduced from other similar instructions.

5.5 Features of the compiled code

The following questions are listed A, B, C...... and an asterisk in the anSTIer matrix means yes.

Is there any optimisation os subscripts over statement?
Is there optimisation of con@on sub-expressions over a statement?
Is there optimisation over many statements?
Is code removed from DO loops where possible?
Are immediate IF successors recognised?
Are DO loop indices optimised over the loop in any way?
How are array accesses treated?
1. by code;
2. vectors by code, two dimensions and above by subroutine;
3. by code using dope vectors;
4. vectors and two dimensions by code, above by subroutine.
Are constant subscripts recognised?
Is division by a constant recognised and implemented by multiplication?
Is exponentiation by small integers done by repeated multiplication?
Is multiplication and division by powers of two treated either by shifting or exponent arithmetic?
How good is object code listing?
- e=excellent
- g=good
- m=moderate
- p=poor
Treatment of dummy arguments - passed across by prologue:
1. plants addresses in code;
2. passes address of argument list;
3. passes addresses in index registers.

Instruction	London Atlas	Hartran	PDP 10	1108	1900 XFAM	1900 XFAT	6600	360 H00	360 H02	System 4
A		*		*			*		*	*
B		*		*			*		*	*
C			*	*					*	*
D				(*V)					*
E		*	*	*			*	*	*	*
F	*			*					*	*
G	c	a	a	a	b	d	a	a	a	a
H	*	*	*	*		*	*	*	*	*
I		*	*	*						*
J	*	*	*	*			*	*	*
K			*	*?
L	p	e	m	e	p	p	m	g	g	p
M	a	c	b	a	b	b	c	b	b	c

6. CONCLUSIONS

It is apparent from our descriptions of the various compiler strategies, and from the comparative tables, that there is a wide range in quality and performance (and also probably in the ease of implementation).

For these, and indirectly for compilation speed, the Univac 1108 and DEC PDP10 undoubtedly lead the field. The ICL Atlas follows, and the principle reason for the superiority of these three machines is their long address fields. This is not to say that this is a pointer for the future - new machine designs may well emerge and do better with different approaches, but it is worth knowing what they will have to compete against.

The CDC6600 is the most powerful computer we looked at. For Fortran programming, its long address field is partly neutralised by some central processor features, and for what we have measured we place the machine just below Atlas.

The 1BM360 comes next with the ICL System 4, the multiple accumulators and index registers and double indexing to some extent compensating for the short address field.

The ICL 1900 comes last and is an exceedingly difficult machine to produce good code for. The 360, System 4 and 1900 are separated from the other machines by quite a large gap.

The theory that a short instruction length with a small address field leads to shorter programs is not borne out for Fortran compiled code. Measured in terms of number of bits used, there is little significant variation from machine to machine; takng library and data into account the differences are usually negligible. There is a variation in the ratio of red tape to arithmetic instructions, not brought out in the tables, with the short address machines needing rather more organisational instructions than the others.

It is felt that the quality of the compilers is related to the hardware architecture and there are other deductions that can be made from the comparative tables.

It is hoped that this small study will lead to further and better comparisons and evaluations both of compilers and operating systems. Comments, criticism and praise will be welcomed by the authors.

REFERENCES

The CERN 6600 work load. J. Symonds CERN Report 1967.
SRC Atlas Laboratory Annual Reports.
Correspondence. Software requirements of Universities. P. Samet. Computer Journal July 1968.
How fast is the 6600? CERN Report 1967.
A comparison of Atlas, Univac 1108 and CDC6600. P. Hughes. ICS London University Report 1968.
How to succeed in software? S. Michaelson. IFIPs 1968.
Reference Manuals for the machines involved.