Compiling Techniques

Chapter 8: Arithmetic Expressions

8.1 Introduction

One of the major problems in the compilation of scientific languages is the production of efficient object code for arithmetic assignment statements in a reasonable compilation time. For example, the relative efficiency of two Fortran compilers in the execution of a program will depend almost entirely on the code produced for arithmetic statements. The term arithmetic statement is used here to include those statements which have evaluation of arithmetic expressions as part of the statement. For example the Fortran statement:

      IF(A + B*C + D) 10,11,12

is equivalent to:

      T = A + B*C + D
      IF(T) 10,11,12

where T is some temporary storage (possibly the accumulator) allocated by the compiler. In this example, coding the assignment statement will be considered as part of the problem, while the simplified IF statement will not. Similarly:

      DO 10 I = 1,5
      A(I,J) = B(I,J) + C(K,J)

where A,B,C are 10 by 10 arrays, is equivalent to:

      I = 1
 1    L = 10*J + I + A - 11
      M = 10*J + I + B - 11
      N = 10*J + K + C - 11
      a(L) = a(M) + a(N)
      I = I + 1
      IF(I-5) 1,1,2
 2    CONTINUE

where A,B,C represent the base addresses of the arrays and the function a is used to imply that the statement means add the contents of location M to the contents of location N and store in location L. Here, all but the IF statement will be treated as arithmetic.
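The addressing function used above can be checked with a small sketch (the base address is illustrative; subscripts are 1-based and elements of a column are stored consecutively, as in Fortran):

```python
def addr(base, i, j, rows=10):
    """Address of element (I,J) of a rows-by-N array stored column by column,
    with 1-based subscripts and element (1,1) at address `base`.
    For rows = 10 this is the text's 10*J + I + base - 11."""
    return rows * j + i + base - rows - 1

# Element (1,1) sits at the base address itself:
print(addr(1000, 1, 1))   # -> 1000
# (2,1) follows immediately; (1,2) is one column (10 words) later:
print(addr(1000, 2, 1))   # -> 1001
print(addr(1000, 1, 2))   # -> 1010
```

The constant part, base - rows - 1, is what a compiler can fold at compile time; only 10*J + I varies with the subscripts.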

Optimisation of the code produced for arithmetic statements is usually criticised for being wasteful of compilation time. The criticism is that finding common sub-expressions and eliminating the excessive use of temporary storage, etc. could be done by the programmer and is therefore not the duty of the compiler writer. This argument might be reasonable if only the actual arithmetic assignment statements were being considered. However, in the augmented set, most languages do not allow the programmer to aid the optimisation of the code produced. The user is not able to point out the similarity between the addressing functions for A(I,J) and B(I,J) for example.

This chapter should therefore be considered as dealing with the arithmetic part of all statement types in the language rather than the simple assignment statement.

8.2 A hypothetical computer

The production of good object code depends to a large extent on the computer concerned. It is difficult to consider this question without having a specific computer in mind. Consequently, a hypothetical computer will be defined which has an order code sufficiently resembling those of genuine computers so that the strategies outlined in this chapter can be easily seen to be applicable fairly generally.

The hypothetical computer will be assumed to have a single accumulator. The order code will be assumed to contain both normalised and unnormalised floating point operations. Numbers will be assumed to be stored in the form b*10^i where i is an integer and b is a three-figure decimal fraction between 0 and 1. In the normalised form, the leading digit of the mantissa is not zero. For example, the normalised form of 5 is .500*10^1, not .050*10^2 or .005*10^3. REAL variables will be stored as normalised floating point numbers. INTEGER variables will be stored with a fixed exponent of 3. Thus a variable having the value 57 would be stored as .057*10^3 if it was INTEGER, but .570*10^2 if it was REAL. The order code consists of the six basic orders:

   L    Load accumulator from storage location.
   ADD  Add contents of storage location to accumulator.
   SUB  Subtract contents of storage location from accumulator.
   MPY  Multiply accumulator by contents of storage location.
   DIV  Divide accumulator by contents of storage location.
   ST   Set contents of storage location equal to accumulator.

These orders may produce a result which is either normalised or unnormalised, depending on whether the letter U or N follows the mnemonic. In the case of a normalised result, the exponent is adjusted until the first digit after the decimal point is non-zero. To do an unnormalised addition or subtraction, the smaller exponent is first made equal to the larger by adjusting the mantissa and then the operation takes place. For example adding .003*10^3 to .270*10^2 will produce .030*10^3. Adding .750*10^2 to .000*10^3 would produce .075*10^3. This could be used, therefore, as a means for producing the standard unnormalised form for INTEGER numbers.
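The unnormalised arithmetic can be modelled by a toy sketch (three-digit mantissas as in the text; mantissa overflow and rounding are ignored):

```python
from dataclasses import dataclass

@dataclass
class Num:
    """A number in the hypothetical machine's form: mant/1000 * 10**exp,
    where mant is an integer of at most three digits (so 570 means .570)."""
    mant: int
    exp: int

def normalise(x: Num) -> Num:
    """Adjust the exponent until the first digit after the point is non-zero."""
    m, e = x.mant, x.exp
    if m == 0:
        return Num(0, e)
    while m < 100:
        m *= 10
        e -= 1
    return Num(m, e)

def add(x: Num, y: Num, normalised: bool) -> Num:
    """Align to the larger exponent by shifting the smaller mantissa right
    (digits shifted out are lost), add, then normalise only if requested."""
    if x.exp < y.exp:
        x, y = y, x
    m = x.mant + y.mant // (10 ** (x.exp - y.exp))
    r = Num(m, x.exp)
    return normalise(r) if normalised else r

# The text's examples: .003*10^3 + .270*10^2 and .750*10^2 + .000*10^3
print(add(Num(3, 3), Num(270, 2), normalised=False))    # -> Num(mant=30, exp=3)
print(add(Num(750, 2), Num(0, 3), normalised=False))    # -> Num(mant=75, exp=3)
```

The second example is the FIX idea: adding an unnormalised zero of exponent 3 forces a value into the standard INTEGER form.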

The absolute value of the quantity being taken from, or set into, a store location will be used if the mnemonic has the letter A added at the end of the load or store instruction. Some examples of code generated for expressions involving the INTEGER variables I,J,K and REAL variables B,C,D are as follows:

1      I + J           LU I, ADDU J 
2      D = ABS(B - C)  LN B, SUBN C, STNA D

Notice that STUA would also have been correct here as the previous order has already produced a normalised result.

3      D = B * I       LN B, MPYN I, STN D
4      J = B           LN B, FIX   , STU J

The operation FIX is defined as the set of orders required to change the normalised form of the accumulator to the standard unnormalised INTEGER representation. It could be, for example, the operation ADDU 0*10^3. It is an important operation and will be written simply as FIX, since its realisation may be quite different on different computers. The assumption will be that it requires the use of the accumulator.

5      I = J * I        LU J, MPYN I, FIX,    STU I
6      I = J * I + K    LU J, MPYN I, ADDU K, STU I

Notice that the addition of K is sufficient to force the result into the standard INTEGER form so that FIX is not required.

These examples give some indication of the variety of code required for special purposes. An abbreviated form of the mnemonic will be used without the relevant N, U qualifier if it is either irrelevant or obvious which should be used.

8.3 An elementary algorithm using an operator table

In Section 7.4 an algorithm was defined for parsing an operator precedence grammar. In Section 7.9 a slightly modified algorithm was given which used an operator table to define any special action required for a particular operator when it was involved in a combination defined by the Precedence Table. This operator table can be used to produce code as soon as a combination occurs. In this case, there is no need to produce a tree representation of the expression as was done in Fig. 7.5. The contents of the accumulator will now be loosely equivalent to the tree pointers Ei produced in Fig. 7.4. The operand position of the stack will be set to contain either the address of the variable in that position or alternatively a special marker denoting the address of the accumulator. If it is necessary to store the contents of the accumulator in a working space during the evaluation, then this will be done by generating the code ST TEMP. The effect on the stack will be to replace the special marker denoting ACC by the location of the temporary variable. This then behaves like any other variable. The allocation of temporary storage will not be dealt with at this point.

On combination, code will be produced dependent on the top two operands and their infix operator. As this will result in the value of the expression being set in the accumulator, the operand position, where previously the tree pointer Ei would be set, now will contain the special marker ACC. At each combination, the possibilities are that one or both of the operands may be variables (denoted by V) and one may be the accumulator (denoted by ACC). Using TOS to represent the top operand of the stack, TOSM to represent the next lower operand of the stack and OP to represent the infix operator involved in the combination, then the code generated is as follows:

  1. If any operand position lower than TOSM in the stack contains the marker ACC then produce code
          ST TEMP
    
    and change position in operand stack containing ACC to the address of the temporary storage.
  2. Produce code as follows:
       TOSM     OP      TOS    CODE PRODUCED
       V1      +-*/      V2     L V1, OP V2
       ACC     +-*/      V      OP V
       V       +*       ACC     OP V
       V       -/       ACC     ST TEMP, L V, OP TEMP
    

Here OP denotes the mnemonic ADD, SUB, DIV or MPY according as the operator concerned is +, -, / or *. The stacking and code generation at each stage is shown in Fig. 8.1 for the expression a * b / (c + d).

(1) { a *
(2) { a * b /
(3) { ACC /                          L a, MPY b
(4) { ACC / (
(5) { ACC / ( c +
(6) { ACC / ( c + d )
(7) { T1 / ( ACC )                   ST T1, L c, ADD d
(8) { T1 / ACC }
(9) { ACC }                          ST T2, L T1, DIV T2
Fig. 8.1

The combination triggered by the relation + > ) causes the ACC to be stored under rule 1, while the V / ACC case causes the other temporary storage. The first storage into temporary workspace was caused by a partially evaluated expression being followed by a sub-expression in brackets which required the use of the accumulator. The second storage into workspace is required because the defined order code of the computer does not have the reverse divide operation:

Set Accumulator equal to contents of storage location divided by accumulator

In many cases these stores into temporary working spaces can be eliminated by delaying the code production until more information about the complete arithmetic expression is available.

8.4 An algorithm for generating code from a tree structure

In Section 7.4 a parser for an operator precedence grammar was defined which produced a binary tree as a C-structure. An example of the type of tree produced was given in Fig. 7.5. The tree structure consists of a set of nodes and at each node is situated an operator. Each node has two pointers (left and right) which point either to terminal identifiers or lower nodes of the tree. These will be called the left and right operand of the tree node. For each node, T, is defined:

      OP.T   the operator of node 
      L.T    the left hand operand of node T 
      R.T    the right hand operand of node T

In the notation in Section 8.3, combination produces the following action:

  1    OP.T  =  operator concerned in combination 
  2    L.T   =  TOSM 
  3    R.T   =  TOS 
  4    TOSM  =  T 
  5    Move stack pointer down one level

The expression (a + b*c)/(f*g - (d + e)/(h + k)) would, for example, produce the tree shown in Fig. 8.2. The circled numbers indicate the order in which the tree nodes are produced by the syntax analysis when generating the C-structure. A routine for producing code equivalent to the tree T could be defined as C(T). In the example, the node marked as 8 is the root of the tree and so C(8) defines the code required to evaluate the expression. The operation C(T) will be defined as one which generates code equivalent to the whole tree with T as root and, if this is a sub-tree of the whole C-structure, it will replace the sub-tree T in the structure by the special terminal symbol ACC. This symbol denotes that at this stage in the coding the value of the sub-tree is resident in the accumulator.

                     8:/
              /               \
           2:+                 7:-
          /    \            /       \
         a     1:*        3:*        6:/
               /  \      /   \      /    \
              b    c    f     g   4:+    5:+
                                 /  \   /  \
                                d    e h    k
Fig. 8.2

The action required when attempting to generate code for a node T will depend on the operator OP.T and whether the two sub-trees, L.T and R.T, consist of a variable, a node or the special symbol ACC. Therefore, a three by three table is required for each operator denoting the action to be taken for each possibility. Of course, several operators will have identical tables and, for the hypothetical order code given, only two tables are required (one for +,* and the other for -,/). These are given in Table 8.1.

OP.T = +,*       R.T = Variable V2    R.T = Node              R.T = ACC

L.T = Variable   L    V1              C(R.T)                  OP   V1
      V1         OP   V2              C(T)                    T = ACC
                 T = ACC

L.T = Node       C(L.T)               Either C(R.T), C(T)     ST   TEMP
                 C(T)                 or     C(L.T), C(T)     R.T = TEMP
                                                              C(T)

L.T = ACC        OP   V2              ST   TEMP               Illegal
                 T = ACC              L.T = TEMP
                                      C(T)

OP.T = -,/       R.T = Variable V2    R.T = Node              R.T = ACC

L.T = Variable   L    V1              C(R.T)                  ST   TEMP
      V1         OP   V2              C(T)                    R.T = TEMP
                 T = ACC                                      C(T)

L.T = Node       C(L.T)               C(R.T)                  ST   TEMP
                 C(T)                 C(T)                    R.T = TEMP
                                                              C(T)

L.T = ACC        OP   V2              Illegal                 Illegal
                 T = ACC

Table 8.1

As before, temporary storage is made available as required and can then be treated in the same way as any other variable. The tables have been written out in their simplest form. In an actual implementation it is quite likely that some short cuts will be taken. For example, the (NODE,ACC) entry in the tables requires the accumulator to be stored, the left hand sub-tree to be coded, and then the (ACC,V) code to be produced. In the table this appears as generating the accumulator to storage instruction followed by re-entering the routine to code the new form of T which is now of the form (NODE,V). In practice, all the code required, other than the coding of the left node, would be defined at the (NODE,ACC) table entry rather than re-entering as given in the table. The main difference in the two tables is due to the order code lacking the reverse divide and negate and add instructions. Consequently, if the node T has two sub-trees, then for addition and multiplication it does not matter which sub-tree is coded first, whereas it is important to code the right sub-tree first for subtraction and division, as this may avoid some unnecessary temporary storages.

The example given in Fig. 8.2 would be coded as follows:

  (a) C(8). As both L.8 and R.8 are nodes, the action is given in the (NODE,NODE) entry for the operator division. This is C(7) followed by C(8).
  (b) C(7). Again the (NODE,NODE) entry is required. This is C(6) followed by C(7).
  (c) C(6). Again the (NODE,NODE) entry is required. This is C(5) followed by C(6).
  (d) C(5). Both L.5 and R.5 are variables. The (V,V) entry produces code:
      L   h
      ADD k

      leaving ACC in place of node 5.
  (e) Completing (c) above requires C(6) to be coded next. This now has R.6 equal to ACC. The (NODE,ACC) entry generates the code:
      ST  TEMP

      and replaces ACC by the variable TEMP and then re-enters to generate C(6).
  (f) C(6) is now (NODE,V) and requires C(4), the left hand sub-tree, followed by C(6).
  (g) C(4), like (d), produces:
      L    d
      ADD  e

  (h) As (g) leaves ACC in place of node 4, the form of 6 is now (ACC,V) and the code
      DIV  TEMP

      is generated and node 6 replaced by ACC.
  (i) C(7) must now be coded, etc.

The complete code produced using the tree structure is compared with the code produced using the elementary algorithm in Table 8.2. By delaying code generation until the complete C-structure is available, three store and three load instructions have been removed and, in addition, the number of temporary storage locations used could be reduced from four to one if the allocation scheme for temporary storage reuses locations.

  (a+b*c)/(f*g-(d+e)/(h+k))
ELEMENTARY ALGORITHM    TREE ALGORITHM
  L     b                  L     h
  MPY   c                  ADD   k
  ADD   a                  ST    T1
  ST    T1                 L     d
  L     f                  ADD   e
  MPY   g                  DIV   T1
  ST    T2                 ST    T1
  L     d                  L     f
  ADD   e                  MPY   g
  ST    T3                 SUB   T1
  L     h                  ST    T1
  ADD   k                  L     b
  ST    T4                 MPY   c
  L     T3                 ADD   a
  DIV   T4                 DIV   T1
  ST    T3
  L     T2
  SUB   T3
  ST    T2
  L     T1
  DIV   T2
Table 8.2

The unnecessary stores generated by an expression of the form a*b*(c+d) have not been removed. The tree algorithm is required to code a tree with the second * as the root operator. In this (NODE,NODE) situation, the entry gave two alternatives and, without further analysis, the algorithm is unable to decide which is preferable. In the example, it is obviously preferable to code the right sub-tree, c+d, first. A simple exploration of the tree would solve this particular case but, in general, it is usually better to attempt to solve the problem before the tree is constructed. The problem arises because a*b is formed into a tree before the sub-expression c+d appears in the stack.

With a string of operands connected by operators of the same precedence class (often called a segment), there is no reason why these have to be combined as soon as they appear in the stack. The relations * > *, * > /, + > +, etc. cause this immediate combination. If these relations were changed to =, then operators of the same class would fill the stack until the complete segment was available, when an operator of a different class would cause combination of the whole segment. An expression a*b*c*d, for example, instead of having a fixed structure (((a*b)*c)*d), would have the decision on its structure deferred until the whole segment had been loaded into the stack. A simple algorithm for improving the code generated is to sort the operands so that combination occurs between the non-terminal expressions first, followed by the terminals. This can be achieved in the stack by re-ordering the operands so that the highest operands in the stack are the non-terminal expressions and then combining from the top of the stack to form the tree. The expression a*b*(c+d)*e would then, in effect, have the structure (a*(b*(e*(c+d)))).
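The re-ordering just described might be sketched as follows (tuples stand for bracketed sub-expressions; the sort is stable, so variables keep their relative order):

```python
def build_segment_tree(operands, op='*'):
    """Re-order a segment so sub-expression operands (tuples) sit highest in
    the stack, then combine from the top down into a right-leaning tree."""
    ordered = sorted(operands, key=lambda x: isinstance(x, tuple))
    tree = ordered[-1]                     # topmost stacked operand
    for v in reversed(ordered[:-1]):       # combine outwards from the top
        tree = (op, v, tree)
    return tree

# a*b*(c+d)*e acquires the structure (a*(b*(e*(c+d)))):
print(build_segment_tree(['a', 'b', ('+', 'c', 'd'), 'e']))
```

Coding this tree needs no temporary storage: c+d is evaluated first and the remaining variables are simply multiplied in.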

8.5 More complex algorithms

More sophisticated algorithms than those described above have been used to deal with the problems of long segments of operands combined by operators of the same precedence class. These usually involve defining the C-structure in a more fluid form so that rearrangements can be made easily.

An example of such a structure is the one used in the early Fortran compilers and defined by Sheridan (1959). Basically, the C-structure consists of a set of triples, written (N,OP,D) where N is a number given to the segment, and OP is the operator defined on the variable D. D can either be a variable or a segment number defining a subexpression. The expression a*b*(c+d)*e would produce the triples (0,*,a), (0,*,b), (0,*,1), (0,*,e), (1,+,c), (1,+,d). The advantage of the method is the flexibility of the structure. For example, the individual elements of a segment can be easily reordered. However, production of good code requires quite a large amount of sorting and rearranging of triples. Consequently it is much slower than the simpler tree algorithm.
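Sheridan's triples can be generated from a segment structure by numbering segments breadth-first; a sketch (the representation (op, operand-list) is illustrative, not Sheridan's own):

```python
from collections import deque

def to_triples(expr):
    """Flatten (op, operands) into Sheridan-style triples (N, OP, D), where
    N numbers the segment and D is a variable or a sub-segment's number."""
    out, queue, next_seg = [], deque([(0, expr)]), 1
    while queue:
        n, (op, operands) = queue.popleft()
        for d in operands:
            if isinstance(d, tuple):       # nested segment: refer by number
                queue.append((next_seg, d))
                out.append((n, op, next_seg))
                next_seg += 1
            else:
                out.append((n, op, d))
    return out

# a*b*(c+d)*e, as in the text:
print(to_triples(('*', ['a', 'b', ('+', ['c', 'd']), 'e'])))
```

Because each triple is free-standing, the elements of a segment can be sorted or re-ordered without rebuilding any linked structure, which is exactly the flexibility the method trades compilation speed for.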

Other coding algorithms of interest are given by Hawkins and Huxtable (1963); Kanner, Kosinski and Robinson (1965); Randell and Russell (1964); and Graham (1964). Details of a tree algorithm similar to that defined in the previous section are given by Anderson (1965). A modified form of this for a computer with several accumulators is given by Nakata (1967).

8.6 Unary operators and transfer functions

A unary operator is one having only one operand. This is in contrast to *,/ etc. which are binary operators. The most common unary operator is the unary minus in, for example, (-a)*b. The hypothetical order code defined in Section 8.2 allows the load and store instructions to be modified so that the absolute value of the operand can be used if the mnemonic has an A inserted at the end. This modification could be made use of in the type of code generated, and consequently the function ABS should be treated as a special unary. In some computer order codes most of the standard functions such as SIN, COS etc. have special computer instructions associated with them so that again these can be treated specially. The functions to be treated as unary operators will depend entirely on the order code and the sophistication of output code required. The problems of code generation often arise from the inconsistencies of the order code. In the hypothetical order code, the reverse divide and negate and add instructions were missing. Similarly, the absolute value of the operand could be taken only in the load and store instructions.

In the example, arithmetic operations were either normalised or unnormalised. Because of the representation of INTEGER quantities chosen, a number may be changed from INTEGER form to REAL form by simply performing some normalised operation on the number. This transfer function, changing the mode of the number from INTEGER to REAL, will be called the mode transfer function R (this is equivalent to the Fortran function FLOAT). Similarly the transfer function, which produces the integer part of a REAL number as an INTEGER, will be given the name I (equivalent to the Fortran function FIX). Again, problems will arise because one transfer function R is simple to produce and often requires no additional instructions (its effect can be amalgamated with one of the arithmetic operations) whereas the transfer function I may be more difficult to produce and require additional instructions. In this case, it is important to avoid any unnecessary transfers to the mode INTEGER. For example, in Algol, the arithmetic expression (i*j + k*m)*a is of mode REAL where i,j,k,m are INTEGER and a is REAL. The sub-expressions i*j and k*m are of mode INTEGER. One of these must be calculated and stored in working space on a computer with only one accumulator. In a simple approach, as this expression is of mode INTEGER and as the multiplication will have resulted in a normalised floating point number, a transfer function I must be applied before storing in the workspace. However, as a more detailed analysis would show that the final result is of mode REAL, the transfer function is not required. The problem could be further complicated by the sub-expression appearing elsewhere in a position where it was important that the expression was of mode INTEGER. In this case the transfer function would be necessary. These are the kinds of problems facing the code generator.

During the code production, the contents of the accumulator or store location can be in one of three modes:

  F   Normalised floating point 
  S   Unnormalised floating point 
  M   Normalised floating point integer (obtained when multiplying 
                                         two INTEGER quantities)

Similarly arithmetic expressions can have one of the following sign functions operating on them:

  1    + 
  2    -
  3    ABS 
  4    - ABS

8.6.1 Defining code during stack combination

In the simple algorithm defined in Section 8.3, code is generated as the stack is combined. Therefore, the particular form of the code required must also be defined at this point. In the other algorithms described, the code is generated from the C-structure and so details of whether the code should be normalised or unnormalised could be left until then, or alternatively this may be decided at the formation of the C-structure.

The simplest method of keeping the additional information required is to have two additional columns in the stack. The first defines the mode of the operand and the second defines the sign function still to be applied to the result. The aim will be to keep the mode produced as close to the desired mode as possible without generating additional instructions, and similarly attempt to keep the sign function to +. It may be necessary to have a third column defining the mode required of the result. This would either be INTEGER or REAL in the example.

In the simple algorithm defined in Section 8.3, there was only one possible action for each of the different combinations of TOSM, OP and TOS. This action would now be expanded to a four by four table depending on the sign function to be applied to TOSM and TOS. The action required for different operators will be quite different now. A table for the operator * and the case where operands TOSM and TOS are both variables and not ACC is given as an example in Table 8.3.

OP = *      TOS:  +              -              ABS             -ABS
TOSM:
  +         L   V1  [+]    L   V1  [-]    LA  V2  [+]     LA  V2  [-]
            MPY V2         MPY V2         MPY V1          MPY V1

  -         L   V1  [-]    L   V1  [+]    LA  V2  [-]     LA  V2  [+]
            MPY V2         MPY V2         MPY V1          MPY V1

  ABS       LA  V1  [+]    LA  V1  [-]    L   V1  [ABS]   L   V1  [-ABS]
            MPY V2         MPY V2         MPY V2          MPY V2

  -ABS      LA  V1  [-]    LA  V1  [+]    L   V1  [-ABS]  L   V1  [ABS]
            MPY V2         MPY V2         MPY V2          MPY V2

Table 8.3

Apart from the code to be generated, each entry contains the sign function acting on the result. Notice that the ABS,ABS entry would require four instructions to generate the correct code including the sign:

   LA    V1
   ST    TEMP
   LA    V2
   MPY   TEMP

In the hope that this can be avoided at a later stage, the equality ABS(a*b) = ABS(a) * ABS(b) is used to leave the function ABS operating on the result. Similar tables for the other operators are required. Stacking and code generation for the expression:

  ABS(a * -b + c* ABS(d)) * ABS(e)

is shown in Fig. 8.3. The complete stacking and combination is shown for the first ABS. The stages showing combination for the other occurrences have been omitted. The first rule of the algorithm which stores the accumulator in temporary workspace would, of course, be modified to take account of any ABS function operating on the accumulator. In this case STA instead of ST would be used, and the ABS part of the sign function removed.

(1) { ABS (
(2) { ABS ( a[+] * b[-]
(3) { ABS ( ACC[-] +                 L a, MPY b
(4) { ABS ( ACC[-] + c[+] * d[ABS] )
(5) { ABS ( T1[-] + ACC[+] )         ST T1, LA d, MPY c
(6) { ACC[ABS] * e[ABS] }
(7) { ACC[ABS] }
Fig. 8.3
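The pattern in Table 8.3 can be captured by a small rule rather than stored as sixteen entries; a sketch (V1 is TOSM, V2 is TOS; the helper name is illustrative):

```python
def mul_entry(s1, s2):
    """Table 8.3 entry for V1 * V2 under sign functions s1, s2 (each one of
    '+', '-', 'ABS', '-ABS'); returns the code and the sign function left
    acting on the result."""
    neg = s1.startswith('-') != s2.startswith('-')   # product of the signs
    abs1, abs2 = s1.endswith('ABS'), s2.endswith('ABS')
    if abs1 and abs2:                 # defer, using ABS(a)*ABS(b) = ABS(a*b)
        code, res_abs = ['L   V1', 'MPY V2'], True
    elif abs1:                        # absorb ABS into the load of V1
        code, res_abs = ['LA  V1', 'MPY V2'], False
    elif abs2:                        # absorb ABS into a load of V2 instead
        code, res_abs = ['LA  V2', 'MPY V1'], False
    else:
        code, res_abs = ['L   V1', 'MPY V2'], False
    if res_abs:
        return code, '-ABS' if neg else 'ABS'
    return code, '-' if neg else '+'

print(mul_entry('+', 'ABS'))    # -> (['LA  V2', 'MPY V1'], '+')
```

Only the (ABS,ABS) case defers its sign function, since computing it exactly would cost the four instructions shown above.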

The fourth column, which defines the mode, would be used to indicate whether a normalised or unnormalised operation is required. In this case, the following tables indicate the type of operation required where one operand is a variable and the other the accumulator.

   +    F    S    M
   F    NF   NF   NF
   S    NF   US   US
   M    NF   US   NM

   *    F    S    M
   F    NF   NF   NF
   S    NF   NM   NM
   M    NF   NM   NM

The first letter in the table defines whether the code generated is to be normalised (N) or unnormalised (U) and the second defines the mode of the result. Notice that in the addition table, an unnormalised operation can force the result back to integer mode if the original mode was M. This would happen in generating code for i*j+k where all the variables are INTEGER.

This method does not deal with the mode transfer operators very well. The table for + could be altered, for example, if a REAL result was wanted so that the S,S entry was a normalised operation. However, as before, more details of the whole C-structure are required to get efficient code for mode transfer functions.
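As an illustration of these tables, the modes in i*j+k (all variables INTEGER) can be tracked mechanically; a sketch in which the dictionaries simply transcribe the two tables above:

```python
# Mode tables for + and *: entry = (N or U operation, mode of the result).
ADD = {('F', 'F'): ('N', 'F'), ('F', 'S'): ('N', 'F'), ('F', 'M'): ('N', 'F'),
       ('S', 'F'): ('N', 'F'), ('S', 'S'): ('U', 'S'), ('S', 'M'): ('U', 'S'),
       ('M', 'F'): ('N', 'F'), ('M', 'S'): ('U', 'S'), ('M', 'M'): ('N', 'M')}
MUL = {('F', 'F'): ('N', 'F'), ('F', 'S'): ('N', 'F'), ('F', 'M'): ('N', 'F'),
       ('S', 'F'): ('N', 'F'), ('S', 'S'): ('N', 'M'), ('S', 'M'): ('N', 'M'),
       ('M', 'F'): ('N', 'F'), ('M', 'S'): ('N', 'M'), ('M', 'M'): ('N', 'M')}

# i*j with INTEGER (mode S) operands leaves a normalised floating point
# integer (mode M); the subsequent unnormalised ADD of k forces it back to S.
op1, m1 = MUL[('S', 'S')]
op2, m2 = ADD[(m1, 'S')]
print(op1, m1, '->', op2, m2)   # -> N M -> U S
```

No explicit FIX is needed: the U operation produced by the table does the job, just as in example 6 of Section 8.2.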

8.6.2 Unary operators in the tree algorithm

In this approach the operators R, I, ABS, and unary minus are left as part of the C-structure. They appear in the tree as nodes with only one operand. They can be thought of as modifying the binary node or terminal below. Any implicit mode changes in the arithmetic expression can also be inserted as explicit unary operators in this manner. The mode transfer function (MTF) is defined as the total effect of all modal functions on the node below, and similarly the sign transfer function (STF) is the total effect of the sign unaries on the node.

The sub-trees L.T and R.T are now treated as terminal if they are variables and the MTF (and similarly the STF) operating on them can be produced without extra use of the accumulator. As shown before, the operator I, which takes the integral part, may well use the accumulator when acting upon a REAL quantity; a variable under such an operator would therefore not be terminal.

As coding of the tree takes place, the accumulator and temporary storages will contain quantities which resemble what is actually wanted at each point in the coding of the tree. If it is impossible to produce the exact form in the minimum number of orders, then the difference is passed up the tree in the hope that, at a higher level, the actual contents of the accumulator agrees with what is wanted. There are seven different mode transfer functions that will arise. The definitions given here are in terms of the accumulator but also apply for storage locations.

   S    the accumulator contains an unnormalised integer and this is required.
   F    the accumulator contains a normalised floating point number and this is required.
   IM   the accumulator contains a normalised floating point number which is an integer and eventually this is required to be an INTEGER in unnormalised form.
   RM   the accumulator contains a normalised floating point number which is an integer and eventually this is required to be a REAL number.
   RS   the accumulator contains an unnormalised integer which is eventually required to be a REAL number and therefore normalised.
   IF   the accumulator contains a normalised floating point number and an INTEGER number equal to the integral part of this is required eventually.
   RIF  the accumulator contains a normalised floating point number and a REAL number, equal to the integral part of this, is required eventually.

This notation is slightly ambiguous as S,F are used both to represent the mode of a variable and also the null transfer function operating on a variable of that mode. The transfer functions I and R have been subdivided into classes depending on the particular form of the accumulator or storage location. Starting at the lower node, L.T or R.T, each transfer function can be applied to the MTF operating on the node to give the complete MTF acting on the node. Table 8.4 gives the resulting MTF given by either I or R operating on one of these classes. The total sign transfer function acting on the node can be built up in a similar manner. Here there are only the four possibilities +, -, ABS, - ABS.

        S    F    IM   RM   RS   IF   RIF
   I    S    IF   IM   IM   S    IF   IF
   R    RS   F    RM   RM   RS   RIF  RIF

Table 8.4

The type of code required where one subtree is a variable and the other the accumulator can be obtained, as before, by looking up the relevant entry in a four by four sign table and a seven by seven mode transfer table. Really, a separate four by four table is required for each entry in the seven by seven table. However, the number of different possibilities usually turns out to be quite small. Table 8.5 shows the code required for the case (ACC,V) where the sign function on both nodes is +. The code produced would be an ADD instruction. Table 8.5 defines whether this operation should be normalised or unnormalised and the resulting MTF still to be applied to the accumulator ACC which replaces the mode in the tree.

                               VARIABLE
  +      S      F      IM     RM     RS     IF     RIF
ACC
  S      U S    N F    U S    U RS   U RS   U S    U RS
  F      N F    N F    N F    N F    N F    *      *
  IM     U S    N F    N IM   N RM   U RS   *      *
  RM     U RS   N F    N RM   N RM   U RS   *      *
  RS     U RS   N F    U RS   U RS   U RS   U RS   U RS
  IF     U S    α      α      α      U RS   α      α
  RIF    U RS   α      α      α      U RS   α      α

Table 8.5

The aim has been to keep REAL quantities that are integer in value in an unnormalised form as this is the more useful of the two forms. The entries marked with * are almost certainly non-terminal positions. The accumulator will have to be stored; the integral part of the variable has to be found, followed by the addition of the temporary storage containing the original value of the accumulator. When deciding which of the two sub-trees to code first, the R.T sub-tree would therefore have been computed first in this case if the L.T sub-tree was terminal. The entries marked α can probably take the integral part of the accumulator in the hypothetical order code without requiring any temporary storage. The unnormalised zero given earlier could be added, for example. This may not be the case for some order codes, and then these entries would also denote non-terminal positions requiring storage of the accumulator.

Most of the handling of mode transfer functions could have been used in the simple stack algorithm described initially. However, this would not normally be done, as the quality of code produced would not warrant it. The handling of mode transfer functions in the tree becomes more important as soon as the algorithm is extended to handle common sub-expressions. A single sub-tree may be pointed at from several places each with different mode transfer functions acting on the common sub-expression represented by the sub-tree. The common sub-expression would be coded and stored in a temporary storage location. The parts of the mode transfer functions not coded would then be left as unary operators between the nodes pointing at the common sub-expression and the temporary storage location.

[Tree for the * node: the unary-operator chain ABS, R, -, I, ABS acts on the variable J (mode S, sign +), and the chain I, -, ABS, R acts on ACC (mode IM, sign -)]
Fig. 8.4

Consider for example the Fortran expression:

ABSF(FLOATF(-XFIXF(XABSF(J))))* XFIXF(-ABSF(FLOATF(-I*J)))

which after I*J has been coded would leave a tree as shown in Fig. 8.4 with the MTF and STF acting on the accumulator and variable J given in brackets. Calculating the total MTF and STF for the terminals gives:

   MTF for ACC is IM 
   MTF for J is RS 
   STF for ACC is -ABS 
   STF for J is ABS

Looking up the relevant code in the seven by seven table for the multiply operator and the four by four sign table would probably define the code required as

   MPYN   J

with an MTF of RM and an STF of -ABS still to be applied to the terminal ACC which replaces this sub-tree.

8.7 Coding simple common sub-expressions

A simple common sub-expression is defined as one in which two or more appearances of the sub-expression are unaffected by any intermediate assignments. For example the arithmetic expression:

  a + b*c + (c*b + a)*f

has the simple common sub-expressions b*c and a + b*c. If the C-structure is a tree, then recognition of identical sub-trees is equivalent to finding simple common sub-expressions. There are two basically different approaches to how this should be done. The first examines the completed C-structure and by a comparison of all nodes will eventually find the common sub-expressions. This is the method used if the C-structure consists of triples as in Sheridan's algorithm. It tends to be slow as a large amount of sorting and examination is required. The second method endeavours to find the common sub-expressions as the C-structure is being generated. This is the method usually adopted with the tree algorithm and the techniques used in an implementation will be described.

It is now necessary to consider more closely how the tree is stored.

As the tree node is completely defined by its two operands and operator, these can be used as a key to look up the node in what is known as a graph table. If the node already appears in the table, then a similar node has appeared before and this must therefore be a common sub-expression. Instead of producing another identical entry, the same entry is used for both occurrences of the common sub-expression. The tree has now become a directed graph as two pointers exist to a node. The graph for a + b*c + (c*b + a)*f is given in Fig. 8.5.

[Directed graph for a + b*c + (c*b + a)*f: the sub-expressions b*c and a + b*c each have two pointers to their nodes]
Fig. 8.5

The most likely type of table to use for storing the graph would be a hash table. The use of the stack for the expression above is shown in Fig. 8.6, and Table 8.6 shows the graph table as a simple look-up table. As each operand is about to be loaded into the stack, it is first looked up in the graph table and, if present, its entry position is loaded into the stack. If not, a new entry is formed in the graph table. This is made to look like a normal node by having a null left operand and a zero operator. The fourth column of the graph table contains the occurrence number, which gives the number of times that a pointer to the node has appeared in the graph. It is set to one when the entry is first defined and is only defined for non-terminals.

   (1)  { 1 + 2 * 3 +
   (2)  { 1 + 4 +
   (3)  { 5 + ( 3 * 2 +
   (4)  { 5 + ( 4 + 1 )
   (5)  { 5 + ( 5 )
   (6)  { 5 + 5 * 6 }
   (7)  { 5 + 7 }
   (8)  { 8 }

Fig. 8.6
        L.T   R.T   OP.T   OCCURRENCE NUMBER
   1     -     a     0
   2     -     b     0
   3     -     c     0
   4     2     3     *     1 then 2 then 1
   5     1     4     +     1 then 2
   6     -     f     0
   7     5     6     *     1
   8     5     7     +     1

                  Table 8.6

In the example, the stack is filled initially until combination occurs for the operator *, causing the entry 4 in the graph table. If the operator is commutative, then the operands are entered in the table with L.T having the lower position in the graph table. At stage (3), combination occurs for the node defined by 3 * 2 (L.T is the entry 3 in the graph table; entry 2 is R.T; the operator is *). This is rearranged as 2 * 3 (* is commutative) and, on look-up, this entry is found to be already in the table at position 4. This is a common sub-expression and the occurrence number is incremented (now set to 2). At stage (4) in the stacking, the node 4 + 1 is equivalent to 1 + 4 (+ is commutative) and this already appears in the graph table as entry 5. The occurrence number of entry 5 is therefore incremented to give the value 2. At this stage the sub-expression a + b*c has been recognised as common and also the expression b*c. As b*c is part of the larger common sub-expression and will not be pointed at directly a second time, the occurrence number for entry 4 is decreased by 1, leaving it with the value 1. Explicitly, the action taken when a node is found to be already in the table is as follows:

The occurrence number for the node is incremented by 1 and, if the occurrence number of L.T is greater than 1, then it is decreased by 1; similarly for R.T.
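The interning scheme and the occurrence-number rule can be sketched as follows (an illustrative Python rendering, not the book's implementation; the names GraphTable, terminal and node are invented):

```python
class GraphTable:
    """Stores graph nodes keyed by (L.T, R.T, OP.T); a repeated key
    is a common sub-expression (a sketch of the scheme in the text)."""
    COMMUTATIVE = {"+", "*"}

    def __init__(self):
        self.rows = []        # each row: [lt, rt, op, occurrence number]
        self.index = {}       # (lt, rt, op) -> position in rows

    def terminal(self, name):
        # Terminals look like nodes with a null left operand and zero operator.
        return self._enter(None, name, 0)

    def node(self, lt, rt, op):
        if op in self.COMMUTATIVE and lt > rt:
            lt, rt = rt, lt   # canonical order: L.T has the lower position
        return self._enter(lt, rt, op)

    def _enter(self, lt, rt, op):
        key = (lt, rt, op)
        if key in self.index:               # seen before
            pos = self.index[key]
            if op != 0:                     # a common sub-expression
                self.rows[pos][3] += 1
                # a sub-tree of a larger common sub-expression will not
                # be pointed at directly a second time
                for sub in (lt, rt):
                    if self.rows[sub][3] > 1:
                        self.rows[sub][3] -= 1
            return pos
        self.index[key] = len(self.rows)
        self.rows.append([lt, rt, op, 1 if op != 0 else 0])
        return self.index[key]
```

Building a + b*c + (c*b + a)*f with this class reproduces the occurrence numbers of Table 8.6: the entry for b*c goes 1, then 2, then back to 1.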

The algorithm for code generation will now of course have to be modified to take account of common sub-expressions. The function C(T) is the same except that, after generating the code for node T, the occurrence number of T is examined and, if greater than 1, the code

    ST   TEMP

is generated and the tree node changed to the terminal TEMP. Later in the code generation, when alternative paths of the graph arrive at this point, the value of the common sub-expression that is stored in TEMP will be used instead of coding it again. The reason for decreasing the occurrence number of b*c is now obvious, as otherwise an unnecessary ST instruction would have been generated.
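The modified C(T) might look like this in outline (a hedged Python sketch; the dictionary node representation and the restriction that R.T is a terminal or an already-stored common sub-expression are simplifications made here, not the book's notation):

```python
OPS = {"+": "ADD", "*": "MPY"}

def C(node, code, counter=[0]):
    """Generate code for node, leaving its value in the accumulator.
    A node is a variable name (str) or a dict {lt, rt, op, occ}."""
    if isinstance(node, str):
        code.append(f"L     {node}")
        return
    if "temp" in node:                      # second and subsequent appearances
        code.append(f"L     {node['temp']}")
        return
    C(node["lt"], code, counter)            # value of L.T now in the accumulator
    right = node["rt"] if isinstance(node["rt"], str) else node["rt"]["temp"]
    code.append(f"{OPS[node['op']]}   {right}")
    if node["occ"] > 1:                     # occurrence number > 1: store it once
        counter[0] += 1
        node["temp"] = f"TEMP{counter[0]}"
        code.append(f"ST    {node['temp']}")
        # the tree node has now become the terminal TEMPn
```

Running this on the graph for a + b*c + (a + b*c)*f stores a + b*c once and reloads it from temporary storage at its second appearance.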

The presence of common sub-expressions also influences the choice of paths through the graph when generating code. As shown before, there is a certain amount of flexibility in which sub-tree to code first when both L.T and R.T are non-terminal. However, if one of the sub-trees is a common sub-expression then it needs to be stored in any case, so this would seem to be the sub-tree to code first. For example, E + b*c, where E represents a common sub-expression, would best be coded as:

   C(E) 
   ST     TEMP
   L      b
   MPY    c
   ADD    TEMP

The value of TEMP would then be used whenever code was wanted for the second and subsequent appearances of the common sub-expression. However, this is not completely correct, as another use of E might appear in the coding of the other sub-tree. Consider for example (a + E)*E. When coding the node T, where OP.T = * and L.T is the expression a + E and R.T = E, it would be wrong to code the right sub-tree first, as this would produce the following code:

   C(E) 
   ST     TEMP
   L      a
   ADD    TEMP
   MPY    TEMP

A load operation can be avoided by coding the left sub-tree first:

   C(E) 
   ST     TEMP
   ADD    a
   MPY    TEMP

The general rule is therefore: if R.T is a common sub-expression and the same common sub-expression appears as part of L.T, then code L.T first; otherwise code R.T first (and similarly with the roles of L.T and R.T interchanged). It is usually impracticable to do a deep search for the common sub-expression, but it might be worth going down a depth of one or two levels in the graph.
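The rule, with its shallow search, might be sketched as follows (illustrative Python; contains and code_first are invented names, and identity of shared graph nodes stands in for equality of sub-expressions):

```python
def contains(tree, target, depth=2):
    """Shallow search, to a depth of one or two levels as the text
    suggests, for the shared node 'target' inside 'tree'.
    Nodes are dicts with keys lt/rt; terminals are variable names."""
    if tree is target:
        return True
    if depth == 0 or isinstance(tree, str):
        return False
    return any(contains(tree[k], target, depth - 1) for k in ("lt", "rt"))

def code_first(lt, rt, is_cse):
    """If R.T is a common sub-expression that also appears in L.T,
    code L.T first; otherwise code R.T first."""
    if is_cse(rt) and contains(lt, rt):
        return "L.T"
    return "R.T"
```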

8.8 Coding sets of assignment statements

The benefits gained from coding several statements together, rather than separately, are mainly that common sub-expressions can be found over a larger distance and the reloading of values after assignment (if required in the next statement) can be avoided. It may be possible to permute the order of execution of statements to produce better code, although care must obviously be taken that the assigned values are not used in the statements between those permuted.

Sets of assignment statements will be considered which have no path of control into the set other than at the top and which exit at the bottom. The only addition to the tree algorithm, defined so far, is to have a means of linking the separate trees of the individual statements together. An obvious method is to have a statement list which points to the = nodes of the statements as they appear. For example the statements

   a = b*c + d
   e = b*c + f

would produce a graph as shown in Fig. 8.7. The circled numbers denote the position of the statement in the statement list.

[Graph for the two statements: circled numbers 1 and 2 on the statement list point at the two = nodes, which share the sub-expression b*c]
Fig. 8.7

8.8.1 Bogus common sub-expressions

Once assignments to variables can occur, it is possible to pick up bogus common sub-expressions unless care is taken. For example:

   a = b*c 
   c = d 
   e = c*b

The expression b*c is not common, as an assignment to the variable c occurs in an intervening statement. Probably the simplest method of avoiding such false recognition is to substitute the right hand side of an assignment statement for the left hand variable whenever the left hand variable appears in later statements. This requires an extra entry in the graph table pointing back from the assigned variable to the = node. This will be called the LHS back link (Left Hand Side). The rule adopted is as follows:

Before using any graph table entry to produce a new node, it is first examined to see if the LHS back link is set and, if so, the entry pointed to by the LHS back link is used instead. The example above would produce the tree shown in Fig. 8.8.

[Graph for the three statements (circled 1, 2, 3 on the statement list): the * node of statement 3 takes b and, via the LHS back link, the = node of statement 2 in place of c]
Fig. 8.8
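The LHS back-link rule can be sketched as follows (illustrative Python; the names operand and assign, and the dictionary row layout, are assumptions of this sketch):

```python
# Graph table rows are dicts; terminals have op 0.  An "lhs" field, when
# present, is the LHS back link from an assigned variable to its = node.

def assign(var_pos, eq_pos, table):
    """Record an assignment: point the assigned variable's LHS back
    link at its = node."""
    table[var_pos]["lhs"] = eq_pos

def operand(pos, table):
    """Before entry 'pos' is used in a new node, follow its LHS back
    link (if set) so the = node is used instead of the stale variable."""
    link = table[pos].get("lhs")
    return link if link is not None else pos
```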

8.8.2 Statement rearrangement

Consider the example:

   a = b*c 
   d = e 
   f = a+b

If taken in order, code produced for these statements would be:

    L    b
    MPY  c
    ST   a
    L    e
    ST   d
    L    a
    ADD  b
    ST   f

instead of the optimal code:

    L    e
    ST   d
    L    b
    MPY  c
    ST   a
    ADD  b
    ST   f

It is necessary to delay the coding of the first statement if it is substituted in a later statement. However, there are dangers in such an action. Consider for example:

   a = b*c 
   c = e 
   f = a+c

where it would be wrong to make any rearrangement in the order. A simple rule to decide whether or not to delay the coding of a statement is as follows:

If a statement V assigns a value to the variable b and a later statement X uses b on the right hand side, then the coding of V can be delayed and included in the coding of X only if none of the intervening statements change any of the variables used on the right hand side of statement V.

This can be implemented by having two additional columns in the graph table for the following entities:

  1. Coupling Bit. This is set on for an = node as soon as the assignment is made and when on implies that there is no reason why this statement has to be coded at this point. The bit is set off as soon as any of the variables on the right hand side of this assignment statement are changed by subsequent assignment statements.
  2. Delay Bit. If a statement V assigns a value to the variable b and later b is used in another statement then, when this substitution takes place, if the coupling bit for the = node of V is still on, the delay bit is set. Code is then produced by taking each statement in turn from the statement list and coding it unless the delay bit is set. No code is produced in its correct position for a statement whose delay bit is set; instead, code is generated for it at the same time as the statement into which it has been substituted.
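The bookkeeping for the two bits can be sketched as follows (an illustrative Python rendering; the class Statements and its field names are invented, and the substitution of right-hand sides itself is not modelled):

```python
class Statements:
    """Statement list carrying the coupling bit (cb) and delay bit (db)."""

    def __init__(self):
        self.stmts = []   # each: {"var", "uses", "cb", "db"}

    def add(self, var, uses):
        s = {"var": var, "uses": set(uses), "cb": True, "db": False}
        # turn off the coupling bit of any earlier statement whose
        # right-hand side uses the variable assigned here
        for t in self.stmts:
            if var in t["uses"]:
                t["cb"] = False
        self.stmts.append(s)
        # substitution: if an earlier statement assigned a variable used
        # here and its coupling bit is still on, delay it
        for t in self.stmts[:-1]:
            if t["var"] in s["uses"] and t["cb"]:
                t["db"] = True
        return s

    def order(self):
        """Coding order: each statement in turn unless delayed; a delayed
        statement is coded with the statement that absorbed it."""
        out = []
        for i, s in enumerate(self.stmts):
            if s["db"]:
                continue
            for j, t in enumerate(self.stmts[:i]):
                if t["db"] and t["var"] in s["uses"]:
                    out.append(j)
            out.append(i)
        return out
```

On the four-statement example below (a = b*c; c = d+g; e = b*c; f = a) this reproduces the behaviour described: statement 2 is delayed into statement 3, while statement 1 is not delayed because its coupling bit was already off.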

Consider how these bits would be used in the following example:

   a = b*c 
   c = d+g 
   e = b*c 
   f = a

At each production of an = node in the graph, the coupling bit must be set for that statement, and the coupling bits must be set off for all earlier statements that used the assigned variable. The LHS back link must also be set pointing back to the = node. At each substitution the delay bit is set if possible. The graph table with the additional columns, LHS back link (LHS), coupling bit (CB) and delay bit (DB), is shown for the completed graph in Table 8.7, and the graph itself is given in Fig. 8.9. The coupling bit for the first statement (entry 5) is put on initially but is turned off by the following assignment statement (entry 9). However, when the substitution for the variable c occurs in the third statement, the coupling bit for the second statement will still be on, so the delay bit will be set for the second statement. The substitution for the variable a in the fourth statement will not cause the delay bit of the first statement to be set, as the coupling bit has already been turned off.

[Graph for the four statements, with the statement list (circled 1 to 4) pointing at the four = nodes; the = node of statement 2 replaces c in statement 3, and the = node of statement 1 replaces a in statement 4]
Fig. 8.9
        L.T   R.T   OP.T   LHS   DB   CB
   1     -     a     0      5     0
   2     -     b     0
   3     -     c     0      9
   4     2     3     *      5     0
   5     1     4     =            0    1 then 0
   6     -     d     0
   7     -     g     0
   8     6     7     +
   9     3     8     =            1    1
  10     -     e     0     12
  11     2     9     *
  12    10    11     =                 1
  13     -     f     0     14
  14    13     5     =                 1

                  Table 8.7

Code is generated for statements 1, 3 and 4 in that order. The second statement will be coded as part of the third as the delay bit is on. The code produced would therefore be:

   L     b 
   MPY   c 
   ST    a 
   L     d 
   ADD   g 
   ST    c 
   MPY   b 
   ST    e 
   L     a 
   ST    f

At an assignment, the coupling bits for all statements using the assigned variable on the right hand side must be set off. This can be arranged by having a unique position in a bit vector assigned to each variable as it appears in the sequence of statements. With each tree node can be associated a bit vector with bits set in the positions corresponding to the variables which appear as operands in the complete sub-tree of this node. With each = node is, therefore, a bit vector which defines which variables appear on the right hand side. These can be interrogated at an assignment to see if the assigned variable has been used in any of the statements preceding it in the statement list. To avoid requiring a vector of unknown length, a set length can be defined and the last bit can be used to refer to the whole class of subsequent variables which are used after the other bits of the vector have been allocated. This obviously may cause more inefficient code to be produced, but it could be used as a rescue procedure until the end of the current assignment statement. Instead of producing the C-structure for the next statement, those statements already analysed would be coded.
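The bit-vector scheme lends itself to machine words (Python integers here). A minimal sketch, assuming a fixed vector length whose last bit stands for the whole class of later variables; the function names are invented:

```python
VECTOR_LENGTH = 32   # assumed word length for the bit vector

def bit_for(var, positions):
    """Allocate (or look up) the unique bit position for a variable as it
    appears; once all positions are used, later variables share the last bit."""
    if var not in positions:
        positions[var] = min(len(positions), VECTOR_LENGTH - 1)
    return 1 << positions[var]

def node_vector(lt_vec, rt_vec):
    """A node's vector is the union of its sub-trees' vectors, so an
    = node's vector records every variable on its right-hand side."""
    return lt_vec | rt_vec

def uses(stmt_vec, var, positions):
    """Interrogate an = node's vector at an assignment to var
    (conservative once the shared overflow bit is involved)."""
    return bool(stmt_vec & bit_for(var, positions))
```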

A set of statements coded together in this way must, of course, have no paths of control arriving at an intermediate point in the sequence as well as at the top. Consequently a labelled statement would cause the termination of the preceding sequence and the start of another, unless information is known about this alternative entry. Sometimes this information is available in the language definition. For example, the control variable in a Fortran DO statement cannot be altered in the body of the DO statement so that, even if several paths exist through the DO, these are known not to alter the controlled variable and, as far as it is concerned, the whole DO statement could be treated as a single sequence.

8.9 Global optimisation

So far, the algorithms described have been dealing with sets of arithmetic statements with the only path of control entering at the top and leaving at the bottom. The next sophistication would be to attempt to produce code for larger pieces of program with a control structure which could be represented by a directed graph and with assignment statements and expressions to be evaluated on the paths joining the nodes. The emphasis is now not so much on finding common sub-expressions but on moving the evaluation of expressions or assignments from one part of a program to a more advantageous one. The simplest example is an expression evaluated within a loop whose components are not assigned inside the loop. It would clearly have been better to have calculated the expression just once outside the loop.

Another example is the case where several branches meet at a point in the program and all but one require an expression to be evaluated. This same expression is required after the conjunction. In this case it would be better to force the evaluation of the expression in the branch where it was not required and remove it from the position after the join. The expression can then be assumed to be available from all branches at the conjunction, and it will not be necessary to recalculate the common sub-expression.

Several attempts to produce general algorithms of this form have been made. None has been completely successful as yet. All tend to be time consuming as the process has to be done recursively on simple graphical forms, and it is not obvious how much more efficient the final code produced by such an algorithm will be. Also the user tends not to be too happy with a code generator which reorders his code to such an extent that it is often quite difficult to find out where a program has reached in execution.

© Chilton Computing and UKRI Science and Technology Facilities Council webmaster@chilton-computing.org.uk