Chilton::ACL::Compiler Compiler 1966

The Brooker-Morris Compiler Compiler on the Bendix G21

F R A Hopgood

1. THE BROOKER-MORRIS COMPILER-COMPILER ON THE G21 (PDF)
2. THE COMPILER-COMPILER BOOTSTRAP (PDF)
3. THE COMPILER-COMPILER BOOTSTRAP (PDF)
4. THE PHRASE STATEMENT (PDF)
5. THE FORMAT STATEMENT (PDF)
- 5.1. Introduction
- 5.2. FORMAT
6. THE ROUTINE STATEMENT (PDF)
7. TRANSPLANT (PDF)
- 7.1. Introduction
- 7.2. TRANSPLANT

THE BROOKER-MORRIS COMPILER-COMPILER ON THE G21

1.1. Introduction

The Brooker-Morris Compiler-Compiler has been implemented in part on the G21. This first chapter will give details of how to use the C-C on the G21, the limitations of the system, peculiarities of the G21 system, etc. It should be pointed out that this implementation was carried out purely as a means of educating the author in the detailed coding of the C-C. Consequently, no claims for efficiency or usefulness are made. In fact, unless the system is run as a 64k program, it is only useful for student examples and familiarizing the user with the system. This, in part, is due to the crude method of implementation.

Following chapters will describe the bootstrapping process used to implement the system from the Atlas version and details of how the fundamental Master Phrases are implemented.

1.2. How to Run a C-C Program

At the moment the Compiler Compiler is situated on a Logical File Type. It requires approximately 60 blocks and a user may either use the standard version or obtain a copy of the system on his own logical file. The current custodian of the system will know the position of the C-C. At the moment it resides on FILE 46/1 of User CS07FH04, but this will probably be changed.

Details of the facilities available in the C-C are given in the paper, "The Compiler Compiler" in the Annual Review of Automatic Programming Vol.III. Knowledge of this paper is assumed and remarks will be made with respect to it. An example of a simple C-C run on the G21 would be:

$$330 AND CS07FH04 HOPGOOD
AN FILE 46/1(12);  FILE 48/1(13); TEXT;
       PHRASE [XYZ] =: AB, CD, EF
       DEFINE COMPILER 13 
AN RUN, (12), TAPE; 
$$

This illustrates the method of bringing down the Compiler-Compiler and also the Master Phrase for redefining an augmented version of the C-C. The 'DEFINE COMPILER' Master Phrase has just one argument which is the number of the logical file that the system is to be dumped on. In the example the phrase XYZ has been added to the C-C.

In order to run the C-C "as a 64k program it is necessary to patch two instructions in the C-C. This can be done by UPDATE and the above program would then be:

$$ 330 AND CS07FH04 HOPGOOD
AN FILE 46/1(12); FILE 48/1(13);  TEXT;
    LXS  12;
    PATCH  12021, 120000  .160000 ;
    TRA 12000;
    PHRASE [XYZ] = AB, CD, EF
    DEFINE COMPILER 13 
AN  RUN, UPDAT, TAPE; 
$$

Note that the C-C defined will now be a 64k version and to convert back to a 32k version requires:

     PATCH  12021, 60000 .67776;

A 64k program cannot, of course, be run from the teletypes in the normal manner.

1.3. Storage Layout

The present storage layout is:

            Input Buffer                 11300 
            Masks and useful constants   11540
            B variables                  11600
            Index                        12000
            G21 op-code skeletons        14000
            Record Store                 15000 to 52000 (approx)
            Stack                        60000
            Chain                        67776 (downwards)

The positions of stack and chain are moved in the 64k version so that about 16k of chain store is available and the record store can increase until it hits the monitor. If the record store does become this large, then it would be necessary to increase the LFT size from 60 to 70 blocks.

At present an interactive scope version is under construction which also uses the space below 11300.

1.4 G21 Characteristics of Bootstrap

The Bootstrap part of the C-C defined in Chapter 2 is machine dependent, and this section will describe G21 features of the bootstrap. This part was written in SPITE using a set of Macros which simulated the Atlas Machine Orders in the original version. It is hoped to be able to use the same macros in the 360-67 version.

1.4.1. Down Sequence

The routine entry sequence on Atlas used the B variables B91, B92, B93, B94 and B99. The G21 version does not destroy these variables. B99 is still used by the END sequence.

1.4.2. Input Routine 238

This routine takes the input and produces an internal form which is equivalent to the G21 internal character codes with the meta-characters defined as follows:

             eol           104
             EOL           107
             comma         105
             [             106
             space         101

The character 10 is used in place of ? and the operators ≤ , ≥ are replaced by < =, > =. The equivalence sign º is written as (=).

Routines added to the normal C-C routines are:

241: Prints B-variables from B40 to B127 (no parameters)
283: Prints store from B125 to B126

Two additional Master Phrases added to turn listing of the input on and off are:

     LIST
     DO NOT LIST

1.5. G21 Plant Routine

The Atlas PLANT basic statement has been replaced by the statement:

LOAD [MNEMONIC][COMMA][ABN][COMMA][ADDRESS][REG?] INTO [B}[SEP]

where

[ADDRESS] = [WORD], · [BFJX][N] + [WORD], ·[BFJX][N]
[BFJX]   = B,F,J,X
[REG]    = [COMMA][ABN]  
[MNEMONIC]= G21  MNEMONIC

The field [ABN] is a node field which is compulsory. The phrase [REG] defining the register part can be ignored. [B] defines the location where the instruction is to be placed and is automatically incremented.

In order to understand the address part of the field it is necessary to first define the G21 fields for the various fields in the data described in Chapter 2.

Field      Bits
  A        0 - 15
  F       16 - 17
  J       18 - 19
  S       20 - 21
  Z       22 - 31

As the higher fields appear in the operation part of an instruction it is necessary to provide.constant locations containing the relevant masks for these bits. The masks for the F and J fields are stored in addresses F₁, F₁, F₂, F₃, J₂, J₄, J₆. Also, it may be required to put an index position or B variable address in an instruction. The dot notation is used to denote that the address itself is required rather than the contents of the address. For example, if B32 = 6 then

LOAD  CAL, 2, B32  INTO Bl

will load the instruction CAL 2 6 while

LOAD  CAL, 2, .B32 INTO B1

will load the instruction CAL 2 /11640

The +[WORD] in the second [ADDRESS] alternative will add the value of word to the address defined in the first part, · [BFJX][N].

1.6. The Phrase [WORD]

This, to a large extent, was Atlas oriented in the definition of constants. This has been replaced by the following definition:

[WORD] = [ADDR], ([ADDR]), -[N], [N][JF?], [JF]
[JF]  =F1,F2,F3,J2,J4,J6

1.7. Limitations of G21 Version

The G21 version of the C-C is approximately equivalent to the Atlas version with the Auxiliary Statements undefined. On the Atlas version all simple forms of the Basic Statements were compiled rather than interpreted. On the G21 all Basic Statements are interpreted. No 'COMPILER' versions of the Basic Statement routines have been added. This was done partly because of storage problems and also because this is the point where G21 dependent routines would have to be added. As the first aim is to have the C-C implemented on the 360-67 this work to a large extent would be wasted.

The Auxiliary Statements have been omitted because, without the COMPILE versions of the Basic Statements, it is impossible to get the routines compiled in the 32k G21 store. These are all written in terms of CC orders and very little work is necessary to add these if a 360-67 version becomes available.

2. THE COMPILER-COMPILER BOOTSTRAP

2.1. Introduction

To initialize the C-C so that it is capable of self-extension and therefore able to build itself up to the required level we need a set of routines, tables and dictionaries which must be provided in a language independent of the compiler. This set is as follows:-

INDEX
INITIALIZATION ROUTINES 150 and 161
INPUT ROUTINES 238 and 261
MASTER ROUTINE 214
MASTER PHRASE DICTIONARY [MP] 130
DOWN and UP 239 and 240
ANALYSIS ROUTINE 215 with its subroutines 222 and 252
ITEM ASSEMBLY ROUTINE 266
VARIOUS MACHINE DEPENDENT ROUTINES

These routines need not be in their final form but must be capable of recognizing and generating ITEM routines. The final forms, of these routines can then be defined as ITEM routines replacing these temporary Versions. However, DOWN and UP will be inserted in their final forms as most of the mechanism is required; also the INDEX and routines 238 and 266 which are machine dependent and will not be changed.

2.2. INDEX

The INDEX is a set of fixed locations which we will denote by X₀, X₁, etc., up to a figure larger than X₃₀₀ and less than X₁₀₂₄. These provide locations for storing fixed global parameters of the system in a similar manner to the B variables. Their main purpose, however, is to contain the entry points to all routines and dictionaries. All routine and dictionary accesses are made via the address in their index position so that movement of a routine only requires changing the contents of the index position. Each routine has a number associated with it which corresponds to its index position, Besides the address of the routines each index entry contains a two-bit flag field F which is used to differentiate between different types of entries:-

F₀: Routines
F₁: Small Routines with simplified entry and exit
F₂: Dictionaries and Phrase definitions
F₃: Built-in Phrase Definitions.

As the Item Assembler is capable of defining all 4 types of entries in the record store but without the mechanism for differentiating between them, the Flag bits for entries being provided by the Item Assembler must be preloaded into the Index. Also, of course, the entry points for the routines in the.bootstrap must be inserted in the Index.

The complete list of entries necessary is as follows:-

INITIALIZATION PARAMETERS

X₀: On Atlas this contains an instruction which causes entry to the routine X150. This is due to Atlas requiring entry at the start of a block initially. If entry is made direct to X150 then this entry can be left empty.
X₂: Current origin of Record Store., i.e., pointer to next available location in the record store area.
X₃: = 300, the head of the Index Chain of unused Index Positions.
X₇: = Pointer to set of instructions loaded before an analysis record to cause the analysis record to be obeyed.
X₉: = 1024, the length of the Index.
X₁₀: = Numerical value of the EOL character. This will be used as a word to test against in the ITEM routines.
X₁₁: = Numerical value of the character zero. It will be assumed that the characters for the remaining digits have numerical values immediately above this. Therefore, for example, the value of character 9 in X₁₁ + 9.
X₁₂: = Numerical value of the $ character (or some other character) which is to be used to signify that the remainder of an input line, including this character, is to be treated as a comment.
X₁₃: = This location is used to store the address of a replacement ITEM while the new ITEM is being compiled.
X₁₄: = The entry pointer B71 is stored here on entering a Small routine. It is all that needs to be saved. Note that in general, a Small routine cannot call other routines.
X₁₅: = 0 for obtaining a listing of the input.
X₁₆: = Pointer to table of Operation Codes on the G-21 . It has been used as a pointer to allow the G-21 PLANT routine to access the parts of a G-21 opcode that are required.
X₁₇: = Stack Origin.
X₁₈: = Maximum Chain Address.
X₁₃₆: = 3, maximum number of lines allowed in a Source Statement.
X₁₄₀: = 10, maximum number of faults to be allowed by [MR],

The routines defined in the bootstrap must also have their entry points loaded in the INDEX. These are 130(F2), 150, 161 (F1), 214, 215, 216(F1), 222, 238, 239, 240, 241(Fl), 245, 248(Fl), 252(Fl), 261(Fl), 266, 275, 283(F1). The ITEM routine also has a subsidiary entry point at position 168 which is used to enter the routine for finding an integer arriving from the input.

In the original Atlas implementation storage was left following the INDEX so that a fast non-relocatable form of the Source Statement Analysis Routine and DOWN sequence could be inserted. This is not essential, but if implemented in this way, then storage should be reserved at this point. As it is not required until the complete Compiler has been produced, we will leave discussion of this until later.

Flag bits for the following positions must be set:-

F1: X170, 177, 183, 187 to 212, 216, 217, 219, 223, 228, 231 to 236, 248, 259, 262 to 265, 267 to 271, 273, 277
F2: X131, 132, 133, 142, 151, 156, 157, 159, 163, 168, 171, 174, 176, 179, 185, 229
F3: X148, 149, 152, 158, 160, 166, 167, 172, 173, 181, 184, 244, 250, 255

Finally the free Index positions are chained together starting at X300. Thus X300 contains 301, X301 contains 302, and so on. It is conventional to leave the last 20 or 30 Index Positions as a set of Small Routine positions for the user and these would not be chained. On Atlas these were positions 1000 to 1024. The last chained Index was therefore X998 which contained 999 and X999 was set to zero to indicate exhaustion of chain. The number of Index Positions depends only on the amount of storage available.

2.3. Initialization Routines

The initialization routine 150 together with its subroutine 161 define the storage layout of the compiler and initialize the Stack and Chain Store.

At this stage all that is required in Routine 150 is:-

Set B76 = Entry point of the DOWN routine.
Call R161
Enter MASTER ROUTINE

In general the user would not be expected to alter R150 but he could alter R161. (For example he might require a longer chain).

Routine 161 defines the positions of the stack, chain store and record store. In a machine with a conventional storage layout it would be reasonable to have:-

At the lowest address position would be space allocated to the 127 B variables followed by the INDEX table. Following this room would be left for the source statement analysis routine and then the bootstrap routines defined above. The record store will then be from this point up the store.
Near the top end of the store will be allocated an area to be used both by the stack and Chain Store. The CC is written so that increasing core addresses correspond to higher locations in the stack. The stack is therefore designed to eat into the chain store which is chained up so that the highest address on the chain will be used first (a description of chains is given in [6]). This does not give us any protection against the stack and chain running into each other but is sufficient until more complex routines can be added for storage allocation and protection.

If we assume the region extends from location N to M then we require:-

B72 = N          Base of working area
B90 = N+1        Stack Pointer
B89 = M-1        Chain Pointer

The Chain should be linked as follows:-

Address   Contents> 
N+3 
N+4         N+1
N+5 
N+6         N+3
...         ...
...         ...
M-4         M-7
M-3 
M-2         M-5
M-1 
M           M-3

In addition Routine 161 sets the following:-

X4 = B90 
X138 = 0         Number of faults detected by Master Routine
X140 = 10        Maximum number of faults allowed by Master Routine
B74 = X232       The entry address of the transplant routine.
B86 = 0          The line number
B87 = 300        The first free Index position
B88 = X2         First free address in record store

2.4. Input Routines

These routines will depend largely on the particular machine available. The Compiler Compiler restricts the number of possible input characters to 127 and these are assigned the numerical values 1 to 127. The correspondence between particular characters and numerical values is immaterial and can be chosen to suit the particular peripheral equipment or internal representation of a machine.

Routine 238 is designed to read the next section of input into a circular chain list (as described in reference [6]). Each chain position should contain a number between 1 and 127 corresponding to the character found on the input. The newline character should be assigned a value. It is unfortunate that some characters may be referred to specifically by their numerical values in the Item routines. The most likely characters involved in this way will be the Newline, Space, Comma and [. It is hoped to eventually remove these by having indirect references to these via the INDEX. The amount of input in a section is up to the user but the Item Routines assume one instruction per section so that having a section equivalent to a line is a reasonable choice initially. R238 exits with the next section in a circular chain list pointed at by B61. In addition the line number B86 is updated.

Routine 261 is used to convert some characters or character strings to unique special meta-syntactic characters required for the analysis of a line with respect to the Master Phrase dictionary. In particular it removes any redundant characters. Space and erase are non-significant for example when analysis takes place with respect to the [MP] dictionary; Also,it sets the numerical value for [ and , to special values which will be needed for checking against. In the original version the position 4 for EOL (End of Line), 5 for [ and 105(8) for COMMA were chosen. On the G21, 104, 106 and 105 octal were chosen, respectively. At the moment I am unclear how necessary this change is. To allow parts of a line to act as comment all characters after and including the one whose numerical value is stored in X12 is ignored ($ will be used initially).

The output form R261 is, in general, a reduced and modified form of the original string. The Routine expects B69 to point to the input chain list and on exit will have built up the reduced line and have B68 pointing at it. The original line is unchanged and may be required if, for example, the statement is not to be analyzed with respect to the Master Phrase Dictionary. It may be worth introducing at this stage some comment convention for ignoring parts of the input section.

2.5. Master Routine and [MP] Dictionary

Initially all that is required of the Master Routine is to be able to recognize the one entry in the [MP] dictionary which is ITEM. However, before the [MP] dictionary can be replaced by its final form it will be necessary to define a routine for replacing ITEM routines. Therefore, a second phrase REPLACE ITEM is also required to be recognized. Using the notation of the instructions in the Item Routine and the form of dictionaries defined in [5] we have for the Master Phrase Dictionary 130:-

H = 56 + J2      M - word
H = 19 + F1      & - word   
H = 7 + F1      & - word
C = I
C = T
C = E
C = M                    
H = 266          Index Position of ITEM Assembler
C = R
C = E
C = P
C = L
C = A
C = C
C = E 
C = I 
C = T 
C = E 
C = M 
H = 164          Index Position of Replace ITEM routine
H = 0 
H = 0

We are now in a position to define the action of the Master Routine 214. Brackets will be used in the Compiler-Compiler convention. For example (B61 + 1) means the contents of the address B61 +1. We have> therefore:-

Enter Input Routine 238
If line empty go to 1 [B61 = (B61+1)]
B85=B69=B61 and enter Conversion Routine 261
B61 = B68 = condensed form of input line
B62 = 130, [MP] dictionary position
B63 = 0, denotes primary entry to analysis routine
Enter Analysis Routine 215
If B64 ≠ 1 then no recognition and we have an error
B79 = B61, unrecognized part of line
Return original line pointed at by B85 and recognized part of line pointed at by B62 back to Main Chain
B77 = B63, analysis record of 'ITEM'
B60 = B77 - 5 (This may not be necessary)
Enter routine whose position in Index is (B77)
If B79 = 0 then no input remains to be analyzed and go to 1
If B79 ≠ 0 set B61 = B79 and go to 5.

2.6. DOWN and UP Routines

These two routines will not be changed later on as most of the mechanism is necessary immediately, and so it is convenient to store them in their final form initially.

There are basically two different entries to. the DOWN routine. The purpose of the DOWN routine in the normal entry is to update the stack before entering the new routine. Back and forward links are inserted in the stack and work space for the new routine is allocated. Sufficient information is stored so that the UP routine can restore the state of the stack to its initial form on exit from the routine.

The subsidiary entry is the TRANSPLANT entry described in [5]. The main nodes of the analysis record are moved into the stack and the associated routine is entered. This will be described in more detail later.

The normal entry is as follows:-

B60 = B90, temporary storage for stack pointer B90
B77 = -1, switch to denote normal entry
B92 = (B70-i) = Routine Number
- The actual macro for entering the DOWN sequence must contain the routine number to be executed. B70 must be the return pointer and B70-i here is used to denote the storage position of the Routine Number. This is machine dependent and the user can make his own choice.
B91 = contents of Index position B92, the routine entry point
If F1 bit of B91 set then small routine. Set B60 = B91 and enter routine in this case. Store old B71 in X₁₄ Set B71 = B91

If not small routine then:-

(B72) = B90            Forward link
(B90) = B70 - B71      Relative position of return in old routine .
(B90+1) = B75          Old routine number
(B90+2) = B60          Backward Pointer
(B90+3) = B72          Origin of workspace in old routine
B71 = B91              Location of current routine
B75 = B92              New routine number
B72 = B90+4            New forward link position and origin of workspace
B73 = B71 + (B71+1)    Location of label directory for routine
B90 = (B71) + B72+1    New stack pointer position having moved over workspace

B60 = B91 and enter routine.

If the DOWN routine is entered for TRANSPLANT then entry is made at 4. In this case B77 will contain the position of the analysis record (≥ 0) and B92 set to Index position of analysis record. In this case between 6. and 7. is inserted:-

6.1  (B72+1) = B77
     (B72+2) = (B77+1) 
     (B72+3) = (B77+2) 
     etc.

until all &'s of top level of analysis record have been copied into stack; that is until a word without Fl set is encountered. The rest of the workspace area up to B90 is then initialized to zero. This will be discussed in more detail later.

The UP routine returns stack to the position before entry to the previously called routine which we are now exiting from. The stack must be adjusted as follows:

B70 = (B72-4)      Recover relative position of return 
B75 = (B72-3)      Routine number
B90 = (B72-2)      Stack Pointer
B72 = (B72-1)      Pointer to storage area
B71 = Contents of Index position B75 = entry position of the old routine.
B73 = (B71+1) + B71 = Location of label directory
Transfer to B71 + B70.

2.7. Analysis

This is only required to recognize ITEM and REPLACE ITEM so far so that it can have the following simplified form:-

1.  B43 = B62
2.  B62 = B61 ∧ A             (A = address mask)
3.  B63 = B88
4.  B68 = B90
5.  B67 = (Contents of Index B43) ∧ A
    B65 = B67 .            •   .
6.  (B90) = {B67 + (B67) } ∧ A
7.  B90 = B90+1
8.  B69 = B90
9.  B67 = B67+1
10. B43 = (B67)
11. If B43 is &, that is F1 bit set then:-
           (B90) = B62 
           (B90+1) = B63 
           (B90+2) = (B65 + B43) ∧ A
           B90 = B90+3
           goto 9
12. If B67 = (B90-1) then recognition and goto 16.
13. B92 = (B62+1) 
    B93 = (B92)
14. If B43 ≠ B93 then no match and
    either B90 ≠ B69 and we set:- B90 = B90-3
                                         B62 = (B90) 
                                         B63 = (B90+1)
                                         B67 = (B90+2)
                                         goto 12
    or B90 = B69 and we set              (B72-2) = B72-4
                                         B64 = 0
                                         B62 = 0
              and exit with no match
15. If B43 = B69 then character is matched and we set B62 = B92 and goto 12.
16. Dictionary entry has been recognized;-
           (B63) = B43 
           B63 = B63+1
           B90 = B68
           B64 = 1
           B69 = B61
           B68 = B62
17. CALL    R252   (Split chain into two sub-chains)
18. B61 = B69
    B62 = B68
    B63 = B88
    CALL R222  (Dual Routine)

The routine 252 splits the recognized and unrecognized part of the input stream into two sub-chains. On entry, B69 points to the whole input chain and B68 points to the last character that has been recognized. On exit, B68 points to recognized chained list and B69 points to the unrecognized part. If recognized part is null then B68 = 0 and if unrecognized part is null then set B69 = 0.

The routine 222 takes on analysis record pointed at by B63 and converts it to the Dual form. In the simple form.of analysis record we have so far it is only necessary to move the one word analysis record to the top of the stack. To speed up the return mechanism the Routine 222 first moves the stack to its position before entry to 222 so that the return on exit causes control to go right back to the Master Routine. Orders required are:-

B90 = B90-4
(B90-1) = (B63)
B63 = B90-1 
(B90) = 0
B72 = B72-5 
(B72-2) = B90+1
Return

2.8. Item Assembly Routine

The ITEM assembly routine was originally designed to compile Atlas Machine Orders. On examining the Compiler-Compiler it was found that the set of orders used corresponded very closely with the set of orders required for manipulating the B variables and with very little change it was possible to remove the Atlas dependent features. It can, therefore, be thought of as an assembler for certain macros which describe operations on the B variables.

The B variable can be thought of as being split into several fields:-

J       a 3-bit field
Z       a subsidiary address field
S       2-bit field
A       main address field
F       2-bit field.

The Z-field need not be as long as the field A and its length is one of the parameters of a particular implementation. It should be at least 7 bits in length. It will be filled by first inserting the information in the A field followed by a move order which will move it to the Z-field. The contents of the Z-field will be extracted in a similar manner.

Logic operations are provided on all fields other than the Z-field.

The syntax of an ITEM routine is as follows:-

The symbol | is used to delimit alternatives and * and ? have their Compiler-Compiler meaning.

<Item Routine> ::= ITEM <N> <EOL> <instruction * ? > 
<N> ::= an integer 
<EOL> ::= newline
<instruction> ::= <label ?> <un1-instruction> <EOL> 
<un1-instruction> ::= <H-instr> | <C-instr> | <I-instr> | <OP-instr> 
<label> ::= / <N> /
<H-instr> ::= H = <H-addr> 
<H-addr> ::= <N><+JSF * ?> 
<+JSF> :: = + J <N> | + S <N> | + F <N>
<C-instr> ::= C = <C>
<C> ::= possible input character
<I-instr> ::= I<N>
<OP-instr> ::= <LOGIC> | <MOVE> | <NORMAL> | <TEST> | <TRANSFER> | <OUTPUT> | <101> | <121> | <124>
<LOGIC> ::= <LOGICOP> , <BA> , 0 , <LOGICADR> | 0165, <BA> , <BM> , <LOGICADR> 
<LOGICOP> ::= 0127 | 0167 | 0121 
<LOGICADR> ::= <N> | F <N> | J <N> | A | M 
<BA> ::= integer between 0 and 126 
<MOVE> ::= <MOVEOP> , <BA> , 0 , 0
<MOVEOP> ::= 0105 | 0106 | 0163 | 0164 
<NORMAL> ::= <NORMALOP > , <BA> , <BM> , <XN> 
<NORMALOP> ::= 0101 | 0104 | 0107 | 0110 | 0113 | 0114 | 0121 | 0122 | 0123 | 0124 | 0145 | 0147 | 1102 | 1117 | 1166 
<XN> ::= X <N> | <N>
<TEST> ::= <TESTOP> , <BA> , <BM>, <XN> 
<TESTOP> ::= 0152 | 0170 | 0172
<TRANSFER> :: = <TRANSFEROP> , 127, 127, L <N>
<TRANSFEROP> ::= 0224 | 0225 | 0226 0227
<OUTPUT> ::= 1064, 0 , <BM> , 0 | 1064 ,0 0, <N> |1065 ,0,0, <N> | 1066 ,0, 0, <C>
<101> ::= 0101 , 127 , <BM>, <XN>
<121> ::= 0121, 127, 127, 1 <N> | 0121, 127, <BM> , <N> | 0121, <BA> , 127, L<N> 
<124> ::= 0124 , 127 , 0 , 1 <N>

N.B. When Z,S fields are present then the A field need only be 10 bits in length.

The semantics of these instructions are:-

2.8.1. <H-instr>

This sets the next store location equal to the value of the expression on the right of the equal sign.

2.8.2. <C-instr>

This sets the next store location equal to the numerical number equivalent to the character defined in the address field.

2.8.3. <I-instr>

This sets the index position <N> equal to the current position in the record store. It, therefore, provides a subsidiary entry point to an ITEM routine.

2.8.4. <OP-instr>

These correspond fairly closely to the numerically equivalent Atlas machine orders. (The <MOVEOP>'s and 1066 are different.)

In the address field we have:-

<N>    meaning the constant integer
X<N>    meaning the location of the N^th index position
F<N>    meaning F field set to N
J<N>    meaning J field set to N
A meaning address mask
M meaning mask for instruction word other than address field

The B-variable 127 on Atlas is also the location counter. The value of L<N> is the relative position of label <N> from the current position of the location counter. Therefore, 121, 127, 127, L<N> is equivalent to setting the location counter equal to its present value + the relative position of L<N> from the present value; that, jump to label <N>. Looking at it this way will perhaps show the reason for the different instructions and also some hint on how to implement them on a particular machine.

2.8.5 <LOGIC>

0127, i , 0 , k   sets Bi = Bi ∧ k
0167, i , 0 , k   sets Bi = Bi ∧ k
0121, i , 0 , k   sets Bi = k
0165, i , j > k   sets Bi = Bj ∧ k

2.8.6 <MOVE>

0105, i , 0 , 0     Z field of Bi moved to A
0106, i , 0 , 0     A field of Bi moved to Z
0163. i , 0 , 0     S field of Bi moved to A
0164. i , 0 , 0     A field of Bi moved to S

2.8.7 <NORMAL>

0101, i , j , G     sets Bi = (Bj+G)
0104, i , j , G     sets Bi = Bi + (Bj+G)
0107, i , j , G     sets Bi = Bi ∧ (Bj+G)
0110, i , j , G     sets (Bj+G) = (Bj+G) - Bi
0113. i , j , G     sets (Bj+G) = Bi
0114. i , j , G     sets (Bj+G) = (Bj+G) + Bi
0121. i , j , P     sets Bi = Bj+P
0122. i , j , P     sets Bi = Bi - Bj - P
0123. i , j , P     sets Bi = - Bj - P
0124. i , j , P     sets Bi = Bi+Bj+P
0145, i , j , G     sets Bi = Bj ∧ (G)
0147, i , j , G     sets Bi = Bi ∧ (Bj+G)
1102, 70, 76, k     sets B70 to return; inserts k into store at this point 
                         and jumps to DOWN-routine
1117, 0 , 0 , 0     terminates execution 
1166, 67, 0 , 0     see TRANSPLANT

2.8.8. <TEST>

These instructions should be followed immediately by a <TRANSFER> instruction. They set the accumulator or some store to positive, negative or zero.

0152, i , j , G    sets accumulator = Bi - (Bj+G)
0170, i , j , P    sets accumulator = Bj + P - Bi
0172, i , j , P    sets accumulator = Bi - Bj - P

2.8.9. <TRANSFER>

Jumps to label defined in instruction depending on accumulator value.

0224, 127, 127, Li   jump to Li if Accumulator = 0
0225, 127, 127, Li   jump to Li if Accumulator ≠ 0
0226, 127, 127, Li   jump to Li if Accumulator ≥ 0
0227, 127, 127, Li   jump to Li if Accumulator < 0

The label position is defined by inserting (i) before the required instruction. Later it will be seen that all routines must be relocatable in the record transfer. For this reason all jumps should be made either relative to the entry point of the.routine (B71) or relative to themselves.

2.8.10.<OUTPUI>

These instructions produce output on a selected device as follows:

1064, i , j , 0    outputs character with numerical value Bj
1064. 0 , 0 , k    outputs character with numerical value k
1065. 0 , 0 , k    outputs newline (this could be extended)
1066. 0 , 0 , D    outputs character D.

2.8.11.<101>

This is an indirect transfer instruction

0101, 127, j , X₀   transfers control to the address given by the contents of Xj
0101, 127, j , G    transfers control to the address given by the contents of Bj+G

2.8.12. <121>

0121, 127, 127, Lj   transfers control to Lj
0121, 127, Bj, 0     transfers control to address stored in Bj
0121, Bj, 127, Lj    sets Bi equal to address of label Lj

2.8.13. <124>

0124, 127, 0 , Lj   transfer control to Lj

The complete action of the ITEM assembler can be described as follows:-

Get Item number and store current value of the record store pointer B88 in the corresponding Index position. Replace remains of ITEM line back on main chain. Set /B54 = Item Number and old Item address in X165.
CALL R238, R261, R215 as in steps 1 to 7. of Master Routine.
If B64 ≠ 0 on return from analysis routine then we have recognized the next Master Phrase and so we must set B79 to point to complete line (i.e., rejoin recognized and unrecognized parts), complete any label references which need to be fixed and exit.
If B64 = 0 then no recognition and this must be the next instruction of the ITEM routine. Assemble and store in B88 region updating B88 so that B88 always points to the next free location in the record store. Replace line back on chain and repeat from 1.

2.9. ITEM Assembler

Although different implementations will be forced to use different methods for the assembler, it is probably worthwhile giving a broad description of the way the G-21 assembler is organized. A complete flow diagram of the routine is given in the flow charts marked ITEM ROUTINE 266 (i), (ii) and (iii).

The outer structure given above is shown on (i). To allow the DELETE ITEM routine itself to be deleted it is necessary to plant the address of the new routine in X13 initially, and only when the routine is complete will this be moved to the corresponding index position.

Two adjacent tables pointed at by B81 and B78 are required for handling labels. The first table pointed at by B81 has 40 entries corresponding to the addresses of labels 0, 1, 2, etc. The second table pointed at by B78 contains the addresses of all the instructions which referred to labels. These instructions are compiled initially with the label number in the address field. At the end of the routine this table is looked at and all the instructions have their correct address parts inserted.

2.9.1. CODE SKELETONS

On the G-21 it was decided to store in a table code pieces for all the possible instructions macros and pick out the required skeleton depending on the instruction. There are 32 instruction opcodes, and these can be stored economically and with quick access by having a 32 entry Hash Table having as key (c/2) * 8 + d where the opcode is abcd. For example, the opcode 0152 has key 22 octal (5/2 = 2; * 8 + 2 = 22)

If two keys pointed at the same entry point then the key had 5 added to it and the process repeated.

The complete table used was:-

OP CODE       POSITION
0110            00
0101            01
1102            02
0113            03
0104            04
0105            05
0106            06
0107            07
1066            10
0121            11
0122            12
0123            13
0124            14
0225            15
0226            16
0127            17
1064            20
0224            21
0152            22
0114            23
0227            24
0145            25
1117            26
0147            27
0170            30
1065            31
0172            32
0163            33
0164            34
0165            35
1166            36
0167            37

This table then gives the position of an entry in two other tables. The first is used as a switch, depending on the type of opcode (shown as 'Jump on type' at bottom of flow diagram (ii)); the second points to the skeleton for the particular opcode.

There are basically two types of skeletons. The first is for the standard opcodes where the pointer is direct to the required skeleton. The second is for the 0101, 0124, 0121 type of instruction where the instruction may have one or more sub-forms. In this case the pointer is to a word containing the length of the first skeleton followed by that skeleton. This is repeated until we.reach the last sub-form.

Each instruction has its skeleton divided, into sub-skeletons grouped in the same way as the sub-forms are grouped. This is not necessary, as, of course, only one set of instructions is required for each opcode. However, in the cases where either the <BM> or address fields are zero, considerable economy of code may be achieved. Therefore, each skeleton is divided into four sections with the lengths preceding them:-

1. BM ≠ 0    Address ≠ 0
2. BM = 0    Address ≠ 0
3. BM ≠ 0    Address = 0
4. BM = 0    Address = 0

On the G-21 the address field of an instruction is not as large as the field of a B variable (as it is on Atlas). Consequently, several instructions have to be changed to alternative instructions when the longer field is required. For example, 0121 with a constant larger than the contents of the address field must be changed to 0101, together with the address constant stored away in an address. A typical example is:-

0121, 90, 0, Fl

which would be changed to:-

0101, 90, 0, R

where R is an address containing the constant F1.

The standard flag settings are stored in an array in the bootstrap.

2.10. Machine Dependent Routines

It is intended that all parts special to a particular machine should be included in the bootstrap. The routines in this section at present are:-

275 DEFINE COMPILER <N>: This has been implemented on the G-21 such that the parameter defines the logical file that the compiler should be dumped on. It is necessary, before dumping, to pack up the dictionaries in the Record Store by a call of routine 243 and to reset the values of X₂ and X₃ to the current values of B88 and B87, respectively. It is then only necessary to dump the INDEX, Record Store and any constants used.
219 PRINT B82 IN OCTAL
248 PRINT B82 IN DECIMAL: Two routines which print the value of B82.
241 PRINT B VARIABLES: This routine is useful for debugging purposes and prints the values of all B variables.
283 PRINT FROM B125 to B126: Another debugging routine. The G-21 version dumps the area of store between Value of B125 and B126.
216 INTEGER MULTIPLICATION AND DIVISION: As implemented on the G-21 and Atlas this is a non-standard routine which has its return address stored in B97 and has its two arguments in B98 and B99. If B97 has the Fl flag set it calculates B99/B98, otherwise B99*B98. The result is returned in B99.
7 TRANSPLANT SKELETON: As will be described later, each statement used in the body of a routine is interpreted unless a compile-version of the relevant statement is available and the statement has simple parameters. The form of interpretation is to insert instructions in the routine which will call the TRANSPLANT routine and follow them by the analysis record for the particular instruction. On Atlas this can be achieved by a single instruction which is stored in X₇. This value is then picked up and inserted in the Record Store as desired. On the G-21 it was necessary to insert a set of instructions, and these were pointed at by X₇. The instructions must set B70 pointing two places before start of analysis record, then jump to address stored in B74 (entry point to transplant routine).

2.11. Conclusion

Once the above has been written for a particular machine it should be possible to write the remainder of the Compiler Compiler in a machine independent form. The next phase is to write a set of ITEM's which will overwrite the crude versions of the routines like the Analysis Routine already defined. In general, the initial versions of the routines need not be relocatable in the Compiler Compiler form as long as the order of changing routines is in the reverse order to the one in which they were defined. For this reason the ordering of the routines given above in the Record Store should be as. follows:

1. Machine dependent routines

2. 161, 214, 261, 215, 252 and 222, in that order.

3. INITIALISATION

3.1 Introduction

Once the Bootstrap has. been written for a particular machine all future routines added to the system can be written as ITEM routines. As stated in the previous chapter, the macro orders are in a machine independent form and so the remainder of the Compiler-Gompiler need only be written once. The ITEM routines added fall into well defined sections. Each section will produce a significant difference in the power of the system. The current section replaces the simplified form of the routines defined in the Bootstrap by their final forms and initialises the dictionaries of the system.

3.2 Deletion and Replacement of Items

The Bootstrap defines two Master Phases ITEM and REPLACE ITEM and provides an analysis routine capable of recognising these two phrases and passing control to the relevant routines. These are not the only basic Master Phrases that we would like in our dictionary and our first aim will be to replace the [MP] dictionary by one containing all the Master Phrases we intend to build in. Therefore our first task is to define the REPLACE ITEM routine (164) and its subroutine 155 which deletes an item from the store. . Routine 155 has one parameter B61 which is the number of the routine requiring deletion.

The action of the REPLACE ITEM routine is straightforward. It first calls the ITEM routine which will compile the new copy of the relevant routine and store the old entry point in X165. This can then be deleted by calling 155 with B61 set to 165.

In general execution would then continue by the REPLACE ITEM routine returning .control to the Master Routine which would look for the next Master Phrase. However if the Master Routine itself has been replaced we would be returning to the deleted old version of the routine. Therefore this special case has to be tested for, in which case the END OF MESSAGE routine is called to reinitialise the compiler.

The routine 155 will delete the routine at index position B61 and return with the length of the deleted routine in B62. As the code used in the routines of the Compiler-Compiler are relocatable without alteration it is only necessary to move routines down over the space left by the deleted routine and alter the relevant Index positions. The only complication is due to the routine 155 itself being one of the routines that may need moving. The action of the routine is as follows:-

Find the length of routine B61. As no record of the routine's length is kept it is necessary to search the index positions to find the starting position of the routine following the routine B61 in the record store.
Move all routines higher up the record store down over the position held by this routine. If the routine 155 is below the deleted routine, the move is trivial. If above, then we must move all routines down including the first half of routine 155. Control is then passed to the front of routine 155 which moves the remainder of itself together with any routines above it in the store.
The Index Position for the deleted routine has its address field set to zero. The Index Positions for all the routines moved have their address fields decreased by the amount B62.
The routines 232 (Transplant) and 239 (Down) for historic reasons have their entry points also stored in B74 and B76 respectively. These must therefore be reset If either of these routines have moved.

The new [MP] dictionary will also define a Master Phrase:-

 DELETE ITEM

This routine will set B61 to the routine number and enter 155 to delete this routine.

3.3 Dictionary Initialisation

A complete description of the structure of dictionaries in the Compiler-Compiler is given in the paper Trees and Routines . The dictionaries are able to reside either in the Record Store or the Chain Store. The most commonly used dictionaries tend to be left in the Chain Store, while the remainder reside in the Record.Store until alterations are required in which case they are moved to the Chain Store. The Analysis Routine is capable of analysing input with respect to dictionaries stored in either the Record or Chain Store. However some routines expect to find dictionaries in one or other of the stores and it is necessary for the Master Routine to ensure that the dictionaries are in their correct positions. Two supervisory lists are therefore provided which point to the dictionaries which are required to be moved:-

1.  CONVENTIONAL LIST OF FORMAT CLASSES 129

This contains a list of the standard format classes MP, AS, BS and SS provided by the system. This can be extended by the user.

2.  LIST OF ADDITIONAL DICTIONARIES TO BE PACKED 169

This is initialised to contain the Class Identifier Dictionary, (CID) 134 and the Double Entry List, 256 which stores the Compile versions of routines.

The Master Routine therefore moves all the dictionaries referenced by 129 and 169 to and from the Chain Store automatically.

Copies of all these dictionaries must be provided therefore before the final form of the Master Routine is added. The BS, AS and SS dictionaries are initialised to empty as is the Double Entry List 256.

We now have the routines required to allow us to replace the original form of the [MP] dictionary and the new version contains the following Master Phrases:-

PHRASE           218
ITEM             266
END OF MESSAGE   227
FORMAT           220
FORMAT CLASS     272
DELETE ITEM      145
REPLACE ITEM     164
DEFINE COMPILER  275
LIST             284
DO NOT LIST      285

The CID Dictionary is initialized to contain the following class identifiers:-

AS               132
ABN              154
AB               153
A                166
BS               131
B                167
PI               181
EOL                4
SS               133
SP                65
SEP              179
GENERATED-P      186
RESOLVED-P       185
LABEL            184
N                149
COMMA            229
0-3              178  (Probably redundant) 
,                 10
ERAZE            127
FD               251
[                 81
OW               173
MP               130

3.4 Dictionary Packing and Unpacking

The routine 243 is added which will pack all the routines defined in the two lists described above into the Record Store. It uses a subroutine 228 which transfers an individual dictionary to the Record Store. The routine 245 and its subroutine 231 do the reverse operation.

The standard form of a dictionary in the Record Store is:-

M &......... I K

where the dotted interior is described in Trees and Routines with the ampersands pointing at the category numbers. The keywords described in Trees and Routines were not implemented. Instead a single K word giving the maximum number of ampersands on the top level of any alternative is added at the end. We have a similar form in the Chain Store with the Index entry pointing at the M word and the link from the K word pointing back to the M word.

The subroutine 231 is entered with B69 pointing at the initial ampersand and B68 at the M word. It will convert all information between and including the outer ampersand and I word to chain form with B67 pointing at the ampersand on return. In addition B66 is left pointing at the K word. The routine 245 works through all the dictionaries in both lists; filling in the M and K words itself. The action for a Double entry list is different. This conversion is done entirely by 245. In the record store the first entry of the dictionary contains length of list followed by a word containing 0 and then the words of the dictionary. In the Chain Store only the words of the dictionary appear with the Index position pointing at the last word.

The action of 243 and 228 is similar but provides the reverse operation. The flow diagrams in the appendix (243, 228) give the details of these routines.

3.5 Analysis Routine

The Analysis Routine 215 is the central routine of the Compiler and provides all the mechanism for the syntactic scan. In the appendix are two flow diagrams. One gives a complete description of the routine while the second gives a simplified form of the routine showing only the main flow. Its purpose is to analyse a string of characters with respect to some class of phrase or dictionary and produce on recognition an analysis record showing how the scan proceeded.

The input parameters to the routine are:-

B61 = address of string to be analysed. This must be in the form of a circular chain with B61 pointing to the last word.

B62 = class number with respect to which the analysis is to take place. The dictionary or phrase may be either in the Record or Chain Store.

B63 = 0 for usual entry

B63 = 1 for a special entry

B63 = 2 for re-entry x

In its use during the bootstrapping of the compiler, the special and re-entry modes are not used. Consequently discussion of these will be left until later.

In the case of a Phrase being defined as Contract Record, the analysis record is deleted. This will be used when the analysis record contains no useful information. In the dictionary this is signified by the M word having the J4 bit set. All tests on J4 deal with contracting out records in the routine.

Another section of the Analysis Routine which can be discussed later in more detail is that involving parameters. It is possible that when we are reading in genuine Compiler-Compiler routines that the input stream may contain phrase identifiers. The section dealing with this (bottom left of flowdiagram (ii)) can be left for the time being.

Looking at the simplified flowdiagram (i), the main paths are as follows:-

For basic symbols a test is made between the character in the dictionary and on the input stream. If a match is found we move down both checking until we get to the end of an alternative. This is known, as an ampersand always points to the category number at the end of an alternative and so we check until we reach this point. If a failure occurs then to get to the next alternative we must reset the original position of the input pointer, find where the last ampersand pointed and reset the pointer to the analysis record being produced. This information is made available by storing in the stack on arrival at any ampersand the three quantities B62, B63 and where ampersand is pointing. Getting to the next alternative is then simply a matter of unloading the top three stack positions.
If the dictionary or phrase contains phrases in its definition as well as basic symbols then when this appears in the dictionary we must suspend our present search on the current level and virtually reenter the Analysis Routine looking for the recognition of the sub-phrase. This could have been implemented in this way. However for efficiency reasons the stacking of levels is done internal to the routine. On descending to attempt recognition of a sub-phrase we store on the stack the origin of the stack for this level (B69), the pointer in the analysis record (B63), the pointer to the dictionary (B67), the M word (B66) and the origin of the dictionary (B65). If recognition at the sublevel is achieved we reset the above values and carry on where we left off. If no recognition then we must apart from recovering these values also go on to next alternative.
On recognition at the top level we must split the recognised and unrecognised parts of the input (Routine 252); change the.Analysis Record to its DUAL form (Routine 222) and exit. The output parameters are
- B61 = pointer to unrecognised part of input.
- B62 =» pointer to recognised part.
- B63 = pointer to analysis record.
- B64 = 0 for no recognition, = 1 for recognition, > 1 for maybe

A flow diagram for the Dual routine is given in the appendix. A complete description of analysis records is given in Trees and Routines.

3.6 Fault Routines

The Analysis Routine and most other routines of the system on recognition of fault conditions enter one of the two fault routines. If some kind of recovery is possible from the error then Routine 258 is entered otherwise 259.

Both routines require a fault number to be set in B65. The action of these routines is mainly up to the implementation. The fault number together with the routine number (B75) give a good indication of the fault. Additional information that could be printed would be the contents of the B variables, the stack and other parts of the Record and Chain Store as desired.

3.7 Master Routine

The Master Routine provides the implicit control structure of the Compiler-Compiler. There are instructions available in the Compiler-Compiler which allows the user to override the implicit control but, until these have been implemented, all control on the global level is provided by the Master Routine. This can be expressed briefly as follows:-

Form line of input and attempt to recognise Master Phrase.
If successful enter the relevant routine and on return repeat the process from 1.
If unsuccessful attempt to recognise a Source Statement.
If successful enter relevant routine and on return repeat from 3.
If unsuccessful try and recognise a Master Phrase again.

Safeguards are of course added to make sure that we do not get into an infinite loop. The Master Routine (i) flow diagram shows the flow for steps 1 and 2 above. Note that the line of input has a sub-scan by Routine 261 before entering the Analysis Routine. This is mainly for the removal of spaces and the setting of unique basic symbols for the meta-syntactic symbols [ and ,. On Atlas different input forms used different representations for the open bracket and it was helpful to have a unique form being passed around. On the G21 these have still been given unique values although I am still not clear how necessary this is now. The second main point is that some routines called on recognition of Master Phrases assume that the dictionaries reside in the Chain Store and so these dictionaries must be put there (if not already there) before entering routine.

The Master Routine (ii) flow diagram gives the flow for steps 3 and 4 above. This is very similar to the flow for recognition of Master Phrases. A separate sub-scan routine 142 is provided and the input is passed through this before calling the analysis routine 135. Routine 135 is basically identical to Routine 215. On Atlas it is compiled in a non-relocatable form for efficiency purposes and also produces the Dual form of the analysis record directly. (This is the reason for the K word on dictionaries.)

The other main difference between the two flows is that there is an assumption that the Master Phrase can be recognised with the available input. In the flow for Source Statements it is possible to get a maybe recognition which will reenter the Analysis Routine after processing further input. The Routine 135 on Atlas will only analyse with respect to dictionaries in the Record Store (again for efficiency). Consequently dictionaries must be packed into Record Store the first time we look for a Source Statement.

A little care must be taken when replacing the Master Routine. As stated before the REPLACE ITEM routine will call the END OF MESSAGE routine in this case. Just before replacing the Master Routine, an interim version of the END OF MESSAGE routine is added. This reenters the Master Routine as though it had been called from Routine 150. Once the Master Routine has been replaced the final version of END OF MESSAGE is added. This packs up dictionaries into the Record Store. This was not required originally as the unpacking had not been done by the original Master Routine due to Routine 245 being replaced by a dummy.

4. THE PHRASE STATEMENT

4.1. Introduction

The next aim is to add the routines necessary for compiling the PHRASE statement. Basically all that is required is the PHRASE routine, associated with the Master Phrase:-

PHRASE  218

This, however, requires quite a large number of subroutines which can be divided into basically two classes. The first set consists of general subroutines for dictionary manipulation and printing which will be equally useful later on. The second set consists of routines, dictionaries and syntax definitions closely associated with the PHRASE statement.

4.2. Dictionary Routines

The complete set of routines required are as follows:-

204  Add word to list
205  Add word to nest
206  Withdraw word from nest
207  Delete list
209  Add list to list
210  Copy linear list to chain
259  Add nil branch to dictionary
141  Look up or enter in a double entry dictionary
225  Merge new entry into dictionary 
224  General dictionary routine

Pictorial descriptions of all these routines apart from the last two are included in the appendix.

Routine 225 can be used to add an entry to the front of a dictionary or alternatively at the end. Its flow diagram is given in the appendix.

The General Dictionary Routine 224 has three main purposes. To lookup, enter or delete entries from a dictionary. The flow diagram (i) gives the outline of the parts of the routine common to all three types of entry. Flow diagrams (ii) and (iii) give the parts which depend on the types of entries. The first part differentiates between the following 6 cases:-

A: The dictionary is empty (B63 = 4).
B: No match was found between a dictionary entry and the given string. However, there is a dictionary entry which forms the stem of the string to be added (B63 = 2).
C: No match, but there is a dictionary entry which has a stem equal to a stem of the string to be added (B63 = 3).
D: No match at all. Not even, a common stem (B63 = 6).
E: A match exists between the dictionary empty and the string to be added (B63 = 0).
F: No match but string to be added is stem of existing entry in dictionary (B63 =1).

The action required then differs depending on whether look-up (B63 = 1), entry (B63 = 2) or deletion (B63 = 3) is taking place.

4.3. Input of the Phrase Statement

The philosophy adopted in the Compiler Compiler, as far as interface with the sub-scan and initial input is concerned, is to take all the input required for a particular statement and then to process the statement independent of further access to the sub-scan level. This is obviously inefficient as far as storage use is concerned unless the amounts of storage required in this way are small and this need not be the case. In the case of the Phrase Statement this, however, is usually so and is probably the simplest and most efficient method of execution.

Two routines are required to read in the whole of the Phrase Statement.

1.  READ NEXT  SECTION          213
2.  DELETE  SUPERFLUOUS EOL'S   265

The .first routine reads in statements until it reaches a line that can be recognized as the next Master Phrase by the Analysis Routine. All the input up to the next Master Phrase is read and built up into a list pointed at by B61. The next Master Phrase is pointed at by B79. Both strings are stored as conventional circular chains. The B61 chain has a character 128 added at the end.

The second routine removes all redundant end-of-line characters which may have been inserted during the process of building-up the body of the Phrase Statement.

Once these two routines have been accessed, the Phrase Statement body in B61 is ready for analysis.

The only input to Routine 213 is the variable B79 pointing at the first line of the Phrase Statement with the characters P, H, R, A, S, E removed.

The Routine 265 enters with B69 pointing at the input to be checked. On exist B69 points to the reduced line.

4.4 Print Routines

The two routines required are:-

177    PRINT NEW FORMAT OR PHRASE
183    PRINT SYMBOL

The first routine will output the words from B68 until B69. Each word is output as a symbol if it has a value less than 128 else the number itself is output surrounded by square brackets. This is followed by the Phrase or Format Index Number which is passed as a parameter stored in B67.

The Routine 183 is a subroutine which prints the value of B69 as a symbol.

4.5. Phrase Routine

The first action of the Phrase Routine, after the complete statement has been input by Routine 213, is to call the Identifier Conversion Routine 230.

Routine 230 is also used by the ROUTINE routine for reducing the body of a routine and, for this purpose, it has to recognize and separate different phrase variables of the same type. In the case of the Phrase Routine entry (differentiated by B78 being non-zero) it has the task of replacing each Phrase Identifier enclosed in square brackets by its position in the index. Characters other than Phrase Identifiers are not changed.

At this point we see the first use of the Analysis Routine as an internal recognizer. This routine has been designed to analyze strings with syntactic structure. The generalized phrase variable falls into this category. As we have not yet implemented the PHRASE statement for defining syntactic structure the syntax has to be built in as ITEM's. The phrase variable can then be checked by the analysis routine for legality. The syntax of the PHRASE variable is as follows:-

PHRASE [PHRASE VAR]   = [[][PHRASE ID] [PHRASE LABEL] [PHRASE INDEX] ]
PHRASE [PHRASE LABEL] = / [N] , NIL
PHRASE [PHRASE INDEX] =  (  [ABN]  ) , NIL

The [PHRASE ID] recognizer is a Built-in Phrase. This means that instead of a conventional dictionary form of entry used by the Analysis Routine to test for recognition, we have a subroutine which itself looks at the input stream and, if it recognizes anything, will produce an analysis tree. Built-in Phrases will, in general, be used when efficient recognition can be done by this non-standard method. The [PHRASE ID] routine has to recognize two basically different kinds of Phrase Identifier. The commonest will be an open square bracket followed by a set of characters terminating with a close square bracket. The second form is open square bracket, integer, close square bracket. This will be used if, for example, a dictionary or phrase has been put in as an ITEM and has no name entered in the CID dictionary. In this case the integer represents the index position.

If, for example, we have the Phrase Identifier [ABC] then the analysis tree produced will be

& 3+F2 A B C

with the ampersand pointing to C. The word containing an F2 flag bit is a Block Word (see Trees and Routines), which indicates that the next 3 characters should be treated as a string and not processed by routines which manipulate analysis records.

The second form of Phrase Identifier [175], for example, produces an analysis record

& N

where N = !75. The differentiation between the two forms is done by noting whether or not the Block Word is present.

The Index Positions of the above Phrases are:-

171 PHRASE VAR     -      '                      -
172 PHRASE ID
174 PHRASE LABEL
176 PHRASE INDEX

Once the call on the Analysis Routine by Routine 230 has produced a recognition then there are two main paths. If the Phrase Identifier is a simple index position then all that is required is to add this value to the output stream and continue.

If we have a genuine Phrase Identifier, it must be looked up in the CID dictionary. If it is already there then its index position is obtained and proceed as above. If not there, then we must assign an index position to this identifier and add it to the CID dictionary. A record of where this Phrase Identifier has been put is printed, its index position cleared and then proceed as before.

The one complication to the straightforward process defined above is the introduction of * and ? endings to Phrase-Identifiers which generate additional Phrase declarations. For example, [ABC *] appearing anywhere implies an additional Phrase declaration

PHRASE [ABC *] = [ABC] [ABC *], [ABC]

Similarly, [ABC ?] implies an additional Phrase declaration:-

PHRASE [ABC ?] = [ABC] , NIL

In the case of a new Phrase Identifier with either ending appearing it is therefore necessary to also insert an entry in CID and assign an index position for the Phrase Identifier without the ending.

Also, the above definitions must somehow be added to the Phrase Definitions presented to the PHRASE routine. This is done by building up a bit of additional Phrase Statements that are generated in this way and then processing these before the Phrase statement we were processing originally. Skeletons for the two types of Phrase definitions to be added are stored in ITEM 139. Processing is done by the call of the Auxiliary Phrase Routine 242.

Once the Identifier Conversion Routine 230 has reduced the Phrase Identifiers to their associated numeric values we can process the Phrase Statement body. Once again the Analysis Routine is used to produce the recognition. The built in syntax is as follows:-

PHRASE [PHRASE BODY]        = [CONTRACT RECORD] [ANY PHRASE ID] = [PHRASE TAIL]
PHRASE [CONTRACT RECORD]    = (C R) , NIL  
PHRASE [PHRASE TAIL]        = [PHRASE ALTERNATIVE] [PHRASE TAIL] , [PHRASE ALTERNATIVE] [128] 
PHRASE [PHRASE ALTERNATIVE] = BUT NOT , [PHRASE ELEMENT] [,] , NIL , [PHRASE ELEMENT]

PHRASE [ANY PHRASE ID] is built-in and just checks for a Phrase Identifier in the next position of the input. The Analysis Record .produced being

& N

PHRASE [PHRASE ELEMENT] is built-in and checks for a sequence of Phrase Identifiers or basic characters terminating on [,] or [128], The analysis record produced for a string of N such symbols is

N+F2 C₀ C₁ ... C_NN

where the C_i are the characters checked.

The positions of these phrases is as follows:-

PHRASE  BODY       151
CONTRACT RECORD    143
PHRASE TAIL       156
PHRASE ALTERNATIVE 157
ANY PHRASE ID      152
PHRASE ELEMENT     158

The remainder of the Phrase Routine consists of building up the Phrase Entry from the sub parts detected by the Analysis Routine using the routines for adding entries to dictionary. The BUT NOT entry has to be added before the first entry, otherwise the process is straightforward. The Contract Record Phrase will have the T₄ bit set in the M word. The K word contains the maximum number of phrase identifiers in any alternative.

4.6. [N], [A] and [B]

These 3 routines are required to recognize the basic variables and constants of the system. They are built-in for efficiency purposes. Although only [N] has been used so far in a positive manner the other two have been used to ensure that a Phrase Index- does not appear. It is not essential to add [ABN], [AB], [A] and [B] at this stage, but the routines for [N], [A] and [B] are coded as a single routine with multiple entry points and will, therefore, all be treated now.

In the case of [N] recognition is successful if an integer with less than 9 digits appears. The analysis record consists of just the integer value. The anlaysis record for [B] is exactly the same. Once the B has been checked, the [N] routine is entered.

[A] is recognized in a similar way, although, the analysis tree produced is different. In this case we have to reset B81 to the maximum value of [A] encountered, so far. The Analysis Record in this case consists of two words. The first contains 1*J₄+F₂ and the second B80 + i for A_i.

B80 by this time will have the number of Phrase Identifiers appearing in.a particular routine so that B80 + i will be the position of A_i in the LSE (List of Selected Expressions). On. execution, the address of A_i is the value in the Analysis Record relative to B72.

4.7. Conclusion

Once the above has been written it will be possible to add Phrase Statements to the Compiler Compiler. Although none will be required at this point it is a convenient point to stop and test before proceeding. The actual definition of Phrases and Formats is not really necessary until after we have introduced the ROUTINE routine.

5. THE FORMAT STATEMENT

5.1. Introduction

The FORMAT statement defines an additional statement to be added to one of the defined classes. A Format Class Dictionary can be thought of as an extendable PHRASE dictionary. In place of all alternatives having to be defined together, we are able to define alternatives as they are required. Due to this similarity the routines required for the implementation of the FORMAT statement are the same as those required for the implementation of the PHRASE statement.

5.2. FORMAT

The Master Routine recognizes the phrase FORMAT and enters the Format Routine (213). This reads in the whole of the Format Statement, converts the Phrase Identifiers using Routine 230 and analyzes the statement with respect to the following built in syntax.

PHRASE [FORMAT BODY] = [PI] = [FORMAT PHRASE] [SERIAL NO] [EOS]

where

PHRASE [SERIAL NO] = [,] [N] , [,] [158] , NIL

and

152   PI
158   FORMAT PHRASE

These two Phrases have been used in compiling the PHRASE statement. The second alternative to the [SERIAL NO] is not documented as far as I can see and probably not implemented. Once the Analysis has been completed correctly, a check is made that the Format Class Dictionary to be altered is in the Chain Store.

If a Serial Number is given, then this is allocated to the Format Routine. If not, then the next free position in the INDEX is allocated.

A count is made of the number of Phrase Identifiers in the FORMAT and this is compared with the last word of the dictionary which is always adjusted to contain the maximum number of Phrase Identifiers +1 in any Format.

As for the Phrase Statement, the routine 225 is used to merge the new entry into the dictionary.

6. THE ROUTINE STATEMENT

6.1. Introduction

So far the implementation has been primarily concerned with the syntactic analysis of statements. The ROUTINE statement provides the means for defining the semantics associated with a particular statement. Once this has been incorporated together with some Basic Statements, the compiler will be in its minimum workable form. Once this point has been reached, extension of the Compiler can be achieved by using its own definitional facilities.

A CC ROUTINE can be divided into two parts: the heading followed by the body. The main purpose of the heading is to define to which Index position this routine is to be attached. This can be done by either specifying the Index position or providing a heading which can be matched against a FORMAT already defined. In this second method parameters can be specified as input parameters to the ROUTINE. In the FORMAT statement for a particular instruction will be Phrase Identifiers which define the syntactic phrases expected at certain points in the FORMAT. In the ROUTINE heading, Phrase Variables will appear at positions corresponding to the Phrase Identifiers in the FORMAT statement. When a particular statement of this type is recognized, then the Phrase Variables in the Routine heading will be given as values in the analysis trees produced during the recognition of the associated Phrase Identifiers in the FORMAT.

The FORMAT statement, therefore, defines the types of phrases expected while the ROUTINE heading states which Phrase Variables are assigned the analysis tree values when a particular statement is recognized.

The ROUTINE body consists of statements in any of the three Format classes:- BS, AS and SS. To initialize the bootstrapping process some basic statements must, of course, be provided external to the general system. These are the BASIC STATEMENTS to be defined as ITEM routines which will be described later.

6.2. Identifier Conversion

Like the PHRASE and FORMAT statements, the whole of the ROUTINE is input initially, and the Phrase Variables are converted using the Identifier Conversion Routine 230 (flow diagrams (i), (ii), (iii)). In the case of the ROUTINE statement Phrase Variables rather than Identifiers appear. These may have labels and indices attached, and the output from the routine is more complex than the Index Position produced in the case of the PHRASE and FORMAT statements. This alternative path is marked by B78 being set to zero on entry. The additional part of the Identifier Conversion Routine is given in flow diagram (iii).

The output for each Phrase Variable is now as follows:-

FIRST WORD: 'A' field set to Index Position of Phrase Identifier. If the Phrase Variable is the * type, then the Index Position is that of the Identifier with the * deleted.
SECOND WORD: 'A' field set to position of Phrase Variable in the L.S.E. A double entry list is kept of the Phrase Variables as they appear. Each new one is allocated the next position in the L.S.E. and entered in the list. (See Trees and Routines.) 'Z' field set to numeric value of index. 'S' field set to 0, 1, 2 depending on whether index is [N], [A] or [B] type.

For example, the Phrase Variable [ZY*/7(A5)] where

a = Index Position [ZY*]
b = Index Position [ZY]
i = Position of [ZY*/7] in L.S.E.

would produce as output:-

1.  A field set to b
2.  A field set to i 
    Z field set to 5 
    S field set to 1

A Phrase Variable without a label is assumed to have a label value of zero.

6.3. Routine Heading

The Routine Heading is recognized by the Analysis Routine where the built-in syntax can be described as follows:-

PHRASE [ROUTINE HEADING] = SMALL R [N], R [N], [COMPILE] [PI] [EQV] [RESOLVED-P]
PHRASE [COMPILE]         = (COMPILER), 'NIL
PHRASE [RESOLVED-P]      = [SET P'] [ANY PI] [RESET P]

The Index Positions associated with these are as follows:-

254 ROUTINE HEADING
249 COMPILE.
181 PI
185 RESOLVED-P
152 ANY PI
244 SET P'
250 RESET P

The first two alternatives are straightforward.

The SMALL Routine is defined as one which only needs the simple form of entry mechanism and cannot have statements in its body which cannot be compiled. It is entered using the simplified non-stacking entry via the DOWN routine.

The [COMPILE] option, if present, states that this version of the routine is designed to produce compiled code for the statement rather than interpret the Analysis Record. More will be said of COMPILE routines later.

The phrase [PI] is built-in and will recognize any Phrase Identifier. Its action is given in the flow diagram. Note that it modifies the syntax for [GENERATED-P] and also [RESOLVED-P]. The latter modification changes [ANY PI] to the particular Phrase Identifier recognized by [PI]. This ensures that the analysis depends on the initial [PI]; that is the remainder of the heading is analyzed with respect to this Format Class.

Once the ROUTINE heading has been recognized a check is made that a match at the top level between the Phrase Identifiers appearing in the Format and the Phrase Variables appearing in the ROUTINE heading. For example, it would be illegal to have:-

FORMAT  [AS] = GET [ABNJ 
ROUTINE [AS] = GET [AB/1]

The Phrase Variable must be of type [ABN]. This check is easily made as the top level should consist of P words. In the case of a COMPILE routine we must allocate an Index position for the routine and enter it in the Double Entry List 256 containing Routine Positions and the corresponding positions of the COMPILE versions.

6.4. ROUTINE Body

The compilation of the ROUTINE body is mainly done by the Routine 253 (Compile Body of Routine).

The compilation of Label declarations and uses is achieved by using two lists. Each time a Label is declared, its numerical value is inserted into the B52 double entry list, together with its address in the ROUTINE body. Each time a Label is used, the Label's numeric value is inserted in the address part of the instruction and the address of the instruction is added to the B53 nest.

Once Routine 253 has compiled the body of the routine, the ROUTINE routine inserts into the first two words of the routine:-

1.  Length of L.S.E.
2.  Length of routine up to Label Directory.

The Label Directory is added at the end of the routine. It consists of a direct look-up table containing the addresses of the labels relative to the start of the routine. The length of the table being the maximum numeric value of the labels. The directory is made by scanning the B52 double entry list.

6.5. Compile Body of Routine 253

Unless a Compile version of the routine associated with a statement exists, a statement in a routine body is interpreted. The compilation of the statement then consists of storing a call of the TRANSPLANT routine followed by the Analysis Record of the statement.

The first action of the routine is to analyze the input to test for the end of the ROUTINE body, the presence of a label or a statement.

If the end has been reached, the address fields of instructions referring to labels must be set to their correct value. The label uses are withdrawn from the B53 nest and looked up in the B52 double entry list to find the correct address. If the label does not appear in the list, then it is undefined and a diagnostic is output.

If a label is declared, its address is added to the B52 double entry list together with the label number.

If neither of the above are present then the next statement must be analyzed, and this can belong to one of the three Format Classes, [BS], [AS] or [SS] and analysis is tried against these in that order. If a [BS] statement has the same FORMAT as one of the statements in the other two classes these can be, recognized by adding an * before the statement.

The built-in syntax can be described as follows:-

PHRASE [LABEL OR END] = [SEP*?] [EOS], [SEP*?] [LABL]
PHRASE [LABL]         = [N]), NIL
PHRASE [SEP]          = [COMMA], [EOL] built in and Contract Record
PHRASE [STATEMENT]    = [BS], [*?] [AS], [*?] [SS]
PHRASE [*?]           = *, NIL

The Index positions are as follows:-

144 LABEL OR END    
146 LABL
148 SEP*?
147 STATEMENT
182 *?

Once a [STATEMENT] has been recognized it may be possible to compile the statement if a Compile version of the Routine associated with the statement exists. The Compile routine is called as long as no parameters exist in the statement. This is done by checking the Analysis Record for P' words. P' words will appear whenever a Phrase Variable is recognized in a [RESOLVED-P]. This can be seen in the flow diagram (ii) for the Analysis Routine 215 (lower left). If a Phrase Variable is recognized then the Analysis Record produced consists of the second word produced by the Identifier Conversion Routine (see 6.2) with an F field set to 2 (P word). If in [RESOLVED-P] then it is a parameter P' in which case the J4 bit is set. This is achieved by [SETP'] setting B44 to J4 + F3 and [RESET P] resetting B44 to F3.

If no P' words are present the COMPILE routine for the statement is entered if it exists and an attempt is made to compile the statement. If it is unable to compile the routine (normally due to the complexity of the statement) then it returns with B54 set to -1. Note that this would be an' error if the ROUTINE heading defined a SMALL routine.

If no COMPILE routine exists for the statement, then the interpretation is achieved by calling:-

217  CONVERT ABSOLUTE &'s to RELATIVE &'s

This routine stores first the call of the TRANSPLANT routine followed by the Analysis Record for the statement. As the Analysis Record is added, the ampersands are changed so that all pointers are relative to the front of the routine.

7. TRANSPLANT

7.1. Introduction

The TRANSPLANT routine is called by each statement that requires interpretation. It moves the Analysis Record across to the Stack entering the values of any Phrase Variables required and then calls the relevant routine for interpreting the statement.

7.2. TRANSPLANT

There are seven different types of words which appear in the Analysis Record. The action is as follows in each case:-

F₀ set. The word is moved with no alteration.
F₁ set. The ampersand is altered to point to the absolute address.
F₂ set. A simple Block word. The Block Word and following protected words are moved without alteration.
F₂ and J₄ set. The B' word. In this case the word following the block word is modified by adding B72 to it. This occurs for example, in A₃ where this has an analysis record B₁' N where N is position of A₃ in L.S.E. The B72 added, therefore, gives the absolute pointer to A in the Stack.
F₂, J₄ and J₂ set. The B'' word. In this case the second word after the Block word is set to B72 and the third word is assumed to represent a variable [ABN] and its value is taken. The analysis record for [LABEL] is of this form. The third word gives the number of the label and B72 defines which routine it belongs to. The analysis record for [PI] is similar.
F₃ set. P word. In the case of a Phrase Variable the correct pointer is obtained by the routine:-
```
233  LOOK-UP
```
In the simple case the contents of the LSE position for this variable are taken. However, it is possible that the Phrase Variable may have an index. In this.case it is necessary to compute the value of the index at this point and then move down the analysis record until we point to the sub-analysis record corresponding to this occurrence of the variable.
F₃ and J₄ set. P' word. The entry is a Parameter to the routine in which case B72 is added so that the entry made points to the L.S.E. for the routine.

Once the TRANSPLANT has taken place, the routine to INTERPRET the instruction is entered via the modified DOWN entry. This will update the stack and store in the L.S.E. for the routine entered the top level &'s; that is, the parameters for the routine entered.