The Third-Order Compiler: A Context for Free Man-Machine Communication

R B E Napper

Machine Intelligence 1 Workshop, September 1965

Manchester University

1. THE COMPILER-COMPILER

The Compiler-Compiler of Brooker and Morris is a special-purpose compiler for writing compilers. It was designed specifically to facilitate the writing and development of the Systems compilers for the Manchester Atlas computer, including the new language Atlas Autocode. A compiler is a computer program whose job is to accept any (source) program written in the designed language and translate it into machine code, so that a computer can then execute this program to carry out the particular operation required by the programmer. The compiler is itself a computer program, for which the data is the source program, and it has to be written in some language or other, usually a primitive machine-code assembly language.

In the case of the Manchester Atlas it was considered that as a number of compilers had to be written it would be more convenient to devise a special language for the compiler-writers instead. This special compiler had of course to be written first, using machine code. It was called the Compiler-Compiler, but it can be used in the normal way as a compiler, particularly for data-processing where the data is in the form of language. However it will be described only in its original context, as a compiler for compiling compilers (a compiler-compiler).

The compiler-compiler is fully described in Brooker, MacCallum, Morris & Rohl (1963), which is in effect a manual for a user of the system, and in Napper (1965), which is designed as a self-contained appreciation of the system for the lay programmer.

The language of the compiler-compiler system can be broadly divided into two classes as follows:

  1. Master-instructions. A compiler description is made up of a collection of master-instructions. The most important of these are the PHRASE, FORMAT and FORMAT ROUTINE declarations, which provide a language for language description.
  2. Directives. There is a simple language for use inside format routines, in which the compiler-writer describes the actions required to translate source instructions into the appropriate machine code according to the conventions of the autocode he is implementing. This language includes some special instructions for manipulating source language.

The compiler-compiler automatically incorporates in each compiler machinery for recognising and translating source language. This machinery works in conjunction with the language-description and language-manipulation facilities to provide a powerful mechanism for processing source language.

A PHRASE definition associates a suitably named class word with a phrase or set of alternative phrases, to define the set of symbol strings that are members of the class. The class word can then be used in a phrase or format definition to represent this set. A phrase is a string of symbols, or a class word, or in general a string of symbols and class words which defines a set of symbol strings direct, for example:

PHRASE [EQUALS] = equals, =, [IS] [EQUAL-TO], [IS]
PHRASE [DOES-NOT-EQUAL] = does not equal, ≠, [IS] not [EQUAL-TO], [IS] not
PHRASE [IS] = is, are, was, were, will be
PHRASE [EQUAL-TO] = equal to, same as

A FORMAT is a phrase which defines an instruction type, for example:

FORMAT: set [VARIABLE] = [EXPRESSION]
FORMAT: go to [LABEL] [if, unless] [EXPRESSION] [COMPARES-WITH] [EXPRESSION]

Some class words have recursive definitions, and so the set of possible symbol strings (i.e., instructions) satisfying a format may be infinite.
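
For example, an expression class might be defined recursively as follows (an illustrative sketch in the notation above, not one of the system's built-in definitions):

PHRASE [EXPRESSION] = [OPERAND] [OPERATOR] [EXPRESSION], [OPERAND]
PHRASE [OPERAND] = [VARIABLE], [CONSTANT], ([EXPRESSION])
PHRASE [OPERATOR] = +, -, ×, /

Since [EXPRESSION] occurs in its own definition, the set of symbol strings satisfying it, and hence the set of instructions satisfying a format such as set [VARIABLE] = [EXPRESSION], is infinite.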

Each format has an associated FORMAT ROUTINE. When the compiler is operating on source program, the built-in language-recognition machinery deals with successive instructions by testing the head of the symbol string of the rest of the program against the list of formats of the source language. When it finds a matching format it passes control to the associated format routine, which describes what action is then to be taken to translate this next instruction. These routines carry out the backroom organisation required, e.g., building up or searching the list of parameter names of the source program, and in particular they form the appropriate machine code instructions to add to the compiled program.

When a source instruction is matched to a format, a record is kept of the substrings matching each class word in the format. These symbol strings are the natural parameters of the format routines, as they define the particular variant of the format to be dealt with. They are therefore treated as a special kind of variable and they can be referred to inside format routines by special phrase-identifiers. Special phrase-handling directives are provided to operate on these phrase-identifiers, in particular to extract information about their associated symbol string.

In addition, phrase-identifiers can always be included in ordinary instructions in the format routines, provided they match a corresponding class word in the instruction format. In this case the current value of the phrase-identifier is substituted into the instruction when it is obeyed, i.e., when its format routine is entered as a subroutine.

As a trivial example, consider:

FORMAT: note that [VARIABLE] [EQUALS] [EXPRESSION] 
ROUTINE: note that [VARIABLE] [EQUALS] [EXPRESSION] 
set [VARIABLE] = [EXPRESSION]
END

Here the words in square brackets, which in the format are class words, in the routine heading and the routine are phrase-identifiers. If the instruction

note that a is b + c

was met in the source program, the routine would be entered with phrase-identifier values [VARIABLE] = 'a', [EQUALS] = 'is', and [EXPRESSION] = 'b + c', and thus the instruction obeyed inside the format routine would be a call for the format routine of the more basic format set ..., which will compile in effect:

set a = b + c

The language defining machinery is available to the compiler-writer to allow him to define special auxiliary formats and routines for use in format routines, to give an effective subroutine mechanism for his compiler program.

Formats are in fact divided into format classes, and each format definition must state which class it belongs to. There are four format classes built in to the system:

  1. MP (Master Phrase): the set of master-instructions of the compiler-compiler.
  2. BS (Basic Statement): the directive language of the compiler-compiler.
  3. AS (Auxiliary Statement): the set of subroutines of the compiler program.
  4. SS (Source Statement): the source language.

BS and AS instructions can only occur inside routines; SS instructions can also occur inside format routines, but SS format routines are more commonly entered at top level from the built-in recognition machinery during compiling.
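
For illustration, the compiler-writer might define an auxiliary format and its routine as follows (a hypothetical definition in the style of the examples above, with the notation for stating the class simplified and the body indicated only in outline):

FORMAT [AS]: note [NAME] in the list of labels
ROUTINE [AS]: note [NAME] in the list of labels
(directives to search the label list of the source program, adding [NAME] if it is not already present)
END

Once defined, the instruction note L1 in the list of labels can be written inside any format routine, and will be recognised and entered as a subroutine in the usual way.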

The compiler-compiler translates phrases and formats into stored information for subsequent use by the language-processing machinery, and it translates format routines into machine code routines of the compiler program. Inside format routines, BS instructions are usually translated into the appropriate machine code, but AS and SS instructions are compiled as cues to the associated format routines.

2. THE SECOND-ORDER COMPILER

The process of using a compiler-compiler to write a systems compiler to translate source programs is generally done in two separate stages. That is, when the compiler has been developed, it is translated by the compiler-compiler and transferred to magnetic tape by a special master-instruction. It is then a conventional compiler in machine code, stored on magnetic tape, that can be called upon to translate any program written in the designed language.

In developing his compiler the compiler-writer does not want to go through this double process in order to introduce or correct one of the formats of the language and then test it; it is convenient to be able to carry on as part of the same run and compile a section of source program to test the compiler. Equally he does not particularly want to precede each such development run with a recompilation of all the satisfactory phrases, formats and format routines already developed. So it is convenient for him to combine the facilities on hand, so that the incomplete compiler is stored on magnetic tape during development together with a copy of the compiler-compiler. At the beginning of each development run it can then be called down as if it were an ordinary compiler, but the compiler-compiler can first be instructed to translate further phrases, formats and format routines, and to incorporate them into the compiler, before passing control to the compiler to translate a test piece of source program, finally passing control to the compiled source program to obey the test. If however it is required to make an already tested addition or correction to the compiler, instead of compiling a test program, a special master-instruction can be used to put the amended compiler back on tape.

The Brooker-Morris compiler-compiler is designed so generally, using bootstrapping techniques, that this procedure is quite natural to it. It has been stated that the compiler-compiler automatically incorporates language recognition and translation machinery into the compiler program. But of course the compiler-compiler is itself a compiler program, and most of this machinery is equally relevant to its task of compiler-compiling. Therefore the machinery is a carefully integrated whole that is used both when compiler-compiling a compiler and when the compiler is compiling a source program. It is even used when developing the compiler-compiler itself, to add on the less basic facilities (e.g., most of the directive language) using the more basic facilities. In fact a compiler is grown on to (a copy of) the compiler-compiler just as if it was a further development of it - the translated phrases, formats and format routines are not stored separately. When the compiler is fully developed, a special master-instruction removes all the machinery not required for just compiling source program, and the compiler is finally stored as a production compiler.

An interesting question is now raised: what kind of object is the partly developed compiler? It is a compiler, because (if it is sufficiently complete) it can be called upon in the standard way to translate source program. It is equally a compiler-compiler, because the original compiler-compiler is a subset of it, and one could start defining a completely new language for it to translate, provided it did not clash with the language already defined. In fact, apart from looking to see if there is a dictionary of SS formats on hand, it is impossible to tell from the internal structure of the compiler whether or not it is just the original compiler-compiler.

In this case the compiler-cum-compiler-compiler is perhaps best called a Second-Order Compiler, that is, a compiler that can compile any program written in any language, provided the source program is preceded by a definition of the language in the (fixed) compiler-compiler language. It can carry out the whole of this process in one run, starting with the basic compiler-compiler. Or at any point a subset of source language definition can be stored with it on tape, subsequent calls on it only requiring language outside this set to be defined before compiling the source program; and additions to the subset on tape can be made at any time.

Now the language of a conventional compiler is fixed: that is, the compiler can only translate a program written using the fixed set of SS formats defining the language. Extensions to the language in the program can be done indirectly using the conventional routine mechanism that is usually built-in to the source language. This means that provided the correct format for specifying routines and calling them is adhered to, library routines, or routines specially written by the programmer for his program, can be included in his program and called by instructions that read like basic SS instructions in the language.

However the second-order compiler allows a programmer to extend the language any program is written in indefinitely, by preceding the source description with a definition (in the compiler-compiler's language) of any previously undefined language. In principle each programmer can have his own breed of the second-order compiler on magnetic tape, comprising the basic compiler-compiler, and a subset of useful basic source instructions (together with the AS routines involved in their definition) which are common to all programmers in his field of work; to this he adds his own personal selection of library and system format routines, maybe with adaptations, together with some private definitions, to give a personal language in which he can most easily express himself when programming. Then for any particular program, any further language particular to the program can be defined before it is used in the source program, as part of the program description. Thus any program can be written in a language uniquely adapted to the program itself, its writer(s) and his field of work.
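
As a trivial illustration (hypothetical, but in the style of the examples of section 1), such a private definition preceding the source program might read:

FORMAT [SS]: increase [VARIABLE] by [EXPRESSION]
ROUTINE [SS]: increase [VARIABLE] by [EXPRESSION]
set [VARIABLE] = [VARIABLE] + [EXPRESSION]
END

after which an instruction such as increase count by 1 can be written anywhere in the rest of the program as though it were a basic instruction of the language.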

Current work by the author, described in Napper (1966), is implementing such a system: providing a basic source language subset, devising suitable behind-the-scenes organisation, and establishing conventions that enable the ordinary programmer to make simple additions to his language safely and without detailed knowledge of compiler-compiler theory. The language of the basic subset is in English Mode, with instructions reading as English and combined in a clause and sentence structure. There is a Mathematical Mode similar to a simple conventional autocode language, but the more interesting use of defining language is expected to be in English Mode. So the organisation provided is such that definitions of the language are as simple as possible, and all English Mode formats will fit automatically into the clause and sentence structure. With this freedom of definition it is possible to write programs which at higher levels read as a computer-independent description of the job being programmed, in free and natural English. Further, the phrases, formats and format routines of the language description, with suitable choice of English class word names, can be made to read as a computer-independent description of format and meaning.

Such natural language is not particularly suited to purely mathematical programming, but in those sections of program where English is a suitable medium of expression, this second-order compiler allows very complicated formats to be built up, which retain complete clarity (in the context of the job on hand) because they correspond to the original description of a more complicated task (including clauses, options of repeated phrases or clauses, or free strings that are implicit routine calls). These can be used safely instead of breaking them down into a set of more basic instructions connected by a control complex.

For example, consider the format:

FORMAT [ORD] = reorder[S]list[S][OF][S][PROPERTY][&PROPERTY*?][S][OF][S]EACH[S]
               [item][WHICH?][LIMITS?][S]in[S][a,de]scending[S]order[,?][S] 
               according[S]to[S][CRITERION][IF-EQUAL..*?][PAUSE]
where [IF-EQUAL..] = ;[S]if[S]equal[,][S]order[S]in[S][a,de]scending[S]order[,?] 
                     [S]according[S]to[S][CRITERION]

[Note: [S] denotes a general space (including comment or noise words), ? denotes an optional phrase, and *? an optional phrase possibly repeated; [ORD] is the format class (a subclass of SS) of ordinary English Mode instructions as opposed to conditional clauses.]

This has been used in a program as follows:

Reorder the list of the ALLOCATION, IDEAL-COLUMN-OF-THE-ALLOCATION, and REFERENCE for EACH hole which is affected by the current priority class, from the bottom-hole-in-the-column up to the top-hole-in-the-column, in descending order according to the ((priority-of-the))ALLOCATION; if equal, order in ascending order according to the difference between the IDEAL-COLUMN-OF-THE-ALLOCATION to the hole and the current column.

In this particular example, 'which is affected by the current priority class' is not one of the formally defined alternative forms of a which clause: it matches the last alternative, the phrase [ANYTHING], which accepts everything up to the next piece of punctuation. Whenever it is required to discriminate the subset of [item]s to be operated on, this alternative is transformed into the call for a conditional clause 'if the [item] [ANYTHING]', e.g., 'if the hole is affected by the current priority class: ...'. Note however that this transformation is not carried out in the reorder format routine; it is carried out automatically and generally, at a lower level, as part of the standard loop control machinery, and it is transmitted through the level of the format routine automatically when loop control formats are used, for example REPEAT for EACH [item] [WHICH?] from the current [item] back to the [STARTING-POINT].

It is hoped that such a transformation of the power of expression in programming will enable experienced programmers to write large programs with great accuracy and clarity of description - so that they can be easily understood and even amended with safety by someone other than the writer. However, leaving aside such as yet nebulous and unproven advantages, the second-order compiler provides three significant practical advantages compared to existing autocodes with their conventional routine mechanism:

  1. New units of language can have completely free formats (within reason) whereas in an autocode every routine call must obey a set format. In both cases the format of routine description must be as laid down by the compiler.
  2. The source program compiled using the second-order compiler is more efficient. Every instruction in the second-order compiler's source program will be compiled straight into tailor-made machine code without resorting to a cue-subroutine mechanism unless this is more appropriate; and in a complicated routine with many variations of format (e.g., 'reorder'), only the subset of instructions required to deal with the variant on hand need be compiled, and hence obeyed, whereas if a subroutine is dealing with such a general job, the particular relevant subset must be isolated at execution time.
  3. No declaration of parameter type is required, as the parameters involved are those of the block in which the instruction appears - the parameters referred to here are the conventional parameters of the source program, and they will usually be represented in format routines by phrase-identifier parameters of the compiler program. Directives can be used to establish parameter types if required, and in particular local parameters can have their types set by association, e.g., LOCAL NAME: list-value-being-ordered; type as for [PROPERTY] [OF] [item].

As a simple example, consider the following routine as defined in a conventional autocode:

 
routine form (<type> name sum, <type> x, <type> y)
sum = x + y
end

where the <type> has to be specified in each of the three cases (e.g., all real or all integer) - hence the routine could only be used for one combination of types throughout the program.

This routine is called by, e.g.,

form (sum1, a, b)

With the second-order compiler in English Mode, we have:

FORMAT [ORD] = form[S]sum[QUALIFIED?][S]of[S][NUMBER][S]and[S][NUMBER][PAUSE]

where [QUALIFIED?] allows any further symbols up to the next space.

ROUTINE [ORD] ≡ form the sum[QUALIFIED?] of [NUMBER/1] and [NUMBER/2].
Set the sum[QUALIFIED?] = [NUMBER/1] + [NUMBER/2].
END

This format can then be called by, e.g., form the sum:1 of a and b, and the effect of the machinery will be the same as if set sum:1 = a + b had been written instead.

In both cases x and y or the [NUMBER]s could be any arithmetic expression, but this form of language would not be natural with complicated expressions.

Assuming that all the numbers involved are simple items, then the second-order version would compile direct into the two basic operations, an assignment and an addition. In the autocode version, in the cue the values of a and b and the address of sum1 would be copied, and the instructions obeyed for jumping to the routine and going down another level of the stack; then the two basic operations would be obeyed; then the stack pointer would be reset and control passed back to the block containing the cue. Thus three or four times as many instructions would be obeyed in execution for each call of the routine.

For longer routines the balance of advantage in efficiency moves towards the routine form, as the number of instructions obeyed in the routine begins to swamp those involved in the entry-exit mechanism, and compiling a large block of instructions for each call on the format (instead of just a cue) outweighs the advantage of quicker execution. Then of course a formal routine mechanism in the basic source language of the second-order system can be used, or the format routine could organise its own subroutine to do the bulk of the procedure, compiling a cue and the subroutine on the first use of the format and subsequently compiling only a cue.
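
Schematically (a sketch of the idea only, in the spirit of the paper's notation rather than actual compiler-compiler directives), such a self-organising format routine might read:

ROUTINE [ORD] ≡ form the sum[QUALIFIED?] of [NUMBER/1] and [NUMBER/2].
(unless a subroutine for this format has already been compiled: compile one and note its entry point)
(compile a cue to that subroutine, passing [NUMBER/1], [NUMBER/2] and the sum[QUALIFIED?])
END

On the first call the routine would thus compile both the subroutine and a cue; on every subsequent call only a cue.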

It should be emphasised that this (macro) facility for compiling nonbasic instructions direct into machine code causes limited increase in efficiency when used where a routine mechanism is commonly used in a conventional autocode program, to save repeatedly writing out a sizeable block of similar instructions in the program. By implication the block of instructions is likely to take long enough in execution to render the relative time spent in entry and exit insignificant. The facility is however critical when it is used in trivial definitions where the chief object is to write an instruction representing perhaps only one or two basic instructions simply to achieve a greater measure of self-description in the instruction in the context of the job being programmed; the basic language of an autocode must of course be very general and therefore neutral in meaning, and the main opportunity for self-description is usually only in the programmer's choice of parameter names.

3. THE THIRD-ORDER COMPILER

While the compiler-compiler is a great improvement on a machine code assembly language, it only places the compiler-writer in a similar position to the programmer of a conventional autocode. It would be nice to confer on the compiler-writer the same relative advantage that the second-order compiler confers on the ordinary programmer. And whereas it is debatable how far programs containing much mathematical calculation are suited to expression in a nonmathematical language (e.g., English, as encouraged by the author's implementation of a second-order system), there is little doubt that compiler-writing is.

The compiler-compiler can be extended and adapted, just as without it compilers can be; but this entails altering a machine code program (or one written in a primitive language), which is difficult and dangerous even if done by a member of the original compiler-writing team. So in practice the compiler-writer using a compiler-compiler is restricted to the language-processing machinery available. Further, he has to use the fixed (AS) routine mechanism provided, and although it technically enables him to extend his language, in practice (as in current autocodes) extensive use of it is prohibitively inefficient. So he has to program in the simple autocode-type language provided for him. Relative to the current autocode user he has the advantage that routine calls do not have to obey a set format, but the disadvantage that the top-level routine of his program and all the compiler-compiler machinery used in compiling (i.e., that is part of his program) is written in a primitive language and hence is in practice unalterable and its precise effect obscure.

In order to effect the similar revolution in the compiler-writer's lot that the second-order compiler does for the ordinary programmer, we require a third-order compiler. This would provide the compiler-writer with the analogous extra facilities required to make programming easier. In addition it would formally recognise the three modes of operation of the system and provide three suitable basic languages for each, with appropriate facilities for each.

The three modes, with a summary of the existing situation and the more obvious extensions required, are as follows:

1. Compiler-compiler mode.

This reads the definitions of language written in the compiler-compiler language and converts them into machine code sections of the compiler to translate source program. There are only a handful of master-instructions commonly used, e.g., FORMAT, PHRASE and ROUTINE. All the machinery involved in language definition and recognition is written in machine code or primitive autocode (by the compiler-compiler writer). Every nonbasic routine call recognised in a format routine is planted in cue form.

We require from the third-order compiler a wider range of master-formats, with associated routines written in an understandable simple (as opposed to primitive) language, so that the compiler-writer can adapt the machinery he is provided with, and easily define more of his own; in particular it should be easy for him to incorporate existing machinery as subroutines of his new machinery.

For example, it might be convenient to have different forms of ROUTINE format, e.g., for AS routines only used by the compiler-writer, for AS routines which can be used by the source programmer, and for SS routines, one each for ordinary instructions, conditionals, and directives (i.e., instructions that take effect at compile time not run time). In PHRASE definitions, different forms may be required, e.g., to automatically construct full definitions from incomplete specifications; there is an obvious need illustrated in Napper (1966) for a master-format, e.g., KEY-WORD [hole], to construct a phrase definition for the infinite set of composite words containing the word hole, e.g., hole, hole:A, ideal-hole, hole-above, top-hole-in-the-column, BUT NOT whole. It might also be necessary to adapt the language-processing machinery to deal with such special forms.
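
Schematically (a guess at the intended effect only, since the master-format is merely proposed), KEY-WORD [hole] might construct a definition equivalent to:

PHRASE [HOLE] = [QUALIFIER?]hole[QUALIFIER?]

where [QUALIFIER?] is a hypothetical optional qualifying string joined by hyphens or a colon, defined so that hole must occur as a complete word - thus admitting top-hole-in-the-column and hole:A but rejecting whole.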

To revolutionise the compiler-writer's freedom of expression, it must be possible to enter format routines for AS instructions recognised in compiler-compiler mode, instead of just planting a cue so that they are obeyed interpretively in compiler mode, so that every AS instruction that does not involve a phrase-identifier can be compiled direct into compiler-mode machine code (if required). This is analogous to the tailor-made advantage of the second-order compiler over conventional autocodes in compiling non-basic instructions. To meet this requirement involves a considerable addition to the theory and structure of the second-order compiler. In particular it has the same problem as in the source program analogy, of wanting two different forms for each definition of language - a cue-subroutine form and a macro form, depending on whether it is possible or economically desirable to compile a separate set of instructions. It is expected that this problem will be formally solved in the second-order system for source program, so that the same definition can cover all cases in the most effective manner without any special help from the programmer; it is therefore hoped that the analogous problem will prove soluble for the third-order system in compiler mode.

2. Compiler mode.

This reads the source program, recognising successive instructions as members of the SS format class, setting up the appropriate phrase-identifier values for each class word occurring in the format, and entering the appropriate format routine to compile the requisite machine code. Compiler mode uses some of the built-in compiler-compiler machinery, but most of the programming involved is in the organisation written by the compiler-writer, who has provided the basic subset of source language and established the corresponding conventions with which he can conveniently build up less basic permanent and library routines, and in particular which enable the source programmer to write his own definitions of language easily and safely. The compiler-writer's language is the basic directive subset provided by the compiler-compiler, and his routine mechanism is the AS routine, which although it provides freedom of format is uneconomic if used extensively for small routines.

We require from the third-order compiler a basic English Mode programming language similar to that which the author's second-order system provides for the source programmer, to enable the compiler-writer to describe his basic programming in compiler mode clearly and accurately in English. On top of this he can define his own language freely and efficiently using the more powerful AS machinery described for compiler-compiler mode. It is not convenient to use the same language as the source subset, as this can cause confusion where a format routine contains both AS and SS instructions (as the former usually take effect at compile time and refer to the compiler organisation of data, and the latter usually take effect at run time and refer to the source program organisation). Also the conventions of compiling machine code may be different; for example, the compiler's routines may have to be relocatable but the source's may not have to be.

3. Source mode.

Having compiled the source program in compiler mode, the compiler passes control to the source program, which executes the job described by the source program, independent in general of the compiler.

The source programmer already has the benefit of the second-order facilities which the third-order compiler was designed to provide for the compiler-writer. The second-order system gives him a very powerful means of expression, and it compiles more efficient programs (but takes longer over compiling). What further benefit can he hope for from a third-order system?

4. A CONTEXT FOR FREE MAN-MACHINE COMMUNICATION

So far about the only context for a compiler-compiler has been the introduction of a new large computer, where a number of compilers are required quickly, mostly adaptations to the machine of existing languages, but also one or two new ones. For new languages it might also be desirable to revise the compiler substantially after a year or two's practical experience. Such calls for a compiler-compiler are rare enough; what immediate prospect is there of wanting sufficient repetition of the compiler-writing process to justify a further refinement of the facilities?

The answer to this question derives from the answer to the previous question. Any further advance in programming beyond a second-order system must surely be in the communication from the machine back to the programmer, to help him write, test and run his programs. The second-order system provides him with a powerful means of communication to the machine - any further increase in power will only aggravate the existing imbalance. For at present any communication from the compiler tends to be very primitive both in language and in content. This is partly due to economic considerations, but chiefly it is because there is no subadaptation of the communication from the compiler; the same programmed responses apply equally to all programmers, and with few exceptions to all programs.

The two most obvious ways of narrowing the context within which communication is made, and hence increasing its relevance and variety, are to make communication easily specifiable from individual non-basic routines (e.g., library routines), and to make it adaptable to the individual programmer. There is clearly great scope for communication in large and complex routines, e.g., in reorder ... or in mathematical routines where there are many ways of carrying out the operation depending on the characteristics of the data.

The second-order system goes part of the way towards providing this narrowing of context by encouraging a great build-up in routines, and allowing adaptations of existing language to the individual. Many of these routines may be trivial, apparently saying the same thing in a different way. But the fact that we want to say something different usually implies that the context is different.

However it is not ideal for the programmer to have to specify all the communication back to himself to achieve the required adaptation. Neither could the compiler-writer deal personally with all the users of his compiler. The answer may be to split ordinary programmers into groups each looked after by a master-programmer who provides them with a comprehensive programming service, offering teaching and advice as well as help in writing, testing and running their programs. In a large University, say, the structure suggests itself of one or more master-programmers in each department supervising a group of pupil programmers. Each department would have its own adaptation of the most suitable language(s) based on the common basic language together with a set of library routines tailored to the requirements of the department - a dialect. Each ordinary programmer would have a set of private formats (now fairly small and mostly written in collaboration with the supervisor), and these together with his supervisor's dialect would be called in to translate all his programs.

The master-programmer system provides the justification for a third-order system. For it would bring about a proliferation of compilers (but not necessarily of languages). It also places a heavy burden on such a system. For the emphasis is not so much on a compiler-writer providing a working language, as on a master-programmer, given a basic compiler system, providing a comprehensive computing service for a set of ordinary programmers. To do this he will require a very powerful and accurate language, particularly as his system will be subject to continual amendment.

There is not space to outline the potentialities of such a system in newer fields that might become possible, such as teaching programmers, error detection and correction, and monitoring in general (in particular running programs under compiler control - not necessarily interpretively). All these newer possibilities provide a context for language generation, and if all the higher-order systems programming is also in expressive English, all communications should be immediately understandable in their context.

The following brief examples will illustrate two or three of the more obvious opportunities for language generation. These are applicable both to straightforward compiling and to an off-line system, although of course they are equally applicable in the contexts mentioned above, and are more suited to an on-line multiple-access computer system. Directive English refers to the projected free English programming in compiler mode of the third-order system, where the parameters are frequently phrase-identifiers as well as more conventional variables.

The machinery of the compiler-compiler is already adaptable to providing easy generation of language. A format is a generalised sentence, reading (with imaginative choice of class-word names) as a general English sentence, which represents an indefinite set of source sentences; its precise meaning can be derived from breaking down the general structure, using phrase-handling instructions that read in directive English as a computer-independent analysis of meaning. In the same way, new sentences can be built up in appropriate contexts in format routines, that read as a relevant generalised communication in class-word form, but which on phrase-identifier substitution are output as source string immediately understandable in the particular context of the original variant of the format.

Apart from giving information and printing out errors in a more sophisticated way, there are two immediate contexts for language generation opened up by second-order programming. The easiest is for a format routine to print out the source string of the variant it is dealing with. This is of little interest if it is an SS routine being obeyed at top level, as the sentence already occurs in the source program. But if a switch is set so that this printing out only occurs at the level one below top level, or only at the level of basic English, the compiler will then give a continuous translation of the source program into English at one level down or into basic English. For example, when form the sum of a and b was being compiled, set the sum = a + b would be printed. The reorder instruction quoted previously would translate into about 35 English instructions at the level of the format routine (i.e., one level down) or about 80 at the level of basic English.

Similarly, if each format routine gives a REPORT directive, the compiler can arrange for a program to print out a continuous report of the instructions it is obeying in a section of program (at top level, or a lower level if required). The report format would be basically the same format as the instruction together with a specification of which values were to be printed as well. Thus the report on form the sum, if a was 2 and b was 3, would be form the sum (5) of a and b. Conditional clauses would require two report formats, one if the condition was satisfied (since ...) and one if not (note that ... not ...). In reporting directive English for master-programmers, the values of the phrase-identifiers would of course be source strings. Such reporting is quite impressive, since it reads as the master-programmer's thought-process in translating higher-level language into the requisite more basic language.
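
In the form the sum example, such a report specification might read (purely schematically, since the REPORT directive is only projected here and its syntax is a guess):

REPORT: form the sum ([VALUE]) of [NUMBER/1] and [NUMBER/2]

where [VALUE] stands (hypothetically) for the computed value of the sum, to be printed at run time; so with a = 2 and b = 3 the report would read form the sum (5) of a and b.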

However the most interesting extension of the existing machinery is in conversing. Obviously the system can answer any questions put to it, as the question would be recognised as a format in the normal way, and its format routine would establish the answer and generate a suitable output format to reply. However it is equally possible (particularly on-line) to formulate a question in a format routine where further information is required of the programmer. This will be printed out with source string substitution where phrase-identifiers are involved, and if the instruction specifies the format of the reply, it can then recognise the reply in phrase-identifier form, thus making new information available to the routine.

For example, a simple integral loop control format might be in its full form REPEAT for EACH [item] from a [STARTING-POINT] [in, ex]clusive [up, down] to a [FINISHING-POINT] [in, ex]clusive, but the actual format REPEAT for EACH [item][LIMITS?] might allow all or any of the information about the [LIMITS?] to be missing. If no direction [up, down] is given, up is assumed; inclusive is assumed where no indication is given; if the starting-point or finishing-point is not given, the appropriate preset parameter FIRST-[ITEM'] or LAST-[ITEM'] is generated, where [ITEM'] is [item] with small letters raised to capitals.

If a programmer is using the format for the first time, or even a particular alternative of [LIMITS], it might be wise to check back that the correct assumptions have been made. So at the point when all the missing parameters have been generated, if necessary the routine could proceed:

ASK THE QUESTION: [REFERENCE]

I assume that the limits for the [item] are from the [STARTING-POINT] [in, ex/1]clusive [up, down] to the [FINISHING-POINT] [in, ex/2]clusive.

Is this correct?

WHERE THE [REPLY/1] SHOULD BE [YES], [NO], OR [MAYBE].

IF THE [REPLY/1] is [NO] : ASK THE QUESTION: what are they [please?]?

WHERE THE [REPLY/2] SHOULD BE [for EACH [item] ?][LIMITS]. THEN START AGAIN REPLACING THE [LIMITS?] WITH THE [LIMITS] OF [REPLY/2].

Note that the ASK format with its associated WHERE clause would be part of the standard compiler-compiler machinery that the master-programmer can use freely (maybe after adapting it slightly for his private use). The ASK directive prints out the symbols following, substituting the symbol string on hand for any phrase-identifiers involved. Then the WHERE clause, which locally defines the phrase (format) of the appropriate reply, reads in the reply, setting up a new phrase-identifier value that can then be tested or otherwise manipulated in the usual way by the phrase-handling directives. If the WHERE machinery fails to recognise a reply it might automatically request that the reply be repeated, giving the principal alternative formats of the required reply.

In this way, with standard machinery provided by the third-order system and maybe adapted for his own use, the master-programmer could freely specify any exchanges he requires to make in any situation. Routines of his dialect would probably have separate memory for each programmer (rather than existing separately for each) and would be able to adapt to the individual, e.g., only asking the above question if his last use of the current alternative form had been incorrect. There would be scope for little learning programs to deal automatically with such situations and others. If the master wanted to carry out a certain action for a subset of programmers only, the requisite sequence would merely be prefaced by, e.g., IF THE PROGRAMMER is Smith OR Jones:.

Thus it could be possible for the master-programmer to communicate freely with his pupils, both direct and via his compiler, gradually building more and more of his actions into the system as he sees automatic patterns developing. And if programming in a more natural self-explanatory language is encouraged in source programming, this will enable him more quickly to see the context of source program when he is obliged to deal with it direct rather than automatically through generalised directive English. Similarly there would be analogous free communication in directive English between the master-programmer and the central master-master-programmer(s) (compiler-compiler-writer) to help him keep his master-program correct. This would give a double pyramid of communication, with additional communication existing laterally between master-programs (e.g., exchanging information and looking for someone else's format to match an unrecognised instruction), and between master-programs and the computer's Supervisor system, etc. All these communications would be in one or other of the modes of English, since all the programming concerned would be.

In this way it might be possible to have a computer installation that is a continuously evolving complex of people and mechanical systems all intercommunicating freely in English in a very large but economically viable context. In particular it should be possible to get some subsystems operating without a formal master, where a master-programmer has succeeded in mechanising a sufficiently high proportion of his actions, so that in the few remaining situations where the correct procedure is not formalised, little artificial intelligence programs can be used to make decisions, or the system could ask for help from another master-programmer.

5. REFERENCES

Brooker, R. A., MacCallum, I. R., Morris, D., & Rohl, J. S. (1963). The Compiler Compiler. Annual Review in Automatic Programming, 3, 229-275.

Napper, R. B. E. (1965). An introduction to the Compiler Compiler. Tech. Report, Dept. of Comp. Science, Univ. of Manchester.

Napper, R. B. E. (1966). A system for defining language and writing programs in 'Natural English'. Formal Language Description Languages for Computer Programming, 231-248. North-Holland.
