Brooker-Morris Compiler Compiler

Jump To Main Content

Jump Over Banner


ACLApplicationsCompiler Compiler

Jump Over Left Menu

Early Translator Writing Systems


1. Introduction

1.1 Early Autocodes

Prior to 1950, most computer programs were written in assembly code; that is a low-level programming language where a one-to-one relationship exists between instructions in the language and the computer's machine code instructions. By 1950, people were considering whether it would be possible for the machine to generate the assembly code rather than humans.

In 1950, William Schmitt implemented Short Code for the Univac 1 In 1952, Alick Glennie produced an autocode programming language for the Machester Mark 1 computer (the compiler was 750 instructions). Laning and Zierler produced an autocode for the the Whirlwind at about the same time. (Knuth lists 20 autocodes that predate Fortran and states that Glennie's autocode was the first one completed.)

1.2 Fortran

Work on a more ambitious autocode for the IBM 704 started around 1952. The proposal for Fortran I went to IBM Management in 1953, was specified by 1954 and compiler produced by 1957. The major deficiency of Fortran I was the absence of user-written subroutines and functions. These were added in Fortran II in 1958.

The need for a system to automatically write compilers had not really appeared so far as many computers still ran programs in assembler language or a single autocode. Implementing a single language on many different computers or implementing many languages on a single computer were still not major requirements.

1.3 Algol 58

By 1958, there was concern that programming languages would proliferate. There were already signs of this. In both the USA and Europe there was support towards defining a single programming language. A meeting was held in 1958 in Zurich which defined the language called IAL (International Algebraic Language) in the USA and Algol58 in Europe. This was followed quite closely by Algol60.

In the USA, languages such as JOVIAL, MAD and and NELIAC were defined as subsets of IAL. In Europe, there was enthusiasm for Algol 60 which had a number of unfortunate features:

The main result of the attempt at a single universal programming language was a proliferation of languages and dialects that meant it was impossible to implement the compilers needed by a specific customer without some automation of the production of compilers.

There was a need for a lexical analysis subsystem to resolve the problems associated with the variety of dialects and languages. It was also needed to cover up the deficiencies in the way some languages had been defined, notably Fortran.

For Fortran, syntax analysis is relatively straightforward once the lexical problems are resolved. Algol is significantly different. In the Algol 60 Report the language is defined in BNF, a method for defining legal programs in the language. It also had statements in English that also described the language. The block structure facilities in Algol 60 made it more difficult to do the syntax analysis.

The semantic description of Algol 60 is also quite difficult. Features were defined that were difficult to compiler. Ambiguities existed. Meanwhile the architectures of computers were changing with different storage systems, radically different instruction codes, and speed based on pipelining specific operations. In consequence, generating efficient code across a range of computers was also becoming difficult.

The need for some automation in the production of compilers was needed.

1.4 Some Terminology

In describing translator writing systems, it is useful to define a set of terms that can be used irrespective of the system being described.

Backus-Naur Form (BNF)
A set of rewrite rules that define a programming language. A rule has the form:
 <address> ::= <name> , <street> , <zip>
The non-terminal address can be replaced by a non-terminal name followed by a comma followed by a non-terminal street followed by a comma followed by a non-terminal zip. Rewrite rules are applied starting from a single non-terminal, program say, until a string is obtained that contains no non-terminal symbols. This is a program in the language.
BNF language
A language defined using BNF
parse tree
Given a program in the BNF languge, it is possible to describe all the productions applied to create the program as a tree starting from the non-terminal program. This is called a parse tree.
syntax-directed compilation
Any system that can take the definition of a BNF language and generates a parse tree for an arbitrary program in that language.
(Manchester Mark 1 Autocodes: Alick Glennie, Tony Brooker 1954) Syntax Machine A E Glennie, 1960 PSYCO Princeton Syntax Compiler E T Irons 1960 Brooker-Morris Compiler-Compiler 1960 SIG/PLAN Working Group 1 on Syntax Driven Compilers 1962 Howard Metcalf Fall 1962 Val Schorre Meta I : Jan 1963 Lee Schmidt March 1963 Val Schorre et al Meta II : Spring 1963 Schneider and Johnson Meta III : 1964 J F Rulifson SRI Meta III : 1964 Book LISP-META : 1965 Oppenheim and Haggerty Meta V : 1966 O'Neil Meta PI : 1968 J F Rulifson Tree-Meta : 1968 J A Feldman FSL:1962/3 L F Mondshein VITAL:1967 CMU Cabal:1967 W Wulf CMU PQCC:1980 Computer Associates TGS 1964 Computer Associates CGS 1962 J C Reynolds COGENT 1962 J C McClure TMG 1963 S C Johnson YACC 1970 Early Translator Writing Systems Early Translator Writing Systems

Early Translator Writing Systems

2. Early Translator Writing Systems

In 1960, several systems appeared aimed at easing the task of producing a compiler. They went under a variety of names , such as translator writing system, compiler-compiler and meta-compiler. As such a system is capable of generating a compiler, it may be possible for it to generate itself. Although not a requirement, it adds a certain elegance to the system and makes enhancement of the system relatively straightforward. A base system can be defined and this can be used to define a more powerful system and so on. Three of the earliest systems were:

The three activities were reasonably independent and differed in their approach.

The following functions are required explicitly or implicitly in most compilers:

2.1 Syntax Machine

The Syntax Machine defines a top-down parser for the language to be compiled having possible rules of the form:

<A> ::= <B> <C> <D> ... <X>
<A> ::= <B> | <C> | <D> |... | <X>
<A> ::= <B> <C> <D> ...       {<X>}

The first parses an A as a B followed by a C ..etc. The second parses an A as a B or a C or a D ... The third parses an A as a B followed by a C .. by one or more X. Having this iteration, saves on recursion.

It is up to the person writing the translator for a specific language to ensure that no backup is required. However, the rules described above are converted into a pseudo flow chart that allows variations on this basic parsing. The Syntax Machine defines a machine-independent instruction set which defines the semantics of the program. It is possible to optimise the pseudo code before transforming it into the actual order code of a specific computer. The whole system was simulated on an IBM 650 and used to generate compiled code for both the IBM704 and IBM709. Many of the ideas that appear in the Syntax Machine were used in the S1, S2 and S3 Fortran compilers developed at AWRE Aldermaston for the IBM 7030 (Stretch) and later ported to the Ferranti Atlas II. However, by then, an operator precedence parser was used for syntax analysis.


Irons started his work on a syntax-directed compiler in 1959 while at Hanover for the summer.

PSYCO uses a bounded-context bottom-up parser with backup. That is at any stage in the recognition of a program, a finite set of previously recognised non-terminals and the next finite set of terminal systems will always define the action that the parser needs to take. PSYCO was mainly aimed at writing translators for Algol-like languages. The system started with a symbol table containing basic information and a parse of the Algol-like program collected declarative information about the variables used in the program. A second pass used this information to compile the executable part of the program.

Irons also recognised the need for handling errors in programs presented to the compiler.

2.3 Compiler-Compiler

The Brooker-Morris compiler-compiler (BMCC) has a diferent notation from either Glennie or Irons. The parser is a top-down recursive-descent parser with statements of the form:

FORMAT [SS] = [V] = [SAE]

It indicates that a source statement (SS) in the languge consists of a variable (V) followed by '=' followed by a signed arithmetic expression (SAE).

Associated with each FORMAT statement is a ROUTINE that defines the semantics of the statement:

ROUTINE [SS] = [V] = [SAE]
LET [SAE] = [±?][T][±T*?] 
Ac = [±?][T]  

The appropriate ROUTINE statement is passed the parse tree of the statement recognised. It can inspect the parse tree and decide what code should be generated.

The Compiler-Compiler effectively adds the definition of the language being defined to that of the Compiler-Compiler itself. Before releasing the final compiler, it is usual to remove the Compiler-Compiler's own statements from that of the language being compiled.

The Compiler-Compiler was used to produce Fortan II, Fortran IV, Algol, Extended Mercury Autode, Atlas Autocode, SOL, ACL, CPL, Elliott Autocode Mk III compilers for the Ferranti Atlas computer.

3. Early Translator Writing Systems

The early translator writing systems influenced a number of systems that followed soon after. We shall look at three of these and one other, TMG, that also comes later but appears to have not been influenced by the earlier systems.


John Reynolds' COGENT system was developed at the Argonne national Laboratory around 1962. COGENT stand for COmpiler and GENeralised Translator. The objective was to unify the concept of syntax-directed compilation with the more general but primitive concept of recursive list-processing. It is similar to BMCC in that it has two main constructs productions and generator definitions which correspond somewhat to FORMAT and ROUTINE in BMCC.

An example of a set of productions for a simple language is:


As in BMCC, several productions can be written as a single production:


COGENT uses the production statements to construct a parser, similar to the Irons parser, that takes the input and generates a parse tree. By giving each recognised production a unique internal number, the parse tree can be very compactly represented as a list structure. This list structure can be manipulated by the generator definitions and eventually output the compiled code. COGENT was implemented on the Control Data 3600. A novelty of the implementation was that backup spawned a set of processes that performed the various alternatives in parallel.

3.2 CGS

Computer Associates of Massachusetts was in the business of writing compilers. Compiler Generator System (CGS) was their first attempt at a general purpose translator writing system aimed at producing efficient object code with optimisation provided at several places in the system. The basic approach was a table-driven top down syntax analyser. It ran on an IBM 7090, a Burroughs D-825, and a CDC 1604.

The generated compiler had five phases:

  1. Syntactic Analyzer: converts input string into a tree-representation of its syntax
  2. Generator: transforms the tree into a sequence of macro-instructions
  3. Optimizer: recognises and eliminates redundant computation (eg common subexpressions, invariant computation out of loops, etc)
  4. Code Selector: assembles code fragments
  5. Assembler: binds the code fragments in form required by the compiler environment.

The Syntactic Analyzer, Generator, and Code Selector are driven by tables defined for a specific language and machine code using a BNF-like definition of each. The Syntactic Analyzer is used to generate the relevant tables.

3.3 TMG

TMG (stands for TransMoGrifier) was developed for the PDP-7 by R. M. McClure at Texas Instruments in 1975 and appears not to have been influenced by the earlier systems. The system was ported to the IBM 7040, IBM 7090, and CDC 1604. It was used at Bell Labs and also by the Multics Project. At Bell Labs, Dennis Ritchie used TMG to produce the compiler for B that later became C. McIlroy and Morris used TMG to write the EPL compiler for Multics. As an aside, Stephen Johnson named his system YACC because they already had TMG at Bell Labs! TMG was used to implement a subset of PL/I by Cal Tech.

TMG was aimed at constructing simple one-pass translators for some specialized language with almost no optimisation of code generated. It was defined in a language TMGL with a compiler for TMGL on the target machine. Later, the system was defined in itself.

The system is somewhat similar to the Glennie Syntax Machine in that the syntax specification of a program consists of sets of statements with a label, a syntactic specification, and a semantic action that takes place if the syntactic entity is recognised. The syntactic part consists of a set of actions with possibly two exit labels which give where to go next in the case of the syntactic entity being recognised or not. If the syntactic part is recognised, the semantic part is executed.

In the first 18 months TMG was used to define two different FORTRAN compilers, a logic simulation system, several data format converters, a geometric description language translator, and the TMGL compiler itself.

The system is not really aimed at handling structured languages like Algol 60.

3.4 Meta II

The SIG/PLAN Working Group 1 on Syntax Directed Compilers met monthly in the conference room at the UCLA computing facility. John Backus attended one meeting to see how BNF was being used as a compiler writing language. Several separate attempts at defining a translator writing system came out of that working group and the ideas finally came together initially as a system called META but finally as META II.

META II was a simple translator writing system written by V Schorre et al in 1962.

In META II, the language for which a translator is required must be specified as a set of syntax rules. These very much resemble the BNF notation except that they contain code generation commands as part of the syntax definition. There is limited memory in META II so there is a need to output information soon after it has been recognised.

A restrictive top-down method of syntax analysis is used in the META II system. The restriction is that it does not allow back tracking or left recursion. The main goal is to match the complete program with the Syntax Rule whose name follows .META at the head of the program. The name of the Syntax Rule appears at the start of the rule followed by the = symbol. The right-hand side of this Syntax Rule indicates which entities have to be recognised to achieve this goal. The recogniser achieves its main goal by looking for these smaller entities, from left to right, as individual subgoals. These subgoals are themselves defined in terms of other entities, and so the process of recognising a complete program becomes a matter of looking for progressively smaller entities, right down to the level at which basic symbols such as characters and numbers are recognised.

A typical Meta II rule would be:

SET = 'C' $ ( 'D' / 'E' )

This recognises any string consisting of the letter C followed by any sequence involving the letters D and E.

Meta II rules can include .OUT statements that generate output code. Recognised basic symbols (.ID, .STRING, .NUMBER) are stored and can be output by .OUT(*). Also two unique labels can be output for each statement by writing *1 or *2.

Meta II is a very simple system that can be defined in itself. This is the complete definition:

OUT1   = '*1' .OUT('GN1') / '*2' .OUT('GN2') / '*' .OUT('CI')/ .STRING .OUT('CL '  * ) .,
OUTPUT = ('.OUT' '(' $OUT1')' / '.LABEL' .OUT('LB') OUT1 ) .OUT('OUT') .,
EX3    = .ID .OUT('CLL' * ) / .STRING .OUT('TST'  *) /
         '.ID' .OUT('ID') / '.NUMBER' .OUT('NUM') / '.STRING' .OUT('SR') /
         '(' EX1 ')' / '.EMPTY' .OUT('SET') /
         '$' .LABEL *1' EX3 .OUT('BT ' *1 ) .OUT( 'SET') .,
EX2   = (EX3 .OUT('BF ' *1 ) / OUTPUT ) $(EX3 .OUT('BE') / OUTPUT) .LABEL *1 .,
EX1   = EX2 $( '/' .OUT('BT ' *1) EX2) .LABEL *1  .,
ST    = .ID .LABEL * '=' EX1 '.,' .OUT('R') .,
PROGRAM = '.SYNTAX' .ID .OUT('ADR' * ) $ ST '.END' .OUT('END') .,

4. Third Generation Translator Writing Systems

After the initial promise of early translator writing systems, there was considerable interest in producing better systems. The early systems had shown that the approach was viable. The understanding of the whole area of parsing and what was possible had improved. A theory of syntax analysis was evolving. The number of new languages arriving was still on the increase. Computer systems were evolving.

4.1 FSL

In the period 1963-67, Bob Floyd from Computer Associates had moved to Carnegie-Mellon University. Jay Earley, a doctoral student, was working on a parsing algorithm, the most efficient general context-free algorithm known. Alan Perlis et al were defining an enhanced version of Algol, Formula Algol, which would incorporate the manipulation of mathematical formulae within the language Algol. There was a need for a Formula Algol compiler. Jerome Feldman had completed his thesis, A Formal Semantics for Computer Oriented Languages, in 1964. This defined a compiler-compiler, FSL, that was capable of translating into machine language most of the existing programming languages. It was, therefore, used to produce the Formaula Algol Compiler for the Bendix G-21. FSL had two main subsections. A Syntax Loader that builds tables which will control the recognition and parsing of programs in the required language. A Semantic Loader builds a table describing the semantics of the instructions in the required language. The Syntax Loader generated a Production Language (PL), based on the work of Bob Floyd, that scanned the current input stream against the PL rules and decided on the action to take. The Semantic Loader defined the meaning of each language statement in a machine-independent code.

FSL was a major input t0 the CABAL Compiler-Compiler system produced at Carnegie-Mellon University in the period 1967 onwards.

4.2 TGS

TGS (sometimes called TRANGEN), Translator Generator System, was produced at Computer Associates as a follow on to CGS. It ran on an IBM 7094, CDC-1604, Univac M460 and GE-635 and was used to write translators for PL/I, ALGOL, and FORTRAN IV. Each compiler is driven by a set of tables, TRANTAB, described in a language called TRANDIR. Synatx Analysis was done by a system based on Floyd Productions. The code generation system was also used in later versions of the META systems.

4.3 Tree-Meta

After a number of intermediate systems, Rulifson defined Tree-Meta as a major advance over the earlier META systems. The TREE-META program is in two parts. The first part is a definition of the syntax for the source language, L, for which a translator is required and this is largely inherited from Meta II. The major difference is that instead of outputting code, the output is a tree built up as the syntax proceeds. The second part defines a set of code generation rules that generate the object code to be produced from this tree. The tree itself is quite flexible in format and is defined by the TREE-META user as he defines the syntax analyser for the language. The code generator can take note of any peculiarities in the form of the tree, if that will lead to better code being produced.

An example of the syntax recognise is:

EXPRESSION = TERM $ ( '+' TERM :ADD[2] / '-' TERM :SUB[2] ) ;
TERM = FACTOR $ ( '*' FACTOR :MULT[2] / '/' FACTOR :DIV[2] );

The construct ADD[2] generates an ADD tree node from the top two items on the stack. The code generator instruction would be of the form:

ADD[-,-] => *1 ' PLUS ' *2 ;

Separating code generation from the building of the parse tree means that quite efficient code can be generated.

5. The Demise of Translator Writing Systems

Interest in Translator Writing Systems continued after the mid-1960s but the interest in producing a universal system capable of production-quality compilers decreased. This occurred for a number of reasons:

6. References

Author Title Publication Relates to
Andrews D I, Rulifson J F A Meta Compiler for the SDS 940 Draft Dec 1967 Tree-Meta
Andrews D I, Rulifson J F A Meta Compiler System for the SDS 940 SRI December 1967 Tree-Meta
Barnet J A et al The LISP 2 Programming Language and System FJCC68 Meta
Bastian A L A Phrase-Structure Language Translator June 1962
Birman A The TMG Recognition Schema 1970 TMG
Book E, Bratman H, Schwartz J A One Pass JOVIAL Compiler SDC, Jan 1963
Book E, Schorre D V, Sherman S J The CWIC/36O system, a Compiler for Writing and Implementing Compilers SIGPLAN June 1970
Brooker R A The Autocode Programs Developed for the Manchester University Computers BCS 1958 Vol 1 p15 BMCC
Brooker R A Further Autocode Facilities for the Manchester (Mercury) Computer BCS 1958, p124 BMCC
Brooker R A, Morris D An Assembly Program for a Phrase Structure Language BCS Vol 3 1960 p168 BMCC
Brooker R A Some Techniques for Dealing with Two-Level Storage BCS 1960 p189 BMCC
Brooker R A, Morris D Some Proposals for the Realization of a Certain Assembly Program BCS 1961 BMCC
Brooker R A, Morris D, Rohl J S Trees and Routines BCS Vol 5 p33 1962 BMCC
Brooker R A, Rohl J S, Clark S R The main features of Atlas Autocode BCS Vol 8 p303 1966 BMCC
Brooker R A, Morris D, Rohl J S Compiler Compiler facilities in Atlas Autocode BCS 1967 BMCC
Brooker R A, Clark S R An index directed compiler BCS 1967 BMCC
Brooker R A, Morris D, Rohl J S Experience with the Compiler Compiler BCS Vol 9 1967 p345 BMCC
Carr C S, Luther D A, Erdman Sherian A Meta Compiler System for the Univac 1108 and the General Electric 645 Tree-Meta
Cattell R G A Survey and Critique of Some Models of Code Generation CMU Nov 1977
Cheatham T E, Sattley K Syntax Directed Compilation SJCC 1964 TGS
Cheetham T E, Christensen C, Floyd R W Collected Research Papers CA Nov 1965, TGS, AMBIT etc CGS,TGS
Cheetham T E, Christensen C Research in Machine-Independent Software Programming CA June 1968, CGS, TGS, AMBIT CGS,TGS
Conway Design of a Separable Transition-Diagram Compiler CACM July 1963
Coulouris, George The Compiler Compiler - Reflections of a User 50 Years On 2013
Dove R K CABAL: environmental design of a compiler-compiler Jan 1968 Various
Earley J Formula Algol CMU June 1967 FSL
Earley J An Efficient Context-Free Parsing Algorithm CACM Feb 1970 FSL
Engelbart D C, English W K, Rulifson J F Development of a Multidisplay, Time-Shared Computer Facility and Computer-Augmented Management System Research RADC Contract AF 30(602)-4103 SRI April 1968 Tree-Meta
Feldman J A A Formal Semantics for Computer Oriented Languages May 1964, Carnegie Institute FSL
Feldman J A A Formal Semantics for Computer Languages and its Application in a Compiler-Compiler CACM Jan 1966 BMCC
Feldman J, Gries D Translator Writing Systems CS 69 June 1967 Various
Feldman J, Gries D Translator Writing Systems CACM Feb 1968 Various
Floyd R W A Descriptive Language for Symbol Manipulation JACM Oct 1961
Floyd R The Syntax of Programming Languages-A Survey IEEE Aug 64
Garwick J V GARGOYLE, A Language for Compiler Writing CACM Jan 1964
Gilbert P, McLellan W G Compiler Generation Using Formal Specification of Procedure-Oriented and Machine Languages RADC-TR-67-454 August 1967
Glennie A E Automatic Coding for the Mark 1 December 1952, Knuth Papers
Glennie A E On the Syntax Machine and the Construction of a Universal Compiler Computation Center, Carnegie Institute of Technology, Technical Report No. 2 July 1960 Syntax Machine
Graham R M Bounded Context Translation SJCC 1964
Hay R E, Rulifson J F MOL940:Preliminary Specification for an Algol-like Machine-Oriented Language for the SDS-940 March 1968 Tree-Meta
Hopgood F R A, Bell A G The Atlas ALGOL preprocessor for non-standard dialects BCS 1967 BMCC
Hopgood F R A Compiling Techniques Chapter 10 Compiler Compilers, 1969 Various
Irons E T PSYCO: Princeton Syntax Compiler Jan 1961 PSYCO
Irons E T A Syntax Directed Compiler for ALGOL 60 CACM jan 1961 PSYCO
Irons E T An Error-Correcting Parse Algorithm CACM Nov 1963 PSYCO
Iturriaga R, Standish T A, Krutar R A, Earley J C Techniques and advantages of using the formal compiler writing system FSL to implement a Formula Algol compiler SJCC 1966 FSL
Knuth D E, Pardo L T The Early Development of Programming Languages August 1976
Ledley R S, Wilson J B Automatic-Programming Language Translation Through Syntactical Analysis CACM March 1962 PSYCO
McKeeman W M An Approach to Computer Language Design CS48 August 31 1966
McClure R M TMG-A Syntax Directed Compiler ACM 1965 TMG
Mondshein L F VITAL Compiler-Compiler SYSTEM Reference ManualL Lincoln Lab TECHNICAL NOTE 1967-12, 8 February 1967 VITAL
Morris D, Rohl J S The Atlas Compiler System BCS 1967 BMCC
Naur et al Report on the Algorithmic Language ALGOL 60 CACM May 1960
Newman W M Interactive Graphics on a Small Microprogrammable Computer QMC Sept 1973 Tree-Meta
O'Neil J T META PI-An on-line interactive compiler-compiler FJCC 1968 Meta
Oppenheim D K, Haggerty D P META5: A Tool to manipulate string languages ACM National 1966 Meta
Padua D The Fortran I Compiler
Perlis A J, Iturriaga R, Standish T A A Definition of Formula Algol March 1966 FSL
Pollack B W An Annotated Bibliography on the Construction of Compilers STAN-(X-249-71 DECEMBER 1971 Various
Presser L, Melkanoff M A Software measurements and their influence upon machine language design SJCC 1969 Meta
Reynolds J C An Introduction to the COGENT Programming System 1965 COGENT
Reynolds J C COGENT Programming Manual 1965 COGENT
Rosen S A Compiler-Building System Developed by Brooker and Morris CACM Volume 7/Number 7/July, 1964 BMCC
Rulifson J F A TREE META for the XDS 940 SRI April 1968 Tree-Meta
Schneider F W, Johnson G D Meta-3: A Syntax-Directed Compiler Writing Compiler to Generate Efficient Code 19th ACM national conference 1964 Meta
Schorre D V META II: A Syntax-Oriented Compiler Writing Language UCLA Computing Facility 1965 Meta II
Schorre/Rulifson LOT Notes 1966 Tree-Meta
Schorr, Herbert Analytic differentiation using a syntax-directed compiler BCS 1965 PSYCO
Scott D W WIZOR, A Compiler Compiler for the GE 225 Computer ACM 1962
Shaw M, Fierst J Formal Compiler-Descriptive Systems CABAL 1967 Various
Trout R G A compiler-compiler system ACM 1967
Turing A M, Brooker R A Programmers Handbook for the Manchester Electronic Computer Mark 1 (3 editions) 1950-3 BMCC
Warshall S A Syntax Directed Generator 1961 CGS
Warshall, Stephen Summary of a Method for the Automatic Construction of Syntax Directed Compilers Computer Associates, December 1962, AFCRL-62-955 CGS