Prior to 1950, most computer programs were written in assembly code; that is a low-level programming language where a one-to-one relationship exists between instructions in the language and the computer's machine code instructions. By 1950, people were considering whether it would be possible for the machine to generate the assembly code rather than humans.
In 1950, William Schmitt implemented Short Code for the Univac 1 In 1952, Alick Glennie produced an autocode programming language for the Machester Mark 1 computer (the compiler was 750 instructions). Laning and Zierler produced an autocode for the the Whirlwind at about the same time. (Knuth lists 20 autocodes that predate Fortran and states that Glennie's autocode was the first one completed.)
Work on a more ambitious autocode for the IBM 704 started around 1952. The proposal for Fortran I went to IBM Management in 1953, was specified by 1954 and compiler produced by 1957. The major deficiency of Fortran I was the absence of user-written subroutines and functions. These were added in Fortran II in 1958.
The need for a system to automatically write compilers had not really appeared so far as many computers still ran programs in assembler language or a single autocode. Implementing a single language on many different computers or implementing many languages on a single computer were still not major requirements.
By 1958, there was concern that programming languages would proliferate. There were already signs of this. In both the USA and Europe there was support towards defining a single programming language. A meeting was held in 1958 in Zurich which defined the language called IAL (International Algebraic Language) in the USA and Algol58 in Europe. This was followed quite closely by Algol60.
In the USA, languages such as JOVIAL, MAD and and NELIAC were defined as subsets of IAL. In Europe, there was enthusiasm for Algol 60 which had a number of unfortunate features:
The main result of the attempt at a single universal programming language was a proliferation of languages and dialects that meant it was impossible to implement the compilers needed by a specific customer without some automation of the production of compilers.
There was a need for a lexical analysis subsystem to resolve the problems associated with the variety of dialects and languages. It was also needed to cover up the deficiencies in the way some languages had been defined, notably Fortran.
For Fortran, syntax analysis is relatively straightforward once the lexical problems are resolved. Algol is significantly different. In the Algol 60 Report the language is defined in BNF, a method for defining legal programs in the language. It also had statements in English that also described the language. The block structure facilities in Algol 60 made it more difficult to do the syntax analysis.
The semantic description of Algol 60 is also quite difficult. Features were defined that were difficult to compiler. Ambiguities existed. Meanwhile the architectures of computers were changing with different storage systems, radically different instruction codes, and speed based on pipelining specific operations. In consequence, generating efficient code across a range of computers was also becoming difficult.
The need for some automation in the production of compilers was needed.
In describing translator writing systems, it is useful to define a set of terms that can be used irrespective of the system being described.
<address> ::= <name> , <street> , <zip>The non-terminal address can be replaced by a non-terminal name followed by a comma followed by a non-terminal street followed by a comma followed by a non-terminal zip. Rewrite rules are applied starting from a single non-terminal, program say, until a string is obtained that contains no non-terminal symbols. This is a program in the language.
In 1960, several systems appeared aimed at easing the task of producing a compiler. They went under a variety of names , such as translator writing system, compiler-compiler and meta-compiler. As such a system is capable of generating a compiler, it may be possible for it to generate itself. Although not a requirement, it adds a certain elegance to the system and makes enhancement of the system relatively straightforward. A base system can be defined and this can be used to define a more powerful system and so on. Three of the earliest systems were:
The three activities were reasonably independent and differed in their approach.
The following functions are required explicitly or implicitly in most compilers:
The Syntax Machine defines a top-down parser for the language to be compiled having possible rules of the form:
<A> ::= <B> <C> <D> ... <X> <A> ::= <B> | <C> | <D> |... | <X> <A> ::= <B> <C> <D> ... {<X>}
The first parses an A as a B followed by a C ..etc. The second parses an A as a B or a C or a D ... The third parses an A as a B followed by a C .. by one or more X. Having this iteration, saves on recursion.
It is up to the person writing the translator for a specific language to ensure that no backup is required. However, the rules described above are converted into a pseudo flow chart that allows variations on this basic parsing. The Syntax Machine defines a machine-independent instruction set which defines the semantics of the program. It is possible to optimise the pseudo code before transforming it into the actual order code of a specific computer. The whole system was simulated on an IBM 650 and used to generate compiled code for both the IBM704 and IBM709. Many of the ideas that appear in the Syntax Machine were used in the S1, S2 and S3 Fortran compilers developed at AWRE Aldermaston for the IBM 7030 (Stretch) and later ported to the Ferranti Atlas II. However, by then, an operator precedence parser was used for syntax analysis.
Irons started his work on a syntax-directed compiler in 1959 while at Hanover for the summer.
PSYCO uses a bounded-context bottom-up parser with backup. That is at any stage in the recognition of a program, a finite set of previously recognised non-terminals and the next finite set of terminal systems will always define the action that the parser needs to take. PSYCO was mainly aimed at writing translators for Algol-like languages. The system started with a symbol table containing basic information and a parse of the Algol-like program collected declarative information about the variables used in the program. A second pass used this information to compile the executable part of the program.
Irons also recognised the need for handling errors in programs presented to the compiler.
The Brooker-Morris compiler-compiler (BMCC) has a diferent notation from either Glennie or Irons. The parser is a top-down recursive-descent parser with statements of the form:
FORMAT [SS] = [V] = [SAE]
It indicates that a source statement (SS) in the languge consists of a variable (V) followed by '=' followed by a signed arithmetic expression (SAE).
Associated with each FORMAT statement is a ROUTINE that defines the semantics of the statement:
ROUTINE [SS] = [V] = [SAE] LET [SAE] = [±?][T][±T*?] Ac = [±?][T] ....
The appropriate ROUTINE statement is passed the parse tree of the statement recognised. It can inspect the parse tree and decide what code should be generated.
The Compiler-Compiler effectively adds the definition of the language being defined to that of the Compiler-Compiler itself. Before releasing the final compiler, it is usual to remove the Compiler-Compiler's own statements from that of the language being compiled.
The Compiler-Compiler was used to produce Fortan II, Fortran IV, Algol, Extended Mercury Autode, Atlas Autocode, SOL, ACL, CPL, Elliott Autocode Mk III compilers for the Ferranti Atlas computer.
The early translator writing systems influenced a number of systems that followed soon after. We shall look at three of these and one other, TMG, that also comes later but appears to have not been influenced by the earlier systems.
John Reynolds' COGENT system was developed at the Argonne national Laboratory around 1962. COGENT stand for COmpiler and GENeralised Translator. The objective was to unify the concept of syntax-directed compilation with the more general but primitive concept of recursive list-processing. It is similar to BMCC in that it has two main constructs productions and generator definitions which correspond somewhat to FORMAT and ROUTINE in BMCC.
An example of a set of productions for a simple language is:
(LETTER) = A,B,C,D,E. (STRING) = (LETTER),(STRING)(LETTER). (VARIABLE) = (STRING). (FACTOR) = (VARIABLE). (FACTOR) = (()(POLYNOMIAL)()). (TERM) = (FACTOR). (TERM = (TERM)*(FACTOR). (POLYNOMIAL) = (TERM),+(TERM),-(TERM). (POLYNOMIAL) = (POLYNOMIAL)+(TERM) (POLYNOMIAL) = (POLYNOMIAL)-(TERM)
As in BMCC, several productions can be written as a single production:
(TERM) = (FACTOR),(TERM)*(FACTOR)
COGENT uses the production statements to construct a parser, similar to the Irons parser, that takes the input and generates a parse tree. By giving each recognised production a unique internal number, the parse tree can be very compactly represented as a list structure. This list structure can be manipulated by the generator definitions and eventually output the compiled code. COGENT was implemented on the Control Data 3600. A novelty of the implementation was that backup spawned a set of processes that performed the various alternatives in parallel.
Computer Associates of Massachusetts was in the business of writing compilers. Compiler Generator System (CGS) was their first attempt at a general purpose translator writing system aimed at producing efficient object code with optimisation provided at several places in the system. The basic approach was a table-driven top down syntax analyser. It ran on an IBM 7090, a Burroughs D-825, and a CDC 1604.
The generated compiler had five phases:
The Syntactic Analyzer, Generator, and Code Selector are driven by tables defined for a specific language and machine code using a BNF-like definition of each. The Syntactic Analyzer is used to generate the relevant tables.
TMG (stands for TransMoGrifier) was developed for the PDP-7 by R. M. McClure at Texas Instruments in 1975 and appears not to have been influenced by the earlier systems. The system was ported to the IBM 7040, IBM 7090, and CDC 1604. It was used at Bell Labs and also by the Multics Project. At Bell Labs, Dennis Ritchie used TMG to produce the compiler for B that later became C. McIlroy and Morris used TMG to write the EPL compiler for Multics. As an aside, Stephen Johnson named his system YACC because they already had TMG at Bell Labs! TMG was used to implement a subset of PL/I by Cal Tech.
TMG was aimed at constructing simple one-pass translators for some specialized language with almost no optimisation of code generated. It was defined in a language TMGL with a compiler for TMGL on the target machine. Later, the system was defined in itself.
The system is somewhat similar to the Glennie Syntax Machine in that the syntax specification of a program consists of sets of statements with a label, a syntactic specification, and a semantic action that takes place if the syntactic entity is recognised. The syntactic part consists of a set of actions with possibly two exit labels which give where to go next in the case of the syntactic entity being recognised or not. If the syntactic part is recognised, the semantic part is executed.
In the first 18 months TMG was used to define two different FORTRAN compilers, a logic simulation system, several data format converters, a geometric description language translator, and the TMGL compiler itself.
The system is not really aimed at handling structured languages like Algol 60.
The SIG/PLAN Working Group 1 on Syntax Directed Compilers met monthly in the conference room at the UCLA computing facility. John Backus attended one meeting to see how BNF was being used as a compiler writing language. Several separate attempts at defining a translator writing system came out of that working group and the ideas finally came together initially as a system called META but finally as META II.
META II was a simple translator writing system written by V Schorre et al in 1962.
In META II, the language for which a translator is required must be specified as a set of syntax rules. These very much resemble the BNF notation except that they contain code generation commands as part of the syntax definition. There is limited memory in META II so there is a need to output information soon after it has been recognised.
A restrictive top-down method of syntax analysis is used in the META II system. The restriction is that it does not allow back tracking or left recursion. The main goal is to match the complete program with the Syntax Rule whose name follows .META at the head of the program. The name of the Syntax Rule appears at the start of the rule followed by the = symbol. The right-hand side of this Syntax Rule indicates which entities have to be recognised to achieve this goal. The recogniser achieves its main goal by looking for these smaller entities, from left to right, as individual subgoals. These subgoals are themselves defined in terms of other entities, and so the process of recognising a complete program becomes a matter of looking for progressively smaller entities, right down to the level at which basic symbols such as characters and numbers are recognised.
A typical Meta II rule would be:
SET = 'C' $ ( 'D' / 'E' )
This recognises any string consisting of the letter C followed by any sequence involving the letters D and E.
Meta II rules can include .OUT statements that generate output code. Recognised basic symbols (.ID, .STRING, .NUMBER) are stored and can be output by .OUT(*). Also two unique labels can be output for each statement by writing *1 or *2.
Meta II is a very simple system that can be defined in itself. This is the complete definition:
.SYNTAX PROGRAM OUT1 = '*1' .OUT('GN1') / '*2' .OUT('GN2') / '*' .OUT('CI')/ .STRING .OUT('CL ' * ) ., OUTPUT = ('.OUT' '(' $OUT1')' / '.LABEL' .OUT('LB') OUT1 ) .OUT('OUT') ., EX3 = .ID .OUT('CLL' * ) / .STRING .OUT('TST' *) / '.ID' .OUT('ID') / '.NUMBER' .OUT('NUM') / '.STRING' .OUT('SR') / '(' EX1 ')' / '.EMPTY' .OUT('SET') / '$' .LABEL *1' EX3 .OUT('BT ' *1 ) .OUT( 'SET') ., EX2 = (EX3 .OUT('BF ' *1 ) / OUTPUT ) $(EX3 .OUT('BE') / OUTPUT) .LABEL *1 ., EX1 = EX2 $( '/' .OUT('BT ' *1) EX2) .LABEL *1 ., ST = .ID .LABEL * '=' EX1 '.,' .OUT('R') ., PROGRAM = '.SYNTAX' .ID .OUT('ADR' * ) $ ST '.END' .OUT('END') ., .END
After the initial promise of early translator writing systems, there was considerable interest in producing better systems. The early systems had shown that the approach was viable. The understanding of the whole area of parsing and what was possible had improved. A theory of syntax analysis was evolving. The number of new languages arriving was still on the increase. Computer systems were evolving.
In the period 1963-67, Bob Floyd from Computer Associates had moved to Carnegie-Mellon University. Jay Earley, a doctoral student, was working on a parsing algorithm, the most efficient general context-free algorithm known. Alan Perlis et al were defining an enhanced version of Algol, Formula Algol, which would incorporate the manipulation of mathematical formulae within the language Algol. There was a need for a Formula Algol compiler. Jerome Feldman had completed his thesis, A Formal Semantics for Computer Oriented Languages, in 1964. This defined a compiler-compiler, FSL, that was capable of translating into machine language most of the existing programming languages. It was, therefore, used to produce the Formaula Algol Compiler for the Bendix G-21. FSL had two main subsections. A Syntax Loader that builds tables which will control the recognition and parsing of programs in the required language. A Semantic Loader builds a table describing the semantics of the instructions in the required language. The Syntax Loader generated a Production Language (PL), based on the work of Bob Floyd, that scanned the current input stream against the PL rules and decided on the action to take. The Semantic Loader defined the meaning of each language statement in a machine-independent code.
FSL was a major input t0 the CABAL Compiler-Compiler system produced at Carnegie-Mellon University in the period 1967 onwards.
TGS (sometimes called TRANGEN), Translator Generator System, was produced at Computer Associates as a follow on to CGS. It ran on an IBM 7094, CDC-1604, Univac M460 and GE-635 and was used to write translators for PL/I, ALGOL, and FORTRAN IV. Each compiler is driven by a set of tables, TRANTAB, described in a language called TRANDIR. Synatx Analysis was done by a system based on Floyd Productions. The code generation system was also used in later versions of the META systems.
After a number of intermediate systems, Rulifson defined Tree-Meta as a major advance over the earlier META systems. The TREE-META program is in two parts. The first part is a definition of the syntax for the source language, L, for which a translator is required and this is largely inherited from Meta II. The major difference is that instead of outputting code, the output is a tree built up as the syntax proceeds. The second part defines a set of code generation rules that generate the object code to be produced from this tree. The tree itself is quite flexible in format and is defined by the TREE-META user as he defines the syntax analyser for the language. The code generator can take note of any peculiarities in the form of the tree, if that will lead to better code being produced.
An example of the syntax recognise is:
.META EXPRESSION EXPRESSION = TERM $ ( '+' TERM :ADD[2] / '-' TERM :SUB[2] ) ; TERM = FACTOR $ ( '*' FACTOR :MULT[2] / '/' FACTOR :DIV[2] ); FACTOR = '+' PRIMARY / '-' PRIMARY :MIN[1] / PRIMARY ; PRIMARY = .ID / .NUM / '(' EXPRESSION ')' ; .END
The construct ADD[2] generates an ADD tree node from the top two items on the stack. The code generator instruction would be of the form:
ADD[-,-] => *1 ' PLUS ' *2 ;
Separating code generation from the building of the parse tree means that quite efficient code can be generated.
Interest in Translator Writing Systems continued after the mid-1960s but the interest in producing a universal system capable of production-quality compilers decreased. This occurred for a number of reasons:
Author | Title | Publication | Relates to |
---|---|---|---|
Andrews D I, Rulifson J F | A Meta Compiler for the SDS 940 | Draft Dec 1967 | Tree-Meta |
Andrews D I, Rulifson J F | A Meta Compiler System for the SDS 940 | SRI December 1967 | Tree-Meta |
Barnet J A et al | The LISP 2 Programming Language and System | FJCC68 | Meta |
Bastian A L | A Phrase-Structure Language Translator | June 1962 | |
Birman A | The TMG Recognition Schema | 1970 | TMG |
Book E, Bratman H, Schwartz J | A One Pass JOVIAL Compiler | SDC, Jan 1963 | |
Book E, Schorre D V, Sherman S J | The CWIC/36O system, a Compiler for Writing and Implementing Compilers | SIGPLAN June 1970 | |
Brooker R A | The Autocode Programs Developed for the Manchester University Computers | BCS 1958 Vol 1 p15 | BMCC |
Brooker R A | Further Autocode Facilities for the Manchester (Mercury) Computer | BCS 1958, p124 | BMCC |
Brooker R A, Morris D | An Assembly Program for a Phrase Structure Language | BCS Vol 3 1960 p168 | BMCC |
Brooker R A | Some Techniques for Dealing with Two-Level Storage | BCS 1960 p189 | BMCC |
Brooker R A, Morris D | Some Proposals for the Realization of a Certain Assembly Program | BCS 1961 | BMCC |
Brooker R A, Morris D, Rohl J S | Trees and Routines | BCS Vol 5 p33 1962 | BMCC |
Brooker R A, Rohl J S, Clark S R | The main features of Atlas Autocode | BCS Vol 8 p303 1966 | BMCC |
Brooker R A, Morris D, Rohl J S | Compiler Compiler facilities in Atlas Autocode | BCS 1967 | BMCC |
Brooker R A, Clark S R | An index directed compiler | BCS 1967 | BMCC |
Brooker R A, Morris D, Rohl J S | Experience with the Compiler Compiler | BCS Vol 9 1967 p345 | BMCC |
Carr C S, Luther D A, Erdman Sherian | A Meta Compiler System for the Univac 1108 and the General Electric 645 | Tree-Meta | |
Cattell R G | A Survey and Critique of Some Models of Code Generation | CMU Nov 1977 | |
Cheatham T E, Sattley K | Syntax Directed Compilation | SJCC 1964 | TGS |
Cheetham T E, Christensen C, Floyd R W | Collected Research Papers | CA Nov 1965, TGS, AMBIT etc | CGS,TGS |
Cheetham T E, Christensen C | Research in Machine-Independent Software Programming | CA June 1968, CGS, TGS, AMBIT | CGS,TGS |
Conway | Design of a Separable Transition-Diagram Compiler | CACM July 1963 | |
Coulouris, George | The Compiler Compiler - Reflections of a User 50 Years On | 2013 | |
Dove R K | CABAL: environmental design of a compiler-compiler | Jan 1968 | Various |
Earley J | Formula Algol | CMU June 1967 | FSL |
Earley J | An Efficient Context-Free Parsing Algorithm | CACM Feb 1970 | FSL |
Engelbart D C, English W K, Rulifson J F | Development of a Multidisplay, Time-Shared Computer Facility and Computer-Augmented Management System Research | RADC Contract AF 30(602)-4103 SRI April 1968 | Tree-Meta |
Feldman J A | A Formal Semantics for Computer Oriented Languages | May 1964, Carnegie Institute | FSL |
Feldman J A | A Formal Semantics for Computer Languages and its Application in a Compiler-Compiler | CACM Jan 1966 | BMCC |
Feldman J, Gries D | Translator Writing Systems | CS 69 June 1967 | Various |
Feldman J, Gries D | Translator Writing Systems | CACM Feb 1968 | Various |
Floyd R W | A Descriptive Language for Symbol Manipulation | JACM Oct 1961 | |
Floyd R | The Syntax of Programming Languages-A Survey | IEEE Aug 64 | |
Garwick J V | GARGOYLE, A Language for Compiler Writing | CACM Jan 1964 | |
Gilbert P, McLellan W G | Compiler Generation Using Formal Specification of Procedure-Oriented and Machine Languages | RADC-TR-67-454 August 1967 | |
Glennie A E | Automatic Coding for the Mark 1 | December 1952, Knuth Papers | |
Glennie A E | On the Syntax Machine and the Construction of a Universal Compiler | Computation Center, Carnegie Institute of Technology, Technical Report No. 2 July 1960 | Syntax Machine |
Graham R M | Bounded Context Translation | SJCC 1964 | |
Hay R E, Rulifson J F | MOL940:Preliminary Specification for an Algol-like Machine-Oriented Language for the SDS-940 | March 1968 | Tree-Meta |
Hopgood F R A, Bell A G | The Atlas ALGOL preprocessor for non-standard dialects | BCS 1967 | BMCC |
Hopgood F R A | Compiling Techniques Chapter 10 | Compiler Compilers, 1969 | Various |
Irons E T | PSYCO: Princeton Syntax Compiler | Jan 1961 | PSYCO |
Irons E T | A Syntax Directed Compiler for ALGOL 60 | CACM jan 1961 | PSYCO |
Irons E T | An Error-Correcting Parse Algorithm | CACM Nov 1963 | PSYCO |
Iturriaga R, Standish T A, Krutar R A, Earley J C | Techniques and advantages of using the formal compiler writing system FSL to implement a Formula Algol compiler | SJCC 1966 | FSL |
Knuth D E, Pardo L T | The Early Development of Programming Languages | August 1976 | |
Ledley R S, Wilson J B | Automatic-Programming Language Translation Through Syntactical Analysis | CACM March 1962 | PSYCO |
McKeeman W M | An Approach to Computer Language Design | CS48 August 31 1966 | |
McClure R M | TMG-A Syntax Directed Compiler | ACM 1965 | TMG |
Mondshein L F | VITAL Compiler-Compiler SYSTEM Reference ManualL | Lincoln Lab TECHNICAL NOTE 1967-12, 8 February 1967 | VITAL |
Morris D, Rohl J S | The Atlas Compiler System | BCS 1967 | BMCC |
Naur et al | Report on the Algorithmic Language ALGOL 60 | CACM May 1960 | |
Newman W M | Interactive Graphics on a Small Microprogrammable Computer | QMC Sept 1973 | Tree-Meta |
O'Neil J T | META PI-An on-line interactive compiler-compiler | FJCC 1968 | Meta |
Oppenheim D K, Haggerty D P | META5: A Tool to manipulate string languages | ACM National 1966 | Meta |
Padua D | The Fortran I Compiler | ||
Perlis A J, Iturriaga R, Standish T A | A Definition of Formula Algol | March 1966 | FSL |
Pollack B W | An Annotated Bibliography on the Construction of Compilers | STAN-(X-249-71 DECEMBER 1971 | Various |
Presser L, Melkanoff M A | Software measurements and their influence upon machine language design | SJCC 1969 | Meta |
Reynolds J C | An Introduction to the COGENT Programming System | 1965 | COGENT |
Reynolds J C | COGENT Programming Manual | 1965 | COGENT |
Rosen S | A Compiler-Building System Developed by Brooker and Morris | CACM Volume 7/Number 7/July, 1964 | BMCC |
Rulifson J F | A TREE META for the XDS 940 | SRI April 1968 | Tree-Meta |
Schneider F W, Johnson G D | Meta-3: A Syntax-Directed Compiler Writing Compiler to Generate Efficient Code | 19th ACM national conference 1964 | Meta |
Schorre D V | META II: A Syntax-Oriented Compiler Writing Language | UCLA Computing Facility 1965 | Meta II |
Schorre/Rulifson | LOT Notes | 1966 | Tree-Meta |
Schorr, Herbert | Analytic differentiation using a syntax-directed compiler | BCS 1965 | PSYCO |
Scott D W | WIZOR, A Compiler Compiler for the GE 225 Computer | ACM 1962 | |
Shaw M, Fierst J | Formal Compiler-Descriptive Systems | CABAL 1967 | Various |
Trout R G | A compiler-compiler system | ACM 1967 | |
Turing A M, Brooker R A | Programmers Handbook for the Manchester Electronic Computer Mark 1 (3 editions) | 1950-3 | BMCC |
Warshall S | A Syntax Directed Generator | 1961 | CGS |
Warshall, Stephen | Summary of a Method for the Automatic Construction of Syntax Directed Compilers | Computer Associates, December 1962, AFCRL-62-955 | CGS |