Contact us Heritage collections Image license terms
HOME ACL Associates Technology Literature Applications Society Software revisited
Further reading □ OverviewComputing at HarwellBTM 1202Glennie: Syntax MachineHowlett: OrbitIrons: Syntax Directed CompilerSchorre Meta IIHowlett: ACLGill: Atlas conceptsSumner et al: Atlas CCUAngus: Computers in scienceGood: GoBell: KalahBond: CarnegieQuatse: G-21Baylis: Eng AssistantAnderson: UAIDE 68Ogborn: Change and chanceHopgood: Hash overflowUSA Visit 1965Bowden talkChurchhouse: All-purpose LabUSA Visit 1969USA Visit 1970USA Visit 1971Hash tablesBell: HimmellbettHayward: Computerised StudioChurchhouse: Arts and SciencesHowlett: Charles BabbageHopgood: Teaching toolUSA Visit 1975Yasaki: IBM StretchNash: IBM StretchFORTRAN comparative studyOPSCANWichmann: Algol compilersGlennie: Electronic computers at AWRE Aldermaston
ACD C&A INF CCD CISD Archives Contact us Heritage archives Image license terms

Search

   
ACLLiteratureReports
ACLLiteratureReports
ACL ACD C&A INF CCD CISD Archives
Further reading

OverviewComputing at HarwellBTM 1202Glennie: Syntax MachineHowlett: OrbitIrons: Syntax Directed CompilerSchorre Meta IIHowlett: ACLGill: Atlas conceptsSumner et al: Atlas CCUAngus: Computers in scienceGood: GoBell: KalahBond: CarnegieQuatse: G-21Baylis: Eng AssistantAnderson: UAIDE 68Ogborn: Change and chanceHopgood: Hash overflowUSA Visit 1965Bowden talkChurchhouse: All-purpose LabUSA Visit 1969USA Visit 1970USA Visit 1971Hash tablesBell: HimmellbettHayward: Computerised StudioChurchhouse: Arts and SciencesHowlett: Charles BabbageHopgood: Teaching toolUSA Visit 1975Yasaki: IBM StretchNash: IBM StretchFORTRAN comparative studyOPSCANWichmann: Algol compilersGlennie: Electronic computers at AWRE Aldermaston

The Structure and Use of the Syntax Directed Compiler

Edgar T Irons

Annual Review in Automatic Programming, Vol. 3 (1963).

Communications Research Division, Institute for Defense Analyses, Princeton, N.J.

A COMPILER is generally understood to be a program operating on a digital computer which translates one (object) language into another (target) language. To serve in this capacity, the compiler must contain a set of rules for recognizing the structure of strings in the object language and for performing the translation. In many compilers this set of rules is inextricably intertwined with the program which performs the recognition and translation.

This report describes the structure and use of a compiling system in which the translator is independent of the translation rules and hence is independent of either the object or target language. Part 1 gives the meta language in which the translation rules are expressed. Part 2 contains a set of examples illustrating the use of the meta language to specify translations of an algorithmic language into the language of a hypothetical machine. Part 3 describes the recognition procedure which is the heart of the language independent translator.

THE META LANGUAGE

The translation rules consist of a series of sentences, each one consisting of a syntax formula followed by a string of symbols designating the semantics of that syntax formula. The sentences have the following form:

Each sentence of the specifications then has the form:

SSSS...SS      →     S         {PPPPPPPP . .. P} 
Components	  Subject           Definition

The syntax unit S following the meta-symbol → in any sentence is the subject of the sentence, and the syntax units to the left of → are the components of the sentence. The string PPPP ... PP between the meta-symbols { and } is the definition of the sentence. Specifically, P may have one of the following three forms:

1. Any symbol (p) of an output language. The output language alphabet may contain any symbols, but when that alphabet does contain the symbols

{ ρ ' φ [ ] ; }

special conventions will hold in the cases described below.

2. Any output string designator of the form

ρ "...'n [p ← PP...P;p ← PP... P;... ;p ← PP...P]
where  "...'  may be empty
everything after n may be empty

where P and p are defined as above, and n is an integer designating a particular string.

If a string designator (2) is of the simple form ρn it denotes the string which is the definition of the sentence whose subject is the nth component to the left of → in the sentence containing the designator ρn. If the string designator is of the form

ρn[p ← PP...P;...]

it denotes the same string but with substitutions made as indicated; namely, with the symbol p replaced at every occurrence by the symbols PP... PP, these substitutions being made one after the other from left to right. Examples are given below.

3. An output string function designator of the form

φ "...' n [PPP...P;PP...P;...;PP...P]
where 
"...' may be empty
[PPP...P;PP...P;...;PP...P] may be empty

where P is defined as above, and n is an integer designating a particular string function.

The output function designator φ, of (3), is used to specify a function of the strings PPPPP ... PP enclosed between the brackets following the function designator. The integer n serves to identify a particular function which is relevant to some particular set of syntactic sentences. They serve to enhance the descriptive ability of the output language, and constitute part of the description of an input language.

Consider as an example of the use of string designators the following five sentences specifying a translation of an input string consisting of some series containing the letters a and b to an output string composed of the letters A, B, c and y, t, m.

a → letter {A x}
b → letter {B t}
letter → iden {ρ1}
iden letter → iden {ρ2ρ1 [t ←m]}
iden → simvar {ρ1 [x ← y ]}

The diagram of the input string

babaa 

is shown on the following page.

b letter a letter b letter a letter a letter iden iden iden iden iden simvar

and the meaning of the final syntactic unit 'simvar' is

BtAyBmAyAy

Note that the above diagram is unique under the following two conventions (which we will assume throughout the report):

  1. That the diagram encompass the entire string.
  2. That as many brackets be drawn as possible (e.g. we cannot leave off the bracket for 'simvar' above).

An example of a function φ is one whose value is a string of the characters 0123456789, concatenated to represent the number of symbols in the parameter string of the function on any use. If we identify this function by the integer 1 and change the last syntactic statement to

iden → simvar {φ11 ]}

the meaning of the string of our example would now be merely the characters

10 

The meta symbol ' serves as a left metaparenthesis counter to allow the output language to contain the metasymbols of the description, so that an input language may be described in terms of – and hence translated into – the meta language. This convention enables a translator to modify the set of translation specifications it is currently using according to the particular input string it is examining. This enabling convention depends on the use of the symbols { and } as meta parentheses. If in any definition of a syntactic sentence the number of ' following any occurrence of ρ or φ is not equal to (the number of {s to the left) – 1 then the symbol ρ or φ and its associated symbols

[ ; ← ] 

will be treated as symbols of type 1 rather than in the way described above. The meta symbols { and } are always treated as symbols of type 1, when they occur in the string of a definition. If the last sentence of the descriptions of the example were changed to

iden → simvar {ρ1 → realtype {ρ1'[x ← y] φ1}}

the translation of the string babaa would be:

BtAxBmAxAx → realtype {BtAyBmAyAyφ1}

SOME TYPICAL META LINGUISTIC CONSTRUCTIONS FOR AN ALGEBRAIC COMPILER

In the following examples, the object languages are taken from sections of ALGOL 60 and similar algebraic languages. The target language is the symbolic language of a simple, single address machine. The format for a machine instruction is:

III...I:OOO-III...I;    .

The strings of I's represent identifiers to indefinite length, each composed of strings of letter or digits, with the first character a letter. The O's represent three letter mnemonic machine instructions from the following list:

LDA m	Bring contents of m to accumulator
ADD m	Add contents of m to accumulator (floating point)
SUB m	Subtract contents of m from accumulator (floating point)
MPY m	Multiply contents of m by contents of accumulator (floating point)
DIV m	Divide contents of accumulator by content of m (floating point)
STA m	Store the contents of accumulator in m
GEJ m	Transfer to m if accumulator ≥ 0
LEJ m	Transfer to m if accumulator ≤ 0
UEJ m	Transfer to m if accumulator ≠ 0
JMP m	Transfer unconditionally to m        
RND	    Round the contents of the accumulator

A possible sequence of instructions is:

LDA-APG; SUB-APX;GEJ-G0101 ;LDA-APX;G0101:STA-APQ; 

Example 1: Simple arithmetic expressions

The following set of statements serves to specify a translation of simple arithmetic expressions into machine language. The expressions allowed are a subset of the ALGOL arithmetic expressions, and the syntax describing them is a direct subset of that used in the ALGOL report.

 (1) A → letter {A}
 (2) B → letter {B}
(26) Z → letter {Z}
(27) letter → iden {ρ1}
(28) iden letter → iden {ρ2ρ1}
(29) × → multop {MPY}
(30) / → multop {DIV}
(31) + → multop {ADD}
(32) - → multop {SUB}
(33) iden → primary {LDA - ρ1}
(34) (arithex) → primary {ρ2}
(35) primary → term {ρ1}
(36) term multop primary → term
(37) {ρ1;STA - t;ρ3[t - ti];ρ2 - t} term → termsum {ρ1}
(38) termsum addop term → termsum {ρ1;STA - t; ρ3[t ← ti];ρ2 - t}
(39) termsum → arithex {ρ1}

would be diagrammed

A 1 letter B 3 letter + 8 adop ( C 9 letter - 14 adop D ) 15 letter × 23 multop B 24 letter 2 iden 10 iden 16 iden 25 iden 4 iden 11 primary 17 primary 26 primary 5 primary 12 term 18 term 6 term 13 termsum 7 termsum 19 termsum 20 arithex 21 primary 22 term 27 term 28 termsum

The output code associated with each of the brackets above is:

1,2           A
3             B
4             AB
5,6,7         LDA - AB
8             ADD
9,10          C
11,12,13      LDA - C
14            SUB
15,16,17,18   LDA - D
19,20,21,22   LDA - D;STA - t;LDA - C;SUB - t
23            MPY
24,25         B
26            LDA - B
27            LDA - B;STA - t; LDA - D; STA - ti; LDA - C; SUB - ti; MPY - t
28            LDA - B;STA - t; LDA - D; STA - ti; LDA - C; SUB - ti; MPY - t; LDA - AB; ADD - t

The final translation of the arithmetic expression is attached to the outermost bracket (28). It is assumed that temporary storage cells are labelled t, ti, etc.

Although this translation does indeed produce machine code which calculates the value of the arithmetic expression, it is clear that this code is unnecessarily long. The quality of the translation is poor because we have specified the syntax of the object language in such a way as to minimize the number of specification statements. For purposes of defining the language (in terms of our machine language) this indeed is desirable. On the other hand, in keeping the number of statements to a minimum we have restricted the number of output code groups (definitions) which can appear in a translation. Thus we have not been able to specify better translations in some frequently occurring special configurations of the object string.

For example, if the termsum and term in Statement 38 were respectively

A*B   and   C*D

then our way of putting together their translations in the definition of 38 would produce a good translation, but if they were simply identifiers, say

A    and   B 

we would rather produce

LDA A; ADD B

The above syntax does not permit this output, since in Statement 38 we do not know the detailed composition of the term or termsum.

Example 2: More efficient output code

To avoid the difficulty of Example 1, we can arrange the syntax so that a simple identifier cannot compose a term or termsum all by itself and then specify the syntactic structure for addition and multiplication of identifiers separately; in so doing, we recognize separately the special cases with which we wish to deal. We change the specifications by deleting Statement 33 from the specifications of Example 1 (so that an identifier may not be a primary, term, or termsum) and add the following specifications (to specify the syntax and semantics of adding and multiplying identifiers) :

(36.1) iden multop iden → term {LDA - ρ32 - ρ1}
(36.2) term multop iden → term {ρ32 - ρ1}
(36.3) iden × primary → term {ρ1;MPY - ρ3}
(36.4) iden / primary → term {ρ1;STA - t;LDA - ρ3;DIV - t}
(38.1) iden addop iden → termsum {LDA - ρ32 - ρ1}
(38.2) termsum addop iden → termsum {ρ32 - ρ1}
(38.3) iden + term → termsum {ρ12 - ρ3}
(38.4) iden - term → termsum {ρ1;STA - t;LDA -  ρ3;SUB - t}
(39.1) iden → arithex {LDA - ρ1}

With the expanded set of specifications, we can produce a considerably better translation of the string

AB + (C - D)*B 

The string is diagrammed.

A 1 letter B 3 letter + ( C 5 letter - 7 adop D ) 8 letter × 14 multop B 15 letter 2 iden 6 iden 9 iden 16 iden 4 iden 10 termsum 11 arithex 12 primary 13 term 17 term 18 termsum

The output code for each bracket is :

1,2             A 
3               B
4               AB
5,6             C
7               SUB
8,9             D
10,11,12,13     LDA - C; SUB - D
14              MPY
15,16           B
17              LDA - C;SUB - D; MPY - B
18              LDA - C;SUB - D; MPY - B; ADD - AB

In constructing the specifications for this example, we have taken advantage of the commutativity of addition and multiplication.

Example 3: A pair of useful definition functions

The specification statements of this example show the use of tag generation functions in specifying a translation of control statements of an object language to machine language. The functions used are:

1. The generation function φ1 has as its value a string consisting of the letter L followed by two digits. The digits are generated in such a manner that each generated string is unique in any translation (unless there are more than 99 occurrences of φ1 in a translation).

2. The retrieving function φ2[n] has as its value the string identical to that generated by φ1 in its nth occurrence before the occurrence of φ2[n] within one definition.

For example the definition

1φ1ρ1φ2[1]φ2[2]}

would yield the string

L01 L02 XXX . . . X L02 L01

where the X's represent the string produced by the ρ1 string designator. Note that whether or not the string

XXX ... X

contains any strings generated by φ1, the string given above is the same.

The following specifications give the translation rules for a subset of ALGOL conditional expressions (arithmetic expressions are specified as in Example 1 or 2):

< → relop {GEJ}
> → relop {LEJ}
≠ → relop {UEJ}
iden relop iden → bprimary  {LDA - ρ3;SUB - ρ1; ρ2 - j}
bprimary → bterm {ρ31}
bterm → bsum {ρ1}
bsum ∨ bterm → bsum {ρ3;[j ← φ1];JMP - φ12[2]:ρ12[1]:}
bsum → boolex {ρ1}
if boolex then arithex else arithex → arithex {ρ5[j ← φ1];ρ3;JMP - φ12[2]:ρ12[1]:}

Using these specifications, the expression

if X < Y ∨ W > Z then A else B

is diagrammed and translated as follows:

if X 1 letter < 3 relop Y 4 letter W 10 letter > 12 relop Z 13 letter then A 19 letter else B 22 letter 2 iden 5 iden 11 iden 14 iden 20 iden 23 iden 6 bprimary 15 bprimary 21 arithex 24 arithex 7 bterm 16 bterm 8 bsum 17 bsum 18 boolex 25 arithexex
1,2               X
3                 GEJ
4,5               Y
6,7,8             LDA - X;SUB - Y;GEJ - j
10,11             W
12                LEJ
13,14             Z
15,16             LDA - W;SUB - Z; LEJ - j
17,18             LDA - X;SUB - Y;GEJ - L21; JMP - L22
                  L21:LDA - W; SUB - Z;LEJ - j;L22:
19,20             A
21                LDA - A
22,23             B
24                LDA - B
25                LDA - X; SUB - Y; GEJ - L21; JMP - L22;
                  L21:LDA - W;SUB - Z;LEJ - L11;
                  L22:LDA - A;JMP-L12;
                  L11:LDA - B;
                  L12

Example 4: Dealing with declarative information

In some of the object languages with which we are concerned, information is given in the form of declarative statements which affect the interpretation of sections of the object language strings. For example, we might wish the expression

A + B

to specify ordinary addition of A to B or complex addition of A to B depending on whether or not there occurred elsewhere in the same object string the declarative statements

real A, B 
or	
complex A, B

To cope with such languages, we adopt a multi-pass translating system where the earlier passes translate the declarative statements into syntactic specification which are appended to partially completed specifications for later passes.

As an example of this system consider an object language in which a complete string in the language consists of a list of declarative statements followed by a list of non-declarative statements. For simplicity we will restrict the declarations to those declaring variables to be of type real or of type integer. The effect of an integer declaration is to cause rounding of the value of an arithmetic expression in a statement of the form

identifier = arithex

If the variable has been declared real, the value of the arithmetic expression is not rounded.

The following specifications (for the first pass) give translation rules for producing syntax statements from the declarations, while ignoring the rest of the string.

A → letter {A}
B → letter {B}
Z → letter {Z}
× → glot {}
- → glot {}
and so on for all characters other than integer, real, end
letter → glot {}
letter → iden {ρ1}
iden letter → iden {ρ2ρ1}
integer iden → glot {ρ1 → intvar{ρ1'}}
real iden → glot {ρ1realvar1'}}
glot → mglot {ρ2ρ1}
mglot end → program 1 {ρ2}

The translation produced from the string

            real X ; integer Y ; Y = X end
         

is

real X 1 letter ; 5 glot integer Y 7 letter ; 11 glot Y 13 letter = 16 glot X end 18 letter 2 iden 8 iden 14 glot 19 glot 3 glot 9 glot 4 mglot 6 mglot 10 mglot 12 mglot 15 mglot 17 mglot 20 mglot 21 program 1
1,2               X
3,4,6             X → realvar{X}
7,8               Y
9                 Y → intvar{Y}
10,12,15,17,20,21 X → realvar{X}  Y → intvar{Y}
13                Y
18                X
5,11,14,16,19

To illustrate the effect of the first pass on the second, we add the following specifications to those used in Example 2:

realvar = arithex → statement  {ρ1 ; STA - ρ3}
intvar   = arithex → statement {ρ1 ; RND - ; STA – ρ3}
real realvar → declaration { }
integer intvar → declaration { }
declaration → declist { }
declist ; declaration → declist { }
statement → statlist (ρ1
statlist ; statement → statlist (ρ3; ρ1)
declist ; statlist end → program 2 {ρ2}
realvar → iden {ρ1}

These specifications together with those produced by the first pass translation above, namely:

X → realvar {X} 
Y → intvar   {Y}

comprise the complete specifications for the second translating pass for the string

            real X ; integer Y ; Y = X end
         

The translating diagram and the resulting translation are:

real X 1 realvar ; integer Y 4 intvar ; Y 7 intvar = X end 8 realvar 2 declaration 5 dec 9 iden 3 declist 10 arithex 6 declist 11 statement 12 statlist 13 program 2
1	          X	
4	          Y	
2,3,5,6		
7	          Y	
8,9	          X	
10	          LDA - X
11, 12, 13    LDA – X ; RND – ; STA – Y

Example 6: Specification which admit non-unique diagramming

In all of the examples given thus far, the specifications have been written in such a way that it is possible to diagram any legal object string in only one way. It is useful in some cases to construct specifications which allow more than one diagram to be drawn on the same string if a convention is adopted to dictate which of the possible diagrams is to be drawn during a translation. A natural convention is that in drawing the diagram, the first specifications in a list are used first.

One application of this type of specifications is in producing error indications during a translation when an illegal object string is given. We construct specifications for a larger language which includes, as a subset, the language in which we are interested – the larger language consisting of all possible strings in the alphabet of the sub-language. Hence every possible symbol string will be translated into something. In particular, strings not in the sub-language will be translated into text, thereby indicating that parts of the object string are outside the sub-language. In this way, one achieves a translation of strings in the sub-language as before but, in addition, one obtains diagnostic information about illegal strings where they occur rather than just an indication that the whole string is illegal (i.e. cannot be diagrammed).

The following set of specifications, when appended to the end of the specifications for Example 2 illustrate the discussion above:

letter → set 1 {ρ1}
digit → set 1  {ρ1}
set 1 → set 2  {ρ1]
+     → set 2  {+}
-     → set 2  {-}
×     → set 2  {×}	
/     → set    {/}	
set 2 → set 3 {ρ1}
( → set 3 {(}
) → set 3 {)}
set 1 → mset 1 {ρ1}
mset 1 set 1 → mset 1 {ρ2ρ1}
set 2 → mset 2 (ρ1}
mset 2 set 2 → mset 2 {ρ2ρ1}
set 3 → mset 3
mset 3 set 3 → mset 3 {ρ2ρ1}
mset 1 → primary {ERROR - ρ1}
( mset 2 ) → primary {ERROR – ρ2} 
mset 3 → arithex {ERROR - ρ1}

The alphabet here consists of the symbols which may occur in arithmetic expressions, namely the letters, and + - */ ( ). Using this set of specifications the string

A +  B (B *)

which is not an arithmetic expression would be diagrammed and translated as follows:

A 1 letter + ( B 4 letter × 8 set2 ) 2 iden 5 set 1 6 set 2 7 mset 2 9 mset 2 10 primary 11 term 12 termsum
1, 2	     A
4,5,6,7	     B      
8            ×
9            B×
10, 11	     ERROR - (B× )
12           ERROR - (B× ); ADD – A         

THE TRANSLATOR

The translator is a program which operates on a set of specifications already described and on a string of symbols in the object language for those specifications. It produces a string of symbols in the output language. The translator itself is completely independent of either language and may operate on any object language which can be described by the specifications of Part 1.

The heart of the translator is a program which diagrams a string of input symbols by referencing the specifications. The output of this program is a linked list connecting selected definitions together.

In the operation of the translator, this output list serves as input to a second program which forms the output string from the indicated sequences of definitions.

The diagramming program, given below in ALGOL, is programmed as a recursive procedure. Essentially, given the name of a syntactic unit and the index of a symbol in the input string, the program will try to form the syntactic unit from the longest possible string of symbols following the one indicated. If it is possible to form the requested syntactic unit, the program will place in its output string a linked list indicating the definitions of the syntactic units which compose the requested syntactic unit and supply as an output parameter its location in the output string of this list. If it is not possible to form the requested syntactic unit from the indicated string, the failure is reported.

In the machine implementation of this ALGOL program the translation specifications are stored in a semi-linked list, represented in the ALGOL program by the three vectors STAB, STC, and TRAN. This list is constructed in the memory of the machine from the string of symbols which are the specifications. In the ALGOL program, as in the machine representation, the syntactic units and symbols of both languages are represented by integers.

In the tabling of the translation specifications, the numerical representation of the leftmost syntactic unit of any sentence is taken as the index of the integer vector THAN. The value of any element of TRAN is the index of the element of STAB which is the syntactic unit following the first in some sentence. If the first syntactic unit is the leftmost one of more than one sentence, the value of STC [TRAN [syntactic unit]] is the index of the syntactic unit following the first in its second occurrence. The linkages are continued from the second element in the same way. The definition follows the syntactic unit which is the subject of the sentence. In the tabling, the subject is treated in the same manner as the components, the fact that it is the subject being indicated by the brace which follows it. The symbol → is ignored. The following example illustrates the composition of the three vectors. The five sentences:

SVAR + SVAR → TSUM {LDA ρ1} 
SVAR + TERM → TSUM {ADD ρ2} 
SVAR – SVAR → TSUM {...} 
SVAR – TERM → TSUM {...} 
SVAR – TRIM → TSUM {...}

would be linked as follows:

SVAR + SVAR TSUM {LDA ρ1 } TERM TSUM {...} - SVAR TSUM {...} TERM TSUM {...} TRIM TSUM {...}

Thus SVAR may be followed by + or while + may be followed by TERM or SVAR, etc. This same information is stored in the machine representation by the three vectors STC, STAB, and TRAN. STAB consists of elements of the tree ordered by exhausting one horizontal line, then adding the next branch from the last junction passed, etc. STC lists, in the position corresponding to one element of the tree, the location of the next alternate element which could have been reached from the junction which led to this element. TRAN has one entry for each syntactic name. The value of the entry is the index of STC and STAB where the tree for that name begins. The composition of the vectors for the above example would be:

          TRAN                  STC                    STAB
          
SVAR       I1         I1         I4         I1           +
TERM                  I2         I3         I2         SVAR
                                  0                    TSUM
 +                                0                      {
                                  0                      }
 -                    I3          0         I3         TERM
                                  0                    TSUM
                                  0                      {
                                  0                      }
                      I4          0         I4           -
                      I5         I6         I5         SVAR
                                  0                    TSUM
                                  0                      {
                                  0                      }
                      I6         I7         I6         TERM
                                  0                    TSUM
                                  0                      {
                                  0                      }
                      I7          0         I7         TRIM
                                  0                    TSUM
                                  0                      {
                                  0                      :
                                  0                      }

where In is the integer which is the index of the adjacent component.

In order to determine when indeed the longest string of input symbols meeting the requirements of the requested syntactic unit have been found, the diagramming routine makes use of a precedence matrix. This matrix, SUCGR, is constructed in its elementary form while the syntax tables are constructed, and is then extended to form the complete matrix. If SUCCR [p, q] is true, then syntactic unit p is the first element of a sentence whose subject is either q or whose subject is the first element of a sentence whose subject is either q or ... and so on.

STAB, STC, TRAN, and SUCCR are considered to be global to the procedure DIAGRAM. Other global parameters of DIAGRAM are the vectors INPUT and OUTPUT and their indices, j and k respectively, these vectors being the input string of symbols and the output string.

The output string consists of positive and negative integers. If an element is positive, it is the index of an element of STAB which begins a definition in some sentence. Each such positive element will be followed by n – 1 negative integers, these being the negative of indices of other positive elements of the output string which are in turn links to other definitions. n is the number of components of the sentence in question, and the jth element of OUTPUT after any positive element is the link to the definition of the jth component to the left of the subject of the sentence, whose definition begins in STAB at the spot marked by the positive element. This string of integers is then very much like an assembler macro notation, and in fact is translated to the final output string in a quite similar manner. Of course the symbol substitution and invocation of compiling functions of the definitions must be done as the integer string is unravelled.

In the operation of DIAGRAM, the parameter i marks the spot in STAB which is currently of interest. DIAGRAM first examines all the components of sentences following the component discovered one level up in the recursion to determine whether the elements at the current location in the input string will form any of these components. If the input string meets the requirements of one of the components, the successors of this component are specified as the ones to be examined in the next call of DIAGRAM. If it does not, DIAGRAM then specifies–in order–the successors of the subjects following the component discovered on the last level of recursion as the next components to be considered, until either one of the subjects leads to a correct path, or until the list of subjects is exhausted. If the list is exhausted before a correct path is found, exit is made through ERROR to pass back the information that the path in question did not lead to success.

Since each step forward through the syntax table causes another recursion of DIAGRAM, it is not possible to eliminate a path through the table until all possible (according to the input string) paths have been examined, or until the requested syntactic unit has been formed. Hence if the paths from one structure to another in the table are unique (that is, if the syntax allows a unique diagramming of any input string) the sentence may be tabled in any order. If the paths are not unique, DIAGRAM gives preference to the first sentences in the table. Note, however, that the order of the sentences may have some effect on the efficiency of the diagramming process.

Other local parameters of DIAGRAM are sw, a Boolean variable which indicates that a syntactic unit has been discovered (though not necessarily that it encompasses the longest possible set of elements in the input string), and the constant LEFTBRACE ( { ), which indicates that the syntactic unit preceding it is a subject. (The meta symbol → is ignored in the tabulation.)

procedure DIAGRAM (i, GOAL, PARAM, ERROR) ;
  value i, GOAL; 
  integer PARAM; 
  label ERROR; 
  comment i is the starting position in the syntax table STAB. 
          GOAL is the requested syntax unit. 
          If this unit is discovered, the index of the appropriate definition string is placed 
          in the output string, and the negative index of the output string is assigned to PARAM. 
          If not the procedure exits via ERROR;
          
  begin 
  integer J, K, I, OTCEL; 
  Boolean sw;
  J := j;
  K := k;
  I := i;
  START: if STAB [i + 1] ≠ LEFTBRACE then
   begin
     j := j+1;
     sw := if INPUT [j] = STAB [i] then true else false; 
     if SUCCR [INPUT [j] , STAB [i]] then 
     begin 
       DIAGRAM (TRAN [INPUT [j]] , STAB [i] , OTCEL, NOGO) ;
       go to CONTINUE 
     end 
     else 
     begin 
       NOGO : if sw then go to CONTINUE
              else 
              begin 
                RETRACE: j = J 
                k:= K 
              end 
     end 
  end START;
  
  i := STC [i];	
  if i ≠ 0 then go to START;
  i := I;	
  NEWSTART: if STAB [i+1] = LEFTBRACE then 
    begin 
      OTCEL := i + 2;
      if STAB [i] = GOAL then begin PARAM := –k; sw := true end 
      else sw := false; 
      if SUCCR [STAB [i] , GOAL] then 
      begin 
        DIAGRAM (TRAN [STAB [i]] , GOAL, PARAM, NOPATH) ;
        go to FOUND 
      end                                        
      else NOPATH : if sw then go to FOUND 
    end NEWSTART ;
 i := STC[i];
 if i ≠ 0 then go to NEWSTART;         
 k:= K;	.                               
 go to ERROR;	 
 CONTINUE: DIAGRAM (i+1, GOAL, PARAM, RETRACE);	
  FOUND: OUTPUT [k] := OTCEL;
  k:=k+1
end
⇑ Top of page
© Chilton Computing and UKRI Science and Technology Facilities Council webmaster@chilton-computing.org.uk
Our thanks to UKRI Science and Technology Facilities Council for hosting this site