Chilton::ACL::TREE-META Manual

3. TREE BUILDING

3.1 Syntax

ntest   ::= : identifier | : identifier [ integer ] | [ integer ] | + string | *

3.2 Trees

Syntax Rules, as well as defining the syntax of the source language, also define how the syntax tree is to be built for particular source programs. A TREE-META tree consists of a set of nodes, each having an alphameric string as its name. Each node has a number of branches which point to other nodes or to leaves of the tree. A node may be defined without any branches attached to it. The leaves of the tree consist of strings which are, in the main, parts of the input that have been recognised. However, the strings may have been added explicitly by the Syntax Rules. A typical TREE-META tree is:

Fig. 3.1: Typical TREE-META tree

One of the main features of TREE-META is the flexibility with which the tree can be generated.

3.3 The TREE-META Stack

There are two stacks in TREE-META. The first is a System Stack for storing return addresses when rules are called, and the second is the stack used in tree production. The phrase the stack will refer to the one used in tree building.

The stack is empty when the first Syntax Rule is called. As each basictype (.ID,.NUM etc) is recognised, the string of characters making up the entity is placed on the stack as a single entry. Facilities exist for adding other strings to the stack.

As the tree is being built, some entries at the top of the stack will be removed to make tree nodes. Pointers to the partially completed tree will be added to the stack. During the tree-building stage, the stack is, therefore, likely to hold a mixture of strings and pointers. The individual strings have a type field associated with them so that the way they are recognised can be deduced when code is being generated.

3.4 Adding Specific Strings to the Stack

Specific strings may be added to the stack by inserting the string 1n a Syntax Rule preceded by the symbol ↑. For example:

.META XYZ
XYZ = .ID ↑ 'PLUS' '+' . NUM ;

would recognise the input: ALPHA + 17. The contents of the stack, once the number has been recognised, would be:

    17
   PLUS
  ALPHA

Inserting specific strings may be useful in controlling the code to be generated.

3.5 Tree-building Commands

For each tree node that is generated, the user must specify the name of the node and the number of branches to be attached to it. The name of the node must be an alphanumeric string starting with an alphabetic character. It is defined in a Syntax Rule by writing it preceded by the symbol.. The number of branches to be associated with the node is defined by enclosing this number in square brackets. For example:

:MULT[3]

appearing in a Syntax Rule would indicate that a node, whose name is MULT, should be generated at this point with 3 branches attached to it. The nodes to be attached to these branches will be the contents of the top 3 stack positions. The right-most branch of the tree node will point to the top stack entry. The result of this tree-building operation will be to remove the top 3 items from the stack and replace them by a single pointer to the node that has just been generated. If the entries used from the stack are strings, the branches will point to these items as leaves of the tree. If an entry is a pointer to a tree node that has been generated previously, the branch from the new node will point to this earlier one.

A suitable syntax tree for arithmetic expressions could be generated by the following program:

.META EXPRESSION 
EXPRESSION = TERM $ ( '+' TERM :ADD[2] / '-' TERM :SUB[2] ) ;
TERM = FACTOR $ ( '*' FACTOR :MULT[2] / '/' FACTOR :DIV[2] );
FACTOR = '+' PRIMARY / '-' PRIMARY :MIN[1] / PRIMARY ;
PRIMARY = .ID / .NUM / '(' EXPRESSION ')' ;
.END

Inputting the expression X+Y*Z would generate the tree:

Fig. 3.2: Tree generated for X+Y*Z

This can be seen by working through the example in detail:

EXPRESSION is called.
EXPRESSION calls TERM which, in turn, calls FACTOR.
The input does not match either the + or - symbol and, therefore, the first two alternatives of FACTOR are rejected. The third alternative calls PRIMARY.
PRIMARY accepts the input character X as a .ID and stores it on the stack.
PRIMARY returns to FACTOR which returns to TERM.
As the next character on the input is neither a * or a / symbol, a TERM has been recognised and so TERM returns to EXPRESSION.
EXPRESSION accepts the + symbol on the input.
The items (2) to (5) are repeated to recognise Y as a .ID which is placed on the stack.
TERM accepts the * symbol on the input and calls FACTOR.
The input symbol Z is recognised as a .ID and returns control to TERM having recognised a FACTOR. At this stage, the stack contains:
```
    Z
    Y
    X
```
and :MULT[2] is obeyed which produces:
```
   pointer to tree for MULT node with children Y and Z
     X
```
As no * symbol follows, the second TERM in EXPRESSION has been recognised and :ADD[2] will then generate a tree node from the top two stack entries. These will be removed and replaced by a pointer to this new tree for the complete expression.

3.6 Specifying Name and Branches of a Node Separately

It is usual for the two parts of the node-defining command to be written together. However, it is possible for the two parts to be separated by other commands, as long as the part which defines the name of the node precedes the part which defines the number of branches. The node itself is not generated until the second part is obeyed. The user should ensure that no other node-building commands are present in between these two parts. For example:

A =.ID  :MULT .ID :XYZ[1][2] ;

is not allowed. An example of how this separation of parts might be used is as follows:

TERM = FACTOR $ (( '*' FACTOR :MULT / '/' FACTOR :DIV ) [2] );

This has the same effect as the definition of TERM in Section 3.5.

3.7 Backtracking

If backtracking is specified in a Syntax Rule, not only will the input pointer be returned to its original position if a test produces a false result, but also the stack and the partially completed tree will be returned to their original states.

3.8 Complete Trees

Tree-building continues until the user decides that sufficient input has been processed for meaningful code to be generated. The appearance of the symbol * in a Syntax Rule will cause syntax analysis to stop and code generation to start. The first Code Rule called is the one whose name is the same as the top tree node of the tree pointed at by the top stack entry. Usually there will be a single stack entry which points to the complete tree that has been generated.

The user must define the frequency with which the tree is decoded. If too much input is analysed, the tree will be large and may not fit into the available storage. If too little input is analysed, the tree may be too small to manipulate to allow good code to be generated. In general, input is processed on a statement at a time basis. For example:

.META PROG
PROG = '.BEGIN' $ ( STATEMENT :ST[1] ';' * ) '.END' ;

defines a language consisting of a number of statements surrounded by .BEGIN and .END. As each statement is recognised, the appropriate tree is processed to generate code. In the example, the Code Rule with name ST is called.

It is possible for a number of trees to be left in the stack when code generation is initiated. In this case, the individual trees will be processed starting from the top of the stack and working downwards. This is not a particularly useful feature. It is mainly added as a debugging aid so that trees left in the stack by accident will be noticed. The user is recommended to pass only a single tree to the code generation phase.

The symbol * should only appear in the main Syntax Rule (the one whose name appears after .META). If it appears in any other rule, it is possible for the system to be incorrectly set up after the code generation.