Chilton::ACL::Games Playing with Computers

Chapter 4: Heuristics

INRTRODUCTION

Consider the problem of choosing a move in a game of full information. The decision can be made by examining the result (immediate and remote) of making each of the possible moves, the replies thus opened up to the opponent, the possible replies to these moves, etc. This tree of possibilities is generated by the two-move mate algorithm described in Chapter 3 and, in that case, could be completely elaborated. In principle it is possible to ask the program to find n-move mates and present it with the opening position; if it fails n could be increased. It has been estimated that a computer would have to search about 10¹⁰⁰ different paths through the resulting chess tree before it could report whether white does have a forced win from the opening position. By then it is possible that the universe would no longer exist.

A heuristic is a device which reduces the search for solutions in such large problem trees. It cannot guarantee an optimal solution (as an algorithm does), but trades some degree of exactitude for a solution which must be obtained in the order of minutes. There are three basic ways of reducing a tree which is too large to analyse fully. The first is to take a controlled look into the future, the second is to look into the past, and the third is to develop a - for want of a better word - instinct.

4.1 THE FUTURE

Consider a simple game in which each player has, at his turn, three alternatives. If the computer only looks three plys ahead then it would generate the tree shown in Fig. 4.1.

The circles are called nodes, the group of lines from a node are called a fan, and the final nodes, at ply 3, are called leaves.

In this case all 27 terminal leaves have been reached; this would not be so if either the computer or the opponent had winning moves in the intervening plys. This check should be built into the procedure which lists the legal moves (see the chess algorithm's use of label cut off').

Fig. 4.1 A look-ahead tree

At ply 3 the computer applies a static evaluation function to the leaves. The exact details of the function are dependent on the game, and will be discussed later for the particular cases of chess, draughts, and Kalah. The details are also dependent on the programmer, the machine, the language, etc., but this will be ignored. The result of applying the function is that the leaves are given a compressed description (usually an integer) which, hopefully, describes how good (positive) or how bad (negative) the positions would be for the computer if they were reached in actual play. The integers are given within the leaves at ply 3.

The computer now completes the dynamic evaluation of the position at ply 0 by applying the basic tool for playing full information games. At odd plys it selects the maximum integer of a fan and backs up that integer to the parent node; at even plys it selects the minimum integer of a fan and backs that up to the parent node. This is called mini-maxing. In Fig. 4.1 the arrows show the selected integers being passed up the tree, and how the computer decides that the left hand branch at ply 0 is the most advantageous. This is chosen because the machine must reach a position at ply 3 which will have a value of 8 at least. Because of the opponent's intervening move the other, larger, integers are assumed to be unobtainable.

Computers are sequential and must search the tree and back up the scores in an ordered fashion. An example of this systematic search is shown in Fig. 4.2. The numbers give the order in which the nodes receive their backed up scores.

Fig. 4.2

The actual positions (or the moves that generate them) are stored in a stack. In the chess algorithm (Chapter 3) the stack's name was moves and the pointers to the ply levels were in the array number of moves. In this simpler example of only 3 alternatives and 3 plys, a stack of 9 with 3 pointers (1® 3), (4® 6), (7® 9), is enough to search the generated tree. Fig. 4.3 shows the actual stack entries (1-9) accessed.

Fig. 4.3

A common trick before starting the search is to initialise the mini-max back ups (in this case there are 3: α, β and γ) to impossibly large integers ( -ve for max, +ve for min). The rough flow of the mini-max is then

Mini-max

Note how it is only necessary to record the best path (the chosen move) at the top level.

When a large number of moves and/or plys is to be searched it is sometimes better to keep the stack as small as possible. This is especially important on machines which depend on (slow) indirect addressing or paging to gain access to large amounts of storage. In this example (and unlike the chess algorithm which wastes space) the stack could be searched backwards in each ply to conserve space. Fig. 4.4 shows the stack entries accessed (the top circle) and the order in which the nodes are visited (the lower circle in each pair) in this mode of search. The saving is more and more appreciable the larger the fans.

The first chess program to use the basic mini-max with any success was Bernstein's (15) in 1958. This excellent pioneer work will now be considered in some detail to illustrate the improvements that have taken place since.

Fig. 4.4

The program first inspects each of the 64 squares asking if it is (1) occupied, (2) by whose man, (3) whether it is attacked, (4) whether it is defended, and (5) whether it can be occupied. Questions (1) and (2) should not have to be asked, but questions (3), (4), and (5) are typical examples of the extra information (other than the raw moves) required by a program if it is actually to play the game.

There are usually about 30 possible moves from a position. The program now prunes this list to 7 by applying eight questions. These questions are complicated (the first three imply having to generate the opponent's moves). They are:

Am I in check?
Can I or the opponent capture?
Can I castle?
Can I develop a knight or bishop?
Can I occupy an open file?
Can I enter critical squares of pawn chains?
Can I make a pawn move?
Can I make a piece move?

The questions are asked in order, and the first seven moves to comply are accepted. The program looks ahead 4 plys, so the number of nodes it will visit to back up the mini-max is 7⁴ + 7³ + 7² + 7¹ + 7⁰ = 2801 (see Fig. 4.5).

Fig. 4.5 Bernstein's stack

The static evaluation function at ply 4 is based on (1) gain of material, (2) defence of king, (3) mobility of pieces, and (4) control of important (centre) squares.

The program was the first reasonable chess machine, requiring more than a novice to beat it. It took about 8 minutes to select a move, but the machine (an IBM 704) was about 60 times slower than modern machines. The major part of the time the program was gathering and applying information to prune the fans down to seven moves and applying the 4-part static evaluation at ply 4. Bernstein mentions that the machine often made masterly moves. The position in Fig. 4.6 is given with the comment that the machine move (Q-K2) is the only satisfactory move for experts.

Fig. 4.6

But has the machine chosen the right move for the right reasons?

Consider the following very simple algorithm. The position has only five legal moves. The machine makes each of the five moves, and then calculates how many moves it has and how many moves the opponent has in the new position. The result is:

		Machine has	Opponent has	Ratio
1	Q-K2	29	29	1.00
2	K-B1	38	40	0.95
3	Kt-K2	34	39	0.87
4	K-Q2	35	42	0.83
5	B-K2	31	41	0.76

This simple mobility algorithm, without mini-maxing and without complicated pruning + static evaluation, has given the same best move (Q-K2) for the wrong reason.

Bernstein also gives the moves of an actual game between computer and human. The opening moves are:

	Machine WHITE	Opponent (BLACK)
1	P-K4	P-K4
2	B-B4	P-QN3
3	P-K3	N-KB3
4	B-KN5	B-N2
5	B*N	Q*B
6	N-KB3	P-B3

If one applies the simple mobility algorithm to the opening position in chess then the move P-K4 tops the list (white has 30 moves and black 20 moves, therefore mobility ratio = 1.5). If the seven best moves are chosen by this means, then it is found that the move chosen by Bernstein is almost always in this short list and, therefore, his mini-max + static evaluation would still have chosen them.

To investigate how relevant the mobility ratio is, the author applied it to selected games of ten great masters (from the nineteenth century to the present) described in Golombek's book (16). The masters are Anderssen, Morphy, Blackburne, Steinitz, Tarrasch, Lasker, Capablanca, Nimzovitch, Alekhine, and Botwinnik.

In the ten games the masters were faced with 336 positions. All these positions were presented to a program which sorted the moves for each of them into order on the basis of two requests:

Give all captures (easily detected by list legal moves).
List the remaining moves in order of resulting mobility ratio. The opponent can only make legal replies, this allows for the machine making a check move but not always moving into check itself.

The result was that if the top 16 moves of the list were kept, then for over 95% of the cases the move chosen by the master was in that list! Hopefully a deep mini-max would result in selecting that move.

The striking weakness of the approach was that it usually failed to include moves marked ! (in the book) which were not captures, i.e. the important turning point moves. Thus the majority of moves it failed to include were those which did not have an immediate effect on the game, only becoming more understandable as to why they were made by the master at a much later stage, e.g. minor pieces strategically placed to prevent the opponent developing.

One interesting feature of the generated lists was the behaviour of a move the master would make about five moves hence. These moves first appear low down the list and then rise steadily, entering the top 16 just before the master selects them. Other moves remain in the top 16 for some time before being chosen, and in these cases it often appears that the master is preparing the ground to make that move. Queen moves are almost always present.

Because of the striking relevance of the mobility ratio, many chess programs not only use it to select a short list but also to perform the static evaluation. In this role it is in effect giving a slightly inaccurate look-ahead of a further two plys. Some modern chess programs have forsaken Bernstein's rather complicated pruning and static evaluations in favour of simple pruning and evaluation (relying heavily on the mobility ratio), and used the saving in time to search wider fans and deeper plys.

Because a large proportion of the time is now spent in searching the tree, a great deal of effort has been put into developing techniques of searching more efficiently. One major advance was the discovery of the, so called, alpha-beta heuristic (17); this was a phenomenal oversight by early users of the mini-max technique, including Bernstein and even the legendary Samuel in Part 1 of his draughts playing program (11).

The alpha-beta heuristic

This states: It is not always necessary to perform a static evaluation at all the leaf positions or back up the scores from every node.

Consider Fig. 4.7. The mini-max routine takes the highest integer (6) in the bottom left hand fan. This has been backed up, via the arrow, to the parent node at ply 2. On investigation of the next fan of leaves the computer finds the first static evaluation to have the value 7; this is greater than the lowest value already obtained for ply 2, therefore it can omit all the remaining static evaluations of position in that fan.

This fact is also true for backed up values at any ply level (not just the leaves), but the test of greater than or less than depends on whether the program is at a maximising or minimising level.

Fig. 4.7

The alpha-beta pruning is fully effective if applied to Fig. 4.1. The result is shown in Fig. 4.8. The final answer is always the same as a full search, but this time all the nodes marked with a cross have not been generated. The example is ideal because the moves were in the optimum order for maximum pruning to take place. If this is so then the apparent branching factor is reduced to about the square root of its original value. This saving allows the program to almost double the maximum ply of its search, yet still obtain an answer (obviously far more accurate) in the same time the full search would take.

Fig. 4.8

Actual savings are not as great because perfect ordering of the moves is difficult to obtain. (If one could, then it would be unnecessary to generate the tree, let alone search it.) Nevertheless, reduction of the amount of work by factors of a thousand are now common in real game situations. In Bernstein's case the program could have gone down about 7 plys in 8 minutes.

Modern game playing programs now make a greater or lesser attempt to order the generated moves. The list all captures, then sort in order of mobility ratio has already been mentioned for the case of chess. It is fairly easy to have the list legal moves store captures in a separate list from non-capturing moves, which immediately imposes a certain amount of order. A further trick for the chess algorithm is to place the men in white piece and black piece so that the more powerful (i.e. more mobile) pieces will have their moves generated first, e.g. for the opening position in chess, white piece should be initialised thus:

Initialisation

so the queen is considered first and the rook pawns last.

In Chapter 3 the use of a stack to list the moves in Kalah was described. It is a fortunate fact that continuation moves spend a long time in the stack and, therefore, tend to come out at the end of the list of legal moves. Such moves obviously add more to the contents of the kalah than immediately terminal moves. The simplest static evaluation in kalah is to maximise on (contents of machine's kalah)-(contents of opponent's kalah), so if the list of moves is searched in a reverse order then a certain amount of good ordering has already occurred merely from the technique used to generate the moves.

Further tricks for cutting down the tree search have been tried, but these usually result in some loss of information, i.e. they will not always obtain the same answer as the full or alpha-beta search. This is often unimportant because the full mini-max search does not necessarily give the best answer anyway! Turing was one of the first to point out a failing in the look-ahead when applied to draughts. His program (18) looked ahead 3 plys, but had difficulty in the following situation:

Turing's Chess Position

The machine (black) is aware that the opponent is about to king a man. Because it rates a king as being worth three men this is to be avoided if possible (it loses two points). Because of the no huffing rule it seems better to sacrifice a man. If the machine does so it loses only one point, but it has only postponed the inevitable. It now faces the same problem from a worse position and, because of the too shallow look-ahead, will eventually sacrifice all its men.

Such play is very noticeable, and the machine could be beaten easily by leaving a man in the position of kinging indefinitely. At the first opportunity the machine would sacrifice men, and then play for positions where it could sacrifice again. A similar situation can arise in chess. For example the opponent is about to queen a pawn, but it is possible to check him with a man which will inevitably be captured. Turing's answer to this difficulty was that the machine should investigate the moves ahead until it finds two consecutive moves without capture, i.e. it reaches a Turing dead position.

Another failing of mini-max is illustrated in Fig. 4.9. If the static evaluation score does not have a large range then it is possible to have the maximum value many times over at ply 3. Because the machine searches from left to right, it will consider the left hand branch to be as good as the right hand branch (as both maximise to 10), and take the left hand branch. This is almost certainly not true in the situation shown.

Slagle and Dixon (19) demonstrated statistically that a significant improvement over the mini-max procedure was to inspect the successors to a similar score node. Their M and N procedure was written in LISP to play the game of kalah, a simple game in which the dominant term in the evaluation function is the difference in the number of stones in the two kalahs (see next section, The Past, and Chapter 3). Because of the simple static evaluation the situation in Fig. 4.9 is quite likely to occur.

Fig. 4.9

Slagle and Dixon found that a more correct evaluation was to add 0.2. of a stone for every repetition of the maximum value. Therefore, in the example, the left hand branch would remain at 10, but the right hand branch would now be valued at 11 because it has more options.

Unfortunately, use of this technique means that the alpha-beta cut-off must be ignored when a value equal to the present max (or min) is encountered. However, this is not difficult to program and, in the case of kalah, only resulted in a 3% loss of speed, showing that the situation is relatively rare but nevertheless significant.

A final failing of the mini-max is best exemplified by assuming that the value of a game of chess is a loss for black. A super program would look ahead from the opening position and, because it assumes its opponent is equally powerful, it would come to the conclusion that, providing the opponent has made a winning opening move, it is irrelevant what reply is made. This intrinsic assumption that the opponent thinks like the machine is dangerous; the majority of humans operate on the principle of giving the opponent the maximum number of opportunities to make a mistake. Also, game theory shows that even thinking like the opponent does not guarantee that you can predict his actions. This is fortunately not true for full information games, but is true for card games and most real life situations (the classic example being the arms race).

4.2 THE PAST

A number of techniques have been devised to allow a machine to benefit from its experience. The result is that the machine learns the game, and can reply more quickly or use the time to analyse new situations more deeply. It should also play more accurately or attractively and, for simple games, it should eventually generate an algorithm.

The two fundamental ways of improving play are to remember analysed situations and to tune the parameter settings of the pre-pruner, the static evaluation, and the dynamic evaluation functions. The first approach comes under the general heading of rote learning.

ROTE LEARNING

It was first used effectively by Samuel (11). His program saves all the draughts game positions encountered during play, together with their computed evaluations. For example, in Fig. 4.10 the nodes marked with a cross have already been encountered and analysed.

The program recognises them, and is therefore getting an evaluation backed up from a depth of 5 and 6 plys, not the usual 3. This position and improved evaluation is now stored, and should the program encounter the top position, A, in a future leaf then the score will be, in effect, backed up a further three plys (to 9) for some parts of the tree. (For a practical example see Grundy's Game, Chapter 6.) This recognition can occur at any level; in fact most people only expect it to apply to the node (8), i.e. the immediate position. However, to be most effective the machine should only try to recognise positions at the lowest two plys (it should have already detected the positions in higher plys at previous turns to play). Note that previously stored positions can also have their scores updated and improved.

Use of such stored information results in three improvements. The first is that the program plays more and more accurately because it gradually finds an increasing number of recognisable situations which have been analysed in great depth.

Fig. 4.10

A second advantage might seem, at first sight, a disadvantage.

For example, let's say we have a program which is learning draughts. It is good policy to test such a program by giving it a simple end game, e.g. two kings against a lone king. The program should discover that, almost irrespective of where the pieces start, it can always win. It will also record the few situations where the opponent can immediately take one or both of its men, and scores these as losses. Now the program is presented with an opening situation where it has three kings against one. Its immediate reaction is to sacrifice one of the kings to get to a winning position! Such behaviour is very difficult to prevent, but is compensated by its simplicity (is it really bad play?), and is more than compensated if the program has a three kings to two situation and does a one for one swap. Nevertheless rote learning systems are very prone to develop misconceptions and the problem of preventing this behaviour is largely unsolved.

One way to stop it is to present the three against one situation first. As it learns this situation it should also record how many moves are needed to effect the win. It then learns the two against one situation, and should record that more moves are required to effect this win. The program has now provided itself with a sense of direction, it will distinguish the situations and press on to the quickest win. Such considerations are important when designing learning programs, but not too critical if we only want a program to win the game. The best way of ridding programs of misconceptions is, as in this case, to actually play the three kings situation against it, but this takes time.

A third advantage of rote learning is that the program makes less and less static evaluations or back ups to the penultimate ply. If the static evaluation is complicated this can result in a saving of time (a great advantage if playing tournament chess for example).

Note that Greenblatt's chess program (Chapter 5) is provided with many opening situations to save time and avoid traps, but this recognition occurs at ply 0 (i.e. node A). This is only possible on the erroneous assumption that some of the opening moves in chess have been completely solved, a state of affairs that no rote learning program could ever reach.

The big problem of rote learning is that the information must be accessed quickly, but the more information that is stored the longer it takes the program to recognise a position and, even worse, the more time it wastes looking for positions it has never encountered! To limit the number of positions saved we must

CATALOG or INDEX
DELETE REDUNDANCIES and GENERALISE
DISCARD POSITIONS of LITTLE or NO VALUE

(a) Catalog

First note that this is an important problem when writing a compiler. As a compiler scans a program it encounters an a priori unknown number of identifiers whose properties have been declared elsewhere in the program. Whenever it finds an identifier it must check its declaration (integer, real, array, procedure, etc.) and take the appropriate action. We are only concerned with how to store the declaration (original position + evaluation) so that we quickly recognise the identifier (repeat of the position) and take the appropriate action. The usual way is to use a hash table.

This method was first used by A. L. Samuel in an assembler for the IBM 701 in 1954, i.e. before he wrote the draughts program. The name hash table is common but it is sometimes, more descriptively, called a computed entry table. Consider the following problem. We want a program to learn the names of an a priori indefinite number of games and, on request, to tell us quickly whether the game is played on a board or with cards. It has two modes of operation, both triggered by capital letters.

INPUT MODE Example:
   ACKNOWLEDGE = (new line) 
   QUERY = ARE YOU SURE?
   FAIL = I DO NOT KNOW 
   CHESS is a Board Game 
   BRAG is a Card Game 
   GO is like CHESS

RESPONSE MODE Example:
   What is GO?
   GO IS A BOARD GAME 
   What is ELEUSIS?
   I DO NOT KNOW 
   ELEUSIS is played with Cards 
   and CHESS is like ELEUSIS 
   ARE YOU SURE?
   DO IT dumpfkopf!
   and in future ACKNOWLEDGE = YES MASTER 
   YES MASTER 
   DELETE CHESS

The response time to this simple language is dependent on how quickly the machine can store, retrieve, and check the definitions. If the program stored each definition sequentially, as it was presented, then its response time would rise in proportion to the number of entries, averaging N/2 for known games and N for unknown games.

The open hash technique is the method most often used in compilers to overcome this similar problem. In this instance the initial letter of a name can be used to point into a 26 word table. For example, the program is initially given the following:

  BRAG C CHESS B BRIDGE C 
  CHECKERS B GO C GOMOKU GO ZIGZAG B 
  ZAP C.

The resulting table is shown in Fig. 4.11.

Fig. 4.11

If the program is asked

What is BRIDGE?

it will enter at location B, fail to match BRAG, fail to match CHESS, and then find the required entry, to answer

BRIDGE IS A CARD GAME.

If asked for an unknown game it will eventually reach an empty cell and report its ignorance.

The power of the method is that input of more names (preferably not starting with B,C,G,Z) can be accessed just as quickly. The theory of how to compute the entry and scan through the table is far more complicated than this simple example can show (20). It is important to get a good random spread through the table, and a better hash is to add the number of letters in the word to the initial letter, i.e. BRAG will then go into cell F etc.

Also the table is scanned in increments of 1 but, if the table length was a power of 2 (say 32), then a complete scan of the table, for input and interrogation, can be made in increments of prime numbers, in which case a simple mask of bits is sufficient to take the modulus and cause the scan to circulate. Such a scanning mode is another way of spreading information through the table. If the input is random (i.e. any first letter, any length, etc.), then the average length of search is dependent not on table size but on the number of entries divided by table size, i.e. how full it is. The average length of search in a table 80% full is about 3, a surprisingly low result. If the table contents exceed 80% then the best thing to do is to double the tables size and rehash the contents (20). N.B. this rehash is essential because the actual sequence of contents must change if any collisions occurred in the old table.

One of the drawbacks of hash tables is that it is difficult to delete incorrect or useless information, e.g. if we delete BRAG in Fig. 4.11 then BRIDGE becomes inaccessible. To overcome this a flag must be set in the deleted cell, and if we now say BLACKJACK is a card game, then the program first scans the definitions, skipping flags, to make sure there is no conflict and, on finding none, stores BLACKJACK at the first entry possible, i.e. the original BRAG cell.

Greenblatt's chess program uses a hash table to save time searching the move tree. For example the sequences

1. WR-R1   1. WR-R2    1. WR-R7
2. BK-K1   2. BK-K1 .. 2. BK-K1
3. WR-R8   3. WR-R8    3. WR-R8

would all lead to the same position. This can be detected quickly and effectively, and transforms the move tree into a directed graph (see Fig. 6.5).

Even more sophisticated ways of scanning hash tables have been studied and it now appears that if the increment (usually a prime number) is itself incremented, e.g. the jumps are P, P+1, P+ 2, etc., then the average length of search could drop to about 2.

(b) Delete redundancies and generalise

To delete redundancies usually means reflecting and rotating the position wherever possible. Examples are: draughts when only kings remain on the board, the three distinct opening positions of noughts and crosses, and GO with three rotations and a reflection for any position.

The degree of generalisation is also dependent on the game. We have already mentioned how a draughts learning program may equivalence a three kings to one situation to that of two kings to one. The program could treat the third king as irrelevant, leave him on the board, and win with the other two.

A program very much concerned with redundancy and generalisation was written by Elcock and Murray to play GO MOKU (21). Despite its name the game is more related to noughts and crosses than GO. The board is infinite and players alternate in placing a 0 or an X. The first to get a row of five of his symbols is the winner. It is impossible to give the program all the possible configurations and ask it to prune the tree (cf. Michie's Menace, Chapter 1). The program had to play the game, discover for itself the critical patterns and describe them in such a way that any reflection or rotation was still recognised. Any irrelevant 0s or Xs had to be filtered out, and this was a big problem.

One of the commonest mechanisms used by humans when solving problems is just such a rejection of irrelevant information. Even at a more automatic level, the eye has great powers of discriminating what it sees and what information is thus sent to the brain.

The author made a study of the game of kalah (22), the rules of which are given in Chapter 3. This game is peculiarly suitable to memory compression techniques, not only because every position can be reflected, but also because the content of many pits is unimportant. Consider Fig. 4.12. I. The machine is playing the top row of 3 in a pit kalah. Therefore it only needs to get 19 or more counters into its kalah on the left. This it can do by playing M1-(continue)-M2(continue)-M1, (Fig. 4.12.2). The program therefore stores this position as a winner, but it need not record the content of every pit. To find which pits are relevant (i.e. generalise the position) we non-equivalence Fig. 4.12.1 with Fig. 4.12.2. Non-equivalencing two equal values gives a zero result, two dissimilar values gives a non-zero result. Therefore we obtain Fig. 4.12.3, the Xs denote pits whose contents are immaterial to the winning sequence.

Unfortunately this is not sufficient. Consider Fig. 4.12.4, which has a winning sequence M6-M1-M3-M1-M2-M1-M4 (captures 5 stones) resulting in Fig. 4.12.5, a winning position. Non-equivalence of the two positions results in Fig. 4.12.6, which is not correct. The following additional operation must also be made: The content of any pit between the kalah and an altered pit may be relevant to the winning sequence and must be recorded. This results in the correct generalisation given in Fig. 4.12.7.

This extraction of relevant features guarantees that the contents of at least five of the opponent's pits and his kalah can be ignored and also, at most, two pits can be assigned minimum values for their contents, which can be merged into a single value, e.g. ≥ 12 for Fig. 4.12.7.

            M1 M2 M3 M4 M5 M6 
            1  2  0  0  0  0 
12.1   16                       16 
            1  0  0  0  0  0 


            0  0  0  0  0  0 
12.2   19                       16 
            1  0  0  0  0  0


            1  2  X  X  X  X
12.3  ≥16                        X
            X  X  X  X  X  X


            0  0  2  0  0  6
12.4   7                        16
            0  0  5  0  0  0

            0  0  0  0  1  0
12.5  19                        16
            0  0  0  0  0  0


            X  X  2  X  0  6
12.6  ≥7                         X
            X  X  5  X  X  X


            0  0  2  0  0  6
12.7  ≥7                         X
            X  X  5  X  X  X

Fig 4.12

To get the same effect in other games, e.g. a chess winning position, one should systematically transform each man on the board until the essential men are identified. Such transformation programs are useful when designing (apparently) new two-move mate problems.

(c) Discard positions of little or no value

Rote learning programs are fairly effective at the opening and end of games like chess and draughts because the probability of recognition is high. However, in the middle game, the probability of meeting the same position twice is relatively low, and such complicated positions cannot be generalised significantly. Consequently the program can store a great deal of virtually useless information during the middle game. One way to prevent this is to detect the start of the middle game (e.g. the program has not recognised anything in the last 2 moves), and stop the program storing the positions until the number of men drops to a level which indicates the end game has been reached. Another method, used by Samuel, is to assign a time to each new board position. If that position is ever referenced, either to improve its evaluation or for use in the look-ahead, its time is boosted. Periodically a search is made of the memory and any position which has run out of time is forgotten, i.e. erased.

A final and extremely sophisticated way of memory compression can occur when all the moves of a fan have been analysed, and the best (or as good as any other) move has been found. This means the program can ignore useful positions it has previously analysed, stored, and used. An example of this is Michie's noughts and crosses machine (see Chapter 1) which would eventually only make corner openings despite the fact that middle and centre openings are also safe.

To remove information (useless or useful) can be difficult if the catalog is badly designed. It is therefore not only important to access information quickly but also to be able to garbage collect easily and efficiently, a fact quite often neglected in rote learning systems. The ability to deliberately remove useful information can also help to solve the problem of misconceptions already mentioned.

This completes the discussion of rote learning. The second way of improving with experience is to adjust parameters.

Adjust parameters

Most game playing programs are designed and controlled by the programmer. He decides the width and depth of search, how the static evaluation shall be made, how a short list of moves is generated, selected, and ordered, etc. Assuming no rote learning is incorporated (and even that has many parameters decided by the programmer), then the only way the program can improve is by altering the parameters which control the generation, search, and evaluation functions. The biggest (and possibly only) improvements gained by automatically altering the parameters have been with the static evaluation function.

Consider the simple game of kalah. If we start with 3 stones in each pit then a winning position is defined as 19 or more stones in a kalah. A simple static evaluation is to maximise the difference or ratio of the number of stones in the two kalahs. On either evaluation the program would prefer a position

Static Evaluation

which is dangerous.

To overcome this problem the program (22) recorded all the previous positions encountered in a game. At the termination of the game it scanned the contents of each kalah in all previous moves and, placing a 90% bias on the result, built up the following table:

MK	OK
18	18
17	9
16	4
15	2
14	2
13	1
12	1
11	0
10	0

What the table means is that if the program can, for example, obtain 17 stones in its kalah and the opponent has only 9 or less, then 90% of the time (or more) the game should be won. We therefore define the relation to be good as a win.

A better way to see what the program has learnt is to graph it thus:

Graph of Table

The demarcation line is not, strictly speaking, a discontinuity. For example, 17:10 is probably preferable to 16:6, but the program never became that sophisticated. The reason for this is that the numbers in the right hand column are a measure of how good the machine's opponent is. Against a poor player the numbers would be larger; against a good player they would be smaller. As the program was taught by the author trouble was encountered in the following position:

Kalah Position

Although this is a winning position for the opponent, the machine had, by now, become quite a powerful player. Consequently unless a narrow path of correct moves is now taken the machine has a good chance of drawing or winning. Eventually it will begin to believe that one stone in each kalah is a won or drawn game! The unfortunate state of affairs had been reached where the program, having learnt to play a good game, now began to deteriorate against bad play. In fact, in demonstrations to visitors, the learning and adjustments of parameters had to be shut off because the program sometimes won from appalling]y bad positions. The moral of this story is that if you have a program which learns in this fashion, don't let it play bad opponents or don't allow it to downgrade what it has learnt from good opponents.

The best example of automatically adjusting parameters and, also, of how to stop the program backsliding is by Samuel (11) with his learning procedures involving generalisations. A simple illustration of the principle involved is to go back to the static evaluation of kalah positions. As already mentioned we can either difference the contents of the kalahs or take their ratio.

One way to decide which is the better evaluation is to incorporate them into opposing programs and let them play a number of games. In a complicated game there are a number of parameters which have relevance to the evaluation. For example, Greenblatt's evaluation of the chess board is given by S = B + R + P + K + C where

B is a material balance
R is the piece ratio
P is a measure of the pawn structure
K is the king safety term
C is control of the centre

All these terms were chosen and rated by Greenblatt. What Samuel did was provide a long list of such terms from which two opposing programs (Alpha and Beta) took a subset of 16, weighted them, and then played a game of draughts. Alpha generalised its experience after each move by altering the weight on its terms, and (sometimes) replacing terms which appeared useless with new terms from the long list. Beta used the same evaluation polynomial for the whole of a game.

When a game was finished a neutral compared the ability of Alpha and Beta. If Alpha had done better, then Beta became Alpha. If, however, Beta had prevailed, then Alpha scored a black mark. After three black marks Alpha would be punished by the removal of its most highly weighted term. If this is not done the programs will improve only slightly and then stagnate. The punishment kicks Alpha away from the local solution it has discovered and, hopefully, it will find a larger hill to climb thus:

Local maxima

The actual program was far more complicated than this rough outline and the reader should consult the original paper. Samuel's conclusions were that rote learning was good at the opening and end game but the middle game was much better treated by parameter adjustment.

One illuminating error by Samuel is that one of the terms provided is the relative piece advantage. He states, It is assumed that it is to the machine's advantage to reduce the number of opponent's pieces as compared to its own. A reversal of the sign of (relative piece advantage) will cause the program to play give away checkers, and, with learning, it can only play a better and better give away game. In actual fact two kings can usually win against one in either version of the game, and it is impossible to draw in misere draughts because of tempo. This proves that it is not always possible to have a single parameter in the static evaluation.

The Alpha-Beta principle is applicable to the game of Twenty Questions. We could supply a program with a long list of questions, e.g. is it an animal? can it fly? etc., and it would learn in what order to ask the questions. A neutral program could select an object, answer Alpha and Beta separately, and decide which program was the better.

The failing of the learning principle is that the long list has always been supplied by the programmer. We now consider programs that are told absolutely nothing about the game they have to play; that they improve at all can only be explained by inheritance .

4.3 INSTINCT

If we touch something dangerous we jerk away. This is not because of past experience, nor by looking into the future; it is an instinctive move in the game of survival. Instincts are the result of biological evolution which (although based on random mutations, crossing, and selection) is not a blind trial and error process. The hereditary material of all individuals of a species is organised into a collective intelligence mechanism whose function is to actively solve the problems (old or new) which have to be faced for the survival of the species. One of the most startling inventions of biological evolution is the human brain.

The original theory of evolution by Darwin stated that a set of elements which could (1) self reproduce, and (2) undergo hereditary changes (mutations) would evolve by a process based on the survival of the fittest.

If this were so, then why is it a characteristic of all higher animals and plants that there should be two sexes? Surely a population consisting entirely of parthenogenetic females would increase twice as fast as would a population of equal numbers of males and females? Males would appear to be a waste of material. The compensating advantage of sexual reproduction is that it quickly spreads favourable mutations through the species with the possibility they will all combine in a single individual.

N. A. Barricelli used a computer to test this and other theories (23). In describing his work he used biological terms to refer to purely mathematical concepts, but warned the reader against too broad an interpretation.

Barricellian symbioorganisms

Barricelli devised the following extremely simple rules for sexual (symbiogenetic) reproduction. There are two integer arrays, this generation and next generation. The array this generation initially contains a random pattern of positive and negative integers. The following algorithm (expressed in Algol) is now executed:

integer array this generation, next generation [1 :512];
begin
 loop: for i : = 1 step 1 until 512 do 
       begin 
       n := j := this generation[i];
reproduce: if j = 0 then goto next i;
       k := modulo 512 of (i) plus: (j);
       if next generation[k] > 0 then 
       goto mutation else 
       next generation[k] := n;
       j := this generation[k];
       goto reproduce;
  mutation:
  next i: end;

  copy next generation into this generation;

  print this generation;

  goto loop;
end;

Two examples of what happens are shown below:

Evolution

An underlined integer means it is negative. In the left hand figure no reproduction has occurred, there is only one 4 and one 2 at each generation. In the right hand figure an extra integer (3) is present; this has caused the integer 4 to reproduce twice at generation 2. The whole point is that an integer will only reproduce when it is in contact with a different integer, i.e. reproduction is dependent on a symbiotic co-operation.

To the above reproduction rules one may add arbitrary mutation rules (at the label mutation: in the program); for example, by taking advantage of the cases where two different integers collide in the same square, we can (i) add the two integers and divide by 2, (ii) leave the square empty, etc.

Starting from a random pattern it is possible for the integers to organise themselves into stable configurations (called numerical symbioorganisms) which have many properties similar to those observed in living structures. The two properties of growth and self-repair are shown by the symbioorganism (5,-3,1,-3,0,-3,1,0) in Fig. 4.13. At line 9 about 50% of its growing structure was destroyed randomly, yet the pattern reforms successfully. Another property, namely crossing, is shown in Fig. 4.14. N.B. the Xs denote where different integers are colliding and have been replaced by zero.

Fig. 4.13 Example of growth and self-repair

Fig. 4.14 Crossbreeding in numeric symbioorganisms

Some other analogous properties claimed are spontaneous formation, great variability, parasitism, mutation (this must be programmed), and evolution. Of these, spontaneous formation can be easily tested for by the reader. Start generation 0 with a random sequence of + 1's, - 1's, and 0's with roughly as many non-zero integers as zero's. By generation 100 a number of organisms should have appeared. These organisms exist in a cyclic universe (a modulo 512 word array in the example program) and, as they grow, they come into contact with each other. Sometimes a crossing will occur, but more often an 'area of devastation' appears in the printout at the point of contact. The two organisms appear to be fighting a battle for the survival of the fittest and, if sophisticated mutation rules are programmed in, new organisms can appear of such power that they will crush all opposition, and eventually cause homogeneity by occupying the whole universe. The reader is cautioned that this is a rare but exciting event; the most common outcome is that the organisms break down and decay into disorganised homogeneity because the original symbioorganisms are infected by parasites (see original paper).

As Barricelli gained experience both these homogeneity problems were eventually solved. He used different mutation rules in different sections of the universe, modified the reproduction rules to create different environments, and exchanged segments of different 512-word universes every 200 generations. Successful evolution experiments, lasting many thousand generations, were performed. The dominant organisms were recorded at intervals of every 200 generations. Later tests showed clearly that symbioorganisms at an advanced stage of evolution (greater number of generations) easily crushed the earlier, more primitive organisms.

The question now arose of whether it was possible to select symbioorganisms able to perform a specific task assigned to them. To do this Barricelli proposed that colliding organisms should be detected and then made to play a game, the winner being permitted to occupy the collision place. The first game chosen was Tac Tix (24), and the rules are as follows.

In a square of (6 × 6 = 36) coins the players remove, alternately, any number of adjoining coins in any single row or column. The player who picks up the last coin is the loser. The game was invented by Piet Hein of Copenhagen, who showed that the reverse game (who picks up the last coin is the winner) is trivial because of symmetry. The second player can always win the 6 × 6 game by making mirror moves and, for squares with an odd number of sides, the first player wins by taking the centre coin and then playing mirror moves.

However, the game faced by the organisms has not been solved. The decision to play a game was taken every time a collision occurred in certain game-competition areas of the universe, the first number in a cell being treated as a gene of the left player, and the last one as a gene of the right player. A sophisticated program then identified the two rival organisms and, interpreting the integer pattern of the structures as a game strategy program, a player body (technically called a somatic structure) was determined. The first move was made by using the player body of the left organism as a set of instructions operating on the initial position of the game. The second player then 'made his move' from his player body program plus the new game situation. The players are constrained to legal moves by a 'referee' who also decides which organism eventually wins, and the winner's gene is then entered in the collision place.

In games between symbioorganisms the majority are lost by a stupid move of the loser who takes all of the remaining coins at his last move, instead of leaving just one. An example game is shown below left (odd moves are first player, even moves are the second player).

Fig. 4.14a Tac Tix

In contrast the game recorded on the right was played between two highly evolved organisms, and the winner's (the second player) last (14) and next to last (12) moves were both correct decisions leaving no favourable choice to the opponent. For human beginners, the mean number of correct final decisions by the winner in the first 5 games is also between 1 and 2 (termed quality 1 or 2).

The conclusion made by Barricelli was that it required (for a machine speed of 1 million operations per second) about 2 hours for the organisms to reach an average game quality higher than 1, i.e. to learn what the game is all about; but there were large fluctuations in the progress because the organisms were very prone to infection and damage. Nevertheless the progress was significant.

Barricelli speculated that although it took the organisms a long time to learn the simple operation of leaving a single coin it may be possible to test them in a game of chess. This is because the crossing mechanisms, which play a fundamental role in the process, are extremely well suited to make progress, via evolutionary learning, in many different directions at the same time. Such a program was written by Barricelli and the author, but lack of machine time prevented further progress. The results have a mainly biotheoretic significance. Their importance lies in the fact that the problem is completely unknown to the organisms, many of which ignore it completely and are quickly destroyed. It is not exactly fair to compare their performance with a human player who has understood the point of the game. A more exact analogy would be to display the legal moves to a human and punish him if and when he eventually lost a game; the human would not have learnt the game until he could state that its object was to leave one coin, and this takes considerably more than 5 games for a beginner to achieve.

Other works along these lines are (25), (26).