Prehistory, Explanatory Hypotheses and the Computer

James Doran

1971

Few archaeologists would dispute that their primary task is the reconstruction and explanation of man's unrecorded past. Given an immense amount of detailed evidence derived from the artifacts and environmental traces that prehistoric man has left behind him, given all that we know of present human societies and their relationship with their environment, the prehistorian must not only elucidate what happened in the vast expanse of time before history begins, he must also try to understand it, to explain it.

This task is no easy one and in recent years leading American archaeologists have begun to urge their colleagues to abandon traditional archaeological methodology, with its suspect antiquarian origins, and to adopt a rigorous scientific methodology (see, for example, Binford, 1968; Fritz and Plog, 1970). Following the main emphasis in the works of philosophers of science such as Hempel, they have stressed the need for archaeologists to formulate explicit hypotheses from their observations and then rigorously to test these hypotheses on new independent data. They advocate the use of mathematical and statistical methods for hypothesis testing, using digital computers as appropriate. They attack what they see as the unjustifiable tendency of the traditional archaeologist to employ inductive reasoning and to make analogies with surviving primitive societies.

In practice this new scientific approach in archaeology encounters major obstacles. Not only is there the obvious difficulty that archaeologists, unlike most scientists, cannot conduct controlled experiments; there is also the problem that, even more than most scientists, archaeologists cannot merely evaluate hypotheses. They have first to obtain them. In fact an archaeologist spends a great part of his time devising explanatory hypotheses, from "this is probably a storage pit", through "this is obviously a late Bronze Age settlement", to "maybe urban civilisation only evolves where there are good opportunities for long distance trade". Philosophers of science tend to dismiss the origins of hypotheses as a matter of mere psychology (or mysterious creativity) and to stress that the validity of a hypothesis depends not at all upon its origins. Unfortunately this leaves the poor archaeologist with no guidance as to how he should move from data to hypothesis in a systematic and repeatable manner and, worse, with the impression that it is foolish to hope for such guidance.

Automatic generation of explanatory hypotheses

Fortunately the traditional attitude of philosophers of science has not gone unchallenged. A number of computer scientists have begun to study the generation of scientific hypotheses and theories either with a view to developing a logic of induction (for example, Meltzer, 1970, and Plotkin, 1971) or with a view to the design of effective heuristic computer programs (for example Amarel, 1971, especially page 412, and Buchanan, 1966, and see below). Such work falls within the scope of that research in computer science and computational logic known as Artificial Intelligence. As such it subscribes to the view that research progress requires either relevant mathematical theorems or effective computer programs.

The prime example of a computer program which generates explanatory hypotheses is Heuristic DENDRAL (Buchanan and Lederberg, 1971). This program has been developed over the past five years at Stanford University by a research group which includes leading computer scientists such as E. A. Feigenbaum and leading organic chemists such as J. Lederberg. Its ability to explain empirical data by a specific hypothesis, formulated against a background of accepted theory, rivals that of analytical chemists.

Heuristic DENDRAL works out the molecular structure of an organic compound given its empirical mass spectrum and, if available, other data such as NMR measurements. In order to do this it is equipped with a theory of the (imperfectly understood) ways in which molecular structures fragment under electron bombardment, as happens in the mass spectrometer, and it uses this theory to guide a heuristic search through the extremely large set of chemically possible molecular structures. This set is itself defined by the original (non-heuristic) DENDRAL generating algorithm devised by Lederberg. The outcome of the search is one, or at most a few, molecular structures each of which could plausibly have given rise to the observations.
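The generate-and-test pattern described above can be illustrated by a minimal sketch. It is a toy and not the DENDRAL algorithm: the three-letter alphabet, the nominal unit masses, the rule that a chain breaks at exactly one bond, and the names `generate_candidates`, `predicted_peaks`, `mismatch` and `best_hypotheses` are all assumptions made for illustration. A generator defines the set of possible structures, a crude fragmentation "theory" predicts what each structure would show, and the candidates that best match the observations are kept.

```python
from itertools import product

# Hypothetical nominal masses for three kinds of chain unit (toy values).
UNIT_MASS = {"C": 12, "N": 14, "O": 16}


def generate_candidates(total_mass, max_length=4):
    """Yield every linear chain whose total mass equals the observed mass.

    This plays the role of the structure generator: it defines the set of
    possible hypotheses without the caller listing them by hand.
    """
    for length in range(1, max_length + 1):
        for chain in product(UNIT_MASS, repeat=length):
            if sum(UNIT_MASS[u] for u in chain) == total_mass:
                yield "".join(chain)


def predicted_peaks(chain):
    """Toy fragmentation theory: the chain breaks at one bond, and both
    pieces are recorded as peaks."""
    peaks = set()
    for cut in range(1, len(chain)):
        peaks.add(sum(UNIT_MASS[u] for u in chain[:cut]))
        peaks.add(sum(UNIT_MASS[u] for u in chain[cut:]))
    return peaks


def mismatch(chain, observed_peaks):
    """Number of peaks predicted but not observed, or observed but not predicted."""
    return len(predicted_peaks(chain) ^ observed_peaks)


def best_hypotheses(total_mass, observed_peaks):
    """Return the candidate chains that explain the observations best."""
    scored = [(mismatch(c, observed_peaks), c)
              for c in generate_candidates(total_mass)]
    if not scored:
        return []
    fewest = min(score for score, _ in scored)
    return sorted(c for score, c in scored if score == fewest)


print(best_hypotheses(40, {12, 16, 24, 28}))  # ['CCO', 'OCC']
```

In this toy a chain and its mirror image predict the same peaks, so the search ends with two equally plausible hypotheses rather than one, which echoes the "one, or at most a few" outcome described above.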

For those not-too-limited classes of compound which Heuristic DENDRAL knows about, its ability matches that of a post-doctoral chemist. The program has had a significant impact both upon artificial intelligence studies and upon the relevant branches of organic chemistry.

Meta-DENDRAL, a program now under development at Stanford, will go one step further (Buchanan, Feigenbaum, and Lederberg, 1971). It will not operate within a given theory of mass spectrometer processes, but will endeavour to generate that theory given only empirical data and an appropriate set of theory building blocks. This will require complex processes of generalisation and theory organisation.

It seems fairly clear that Heuristic DENDRAL and Meta-DENDRAL are but the lowest rungs of a ladder of which each successive rung corresponds to an explanatory theory of greater generality and power.

Archaeological experiments

Successful though it has been, the DENDRAL project has inevitably raised more questions than it has answered. There is a clear need for similar experiments to be conducted in other problem areas. Given the archaeological background as I have sketched it above, it is natural to explore the possibility of developing a program of the Heuristic DENDRAL type capable of interpreting archaeological data. My own experiments, which I shall now outline, seek to do just this.

The first task is to select an archaeological problem area which is well enough demarcated to be considered in isolation, and which is neither so complex as to be impossibly difficult, nor so simple that the interesting questions do not arise. The problem area I have chosen, largely prompted by previous computer work, is that of interpreting prehistoric cemeteries, particularly those from the Central European Iron Age.

The most famous of these cemeteries is that excavated at Hallstatt in Austria about a hundred years ago (Kromer, 1959). It contained well over a thousand graves, roughly equally divided between cremations and inhumations. Each burial was accompanied by a greater or lesser number of objects such as iron weapons, bronze jewellery and vessels, craftsmen's tools, and pottery. Some of these objects have considerable artistic value. It has been possible to determine the sex and age of the burial, directly from skeletal remains, only in a few instances. This is a more difficult task than one might imagine especially with the remains of prehistoric peoples. Of course, where the grave contains weapons, one can be fairly sure that the burial is male. Jewellery, however, cannot be relied upon to distinguish the sexes.

The task of the archaeologist is to explain the detailed observations made and recorded during the excavation of the cemetery and to explain the nature both of the objects which accompanied the burials and of the burials themselves, in terms of the people who gave rise to them. How was the cemetery planned (if at all)? Which burials are early in time and which late? Which burials are male and which female? What systematic burial practices can be recognised and do they change with time? What patterns of development can be recognised in the different kinds of objects recovered? How far can different social or racial groups be recognised? Can anything at all be inferred about the economic basis of the community?

Such questions, which are manifestly not independent of one another, are far from easy to answer. To begin to answer them the archaeologist must bring to bear not only such knowledge as has already been built up about the people of roughly that time and place, but also a much less specialised and more intuitive knowledge of people and the way in which they live in societies at any time and place. In practice the conclusions which archaeologists reach are rarely capable of anything like full proof. They have the status of hypotheses, to be revised as soon as fresh evidence is available from any source.

If we wish to write a computer program to answer such questions, then at least partial solutions must be found to two basic problems. Firstly we must devise an effective way of specifying, without explicit enumeration, the range of alternative explanatory hypotheses which might apply to such a cemetery, and secondly we must decide how the program is to find the most plausible hypothesis, given a particular set of cemetery data. In terms of Heuristic DENDRAL and its task, the first problem corresponds to that of specifying the range of possible molecular structures and the mass spectrometer theory, and the second to that of organising the heuristic search through the set of candidate structures.

There is little doubt that the first of these problems is much the more difficult. The computer program with which I am currently experimenting (on simulated excavation data only as yet) does its best with a range of candidate hypotheses specified in terms of a general stochastic model for the generation of a cemetery of this type. The use of such a model gives the whole exercise a flavour of statistical estimation; the program's task is to estimate the wide range of parameters in the model, some of which are numerical and some not.
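To make the idea of a stochastic cemetery model concrete, here is a minimal sketch under loudly stated assumptions. It is not the model used in the experiments described here: the single recorded trait (whether a weapon was found), the probabilities, and the names `simulate_cemetery` and `log_likelihood` are all hypothetical. A hypothesis combines a non-numerical part (a sex assigned to each grave) with a numerical part (the chance of a weapon for each sex), and its plausibility is measured by how probable it makes the recorded finds.

```python
import math
import random


def simulate_cemetery(sexes, p_weapon, seed=0):
    """Draw one simulated cemetery from the toy model.

    sexes    -- one label per grave, "M" or "F" (the non-numerical parameters)
    p_weapon -- chance of a weapon in a grave of each sex (numerical parameters)
    Returns one boolean per grave: was a weapon found?
    """
    rng = random.Random(seed)
    return [rng.random() < p_weapon[sex] for sex in sexes]


def log_likelihood(weapon_found, sexes, p_weapon):
    """Log-probability of the recorded finds under one hypothesis.

    Probabilities must lie strictly between 0 and 1 so that every
    logarithm is defined.
    """
    total = 0.0
    for found, sex in zip(weapon_found, sexes):
        p = p_weapon[sex]
        total += math.log(p if found else 1.0 - p)
    return total


# Illustrative values only: weapons are common in male graves and rare in
# female graves.
p_weapon = {"M": 0.6, "F": 0.02}
observed = [True, True, False, False]

plausible = ("M", "M", "F", "F")    # weapon graves read as male
implausible = ("F", "F", "M", "M")  # weapon graves read as female

print(log_likelihood(observed, plausible, p_weapon) >
      log_likelihood(observed, implausible, p_weapon))  # True
```

Even in this toy the number of possible sex assignments doubles with every grave, so exhaustive comparison of hypotheses quickly becomes impractical for a cemetery of any size.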

This might suggest that a solution to the second problem, that of actually finding the most plausible hypothesis, could be found in a textbook on, say, maximum likelihood estimation. Unfortunately the model is too complex and ill-structured for this to be the case. As with Heuristic DENDRAL one is obliged to fall back on some form of heuristic search; my present program copies some of the heuristics used by archaeologists.

Two of these heuristics are of particular importance. Firstly, one can divide the problem of interpreting a cemetery into a range of sub-problems - a study of the chronology, of burial practice, of metal working - ignoring the interdependence of the sub-problems. This seems analogous to trying to maximise a function of many variables by maximising with respect to each variable independently and to have similar merits and demerits. Certainly such a strategy, while typically a great convenience in practice, can seriously mislead.
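The analogy with maximising a function one variable at a time can be shown with a small sketch. The score table below is invented purely for illustration, and the names `SCORE`, `independent_maximum` and `joint_maximum` are hypothetical; think of `x` and `y` as the answers chosen for two sub-problems, and of the score as the plausibility of the combined interpretation.

```python
from itertools import product

# Invented plausibility scores for each pair of sub-problem answers (x, y).
SCORE = {
    (0, 0): 1, (0, 1): 3, (0, 2): 0,
    (1, 0): 3, (1, 1): 2, (1, 2): 0,
    (2, 0): 0, (2, 1): 0, (2, 2): 5,
}
VALUES = (0, 1, 2)


def independent_maximum(start):
    """Solve each sub-problem separately, holding the other at its starting
    value, then combine the two answers. The interdependence is ignored."""
    x0, y0 = start
    best_x = max(VALUES, key=lambda x: SCORE[(x, y0)])
    best_y = max(VALUES, key=lambda y: SCORE[(x0, y)])
    return (best_x, best_y)


def joint_maximum():
    """Examine every combination of answers: costly, but not misled."""
    return max(product(VALUES, VALUES), key=SCORE.get)


separate = independent_maximum((0, 0))
joint = joint_maximum()
print(separate, SCORE[separate])  # (1, 1) 2
print(joint, SCORE[joint])        # (2, 2) 5
```

Each sub-problem, taken alone, improves the score from 1 to 3, yet the combined answer scores only 2 and the best joint interpretation is missed altogether. The joint search inspects nine combinations where the independent strategy inspects six, and the gap widens rapidly as sub-problems are added, which is the convenience that makes the heuristic attractive in practice.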

Secondly, archaeologists are great classifiers. They define types of burial, types of pottery, types of sword, types of cemetery and so on. The most obvious (but not the only) reason for doing so is to cut the problem down to a manageable size - it is much easier to remember the description of a particular type of bracelet, say, than it is to remember the descriptions of a few score instances of that type. Unfortunately, it is rarely easy to decide which classification of one's material to adopt - hence the interest of many archaeologists in automatic and therefore objective methods of classification (compare Hodson, 1968). Further, and more immediately relevant here, working with types always means that a certain amount of detailed information has been discarded. Again, therefore, there is a trade-off between computational convenience and ultimate accuracy.

My experiments should shed a little light on the almost completely unstudied relationship between the effectiveness of these heuristics and the structure of the inference problem itself.

Originality and induction

It is natural to ask to what extent computer programs which generate explanatory hypotheses can ever be original. Is it not the case that whatever comes out must, at some earlier stage, have been put in?

In fact, there is no doubt that such a program can propose hypotheses which have never before been considered. True, any such hypothesis is implicit in the program as written, but to stress this is to imply that a needle in a haystack is as accessible as a needle in the palm of one's hand. Obviously such a program will fail if it is presented with empirical data which can be explained only by a hypothesis outside its repertoire. Progress means enlarging the range of hypotheses which the program can handle.

Where, finally, does that elusive process induction fit into all this? A common view is that induction is just what these programs are doing: "The key concept is that induction becomes a process of efficient selection from the domain of all possible structures" (Lederberg, Sutherland, Buchanan, and Feigenbaum, 1970, p. 402). I cannot quite accept this. I feel that the term is better applied only when one moves from data to candidate hypothesis by an explicit process of generalisation or abstraction. While there is no such mechanism in Heuristic DENDRAL, there is in Meta-DENDRAL and other experimental computer programs. This is not to suggest, of course, that we know all there is to know about induction. On the contrary, we are only just beginning to know anything worthwhile about it at all.

References

1. Amarel, S. (1971). Representations and modelling in problems of program formation, Machine Intelligence 6, ed. Meltzer, B., and Michie, D., Edinburgh: Edinburgh University Press, 411-466.

2. Binford, L. R. (1968). Archaeological perspectives, New Perspectives in Archaeology, ed. Binford, S. R., and Binford, L. R., Chicago: Aldine Publishing Co., 5-32.

3. Buchanan, B. G. (1966). Logics of scientific discovery, A.I. Memo 47, Computer Science Department, Stanford University.

4. Buchanan, B. G., Feigenbaum, E. A., and Lederberg, J. (1971). A heuristic programming study of theory formation in science, Second International Joint Conference on Artificial Intelligence, London, September 1971, proceedings published by the British Computer Society.

5. Buchanan, B. G., and Lederberg, J. (1971). The heuristic DENDRAL program for explaining empirical data, IFIP Congress, Ljubljana, August 1971, (proceedings to be published).

6. Fritz, J. M., and Plog, F. T. (1970). The nature of archaeological explanation, American Antiquity 35, 405-412.

7. Hodson, F. R. (1968). Archaeological classification at Chilton, Some Research Applications of the Computer, Atlas Computer Laboratory.

8. Kromer, K. (1959). Das Gräberfeld von Hallstatt, Firenze: Sansoni.

9. Lederberg, J., Sutherland, G. L., Buchanan, B. G., and Feigenbaum, E. A. (1970). A heuristic program for solving a scientific inference problem: summary of motivation and implementation, Theoretical Approaches to Non-Numerical Problem Solving, ed. Banerji, R., and Mesarovic, M. D., Berlin: Springer-Verlag, 401-409.

10. Meltzer, B. (1970). Generation of hypotheses and theories, Nature, 225, 972, (7 March).

11. Plotkin, G. D. (1971). A further note on inductive generalisation, Machine Intelligence 6, ed. Meltzer, B., and Michie, D., Edinburgh: Edinburgh University Press, 101-126.