Models of Organic Evolution in the Atlas Computer

P. O'Donald

1968

Like other scientists, biologists want to isolate the causes of the phenomena they observe. Like other scientists, they construct models to predict the effects of the supposed causes. But like most biological systems, organic evolution is a very complicated process and many causes interact at various levels. Until computers arrived, only very simple models could be studied.

Most higher organisms are diploids: the chromosomes, which carry the hereditary information as a linear sequence of genes, are in pairs. Thus an organism may contain two identical genes, each of which may be given the symbol A. Such an organism may be symbolised by AA: it is said to have the genotype AA. The gene A may make a particular enzyme to carry out a particular biochemical reaction. But the gene A will almost certainly exist in various slightly altered forms. The alterations or 'mutations' appear to arise at random. Thus there may be a mutation that might be symbolised by 'a'. As well as individuals with the genotype AA, there will then be others with the genotypes Aa and aa. Now by the Laws of Heredity, matings between the different possible genotypes produce offspring of a particular genotype with a known probability. Natural selection can be allowed for by giving offspring of a particular genotype a particular probability of survival. AA genotypes may survive with probability 1 - r, Aa with probability 1 - s and aa with probability 1 - t. In the terminology of population genetics, r, s and t are called the 'selection coefficients' of the genotypes AA, Aa and aa. Thus finite difference equations can be written for the changes in the proportions of AA, Aa and aa. Usually such equations are non-linear, but approximations can be used to give linear equations and hence a matrix of transition probabilities. Or special methods can be used to give solutions in certain cases (1, 2).
Solutions of the more important problems by these methods produced a revolution in our knowledge of evolution by natural selection. The theory was given a firm basis within the known Laws of Heredity. The now classical work of Fisher, Haldane and Wright (3, 4, 5) is sometimes called neo-Darwinism.
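
The finite difference equations for a single locus can be illustrated with a short sketch. It is not taken from the cited papers: it assumes random mating, discrete generations and a population large enough for chance effects to be ignored, and the function names are hypothetical. The selection coefficients r, s and t are used as defined above.

```python
def next_gene_frequency(p, r, s, t):
    """Return the frequency of gene A after one generation of selection.

    p is the frequency of A among the uniting gametes; r, s and t are the
    selection coefficients of AA, Aa and aa. Random mating and a very large
    population are assumptions of this sketch. At least one genotype must
    have a non-zero chance of survival.
    """
    q = 1.0 - p
    # Proportions of AA, Aa and aa among fertilized eggs under random mating.
    zygotes = (p * p, 2.0 * p * q, q * q)
    # Proportions left after each genotype survives with its own probability.
    surviving_AA = zygotes[0] * (1.0 - r)
    surviving_Aa = zygotes[1] * (1.0 - s)
    surviving_aa = zygotes[2] * (1.0 - t)
    total = surviving_AA + surviving_Aa + surviving_aa
    # Every AA survivor carries two A genes, every Aa survivor carries one.
    return (surviving_AA + 0.5 * surviving_Aa) / total


def iterate(p, r, s, t, generations):
    """Apply the one-generation recursion repeatedly."""
    for _ in range(generations):
        p = next_gene_frequency(p, r, s, t)
    return p
```

Under these assumptions, when the heterozygote survives best (for example r = 0.2, s = 0, t = 0.1) repeated iteration settles near p = t / (r + t) = 1/3, a simple case of a gene being held in the population by a balance of selective forces.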

Even so, Fisher, Haldane and Wright were only able to work with very restricted models. For many important evolutionary problems, models must be constructed that allow for a pair of genes at one particular site or locus within a pair of chromosomes to interact with a pair of genes either at another locus in the same pair of chromosomes or at a locus in a different pair of chromosomes. The genes A or a may be at one of the loci and the genes B or b may be at the other locus.

Suppose the genes B and b give rise to the genotypes BB, Bb and bb, which determine to some extent how an individual chooses its mate. The individuals who are BB or Bb may prefer to mate with those who are AA. Such mating preferences are common in the higher animals and can often be observed when the genotypes AA, Aa and aa produce striking differences in bodily form or colour pattern, like the different colour forms of moths and butterflies. Mating behaviour is often a highly developed ritual in insects and the genotypes BB, Bb and bb may cause subtle alterations in the ritual affecting the chance of mating. Thus sexual selection may take place. In a model of this process there are nine variables because there are nine genotypes - AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb and aabb - and when sexual selection and natural selection have been built into the model in a general way, the finite difference equations are very complicated and there are no approximations to lead to a useful algebraic solution (6). In nature the population will necessarily be finite in size, comprising N individuals. By generating a random number for each individual, the proportions of the genotypes can be chosen as a multinomial sample of size N from a population whose expected proportions are given by the finite difference equations. This model is possible only in a fast computer like the Atlas. It revealed some interesting ways in which a gene can be maintained in a population by a balance of the forces of natural selection and sexual selection (6).
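
The sampling step can be sketched as follows. This is an illustration only, not the Atlas program: the expected proportions would come from the finite difference equations, which are not reproduced here, and the function name and the use of Python's standard random module are assumptions.

```python
import random
from collections import Counter

# The nine genotypes of the two-locus model, as listed in the text.
GENOTYPES = ["AABB", "AABb", "AAbb", "AaBB", "AaBb", "Aabb", "aaBB", "aaBb", "aabb"]


def sample_population(expected, n, rng):
    """Draw n individuals and return the realised genotype proportions.

    expected maps each genotype to its expected proportion; one random
    draw is made per individual, which gives a multinomial sample of size n.
    """
    names = list(expected)
    weights = [expected[g] for g in names]
    drawn = rng.choices(names, weights=weights, k=n)
    counts = Counter(drawn)
    # Counter returns 0 for genotypes that were never drawn.
    return {g: counts[g] / n for g in names}


if __name__ == "__main__":
    rng = random.Random(1968)
    # Hypothetical expected proportions: all nine genotypes equally common.
    expected = {g: 1.0 / len(GENOTYPES) for g in GENOTYPES}
    print(sample_population(expected, 500, rng))
```

The realised proportions differ from the expected ones by chance, and the differences shrink as N grows; this is how a finite population size enters a model of this kind.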

But even the more complicated models, like this one of sexual selection, may be much too simple to be realistic. And if we were to add only one more pair of genes, C and c at another chromosomal locus, it would become a work of many months even to write out the equations, for there would then be 27 variables. Another approach must therefore be used. It is to simulate the population of organisms in the computer directly. The words of the computer are used to represent the animals making up the population. The pattern of binary digits in a word represents the genes in the organism. The successive digits of the word, taken in pairs, are used to represent the two alternative genes at a particular locus in a chromosome: the digit 1 can be the gene A and the digit 0 can be a. The first digit of a pair can be the gene on one chromosome of the pair of chromosomes and the second digit can be the gene on the other chromosome. Frazer (7) was the first to simulate genetical populations in this way. Very detailed models can be set up with alternative genes at loci either on the same pair of chromosomes or on different pairs. But these models have a serious disadvantage: the computer can do only one thing at a time. The larger the population and the more individuals there are for mating, the longer the program takes to run. A very fast computer is thus needed and I have written a set of subroutines in ASP, the assembly language of Atlas, to carry out the basic operations on the computer words regarded as organisms. If there are N individuals in the population, an array of N words represents the population. Two such arrays are needed - one to store the individuals of generation n and the other to store those of generation n + 1. Suppose mating is at random. Two words are picked at random from the array of generation n using a random number generator.
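
The representation of an organism as a word can be sketched in a few lines. The bit order used here (the first locus in the lowest pair of digits) and the function names are assumptions made for illustration; the text does not say how the ASP routines lay out the digits.

```python
def encode(genotype):
    """Pack a genotype such as ["Aa", "bb", "CC"] into an integer.

    Each locus takes two binary digits, one per chromosome of the pair.
    An upper-case letter is stored as 1 and a lower-case letter as 0.
    """
    word = 0
    for locus, pair in enumerate(genotype):
        for chrom, gene in enumerate(pair):
            if gene.isupper():
                word |= 1 << (2 * locus + chrom)
    return word


def decode(word, letters):
    """Unpack a word into genotype strings; letters names the loci, e.g. "ABC"."""
    genotype = []
    for locus, letter in enumerate(letters):
        pair = ""
        for chrom in range(2):
            digit = (word >> (2 * locus + chrom)) & 1
            pair += letter.upper() if digit else letter.lower()
        genotype.append(pair)
    return genotype


# encode(["Aa", "bb", "CC"]) gives 49 (binary 110001) under this bit order,
# and decode(49, "ABC") gives back ["Aa", "bb", "CC"].
```

With this layout a population of N individuals is simply a list of N integers, and picking a mating pair is a matter of drawing two list positions at random.
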
The subroutine MENDEL then picks out the egg or sperm from each of these individuals according to the Laws of Heredity. The pairs of chromosomes are assorted at random so that one chromosome of each pair is placed in the egg or sperm: the appropriate digits of every pair of digits representing the alternative genes at particular loci in the chromosomes are picked out and placed in an empty word representing the egg or sperm. Breakage and rejoining between the two chromosomes of a pair - the process of 'crossing over' in genetics - can also take place. The egg and sperm thus formed are then united to form a fertilized egg in a word in generation n + 1. The subroutine FISHER can then be used to calculate what quantitative effect the genotype of an individual will have on some given character. For example, a series of genes may affect a character like intelligence or height and so may cause variations in these characters that indirectly affect an individual's chance of survival. The subroutine SELECT then determines whether the individual will live or die according to the assigned or calculated chances of survival of that individual's genotype. Thus suppose that, because of his genes, an individual should have a mean chance of survival of 80 per cent. A random number is generated in the interval between 0 and 1. If the random number is greater than 0.8, the individual dies. If it is 0.8 or less, he lives. There are other subroutines to simulate special processes and to input the genotypes and output the results. All the routines are called in a main routine written in Fortran. Thus for each genetical problem to be studied by simulation a special Fortran routine is written. It is usually in the form of a loop which programs the changes from one generation to the next and in which the ASP routines are called. Very general models can be simulated by this system. It is limited because only the low half of an Atlas word is used for the genotype.
But this limitation makes the programming much easier and the program faster to run because the half-word can be read directly into a B-register, where the logical operations of picking out digits and shifting them take place. Thus if successive pairs of digits represent the alternative genes at particular loci in the pairs of chromosomes, only 12 loci can be simulated because a half-word contains only 24 bits. However, in systems of genes affecting quantitative characters, the appropriate use of the subroutine FISHER allows for any order of interaction between the loci up to the 12th order. The loci can also be sited at any distance along a pair of chromosomes. And some loci can be on one pair and other loci on other pairs of chromosomes. It is also possible to allow for the tendency of individuals who resemble each other to choose each other as mates. For example, in man, tall men tend to marry tall women, intelligent men tend to marry intelligent women, and so on: there is a correlation between mates. If so, then the subroutine FISHER can be used to calculate the degree of similarity of a pair of individuals and hence the probability of their mating.
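
The generation loop described above can be sketched as follows. This is a hedged illustration in Python, not a transcription of the ASP or Fortran routines: the names mendel and select echo the subroutines in the text, but their arguments, the helper unite, the bit order (first locus in the lowest pair of digits) and the rule of drawing matings until N offspring survive are all assumptions. Crossing over is modelled by an assumed chance of switching chromosome between neighbouring loci; a chance of 0.5 behaves like loci on different pairs of chromosomes.

```python
import random


def mendel(word, recomb, rng):
    """Form an egg or sperm: one digit is taken from each pair of digits.

    recomb[i] is the assumed chance of a switch of chromosome between
    locus i and locus i + 1, so there are len(recomb) + 1 loci.
    """
    chrom = rng.randrange(2)  # which chromosome of the pair is read first
    gamete = 0
    for locus in range(len(recomb) + 1):
        if locus > 0 and rng.random() < recomb[locus - 1]:
            chrom ^= 1  # crossing over: continue on the other chromosome
        digit = (word >> (2 * locus + chrom)) & 1
        gamete |= digit << locus
    return gamete


def unite(egg, sperm, n_loci):
    """Combine egg and sperm into a fertilized egg with two digits per locus."""
    zygote = 0
    for locus in range(n_loci):
        zygote |= ((egg >> locus) & 1) << (2 * locus)
        zygote |= ((sperm >> locus) & 1) << (2 * locus + 1)
    return zygote


def select(survival, rng):
    """The individual lives if the random number is survival or less."""
    return rng.random() <= survival


def next_generation(parents, recomb, survival_of, rng):
    """Fill the array for generation n + 1 by random mating and selection.

    survival_of maps a word to its chance of survival and must be greater
    than zero for some attainable genotype, or the loop never ends.
    """
    n_loci = len(recomb) + 1
    offspring = []
    while len(offspring) < len(parents):
        mother, father = rng.choice(parents), rng.choice(parents)
        child = unite(mendel(mother, recomb, rng),
                      mendel(father, recomb, rng), n_loci)
        if select(survival_of(child), rng):
            offspring.append(child)
    return offspring


if __name__ == "__main__":
    rng = random.Random(0)
    population = [0b0101] * 100  # two loci, every individual AaBb in this layout
    # Hypothetical rule: survival rises with the number of 1 digits in the word.
    chance = lambda w: 0.6 + 0.1 * bin(w).count("1")
    for _ in range(20):
        population = next_generation(population, [0.5], chance, rng)
    print(sum(bin(w).count("1") for w in population) / len(population))
```

The survival rule in the usage example stands in for a calculated character value; it is a simple additive count and does not attempt the higher-order interactions between loci that the text attributes to FISHER.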

An outstanding problem of evolutionary theory is how the genetical variations of a population are maintained. For example, why do some individuals in human populations have red hair? Red hair is caused by a gene that has its effect only if it is in both of the chromosomes of the pair. Thus if a is the gene for red hair, only the aa individuals are red-haired; AA and Aa are normal. The gene a is said to be recessive: it works only in a double dose. Recessiveness can be caused by the genes at other loci interacting with the Aa individuals to repress the effect of a when it is in company with A. Models of this evolution (8, 9, 10) have shown how a gene that was at first deleterious and eliminated by natural selection could become advantageous by the natural selection of other genes that reduce its deleterious effects. In fact the models show how a gene may start disadvantageous when it first appears as a mutation and may end up advantageous, either spreading through a population to replace the original gene, or coming to a stable equilibrium by a balance of the forces of natural selection, as the gene for red hair has presumably done. Thus if the selection coefficient of a genotype is the mean probability of death of individuals possessing it, then natural selection will at the same time be acting to reduce the selection coefficient. This may explain why evolution continues although almost all mutations are deleterious at first: selection will make them advantageous in the long run if they give rise to new characters with potential advantages.

This is just one of the models of evolution I have simulated in Atlas. It would be impossible of course to describe many of the results of the simulations without explaining the genetics of the processes in great detail.

References

1. Fisher, R. A. Proc. Roy. Soc. Edin. 50, 205-220 (1930).

2. Haldane, J. B. S. Proc. Camb. Phil. Soc. 23, 19-41 (1924).

3. Fisher, R. A. The Genetical Theory of Natural Selection. Clarendon Press, Oxford. (1930).

4. Haldane, J. B. S. The Causes of Evolution. Longman Green (1932).

5. Wright, S. Genetics 16, 97-159 (1931).

6. O'Donald, P. Heredity 22, 499-518 (1967).

7. Frazer, A. Aust. J. Biol. Sci. 10, 484-491 (1957).

8. O'Donald, P. Proc. Roy. Soc. Lond. B. 168, 216-228 (1967).

9. O'Donald, P. Genetics 56, 339-404 (1967).

10. O'Donald, P. Genetics, in press (1968).