A Computer Movie Simulating Urban Growth in the Detroit Region

Waldo R. Tobler, University of Michigan

1970

Economic Geography

In one classification of models the simulation to be described would be considered a demographic model whose primary objectives are instructional. The model developed here may be used for forecasting, but was not constructed for this specific purpose, and it is a demographic model since it describes only population growth, with particular emphasis on the geographical distribution of this growth.

As a premise, I make the assumption that everything is related to everything else. Superficially considered, this would suggest a model of infinite complexity, and a corollary inference often made is that social systems are difficult to model because they contain many variables. Many people, however, confuse the number of variables with the degree of complexity. Because of closure, models with infinite numbers of variables are in fact sometimes more tractable than models with a finite but large number of variables [27]. My point here is that the utmost effort must be exercised to avoid writing a complicated model. It is very difficult to write a simple model, but this, after all, is one of the objectives. If one plots a graph with increasing complexity on the abscissa and increasing effectiveness on the ordinate, it is well known that science is only asymptotic to one hundred percent effectiveness. No scientist claims otherwise. But the rate at which this effectiveness is approached is extremely important, ceteris paribus. In other words, the objective is high success with a simple model. Statistical procedures that order the eigenvalues are popular for just this reason: the largest few often account for most of the variation. That a process appears complicated is also no reason to assume that it is the result of complicated rules; examples are the game of chess, the motion of the planets before Copernicus, evolution before Darwin and the double helix, geology before Hutton, mechanics before Newton, geography before Christaller, and so on. The plausibility of models also varies, but plausibility is known to be an incomplete guide to the scientific usefulness of a model. The model I describe, for example, recognizes that people are born, migrate, and die. It does not explain why people are born, migrate, and die.
Some would insist that I should incorporate more behavioral notions, but then it would be necessary to discuss the psychology of urban growth; to do this properly requires a treatise on the biochemistry of perception, which in turn requires a discussion of the physics of ion interchange, and so on. My attitude, rather, is that since I have not explained birth, migration, or death, the model might apply to any phenomenon that has these characteristics: people, plants, animals, machines (which are built, moved, and destroyed), or ideas. The level of generality seems inversely related to the specificity of the model. A model of urban growth should apply to all 92,200 cities [9, p. 81] (not just to one city), now and in the future, and to other things that grow. These are rather ambitious aims. Conversely, the model attempts to relate population totals only to prior populations, and neglects employment opportunities, topography, transportation, and other distinctions between site qualities. Consequently the only difference between places in the model is their population density; other demographic differences are ignored. Similarly, the model relates population growth only to the population in the immediately preceding time period. Since, by assumption, everything is related to everything else, such a neglect of history may prove disastrous. To include all of history, however, is known to require integral equations of the Volterra type [34], and these complicate the presentation. We may also determine empirically whether a neglect of history has serious consequences, at least in the short run. In summary, the many simplifications of the model are acknowledged, and are regarded as advantages, particularly for pedagogic purposes.

Conceptually, I have been influenced by Borchert's model of the Twin Cities region [2]. This was later applied to Detroit by Deskins, and I have used his data [8]. As formulated by Borchert and Deskins, the model is in graphical form and suggests that the lines of growth coincide with extrapolations, modified by local conditions, of the orthogonal trajectories to the level curves of population density. The difficult step is to estimate the amount of growth along these trajectories. Presumably this is proportional to the population pressure, or the gradient of the population density [23].
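The orthogonal trajectories to the level curves of density are the lines of steepest descent, so both their direction and the "population pressure" come from the density gradient. A minimal sketch, assuming the density is held as a grid of equally spaced cells; the function names and the toy linear density surface are hypothetical, chosen only for illustration:

```python
import math

def density_gradient(P, i, j, h=1.0):
    """Central-difference estimate of (dP/dx, dP/dy) at interior cell (i, j).
    Rows index x, columns index y, and h is the cell spacing."""
    dPdx = (P[i + 1][j] - P[i - 1][j]) / (2 * h)
    dPdy = (P[i][j + 1] - P[i][j - 1]) / (2 * h)
    return dPdx, dPdy

def slope_and_direction(P, i, j, h=1.0):
    """Gradient magnitude (the population slope) and the downhill unit vector,
    i.e. the local direction of an orthogonal trajectory leading away from the densest area."""
    gx, gy = density_gradient(P, i, j, h)
    mag = math.hypot(gx, gy)
    if mag == 0:
        return 0.0, (0.0, 0.0)
    return mag, (-gx / mag, -gy / mag)

# Hypothetical density falling off linearly from a centre at cell (0, 0).
P = [[100 - 10 * (x + y) for y in range(5)] for x in range(5)]
mag, direction = slope_and_direction(P, 2, 2)
print(round(mag, 3), [round(c, 3) for c in direction])  # 14.142 [0.707, 0.707]
```

Growth would then be laid off along `direction` in amounts proportional to `mag`, which is the gradient model written out in the equations that follow.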

Following Pollack [26], specific equations may now be postulated, letting $\partial P/\partial t$ denote population growth at any location:

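$\partial P/\partial t = k$, constant regional growth, or

$\partial P/\partial t = kP$, proportional growth, or

$\partial P/\partial t = k(1 - \alpha P)P$, logistic growth, or

$\partial P/\partial t = k\left[(\partial P/\partial x)^2 + (\partial P/\partial y)^2\right]^{1/2}$, growth is proportional to the magnitude of the population gradient, or

$\partial P/\partial t = k\left(\partial^2 P/\partial x^2 + \partial^2 P/\partial y^2\right)$, growth is proportional to the rate of change of the population gradient, or

$\partial^2 P/\partial t^2 = k\left(\partial^2 P/\partial x^2 + \partial^2 P/\partial y^2\right)$, the acceleration of growth is proportional to the population curvature, and so on.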

Each of these equations could now be examined in some detail, or converted to finite difference form for empirical estimation purposes, but I prefer to generalize in a different direction.
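For readers who want to see what the finite difference form looks like, here is a minimal sketch of explicit (forward Euler) steps for two of the equations above: the logistic equation cell by cell, and the Laplacian equation with a five-point stencil. The wrap-around edges, the grid, and the values of k, alpha, and dt are assumptions made for illustration, not estimates:

```python
def laplacian(P, i, j, h=1.0):
    """Five-point finite-difference Laplacian, with periodic (wrap-around) edges."""
    n, m = len(P), len(P[0])
    return (P[(i + 1) % n][j] + P[(i - 1) % n][j]
            + P[i][(j + 1) % m] + P[i][(j - 1) % m] - 4 * P[i][j]) / h ** 2

def step_diffusion(P, k, dt, h=1.0):
    """One explicit Euler step of dP/dt = k (d2P/dx2 + d2P/dy2).
    The scheme is stable only when k * dt / h**2 <= 1/4."""
    return [[P[i][j] + dt * k * laplacian(P, i, j, h) for j in range(len(P[0]))]
            for i in range(len(P))]

def step_logistic(P, k, alpha, dt):
    """One explicit Euler step of dP/dt = k (1 - alpha P) P, applied to each cell."""
    return [[p + dt * k * (1 - alpha * p) * p for p in row] for row in P]

# Hypothetical 5x5 region with everyone initially in the central cell.
P = [[0.0] * 5 for _ in range(5)]
P[2][2] = 100.0
for _ in range(10):
    P = step_diffusion(P, k=0.2, dt=1.0)
total = sum(map(sum, P))
print(round(total, 6))  # 100.0, since this equation only redistributes people
```

The Laplacian form moves population without creating any, so with wrap-around edges the regional total is conserved; growth has to come from terms like the logistic one.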

The simulation of urban growth raises questions of geographical syntax. As an example, recall that many predictive models are of the form

          C=BA 

where A is an n by 1 vector of known observations, B is an m by n transformation matrix of coefficients or transition probabilities, and C is the m by 1 vector to be predicted. This scheme seems inadequate as a geographical calculus. The geographical situation is better represented, in a simplified special case, as

          D=NGE 

where G and D are now m by n matrices, isomorphic to maps of the geographical landscape [32], and N and E are coefficient matrices (m by m and n by n, respectively) representing North-South and East-West effects. The matrix D could of course be converted into a long column vector (mn by 1) by partitioning it into columns and stacking these one above the other, but this destroys the isomorphism to the geographical situation. Since the purpose of computing is insight, not numbers [13], I aim for a simple structure. Using geographical state matrices seems more natural than using state vectors.
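The equivalence between the two forms can be checked directly: stacking columns gives vec(NGE) = (Eᵀ ⊗ N) vec(G), a standard Kronecker-product identity. A minimal sketch in plain Python, with hypothetical small matrices standing in for a map and its North-South and East-West operators:

```python
def matmul(A, B):
    """Matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def vec(M):
    """Stack the columns of M one above the other (the long mn-by-1 vector)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def kron(A, B):
    """Kronecker product A (x) B."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

# Hypothetical 3x4 map G, a 3x3 North-South operator N, and a 4x4 East-West operator E.
G = [[1, 2, 0, 1], [0, 3, 1, 2], [4, 0, 2, 1]]
N = [[0.8, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.8]]
E = [[0.9, 0.1, 0.0, 0.0], [0.1, 0.8, 0.1, 0.0],
     [0.0, 0.1, 0.8, 0.1], [0.0, 0.0, 0.1, 0.9]]

D = matmul(matmul(N, G), E)   # still map-shaped: 3 rows by 4 columns
M = kron(transpose(E), N)     # the same operator acting on the 12-long vector
d_long = [row[0] for row in matmul(M, [[g] for g in vec(G)])]
print(max(abs(a - b) for a, b in zip(d_long, vec(D))) < 1e-12)  # True
```

Both routes give the same numbers, but the vector form needs a 12 by 12 operator (144 entries) where the map form uses 9 + 16 = 25, and its result no longer looks like a map.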

To some extent, attempts to simulate urban growth are also related to the problem of comparing geographical maps, a question which occurs frequently in geography [30]. Let me clarify this analogy. Suppose I have a map showing the 1930 distribution of population in the Detroit region, and a map of the 1940 distribution. I would like to measure the degree of similarity of these two maps, and for this some type of correlation coefficient is needed. Such a measure is certainly necessary to evaluate an urban growth model, which can be considered a means of predicting a map of population distribution. To evaluate the coefficient of correlation properly, I should have some notion of the probability that two randomly selected maps are similar. This requires some information concerning the distribution of actual population maps over the set of all possible population maps. Suppose that the population data are assembled by one-degree quadrilaterals of latitude and longitude, of which there are approximately 360 by 180 on a sphere. If only land areas are considered, say $90 \times 180 \approx 1.6 \times 10^4$ cells. If a maximum population density of 5000 persons per square mile is allowed, each quadrilateral can contain from zero to roughly $17.5 \times 10^6$ people. The number of possible population maps is then the number of states raised to the number of cells, that is, $(17.5 \times 10^6)^{1.6 \times 10^4} \approx 10^{1.16 \times 10^5}$. Not all of these are equally likely, and a prediction much better than random can be made by asserting that there will be no change from the present. This suggests that, from an information-theoretic point of view, a prediction does not contain a great deal of information! This unhappy conclusion is avoided by recognizing that geographical predictions must be discounted for the effect of persistence.
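The count of possible maps is far too large to evaluate directly, but its base-10 logarithm is easy to compute. The sketch below uses only figures from the text. It takes the exact 90 × 180 = 16,200 land cells rather than the rounded 1.6 × 10⁴, and it counts states per cell as every integer from zero to 17.5 × 10⁶:

```python
import math

cells = 90 * 180                   # land-only one-degree quadrilaterals
states_per_cell = 17_500_000 + 1   # any count from 0 to 17.5 x 10^6 inclusive

# Number of maps = states_per_cell ** cells; keep only its base-10 logarithm.
log10_maps = cells * math.log10(states_per_cell)
print(f"about 10^{log10_maps:,.0f} possible population maps")
```

The exponent comes out near 1.2 × 10⁵. The total is therefore a number with over a hundred thousand digits, not an exponent of a few dozen.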

The usual measure of association is the Pearsonian correlation coefficient. This not only serves as a measure of similarity but also provides, via the linear regression equation, a means of prediction. Most discussions of methods of comparing maps overlook this important feature. It suggests predicting the 1940 population of a cell as a linear function of the 1930 population of that cell, that is, $P^{1940}_{ij} = A + B\,P^{1930}_{ij}$. As a model, this has advantages and disadvantages. For example, discrepancies between the model and the actual situation might be used as a measure of the perceived suitability of a site for occupation. More cogently, a major disadvantage is that it ignores the premise that everything is related to everything else. The geographical interpretation of this premise should be that population growth at place A depends not only on the previous population at place A but also on the population of all other places. More concretely, population growth in Ann Arbor from 1930 to 1940 depends not only on the 1930 population of Ann Arbor, but also on the 1930 populations of Vancouver, Singapore, Cape Town, Berlin, and so on. Stated as a giant multiple regression, the 1940 population of Ann Arbor depends on the 1930 population of everywhere else; that is, it is a function of about $1.6 \times 10^4$ variables if population data are given by one-degree quadrilaterals. The meteorologist has a similar problem when attempting to predict the weather, and solves it in the following ingenious manner [10, 11, 14, 15, 25 pp. 233-56; 36]. The worldwide (or hemispheric) distribution of the pertinent weather elements is summarized by an approximating equation. The coefficients of this equation are then used as surrogate variables, much reduced in number, representing the actual distribution. Geographers have also recently used such trend equations [6], but not in this interesting manner.
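The dual use of the correlation coefficient, as a similarity measure and as a predictor, can be sketched as follows. Both maps are flattened into lists of cell values, which is exactly why this model cannot "see" neighboring cells. The cell populations below are hypothetical, not Deskins's data:

```python
import math

def pearson_and_regression(p_before, p_after):
    """Return (r, A, B): Pearson's r and the least-squares line p_after = A + B * p_before."""
    n = len(p_before)
    mx = sum(p_before) / n
    my = sum(p_after) / n
    sxx = sum((x - mx) ** 2 for x in p_before)
    syy = sum((y - my) ** 2 for y in p_after)
    sxy = sum((x - mx) * (y - my) for x, y in zip(p_before, p_after))
    r = sxy / math.sqrt(sxx * syy)
    B = sxy / sxx
    A = my - B * mx
    return r, A, B

# Hypothetical populations of six cells, same cell order in both lists.
p1930 = [120, 4500, 800, 60, 2300, 15]
p1940 = [150, 4700, 1100, 90, 2600, 40]
r, A, B = pearson_and_regression(p1930, p1940)
print(f"r = {r:.3f}; P1940 = {A:.1f} + {B:.3f} * P1930")
```

The residuals `p1940[k] - (A + B * p1930[k])` are the discrepancies that, as noted above, might be read as a measure of the perceived suitability of each site.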
The global distribution of population could now be approximated by an equation with a modest number of coefficients. Alternatively, the world population potential [29] could serve as a single surrogate for the $1.6 \times 10^4$ variables. Instead of using this approach, I invoke the first law of geography: everything is related to everything else, but near things are more related than distant things. The specific model used is thus very parochial, and ignores most of the world.
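The population potential of Stewart and Warntz [29] compresses the whole distribution into one number per place by summing every population divided by its distance, so that near places count for more than distant ones. A minimal sketch on a grid. Two choices here are hypothetical conventions, not taken from the text: the cell spacing `h`, and the `self_distance` used for a cell's own population to avoid dividing by zero.

```python
import math

def population_potential(P, h=1.0, self_distance=0.5):
    """V(i, j) = sum over all cells (k, l) of P[k][l] / d, where d is the distance
    between cell centres. A cell's own population is divided by self_distance * h."""
    rows, cols = len(P), len(P[0])
    V = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(rows):
                for l in range(cols):
                    d = h * math.hypot(i - k, j - l)
                    V[i][j] += P[k][l] / (d if d > 0 else self_distance * h)
    return V

# Hypothetical 3x3 region with a dense central cell.
P = [[0, 50, 0], [20, 400, 30], [0, 10, 0]]
for row in population_potential(P):
    print([round(v) for v in row])
```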

There is merit in considering urban growth from yet another point of view. Think of it as a linear input-output system; that is, the 1930 population distribution serves as input to a black box, the output of which is the 1940 population distribution. Two points of view can be taken: (a) given the inputs and outputs, calculate the characteristics of the black box, i.e., infer the process; or (b) design the system to achieve a specific output. The latter is what an engineer does when he builds a radio, or what some urban planners hope to do. The present intent is to deduce some characteristics of the process.

A convenient method of studying linear, origin-invariant black boxes is by means of their response to a unit impulse.

In the present instance the input and output are both two-dimensional distributions, and it is assumed that the system consists of a linear, positionally invariant, local operator. Such processes are less familiar to engineers but occur in the study of optical systems [20, 17 pp. 278-281, 28]. The equivalent of the unit impulse is the unit inhabitant. Let us see what happens to him in a decade:

  1. he has 0.3 children,
  2. 0.2 of him dies,
  3. 0.05 of him moves to California,
  4. 0.4 of him moves to the suburbs,
  5. 0.6 of him does nothing

These data are fictitious, but observe that they include birth, death, and migration. The net result is 1.15 inhabitants, geographically distributed somewhat more widely than originally. This, then, is the final model presented.
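The unit-impulse idea can be sketched numerically. A single inhabitant is fed through a linear, positionally invariant operator, and the output map then displays the operator itself. Its total is the decade growth factor and its shape is the spread pattern. The weights are invented, chosen only so that they total 1.15 like the fictitious example above. The operator is written as P′(x, y) = ΣΣ W(u, v) P(x + u, y + v), which is an assumption about the layout of the weights.

```python
def apply_operator(W, P):
    """P'(x, y) = sum over offsets (u, v) of W[(u, v)] * P(x + u, y + v).
    Cells beyond the edge of the grid are treated as empty."""
    rows, cols = len(P), len(P[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            for (u, v), w in W.items():
                if 0 <= x + u < rows and 0 <= y + v < cols:
                    out[x][y] += w * P[x + u][y + v]
    return out

# Hypothetical decade weights for the unit inhabitant (not estimated); they sum to 1.15.
W = {(0, 0): 0.75, (1, 0): 0.15, (-1, 0): 0.05, (0, 1): 0.1, (0, -1): 0.1}

# The unit inhabitant: one person in the centre of an otherwise empty 5x5 region.
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
response = apply_operator(W, impulse)

print(round(sum(map(sum, response)), 2))  # 1.15
# The response is the weight pattern reflected through the origin:
# response[2 - u][2 - v] equals W[(u, v)].
```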

The population of a cell, 1.5 miles on a side, is estimated as a linear function of the population of the same and neighboring cells in the preceding time period, i.e., of where the unit inhabitant came from rather than where he went. This result can be visualized in several equivalent fashions. Consider the following Gedanken experiment. Randomly sample the population of the region under study and plot a map showing the location of each sampled individual in 1930, connected by a directed line to that individual's location in 1940. Now translate each line to a common origin, thus creating a migration rose. The end points of the migration vectors constitute a probability density surface. A comparable result could be achieved by randomly sampling cells, studying the behavior of all the inhabitants of these cells, and averaging over all of the sampled cells. The net result should not differ appreciably from the present, more indirect, inferential procedure of comparing maps. Mathematically, the distribution in, say, 1930 can be considered to be described by $P(x,y)$, that in 1940 by $P'(x,y)$, and the spread of the unit inhabitant by $W(u,v)$. The assumption is that each individual in $P(x,y)$ undergoes an identical spreading, so that the contribution to the point $(x,y)$ is $W(u,v)P(x+u,y+v)$, and the final result is the sum of the individual effects:

$$P'(x,y) = \iint W(u,v)\,P(x+u,y+v)\,du\,dv.$$

Now let $F$ denote the two-dimensional Fourier transform, so that $F(W) = \iint W(u,v)\exp[2\pi i(au+bv)]\,du\,dv$. Because the integral above convolves $P$ with the spread function reflected through the origin, $W(-u,-v)$, the two-dimensional convolution theorem gives $F(P') = \overline{F(W)}\,F(P)$, where the bar denotes the complex conjugate ($W$ being real); when $W$ is symmetric this reduces to $F(P') = F(W)F(P)$. Thus, by converting to the frequency domain, there exists a convenient procedure for calculating the spread function from the two maps. Specific computational details, and application to other geographical situations, are given in an earlier paper [33]. The similarity to Hagerstrand's Mean Information Fields [12], and to an input-output study of approximately 1000 regions [18], should be apparent. A stochastic model can be written along similar lines [1].
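The frequency-domain route can be checked numerically on a toy grid. This minimal sketch uses hypothetical maps and a hypothetical, deliberately asymmetric spread function. It assumes periodic (wrap-around) edges so that the discrete Fourier transform applies exactly. It builds P′ from P with the spreading sum P′(x, y) = ΣΣ W(u, v) P(x + u, y + v), confirms the frequency-domain identity, and then recovers W from the two maps alone. For this cross-correlation form with a real W, the discrete identity is F(P′) = conj(F(W)) · F(P), which becomes the plain product when W is symmetric. The naive O(n⁴) transform is used for transparency; a real application would use an FFT library.

```python
import cmath
import random

def dft2(grid, inverse=False):
    """Naive 2-D discrete Fourier transform of an n-by-n grid."""
    n = len(grid)
    sign = 1 if inverse else -1
    out = [[0j] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            s = 0j
            for x in range(n):
                for y in range(n):
                    s += grid[x][y] * cmath.exp(sign * 2j * cmath.pi * (a * x + b * y) / n)
            out[a][b] = s / (n * n) if inverse else s
    return out

def spread(W, P):
    """P'(x, y) = sum over (u, v) of W[u][v] * P(x + u, y + v), indices wrapping around.
    Offset (u, v) is stored at W[u % n][v % n], so W[n - 1] holds offset -1."""
    n = len(P)
    return [[sum(W[u][v] * P[(x + u) % n][(y + v) % n] for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]

n = 6
rng = random.Random(0)
# Hypothetical 1930 map: a dense central cell at (0, 0) and sparse surroundings.
P1930 = [[rng.random() for _ in range(n)] for _ in range(n)]
P1930[0][0] = 50.0  # dominates the rest, so no Fourier coefficient of P1930 is zero
# Hypothetical asymmetric spread function over offsets -1..1.
W = [[0.0] * n for _ in range(n)]
for (u, v), w in {(0, 0): 0.8, (1, 0): 0.1, (-1, 0): 0.05, (0, 1): 0.1, (0, -1): 0.05}.items():
    W[u % n][v % n] = w
P1940 = spread(W, P1930)

FP, FP2, FW = dft2(P1930), dft2(P1940), dft2(W)
cells = [(a, b) for a in range(n) for b in range(n)]
# Frequency-domain identity for a real W: F(P') = conj(F(W)) * F(P), coefficient by coefficient.
err_theorem = max(abs(FP2[a][b] - FW[a][b].conjugate() * FP[a][b]) for a, b in cells)
# Inverse problem: recover W from the two maps alone by division in the frequency domain.
FW_est = [[(FP2[a][b] / FP[a][b]).conjugate() for b in range(n)] for a in range(n)]
W_est = dft2(FW_est, inverse=True)
err_recover = max(abs(W_est[u][v] - W[u][v]) for u, v in cells)
print(err_theorem < 1e-9, err_recover < 1e-9)  # True True
```

With real census maps the division is ill-conditioned wherever F(P) is small, so some smoothing or restriction of W to a small neighborhood would be needed in practice.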

For the initial computer movie [19] the equations used are $P^{1930+\Delta t}_{ij} = \sum_{p=-2}^{2}\sum_{q=-2}^{2} W_{pq}\,P^{1930}_{i+p,\,j+q}$, with $W_{pq} = A_{pq} + B_{pq}\,\Delta t$, where $\Delta t$ is measured in years from 1930. $A_{pq}$ and $B_{pq}$ were obtained from the coefficients given in the earlier paper [33] by weighting the 1950/60 coefficients twice as much as the 1930/40 and 1940/50 values. An additional movie, giving equal weight to all of the time periods by using $W_{pq} = A_{pq} + B_{pq}\,\Delta t + C_{pq}(\Delta t)^2$, may be more realistic. Both of these models describe time-variant systems [3]. The movies simulate from 1910 to 2000 in time steps of $\Delta t = 0.5$ and $\Delta t = 0.05$ years. A time step of one frame per month would appear to be the most appropriate speed, assuming viewing at 16 frames per second. An interesting question is whether the same coefficients could be used for some other urban region of the United States, since the exogenous conditions are obviously relatively constant.
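The time-variant scheme can be sketched as a frame generator. Each frame is computed directly from the 1930 map with weights that vary linearly in Δt, as in the text's formula. The coefficient tables `A` and `B` below are hypothetical. `A` is taken as a pure "no change" weight, so the 1930 frame reproduces the 1930 map. The grid is a toy, and cells beyond its edge are treated as empty.

```python
def filter5x5(W, P):
    """P_new[i][j] = sum over (p, q) in W of W[(p, q)] * P[i + p][j + q],
    with p, q in -2..2; cells beyond the edge count as empty."""
    rows, cols = len(P), len(P[0])
    return [[sum(w * P[i + p][j + q] for (p, q), w in W.items()
                 if 0 <= i + p < rows and 0 <= j + q < cols)
             for j in range(cols)] for i in range(rows)]

def weights_at(A, B, dt):
    """Time-variant weights W_pq = A_pq + B_pq * dt, with dt in years from 1930."""
    return {pq: A.get(pq, 0.0) + B.get(pq, 0.0) * dt for pq in set(A) | set(B)}

def movie_frames(P1930, A, B, start=1910.0, stop=2000.0, step=0.5):
    """Yield (year, map) for every frame; each frame is computed from the 1930 map."""
    n_steps = round((stop - start) / step)
    for s in range(n_steps + 1):
        year = start + s * step
        yield year, filter5x5(weights_at(A, B, year - 1930.0), P1930)

# Hypothetical coefficients: no change at dt = 0, slow outward spreading and growth.
A = {(0, 0): 1.0}
B = {(0, 0): -0.004, (1, 0): 0.002, (-1, 0): 0.002, (0, 1): 0.002, (0, -1): 0.002}
P1930 = [[0.0] * 7 for _ in range(7)]
P1930[3][3] = 1000.0
frames = list(movie_frames(P1930, A, B))
print(len(frames))  # 181 frames from 1910 to 2000 at half-year steps
```

At 16 frames per second these 181 frames last about 11 seconds; `step=0.05` gives ten times as many.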

The expectation, of course, is that the movie representations of the simulated population distribution in the Detroit region will provide insights, mostly of an intuitive rather than a formal nature, into the dynamics of urban growth. Comparison of the simulated values for 1930, 1940, 1950, and 1960 with the actual values for these dates shows that the model differs from a simple interpolation, which could in fact be made to provide an exact fit to the data and its time derivatives. Viewing the movies suggests that the model introduces an excessive amount of smoothing, and that the decline in population of the CBD does not seem to have been adequately captured by the equations. These inadequacies may be due to several factors. For example, the neighborhood over which the spread function was estimated may have been too small, or the 8200-square-mile region over which it is averaged too large. Both of these possibilities could be explored by additional computations using the available data. Since there is some evidence that diffusion waves occur in city growth [24, 22, 35 pp. 326-340], an equation somewhat more general than those postulated earlier may be proposed to characterize geographical change, namely,

where k is a variable function of x, y, and t. This is clearly an attempt to adapt the linear differential equation commonly encountered in systems analysis to take into account the geographical aspects of the problem. It can also be viewed as a statistical procedure for predicting a univariate geographical series, with the usual exponential time discounting extended to include exponential-like space discounting, each observation being related to a space-time cone of previous and nearby observations. There is no assurance, of course, that urban growth can be described by positionally invariant linear equations; eventual extension to interactive multivariate geographical forecasting is also required. From a pedagogic point of view the model presented here has the distinct advantage that its shortcomings are obvious. The model given here, for example, uses translationally invariant two-dimensional Fourier transforms, but a rotationally invariant Mellin-Fourier transform would seem more appropriate for cities. This would allow the spreading of the unit inhabitant to depend on his distance from the CBD, and this seems a more realistic approximation to the true situation.
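The statistical reading of the model, exponential discounting in time extended to exponential-like discounting in space over a space-time cone, can be sketched as a weighted-mean forecaster. Everything specific here is a hypothetical choice, not the author's procedure: the weight formula alpha·(1 − alpha)^lag·exp(−beta·d), the parameter values, and the rule that the cone widens by one cell per step back in time.

```python
import math

def space_time_forecast(history, x, y, alpha=0.5, beta=1.0, base_radius=1):
    """Forecast the next value at cell (x, y) as a weighted mean over a space-time cone.

    history[t][i][j] holds past maps, oldest first. An observation `lag` steps back
    and at distance d from (x, y) gets weight alpha * (1 - alpha) ** lag * exp(-beta * d);
    the square window widens by one cell per step back in time, and the weights are
    normalised to sum to 1."""
    rows, cols = len(history[0]), len(history[0][0])
    num = den = 0.0
    for lag, grid in enumerate(reversed(history)):
        time_w = alpha * (1 - alpha) ** lag
        r = base_radius + lag
        for i in range(max(0, x - r), min(rows, x + r + 1)):
            for j in range(max(0, y - r), min(cols, y + r + 1)):
                w = time_w * math.exp(-beta * math.hypot(i - x, j - y))
                num += w * grid[i][j]
                den += w
    return num / den

# Hypothetical history: an empty 5x5 region that is uniformly settled in the latest period.
history = [[[0.0] * 5 for _ in range(5)], [[1.0] * 5 for _ in range(5)]]
print(round(space_time_forecast(history, 2, 2), 3))
```

Normalising the weights keeps a constant map constant, and the larger weight on the most recent observations pulls the forecast toward the latest period.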

LITERATURE CITED

1. Bailey, N. "Stochastic Birth, Death, and Migration Processes for Spatially Distributed Populations," Biometrika, 55 (1968), pp. 189-98.

2. Borchert, J. "The Twin Cities Urbanized Area: Past, Present, Future," The Geographical Review, 51 (1961), pp. 47-70.

3. Brown, B. M. The Mathematical Theory of Linear Systems. London: Chapman and Hall (Science Paperback), 1965.

4. Brown, B. C. Smoothing, Forecasting and Prediction, Englewood Cliffs: Prentice Hall, 1962.

5. Bunge, M. "The Weight of Simplicity in the Construction and Assaying of Scientific Theories," Philosophy of Science, 28 (1961), pp. 120-49.

6. Chorley, R. and P. Haggett, "Trend Surface Mapping in Geographical Research," Transactions and Papers, Institute of British Geographers, 37 (1965), pp. 47-67.

7. Connelly, D. S. "The Coding and Storage of Terrain Height Data: an Introduction to Numerical Cartography," Unpublished thesis, Cornell University, September, 1968.

8. Deskins, D. R., Jr., "Settlement Patterns for the Detroit Metropolitan Area: 1930-1970," Unpublished paper, Department of Geography, University of Michigan, 1963.

9. Doxiadis, C. A. Ekistics. New York: Oxford University Press, 1968.

10. Epstein, E. S. "Stochastic Dynamic Prediction," Tellus, forthcoming.

11. Friedman, D. "Specification of Temperature and Precipitation in Terms of Circulation Patterns," Journal of Meteorology, 12 (1955), pp. 428-35.

12. Hagerstrand, T. Innovation Diffusion as a Spatial Process. Translated by A. Pred. Chicago: Chicago University Press, 1967.

13. Hamming, R. Numerical Methods for Scientists and Engineers. New York: McGraw-Hill, 1962.

14. Hare, F. "The Quantitative Representation of the North Polar Pressure Fields," Polar Atmosphere Symposium. New York: Pergamon Press, 1958.

15. Haurwitz, B. and R. Craig, "Atmospheric Flow Patterns and Their Representation by Spherical Surface Harmonics," Geophysical Research Paper No. 1-1, Cambridge Research Center, 1952.

16. Highway Research Board, Urban Development Models, HRB Special Report 97, Washington, D.C.: NRC, 1968.

17. Hsu, H. P. Outline of Fourier Analysis. New York: Simon and Schuster, 1967.

18. Isard, W. Methods of Regional Analysis. Cambridge: MIT Press, 1960.

19. Knowlton, K. "Computer Produced Movies," Science, 150 (1965), pp. 116-20.

20. Kovasznay, L. and H. Joseph, "Image Processing," Proceedings, Institute of Radio Engineers (May, 1955), pp. 560-570.

21. Lee, D. Models and Techniques for Urban Planning, Cornell Aeronautical Laboratory Report VY-2474-G-l, September, 1968.

22. Morrill, R. "Waves of Spatial Diffusion," Journal of Regional Science, (1969), pp. 1-18.

23. Muehrcke, P. "Population Slope Maps," Unpublished M.A. thesis, Department of Geography, University of Michigan, 1966.

24. Newling, B. "The Spatial Variation of Urban Population Densities," Geographical Review, 59 (1969), pp. 242-52.

25. Petterssen, S. Weather Analysis and Forecasting. 2nd ed. Vol. 2. New York: McGraw-Hill, 1956.

26. Pollack, H. "On the Interpretation of State Vectors and Local Transformation Operators," Colloquium on Simulation, University of Kansas Computer Contribution 22, Lawrence, 1968, pp. 43-6.

27. Robinson, E. An Introduction to Infinitely Many Variates. New York: Hafner, 1959.

28. Rosenfeld, A. Picture Processing by Computer. New York: Academic Press, 1969.

29. Stewart, J. Q. and W. Warntz, "Physics of Population Distribution," Journal of Regional Science, 1 (1958), pp. 99-123.

30. Tobler, W., "Computation of the Correspondence of Geographical Patterns," Papers, Regional Science Association, 15 (1965), pp. 131-39.

31. Tobler, W. "Spectral Analysis of Spatial Series," Proceedings, Fourth Annual Conference on Urban Planning Information Systems and Programs, University of California, Berkeley, 1966, pp. 179-86.

32. Tobler, W. "Of Maps and Matrices," Journal of Regional Science, 7 (Supplement, 1967), pp. 276-80.

33. Tobler, W. "Geographical Filters and Their Inverses," Geographical Analysis, 1 (1969), pp. 234-53.

34. Volterra, V. Theory of Functionals and of Integral and Integro-Differential Equations. New York: Dover, 1959.

35. Watt, K. Ecology and Resource Management. New York: McGraw-Hill, 1968.

36. White, R. M. and W. C. Palson, Jr. "On the Forecasting Possibilities of Empirical Influence Functions," Journal of Meteorology, 12 (1955), pp. 478-85.