Picture-driven animation

Ronald M. Baecker, National Institutes of Health, Division of Computer Research and Technology, Bethesda, Maryland

May 14-16, 1969

AFIPS SJCC

Work reported herein was supported in part by Project MAC, an MIT research project sponsored by the Advanced Research Projects Agency, Department of Defense, under Office of Naval Research Contract NONR-4102(01), and by MIT Lincoln Laboratory with support from the US Advanced Research Projects Agency.

This paper is based on a thesis submitted in partial fulfillment for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology, Department of Electrical Engineering.

INTRODUCTION

Animation is the graphic art which occurs in time. Whereas a static image (such as a Picasso or a complex graph) may convey complex information through a single picture, animation conveys equivalently complex information through a sequence of images seen in time. It is characteristic of this medium, as opposed to static imagery, that the actual graphical information at any given instant is relatively slight. The source of information for the viewer of animation is implicit in picture change: change in relative position, shape, and dynamics. Therefore, a computer is ideally suited to making animation possible through the fluid refinement of these changes.[27]

The animation industry is ripe for a revolution. Historical accidents of available technology and knowledge of visual physiology have led to the evolution of the animated film as one that is created frame-by-frame[1]. The prodigious quantities of labor required for the construction of twenty-four individual frames per second of film have led to a concentration of animation activity in the assembly-line environments of a few large companies, an artificial yet rarely surmountable separation of the artist from the medium, and extravagant costs[2]. In conjunction with other trends in American society, the result is usually what the English critic Stephenson describes as "the respectable sadism and stereotype of commerce" [1]. Yet he offers this hopeful prediction in concluding his 1967 study, Animation in the Cinema: "There seems every reason to look forward to changes which would make it possible for the creative artist to put on the screen a stream of images with the same facility as he can now produce a single still picture"[1]. This paper explains how a creative artist, aided by a computer, can define a stream of images with the same facility as he can now produce a very few still pictures.

Although the computer's entrance into animation has been a recent one (1964) [3][4], the growth of interest and activity has been phenomenal [5][6][7][8]. Experience to date strongly suggests that the following statements are true:

  1. The animated display is a natural medium for the recording and analysis of computer output from simulations and data reduction, and for the modeling, presentation, and elucidation of phenomena of physics, biology, and engineering[9][10][11][12][13][14][15]. Depiction through animation is particularly appropriate where simultaneous actions in some system must be represented. If the animation is the pictorial simulation of a complex, mathematically-expressed physical theory, then the film can only be made with the aid of a computer.
  2. The computer is an artistic and animation medium, a powerful aid in the creation of beautiful visual phenomena, and not merely a tool for the drafting of regular or repetitive pictures[16][17][18][19].
  3. The formal modeling of pictures by complexes of algorithms and data facilitates the continued modification of a single animation sequence and the production of a series of related sequences.

This paper discusses ways in which man, aided by a computer in an interactive graphical environment, can synthesize animated visual displays. It is widely recognized that such an environment facilitates man-machine communication about still pictures [20][21][22]. The paper seeks to:

  1. describe the role of direct graphical interaction and sketching in computer animation, resulting in the process we shall call interactive computer-mediated animation; and,
  2. develop a new approach to the specification of picture dynamics, one which exploits the capacity for direct graphical interaction. The result we shall call picture-driven animation.

Animation in an interactive computer graphics environment

The role of direct graphical interaction in the synthesis of animated visual displays

Three aspects of the role of direct graphical interaction in computer graphics are particularly relevant to computer animation:

  1. The availability of immediate visual feedback of results, final or intermediate;
  2. The ability to factor picture construction into stages, and to view the results after each stage; and,
  3. The ability to sketch pictures directly into the computer.

The power of immediate visual feedback in animation is striking. The computer calculates, from its representation of a dynamic sequence, the individual frames of the corresponding movie. Like a video tape recorder, it plays it back for direct evaluation. A small change may be made, the sequence recalculated, and the result viewed again. The cycle of designation of commands and sketching by the animator, followed by calculation and playback by the computer, is repeated until a suitable result is achieved. The time to go once around the feedback loop is reduced to a few seconds or minutes. In most traditional and computer animation environments, the time is a few hours or days. The difference is significant, for now the animator can see and not merely imagine the result of varying the movement and the rhythm of a dynamic display. Thus he will be led to perfect that aspect of animation that is its core: control of the changing spatial and temporal relationships of graphic information.

Factoring the construction of an animation sequence facilitates the effective use of feedback from early stages to guide work in later stages. Working on individual small subsequences helps overcome the serious practical problems of computer time and space that could disallow rapid enough calculation and playback.

We know from the computer graphics of still pictures that the computer simulates not only a passive recording agent in its ability to retain images, but an active medium which transforms the very nature of the sketching process. This remark applies trivially to computer animation; one may construct a sequence of drawings to comprise the individual frames of the film, the static images existing at single instants of time. Picture change that extends over entire intervals of time is then synthesized as a succession of individual (temporally) local changes that alter one frame into another.

This paper goes further, for it explains how the computer can be a medium which transforms the very nature of the process of defining picture change, of defining movement and rhythm. Dynamic behavior is abstracted by descriptions of extended picture change. These descriptions may themselves be represented, synthesized, and manipulated through pictures, both static and dynamic. Thus dynamic control can be exercised globally over the entire sequence. What results is one new conception of what it means to draw an animated film.

The components required to realize an interactive computer-mediated animation system

Interactive computer-mediated animation is the process of constructing animated visual displays using a system containing, in one form or another, at least the following eight components:

Hardware:
  1. A general-purpose digital computer.
  2. A hierarchy of auxiliary storage. This is listed separately to emphasize the magnitude of storage required for the data structures from which an animation sequence is derived and for the visual images of which it is composed.
  3. An input device such as a light pen, tablet plus stylus[23][24], or wand[26] which allows direct drawing to the computer in at least two spatial dimensions. The operating environment must, upon user demand, provide at least brief intervals during which the sketch may be made in real time. The animator must then be able to draw a picture without any interruption. Furthermore, the computer must then be able to record the essential temporal information from the act of sketching. Sampling the state of the stylus 24 times per second often suffices for our purposes.
  4. An output device, such as a standard computer display scope or a suitably modified TV monitor, which allows the direct viewing of animated displays at a rate such as 24 frames per second. This is essential to enable the interactive editing of animation subsequences. The final transmission of a movie to the medium of photographic film or video tape can but need not use the same mechanisms.
Software:
  1. A language for the construction and manipulation of static pictures.
  2. A language for the representation and specification of picture change and the dynamics of picture change. We shall introduce in this paper methods of specifying dynamics not possible with traditional animation media and not yet attempted in the brief history of computer animation.
  3. A set of programs that transforms the specifications of picture structure and picture dynamics into a sequence of visual images.
  4. A set of programs that stores into and retrieves from auxiliary memory this sequence of visual images, and facilitates both its real time playback for immediate viewing and its transmission to and from permanent recording media.

Figure 1 portrays a suitable environment for interactive computer-mediated animation. Figure 2 is a block diagram of such a system.

Figure 1 - An interactive computer-mediated animation console. The author is sketching with the stylus on the tablet. There is a CRT for viewing dynamic displays, a storage scope above it, a typewriter, knobs, toggle switches, and a telephone so that an animator may summon help.
Figure 2 - Block diagram of a minimal system for interactive computer-mediated animation. The parenthesized numbers refer to the system components defined in the paper.

A scenario illustrating the use of an interactive computer-mediated animation system

To illustrate the process of animation in an interactive computer graphics environment, we present a scenario. The example, chosen for its simplicity, is an extended version of one actually executed with the GENEralized-cel animation SYStem. GENESYS is a picture-driven animation system implemented on the MIT Lincoln Laboratory TX-2 computer. All capabilities purported to it are operational or could be made so by minor additions. The written form of the interactive dialogue has been adjusted to increase its clarity.

We want to see a dynamic sequence of a dog dashing to his dinner and then dining: The dog runs towards a bowl. Wagging his tail, he lowers his head and laps up the milk. Several slurps of the milk are to be shown before we cut to the next scene.

How we do it:

ANIMATOR(A): CALL GENESYS;
GENESYS(G): HELLO. GENESYS AWAITS YOUR CREATION;
>

GENESYS either types or displays this response.

A: FORMMOVIE DINNERTIME;
The animator either types the command name FORMMOVIE, hits a corresponding light-button with the stylus, or writes an abbreviation of the command name to a character-recognizer [26]. He then types a movie name, DINNERTIME.
G: FRESH;
No such movie exists in the animator's directory. Hence, work begins on a totally new one.
A: FORMBACKGROUND;
A wants to define a subpicture that will be visible in all frames of the sequence.
G: SKETCH IT,MAN;
A:
A sketches the bowl, drawing with the stylus on the tablet. What he draws appears immediately on the display scope.
G: OK;
A: FORMCEL #1 in CLASS BODY;
An initial version of the dog's body is to be made a unique subpicture, or cel.
A sketches it, and soon adds one version of the legs, tail, and head, each as a unique cel in a unique cel class. Now, a coherent dog, unmoving, appears on the scope.
A: BIND BODY, LEGS, TAIL, HEAD, TONGUE;
This guarantees that any translational motion applied to the dog will drive the body, legs, tail, head, and tongue together. Thus the dog won't disintegrate while moving.
G: OK;
A: SKETCHPCURVE BODY;
A now sketches the path of the desired motion, mimicking the movement with the action of his stylus. Hop ... hop ... hophophop ... goes his hand. The curve dynamically sketched by mimicking a continuous movement in this way is called a p-curve.
A: PLAYBACK;
Playback the current version of the movie. Hop ... hop ... hophophop ... glides the rigid dog across the scope towards the bowl. Four frames from such a motion are shown superimposed in Figure 3.
Figure 3 - A static dog glides towards a bowl. The sketches are by Mrs Nancy Johnson of Waltham, Massachusetts
A: FORMCEL #2 in CLASS LEGS;
A sketches the legs in another position, that is, he defines the second cel in the class LEGS. This may be followed by several more positions. The images are ones that are useful in synthesizing running and hopping movements.
A: TYPESELECTIONS from LEGS;
A types in a sequence of choices of one of the positions of the legs. Each succeeding choice selects which cel is to be displayed as the dog's legs in the next frame. Of course only one set of legs is visible in a frame.
A: PLAYBACK;
Now, as is portrayed in Figure 4, the legs move while the dog hops to the bowl.
Figure 4 - Now the dog hops to the bowl
Further refinements to the leg motion are made. This includes the resketching of one cel. The tail and head movements are similarly introduced. The sequence then appears as is shown in Figure 5.
Figure 5 - Eager for dinner, he wags his tail
Three tongue cels are sketched.
A: TYPESELECTIONS from TONGUE;
For most of the sequence, the zeroth tongue is selected, that is, no tongue is visible. A single lap, or slurp, of the tongue is synthesized from the three tongue positions, and is introduced at the appropriate time in the movie. The leftmost image of Figure 6 shows the extended tongue.
Figure 6 - Slurp goes his tongue, lapping up the milk
A: TAPRHYTHM SLURPINTERVALS;
A can feel or intuit the rhythm of the desired slurps better than he can rationalise it. Hence he goes tap ... tap ... taptap ... on a push-button.
A: REPEATPATTERN FROM frame 59 THROUGH frame 64 of SELECTIONS from TONGUE at INTERVALS of SLURPINTERVALS;
Assume that the visual slurp occurs in frames 59 through 64. The pattern of tongue selections which yields the slurp is repeated at intervals determined by the tapped rhythm.
A: PLAYBACK;
Now the dog goes hop ... hop ... hophophop ... slurp ... slurp ... slurpslurp.
The movie is essentially complete; minor refinements may now be made.
A: EDIT X WAVEFORM of BODY;
More acceleration in the hopping movement would better portray the dog's eagerness for his dinner. Hence, A calls forth a display of the dog's X coordinate versus time, and resketches part of the waveform so that there is more horizontal acceleration.
A: EDIT FRAME 44;
Assume that the dog reaches the bowl in frame 44. Viewing the sequence in slow motion, A notices that the dog's position at the bowl could be improved. He alters its location in frame 44 using the knobs under the scope.
A: FIX X and Y of BODY AFTER frame 44;
The path descriptions are further modified so that the dog again holds a fixed position, once it has reached the bowl.
A: PLAYBACK;
A: SAVE DINNERTIME;
The movie is saved, available for further refinements at any time.
G: DINNERTIME IS SAVED. GOOD BYE.
Implications of the scenario
  1. Approximately 100 frames have been generated from fewer than 20 cels. Only very limited tools have been used in cel construction, specifically, programs that accept direct sketches and that enable selective erasure of picture parts. Nonetheless, great power results from the animator's ability to control and evaluate dynamic combinations of a few static images.
  2. Immediate playback enables interactive experimentation to achieve desired visual effects. The actions described above, including considerable trial-and-error, may be completed in well under one hour, even if all cels must be constructed anew.
  3. A variety of static images, analytical graphs of picture action, depict the time dependence of dynamic picture parameters. An example is the waveform representing the dog's changing horizontal position. Viewing such static representations aids the understanding of existing animation sequences; resketching or editing them changes the actual dynamic behavior accordingly.
  4. The animator may in real time mimic aspects of dynamic behavior. His movement and rhythm are recorded by the system for application in the movie. This occurs when the hopping of his stylus motion is used to drive the dog, and when the tapping of a push-button is used to determine the rhythm of the slurps of the tongue.
  5. Three aspects of dynamic behavior appear in the example: path descriptions, or conceptually continuous coordinate changes; selection descriptions, or recurring choices of cels from a cel class; and rhythm descriptions, or temporal patterns marking events. The pictures (3) and actions (4), through which direct control over dynamics is exercised, are representations of these three kinds of global descriptions of dynamics.
  6. Global operations (3)-(4), which alter dynamic behavior over entire intervals of time, may be supplemented where necessary by local operations, which adjust individual frames. An example is the positioning of the dog near the bowl.
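
The distinction between global and local operations can be made concrete with a minimal sketch. It assumes that a path description is simply a list holding one coordinate value per frame; the function names `fix_after` and `edit_frame` are hypothetical and only loosely modeled on the FIX and EDIT FRAME commands of the scenario. They are not GENESYS code.

```python
from typing import List

def fix_after(path: List[float], frame: int) -> List[float]:
    """Global operation: hold the value reached at `frame` in every later frame."""
    held = path[frame]
    return path[:frame + 1] + [held] * (len(path) - frame - 1)

def edit_frame(path: List[float], frame: int, value: float) -> List[float]:
    """Local operation: adjust the value in one frame only."""
    out = list(path)
    out[frame] = value
    return out

# Hypothetical X coordinates of the dog, one per frame.
x_path = [0.0, 1.0, 2.0, 3.0, 4.0]
x_path = edit_frame(x_path, 2, 2.5)   # nudge the position in frame 2
x_path = fix_after(x_path, 2)         # the dog then stays put
```

Both functions return new lists, so an earlier version of the description remains available if the animator dislikes the result.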

The specification of picture dynamics

Three old approaches to the definition of picture dynamics

We may distinguish three old approaches to the synthesis of a sequence of frames:

  1. The individual construction of each frame in the sequence;
  2. The interpolation of sequences of frames intermediate to pairs of critical frames; and,
  3. The generation of frames from an algorithmic description of the sequence.

Animation sequences have traditionally been synthesized through the individual construction of frames. The illusion of a continuum of time is attained through rapid playback of discrete instants of time. This approach is the only one applicable to the construction of pictures that defy regular or formal description, and that require unique operations on each frame. Yet the cost is excessive and continues to rise dramatically, faster than the GNP[27]. Salaries in large studio operations typically consume half of the cost, for commercial animation is a complex interaction among producers, directors, designers, layout artists, background artists, key animators, assistant animators, inkers and colourists, checkers, cameramen, editors, and studio managers[2]. It is this division of labor, this dispersal of the creative process, which separates the artist from the medium[27]. Another major weakness of conventional frame-by-frame animation is that there are no efficient methods of making changes to a movie stored on photographic film or video tape. We discuss elsewhere what role the computer might assume in frame-by-frame animation [28].

The technique of interpolation has long been used to cut costs and reduce the burden of picture construction which is placed on the key animator. Interpolation occurs when the key animator asks his assistants to fill in the pictures intermediate to a pair of critical frames. It has been suggested that part of this process could be mechanized[29]. We do not consider further that problem in this paper.

The generation of a sequence of frames from a formal algorithmic description is a process characterized by:

  1. the need to use a computer, for it is the only animation medium which can follow and execute with ease a complex algorithm;
  2. generality, that is, applicability to a large class of regularly-structured pictures;
  3. representational power, or the compactness with which interesting animated displays may be formulated; and,
  4. flexibility and adaptability, or the ease with which a variety of alterations may be made to a movie expressed as an algorithm.

The form of the expression has to this date been a written program in a picture-processing language such as BEFLIX [3][4] or a sequence of directives in a typewriter-controlled command language such as CAFE[30]. Herein lies another strength of the approach and also a fundamental weakness. On the one hand, many programmers, scientists, and engineers, previously not animators but fluent in this new language, can now produce dynamic displays[31]. On the other hand, an animator trained in traditional media and techniques is forced to learn a completely new language, a completely new way of thinking.

One new approach to the definition of dynamics - picture-driven animation

Picture-driven animation is a new process that augments harmoniously the animator's traditional techniques, that reflects and extends the ways of thinking to which he is accustomed. Within his intuitive language of pictures and sketching and mimicking, he may synthesize both components of frames, called cels, and generative descriptions of extended picture change, called global descriptions of dynamics.

Global dynamic descriptions are data sequences, whose successive elements determine critical parameters in successive frames of the movie. Algorithms embedded in a picture-driven animation system combine cels and dynamic descriptions to produce visible picture change. The animator defines and refines pictorial representations of dynamic descriptions. These data sequences then drive the algorithms to generate an animated display. Hence the process is called picture-driven animation.

The process is powerful because it is easy to achieve rich variations in dynamic behavior by altering the data sequences while holding constant a few simple controlling algorithms. The data sequences precisely determine the evolution of recurring picture change, within the constraints set by a choice of controlling algorithms.
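
As an illustration only, the following sketch treats each global dynamic description as a plain list with one element per frame and holds a single simple controlling algorithm fixed. The cel store `CELS`, the function `render_frames`, and the string stand-ins for pictures are all assumptions made for this example; the internal representation used by GENESYS is not given here.

```python
from typing import Dict, List, Tuple

# Hypothetical cel store: each cel class maps a cel number to a picture.
# Pictures are stand-in strings; a real system would hold drawings.
CELS: Dict[str, Dict[int, str]] = {
    "LEGS": {1: "legs-together", 2: "legs-apart"},
}

def render_frames(
    path_x: List[float],
    path_y: List[float],
    selections: List[int],
    cel_class: str = "LEGS",
) -> List[Tuple[str, float, float]]:
    """Fixed controlling algorithm: in each frame, show the selected cel
    at the position given by the two path descriptions."""
    frames = []
    for x, y, choice in zip(path_x, path_y, selections):
        frames.append((CELS[cel_class][choice], x, y))
    return frames

# Only the data sequences differ between the two movies below;
# the controlling algorithm is untouched.
slow = render_frames([0, 1, 2, 3], [0, 0, 0, 0], [1, 2, 1, 2])
fast = render_frames([0, 2, 4, 6], [0, 0, 0, 0], [1, 2, 1, 2])
```

The two calls differ in one data sequence alone, yet yield movies with different pacing.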

We next introduce the three kinds of global dynamic descriptions, some useful algorithms for which they may be driving functions, and some useful methods for their static and dynamic pictorial representation and construction. The following classification will be helpful:

A global dynamic description is either
  a movement description, which is either
     a continuous movement description = a path description, or
     a discrete   movement description = a selection description; or
  a rhythm description.

Path descriptions

Consider those alterations of static pictures that consist of modifications of continuously variable parameters, such as location, size, and intensity. Their instantaneous values determine the picture's appearance at a given moment. Thus the static picture may be animated by specifying the temporal behavior of such parameters. A representation of the temporal behavior of a continuously variable parameter is called a path description.

The movement of a fixed-geometry picture (cel) in GENESYS is described as the change of two coordinates with time, and is represented by a pair of path descriptions. Their specification may be used to synthesize the drifting of a cloud, the zooming of a flying saucer, the bouncing of a ball, or the positioning of a pointer.

Since the behavioral descriptions of the parameters apply to entire intervals of time, the animation is liberated from a strictly frame-by-frame synthesis. The computer is a medium through which one can bypass the static or temporally local and work directly on the dynamic or temporally global. Movement is represented as it is perceived, as (potentially) continuous flow, rather than as a series of intermediate states.

Path descriptions, in fact, all dynamic descriptions, may be defined by one of six general approaches:

  1. The sketching of a new pictorial representation of the description;
  2. The editing or refinement of an existing pictorial representation of the description;
  3. The direct algorithmic specification of the data sequence;
  4. The indirect algorithmic specification in terms of existing data sequences;
  5. An indirect algorithmic specification as a property of a constituent picture in an existing animation sequence; and,
  6. A coupling to a real physical process in the external world, such that it transmits a data sequence as (analog) input to the computer. Interesting couplings may be to particle collisions, the atmospheric pressure, or, in the case of (1) and (2), a real live animator.

We shall in this paper be concerned with techniques implementing the first two approaches only. Sketching is useful when one knows the general shape and quality of a motion rather than an analytical expression for a function that determines it. Modifications of the sketches are frequently invoked after one views the current animation sequence and determines how it is inadequate.

There are two related kinds of pictorial representations of all movement descriptions, static and dynamic. Both kinds may be introduced with a single example.

Consider the motion of a figure that goes from one corner of a square room to the diagonally opposite corner by walking along two adjacent walls. We shall ignore the vertical movement and consider only motion of the center of the body in the two dimensions of the plane of the ground. He first walks in the direction of increasing X coordinate, then in the direction of increasing Y coordinate. We further assume that he begins from a standstill, accelerates and then decelerates to the first corner, pauses there for a brief interval while he turns in place, and finally accelerates and decelerates to his destination.

One complete description of this planar movement consists of the functions of the X and Y coordinates versus time. These are depicted in Figures 7 and 8.

Figure 7 - The X coordinate waveform of a movement
Figure 8 - The Y coordinate waveform of a movement

Such representations of changing picture parameters are called waveforms. Time is depicted, in the waveform, along one spatial dimension. The waveform's construction requires movement of the stylus along that dimension; the display records and makes tangible this movement.
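
The two waveforms of the walk along the walls can be approximated numerically. The sketch below assumes a room of unit side, 48 frames per leg, a 12-frame pause, and a smoothstep easing function as a stand-in for "accelerates and then decelerates"; none of these values comes from the figures, and the function names are hypothetical.

```python
from typing import List, Tuple

def ease(u: float) -> float:
    """Smoothstep: starts and ends at rest (an assumed easing curve)."""
    return u * u * (3.0 - 2.0 * u)

def walk_waveforms(leg_frames: int = 48, pause_frames: int = 12,
                   side: float = 1.0) -> Tuple[List[float], List[float]]:
    """Return the X and Y waveforms, one value per frame, for the two-wall walk."""
    x: List[float] = []
    y: List[float] = []
    # Leg 1: X rises from 0 to side while Y stays at 0.
    for f in range(leg_frames):
        x.append(side * ease(f / (leg_frames - 1)))
        y.append(0.0)
    # Pause at the first corner: both coordinates hold.
    for _ in range(pause_frames):
        x.append(side)
        y.append(0.0)
    # Leg 2: Y rises from 0 to side while X stays at side.
    for f in range(leg_frames):
        x.append(side)
        y.append(side * ease(f / (leg_frames - 1)))
    return x, y

x_wave, y_wave = walk_waveforms()
```

Plotting `x_wave` and `y_wave` against the frame index gives curves of the same general shape as the waveforms described: a rise, a plateau, and, for Y, a delayed rise.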

Alternatively, both spatial dimensions of the representation could denote the two spatial coordinates of the movement. A natural correspondence is established between the X(Y) coordinate of the floor and the X(Y) coordinate of the medium of the representation (paper, scope face, etc.). Figure 9 depicts such a parametric curve representation of the movement. It illustrates with clarity the figure's path on the floor.

Figure 9 - A parametric curve representation of the same movement. The rhythm of the movement is not visible.

Yet the dynamics of the motion are hidden because the temporal dimension is only an implicit coordinate. This is rectified in Figure 10. A stream of symbols is used instead of a continuous trail to depict the path. Characters are spaced along the path at short, uniform intervals of time, such as every 24th of a second. Dynamics are apparent in the local density of symbols. Observe in particular how they cluster where the figure pauses.

Figure 10 - A better display of the parametric curve. Symbols are deposited at short uniform intervals of time.
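
The way symbol density reveals dynamics can be checked with a few lines of code. This sketch assumes the path is available as a list of (x, y) points, one deposited per frame; the sample coordinates are invented for illustration.

```python
import math
from typing import List, Tuple

def symbol_spacing(points: List[Tuple[float, float]]) -> List[float]:
    """Distance between successive symbols deposited one per frame.
    Small values mean slow movement; zero means a pause."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]

# Hypothetical samples: fast, slowing, paused at a corner, then moving again.
samples = [(0.0, 0.0), (3.0, 0.0), (4.0, 0.0),
           (4.0, 0.0), (4.0, 0.0), (4.0, 2.0)]
spacing = symbol_spacing(samples)
pause_steps = [i for i, d in enumerate(spacing) if d == 0.0]
```

Because the samples are uniform in time, the spacing list is proportional to speed, which is exactly the information a continuous trail hides.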

The dynamic construction of a path description is a user-driven animated display in which the timing of the stylus's movement is preserved by recording its position in every frame. A tangible representation of the stylus path is the display of a sequence of characters spaced equally in time. We shall call a parametric curve dynamically sketched in real time a p-curve. The p-curve corresponding to Figures 7-10 is depicted in Figure 11. We have attempted to convey in a single static image that the p-curve is a dynamic display. Each 2-dimensional p-curve determines two path descriptions. Thus the hopping of the dog in DINNERTIME may be synthesized by hopping with the stylus along some path on the tablet surface, that is, by mimicking the desired dynamic.

Figure 11 - The p-curve corresponding to Figures 7-10. The dynamic display is compressed into a single static picture containing nine selected frames.
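
Recording a p-curve amounts to sampling the stylus once per frame and splitting the samples into two path descriptions. The following sketch assumes a sampling rate of 24 per second and replaces the tablet with a hypothetical function `hopping_hand` that returns a stylus position for a given time; a real system would read an input device instead.

```python
from typing import Callable, List, Tuple

FRAME_RATE = 24  # samples per second, matching the playback rate

def record_p_curve(stylus: Callable[[float], Tuple[float, float]],
                   seconds: float) -> Tuple[List[float], List[float]]:
    """Sample the stylus once per frame and split the p-curve into
    its two path descriptions (the X and Y waveforms)."""
    n = int(round(seconds * FRAME_RATE))
    xs: List[float] = []
    ys: List[float] = []
    for frame in range(n):
        x, y = stylus(frame / FRAME_RATE)
        xs.append(x)
        ys.append(y)
    return xs, ys

def hopping_hand(t: float) -> Tuple[float, float]:
    """Stand-in for a tablet: steady drift to the right, two hops per second."""
    phase = (t * 2.0) % 1.0
    return 10.0 * t, 4.0 * phase * (1.0 - phase)

x_path, y_path = record_p_curve(hopping_hand, seconds=2.0)
```

Either list may be used alone, as when only one coordinate of the p-curve is assigned to a picture parameter.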

In some cases one may need only one of the path descriptions. To depict the fluttering of a heart, we may assign the X coordinate of the p-curve to a parameter determining the size of the heart, and then flutter the pen back and forth horizontally. Any vertical motion that results is uninteresting and can be ignored.

A path description, in summary, defines dynamic activity that consists of potentially continuous and arbitrarily fine alterations of value. The reader should not be misled by the choice of the word path. What is meant is a path, or sequence of values, through an arbitrary continuous space, through a mathematical continuum. One application or interpretation of this path is the representation of a movement through the location-space of an object, such as a figure's path through a room. This interpretation, however, is not the only possible one. Depending upon the picture description capability of the system in which it is used, and the algorithm which it drives, a path description may determine changing locations, intensities, thicknesses, densities, or texture gradients. For example, a pulsating heart could be animated by varying either the size or the intensity of a single heart shape.

Reference 28 presents a detailed discussion of the relative strengths and weaknesses of waveforms, p-curves, and other static and dynamic representations of continuous movement. The discussion focuses on their uses as inputs of dynamics and as visual feedback to the animator, their dimensionality, their role in guiding temporal and spatial adjustments to existing motions, their capacity for conceptual extensions, and some practical problems (and solutions) that arise in the sketching process. Furthermore, we describe four kinds of editing and refining capabilities, operations for:

  1. scaling curves;
  2. shaping and reshaping them;
  3. algebraically and logically combining them; and,
  4. performing pattern scanning, matching, and transforming functions upon them.
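
Two of these operations, scaling and algebraic combination, are easy to sketch when a path description is held as a list of per-frame values. The functions below are illustrative assumptions, not the operations defined in reference 28; in particular, linear interpolation is only one of several reasonable ways to rescale a curve in time.

```python
from typing import List

def scale_time(curve: List[float], new_length: int) -> List[float]:
    """Stretch or compress a curve to new_length frames by linear interpolation."""
    if new_length < 2 or len(curve) < 2:
        raise ValueError("need at least two samples")
    out = []
    for i in range(new_length):
        pos = i * (len(curve) - 1) / (new_length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(curve) - 1)
        frac = pos - lo
        out.append(curve[lo] * (1.0 - frac) + curve[hi] * frac)
    return out

def scale_value(curve: List[float], gain: float, offset: float = 0.0) -> List[float]:
    """Scale and shift the values of a curve, leaving its timing alone."""
    return [gain * v + offset for v in curve]

def add_curves(a: List[float], b: List[float]) -> List[float]:
    """Algebraic combination: frame-by-frame sum of two curves."""
    return [p + q for p, q in zip(a, b)]

slower = scale_time([0.0, 2.0, 4.0], 5)
```

Here `slower` covers the same range of values in five frames instead of three, so the motion it drives takes longer.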

Selection descriptions

Consider the algorithm that selects an element of the current frame from among members of a cel class. A good example arises in the synthesis of different facial expressions through the abstraction of discrete shapes and positions of mouth, nose, eyeballs, and eyebrows. One cel class could consist of the two members eyebrows raised and eyebrows lowered. An animation sequence may be achieved by a temporal concatenation of selections from a cel class. A changing facial expression may be achieved by the parallel application of several such sequences of selections, one corresponding to each facial component. In DINNERTIME this technique was used to synthesize the movement of the dog's legs, tail, head, and tongue.

A representation of the dynamic selection from a finite set of alternative pictures is an example of the second type of global dynamic description and is called a selection description. The synthesis of selection descriptions is also aided by the use of pictorial representations, such as one consisting of a sequence of steps, where the length of each step is an integer multiple of frames, and the height is limited to transitions to and from positions on a discrete scale. Such pictures appear at the top of Figures 15 and 20. Superposition on a common time axis of pictures of several descriptions facilitates coordinating the counterpoint of the parallel selection strands.
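
The step picture of a selection description maps naturally onto a run-length form. The sketch below assumes each step is a pair (cel number, length in frames) and that parallel strands are kept in a dictionary keyed by cel class; the class names and cel numbers are invented for the example.

```python
from typing import Dict, List, Tuple

Step = Tuple[int, int]  # (cel number, number of frames the step lasts)

def expand_steps(steps: List[Step]) -> List[int]:
    """Turn a sequence of steps into one cel selection per frame."""
    out: List[int] = []
    for cel, length in steps:
        if length < 1:
            raise ValueError("each step must last at least one frame")
        out.extend([cel] * length)
    return out

# Two parallel strands on a common time axis (hypothetical cel numbers).
strands: Dict[str, List[int]] = {
    "EYEBROWS": expand_steps([(1, 3), (2, 3)]),
    "MOUTH": expand_steps([(1, 2), (2, 2), (3, 2)]),
}

def frame_selection(all_strands: Dict[str, List[int]], frame: int) -> Dict[str, int]:
    """Which cel of each class is visible in the given frame."""
    return {name: sel[frame] for name, sel in all_strands.items()}
```

Reading the strands at one frame index gives the combination of cels that makes up the facial expression in that frame.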

The use of the term selection implies that a mechanism chooses from among a designated set of alternatives. In the previous examples the alternatives are cels, images to be introduced as components of frames in a dynamic sequence. A more general view of a selection description regards it as a sequence of selectors, functions which choose from a designated and finite yet potentially denumerable set of alternatives. Depending upon the picture description capability of the system in which it is used, and the algorithm which it drives, a selection description may choose among alternatives that are subpictures, data, picture-generating algorithms, other global dynamic descriptions, pictorial events or activities, or strands of dynamic activity. For example, the dynamic selection from among alternative picture-generating algorithms would be useful in a system with discrete texture choices, where there is one algorithm capable of filling an arbitrary region with that texture.

Further details may be found in reference 28, which also discusses techniques for the definition and editing of selection descriptions. These are conceptually similar to those used in the synthesis of path descriptions.

Rhythm descriptions

Rhythm descriptions consist of sequences of instants of display time (frames), or intervals between frames. They define patterns of triggering or pacing recurring events or extended picture change. In this context it is suggestive to think of a rhythm description as a pulse train. Each pulse may trigger the same action, or, as is discussed in reference 28, it may trigger one of several activities under the control of a selection description.
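The two equivalent forms mentioned above, instants and intervals, and the pulse-train view can be illustrated with a short sketch. The function names and the particular beat pattern are assumptions made for the example.

```python
from typing import List


def intervals_to_instants(start: int, intervals: List[int]) -> List[int]:
    """Turn intervals between beats into the frame numbers of the beats."""
    instants = [start]
    for gap in intervals:
        instants.append(instants[-1] + gap)
    return instants


def instants_to_intervals(instants: List[int]) -> List[int]:
    """Turn frame numbers of beats back into the gaps between them."""
    return [later - earlier for earlier, later in zip(instants, instants[1:])]


def pulse_train(instants: List[int], length: int) -> List[int]:
    """One entry per frame: 1 where a pulse falls, 0 elsewhere."""
    marked = set(instants)
    return [1 if frame in marked else 0 for frame in range(length)]


# Two slow beats followed by two quick ones, starting at frame 0.
beats = intervals_to_instants(0, [12, 12, 6, 6])
train = pulse_train(beats, 40)
```

An algorithm driven by this description would inspect `train[frame]` at each frame and trigger its action wherever the value is 1.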

Rhythm descriptions facilitate the achievement of coordination and synchrony among parallel strands of dynamic activity. In this context it is suggestive to think of a rhythm description as a sequence of event markers. The marking sequence may be defined with respect to one pictorial subsequence, and then used to guide the construction of another subsequence.

A rhythm description cannot by itself define picture change; it can define a beat, a sequence of cues with respect to which picture change is temporally organized and reorganized. Animators have sometimes used metronomes as generators of rhythm descriptions[2]. Proper synchronization of a sound track to the visual part of a film is most critical to its success[2].

Hence, rhythm descriptions marking critical instants of time play a key role in the synthesis and editing of movement descriptions. For these operations a rhythm description requires pictorial representation. In Figure 20 it is depicted both as a static pulse train and as a sequence of event markers along the axis of movie time. A direct and simple dynamic input, as we have seen in DINNERTIME, consists of tapping out the rhythm on a push-button.

Dynamic hierarchies

It is easy to conceive of more complex and useful couplings of global dynamic descriptions. Suppose, for example, that a hop, a skip, and a jump have each been synthesized with the aid of several path and selection descriptions. If the animator wishes to experiment with varying dynamic patterns of hop, skip, and jump, he should be able to define a selection description which chooses among these three alternatives. This is equivalent to defining selections among sets of path and selection descriptions. Reference 28 discusses the use of selection descriptions to establish arbitrary hierarchies of structured dynamic behavior, and illustrates the significance of this capability to the animator.
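A two-level hierarchy of this kind might be sketched as below. Each movement is reduced to a made-up list of per-frame values, standing in for the path and selection descriptions that would define it in practice; the names `movements` and `expand_hierarchy` are hypothetical.

```python
# Each movement is itself a bundle of lower-level descriptions. Here a
# movement is reduced to a per-frame list of (leg cel, vertical offset)
# pairs; the cel names and numbers are invented for illustration.
hop = [("legs bent", 0), ("legs straight", 3), ("legs bent", 0)]
skip = [("left up", 1), ("right up", 1)]
jump = [("crouch", -1), ("stretch", 5), ("tuck", 6), ("land", 0)]

movements = {"hop": hop, "skip": skip, "jump": jump}


def expand_hierarchy(pattern, library):
    """Concatenate the frames of each selected movement, in order."""
    frames = []
    for name in pattern:
        frames.extend(library[name])
    return frames


# A higher-level selection description: a sequence of choices among movements.
pattern_a = ["hop", "skip", "jump"]
pattern_b = ["skip", "skip", "hop", "jump"]

sequence_a = expand_hierarchy(pattern_a, movements)
sequence_b = expand_hierarchy(pattern_b, movements)
```

Experimenting with a different dynamic pattern then means editing only the short top-level list, while the three movements stay untouched.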

Exploratory studies in interactive computer-mediated animation

Three special-purpose picture-driven animation systems have been implemented on the MIT Lincoln Laboratory TX-2 computer. A common feature is that each has a construction or editing mode, a playback or viewing mode, and a filming mode. In the first mode the animator may begin work on new pictures and global dynamic descriptions, or may recall and continue the construction of pictures and descriptions saved from other sessions. Algorithms embedded in the systems then compute TX-2 display files, in which sequences of frames composed of points, lines, and conic sections are encoded for use by the scopes.

These image files are passed to the playback program, which simulates a variable-speed, bi-directional, video tape recorder. The program normally sequences through the display file representation of successive frames, making each in turn visible for 1/24th of a second. One useful option is that of automatic cycling or the simulation of a tape loop.
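The sequencing behavior of such a playback program can be sketched as a generator of frame indices. This is an illustration only: how the TX-2 program actually varied speed is not stated here, and skipping frames with a larger step is an assumption of this sketch, as are the name `playback_indices` and its parameters.

```python
from typing import Iterator, Optional


def playback_indices(
    frame_count: int,
    start: int = 0,
    step: int = 1,
    loop: bool = False,
    limit: Optional[int] = None,
) -> Iterator[int]:
    """Yield the index of each frame to display, in order.

    step > 0 plays forward, step < 0 plays backward, and a step of
    magnitude greater than 1 skips frames for a faster apparent speed.
    With loop=True the index wraps around like a tape loop; limit caps
    the number of frames yielded.
    """
    if frame_count < 1 or step == 0:
        raise ValueError("need at least one frame and a non-zero step")
    index, shown = start, 0
    while limit is None or shown < limit:
        if loop:
            index %= frame_count      # wrap around in either direction
        elif not 0 <= index < frame_count:
            break                     # ran off either end of the sequence
        yield index
        index += step
        shown += 1
```

For example, `list(playback_indices(5, start=4, step=-1))` runs the sequence backward, and `playback_indices(3, loop=True, limit=7)` cycles through a three-frame loop for seven displayed frames.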

When the animator has prepared a satisfactory sequence, he need no longer view it directly on the scope, but may instead want to record it on film. A pin-registered movie camera can be mounted in a light-tight box to a TX-2 scope. Its shutter is always open. The filming program (a variant of the playback program) paints an image on the scope. After a sufficient time interval to allow the decay of the phosphor, approximately 1/5 of a second, a signal from the computer advances the camera. A return signal upon the completion of the advance triggers the display of the next frame. The camera can be operated on one scope while we work at a tablet with another scope. Excellent film quality, with high contrast and low jitter, can be produced with the system.
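A rough lower bound on filming time follows from the figures above. The sketch counts only the phosphor decay wait of approximately 1/5 second per frame; painting time and camera advance time are not given in the text and are deliberately left out, so the true figure would be larger.

```python
from fractions import Fraction

FRAMES_PER_SECOND = 24          # projection rate used by the playback program
DECAY_WAIT = Fraction(1, 5)     # approximate phosphor decay wait per frame, seconds


def minimum_filming_seconds(duration_seconds: int) -> Fraction:
    """Lower bound: frames to expose times the decay wait per frame."""
    frames = duration_seconds * FRAMES_PER_SECOND
    return frames * DECAY_WAIT


# A sequence of roughly 20 seconds, such as the cartoon of Figure 18.
bound = minimum_filming_seconds(20)
```

Under these assumptions a 20-second sequence of 480 frames needs at least 96 seconds of filming, which is why the ability to work at a second scope in the meantime is useful.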

The first two systems are very special-purpose. ADAM allows one to animate a crude line-drawing representation of a single human figure. EVE is an exercise in abstract dynamic art, in which one can animate a set of points linked by rubber-band straight lines. The animation technique in both cases is the specification, via waveforms and p-curves, of the seventeen path descriptions that define the temporal behavior of the picture's seventeen controlling continuous parameters. A lengthy discussion may be found in reference 28; we shall here content ourselves with three observations:

  1. Clocked hand-drawn dynamics, or the dynamic mimicking of animated behavior, produce life-like, energetic movements, even if used in ADAM to yield stick figure motions that are obviously not physically realizable, and even if used in EVE to yield abstract motions.
  2. Slight modifications to a waveform result in significant alterations to the character of an extended interval of a movement. For example, ADAM's normal walk can be made into a jaunty saunter by the addition of more bounce to the vertical coordinate path description, or can be made effeminate by increasing the scale of the oscillations of the hip's rotational coordinate path description.
  3. Even in a system whose only intended application is cartooning, a dynamic mimicking capability must be augmented by an editing capability, for many motions cannot be mimicked or only so with difficulty, being purposeful exaggerations of real movements.
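The second observation can be illustrated with a sketch in which a path description is a list holding one coordinate value per frame. The waveform shape, its period, and the amplitudes below are invented for the example and are not taken from ADAM.

```python
import math
from typing import List


def walk_height(frames: int, period: int, amplitude: float) -> List[float]:
    """A hypothetical vertical-coordinate waveform: one value per frame."""
    return [
        amplitude * abs(math.sin(math.pi * frame / period))
        for frame in range(frames)
    ]


def add_waveforms(a: List[float], b: List[float]) -> List[float]:
    """Superimpose two waveforms of equal length, frame by frame."""
    if len(a) != len(b):
        raise ValueError("waveforms must cover the same frames")
    return [x + y for x, y in zip(a, b)]


# Two seconds of walking at 24 frames per second, one step every 12 frames.
normal = walk_height(frames=48, period=12, amplitude=1.0)

# A slight modification: extra bounce added to the vertical coordinate.
extra_bounce = walk_height(frames=48, period=12, amplitude=0.5)
jaunty = add_waveforms(normal, extra_bounce)
```

A single small edit to one of the seventeen path descriptions thus changes every frame of the interval at once.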

Although GENESYS is also a special-purpose animation system, it is versatile enough to be used in the generation of a broad class of dynamic images. The generalized-cel, defined in reference 28, is a generalization of the concept of cel class illustrated above, in that its appearance in a given frame of the final dynamic display is determined by the values of a set of associated movement descriptions, both continuous and discrete.
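One way to picture a generalized-cel, offered as an assumption rather than as the definition given in reference 28, is a cel class bundled with one discrete description and two continuous ones. The class `GeneralizedCel` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeneralizedCel:
    """Hypothetical model: one cel class plus its movement descriptions."""
    cels: List[str]        # alternative drawings, named for illustration
    selection: List[int]   # discrete description: cel index per frame
    x_path: List[float]    # continuous description: horizontal coordinate
    y_path: List[float]    # continuous description: vertical coordinate

    def at_frame(self, frame: int) -> Tuple[str, float, float]:
        """Return which drawing appears in this frame, and where."""
        return (
            self.cels[self.selection[frame]],
            self.x_path[frame],
            self.y_path[frame],
        )


tail = GeneralizedCel(
    cels=["tail left", "tail right"],
    selection=[0, 1, 0, 1],
    x_path=[10.0, 12.0, 14.0, 16.0],
    y_path=[0.0, 1.0, 0.0, 1.0],
)
frames = [tail.at_frame(i) for i in range(4)]
```

The frame is computed from the descriptions; nothing about an individual frame is stored separately.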

The GENESYS animator may sketch, erase, copy, translate, rotate, and scale individual cels consisting of points, straight lines, and conic sections. He may sketch p-curves and dynamically tap rhythm descriptions. There are numerous tools for the manipulation of static representations of dynamic descriptions. Several individuals with varying degrees of artistic skill and training in animation have constructed short cartoon sequences with the aid of GENESYS. Figures 12-20 illustrate some of these experiences.

Figure 12 - This picture, drawn by the author, illustrates the variety of line and texture that may be included in a GENESYS cel as of December, 1968. Free-hand sketches are portrayed by points spaced at an arbitrary, user-controlled density. Straight lines can be solid or can be dotted, over the same range of densities. Sections of circles, ellipses, parabolas, and regular polygons may be included. Arbitrary sub-pictures may be copied, translated, rotated, and scaled along two independent dimensions
Figure 13 - A parametric curve, the final frame of a p-curve, defining a movement that is life-like and energetic, smooth and graceful. Observe how points cluster at pauses in the motion
Figure 14 - The crocodiless cavorts across the screen, delighted at her recent creation on the TX-2 console. The artist, Miss Barbara Koppel of Chicago, had little animation experience, no computer experience, a brief introduction to GENESYS, and assistance in using it from the author.
Figure 15 - The four selection descriptions generate the movements of the jaws, tail, legs, and body of the crocodiless. Her translational motion is defined by the two path descriptions below. The oscillatory waveform is the vertical coordinate; the waveform sloping downward the horizontal coordinate
Figure 16 - The 1st, 7th, 13th, and 19th frames of the take-off of a bird are shown. The figure is superimposed on the parametric curve which defines its path through space. Mrs. Johnson has mimicked the motion by sketching the p-curve; the bird then reproduces this movement. Observe the switching among discrete shapes and positions of its eye, wing, and feet
Figure 17 - All cels used by Mrs. Johnson in the animation of Oopy - he flaps his ear, winks, and sticks out his tongue - are shown first. The second is GENESYS in frame mode, in which the current state of a particular frame is displayed. Also visible are light-buttons representing cel classes (mouth, tongue, eye, ear, brow). The animator may alter the current frame either by switching the selection of a cel from a class (by pointing at it) or by changing its position (by turning knobs located under the scope). The underlying movement descriptions are automatically selected by GENESYS
Figure 18 - A short cartoon - what the viewer sees: A man, tripping blithely along, kicks a dog lying in his path. The dog rises and trots off to the right (shown above). It then returns, teeth bared (shown in Figure 19), and bites the man. The man jumps and runs away. The dog first follows, then returns once again to rest. The duration of the sequence is approximately 20 seconds
Figure 19 - A short cartoon - how it was made: Mr. Ephraim Cohen of Orange, New Jersey, a mathematician and programmer who is also a skilled caricaturist, completed the cels for his cartoon one week-end afternoon at the TX-2. The system then crashed, and he was forced to return home. He sent through the mail four selection descriptions, to choose cels from the classes "man's head", "man's legs", "dog's head", and "dog's body", and two path descriptions, to drive horizontally the man and the dog. The author input the dynamic descriptions, viewed the result, and then refined the movie by several iterations of editing the descriptions and viewing the sequence
Figure 20 - A short cartoon - why it works: The dynamic descriptions defining Mr. Cohen's cartoon as of January, 1969, are shown above. The selection descriptions, from top to bottom, belong to the man's head, the man's legs, the dog's head, and the dog's body. There are 4, 8, 8, and 4 cels in each class, respectively. The two waveforms represent the changes with time of the horizontal coordinates of the man and the dog

Conclusion - the representation of dynamic information - The concept of a picture

Thus the essence of picture-driven animation is:

  1. that there exists a set of abstractions of dynamic information, data sequences which drive algorithms to produce animated displays; and,
  2. that these abstractions may in turn be modeled, generated, and modified by static as well as animated pictures, modeled in the sense that the picture structure represents the data sequence, generated and modified in the sense that the picture represents the process of synthesis as well.

The three kinds of descriptions constitute a rich, expressive, intuitively meaningful vocabulary for dynamics. Each type abstracts an important category of dynamic behavior: flow and continuous change (path descriptions), switching and repetitive choice (selection descriptions), and rhythm and synchrony (rhythm descriptions). The vocabulary is economical, flexible, and general in the sense that it can characterize the dynamic similarities that exist in seemingly diverse animation sequences.

The use of dynamic descriptions couples picture definition by sketching and by algorithm; it furthermore allows both local (of the individual frame) and global (for an interval of time) control over dynamics. We have chosen to stress the latter and adopted the term global dynamic description, for it is the capacity for global control that results uniquely from the use of the computer as an animation medium. Yet a dynamic description is not only a representation over an interval, but a sequence of single elements whose modification also provides local control over individual frames. Both local and global control are vital to the successful synthesis of movement. He who accidentally crashes into a wall while running from the police is going from the continuous to the discrete, from a global motion to a local event. He who aims to scale the wall is interpolating the continuous between the discrete, adjusting the global to fit the constraints of the local.

The naturalness and power of the vocabulary is increased by the ability to manipulate it in an interactive graphics environment. There exist, for each kind of data sequence, static pictorial representations such as the waveform which provide a global view of and facilitate precision control of the temporal behavior implied by the sequences. There exist, for each kind of data sequence, methods of dynamic specification such as the clocked sketching of parametric curves which allow the animator's sense of time to be transmitted directly through the medium of the computer into the animated display.
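Clocked sketching can be sketched in code under one assumption: that the stylus delivers time-stamped positions, which are then resampled at one point per frame. The function `clocked_resample` and the sample data are hypothetical; the point of the example is that equal time steps produce unequal spacing, so the points bunch together wherever the hand pauses.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]   # (time in seconds, x, y) from the stylus


def clocked_resample(
    samples: List[Sample], rate: int = 24
) -> List[Tuple[float, float]]:
    """Linearly interpolate stylus samples at one point per frame."""
    if len(samples) < 2:
        raise ValueError("need at least two stylus samples")
    for (t0, _, _), (t1, _, _) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("sample times must be strictly increasing")
    start, end = samples[0][0], samples[-1][0]
    frame_count = int((end - start) * rate) + 1
    points = []
    segment = 0
    for frame in range(frame_count):
        t = start + frame / rate
        # Advance to the pair of samples that brackets time t.
        while segment < len(samples) - 2 and samples[segment + 1][0] <= t:
            segment += 1
        t0, x0, y0 = samples[segment]
        t1, x1, y1 = samples[segment + 1]
        u = (t - t0) / (t1 - t0)
        points.append((x0 + u * (x1 - x0), y0 + u * (y1 - y0)))
    return points


# Half a second of quick motion to the right, then half a second of rest.
stroke = [(0.0, 0, 0), (0.5, 12, 0), (1.0, 12, 0)]
points = clocked_resample(stroke)
```

In the result the first twelve points are spread one unit apart, while the remaining thirteen coincide at the resting position, so the timing of the gesture is carried by the spacing of the points.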

We use the term global dynamic description and the names of the three types somewhat loosely in referring both to the underlying dynamic data sequences and to their corresponding pictorial representations. The imprecision is purposeful, for it is very significant that, in an interactive graphics environment, one can easily traverse in either direction any leg of the triangle {Dynamic Data Sequence, Static Pictorial Representation, Dynamic Pictorial Representation}. What results is an important plasticity in the representation of dynamics. Characterizations of change can be manipulated (shifted, stretched, superimposed, ... ) within and between the domains of the static and the dynamic.
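One leg of this triangle, between a data sequence and its static picture, can be traversed in both directions in a few lines. The text-based picture below is only a stand-in for the displayed step pictures, and the names `to_picture` and `from_picture` are hypothetical.

```python
from typing import List


def to_picture(selection: List[int], levels: int) -> List[str]:
    """Draw a selection description as text rows, highest level on top."""
    return [
        "".join("#" if value == level else "." for value in selection)
        for level in range(levels - 1, -1, -1)
    ]


def from_picture(rows: List[str]) -> List[int]:
    """Recover the data sequence from its static picture."""
    levels = len(rows)
    width = len(rows[0])
    selection = []
    for column in range(width):
        marked = [
            levels - 1 - row for row in range(levels) if rows[row][column] == "#"
        ]
        if len(marked) != 1:
            raise ValueError("each frame must select exactly one level")
        selection.append(marked[0])
    return selection


selection = [0, 0, 1, 1, 2, 1, 0]
picture = to_picture(selection, levels=3)
# picture is:
#   ....#..
#   ..##.#.
#   ##....#
```

Because `from_picture(to_picture(s, n))` returns `s`, any edit made to the picture is also an edit to the data that drives the animation.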

Several animation sequences can readily be related, coordinated, or unified, regardless of whether or not they ever occur concurrently. Dynamic behavior (data) can readily be transferred from one animation subsequence (including the animator) to another, from one mode of representation or embodiment in a picture to another.

Our concept of a picture is a broad one, and purposely so. For as we stress in reference 28, a computer-mediated picture is not only what is visible but what is contained in its model in the computer system. And the system, i.e., an interactive animation system, includes not only disks and core but an animator and perhaps an ongoing physics experiment as well as a tape-recorded speech. This system evolves continually through real time. Occasionally there occurs a particular reorganization of the system which results in the transfer of information from the animator to the pictorial data base, or in a computation on the data base which results in a sequence of visual images (i.e., data directly convertible by hardware into visual images). Thus, as we have stressed before, the act of mimicking dynamics is a (user-driven) dynamic picture. This unification of the concepts of picture and action is important.

The greater is the number and generality of available models of pictures and of processes of picture construction, the more flexible and powerful is the animation system in its ability to deal with dynamic information. The design of a multi-purpose, open-ended animation language that allows the animator himself to synthesize new models is outlined in reference 28. With such a language one can describe arbitrary action-picture interpreters that extract movement descriptions from the animator's use of system devices and transform them and existing static and dynamic displays into new static and dynamic displays.

Finally, the use of dynamic descriptions helps establish a conceptual framework which facilitates efficient use of the resources of the animation system: animator, software, and hardware. For details, we again refer the reader to reference 28.

Extensions, applications, implications

This paper is a pointer to a March, 1969, Ph.D. dissertation[reference 28] which includes the material contained herein considerably expanded, some suggestions for future research, and the following:

  1. There is a discussion of major difficulties in implementing systems embodying these ideas, with thoughts on the criteria supporting subsystems (both hardware and software) should satisfy to facilitate interactive computer-mediated animation. The environment in which current implementations exist is described in another paper being delivered at this conference.
  2. There is a lengthy outline of a proposed design of an Animation and Picture Processing Language. APPL is a multi-purpose, open-ended interactive animation programming language, through which the animator may also exercise algorithmic control over a dynamic display. The language will contain quasi-parallel flow of program control, a data structure that is a generalization of all hierarchic ordered data representations, an extensible class of picture descriptors, and a formalism which models the animator's dynamics as it models the dynamics of any picture, that is, as an integral component of animated system behavior. A major design goal is plasticity in the representation of dynamic information and flexibility in the techniques and conventions with which the animator interacts with the system. It has been verified on paper that a language containing these features can gracefully be used to construct dynamic displays, to build system tools that aid the construction process, and to implement special-purpose interactive computer-mediated animation systems.
  3. Finally, there is a description of potential applications of this work in education, psychology, psychiatry, and the arts. In another paper being delivered at this conference, Huggins and Entwisle eloquently describe the role of computer animation in fulfilling the great untapped potential of iconic modes of communication and instruction, in producing visual images that in their ability to communicate ideas are superior to traditional graphical images on paper or blackboard [33]. Instead of static images, words, and mathematical symbols, they suggest, we may create dynamic signs that move about and develop in self-explanatory ways to express abstract relations and concepts. ...A dynamic dimension is now available that requires the invention and development of new conventions and a visual syntax appropriate to this new medium if it is to be fully used for communication and education. May the ideas in our paper contribute towards this goal.

With respect to the arts, we conclude by repeating McLaren's description of animation:

Animation is not the art of DRAWINGS-that-move but the art of MOVEMENTS-that-are-drawn.

What happens between each frame is more important than what exists on each frame.

Animation is therefore the art of manipulating the invisible interstices that lie between frames. The interstices are the bones, flesh and blood of the movie, what is on each frame, merely the clothing.

This paper may be regarded as a report on a use of the computer in the art of MOVEMENTS-that-are-drawn, in the manipulation of the invisible interstices that lie between frames.

ACKNOWLEDGEMENTS

The encouragement, counsel, and insight of the dissertation's mentor, Professor Edward L. Glaser of Case Western Reserve University, and of Dr. William R. Sutherland of MIT Lincoln Laboratory, Professor Murray Eden of MIT, and Mr Eric Martin of Harvard University and Cambridge Design Group, Inc. are gratefully acknowledged. We appreciate the support of numerous individuals, here nameless but not forgotten, many in the Digital Computers Group of MIT Lincoln Laboratory, who have contributed to the progress of this research.

REFERENCES

1 R STEPHENSON Animation in the cinema A Zwemmer Limited London A S Barnes and Co, New York 1967

2 J HALAS R MANVELL The technique of film animation Hastings House New York 1959

3 K C KNOWLTON A computer technique for producing animated movies Proc SJCC 1964

4 K C KNOWLTON A computer technique for the production of animated movies Bell Telephone Laboratories Film

5 The human use of computing machines Bell Telephone Laboratories Symposium June 20-21 1966

6 Conference on Computer Animation Education Development Center Newton Mass July 17-18 1967

7 Proceedings of the 1967 UAIDE Annual Meeting

8 Proceedings of the Fall Joint Computer Conference 1968

9 F W SINDEN Force, mass, and motion Bell Telephone Laboratories Film

10 J L SCHWARTZ E F TAYLOR Computer displays in the teaching of physics Proc FJCC 1968

11 MIT SCIENCE TEACHING CENTER Scattering in one dimension Film available on loan from the Atomic Energy Commission

12 C LEVINTHAL Molecular model-building by computer Scientific American Vol 214 No 6 June 1966

13 C LEVINTHAL Computer construction and display of molecular models Film

14 E E ZAJAC Computer-made perspective movies as a scientific and communication tool Comm ACM Vol 7 No 3 March 1964

15 E E ZAJAC Two-gyro, gravity gradient attitude control system Bell Telephone Laboratories Film

16 S VANDERBEECK J H WHITNEY Several animated films made with the aid of a computer

17 Design and the computer Design Quarterly 66/67 Walker Art Center Minneapolis Minn

18 A M NOLL The digital computer as a creative medium IEEE Spectrum October 1967

19 J REICHARDT Cybernetic serendipity, the computer and the arts Studio International London and New York 1968

20 J C R LICKLIDER Man-computer symbiosis Trans IRE PGHFE HFE-1 4 1960

21 I E SUTHERLAND Sketchpad: a man-machine graphical communication system MIT Lincoln Laboratory Technical Report No 296 Jan 1963 Proc SJCC 1963

22 J C R LICKLIDER Man-computer partnership International Science and Technology May 1965

23 M A DAVIS T O ELLIS The rand tablet: a man-machine communication device Proc FJCC 1964

24 J F TEIXERA R P SALLEN The sylvania data tablet Proc SJCC 1968

25 L G ROBERTS The lincoln wand Proc FJCC 1966

26 J E CURRY A tablet input facility for an interactive graphics system Proc of the International Joint Conference on Artificial Intelligence 1969

27 E MARTIN Private Communication

28 R M BAECKER Interactive computer-mediated animation Ph D Dissertation Department of Electrical Engineering MIT March 1969

29 T MIURA J IWATA J TSUDA An application of hybrid curve generation - cartoon animation by electronic computers Proc SJCC 1967

30 J NOLAN L YARBROUGH An on-line computer drawing and animation system IFIPS 1968

31 W H HUGGINS D R ENTWISLE Exploratory studies of films for engineering education Department of Electrical Engineering The Johns Hopkins University Report to US Office of Education September 1968

32 W R SUTHERLAND J W FORGIE M V MORELLO Graphics in time-sharing: a summary of the TX-2 experience Proc SJCC 1969

33 W H HUGGINS D R ENTWISLE Computer animation for the academic community Proc SJCC 1969

34 N MCLAREN Quotation in animation exhibit in the Canadian Cinematique Pavilion EXPO '68 Montreal National Film Board Canada