Digital Zooming

Roger Nagel, Azriel Rosenfeld

University of Maryland and MAGI

1970

UAIDE

Abstract

Magnifying or demagnifying a digital picture is a nontrivial operation. The gray level to be assigned to each point in the output picture must be computed by examining a neighborhood of the corresponding point in the input picture; this point wise process is time consuming for large pictures. In a zoom sequence, the scale change from frame to frame is small, and great savings in computation time can be achieved. This paper presents an algorithm for digital zooming, and also discusses how to remove some of the artifacts which occur if zooming is performed point by point.

Introduction

The state of the art in computer generated motion pictures is well beyond feasibility studies. The literature abounds with the results of various pilot projects and even contains reports on successful commercial ventures. Computer generated movies began with line drawings on a microfilm plotter and have moved rapidly into full gray scale and color productions. This has been accompanied by a move from the consideration of general feasibility to that of achieving various special effects. In this paper we consider one such effect - the zoom sequence. This problem is interesting and nontrivial because of the need to minimize computing time, and avoid introducing artifacts in the simulation. The desire to have a zoom sequence in a motion picture stems from the idea of being able to simulate the capability of a real camera to move toward or away from a scene.

Analytic zoom

A zoom is a magnification or a change in scale. There exists a simple function (x' = cx, y'= cy) to represent this mapping of points from one frame to another (expanded or contracted) frame. It is important at this point to clarify what is meant by an expanded frame. In any zoom there is always a center of zoom, some point on the picture, not necessarily the physical center. The expanded picture will be physically the same size as the original; however, the information in the picture will be expanded about the center of zoom, and some information in the old picture will not appear in the new picture.

For the class of pictures whose content is limited to line drawings, the zoom function is readily implementable. That is, if the picture information is stored in a data structure rather than in a raster format one can straightforwardly and rapidly perform zooms analytically. The reason for this is that in such pictures the background need not be examined or mapped into the new frame; it always remains invariant under scale change.

For grayscale and color pictures, however, it is necessary to use the raster format to store pictures. This is because each point in the picture can vary in gray level and/or color. The fixed-level background is a rarity which cannot be counted on. A T.V. quality picture requires on the order of a 500 x 500 raster, which is about 250,000 picture points. In this case the analytical method which works on a point by point basis becomes intolerably slow.

In initial studies of the analytic zoom using a Univac 1108 computer, it was determined that the time required to do a 2:1 zoom of a 72 × 72 point picture, lasting 100 frames, was on the order of two minutes. For larger pictures, the time would go up roughly as the picture area. This is clearly very costly.

Before describing the much more economical algorithm in the next section, a few remarks on the analytic zoom are in order. When a picture is stored as a data structure, it can be zoomed by simply adjusting the values of parameters associated with the data; for example, a line segment between two points (x, y) and (x2, y2), represented in the data structure by these two pairs of coordinates, can be expanded by the factor c by replacing these coordinates by the new pairs (cx1, cy1) and (cx2, cy2). (This assumes that the center of the zoom is at the origin.) This simple forward approach cannot be used when the picture is in raster form, since it would leave gaps in the output picture. Instead, we must use an inverse approach, in which a gray level is assigned to each raster point (x', y') of the output picture by mapping it back into the input picture (x = 1/c x', y = 1/c y'). Here (x,y) is no longer necessarily a raster point of the input picture; to pick a gray level for (x', y'), we can do either of two things: (1) assign it the gray level of the raster point nearest to (1/c x',1/c y'), or (2) assign it a gray level which is a weighted average of the gray levels of the four raster points surrounding (1/c x', 1/c y'). These ideas are discussed more fully in [1].

Cracked zoom

On examining a zoom sequence one is immediately struck with the small amount of change which takes place from frame to frame. For example, a 2 to 1 zoom might take place over 100 frames of the motion picture. It in fact turns out that most requirements for motion picture zoom sequences would be satisfied by an algorithm capable only of a maximum change of a few percent per frame.

When a picture represented in raster format is zoomed by a small amount, using the nearest-neighbor scheme described above, groups of points in the picture tend to move in large blocks from their positions in the original picture to positions in the new picture. In fact, for a 1% zoom these blocks are of size 100 × 100, for 5% the block size is 20 × 20 and for 10% it is 10 × 10. An example is in order. Consider the case of a one dimensional zoom:

Original:   ...EEECCCAAAOBBBDDDFFF...  (O is the center of zoom)
Zoomed:  ...EEEECCCCAAAAOBBBBDDDDFFFF...

Here one can think of the zoomed pictured as being constructed by shifting the C's one space to the left, the E's two spaces to the left, the D's one space to the right, the F's two spaces to the right, and so on; the cracks resulting from these unequal shifts are mended by adding a new A, a new c, and so on.

Thus to perform a zoom involving only a small scale change, one can compute the corresponding block size, shift the elements of the picture by blocks, and interpolate to fill the resulting cracks. This process was implemented using the PAX picture processing system [2]. in which shifting operations are quite efficient. It was found simplest for this implementation to treat first the x direction (so that the blocks become vertical strips) and then the y direction (horizontal strips). One can prove that this method does not alter the results. A description of the algorithm as originally implemented follows:

Step 1
Accept as input the amount of zoom (e.g., 1.10 for 10%), center of zoom, and location of input and output pictures.
Step 2
Compute BLOCK FACTOR = 1.0 / (Amount of zoom - 1.0) = N The result here is truncated to greatest integer. For zooms < 10% the resultant error is small.
Step 3
Arrange the strips so that the center of zoom lies at the center of its strip. (This may create narrow strips at the edges of the picture.) Copy each strip into the output picture; then copy its last column (or row) - i.e., that farthest from the center - into the output again. This stretches the strip from a width of N points in old picture to N+1 in the new picture. Repeat until the edge of the picture is reached; then repeat for the opposite direction until the opposite edge is reached.
Step 4
Repeat step 3 in the orthogonal direction. The final result is the desired zoomed frame.

Since the successive frames are supposed to increase linearly in magnification, one cannot produce an exact 2% expansion by performing a 1% expansion twice; rather, one should go back to the original picture and expand it 2%, However, this is evidently costly. Furthermore, after a few steps the zoom factor has become large, and the cracking method has lost its advantages. In a 2:1 zoom sequence, one would like to capitalize on the fact that by the end of the sequence, every row and column has been duplicated exactly once. One can do this by sacrificing the exactness of the intermediate steps. To zoom 2:1 in 100 steps, one can duplicate one row (and column) in every strip on the first step; then duplicate a different one on the second step; and so on until each row and column has been duplicated once, at which point an exact 2:1 expansion has been achieved. The problem now is how to choose the sequence of rows and columns to be duplicated. If this is done in too systematic a fashion, the zoom sequence will appear unnatural, since the eye can spot the periodicity.

Staggered cracked zoom

To solve this problem, an aperiodic method of staggering the cracks is needed. One possibility would have been to use a random selection, but insuring that no column or row is duplicated more than once (in a 2:1 zoom). The method actually used was simpler and still gave a uniform stretching effect over the zoom sequence. The staggered cracks are produced by successive bisection of the strips.

In the case of a 2:1 zoom over 100 frames, the strip width is 100 points. The successive cracks in each strip are chosen at 50; 25; 75; 12; 37; 62; 87; 6; and so on.

This zoom algorithm produced significant savings in computer time, as shown in the table below.

Picture sizeAverage tine per frame (seconds)
Analytic zoomStaggered cracked zoom
72 × 721.20.7
480 × 36026.04.2

The algorithms were implemented in PAX, and were not squeezed for coding efficiency. A Fortran IV machine-independent version of the algorithm is being written; it is expected to yield substantial further savings in computer time.

References

1. E. G. Johnston and A. Rosenfeld, Geometrical operations on digital pictures in B. S. Lipkin and A. Rosenfeld, eds., Picture Processing and Psychopictorics, N. Y.: Academic Press, 1970, 217-240.

2. E. B. Butt and J. W. Snively, Jr. (Revised version edited by E. G. Johnston and Roger Lipsett), The PAX II Picture Processing System, University of Maryland Computer Science Center Technical Report TR-68-67, May 1968 (Revised September 1969).