
12. System Aspects of Low-Cost Bitmapped Displays

David Rosenthal and James Gosling

ABSTRACT

The design of low-cost bitmapped displays is reviewed from the perspective of the implementers of a window manager for a Unix system. The interactions between RasterOp hardware, the multiprocess structure of Unix software, and the functions of the window manager are discussed in the form of a checklist of features for hardware designers.

12.1 INTRODUCTION

Chip manufacturers are announcing products designed to improve the performance and reduce the cost of bitmapped displays. Workstation manufacturers are marketing systems featuring a Unix operating system and a window manager on such displays. The window manager has an overwhelming effect on the perceived performance of the workstation it is running on.

We have recently designed and implemented a window manager (see Chapter 13) for a 4.2BSD Unix system. It is intended to be easy to port between displays. It runs on the SUN workstation, and throughout its development we have been reviewing designs for other displays that are to be used in the future. In attempting to obtain the best performance from the SUN displays and remain portable to these others, we have encountered many interactions between display hardware features and window manager software; what follows is a compilation of our experiences.

12.2 MODELS

To provide a context for the discussion of these interactions, we set out the range of hardware and software we are concerned with.

12.2.1 Hardware

Workstation hardware typically consists of a processor, memory management hardware, memory, and I/O devices including a mouse, keyboard, and monochrome display. The display may have either:

pixels visible in the CPU's address space, with RasterOps performed by the CPU (possibly with special hardware assistance);
pixels not addressable by the CPU, but manipulated by an autonomous RasterOp processor, communication with which is via registers or command queues in the CPU's address space.

12.2.2 Software

This hardware typically runs a form of the Unix operating system, whose importance for this discussion is that it supports multiprocess interaction. When multiple interactive processes compete for a finite real screen resource, arbitration is undertaken by some form of window manager, either:

a part of the Unix kernel, accessed via special system calls, or
a special user-level process, accessed via interprocess communication channels.

The RasterOps affecting the parts of the screen resource allocated to each client may be performed by:

the kernel, when the client requests them using system calls;
the user level window manager process, when the client requests them using remote procedure calls;
the client directly, if it has the pixels or RasterOp processor control registers mapped into its address space.

The goal in all cases is to provide clients with a protected RasterOp, one that can affect only those pixels allocated to the client.

12.3 CHECKLIST

After providing memory to store the pixels, and a mechanism to generate the video output, the designer of a low-cost bitmap display is faced with the question of how much RasterOp support is required. With careful design, at least for the MC68000 (which can exploit auto-increment addressing), many cases of monochrome RasterOp run close to the limit set by display memory bandwidth even if they are done entirely in software. Next-generation microprocessors typically have barrel shifters, making the alignment shifts of software RasterOps much faster. If special RasterOp hardware is to be cost-effective compared to a software solution, the following points should be considered.

Can user processes operate on the bitmap without system call overhead?
If special RasterOp hardware is provided, can a client access the bitmap without using it?
Can a client given access to the bitmap be prevented from accessing other I/O devices?
If special RasterOp hardware is provided, can multiple user processes access it?
If the RasterOp hardware is an autonomous processor, does it support one or many command queues?
Can the RasterOp hardware be used off-screen?
Does the RasterOp processor implement clipping?
If the display has a colour map, can it be shared between multiple windows?
Does the display support a cursor?
Can the display track the mouse autonomously?
Does the hardware allow non-rectangular RasterOps?
Can the display draw characters fast?
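
To make the alignment shifts of a software RasterOp concrete, the following is a minimal sketch of one row of a monochrome RasterOp in C. It is not taken from our window manager: the names rop_row, fetch32 and rop_fn are hypothetical, the destination is assumed to be word-aligned, and only four of the possible Boolean functions are shown. The pair of shifts in fetch32 is the work that a barrel shifter speeds up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function codes; only a subset is shown in this sketch. */
enum rop_fn { ROP_COPY, ROP_OR, ROP_AND, ROP_XOR };

/* Fetch 32 source pixels starting at bit offset `bit`, where bit 0 is the
 * most significant bit of src[0]. The shift pair is the alignment shift. */
static uint32_t fetch32(const uint32_t *src, size_t bit)
{
    const uint32_t *w = src + bit / 32;
    unsigned s = (unsigned)(bit % 32);

    if (s == 0)
        return w[0];                 /* already aligned: no shift needed */
    return (w[0] << s) | (w[1] >> (32 - s));
}

/* Apply the selected Boolean function to a source and destination word. */
static uint32_t combine(enum rop_fn fn, uint32_t s, uint32_t d)
{
    switch (fn) {
    case ROP_OR:  return s | d;
    case ROP_AND: return s & d;
    case ROP_XOR: return s ^ d;
    default:      return s;          /* ROP_COPY */
    }
}

/* Combine nwords * 32 source pixels, starting src_bit bits into src, with a
 * word-aligned destination row. When src_bit is not a multiple of 32, the
 * source must extend one word beyond the last word consumed. */
void rop_row(uint32_t *dst, const uint32_t *src, size_t src_bit,
             size_t nwords, enum rop_fn fn)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = combine(fn, fetch32(src, src_bit + 32 * i), dst[i]);
}
```

A full RasterOp would add edge masks for unaligned destinations and a loop over rows; those are omitted here to keep the inner loop visible.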

12.4 DISCUSSION

Can user processes operate on the bitmap without system call overhead?

Either the bitmap itself, or the registers and command queues controlling the RasterOp processor, or both, must be capable of appearing in the address space of one or more user processes. In this way the process can do RasterOps directly; the cost of a system call (perhaps 0.3 ms) per RasterOp is prohibitive.
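
The scale of this overhead can be illustrated with a small calculation. The sketch below takes the 0.3 ms figure from the text (300 microseconds); the function name and the example workload, a hypothetical 80 × 24 character window repainted with one RasterOp per character, are assumptions made for illustration only.

```c
#include <assert.h>

/* Total system call overhead, in microseconds, for n_rops RasterOps when
 * each one pays per_call_us microseconds to enter and leave the kernel. */
long syscall_overhead_us(long n_rops, long per_call_us)
{
    assert(n_rops >= 0 && per_call_us >= 0);
    return n_rops * per_call_us;
}
```

Under these assumptions, syscall_overhead_us(80 * 24, 300) gives 576000 microseconds: more than half a second spent on kernel entry and exit alone, before any pixel is touched.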

If special RasterOp hardware is provided, can a client access the bitmap without using it?

The presence of RasterOp hardware does not eliminate the need for the CPU to access the pixels directly. Unless the function set of the RasterOp processor matches the application requirements exactly, some display operations will need to be implemented in software. Inappropriate support that cannot be programmed around is worse than none.

Can a client given access to the bitmap be prevented from accessing other I/O devices?

A corollary of the need for user processes to address the bitmap is that the system's memory management and protection unit must be capable of controlling access to the I/O space at a relatively fine grain. It should not be necessary to trust a graphics process with access to the disc controller hardware.

If special RasterOp hardware is provided, can multiple user processes access it?

If user processes can access the RasterOp hardware directly, its internal state such as source and destination coordinates, function codes, mask bits, and so on must be regarded as part of the state of a process. They must in general be saved and restored across context switches; the performance impact of doing so can be severe (even if hardware permits it).

The overall impact of saving and restoring the RasterOp processor's state can be reduced if the processes using it can be identified. The analogous problem for floating point processors has traditionally been solved by initializing them to a state in which an attempt by a process to use them will cause an interrupt. At that point the process is known to use the processor, and can be marked as needing the extra context. Thus, if the RasterOp processor has any context to save, either the memory management unit or the RasterOp processor itself should be capable of generating an interrupt on all access attempts.
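
The scheme can be sketched as follows. This is a simulation in ordinary C, not kernel code: the structure layouts, the global hw standing in for the device registers, and the function names are all hypothetical, and a real system would enable or disable access through the memory management unit rather than a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical RasterOp register set that forms part of a process's state. */
struct rop_state { int src_x, src_y, dst_x, dst_y; unsigned function, mask; };

struct proc {
    bool uses_rop;              /* set on the first trapped access */
    struct rop_state saved;     /* valid only when uses_rop is true */
};

struct rop_state hw;            /* stand-in for the device registers */
bool hw_enabled;                /* false: the next access will trap */

/* Called from the access trap. The faulting process is now known to use
 * the RasterOp processor, so mark it and give it a clean register set. */
void rop_first_use_trap(struct proc *p)
{
    assert(p != NULL && !hw_enabled);
    p->uses_rop = true;
    hw = (struct rop_state){0};
    hw_enabled = true;
}

/* Save and restore RasterOp context only for processes marked as users. */
void context_switch(struct proc *from, struct proc *to)
{
    assert(from != NULL && to != NULL);
    if (from->uses_rop)
        from->saved = hw;
    if (to->uses_rop) {
        hw = to->saved;
        hw_enabled = true;
    } else {
        hw_enabled = false;     /* arm the trap for a possible first use */
    }
}
```

Processes that never touch the RasterOp processor pay nothing on a context switch beyond the test of uses_rop.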

If the RasterOp hardware is an autonomous processor, does it support one or many command queues?

Designs with autonomous RasterOp processors can reduce the need to save and restore RasterOp context by implementing several independent command queues and multiplexing these queues together. The window manager can then assign a queue to each window and treat them as if they had independent processors. A means to drain the queues before changing the shape or position of a window will be needed.
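
A minimal software model of such a design is sketched below. The queue count, queue depth, command layout and round-robin multiplexing policy are assumptions chosen for illustration; the text does not prescribe any of them. Commands are "executed" simply by consuming them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NQUEUES 4               /* hypothetical: one queue per window */
#define QSIZE   8               /* must be a power of two */

struct rop_cmd { int dst_x, dst_y, width, height; };

struct cmd_queue {
    struct rop_cmd cmds[QSIZE];
    unsigned head;              /* index of the next command to execute */
    unsigned tail;              /* index of the next free slot */
};

struct cmd_queue queues[NQUEUES];
unsigned next_queue;            /* round-robin position of the multiplexer */
unsigned long executed;         /* commands consumed so far */

bool queue_empty(const struct cmd_queue *q)
{
    assert(q != NULL);
    return q->head == q->tail;
}

/* Append a command; returns false if the queue is full. */
bool queue_put(struct cmd_queue *q, struct rop_cmd c)
{
    assert(q != NULL);
    if (q->tail - q->head == QSIZE)
        return false;
    q->cmds[q->tail % QSIZE] = c;
    q->tail++;
    return true;
}

/* One engine step: consume one command from the next non-empty queue. */
bool engine_step(void)
{
    for (unsigned i = 0; i < NQUEUES; i++) {
        struct cmd_queue *q = &queues[(next_queue + i) % NQUEUES];
        if (!queue_empty(q)) {
            q->head++;          /* a real engine would perform the RasterOp */
            executed++;
            next_queue = (next_queue + i + 1) % NQUEUES;
            return true;
        }
    }
    return false;
}

/* Wait until every command queued for one window has been performed. */
void queue_drain(struct cmd_queue *q)
{
    assert(q != NULL);
    while (!queue_empty(q))
        engine_step();
}
```

The window manager would call queue_drain on a window's queue before moving or reshaping that window, so that no queued command is performed against the old geometry.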

Can the RasterOp hardware be used off-screen?

Many window managers require RasterOps that operate uniformly on rasters both on and off the screen. Off-screen rasters are typically in process virtual address space. If the RasterOp support cannot be applied to these rasters, the window manager will have to implement a software RasterOp even if it is not used on-screen.

Does the RasterOp processor implement clipping?

A major role of a window manager is providing client processes with a protected RasterOp, that is, ensuring that a client can draw only within its assigned area of the screen. Thus, much of the window manager's processing is devoted to clipping. The window manager must apply a clipping rectangle to all the client output, though within these limits the client may wish to impose a smaller clipping rectangle. Hardware support for clipping is useful, but it would be much more useful if it provided both a system clip rectangle that could not be changed from user mode, and also a user clip rectangle. Output would be clipped to the intersection of the two. The window manager would set the system rectangle to the window, and the client would set the user rectangle as desired. Even better would be clipping to the union of a set of rectangles for each mode, to permit clipping to partially overlapped windows.
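
The two-level clip described above amounts to two rectangle intersections. The sketch below shows the single-rectangle case only; the half-open rectangle representation and the names rect_intersect and clip_rop are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open rectangle: covers x0 <= x < x1 and y0 <= y < y1. */
struct rect { int x0, y0, x1, y1; };

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Store the intersection of a and b in *out; false if it is empty. */
bool rect_intersect(struct rect a, struct rect b, struct rect *out)
{
    assert(out != NULL);
    out->x0 = imax(a.x0, b.x0);
    out->y0 = imax(a.y0, b.y0);
    out->x1 = imin(a.x1, b.x1);
    out->y1 = imin(a.y1, b.y1);
    return out->x0 < out->x1 && out->y0 < out->y1;
}

/* Clip a destination rectangle to the intersection of the system clip
 * (set by the window manager) and the user clip (set by the client). */
bool clip_rop(struct rect system_clip, struct rect user_clip,
              struct rect dst, struct rect *out)
{
    struct rect both;

    return rect_intersect(system_clip, user_clip, &both) &&
           rect_intersect(both, dst, out);
}
```

Because the system rectangle is always part of the intersection, no value the client places in the user rectangle can extend drawing beyond the window.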

If the display has a colour map, can it be shared between multiple windows?

Just as the pixels are a real resource to be shared among competing clients, so also are the entries in the colour map. Clients should also be able to use a number of different pixel values and corresponding colour map entries without being aware that other windows are also using the colour map. For example, all windows should be able to use pixel values from 0 to some limit.

Hardware assistance for this sharing would be useful. The window manager should be able to specify a number of rectangles, in each of which the relationship between the pixel value and the colour map entry it selected would be different. These rectangles might select from a number of independent full-size colour maps, or provide a base register to be added to the pixel value before the colour lookup (and perhaps a limit register to truncate the range).
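
The base and limit register variant might behave as in the following sketch. The region table, the clamping interpretation of "truncate the range", and the identity mapping outside every region are assumptions; the hardware could equally well mask the pixel value or reject it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-rectangle colour map registers. The rectangle is
 * half-open: x0 <= x < x1 and y0 <= y < y1. */
struct cmap_region {
    int x0, y0, x1, y1;
    unsigned base;              /* added to the pixel value before lookup */
    unsigned limit;             /* largest pixel value the window may use */
};

/* Return the colour map entry selected by `pixel` at screen position (x, y). */
unsigned cmap_index(const struct cmap_region *regions, size_t n,
                    int x, int y, unsigned pixel)
{
    assert(regions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct cmap_region *r = &regions[i];
        if (x >= r->x0 && x < r->x1 && y >= r->y0 && y < r->y1) {
            if (pixel > r->limit)
                pixel = r->limit;   /* assumed: clamp to the window's range */
            return r->base + pixel;
        }
    }
    return pixel;                   /* assumed: identity outside all regions */
}
```

Two windows that each use pixel values 0 to 7 can then be given bases 0 and 8, so that neither is aware of the other's colour map entries.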

Does the display support a cursor?

An essential feature of a window system is a cursor, tracking a pointing device around the screen. The cursor can be displayed either:

by temporarily changing some pixels in the bitmap from which the screen is refreshed, or
by mixing the video from two separate bitmaps, one for the screen image and one for the cursor.

The first is often preferred for low-cost systems; it requires little or no extra hardware but can impose significant performance loss. The problem is that if the cursor affects pixels in the screen bitmap, it must be absent during any RasterOp that affects those pixels. There are two approaches to ensuring that the cursor gets removed:

The process performing the RasterOp can handshake with the process maintaining the cursor before every RasterOp. (Ideally, it would do so only when the source or destination rectangles overlapped the cursor, but this is normally impracticable.) The cost of the necessary system call on every RasterOp, or at least on every RasterOp when the cursor is displayed, is very significant, and the cursor will flicker badly.

This problem recurs in a milder form even if the process performing the RasterOps is also the process maintaining the cursor. The synchronization cost is less, but the cursor flicker is still present. It is particularly offensive because the cursor is typically the focus of the user's attention.

If the video refresh controller is capable of interrupting when a specified scanline is reached, the interrupt routine can arrange for this to happen shortly before the first scanline containing the cursor. It can then put the cursor into the bitmap, wait at interrupt level until the refresh has passed the cursor, and then remove it. No user RasterOps can occur while the cursor is in the bitmap, and the cursor will not flicker, but the cost is the fraction of the CPU corresponding to the ratio of the cursor height to the screen height.

Video-mixed cursors are normally regarded as too expensive for low-cost displays, because they require either a complete second bitmap, or a smaller bitmap plus extra logic to position the cursor. But their effect on overall performance is so great that this may be a mistake. They should be considered in the design of advanced video generator and controller chips.

Can the display track the mouse autonomously?

Another major load on the system is tracking the mouse. Autonomous display hardware sufficiently intelligent to monitor locations in main memory for the mouse coordinates, and to position the cursor to correspond, would off-load significant processing. The off-load would be greater if it supported a clip rectangle for the cursor, and could interrupt if the cursor tracked across the boundary. Many window systems wish to change the cursor shape as it tracks across windows, or even regions within windows.

Does the hardware allow non-rectangular RasterOps?

Non-rectangular RasterOps such as fill trapezoid can significantly improve the performance of applications using polygonal graphics, but need careful implementation. In particular, if they are to abut correctly it must be possible to save and restore the error parameters of the Bresenham or other algorithms tracing their edges. This is an example of the need for subpixel addressing, which also occurs in greyscale and other antialiased applications.
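
The need to save the error parameters can be seen in a small sketch of an integer edge stepper. The structure and function names are hypothetical, and the stepper shown is a simple truncating variant rather than any particular hardware's algorithm; it assumes the edge runs downwards (y1 greater than y0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* State of one trapezoid edge, advanced one scanline at a time. */
struct edge {
    int x;      /* current pixel column */
    int err;    /* accumulated error, always 0 <= err < dy */
    int dx;     /* absolute horizontal extent of the whole edge */
    int dy;     /* vertical extent of the whole edge */
    int sx;     /* +1 or -1: direction of horizontal movement */
};

void edge_init(struct edge *e, int x0, int y0, int x1, int y1)
{
    assert(e != NULL && y1 > y0);
    e->x = x0;
    e->err = 0;
    e->dx = abs(x1 - x0);
    e->dy = y1 - y0;
    e->sx = x1 >= x0 ? 1 : -1;
}

/* Advance the edge by one scanline. */
void edge_step(struct edge *e)
{
    assert(e != NULL);
    e->err += e->dx;
    while (e->err >= e->dy) {
        e->err -= e->dy;
        e->x += e->sx;
    }
}
```

For the edge from (0, 0) to (7, 5), stepping three scanlines gives x = 4. If the trapezoid is split after two scanlines and the lower part is continued from a saved copy of the structure, the third scanline again gives x = 4. If instead the lower part is restarted from the pixel reached, (2, 2), with edge_init, the third scanline gives x = 3, and the two trapezoids no longer abut along the original edge.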

Can the display draw characters fast?

The overwhelming majority of RasterOps will paint a character. The cost of these will be dominated by setup time unless the font contains very large characters, or the client sends long strings of contiguous characters. Note that single-character RasterOps are important as echoes of user type-ins. What you see is what you get (WYSIWYG) editors are major applications for this class of display, and their performance is dominated by repainting characters as the user types.

Thus, the design of RasterOp engines should consider making character-drawing a special case, so that the hardware understands the font tables, knows the amount of shim space to add between printing and space characters, and so on. This spreads the setup time across as many characters as possible.
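
A string-level request of this kind might look like the sketch below, which computes the pen position of every character in one call. The font table layout, the function name, and the reading of "shim space" as one increment added after printing characters and another added after space characters are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical font table: advance width for each 7-bit character code. */
struct font { unsigned char width[128]; };

/* Lay out a whole string in one request. Any per-request setup (font
 * selection, clipping, function code) would be done once, before the loop;
 * the loop itself only advances the pen. The pen position of character i
 * is written to pen_x[i]; the position after the last one is returned. */
int layout_string(const struct font *f, const char *s, int x,
                  int char_shim, int space_shim, int *pen_x)
{
    assert(f != NULL && s != NULL && pen_x != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i] & 0x7F;

        pen_x[i] = x;
        x += f->width[c] + (c == ' ' ? space_shim : char_shim);
    }
    return x;
}
```

With a hardware equivalent, the client would issue one request per string rather than one per character, and justified text would need no per-character arithmetic in software.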

Typical windows may contain characters from a large number of fonts; twelve per window is not uncommon. The fonts from which characters are drawn are frequently stored in the 224 × 1024 pixel off-screen area of an 800 × 1024 pixel display. Thus, specialized RasterOp hardware can be used to paint characters even if it can access only the bits in the hardware bitmap.

Unfortunately, Parkinson's Law shows that this space is insufficient to store all the required fonts, so that the off-screen space can at best be a cache for active fonts, and cache misses will occur. The code to manage free space in the font cache, to gather usage information, and to perform cache reloads on misses is difficult to write, and imposes disturbing performance irregularities. Font definitions, and the individual glyphs, are variable-size, adding to the normal problems of writing a pager.
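
The core of such a cache is sketched below under a strong simplifying assumption: every font occupies one fixed-size slot, which sidesteps exactly the variable-size problem described above. The slot count, the least-recently-used replacement policy and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 4           /* hypothetical: fonts that fit off-screen */

struct font_cache {
    int font_id[CACHE_SLOTS];           /* -1 marks an empty slot */
    unsigned long last_used[CACHE_SLOTS];
    unsigned long clock;                /* counts lookups */
    unsigned long misses;               /* counts reloads */
};

void cache_init(struct font_cache *c)
{
    assert(c != NULL);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        c->font_id[i] = -1;
        c->last_used[i] = 0;
    }
    c->clock = 0;
    c->misses = 0;
}

/* Return the slot holding `font`, reloading it over the least recently
 * used slot if it is not currently cached. */
int cache_lookup(struct font_cache *c, int font)
{
    int victim = 0;

    assert(c != NULL && font >= 0);
    c->clock++;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->font_id[i] == font) {
            c->last_used[i] = c->clock;         /* hit */
            return i;
        }
        if (c->last_used[i] < c->last_used[victim])
            victim = i;
    }
    c->misses++;        /* a real cache would copy the glyphs in here */
    c->font_id[victim] = font;
    c->last_used[victim] = c->clock;
    return victim;
}
```

Even in this simplified form, a window using more fonts than there are slots incurs a reload on most lookups, which is the source of the performance irregularities noted above.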

If the RasterOp hardware is not restricted to accessing the pixels in the hardware bitmap, but can access the whole of physical memory (even at reduced bandwidth), the problem is easier. The font cache can now be larger, stored in wired-down pages of system physical memory. But it is still only a cache.

If, however, the hardware support for RasterOp is applicable even in process virtual address space, the fonts can be stored in virtual memory, and the system's pager can deal with the problem of ensuring that the fonts in use are readily accessible. Virtual memory RasterOps are normally available only if the RasterOp is implemented by the CPU, whether in software, in microcode, or as a processor extension chip.

12.5 CONCLUSION

We have set out a number of points worthy of consideration in the design of low-cost bitmap displays intended to support multiprocess interaction. Although RasterOp hardware may appear attractive, it needs careful design if its potential is to be realized fully, particularly for drawing characters. Assistance with cursor drawing and sharing of the colour map may be more cost-effective uses for limited hardware resources.

12.6 ACKNOWLEDGEMENTS

This work originates from insightful critiques of some proposed hardware designs given by Bob Sproull. Bob Sidebotham, Andy Palay, Fred Hansen and Bruce Lucas helped implement the ITC's window manager.

Anyone attempting to design bitmap displays should read the paper by Pike, Locanthi and Reiser [52].