12. System Aspects of Low-Cost Bitmapped Displays
David Rosenthal and James Gosling
ABSTRACT
The design of low-cost bitmapped displays is reviewed from the perspective
of the implementers of a window manager for a Unix system. The interactions
between RasterOp hardware, the multiprocess structure of Unix software, and
the functions of the window manager are discussed in the form of a checklist
of features for hardware designers.
12.1 INTRODUCTION
Chip manufacturers are announcing products designed to improve the performance
and reduce the cost of bitmapped displays. Workstation manufacturers are marketing
systems featuring a Unix operating system and a window manager on such displays. The
window manager has an overwhelming effect on the perceived performance of the
workstation it is running on.
We have recently designed and implemented a window manager (see Chapter 13)
for a 4.2BSD Unix system. It is intended to be easy to port between displays.
It runs on the SUN workstation, and throughout its development we have been
reviewing designs for other displays that are to be used in the future. In
attempting to obtain the best performance from the SUN displays and remain
portable to these others, we have encountered many interactions between display
hardware features and window manager software; what follows is a compilation of
our experiences.
12.2 MODELS
To provide a context for the discussion of these interactions, we set out the
range of hardware and software we are concerned with.
12.2.1 Hardware
Workstation hardware typically consists of a processor, memory management hardware,
memory, and I/O devices including a mouse, keyboard, and monochrome display.
The display may have either:
- pixels visible in the CPU's address space, with RasterOps performed
by the CPU (possibly with special hardware assistance);
- pixels not addressable by the CPU, but manipulated by an autonomous RasterOp
processor, communication with which is via registers or command queues in the
CPU's address space.
12.2.2 Software
This hardware typically runs a form of the Unix operating system,
whose importance for this discussion is that it supports multiprocess interaction.
When multiple interactive processes compete for a finite real screen resource,
arbitration is undertaken by some form of window manager, either:
- a part of the Unix kernel, accessed via special system calls, or
- a special user-level process, accessed via interprocess communication channels.
The RasterOps affecting the parts of the screen resource allocated to
each client may be performed by:
- the kernel, when the client requests them using system calls;
- the user-level window manager process, when the client requests them using
remote procedure calls;
- the client directly, if it has the pixels or RasterOp processor control
registers mapped into its address space.
The goal in all cases is to provide clients with a protected RasterOp,
one that can affect only those pixels allocated to the client.
12.3 CHECKLIST
After providing memory to store the pixels, and a mechanism to generate
the video output, the designer of a low-cost bitmap display is faced with the
question of how much RasterOp support is required. With careful design, at
least for the MC68000 (which can exploit auto-increment addressing), many cases
of monochrome RasterOp are close to the limit set by display memory bandwidth
even if they are done entirely in software. Next-generation microprocessors
typically have barrel shifters, making the alignment shifts of software
RasterOps much faster. If special RasterOp hardware is to be cost-effective
compared with a software solution, the following points should be considered.
- Can user processes operate on the bitmap without system call overhead?
- If special RasterOp hardware is provided, can a client access the bitmap
without using it?
- Can a client given access to the bitmap be prevented from
accessing other I/O devices?
- If special RasterOp hardware is provided, can multiple user processes
access it?
- If the RasterOp hardware is an autonomous processor, does it support
one or many command queues?
- Can the RasterOp hardware be used off-screen?
- Does the RasterOp processor
implement clipping?
- If the display has a colour map, can it be shared between multiple windows?
- Does the display support a cursor?
- Can the display track the mouse autonomously?
- Does the hardware allow non-rectangular RasterOps?
- Can the display draw characters fast?
12.4 DISCUSSION
- Can user processes operate on the bitmap without system call overhead?
- Either the bitmap itself, or the registers and command queues controlling the
RasterOp processor, or both must be capable of appearing in the address space of one
or more user processes. In this way the process can do RasterOps directly; the
cost of a system call (perhaps 0.3 ms) per RasterOp is prohibitive.
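A rough estimate shows the scale of the problem. Only the 0.3 ms figure comes from
the text; the window size and the assumption of one RasterOp per character are
illustrative.

```python
SYSCALL_COST_S = 0.3e-3  # per-RasterOp system call cost quoted in the text


def syscall_overhead_seconds(raster_ops, cost_per_call=SYSCALL_COST_S):
    """Time spent purely on system call overhead, excluding the RasterOps."""
    return raster_ops * cost_per_call


# Hypothetical window of 80 columns by 60 rows, one RasterOp per character.
chars = 80 * 60
overhead = syscall_overhead_seconds(chars)
print(f"{chars} RasterOps cost {overhead:.2f} s in system calls alone")
```

Under these assumptions, repainting one full window spends well over a second in
system call overhead before any pixel is touched.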
- If special RasterOp hardware is provided, can a client access the bitmap
without using it?
- The presence of RasterOp hardware does not eliminate the need for the CPU to
access the pixels directly. Unless the function set of the RasterOp processor matches
the application requirements exactly, some display operations will need to be
implemented in software. Inappropriate support that cannot be programmed around
is worse than none.
- Can a client given access to the bitmap be prevented from accessing other I/O devices?
- A corollary of the need for user processes to address the bitmap is that the
system's memory management and protection unit must be capable of controlling
access to the I/O space at a relatively fine grain. It should not be necessary
to trust a graphics process with access to the disc controller hardware.
- If special RasterOp hardware is provided, can multiple user processes access it?
- If user processes can access the RasterOp hardware directly, its internal state
such as source and destination coordinates, function codes, mask bits, and so on must
be regarded as part of the state of a process. They must in general be saved and
restored across context switches; the performance impact of doing so can be severe
(even if hardware permits it).
- The overall impact of saving and restoring the RasterOp processor's state can be
reduced if processes using it can be identified. The analogous problem for
floating-point processors has traditionally been solved by initializing them to a state in
which an attempt by a process to use them will cause an interrupt. At that point
the process is known to use the processor, and can be marked as needing the
extra context. Thus, if the RasterOp processor has any context to save, either the memory
management unit or the processor itself should be capable of generating an
interrupt on all access attempts.
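The trap-on-first-use scheme can be sketched as a small simulation. The class names,
register names, and the dictionary standing in for the hardware state are all
hypothetical; a real kernel would do this in its context switch and trap handlers.

```python
DEFAULT_STATE = {"src": (0, 0), "dst": (0, 0), "func": 0, "mask": 0xFFFF}


class Process:
    def __init__(self, name):
        self.name = name
        self.uses_rasterop = False  # set on the first trapped access
        self.saved_state = None


class Kernel:
    def __init__(self):
        self.hw_state = dict(DEFAULT_STATE)  # stand-in for hardware registers
        self.hw_enabled = False              # when False, any access traps
        self.current = None
        self.saves = 0                       # number of context saves performed

    def switch_to(self, proc):
        old = self.current
        # Save RasterOp context only for processes known to use the hardware.
        if old is not None and old.uses_rasterop:
            old.saved_state = dict(self.hw_state)
            self.saves += 1
        self.current = proc
        if proc.uses_rasterop:
            self.hw_state = dict(proc.saved_state)
        # Processes not yet marked run with the hardware disabled.
        self.hw_enabled = proc.uses_rasterop

    def access(self, register, value):
        if not self.hw_enabled:
            # Simulated trap: mark the process and give it a fresh context.
            self.current.uses_rasterop = True
            self.hw_state = dict(DEFAULT_STATE)
            self.hw_enabled = True
        self.hw_state[register] = value
```

A process that never touches the RasterOp processor never pays for a save or
restore; only marked processes carry the extra context.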
- If the RasterOp hardware is an autonomous processor, does it support
one or many command queues?
- Designs with autonomous RasterOp processors can reduce the need to save and restore
RasterOp context by implementing several independent command queues and multiplexing
these queues together. The window manager can then assign a queue to each window
and treat them as if they had independent processors. A means to drain the queues
before changing the shape or position of a window will be needed.
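A minimal model of multiplexed command queues, including the drain operation, might
look like this. The class and method names are hypothetical, and the `executed` list
merely stands in for the RasterOp processor consuming commands.

```python
from collections import deque


class QueueMultiplexer:
    def __init__(self):
        self.queues = {}    # window id -> deque of pending commands
        self.executed = []  # (window, command) log standing in for the hardware

    def assign(self, window):
        """Give a window its own independent command queue."""
        self.queues[window] = deque()

    def submit(self, window, command):
        self.queues[window].append(command)

    def step(self):
        """Execute at most one pending command from each queue in turn."""
        for window, queue in self.queues.items():
            if queue:
                self.executed.append((window, queue.popleft()))

    def drain(self, window):
        """Run a window's queue to empty, e.g. before moving or reshaping it."""
        queue = self.queues[window]
        while queue:
            self.executed.append((window, queue.popleft()))
```

The window manager would call `drain` on a window before changing its geometry, so
that no queued command is executed against stale coordinates.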
- Can the RasterOp hardware be used off-screen?
- Many window managers require RasterOps that operate uniformly on rasters both
on and off the screen. Off-screen rasters are typically in process virtual address
space. If the RasterOp support cannot be applied to these rasters, the window manager
will have to implement a software RasterOp even if it is not used on-screen.
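For illustration, here is a minimal software RasterOp that treats every raster alike,
whether it backs the screen or lives off-screen. It is a sketch under simplifying
assumptions: each scanline is held in one Python integer, so the word-boundary
handling of a real 16- or 32-bit implementation is hidden, and the caller is assumed
to have clipped the rectangles already. The function-code encoding is a common
truth-table convention, not one taken from the text.

```python
class Raster:
    """A monochrome raster; each scanline is one int, pixel 0 is the top bit."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.rows = [0] * height


# Function codes are 4-bit truth tables: bit (2*s + d) gives the result
# for source bit s and destination bit d.
CLEAR, XOR, DST, SRC, OR = 0b0000, 0b0110, 0b1010, 0b1100, 0b1110


def combine(func, s, d, mask):
    """Apply one of the 16 Boolean functions to aligned source and destination."""
    out = 0
    if func & 0b0001:
        out |= ~s & ~d
    if func & 0b0010:
        out |= ~s & d
    if func & 0b0100:
        out |= s & ~d
    if func & 0b1000:
        out |= s & d
    return out & mask


def raster_op(dst, dx, dy, src, sx, sy, w, h, func):
    """Combine a w x h source rectangle into dst at (dx, dy)."""
    mask = (1 << w) - 1
    s_shift = src.width - sx - w  # alignment shift for the source bits
    d_shift = dst.width - dx - w  # alignment shift for the destination bits
    # Choose the row order so overlapping copies within one raster are safe.
    order = range(h) if dy <= sy else range(h - 1, -1, -1)
    for i in order:
        s = (src.rows[sy + i] >> s_shift) & mask
        d = (dst.rows[dy + i] >> d_shift) & mask
        r = combine(func, s, d, mask)
        kept = dst.rows[dy + i] & ~(mask << d_shift)
        dst.rows[dy + i] = kept | (r << d_shift)
```

Because `raster_op` takes any `Raster` as source or destination, the same routine
serves on-screen and off-screen rasters, which is the uniformity the window manager
needs.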
- Does the RasterOp processor implement clipping?
- A major role of a window manager is providing client processes with a
protected RasterOp, that is, in ensuring that a client can draw only within
its assigned area of the screen. Thus, much of the window manager's processing
is devoted to clipping. The window manager must apply a clipping rectangle to
all the client output, though within these limits the client may wish to
impose a smaller clipping rectangle. Hardware support for clipping is useful,
but it would be much more useful if it provided both a system clip rectangle
that could not be changed from user mode, and also a user clip rectangle.
Output would be clipped to the intersection of the two. The window manager
would set the system rectangle to the window, and the client would set the
user rectangle as desired. Even better would be clipping to the union of a
set of rectangles for each mode, to permit clipping to partially overlapped
windows.
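The two-level clipping described above amounts to intersecting each destination
rectangle with both clips. The sketch below assumes half-open rectangles, a list of
disjoint system rectangles covering the visible parts of a window, and (as a
simplification) a single optional user rectangle. All names are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x0: int  # left edge, inclusive
    y0: int  # top edge, inclusive
    x1: int  # right edge, exclusive
    y1: int  # bottom edge, exclusive


def intersect(a, b):
    """Return the overlap of two rectangles, or None if they do not overlap."""
    x0, y0 = max(a.x0, b.x0), max(a.y0, b.y0)
    x1, y1 = min(a.x1, b.x1), min(a.y1, b.y1)
    if x0 >= x1 or y0 >= y1:
        return None
    return Rect(x0, y0, x1, y1)


def clip(dest, system_rects, user_rect=None):
    """Return the pieces of dest that survive both the system and user clips."""
    pieces = []
    for system_rect in system_rects:
        piece = intersect(dest, system_rect)
        if piece is not None and user_rect is not None:
            piece = intersect(piece, user_rect)
        if piece is not None:
            pieces.append(piece)
    return pieces
```

In hardware, `system_rects` would be writable only from supervisor mode, while
`user_rect` would be an ordinary client-accessible register.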
- If the display has a colour map, can it be shared between multiple windows?
- Just as the pixels are a real resource to be shared among competing clients,
so also are the entries in the colour map. Clients should also be able to use
a number of different pixel values and corresponding colour map entries without
being aware that other windows are also using the colour map. For example, all
windows should be able to use pixel values from 0 to some limit.
- Hardware assistance for this sharing would be useful. The window manager should
be able to specify a number of rectangles, in each of which the relationship
between the pixel value and the colour map entry it selects would be different.
These rectangles might select from a number of independent full-size colour maps,
or provide a base register to be added to the pixel value before the colour
lookup (and perhaps a limit register to truncate the range).
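The base-and-limit variant can be modelled as follows. This is a sketch only: the
names are hypothetical, rectangles are half-open, and "truncating the range" is
interpreted here as clamping the pixel value to the window's allotted entries, which
is one of several possible readings.

```python
class ColourRegion:
    def __init__(self, rect, base, limit):
        self.rect = rect    # (x0, y0, x1, y1), half-open
        self.base = base    # first colour map entry allotted to this window
        self.limit = limit  # number of entries the window may use

    def contains(self, x, y):
        x0, y0, x1, y1 = self.rect
        return x0 <= x < x1 and y0 <= y < y1


def map_index(regions, x, y, pixel):
    """Return the colour map entry selected by a pixel value at screen (x, y)."""
    for region in regions:
        if region.contains(x, y):
            # Assumed behaviour: clamp to the window's range, then add the base.
            return region.base + min(pixel, region.limit - 1)
    return pixel  # outside every region the pixel value is used unchanged
```

Each client can then use pixel values starting at 0 without knowing where its
entries sit in the shared colour map.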
- Does the display support a cursor?
- An essential feature of a window system is a cursor, tracking a pointing device
around the screen. The cursor can be displayed either:
- by temporarily changing some pixels in the bitmap from which the screen is
refreshed, or
- by mixing the video from two separate bitmaps, one for the screen image and one
for the cursor.
- The first is often preferred for low-cost systems; it requires little or
no extra hardware but can impose significant performance loss. The problem is that
if the cursor affects pixels in the screen bitmap, it must be absent during any RasterOp
that affects those pixels. There are two approaches to ensuring that the cursor
gets removed:
- The process performing the RasterOp can handshake with the process
maintaining the cursor before every RasterOp. (Ideally, it would do so only when the
source or destination rectangles overlapped the cursor, but this is normally
impracticable.) The cost of the necessary system call on every RasterOp, or at
least on every RasterOp when the cursor is displayed, is very significant,
and the cursor will flicker badly.
This problem recurs in a milder form even if the process performing the RasterOps
is also the process maintaining the cursor. The synchronization cost is less,
but the cursor flicker is still present. It is particularly offensive because
the cursor is typically the focus of the user's attention.
- If the video refresh controller is capable of interrupting when a specified
scanline is reached, the interrupt routine can arrange for this to happen shortly
before the first scanline containing the cursor. It can then put the cursor into
the bitmap, wait at interrupt level until the refresh has passed the cursor, and
then remove it. No user RasterOps can occur while the cursor is in the bitmap, and
the cursor will not flicker, but the cost is the fraction of the CPU corresponding
to the ratio of the cursor height to the screen height.
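As a worked example of that ratio, assume a hypothetical 16-scanline cursor on a
display of 1024 scanlines; neither figure comes from the text, and vertical blanking
time is ignored.

```python
def cursor_cpu_fraction(cursor_height, screen_height):
    """Fraction of CPU time spent waiting at interrupt level for the cursor."""
    return cursor_height / screen_height


# Hypothetical figures: 16-scanline cursor, 1024 scanlines per frame.
fraction = cursor_cpu_fraction(16, 1024)
print(f"CPU lost to the cursor: {fraction:.4f}")
```

With these figures the scheme costs between one and two percent of the CPU, paid
continuously for as long as the cursor is displayed.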
- Video-mixed cursors are normally regarded as too expensive for low-cost displays,
because they require either a complete second bitmap, or a smaller bitmap plus
extra logic to position the cursor. But their effect on overall performance is
so great that this may be a mistake. They should be considered in the design of
advanced video generator and controller chips.
- Can the display track the mouse autonomously?
- Another major load on the system is tracking the mouse.
Autonomous display hardware sufficiently intelligent to monitor locations
in main memory for the mouse coordinates, and to position the cursor to correspond,
would off-load significant processing. The off-load would be greater if it
supported a clip rectangle for the cursor, and could interrupt if the cursor
tracked across the boundary. Many window systems wish to change the cursor
shape as it tracks across windows, or even regions within windows.
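The behaviour asked of the hardware can be modelled in a few lines. The class and
callback names are hypothetical, and `poll` stands in for whatever the display does
each time it samples the mouse coordinates in main memory.

```python
class CursorTracker:
    def __init__(self, clip_rect, on_boundary):
        self.clip_rect = clip_rect      # (x0, y0, x1, y1), half-open
        self.on_boundary = on_boundary  # stand-in for raising an interrupt
        self.position = None

    def poll(self, mouse_x, mouse_y):
        """Reposition the cursor; signal the CPU only if it left the rectangle."""
        self.position = (mouse_x, mouse_y)
        x0, y0, x1, y1 = self.clip_rect
        if not (x0 <= mouse_x < x1 and y0 <= mouse_y < y1):
            self.on_boundary(mouse_x, mouse_y)
```

The interrupt handler would typically load a new cursor shape and install the clip
rectangle of the window or region just entered, so that the CPU is involved only at
boundary crossings rather than on every mouse movement.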
- Does the hardware allow non-rectangular RasterOps?
- Non-rectangular RasterOps such as fill trapezoid can significantly
improve the performance of applications using polygonal graphics, but need careful
implementation. In particular, if they are to abut correctly it must be possible to
save and restore the error parameters of the Bresenham or other algorithms tracing
their edges. This is an example of the need for subpixel addressing, which also
occurs in greyscale and other antialiased applications.
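The need to carry the error term across abutting shapes can be shown with a small
edge stepper. This sketch uses integer error accumulation in the style of a DDA
rather than Bresenham's exact formulation, and all names are hypothetical.

```python
class EdgeStepper:
    """Steps an edge one scanline at a time using an integer error term."""

    def __init__(self, x0, y0, x1, y1):
        if y1 <= y0:
            raise ValueError("edge must run downwards")
        self.dx = x1 - x0
        self.dy = y1 - y0
        self.x = x0
        self.y = y0
        self.error = 0  # fractional x position in units of 1/dy

    def step(self):
        """Advance one scanline and return the new integer x coordinate."""
        self.error += self.dx
        # Whole pixels carry into x; the remainder stays in the error term.
        carry, self.error = divmod(self.error, self.dy)
        self.x += carry
        self.y += 1
        return self.x

    def save(self):
        return (self.x, self.y, self.error)

    def restore(self, state):
        self.x, self.y, self.error = state


def trace(stepper, n):
    """Collect the x coordinates of the next n scanlines."""
    return [stepper.step() for _ in range(n)]
```

If an edge from (0, 0) to (7, 10) is split after four scanlines, resuming from a
saved state reproduces the pixels of the unsplit edge, whereas restarting a fresh
edge from the integer point (2, 4) discards the error term and traces different
pixels, leaving gaps or overlaps where the two trapezoids should abut.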
- Can the display draw characters fast?
- The overwhelming majority of RasterOps will paint a character.
The cost of these will be dominated by setup time unless the font contains
very large characters, or the client sends long strings of contiguous characters.
Note that single-character RasterOps are important as echoes of what the user types.
What-you-see-is-what-you-get (WYSIWYG) editors are major applications for
this class of display, and their performance is dominated by repainting characters
as the user types.
- Thus, the design of RasterOp engines should consider making character-drawing
a special case, so that the hardware understands the font tables, knows the amount
of shim space to add between printing and space characters, and so on. This spreads
the setup time across as many characters as possible.
- Typical windows may contain characters from a large number of fonts;
twelve per window is not uncommon. The fonts from which characters are
drawn are frequently kept in the 224 × 1024 pixel off-screen area of
an 800 × 1024 pixel display. Thus, specialized RasterOp hardware can be used to
paint characters even if it can access only the bits in the hardware bitmap.
- Unfortunately, Parkinson's Law shows that this space is insufficient to store all
the required fonts, so that the off-screen space can at best be a cache for active
fonts, and cache misses will occur. The code to manage free space in the font cache,
to gather usage information, and to perform cache reloads on misses is difficult to
write, and imposes disturbing performance irregularities. Font
definitions, and the individual glyphs, are variable-size, adding to the normal problems
of writing a pager.
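A deliberately simplified model of such a font cache is sketched below. It evicts
whole fonts in least-recently-used order against a pixel budget, and it ignores the
two-dimensional packing and fragmentation of the off-screen area, which is a large
part of the real difficulty. The names and the `load_font` callback are hypothetical.

```python
from collections import OrderedDict


class FontCache:
    def __init__(self, capacity_pixels, load_font):
        self.capacity = capacity_pixels
        self.load_font = load_font  # hypothetical: font name -> size in pixels
        self.fonts = OrderedDict()  # name -> size, least recently used first
        self.used = 0
        self.misses = 0

    def use(self, name):
        """Ensure a font is resident, evicting older fonts if necessary."""
        if name in self.fonts:
            self.fonts.move_to_end(name)  # mark as most recently used
            return
        self.misses += 1
        size = self.load_font(name)
        if size > self.capacity:
            raise ValueError("font larger than the whole cache")
        # Variable-size entries: one reload may evict several fonts.
        while self.used + size > self.capacity:
            _, evicted_size = self.fonts.popitem(last=False)
            self.used -= evicted_size
        self.fonts[name] = size
        self.used += size
```

Every miss implies a reload from main memory or disc in the middle of painting,
which is the source of the performance irregularities.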
- If the RasterOp hardware is not restricted to accessing the pixels in the hardware
bitmap, but can access the whole of physical memory (even at reduced bandwidth),
the problem is easier. The font cache can now be larger, stored in wired-down pages
of system physical memory. But it is still only a cache.
- If, however, the hardware support for RasterOp is applicable even in process virtual
address space the fonts can be stored in virtual memory, and the system's pager
can deal with the problem of ensuring that the fonts in use are readily accessible.
Virtual memory RasterOps are normally available only if the RasterOp is implemented
by the CPU, whether in software, in microcode, or as a processor extension chip.
12.5 CONCLUSION
We have set out a number of points worthy of consideration in the design of
low-cost bitmap displays intended to support multiprocess interaction. Although RasterOp
hardware may appear attractive, it needs careful design if its potential is to be
realized fully, particularly for drawing characters. Assistance with cursor drawing
and sharing of the colour map may be more cost-effective uses for limited hardware
resources.
12.6 ACKNOWLEDGEMENTS
This work originates from insightful critiques of some proposed hardware designs
given by Bob Sproull. Bob Sidebotham, Andy Palay, Fred Hansen and Bruce Lucas helped
implement the ITC's window manager.
Anyone attempting to design bitmap displays should read the paper by Pike, Locanthi
and Reiser [52].