10.11 The Structure of Interactive Command Languages

James D. Foley

Department of Electrical Engineering and Computer Science

The George Washington University, Washington, D.C. 20052

Partially sponsored by the Committee on Research, George Washington University

1 Introduction

We are interested in obtaining a better understanding of man-machine interaction, and of the methodologies relevant to such interaction. To do this it is necessary to organize, to impose structure on, the interaction. Interaction with a computer is typically described as a conversation between user and computer, which leads to the notion that a language is used in the conversation. The purpose of this paper is to describe the hierarchical structure of interactive command languages, to help us impose structure on our understanding of the interaction itself. The hierarchical model is an elaboration and refinement of the model originally postulated in FOLEY74; indeed, the basic model has long been used in linguistics and formal language theory. In this paper we attempt to show the relevance of the model to interactive systems and to the process of designing interactive systems.

In this paper we not only discuss the hierarchical model in the abstract, but show how the model can be used in the process of designing interactive systems, and also show how the disciplines of human factors engineering and psychology can be integrated into the design process. We know that good human engineering can have a substantial positive impact on the usability of interactive systems. For example, a recent experiment at the U.S. Army Research Institute [FIEL77] showed that the range in error rates among four different ways of entering words was greater than 50%. An experiment conducted by one of Foley's students showed a 40% increase in user productivity when commands were entered using a programmed function keyboard instead of menu selection with a lightpen [PIQUE75]. In ENGL67, differences of up to 100% in speed and 200% in error rates are reported for different picking techniques. A recent analysis of interactive drafting systems showed a 2:1 throughput ratio between the fastest and slowest in a benchmark test.

Interactive computer graphics applications are particularly sensitive to human factors engineering. Graphics is a rich communication channel, whose exploitation has both high risk and high payoff. The risk is high because the cost of implementing graphics applications is typically greater than for non-graphics applications. The payoff is high because of the bandwidth of the communications channel. While this paper focuses on graphics interactions, the basic concepts extend to all interactive systems.

The purpose of this paper, then, is to further develop and define the language-oriented hierarchical model, and to indicate some ways in which the model allows a designer to bring in human factors considerations. To do this, I will describe the process of designing the user-machine interface of an interactive graphics system. The reader will observe that the design process (hence the hierarchical model) is similar to the familiar top-down software engineering methodology.

2 The Design Process

As we design interactive systems, we design two closely-interrelated languages which deal with a single conceptual model of the processes being performed by the machine. With one language the user communicates to the machine, which operates on the conceptual model. With the other language, the machine communicates to the user, depicting the state of the conceptual model.

As described in FOLEY74 and further discussed in BRIT77, each language has an associated semantic, syntactic, and lexical component. The semantics are the meanings associated with the languages' atomic units (words), and with combinations of atomic units (sentences). The languages' syntactic rules define the grammar with which words are combined into sentences, while the lexical rules determine how the hardware's basic primitives are combined into words.
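
To make the three levels concrete, here is a minimal sketch in modern Python (illustrative only; no such code appears in the paper) for a toy line-oriented editor command such as delete 2. The lexical level groups primitive keystrokes into words, the syntactic level checks that the words form a legal sentence, and the semantic level attaches meaning by changing the conceptual model:

  # Illustrative sketch of the three language levels for a toy
  # line-oriented editor command such as "delete 2".

  def lexical(keystrokes: str) -> list[str]:
      # Lexical level: combine primitive keystrokes into words (lexemes).
      return keystrokes.split()

  def syntactic(words: list[str]) -> tuple[str, int]:
      # Syntactic level: check that the words form a legal sentence
      # of the grammar  command ::= verb line-number.
      verb, operand = words
      if verb not in ("delete", "insert", "replace"):
          raise SyntaxError("unknown verb: " + verb)
      return verb, int(operand)

  def semantic(sentence: tuple[str, int], buffer: list[str]) -> None:
      # Semantic level: the meaning of a sentence is a change to the
      # conceptual model (here, a list of text lines).
      verb, line = sentence
      if verb == "delete":
          del buffer[line - 1]          # lines numbered from 1

  buffer = ["first", "second", "third"]
  semantic(syntactic(lexical("delete 2")), buffer)
  # buffer is now ["first", "third"]

The same decomposition applies to the output language, read in the opposite direction: semantics choose what to show, syntax arranges it, and lexical rules render it.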

In the output language, the semantic content of a sentence, that is, a displayed image, can be partitioned into problem-domain information, and control and state information. The former represents some or all of the information being created or operated upon, be it a mechanical drawing, an electrical circuit, or a molecular structure. The latter includes prompts and other messages, control choices (i.e., menus), and relevant problem-solving suggestions.

The syntax of the output language determines the organization of the visual presentation, in terms of where information is placed and how it is encoded. The lexical rules determine the details of the information's appearance, such as the color coding which might be used to convey specific meanings.

In the input language, the semantic information is again either in the problem domain or the control domain. Both types of information are entered by the man using devices such as keyboards, tablets, dials, joysticks, etc. Actions performed with these devices form the lexemes of the input language. These can be combined, as suggested in FOLEY74 and WALL76, into the syntactic units of picks (such as a light pen detect on problem-domain information), locators (as with a tablet and cursor, or tracking cross and light pen), valuators (scalars, as entered with a dial or character string), buttons (programmed function key depressions, light pen detects on menus), and text (text strings for annotation and commands).

There is also a conversation protocol, which defines how the user to computer and computer to user conversations are interconnected or sequenced. There are interrelationships at the lexical, syntactic, and semantic levels. Lexical inputs are typically echoed; syntactic inputs may lead to prompts; semantic inputs cause meaningful changes to the displayed image.
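
These interconnections can be illustrated with a hypothetical input-loop sketch (again Python, not from the paper): every lexeme is echoed as it arrives, a recognized syntactic unit triggers a prompt for what must follow, and only a complete sentence changes the model and hence the displayed image.

  # Hypothetical sketch: echo at the lexical level, prompt at the
  # syntactic level, model change (and redisplay) at the semantic level.

  def conversation(buffer: list[str]) -> None:
      words: list[str] = []
      while True:
          lexeme = input("> ").strip()
          print("echo:", lexeme)                 # lexical echo
          words.append(lexeme)
          if words == ["delete"]:
              print("line number?")              # syntactic prompt
          elif words[0] == "delete" and len(words) == 2:
              del buffer[int(words[1]) - 1]      # semantic change
              print("display:", buffer)          # displayed image updated
              words = []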

With this overview, we suggest that the design process has four basic steps. Each step is at a different level in the hierarchical model:

  1. Formulate a conceptual model.
  2. Select the operations which can be performed on the model, and the changes caused to the model (semantic design).
  3. Specify the action sequences used to effect the operations and the basic visual encodings representing the state of the conceptual model (syntactic design).
  4. Bind each possible user action to interaction devices and details of the visual encodings to display device capabilities (lexical design).

This is not a once-through process, but rather is iterative: decisions made at one level affect decisions at other levels. The following figure further illustrates the design sequence.

[Figure: the design sequence - Select Conceptual Model → Semantic Design → Syntactic Design → Lexical Design]

2.1 Conceptual Model

Suppose we are designing an interactive text editor. Two basic conceptual models come to mind:

  1. A line-number oriented system, with operations on entire lines.
  2. A string oriented system, with operations on arbitrary strings of text.

Hybrids between these two extremes also exist.

In database management systems, conceptual models concern the organizational schemas for data: hierarchical, network, and relational. For drafting systems, some conceptual models are:

  1. Mimic the tools and techniques used by a draftsman.
  2. Extend the tools and techniques to 3D.
  3. Depart from traditional drafting tools - allow user to directly manipulate (position, add, subtract) primitive 3D volumes.
  4. Again departing from traditional tools, provide capabilities to procedurally describe how to draw/construct the 2D or 3D object of interest.

The questions here are: Which one to choose? Which conceptual model will users more readily grasp? (The underlying, much more difficult question - How can I generate all possible conceptual models? - is also relevant, but is impossible to answer.) In the (likely) absence of an answer to this specific question, we need to be aware of relevant theories or research results which can help us create, or at least guess at, an answer.

There are of course other relevant questions about alternative conceptual models. The questions relate to human factors, but do not require human factors understanding to answer. Examples are:

  1. Which conceptual model is more powerful?
  2. Which conceptual model will minimize use of system resources (and hence response time)?

2.2 The Semantic Design

The semantic design of the input (user to machine) language has to do with the selection of the semantic units of information which the user conveys to the machine. For a line-oriented text editor, typical semantic units might be:

  delete line x
  replace line x with...
  insert... after line x

An important question here is the balance between richness and minimality in the language. For instance, the replace line... command is redundant, being effectively a delete line... followed by an insert.... Few would argue for minimality of this sort (doing without a replace command) in a text editor, but what about in other circumstances, or for other applications? Richness typically yields complexity, which must be mastered, but also yields power, which can be exploited. When is richness good? When is minimality? What is the effect of user background and experience on the choice?
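
The redundancy is easily made concrete. In the hypothetical sketch below, replace is not a primitive at all but a composition of the two minimal commands; a richer language carries it as a primitive, a minimal one pushes the composition onto the user.

  # Hypothetical sketch: "replace" as a composition of two primitives.

  class LineBuffer:
      def __init__(self, lines):
          self.lines = list(lines)

      def delete(self, x: int) -> None:
          del self.lines[x - 1]             # delete line x (from 1)

      def insert(self, x: int, text: str) -> None:
          self.lines.insert(x, text)        # insert text after line x

      def replace(self, x: int, text: str) -> None:
          self.delete(x)                    # the redundant command is
          self.insert(x - 1, text)          # just delete + insert

  buf = LineBuffer(["aaa", "bbb", "ccc"])
  buf.replace(2, "BBB")                     # buf.lines: aaa, BBB, ccc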

The semantic design of the output (machine to user) language has to do with the basic ways in which information will be displayed to the user, as typified by the choice of tables, bar charts, or trend charts to display statistics of various types. The information-transmittal properties of these types of presentation techniques have been examined [SCHUL61].

A less well-studied area involves techniques available for displaying 3D structures, such as molecules, buildings, or machine-tool parts. There is a choice of using one or more depth cues, such as intensity depth cueing, dynamic rotation, and hidden-surface removal.

One suspects that the more we use, the better. The object hypothesis theory advanced in GREG70, as well as our common sense, supports this notion. But how many techniques are enough to unambiguously convey information to the user? Many of the techniques have non-trivial implementation costs. What are their benefits? What is the trade-off between using several relatively easily obtained cues (such as intensity depth cueing and dynamic rotation) and using one more difficult cue, such as surface removal?

If the view of an object is to be changed, should the change be continuous (as in panning over a drawing), or discrete (jumping from one view to another)? This question has recently been addressed in the specific context of studying a map to find a good routing from one place to another [MOSES78], but appears to be un-addressed in any more global context. Yet graphics system designers must often make a design decision about this, using as guidance only their intuition and hunches.
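
The design alternatives are simple to state in code. The sketch below is hypothetical (the view object and redraw call are assumed, not part of any particular system): a discrete change sets the view parameters in one step, while a continuous change interpolates them over many redrawn frames.

  # Hypothetical sketch: discrete vs. continuous view change.

  def jump(view, target) -> None:
      view.center = target                  # discrete: one abrupt step
      view.redraw()

  def pan(view, target, frames: int = 30) -> None:
      (x0, y0), (x1, y1) = view.center, target
      for i in range(1, frames + 1):
          t = i / frames                    # continuous: interpolate
          view.center = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
          view.redraw()                     # one frame per step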

2.3 The Syntactic Design

For the input language, syntactic design is a grammatical issue - establishing the order in which basic words are entered. In a line-oriented text editor, the following syntaxes for a line replace command are all feasible:

1. replace         line x         with abc
2. replace         with abc       line x
3. with abc        replace        line x
4. with abc        line x         replace
5. line x          replace        with abc
6. line x          with abc       replace

The prefix forms (1, 2) allow simple user prompting, while none of the others do. The postfix forms (4, 6) make prompting impossible, except when a line number or replacement string has still not been specified at the time the replace command is entered. Which syntactic form is easiest to learn? To remember? To use?
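
The difference is easy to see in a sketch (hypothetical code): under prefix syntax the verb arrives first, so the system knows which operands remain outstanding and can prompt for each in turn; under postfix syntax nothing can be prompted for until the sentence is already over.

  # Hypothetical sketch: prompting under prefix syntax (form 1).

  PROMPTS = {"replace": ["line number? ", "replacement text? "],
             "delete":  ["line number? "]}

  def read_prefix_command() -> list[str]:
      verb = input("command? ").strip()
      # The verb determines the remaining operands, so each
      # missing one can be prompted for immediately.
      return [verb] + [input(p) for p in PROMPTS[verb]]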

For the output language, syntactic design is a display layout matter, dealing with the way the graphical representations of the conceptual model's atomic units are organized on the display surface. For displays of real objects there is little to decide; the questions concern the positioning of prompts, error messages, menus, control information, and problem information. For displays of abstractions (where there is no reality to which one must be faithful), more design decisions are needed. An example is the relative positioning of symbols or objects within the viewing area. It is known [GORD73, GORD74] that in specific circumstances the positioning of data and the spacing between data affect search rate. But what of the relation between operator performance and other spatial characteristics of the display?

2.4 The Lexical Design

Lexical design represents a binding of specific hardware capabilities to atomic units of the input and output languages. For the input language, this means associating actions or sequences of actions using input devices with words of the language. For the output language, the display primitives of lines, characters, curves, and filled areas, as well as attributes such as color, line style, and character font are used to encode data from the conceptual model.

User input actions appear to separate into five classes, as discussed in FOLEY74 and WALL76. Each action is thought of as being performed by a logical device: pick, button, keyboard, locator, and valuator. This concept has been included in the Core System design.

A pick is used to designate user-defined objects, such as a line, resistor, window, or curve. The lightpen is the prototype pick, and was developed in direct response to the user's need to point at displayed objects.

A keyboard is for inputting text strings, and is usually implemented as an alphanumeric keyboard.

A button is for selecting an application-program-defined object, such as an action to be performed. The prototype for this device is the programmed function keyboard, which typically contains 16 to 32 such buttons.

A locator is used to indicate a location and/or orientation in the user's conceptual drawing space. It is typified by the tablet, joystick, mouse, and track ball.

A valuator, in contrast, is a device to determine a single value in the real number space. A potentiometer is the classical valuator.

Each logical device has a natural correspondence to a specific physical device or class of devices (for example, the logical locator is easily thought of as a tablet). Any logical device can in fact be realized by any physical device (examples of such realizations are given later). Thus the concept of logical input devices is rather like the logical files of most operating systems: a logical sequential input file may physically be implemented as a card reader, a paper tape reader, a magnetic tape, a disk, or a terminal keyboard.
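
The analogy suggests an implementation structure as well. In the hypothetical sketch below (not the Core System interface; the tablet driver call and drawing object are assumed), application code reads positions from a logical locator without knowing which physical device realizes it:

  # Hypothetical sketch of a logical device with two physical
  # realizations, by analogy with logical files.

  from abc import ABC, abstractmethod

  class Locator(ABC):
      # Logical device: yields an (x, y) position in drawing space.
      @abstractmethod
      def read(self) -> tuple[float, float]: ...

  def sample_tablet(): return (0.0, 0.0)    # stand-in for a real driver

  class Tablet(Locator):
      def read(self) -> tuple[float, float]:
          return sample_tablet()            # natural realization

  class KeyboardLocator(Locator):
      def read(self) -> tuple[float, float]:
          x, y = input("x y? ").split()     # legal, if unnatural,
          return float(x), float(y)         # realization

  def place_symbol(drawing, loc: Locator) -> None:
      drawing.add(at=loc.read())            # device-independent code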

Logical devices can be realized in many different ways, using many different interaction techniques with various physical devices. Consider, as an example, the button device, used to select among alternative commands by the application program. The switch on most lightpens is often used as a button to tell the system that the picked entity is now to be operated upon, or that the tracking cross is now positioned at its final location. Switches on tablet styli are used in a like manner.

If commands are selected with a function keyboard, either single or multiple physical keys (a chord) can be used to invoke a command. Coded overlays, if available, can further expand the number of distinct buttons provided. A pick such as a lightpen can be used to simulate physical buttons. A menu of system-defined light buttons is displayed. The pen is used to pick the desired action from the collection of alternatives. The selection also can be effected by using a locator to move the cursor close to a light button. Even a valuator can be used as a button activator, by indicating a numeric value which has been associated with each button. Touch panels allow a user to point his finger at a light button. Command names can be typed from a keyboard.

If the interaction language has more commands than the screen has space for light buttons, the frequently used commands might be assigned to light buttons, with the remaining commands relegated to physical buttons. Alternatively, a hierarchical selection can be made: a command category is selected from an initially displayed menu, and the desired specific command is chosen from the subsequently displayed menu. This has the advantage over function keyboards that the changing buttons are immediately seen at or near the point of visual focus, and the user need not interrupt his thinking to change overlays.
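
A hierarchical menu is itself easy to sketch (hypothetical code), and doing so exposes the depth-versus-breadth question raised below as a parameter of the menu tree's shape:

  # Hypothetical sketch: hierarchical light-button selection.
  # Each item is either a command code (a leaf) or a sub-menu.

  MENU = {"edit": {"delete line": "DEL", "insert line": "INS"},
          "view": {"pan": "PAN", "jump": "JMP"}}

  def select(menu: dict) -> str:
      while True:
          items = list(menu)
          for i, name in enumerate(items, 1):
              print(i, name)                # display the light buttons
          choice = menu[items[int(input("pick? ")) - 1]]
          if isinstance(choice, str):
              return choice                 # leaf: a command is chosen
          menu = choice                     # descend into the sub-menu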

A powerful use of locator devices is to recognize certain combinations or sequences of movements as button activations. If the motions are made with the cursor or tracking cross on top of the element to be operated upon, both the command and operand are simultaneously specified!

Commands need not be given by mechanical manipulation of devices. Speech input is a powerful technique, in use even now. Recognizers with ever-increasing capabilities continue to be introduced, at ever-decreasing cost.

A host of human factors questions arise just for this button logical device. Which of these techniques is better? When? Why? Under what circumstances is pattern recognition a reasonable method? What are the learning, retention, error, throughput, and fatigue characteristics for each? What is the trade-off in depth versus breadth for hierarchical menu selection - how many choices should be available at each level? At what level do error rates in pattern or speech recognition become unacceptable? Do different types of users - regular, irregular, skilled, unskilled, motivated, unmotivated, stressed, unstressed - affect the answers to these questions? A few isolated studies have been done, but there are more unanswered than answered questions.

A similar richness of choice, and of resulting questions concerning that choice, exists for the other logical devices. For example, in the string-oriented text editor, a string to be deleted might be indicated by

  1. Typing the entire string (or enough of the string to establish a context) at the keyboard, or
  2. Using light pen or cursor to directly identify the start and end of the string on the screen.

These alternatives present substantially different interaction styles, with potentially significant differences in usability, throughput, and system cost.
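
A sketch makes the lexical distinction plain (hypothetical code; the mapping from pen or cursor positions to character indices is assumed to be done by the display code): one semantic operation is fed by two different lexical bindings.

  # Hypothetical sketch: one semantic operation, two lexical bindings.

  def delete_string(text: str, start: int, end: int) -> str:
      return text[:start] + text[end:]      # the semantic operation

  def by_typing(text: str, context: str) -> str:
      start = text.index(context)           # binding 1: typed string
      return delete_string(text, start, start + len(context))

  def by_pointing(text: str, start: int, end: int) -> str:
      return delete_string(text, start, end)   # binding 2: two picks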

Thus even though the lexical design is done last, it is not necessarily any less important than the other levels of design.

An example of interaction between syntactic and lexical design occurs here. A specific syntax determines the order of user actions. The need for some amount of tactile continuity while inputting a sequence of atomic units then suggests that changes from one input device to another be limited.

Lexical design for the output language is one area which has been extensively studied. There are numerous reports on the selection of fonts, colors, symbols, etc. to maximize recognition, minimize errors, or increase learning rate. Some of these are summarized in KRIL76.

3 Summary

The conceptual framework presented here - concept, semantics, syntax, lexemes - can serve as a hierarchical model as we attempt to formulate a methodology of interaction. The relationship to Human Factors issues is important, as the two should be considered inseparable (this is unfortunately not always the case today). The lexical and syntactic levels are the most appropriate for any attempts at general standardization. The semantic and conceptual levels are of course application-oriented, and therefore cannot be subject to standards except within a specific application domain.

References

BENN76: Bennett, John. "User-Oriented Graphics Systems for Decision Support in Unstructured Tasks", Proceedings of the ACM/SIGGRAPH Workshop on User-Oriented Design of Interactive Graphics Systems, (1976), pp. 3-12.

BRIT77: Britton, E. The Ergonomic Design of Interactive Graphic Systems, Ph.D. Dissertation, Dept. of Computer Science, University of North Carolina, Chapel Hill, (1977).

BROOKS77: Brooks, Frederick P. Jr. "The Computer 'Scientist' as Toolsmith - Studies in Interactive Computer Graphics", Proceedings 1977 IFIP Congress, North-Holland, Amsterdam, (1977), pp. 625-634.

CHER76: Cheriton, D. "Man-Machine Interface Design for Timesharing Systems", Proceedings ACM 1976 Conference, pp. 362-366.

ENGEL75: Engel, Stephen E. and Richard E. Granda. Guidelines for Man/Display Interfaces, Technical Report TR 00.2720, IBM Poughkeepsie Laboratory, New York, (December 1975).

ENGL67: English, W. K., D. C. Engelbart, and M. Berman. "Display-Selection Techniques for Text Manipulation", IEEE Transactions on Human Factors in Electronics, HFE-8, pp. 5-15, (March 1967).

FIEL77: Fields, A., R. Maisano, and C. Marshall. A Comparative Analysis of Methods for Tactical Data Inputting, U.S. Army Research Institute, (1977).

FOLEY74: Foley, J. and V. Wallace. "The Art of Natural Graphic Man-Machine Conversation", Proceedings of the IEEE, 62(4), pp. 462-470, (April 1974).

GORD73: Gordon, I. E. and M. Winwood. "Searching Through Letter Arrays", Ergonomics, 16(2), pp. 177-188, (1973).

GORD74: Gordon, I. E. and M. Amos. "Checking Groups of Letters", Journal of Applied Psychology, 59(3), pp. 354-357, (1974).

GREG70: Gregory, R. L. The Intelligent Eye, McGraw-Hill, New York, (1970).

HANS71: Hansen, W. "User Engineering Principles for Interactive Systems", Proceedings 1971 Fall Joint Computer Conference, pp. 523-532.

KRIL76: Kriloff, Harvey Z. "Human Factor Considerations for Interactive Display Systems", in S. Treu, ed., Proceedings ACM/SIGGRAPH Workshop on User-Oriented Design of Interactive Graphics Systems, ACM, (1976), pp. 45-52.

MART73: Martin, James. Design of Man-Computer Dialogues, Prentice-Hall, (1973).

MOSES78: Moses, F. L. and R. E. Maisano. "User Performance Under Several Automated Approaches to Changing Displayed Maps", Proceedings 1978 SIGGRAPH Conference, published in Computer Graphics, 12, (1978).

PIQUE75: Pique, M. "A Comparison of User Performance in DRAFTPAD Using Two Different Input Devices for Command Specification", University of North Carolina, (1975), (unpublished).

SCHUL61: Schultz, H. G. "An Evaluation of Formats for Graphic Trend Displays", Human Factors, 3, pp. 99-119, (1961).
