Chilton::ACD::Methodology in Interaction

4. COGNITIVE PSYCHOLOGY AND INTERACTION

It is easier to lure a fish than to
hit it over the head with a club.
-Muggles, Maxims

4.1 INTRODUCTION - T. MORAN

In general it is very difficult to carry over results from one research field to another. In particular, in the field of Experimental Psychology, it is hard to pull out interesting results. In Experimental Psychology literature, one can encounter an article comparing A and B, which ends with the hypothesis that A is better than B. Significance tests may actually show that A is better than B without giving any reasons. This makes it hard to apply these results to any other environments.

In Human Factors literature, one can find studies of display visibility, fonts, flicker, display ergonomics, some conditional statements being better than others for hand simulation, etc.

Another view, often given, is to use psychological expertise in running experiments, showing how to run a test for a given hypothesis. For example, John Hayes has looked at how people compose essays and this might be useful in the design of a text editor if one plans to use that text editing system for a whole range of tasks.

Over the last 15 years, cognitive psychology has tried to consider man as an information processor. Human architecture is considered as a systems architecture. This actually is a way of carrying things over. The general features of behaviour that are fair generalisations might be carried over also. One might even go beyond and develop, in the future, specific models of interaction.

4.2 COGNITIVE PSYCHOLOGY AND INTERACTION - J.HAYES

4.2.1 What is Cognitive Psychology

Behaviorist psychology was dominant in the USA until the 1950's. The Behaviorists regarded it as unscientific to discuss thought processes. In the 1960's, psychologists became humanised by the appearance of the computer and, as computer scientists could give scientific descriptions of the insides of their computers, psychologists decided that it might be possible to describe scientifically the inside of a person using hardware free (physiology free) information processing models.

This focus on process and on information processing models led naturally to computer simulation (if the hardware doesn't matter, why not use a computer) and protocol analysis to yield rich data about process.

4.2.2 Task Analysis

To do cognitive psychology, we need careful task analysis - that is, we need to understand what the task requires of any information processing system which is to do the task and what the particular information processing system contributes.

For example, if we observe an ant crawling through the grass, its behaviour is very complex. Most of this complexity resides in the task environment. If we want to describe the psychology of the ant, we want to separate the complexity which must be attributed to the task environment, from the complexity which can be attributed to the ant.

SMALLTALK is a good system, but what was Alan Kay thinking about when he designed it? Was most of the information he used about the external task to be performed or about the psychological properties of the user? If task analysis were done beforehand, the psychologist could find out what the really important parts were.

4.2.3 Representations

If you change a person's representation of a task, you change the way the person does the task. Two similar modes of representation may interfere with one another, and the experience of the user may determine how well he can use a given display. For example, let us consider division. The displays used for division by the British and Americans on the one hand and by Europeans on the other are characteristically different. To divide 21 into 473, Americans and the British typically write:

    21:473

whereas Europeans write:

    473:21

We asked European-trained and American-trained subjects to do mental division in each of these formats. We found that subjects do mental division more rapidly when it is presented in the familiar format than when it is presented in the unfamiliar format. Also, it was not a matter of the additional time required to flip the image over in the mind. The second digit of the result of the division also took significantly longer to generate.

We expect that since experts and novices may differ in their experience with displays, they may use them in different ways. Chess research supports this expectation. Chase and Simon asked subjects to examine a 25 piece mid-game chess position for 5 seconds and then to try to reproduce it. Grand masters were very much more successful than novices in performing this task (93% success compared with 33%). Masters and novices performed equally poorly (15% success) when pieces were placed randomly on the board (rather than representing real chess positions). This suggests that the master's superiority depends on chess knowledge. Chase and Simon estimate that the master has stored some 50,000 images of chess patterns as a result of his experience with the game.

4.2.4 Problem Solving

The semantics of a representation may be very important in determining the difficulty of the problem. We have a number of isomorphs of the Tower of Hanoi problem and let me consider two called Monster Problems 2 and 4:

[Figures: Tower of Hanoi, Monster 2, Monster 4]

In Monster Problem 2, there are three five-handed extra-terrestrial monsters standing on three crystal globes. Because of the quantum-mechanical peculiarities of their neighbourhood, both monsters and globes come in exactly three sizes with no others permitted (small, medium, and large). The medium-sized monster was standing on the small globe. The small monster was standing on the large globe. The large monster was standing on the medium-sized globe. Since this situation offended their keenly developed sense of symmetry, they proceeded to transfer themselves from one globe to another so that each monster would have a globe proportionate to his own size. Monster etiquette complicates the solution of the problem since it requires:

That only one monster may be transferred at a time.
That if two monsters are standing on the same globe, only the larger of the two may be transferred.
That a monster may not be transferred to a globe on which a larger monster is standing.

The problem is to find the sequence of transfers by which the monsters can arrive on the correct globes.

In Monster Problem 4, three five-handed extra-terrestrial monsters are holding three crystal globes. Because of the quantum-mechanical peculiarities of their neighbourhood, both monsters and globes come in exactly three sizes with no others permitted (small, medium, large). The medium-sized monster is holding the small globe. The small monster is holding the large globe. The large monster is holding the medium-sized globe. Since this situation offended their keenly developed sense of symmetry, they proceeded to shrink and expand themselves so that each monster would have a globe proportionate to his own size. Monster etiquette complicated the solution of the problem since it requires:

That only one monster may be changed at a time.
That if two monsters have the same size, only the monster holding the larger globe may be changed.
That a monster may not be changed to the same size as a monster holding a larger globe.

By what sequence of changes could the monsters have solved this problem?

Monster problem number 2 involves moving things from one place to another, while Monster problem number 4 involves changing the size of things. Monster problem number 4 is nearly twice as hard to solve as Monster problem number 2, even though these two problems are exact isomorphs.
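The claim that the two problems are isomorphs can be checked mechanically. The following is a minimal sketch and not part of the original study: a breadth-first search over one shared rule set, in which the function name `solve` and the tuple encoding of states are illustrative assumptions. In Monster Problem 2 the entities are ranked by monster size and the values are the globes they stand on; in Monster Problem 4 the entities are ranked by the globe held and the values are the monsters' current sizes.

```python
from collections import deque

def solve(start, goal, values=3):
    """Breadth-first search for a shortest legal sequence of changes.

    A state is a tuple indexed by entity rank (0 = lowest rank).
    state[i] is the value (a globe, or a size) currently taken by entity i.
    This encoding is an assumption made for illustration.
    """
    def moves(state):
        for i, a in enumerate(state):
            higher = state[i + 1:]
            # Only the highest-ranked entity sharing a value may change.
            if a in higher:
                continue
            for b in range(values):
                # It may not take a value occupied by a higher-ranked entity.
                if b == a or b in higher:
                    continue
                yield (i, a, b), state[:i] + (b,) + state[i + 1:]

    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while parent[state] is not None:
                state, step = parent[state]
                path.append(step)
            return path[::-1]
        for step, nxt in moves(state):
            if nxt not in parent:
                parent[nxt] = (state, step)
                queue.append(nxt)
    return None

GOAL = (0, 1, 2)  # every entity ends with the value matching its rank

# Monster 2: index = monster size (S, M, L), value = globe stood on.
MONSTER_2_START = (2, 0, 1)
# Monster 4: index = globe held (S, M, L), value = current monster size.
MONSTER_4_START = (1, 2, 0)

for name, start in [("Monster 2", MONSTER_2_START), ("Monster 4", MONSTER_4_START)]:
    steps = solve(start, GOAL)
    print(name, len(steps), steps)
```

Under this encoding both searches return a five-step solution from the same rule code, which is consistent with the difference in difficulty lying in the representation rather than in the size of the problem space.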

Imagery may determine the form of the notation used to solve a problem. When subjects are asked to solve Monster problem number 2 or Monster problem number 4, they often make a preliminary sketch. [Editor: The reader is encouraged to try the problem with sketches. As we realise that most people working in computer graphics cannot draw, we include sample drawings by a graphic artist at the end of the paper.] Many subjects then proceed to develop a notation to describe the sequence of events in the solution of the problem. For example, a typical solution to Monster problem 2 might start:

           M   L   S
       0   L   S   M
       1   -  L,S  M
       2   M  L,S  -

A typical solution to Monster problem 4 might start:

           M   L   S
       0   L   S   M
       1   L   L   M
       2   L   L   S

The particular instances of these forms depend on the initial sketch which the subject drew. The order of elements in the matrix is closely correlated with the order of elements in the drawing. The horizontal or vertical orientation of elements in the matrix corresponds to the orientation of elements in the drawing.


4.2.5 Transfer in Problem Solving

When subjects solve two isomorphic problems one after another, there may be very little transfer. The failure of students to transfer among isomorphic problems is a major limitation in the classroom. We propose that people fail to transfer solutions from a problem to its isomorph when the two problems differ in an important way in their semantics. We believe that humans categorise perceptions of the world into (at least) the following six semantic categories:

Object
Event
Location
Time
Property
Action

When corresponding elements in two problem isomorphs belong to different semantic categories, then people will have great difficulty in transferring the solution of one problem to the solution of the other. For example, the categories of corresponding elements in various isomorphs of the Tower of Hanoi problem are:

	Variable Entity	Variable Attribute	Criterion
Tower of Hanoi	Object: Disc	Location: Disc Location	Property: Disc Size
Monster No 2	Object: Monster	Location: Monster Location	Property: Monster Size
Monster No 4	Object: Monster	Property: Monster Size	Property: Globe Size

We predict that transfer from Tower of Hanoi to Monster Problem number 2 will be easy, but that transfer to Monster Problem number 4 will be difficult.

4.2.6 Summary

The main points made are:

If you change a person's representation of a task, you change the way the person does the task. Two similar modes of representation may interfere with one another.
The experience of the user may determine how well he can use a display.
Different representations of the same problem may be very different in difficulty.
Transfer between two problems depends on the semantics of the problem representations.

4.3 DISCUSSION

Kay:

Forming an analogy imposes a high cognitive load. If a problem takes over half an hour to solve, the solver might get back more from trying to find the isomorph.

Hayes:

Real problems are often only partially isomorphic, only parts can be extracted. Teaching students the principle of symmetry can be very profitable. We are planning some work in this area. We taught one group of students about Poisson Distributions in a space domain and then posed them a problem set in the time domain. They didn't immediately make the generalisation from space to time.

Perhaps a carefully constructed set of problems that would lead them from a problem set in its classical form to isomorphs which they may encounter later would be of value.

Dzida:

What can we learn from your presentation when we talk about interaction between man and machine?

Hayes:

It is difficult to carry information over into a new world and this may lead to loss of performance. It is difficult to carry knowledge over even if the person has a good knowledge base unless one points out what is relevant and how it is relevant.

van Dam:

Why do all our user manuals start from scratch? Why do they not point out the similarities to everyone else's systems?

Hayes:

That is a good point. Alan Kay described SMALLTALK by saying how it was like SIMULA.

Kay:

I explained the SMALLTALK system to Nygaard, one of the inventors of SIMULA. He immediately picked up the fact that SMALLTALK differed from SIMULA and by probing the differences very soon came to an understanding of SMALLTALK.

Hayes:

The way to progress is to make use of partial isomorphisms. Our experiments use total isomorphisms but this is a special case.

Kay:

The six categories you listed are very interesting. Our culture puts these concepts into different buckets but some cultures put some of those categories into the same bucket. It would be interesting to see if this is a wiring problem or an early learning problem in how we choose to see things.

Hayes:

I agree. Some of the work on categorisation is beginning to make us think that some of the categories are wired in.

ten Hagen:

Hayes mentioned symmetry in his talk; man/machine systems should be symmetrical. Could you elaborate on this? Should the same six categories be explicit in the machine also? Does this form a basis for languages?

Hayes:

I talked about symmetry as a way of halving the work to be done in certain problems. I did not intend the 6 categories to be used as design criteria. It is just that it can be very difficult to see symmetry when corresponding elements are in different categories.

ten Hagen:

For interchanges of location and time, the detection of symmetry is fairly straightforward! For other entities it is much more difficult.

Hayes:

Interchange of event and object is also fairly common.

Dzida:

I wonder what we can learn from the mismatch between different classes of problem solvers for the design of interactions?

Hayes:

The major message is that there are factors in building representations other than the form of the system. Semantic categories are important.

Odd results are obtained if we change the semantic categories of algebra problems. For example, here are some surrealist algebra problems:

By singing two decibels an hour louder than usual, Mr Smith saved half an hour in singing a 110 decibel song. Find the solution to that!

A certain wine takes 7 years to mature properly from the time the grapes are harvested but only 4 years for the return trip! How fast will the wine mature in still time?

One must look at the semantics of the situation one is dealing with. One should not consider form devoid of semantics.

Foley:

For any job domain there are many systems we could build. How can the psychologist help us to choose the one which is easiest to understand?

Hayes:

Representations formed by experts and novices will be different. The system must adapt itself to the users and change as the user becomes more experienced.

Dzida:

What would you say to a systems designer who asked you for advice?

Hayes:

I do not know. We need to collect ideas about what troubles users get into. Producing a list would be useful:

The interaction between the user's and designer's model.
The great difficulties of users maintaining accurate knowledge of the state of the machine (this is the mode-full issue).
Different representations interfere with each other.
How do you construct a reasonable sequence of problems to provide training?

A major problem is that the spatial realisation of a model often affects the rate of learning, often in ways that are not obvious.

4.4 SYSTEM DESIGN - T. MORAN

The work being done by Stu Card, Tom Moran and Allen Newell at Xerox PARC is focused on the goal of making psychological expertise available to system designers. The approach is to build models of the user which the designer can apply and thus do the psychology himself. What does a good interactive system mean? There is no obvious metric for good, but systems can be measured against a set of criteria, such as the following:

Dimensions of Evaluation for User-System Performance

Time:: How long does it take a user to accomplish a given set of tasks using the system?
Errors:: How many errors does a user make and how serious are they?
Learning:: How long does it take a novice user to learn how to use the system to do a given set of tasks?
Functionality:: What range of tasks can a user do in practice with the system?
Recall:: How easy is it for a user to recall how to use the system on a task that he has not done for some time?
Concentration:: How many things does a user have to keep in mind while using the system?
Fatigue:: How tired do users get when they use the system for extended periods?
Acceptability:: How do users subjectively evaluate the system?

Initially, the study has concentrated on the time criterion, and the idealisation of this is given below, where the underlinings indicate important restrictions:

The Prediction Problem

Given

A task (possibly involving several subtasks).
The command language of a system.
The motor skill parameters of the user.
The response time parameters of the system.
The method used for the task.

Predict

The time an expert user will take to execute the task using the system, providing he uses the method without error.

The model assumes an expert user; it does not degrade simply to account for naive users. Part of the expert's skill is to clear up errors in the normal course of using the system, but the model assumes that no errors occur.

The model predicts the time taken to execute a task. The user's operations are viewed as a set of Unit Tasks, each of which must be acquired and then executed. The total time for a task is the sum of the time to acquire and time to execute. Typical acquisition times are 2 secs for manuscript editing, 5.3 secs for routine design and greater than 5 secs for creative composition. Execution times are generally less than about 20 sec. If the execution time takes much longer, users will usually break the task into sub-tasks. The Keystroke-level Model predicts the execution time as the sum of time for:

K: keystrokes
P: pointing
H: hand movements
D: drawing
M: mental preparation
R: system responses

That is, T execute = TK + TP + TH + TD + TM + TR, where TK is the time for keystrokes, TP the time for pointing, etc. The times associated with individual operations in each of these domains are as follows:

	Secs
TK-A keystroke or button push
Best typist (135wpm)	0.08
Good typist (90 wpm)	0.12
Average skilled typist (55wpm)	0.20
Average non-secretary typist (40wpm)	0.28
Typing random letters	0.50
Typing complex codes	0.75
Worst typist (unfamiliar with keyboard)	1.20
TP-Pointing on a display with a mouse	1.10
TH-Homing the hand(s) on the keyboard etc	0.40
TD-Drawing N straight lines, total length L cm	0.9N+0.16L
TM-Mentally preparing for executing physical actions	1.35

In fact the time for pointing obeys Fitts's law (i.e. it is proportional to the log of target distance divided by target size). The model predicts the execution time for a particular method, that is, a routine sequence of operations to achieve a unit task. Examples of methods and their encodings in terms of the model's operators are shown below. Methods for the replacement of one 5-letter word by another are given for POET (a line editor) and DISPED (a display editor using a mouse):

Example Method Encodings
Replace a word using POET
Jump to next line	MK [linefeed]
Call Substitute command	MK [S]
Specify new 5-letter word	5K [word]
Terminate argument	MK [return]
Specify old 5-letter word	5K [word]
Terminate argument	MK [return]
Terminate command	K [return]
T execute = 4 TM + 15 TK = 8.4 secs
Replace a word using DISPED
Reach for mouse	H [mouse]
Point to word	P [word]
Select word	K [YELLOW]
Home on keyboard	H [keyboard]
Call Replace command	MK [R]
Type new 5-letter word	5K [word]
Terminate type-in	MK [esc]
T execute = 2 TM + 8 TK + 2 TH + TP = 6.2 secs
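The totals for these encodings can be reproduced by multiplying operator counts by the operator times tabulated earlier. This is a minimal sketch; the names are illustrative, and TK = 0.20 secs (the average skilled typist) is an assumption, chosen because it reproduces the 8.4 secs quoted for POET.

```python
# Operator times in seconds, from the table of operator times above.
# T_K = 0.20 is an assumed typing speed (average skilled typist).
OPERATOR_TIME = {"K": 0.20, "P": 1.10, "H": 0.40, "M": 1.35}

def execution_time(counts, times=OPERATOR_TIME):
    """Sum count * unit time over the operators used by a method."""
    return sum(n * times[op] for op, n in counts.items())

# Operator counts read off the two method encodings.
POET = {"M": 4, "K": 15}
DISPED = {"M": 2, "K": 8, "H": 2, "P": 1}

for name, counts in [("POET", POET), ("DISPED", DISPED)]:
    print(f"{name}: {execution_time(counts):.1f} secs")
```

With these values the POET encoding sums to 8.4 secs and the DISPED encoding to about 6.2 secs.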

The important decision in encoding a method is where to put the M operators, representing the points at which the user pauses to organise his operations. A set of 5 rules for placing these M operations or, equivalently, for dividing keystrokes into conceptual chunks are given below. Rules 1 to 4, which delete M operations, suggest ways of designing interactions to improve time performance:

Heuristic Rules for Placing M Operations

Begin with a method of encoding that includes all physical operations and response operations. Use Rule 0 to place candidate Ms and then cycle through Rules 1 to 4 for each M to see whether it should be deleted.

Rule 0: Insert Ms in front of all Ks that are not part of argument strings proper (for example, text or numbers). Place Ms in front of all Ps that select commands (not arguments).
Rule 1: If an operator following an M is fully anticipated in an operator just previous to M, then delete the M (for example, PMK becomes PK).
Rule 2: If a string of MKs belong to a cognitive unit (for example, the name of a command), then delete all Ms but the first.
Rule 3: If a K is a redundant terminator (for example, the terminator of a command immediately following the terminator of its argument), then delete the M in front of it.
Rule 4: If a K terminates a constant string (for example, a command name), then delete the M in front of it. But if the K terminates a variable string (for example, an argument string), then keep the M in front of it.

A set of experiments, in which user performance on tasks was compared with the model's predictions, involved the following systems:

Text Editors
POET	Line-oriented with relative line numbers
SOS	Line-oriented with 'sticky' line numbers
DISPED	Display-oriented, full page, uses mouse for pointing
Graphics Systems
MARKUP	Uses mouse to draw and erase lines on a bitmap display, commands selected from a hidden menu, which must be redisplayed each time
DRAW	Lines defined by pointing with mouse to end points, commands selected with mouse from a menu
SIL	Lines defined by pointing with mouse to end points, boxes defined by pointing to opposite vertices, commands selected by combinations of mouse buttons
Executive Subsystems
LOGIN	TENEX command for logging in
FTP	Program for transferring files between computers
CHAT	Program for establishing a 'teletype' connection between two computers
DIR	TENEX command for printing a file directory, has a subcommand mode
DELVER	TENEX command for deleting old versions of a file

The tasks were as follows:

Editing Tasks (used for POET, SOS, DISPED)
T1	Replace one 5-letter word with another (one line from previous task)
T2	Add a 5th character to a 4-letter word (one line from previous task)
T3	Delete a line, all on one line (eight lines from previous task)
T4	Move a 50 character sentence, spread over two lines to the end of its paragraph (eight lines from previous task)
Graphics Tasks (used for MARKUP, DRAW, SIL)
T5	Add a box to a diagram
T6	Add a 5-character label to a box
T7	Reconnect a 2-stroke line to a different box
T8	Delete a box, but keep an overlapped line
T9	Copy a box
Executive Tasks
T10	Phone computer and login (4 char name, 6 char password)
T11	Transfer a file to another computer, renaming it
T12	Connect to another computer
T13	Display a subset of the file directory with file lengths
T14	Delete old versions of a file

The definition of expert required at least 6 months experience of the system, and not more than 1 week since the last use. There was a 20% RMS error in predicting a single task. The error for n tasks is reduced by a factor of sqrt(n). The validity of the five heuristic rules was tested by omitting each in turn; in all cases the error went up. The time for the M operator was obtained by a best-fit calculation, but for values between 1.2 and 2.0 the RMS error was 23% or less.
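The reduction by sqrt(n) can be illustrated numerically. This is a minimal sketch that takes the quoted 20% single-task figure at face value; the function name is illustrative, and the sketch assumes that the errors on individual unit tasks are independent, which the text does not state explicitly.

```python
import math

SINGLE_TASK_RMS = 0.20  # 20% RMS error quoted for a single unit task

def expected_rms(n_tasks, single=SINGLE_TASK_RMS):
    """RMS error expected over n_tasks unit tasks, assuming independent errors."""
    return single / math.sqrt(n_tasks)

for n in (1, 4, 5):
    print(n, f"{expected_rms(n):.1%}")
```

On this assumption a benchmark of 4 unit tasks would be expected to show about 10% error and one of 5 unit tasks about 9%.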

The model was used to calculate the result of benchmark tests on editors, and a comparison was made between these predictions and experiment:

	Calculated	Observed	% Error
Text Editors (4 unit tasks)
POET	60	60	0
SOS	50	56	-11
DISPED	27	28	-4
Average			5
Graphics Editors (5 unit tasks)
MARKUP	59	54	+10
DRAW	55	52	+6
SIL	27	31	-13
Average			10

The final example of the use of the model was the evaluation of a proposed additional means of recovering from a mis-typed word. The two existing methods are:

W: hitting control-W to rub out a word at a time and then re-typing the text which was destroyed.
R: leave insert mode, replace the erroneous word, go back to the end of the insert and re-enter insert mode.

Method R takes a constant 12 sec, whereas method W is faster for small distances and slower for large distances. Users select the appropriate method for the distance in question.

Suggested new method S involves skipping backwards over each word with a control-S, rubbing out the erroneous word with a control-W, inserting its replacement, and then returning to the previous position with a control-R. The three encodings were:

Method W (Backword)
Setup Backword command	MK [Ctrl]
Execute Backword n times	n((1/c)MK[W])
Type new word	5.5K[word]
Retype destroyed text	5.5(n-1)K
T execute = (1+n/c) TM + (1+6.5n) TK = 1.6 + 2.16n secs

The term n/c takes account of the user's need for feedback on how far back he has gone. Initially, c=4 was assumed but later a sensitivity analysis was performed.

Method R (Replace)
Terminate type-in mode	MK[esc]
Select target word	H[mouse] P[word] K[YELLOW]
Call Replace command	H[keyboard] MK[R]
Type new word	4.5K[word]
Terminate Replace command	MK[esc]
Select last input word	H[mouse] P[word] K[YELLOW]
Re-enter type-in mode	H[keyboard] MK[I]
T execute = 4 TM + 10.5 TK + 4 TH + 2 TP = 12.1 secs
Method S (Backskip)
Setup Backskip command	MK[ctrl]
Execute Backskip n-1 times	(n-1)((1/c)MK[S])
Call Backword command	MK[W]
Type new word	4.5K[word]
Call Resume command	M2K[ctrl R]
T execute = (3 + (n-1)/c) TM + (n+7.5) TK = 5.8 + 0.62n secs

The model shows that the S method will be faster for an intermediate range of distances, the precise range depending on the value of c, and the user's typing speed (See Figure 1).
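The comparison can be sketched directly from the three formulas. This is an illustration rather than the authors' own analysis: the function names are hypothetical, and TK = 0.28 secs is an assumption, chosen because it reproduces the quoted totals (1.6 + 2.16n secs for W and 12.1 secs for R) when c = 4.

```python
# Operator times in seconds. T_K = 0.28 is an assumed typing speed.
T_M, T_K, T_H, T_P = 1.35, 0.28, 0.40, 1.10

def time_w(n, c=4):
    """Method W: rub out n words and retype them."""
    return (1 + n / c) * T_M + (1 + 6.5 * n) * T_K

def time_r(n, c=4):
    """Method R: leave insert mode and replace the word; independent of n."""
    return 4 * T_M + 10.5 * T_K + 4 * T_H + 2 * T_P

def time_s(n, c=4):
    """Method S: skip back n-1 words, rub out one word, then resume."""
    return (3 + (n - 1) / c) * T_M + (n + 7.5) * T_K

def fastest(n, c=4):
    """Return the label of the quickest method at a distance of n words."""
    times = {"W": time_w(n, c), "R": time_r(n, c), "S": time_s(n, c)}
    return min(times, key=times.get)

for n in range(1, 13):
    print(n, fastest(n))
```

With c = 4 and this typing speed, the sketch selects W for distances of 1 or 2 words, S for 3 to 10 words, and R beyond that; other values of c or TK shift the boundaries.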

Figure 1

Finally, a number of simplifications of the model are possible, but these reduce the accuracy of the prediction:

Keystrokes only	49%
Prorated mental time	45%
Constant operator time	34%
Full Keystroke-level Model	22%

The percentages give the Root Mean Square error. The Prorated Mental Time model assumes that mental time is some fraction of the operation time. The Constant Operator Time model assumes all operations take equal time.

4.5 DISCUSSION

Guedj:: In your dimensions of evaluation, the first seven dimensions are objective and the last subjective. Is it possible that we can only give measurements for objective items?
Dzida:: What is the strategy behind the evaluations?
Moran:: Our ultimate goal is to provide a model for each of the eight dimensions. So far we have only done the first and then for only the expert user. We cannot solve the design problem, we only give tools for managing the design. We hope to provide a whole kit of tools for the designer to use.
Sancha:: We have had some experience with draughtsmen using a system. We found that the time to perform a task depends on the length of time they have been using the system.
Hayes:: A subjective factor (like acceptance) may differ from others, but it is important to measure since it may dominate. Acceptability is the marketing issue.
Engleman:: In the context of design, the constant problem is being all things to all people. For example, TECO is both the best and the worst system. Learning it is a problem. You read the manual and it looks powerful, but you do not believe that you could ever learn how to perform a simple task such as changing a single character in a file. The problem is that it does not reduce down well for simple tasks. People get scared of using it. That is what is missing. This is the standard design problem - the compromise between ease of use and generality.
Kay:: We polled 20 users of the Display Editor that Tom Moran described, asking them how they rated it. They rated the user interface as poor, the functionality as high, but the most outstanding feature that the users liked was crash recovery. The system has a recall mode. All the dialogue of a session is stored in a file and you can go back and replay the commands. They rated this close to infinity. This is a wonderful feature. You could always count on it - even if the roof fell in. Its acceptance completely revolved around its ability to recover from disaster. There is an attribute for systems called Velvet. It is hard to define. JOSS had it - it made you feel good when you sat down at a terminal. You knew the system would take care of you. It is important that we design systems with Velvet in them. JOSS is the only pleasurable system that I have ever used.
Baecker:: [To Tom Moran] You divide each task into acquisition and execution. Doesn't acquisition depend on the machine?
Moran:: Sure, but it is in the noise.
Baecker:: In my perception of text editors, reliability correlates most highly with acceptability.
van Dam:: I have seen TECO and SOS used in two similar environments and have found one preferred TECO and the other SOS. Why?
Sancha:: Acquisition of a model of a language or editor is influenced by the first system you learnt. You reapply the model when you go to a new system.
Hopgood:: We have split a user population in two and allowed one half to access system A before B and vice versa. Both sets preferred the editor that they encountered first.