In general it is very difficult to carry over results from one research field to another. In particular, in the field of Experimental Psychology, it is hard to pull out interesting results. In Experimental Psychology literature, one can encounter an article comparing A and B, which ends with the hypothesis that A is better than B. Significance tests may actually show that A is better than B without giving any reasons. This makes it hard to apply these results to any other environments.
In Human Factors literature, one can find studies about display visibility, fonts, flicker, display ergonomics, conditional statements being better than others for hand simulating etc.
Another view, often given, is to use psychological expertise in running experiments, showing how to run a test for a given hypothesis. For example, John Hayes has looked at how people compose essays and this might be useful in the design of a text editor if one plans to use that text editing system for a whole range of tasks.
Over the last 15 years, cognitive psychology has tried to consider man as an information processor. Human architecture is considered as a systems architecture. This actually is a way of carrying things over. The general features of behaviour that are fair generalisations might be carried over also. One might even go beyond and develop, in the future, specific models of interaction.
Behaviorist psychology was dominant in the USA until the 1950's. The Behaviorists regarded it as unscientific to discuss thought processes. In the 1960's, psychologists became humanised by the appearance of the computer and, as computer scientists could give scientific descriptions of the insides of their computers, psychologists decided that it might be possible to describe scientifically the inside of a person using hardware free (physiology free) information processing models.
This focus on process and on information processing models led naturally to computer simulation (if the hardware doesn't matter, why not use a computer) and protocol analysis to yield rich data about process.
To do cognitive psychology, we need careful task analysis - that is, we need to understand what the task requires of any information processing system which is to do the task and what the particular information processing system contributes.
For example, if we observe an ant crawling through the grass, its behaviour is very complex. Most of this complexity resides in the task environment. If we want to describe the psychology of the ant, we want to separate the complexity which must be attributed to the task environment, from the complexity which can be attributed to the ant.
SMALLTALK is a good system, but what was Alan Kay thinking about when he designed it? Was most of the information he used about the external task to be performed, or about the psychological properties of the user? If task analysis were done beforehand, the psychologist could find out what the really important parts were.
If you change a person's representation of a task, you change the way the person does the task. Two similar modes of representation may interfere with one another. For example, let us consider division. The experience of the user may determine how well he can use a given display. The displays used for division by the British and Americans on the one hand and by Europeans on the other are characteristically different. To divide 21 into 473, Americans and the British typically write:
21:473
whereas Europeans write:
473:21
We asked European-trained and American-trained subjects to do mental division in each of these formats. We found that subjects do mental division more rapidly when it is presented in the familiar format than when it is presented in the unfamiliar format. Also, it was not a matter of the additional time required to flip the image over in the mind. The second digit of the result of the division also took significantly longer to generate.
We expect that since experts and novices may differ in their experience with displays, they may use them in different ways. Chess research supports this expectation. Chase and Simon asked subjects to examine a 25 piece mid-game chess position for 5 seconds and then to try to reproduce it. Grand masters were very much more successful than novices in performing this task (93% success compared with 33%). Masters and novices performed equally poorly (15% success) when pieces were placed randomly on the board (rather than representing real chess positions). This suggests that the master's superiority depends on chess knowledge. Chase and Simon estimate that the master has stored some 50,000 images of chess patterns as a result of his experience with the game.
The semantics of a representation may be very important in determining the difficulty of the problem. We have a number of isomorphs of the Tower of Hanoi problem and let me consider two called Monster Problems 2 and 4:
In Monster Problem 2, there are three five-handed extra-terrestrial monsters standing on three crystal globes. Because of the quantum-mechanical peculiarities of their neighbourhood, both monsters and globes come in exactly three sizes with no others permitted (small, medium, and large). The medium-sized monster was standing on the small globe. The small monster was standing on the large globe. The large monster was standing on the medium-sized globe. Since this situation offended their keenly developed sense of symmetry, they proceeded to transfer themselves from one globe to another so that each monster would have a globe proportionate to his own size. Monster etiquette complicates the solution of the problem since it requires:
The problem is to find by what sequence of transfers can the monsters arrive on the correct globes.
In Monster Problem 4, three five-handed extra-terrestrial monsters are holding three crystal globes. Because of the quantum-mechanical peculiarities of their neighbourhood, both monsters and globes come in exactly three sizes with no others permitted (small, medium, large). The medium-sized monster is holding the small globe. The small monster is holding the large globe. The large monster is holding the medium-sized globe. Since this situation offended their keenly developed sense of symmetry, they proceeded to shrink and expand themselves so that each monster would have a globe proportionate to his own size. Monster etiquette complicated the solution of the problem since it requires:
By what sequence of changes could the monsters have solved this problem.
Monster problem number 2 involves moving things from one place to another, while Monster problem number 4 involves changing the size of things. Monster problem number 4 is nearly twice as hard to solve as Monster problem number 2, even though the two problems are exact isomorphs.
Imagery may determine the form of the notation used to solve a problem. When subjects are asked to solve Monster problem number 2 or Monster problem number 4, they often make a preliminary sketch. [Editor: The reader is encouraged to try the problem with sketches. As we realise that most people working in computer graphics cannot draw, we include sample drawings by a graphic artist at the end of the paper.] Many subjects then proceed to develop a notation to describe the sequence of events in the solution of the problem. For example, a typical solution to Monster problem 2 might start:
Step | M | L | S |
---|---|---|---|
0 | L | S | M |
1 | - | L,S | M |
2 | M | L,S | - |
A typical solution to Monster problem 4 might start:
Step | M | L | S |
---|---|---|---|
0 | L | S | M |
1 | L | L | M |
2 | L | L | S |
The particular instances of these forms depend on the initial sketch which the subject drew. The order of elements in the matrix is closely correlated with the order of elements in the drawing. The horizontal or vertical orientation of elements in the matrix corresponds to the orientation of elements in the drawing.
When subjects solve two isomorphic problems one after another, there may be very little transfer. The failure of students to transfer among isomorphic problems is a major limitation in the classroom. We propose that people fail to transfer solutions from a problem to its isomorph when the two problems differ in an important way in their semantics. We believe that humans categorise perceptions of the world into (at least) the following six semantic categories:
When corresponding elements in two problem isomorphs belong to different semantic categories, then people will have great difficulty in transferring the solution of one problem to the solution of the other. For example, the categories of corresponding elements in various isomorphs of the Tower of Hanoi problem are:
Problem | Variable Entity | Variable Attribute | Criterion |
---|---|---|---|
Tower of Hanoi | Object: Disc | Location: Disc Loc | Property: Disc Size |
Monster No 2 | Object: Monster | Location: Monster Loc | Property: Monster Size |
Monster No 4 | Object: Monster | Location: Monster Size | Property: Globe Size |
We predict that transfer from Tower of Hanoi to Monster Problem number 2 will be easy, but that transfer to Monster Problem number 4 will be difficult.
The main points made are:
The work being done by Stu Card, Tom Moran and Allen Newell at Xerox PARC is focused on the goal of making psychological expertise available to system designers. The approach is to build models of the user which the designer can apply and thus do the psychology himself. What does a good interactive system mean? There is no obvious metric for good, but systems can be measured against a set of criteria, such as the following:
Initially, the study has concentrated upon the time criterion, and the idealisation of this is given below where the underlinings indicate important restrictions:
The time an expert user will take to execute the task using the system, providing he uses the method without error.
The model assumes an expert user; it does not degrade simply to account for naive users. Part of the expert's skill is to clear up errors in the normal course of using the system, but the model assumes that no errors occur.
The model predicts the time taken to execute a task. The user's operations are viewed as a set of Unit Tasks, each of which must be acquired and then executed. The total time for a task is the sum of the time to acquire and time to execute. Typical acquisition times are 2 secs for manuscript editing, 5.3 secs for routine design and greater than 5 secs for creative composition. Execution times are generally less than about 20 sec. If the execution time takes much longer, users will usually break the task into sub-tasks. The Keystroke-level Model predicts the execution time as the sum of time for:
where TK is the time for keystrokes, TP the time for pointing etc. The times associated with individual operations in each of these domains are as follows:
Operator | Secs |
---|---|
TK-A keystroke or button push | |
Best typist (135wpm) | 0.08 |
Good typist (90 wpm) | 0.12 |
Average skilled typist (55wpm) | 0.20 |
Average non-secretary typist (40wpm) | 0.28 |
Typing random letters | 0.50 |
Typing complex codes | 0.75 |
Worst typist (unfamiliar with keyboard) | 1.20 |
TP-Pointing on a display with a mouse | 1.10 |
TH-Homing the hand(s) on the keyboard etc | 0.40 |
TD-Drawing N straight lines, total length L cm | 0.9N+0.16L |
TM-Mentally preparing for executing physical actions | 1.35 |
In fact the time for pointing obeys Fitts's law (i.e. it is proportional to the log of target distance divided by target size). The model predicts the execution time for a particular method, that is, a routine sequence of operations to achieve a unit task. Examples of methods and their encodings in terms of the model's operations are shown below. Methods for the replacement of one 5-letter word by another are given for POET (a line editor) and DISPED (a display editor using a mouse):
Example Method Encodings | |
---|---|
Replace a word using POET | |
Jump to next line | MK [linefeed] |
Call Substitute command | MK [S] |
Specify new 5-letter word | 5K [word] |
Terminate argument | MK [return] |
Specify old 5-letter word | 5K [word] |
Terminate argument | MK [return] |
Terminate command | K [return] |
T Execute = 4 TM + 15 TK = 8.4 secs | |
Replace a word using DISPED | |
Reach for mouse | H [mouse] |
Point to word | P [word] |
Select word | K [YELLOW] |
Home on keyboard | H [keyboard] |
Call Replace command | MK [R] |
Type new 5-letter word | 5K [word] |
Terminate type-in | MK [esc] |
T Execute = 2 TM + 8 TK + 2 TH + TP = 6.2 secs | |
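The arithmetic behind these totals can be sketched in a few lines of Python. This is only an illustration: the operator counts are read off the encoding rows above, the times come from the operator table, and TK = 0.20 secs (the average skilled typist) is an assumption, chosen because it reproduces the 8.4 secs quoted for POET. The names `OPERATOR_SECS` and `execution_time` are hypothetical.

```python
# Operator times in seconds, from the operator table above.
# K = 0.20 (average skilled typist) is an assumption; it is the value
# that reproduces the 8.4 secs quoted for the POET method.
OPERATOR_SECS = {
    "K": 0.20,  # keystroke or button push
    "P": 1.10,  # pointing with the mouse
    "H": 0.40,  # homing the hand(s)
    "M": 1.35,  # mental preparation
}


def execution_time(counts, secs=OPERATOR_SECS):
    """Sum count * unit time over the operators used by a method."""
    return sum(n * secs[op] for op, n in counts.items())


# Operator counts tallied from the two encodings above.
poet = {"M": 4, "K": 15}
disped = {"M": 2, "K": 8, "H": 2, "P": 1}

print(round(execution_time(poet), 1))    # 8.4
print(round(execution_time(disped), 1))  # 6.2
```

On these assumptions the display editor method comes out roughly 2 secs faster, mainly because it needs two fewer M operators and about half as many keystrokes.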
The important decision in encoding a method is where to put the M operators, representing the points at which the user pauses to organise his operations. A set of 5 rules for placing these M operations or, equivalently, for dividing keystrokes into conceptual chunks is given below. Rules 1 to 4, which delete M operations, suggest ways of designing interactions to improve time performance:
Begin with a method of encoding that includes all physical operations and response operations. Use Rule 0 to place candidate Ms and then cycle through Rules 1 to 4 for each M to see whether it should be deleted.
A set of experiments, in which user performance on tasks was compared with the model's predictions, involved the following systems:
Text Editors | |
---|---|
POET | Line-oriented with relative line numbers |
SOS | Line-oriented with 'sticky' line numbers |
DISPED | Display-oriented, full page, uses mouse for pointing |
Graphics Systems | |
MARKUP | Uses mouse to draw and erase lines on a bitmap display, commands selected from a hidden menu, which must be redisplayed each time |
DRAW | Lines defined by pointing with mouse to end points, commands selected with mouse from a menu |
SIL | Lines defined by pointing with mouse to end points, boxes defined by pointing to opposite vertices, commands selected by combinations of mouse buttons |
Executive Subsystems | |
LOGIN | TENEX command for logging in |
FTP | Program for transferring files between computers |
CHAT | Program for establishing a 'teletype' connection between two computers |
DIR | TENEX command for printing a file directory, has a subcommand mode |
DELVER | TENEX command for deleting old versions of a file |
The tasks were as follows:
Editing Tasks (used for POET, SOS, DISPED) | |
---|---|
T1 | Replace one 5-letter word with another (one line from previous task) |
T2 | Add a 5th character to a 4-letter word (one line from previous task) |
T3 | Delete a line, all on one line (eight lines from previous task) |
T4 | Move a 50 character sentence, spread over two lines to the end of its paragraph (eight lines from previous task) |
Graphics Tasks (used for MARKUP, DRAW, SIL) | |
T5 | Add a box to a diagram |
T6 | Add a 5-character label to a box |
T7 | Reconnect a 2-stroke line to a different box |
T8 | Delete a box, but keep an overlapped line |
T9 | Copy a box |
Executive Tasks | |
T10 | Phone computer and login (4 char name, 6 char password) |
T11 | Transfer a file to another computer, renaming it |
T12 | Connect to another computer |
T13 | Display a subset of the file directory with file lengths |
T14 | Delete old versions of a file |
The definition of expert required at least 6 months experience of the system, and not more than 1 week since the last use. There was a 20% RMS error in predicting a single task. The error for n tasks is reduced by a factor of sqrt(n). The validity of the five heuristic rules was tested by omitting each in turn; in all cases the error went up. The time for the M operator was obtained by a best-fit calculation, but for values between 1.2 and 2.0 the RMS error was 23% or less.
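The square-root reduction can be illustrated with a short sketch. It assumes the 20% figure is a relative RMS error and that errors on separate tasks are independent, which is the usual condition under which such a reduction holds; the function name is hypothetical.

```python
import math


def rms_error_for_n_tasks(single_task_error, n):
    """Relative RMS error expected over n tasks, given the error for one.

    Assumes independent errors, so the error shrinks by a factor sqrt(n).
    """
    return single_task_error / math.sqrt(n)


# With a 20% error on a single task, the expected error falls as n grows.
for n in (1, 4, 16):
    print(n, round(rms_error_for_n_tasks(0.20, n), 3))
```

Under these assumptions a benchmark of four unit tasks would be predicted to about 10%.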
The model was used to calculate the result of benchmark tests on editors, and a comparison was made between these predictions and experiment:
System | Calculated | Observed | % Error |
---|---|---|---|
Text Editors (4 unit tasks) | | | |
POET | 60 | 60 | 0 |
SOS | 50 | 56 | -11 |
DISPED | 27 | 28 | -4 |
Average | | | 5 |
Graphics Editors (5 unit tasks) | | | |
MARKUP | 59 | 54 | +10 |
DRAW | 55 | 52 | +6 |
SIL | 27 | 31 | -13 |
Average | | | 10 |
The final example of the use of the model was the evaluation of a proposed additional means of recovering from a mis-typed word. The two existing methods are:
Method R takes a constant 12 sec, whereas method W is faster for small distances and slower for large distances. Users select the appropriate method for the distance in question.
Suggested new method S involves skipping backwards over each word with a control-S, rubbing out the erroneous word with a control-W, inserting its replacement, and then returning to the previous position with a control-R. The three encodings were:
Method W (Backword) | |
Setup Backword command | MK [Ctrl] |
Execute Backword n times | n((1/c)MK[W]) |
Type new word | 5.5K[word] |
Retype destroyed text | 5.5(n-1)K |
T execute = (1+n/c) TM + (1+6.5n) TK = 1.6 + 2.16n secs |
The term n/c takes account of the user's need for feedback on how far back he has gone. Initially, c=4 was assumed but later a sensitivity analysis was performed.
Method R (Replace) | |
Terminate type-in mode | MK[esc] |
Select target word | H[mouse] P[word] K[YELLOW] |
Call Replace command | H[keyboard] MK[R] |
Type new word | 4.5K[word] |
Terminate Replace command | MK[esc] |
Select last input word | H[mouse] P[word] K[YELLOW] |
Re-enter type-in mode | H[keyboard] MK[l] |
T execute = 4 TM + 10.5 TK + 4 TH + 2 TP = 12.1 secs | |
Method S (Backskip) | |
Setup Backskip command | MK[ctrl] |
Execute Backskip n-1 times | (n-1)((1/c)MK[S]) |
Call Backword command | MK[W] |
Type new word | 4.5K[word] |
Call Resume command | M2K[ctrl R] |
T execute = (3 + (n-1)/c) TM + (n+7.5) TK = 5.8 + 0.62n secs |
The model shows that the S method will be faster for an intermediate range of distances, the precise range depending on the value of c, and the user's typing speed (See Figure 1).
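The comparison can be sketched directly from the three T execute expressions above. This is an illustration rather than the authors' own calculation: TK = 0.28 secs (the average non-secretary typist) is an assumption, chosen because it reproduces the quoted coefficients (1.6 + 2.16n, 12.1 and 5.8 + 0.62n), and the function names are hypothetical.

```python
# Operator times in seconds. T_K = 0.28 is an assumption: it is the table
# value that reproduces the coefficients quoted for methods W, R and S.
T_K, T_P, T_H, T_M = 0.28, 1.10, 0.40, 1.35


def time_w(n, c=4):
    """Method W (Backword): erase back over n words and retype them."""
    return (1 + n / c) * T_M + (1 + 6.5 * n) * T_K


def time_r():
    """Method R (Replace): constant time, independent of distance."""
    return 4 * T_M + 10.5 * T_K + 4 * T_H + 2 * T_P


def time_s(n, c=4):
    """Method S (Backskip): skip back, replace one word, then resume."""
    return (3 + (n - 1) / c) * T_M + (n + 7.5) * T_K


def fastest_method(n, c=4):
    """Return the name of the method with the lowest predicted time."""
    times = {"W": time_w(n, c), "R": time_r(), "S": time_s(n, c)}
    return min(times, key=times.get)


for n in range(1, 13):
    print(n, fastest_method(n))
```

With c = 4 and this typing speed, the sketch selects W for distances of one or two words, S from three to ten words, and R beyond that. Changing `c` or `T_K` shifts the boundaries, which is the sensitivity the text refers to.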
Finally, a number of simplifications of the model are possible, but these reduce the accuracy of the prediction:
Keystrokes only | 49% |
Prorated mental time | 45% |
Constant operator time | 34% |
Full Keystroke-level Model | 22% |
The percentages give the Root Mean Square error. The Prorated Mental Time model assumes that mental time is some fraction of the operation time. The Constant Operator Time model assumes all operations take equal time.