Chilton::ACD::Methodology in Interaction

10.16 On Computer Problem-Solving Interaction

Gergely Krammer

Computer and Automation Institute of the Hungarian Academy of Sciences

Budapest, Hungary

1 INTRODUCTION

The later chapters consider topics mentioned in a list of tentative areas for the Workshop. The earlier chapters consider topics that the author feels should also be addressed.

The author's approach throughout this paper may be characterised as follows:-

Currently, I cannot define Machine Intelligence
I do not expect the Machine to be Intelligent
I simply try to find out how I can make my machine more intelligent

2 IN SEARCH OF A GENERAL MODEL FOR INTERACTION

2.1 My feeling is that we cannot make Man-Machine interaction like man-man interaction because we do not know what the latter is. On the other hand, the Machine will never be a man and, in some respects, may prove to be better than Man. Machine Intelligence and Natural Language Processing are considered promising fields of contemporary research and will play a major role in tomorrow's technology. Currently, they are only being applied in a few specialised subsystems. What we must do is learn from on-going research and try to make our Machines more and more intelligent.

2.2 We have two partners in the interaction - the Man and the Machine. (Since both start with M, we will use the abbreviation H for human and C for computer - he and she respectively!) Both relate to a common problem space which is called the World (W). All the information exchange between the two is expressed in a common language L. This is sometimes divided into an input language (IL) and an output language (OL), used for input to the Computer and output from the Computer respectively. The two overlap as both are being used to describe the same world. We distinguish between them only when one or other process is considered in isolation (Fig 2.1).

Any further subdivision of the partnership will be made symmetrically, not necessarily because this is a good model of human intellect, but because this symmetry will help with the description.

2.3 Any subdivision of the two individuals will depend on the aspect of interaction that is being modelled. Pask (PASK-77) describes the process of learning in which there are no separate input/output organs for the partners.

However, if we are interested in the process of sensation/perception, we must treat the I/O process as more or less separated from other parts of the individual.

Figure 2.1

In this paper, we shall follow this approach and, therefore, we divide the Computer into a separate communication part (COMM-C) and another part. COMM-C may be subdivided into an I-device and an O-device; the first performs all input, while the second performs all output (Fig 2.2).

Figure 2.2

The other part will contain a representation of knowledge about the world and some sort of internal control. The overall control of interaction cannot be in the hands of a third party. It must be controlled by either C or H or, preferably, be distributed between the two (see also Chapter 4).

An obvious further subdivision is shown in Fig 2.3. The Computer consists of:

The internal control (CTRL-C)
Database (DB-C)
The procedural knowledge (PROC-C)

Figure 2.3

This subdivision is similar to the major functions of a computer - data storage, program, I/O and control. A similar model is used by Kutschke (KUTSCHKE-78). I am convinced that this straightforward model is easy to implement in practical man-machine systems. However, it gives no details of the internal structure of DB-C and PROC-C. Our experience with programming computers supports the model in Figure 2.3 but I feel that a new approach would substantially improve current systems. My feeling is that, in a more detailed model of knowledge representation, there would be several levels of representation - at least, different kinds of data in DB-C or different kinds of procedures in PROC-C.
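
The subdivision of Fig 2.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the "verb argument" command format and the two procedures (store, recall) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CommC:
    """COMM-C: the communication part, split into an I-device and an O-device."""
    inbox: List[str] = field(default_factory=list)   # filled through the I-device
    outbox: List[str] = field(default_factory=list)  # emptied through the O-device


@dataclass
class Computer:
    """C = COMM-C + CTRL-C + DB-C + PROC-C (layout assumed for illustration)."""
    comm: CommC = field(default_factory=CommC)
    db: Dict[str, str] = field(default_factory=dict)  # DB-C
    proc: Dict[str, Callable[[Dict[str, str], str], str]] = field(
        default_factory=dict)                         # PROC-C

    def ctrl(self) -> None:
        """CTRL-C: take each pending input, pick a procedure, queue the output."""
        while self.comm.inbox:
            verb, _, arg = self.comm.inbox.pop(0).partition(" ")
            handler = self.proc.get(verb)
            reply = handler(self.db, arg) if handler else f"unknown command: {verb}"
            self.comm.outbox.append(reply)


def store(db: Dict[str, str], arg: str) -> str:
    """Hypothetical procedure: 'store key=value' writes into DB-C."""
    key, _, value = arg.partition("=")
    db[key] = value
    return f"stored {key}"


def recall(db: Dict[str, str], arg: str) -> str:
    """Hypothetical procedure: 'recall key' reads from DB-C."""
    return db.get(arg, "unknown")


c = Computer(proc={"store": store, "recall": recall})
c.comm.inbox += ["store height=3600 feet", "recall height", "erase height"]
c.ctrl()
print(c.comm.outbox)
```

The sketch keeps DB-C and PROC-C as flat dictionaries, which is exactly the lack of internal structure that the text criticises; a more detailed model would replace them with several levels of representation.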

2.4 In the course of a dialog, each partner responds to the other (parallel actions will be considered later). Before one responds, one has to understand the other's previous action, prepare the response and, finally, act. In the global model (Figure 2.1), one partner responds to the other and all parts of the individual are involved in the response.

One would like to try to divide the whole of C into components which correspond to the three steps of the response given above, or any other logical sequence of actions. Thus, the main questions are:

What is the sequence of steps in responding?
Is it at all reasonable to split responding into consecutive steps?
What are the components of C corresponding to the consecutive steps?
What are the major properties of each step (and the components of C)?

In the following sections I will try to give an answer to these questions.

2.5 Let us make the arbitrary assumption that there are three steps and, therefore, systems based on this model will have three components:

The I-device of C accepts and interprets a finite unit of input (a sentence) which depends on a sub-set of the total knowledge base
The central part of C which evaluates the input and generates a response dependent on the total knowledge base
The O-device which outputs the response

This is one sub-division of the response process but it may not be the only one. Also, the programming language used influences the capabilities of the three parts. For example, if language models of the ALGOL 68 - PASCAL era (BOS-77) are used for the I- and O-devices, rather than FORTRAN or finite automata (BUTLIN-76, DISTAR-73), a richer definition may be possible: it may be possible to define the formal rules for perception and sensation and even to define a program to analyse these. Languages such as PROLOG and CDL may or may not offer a more comprehensive method of expressing these ideas.

The I-device accepts inputs, in sequence, which conform to the rules defined in this high-level language. If this is a complete grammar and vocabulary for the conversation, all problems that need to be resolved are handed to the core part of the Computer.
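
The three-step assumption of section 2.5 can be written as a small pipeline. The following is a minimal sketch, assuming a toy vocabulary and a one-entry knowledge base; the function names and the data are hypothetical.

```python
from typing import Optional, Tuple

VOCABULARY = {"height", "width"}     # the sub-set of knowledge the I-device needs
KNOWLEDGE = {"height": "3600 feet"}  # stands in for the total knowledge base


def i_device(raw: str) -> Optional[str]:
    """Step 1: accept one sentence and interpret it using the vocabulary only."""
    words = raw.lower().strip(" ?").split()
    known = [w for w in words if w in VOCABULARY]
    return known[0] if known else None


def central(topic: Optional[str]) -> Tuple[str, str]:
    """Step 2: evaluate the interpreted input against the whole knowledge base."""
    if topic is None:
        return ("error", "not understood")
    if topic in KNOWLEDGE:
        return ("answer", f"{topic} = {KNOWLEDGE[topic]}")
    return ("answer", f"{topic} is not known")


def o_device(response: Tuple[str, str]) -> str:
    """Step 3: output the response."""
    kind, text = response
    return f"[{kind}] {text}"


def respond(raw: str) -> str:
    return o_device(central(i_device(raw)))


print(respond("What is the height?"))
print(respond("width?"))
print(respond("hello"))
```

Only the central step consults KNOWLEDGE; the I-device rejects or accepts input on the basis of the vocabulary alone, which mirrors the division of labour described above.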

2.6 Unfortunately, programming languages have been based on the model of the Computer used in Fig 2.3 (CTRL-C + DB-C + PROC-C + COMM-C), and this tends to obscure the meaning of the system. The early languages were defined for FORmula TRANslation and are inappropriate for defining algorithms on data of a non-numerical form. For example, we need to define new data types and we need to describe concepts such as knowledge, reasoning and the ability to adapt.

We must first understand the concepts, find a formalism to describe them and teach the Computer to accept these descriptions and to behave accordingly. This is the position we have reached so far. We can describe certain knowledge types using PROLOG, say, but we have no good means for describing reasoning (that is, the way the Computer behaves) and we have, so far, not considered adaptive behaviour. There are languages more advanced than PROLOG. It is necessary for the following two questions to be resolved:

What do we know about the three components (steps) described above? How can they be represented in the machine and what is the method of processing them?
What practical techniques are currently available (for example, knowledge-based consultants)?

3 THE INPUT DEVICE

In this chapter we will consider input devices and in particular:

Data handling
Context changes
Communication control

The breakdown of input devices into these sub-classes seems to be appropriate in order that we can model higher level input devices constructed from both physical input devices and software components. Several input devices may be combined to form a single higher-level device. This conceptual I-device must be able to handle all inputs to the Computer.

3.1 General Structure of Input Devices

The input device stands between the Human operator and the Computer. Thus, it has an H-face and a C-face (Figure 3.1).

Figure 3.1

The interior of the device box is divided into three sections (Figure 3.2): the DATA section, the CONTEXT section and the CONTROL section (sometimes called the POLLING section). Each is in charge of different types of information. Each section has a register for the type of information stored in that section with a connection through the section boundary which transforms information from one type to another.

Inputs and outputs to a device may be completely symmetric. Through the H-face a register may be changed using a HANDLE and its contents may be checked via an ECHO. The Computer can both WRITE and READ registers. Not all of the components and functions present in Figure 3.2 may be present in a specific physical device.

The Control or Polling section may define a specific priority for man or machine over the other. The device may be polled from time to time, may be read on occasions or may cause an interrupt. The device may control the operator in a similar manner. The next section gives a few examples.
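
The register-and-two-faces structure described in this section can be sketched as follows. The class and attribute names are assumptions made for illustration; the paper only names the functions HANDLE, ECHO, READ and WRITE.

```python
class Section:
    """One section of the device box (DATA, CONTEXT or CONTROL) with one register."""

    def __init__(self, name, value=0, computer_may_write=True):
        self.name = name
        self.register = value
        self.computer_may_write = computer_may_write

    # H-face: the Human changes the register with a HANDLE and checks it via an ECHO.
    def handle(self, value):
        self.register = value

    def echo(self):
        return f"{self.name}: {self.register}"

    # C-face: the Computer can READ and, where the device provides it, WRITE.
    def read(self):
        return self.register

    def write(self, value):
        if not self.computer_may_write:
            raise PermissionError(f"{self.name} has no WRITE function")
        self.register = value


class InputDevice:
    """A device box with the three sections; not every function need be present."""

    def __init__(self):
        self.data = Section("DATA", computer_may_write=False)
        self.context = Section("CONTEXT", value="default")
        self.control = Section("CONTROL", value="idle")


device = InputDevice()
device.data.handle(42)        # the Human moves the handle
device.context.write("zoom")  # the Computer switches the context
print(device.data.echo(), "|", device.context.echo())
```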

Figure 3.2

3.2 Examples

Example 1

A simple valuator is defined which inputs a value in the range (0,1) with a limited accuracy specified by the operator (Figure 3.2).

D.READ:       8-bit values
D.WRITE:      none
D.HANDLE:     shift potentiometer's handle
D.ECHO:       D.HANDLE's relative position
D.REGISTER:   follows D.HANDLE
CONTEXTS:     only one context
POLLING:      (a) the Operator may act independently of the Computer
              (b) The Computer may act (read) independently of the Operator

Figure 3.3

The interpretation of the data depends on the program and the Operator must be aware of this. Note that, as the Computer and Operator can act completely independently, some kind of auxiliary synchronisation has to be imposed either by the program or by the device.
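
Example 1 can be sketched as a small class. The mapping of the handle position onto 8-bit values (rounding to the nearest of 256 levels) is an assumption; the paper states only the range and the word length.

```python
class SimpleValuator:
    """Sketch of Example 1: one context, no WRITE, no synchronisation."""

    def __init__(self):
        self._position = 0.0  # D.REGISTER follows D.HANDLE

    def handle(self, position: float) -> None:
        """D.HANDLE: the Operator shifts the potentiometer (clamped to 0..1)."""
        self._position = min(1.0, max(0.0, position))

    def echo(self) -> float:
        """D.ECHO: the handle's relative position."""
        return self._position

    def read(self) -> int:
        """D.READ: an 8-bit value (0..255), assumed to be rounded to nearest."""
        return int(self._position * 255 + 0.5)


valuator = SimpleValuator()
valuator.handle(0.5)
print(valuator.read())
```

Nothing in the class tells the Computer whether the Operator has finished moving the handle, which is why the text calls for auxiliary synchronisation.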

Example 2

An enhanced valuator consisting of a simple valuator built into a box containing a microprocessor, some pushbuttons, indicators, etc. The device produces values in the range (a1,a2). However, the device may operate in LOCAL mode where the range (a1,a2) may be adjusted.

D.READ:       8-bit values
D.WRITE:      none
D.HANDLE:     shift potentiometer's handle
D.REGISTER:   D.R.1 follows D.HANDLE
              D.R.2 is a value which is transformed into the range (a1,a2)
C.REGISTERS:  a1 and a2, status bits
C.READ:       a1 and a2, status bits
C.WRITE:      a1 and a2, status bits
C.HANDLE:     power switch, local/on-line switch, keyboard to define a1 and a2
C.ECHO:       power-on light, local/on-line lights, digital indicators for a1 and a2
POLLING:      (a) The Operator may change D.R.1 at any time.
                  When P.HANDLE is pushed, D.R.2 is changed accordingly.
                  P.HANDLE only has effect if P.REGISTER is on.
              (b) The Computer may P.READ and, if P.REGISTER
                  is zero, D.READ will return D.R.2 values.
                  Otherwise the result will be undefined.
P.REGISTER:   'Last data read' bit
P.READ:       shows P.REGISTER
P.ECHO:       shows P.REGISTER
P.HANDLE:     pushbutton
P.WRITE:      reset command

Communication control could be improved. If the Computer is sufficiently fast, it may take the data as fast as the Operator pushes the button, so that the Operator may not notice that the data has been taken. This could be improved if the Computer delays the Operator's action by delaying P.READ and D.READ.
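
The handshake of Example 2 can be sketched as below. Several points are assumptions, since the paper does not spell them out: that pushing P.HANDLE clears the 'last data read' bit, that a successful D.READ sets it again, that P.WRITE (the reset command) sets it, and that an undefined read is represented by None.

```python
class EnhancedValuator:
    """Sketch of Example 2 with an assumed reading of the P.REGISTER handshake."""

    def __init__(self, a1: float = 0.0, a2: float = 1.0):
        self.a1, self.a2 = a1, a2    # C.REGISTERS: the adjustable range
        self.dr1 = 0.0               # D.R.1 follows D.HANDLE (0..1)
        self.dr2 = a1                # D.R.2: value transformed into (a1, a2)
        self.last_data_read = True   # P.REGISTER

    def d_handle(self, position: float) -> None:
        self.dr1 = min(1.0, max(0.0, position))

    def p_handle(self) -> bool:
        """Pushbutton: only has effect if P.REGISTER is on."""
        if not self.last_data_read:
            return False
        self.dr2 = self.a1 + self.dr1 * (self.a2 - self.a1)
        self.last_data_read = False  # assumption: new data is now waiting
        return True

    def p_read(self) -> bool:
        return self.last_data_read

    def d_read(self):
        """Returns D.R.2 if P.REGISTER is zero; otherwise undefined (None here)."""
        if self.last_data_read:
            return None
        self.last_data_read = True   # assumption: reading marks the data as taken
        return self.dr2

    def p_write(self) -> None:
        """Reset command (assumed to set the 'last data read' bit)."""
        self.last_data_read = True


v = EnhancedValuator(a1=10.0, a2=20.0)
v.d_handle(0.5)
first_push = v.p_handle()    # accepted
second_push = v.p_handle()   # ignored: the Computer has not read yet
value = v.d_read()
print(first_push, second_push, value)
```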

Example 3

An intelligent terminal where several devices have been combined to form a single I-device. This I-device provides input to the Computer at a relatively high level.

D.REGISTER:     a linear buffer to store 'sentences' in a 'canonical' representation. 
D.READ:         reads D.REGISTER
D.ECHO:         (a) alphanumeric screen showing the canonical representation of the sentence in textual form
                (b) low-level echo for each datum near the edges of the field of vision.
D.HANDLES:      keyboard, function keys, valuator, digitiser to define positions and tablet to choose
                between commands.
C.REGISTER:     formal description table and status bits which refer to entries in this table
C.WRITE:        formal description table, status bits and references to table
C.READ:         all status bits and local/on-line switch
C.ECHO:         power on, local/on-line lights and references to table
POLLING:        (a) D.READ from Computer is always pending.
                    Once end-of-sentence is transmitted, Computer is interrupted.
                (b) All input tools are enabled. The activated one is read and interpreted
                    according to the current context.
                    Sentences are formally checked and, in the case of error,
                    the device changes to LOCAL status to allow the error to be corrected.
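
The polling behaviour of Example 3 can be sketched as follows. Regular expressions stand in for the formal description table, and the end-of-sentence token ".", the method names and the sample commands are all hypothetical.

```python
import re


class IntelligentTerminal:
    """Sketch of Example 3: sentences are checked locally before delivery."""

    END_OF_SENTENCE = "."  # hypothetical end-of-sentence token

    def __init__(self, patterns):
        # Stand-in for the formal description table written by the Computer.
        self.table = [re.compile(p) for p in patterns]
        self.buffer = []         # D.REGISTER: the sentence being built
        self.delivered = []      # sentences handed to the pending D.READ
        self.status = "ON-LINE"

    def token(self, tok: str) -> None:
        """Accept one token from any enabled input tool."""
        if tok != self.END_OF_SENTENCE:
            self.buffer.append(tok)
            return
        sentence = " ".join(self.buffer)
        if any(p.fullmatch(sentence) for p in self.table):
            self.delivered.append(sentence)  # the Computer would be interrupted here
            self.buffer = []
            self.status = "ON-LINE"
        else:
            self.status = "LOCAL"            # keep the buffer for local correction

    def correct(self, index: int, tok: str) -> None:
        """Local editing of one token while in LOCAL status."""
        self.buffer[index] = tok


t = IntelligentTerminal([r"MOVE \d+ \d+", r"DELETE \w+"])
for tok in ["MOVE", "10", "x", "."]:
    t.token(tok)
status_after_error = t.status
t.correct(2, "20")
t.token(".")
print(status_after_error, t.status, t.delivered)
```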

3.3 The I-Device in the Man-Computer dialog

The last example shows how we can combine all the inputs to form a single I-device. In the example, the I-device was a terminal but it could also include the central processor where a part of the program is in exclusive control of the input. The characteristics of this I-device are those described in section 3.1. Other aspects are described in Chapters 2 and 4.

4 CONTROL

Here we consider the internal control of either C or H and the external control - that is the overall control of the communications between H and C. Two problems that arise when considering external control are attention handling and context changes. The type of attention handling defines whether the expected response is decided by the party asking the question, which parties have this power and the constraints on the type of answer.

The context change control is similar but the distinction between different results is made on the basis of the contents of specific registers rather than the input devices used. Thus, it is the Computer core which is responsible for context control and it is COMM-C which is responsible for attention handling. In this section we elaborate the characteristics of three practical solutions.

4.1 Simple, Wait-mode

The two partners alternate and each has to wait for the other's response before a new action can be generated. This is a poor solution but, if it can be implemented simply, it may be satisfactory. As the Human does not stop thinking while he waits for a response, C might be improved similarly by providing housekeeping algorithms that run in the WAIT state and, in the case of an interrupt, are automatically killed and not restarted.
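
A minimal sketch of wait-mode with housekeeping, assuming a tick-based simulation in which each tick either carries an input from the Human or is empty. Generators stand in for the housekeeping algorithms; the names chore and wait_mode are hypothetical.

```python
from collections import deque


def chore(name, steps):
    """A hypothetical housekeeping algorithm, split into small steps."""
    for i in range(steps):
        yield f"{name} step {i}"


def wait_mode(ticks, chores):
    """ticks[i] is the Human's input at tick i, or None while he is thinking."""
    pending = deque(chores)
    current = None
    log = []
    for item in ticks:
        if item is not None:          # interrupt: kill the running chore
            if current is not None:
                current.close()       # killed and never resumed
                current = None
            log.append(f"reply to {item}")
            continue
        if current is None and pending:
            current = pending.popleft()
        step = next(current, None) if current is not None else None
        if step is None:              # nothing left to do in the WAIT state
            current = None
            log.append("idle")
        else:
            log.append(step)
    return log


log = wait_mode([None, None, "draw", None, "stop"],
                [chore("compact", 5), chore("index", 1)])
print(log)
```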

4.2 Foreground Dialog with Background Processing

Conceptually, the Computer consists of two processors:

a foreground attention-processor which is fast enough to process most inputs from the Man in real-time.
a background processor for requests which cannot be processed in an acceptable time.

The Human is only in direct contact with the foreground processor. If necessary, this processor transfers jobs to the background processor, informs the Human and is then ready to continue the dialog. It is possible to launch any number of background processes which, in the simple case, run independently.

The internal control is always informed about the status of background processes. These store their results in the data base (DB-C) common to all processes (and both processors). The Human is able to enquire about the status of processes or the contents of DB-C. Also, he may ask the foreground processor to kill running processes. In practice, a multi-task or two-task operating system may implement this system using a single processor.
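
The foreground/background arrangement can be sketched on a single machine with a thread pool. This is an illustration only: the class and method names are hypothetical, and the kill operation relies on Future.cancel, which can only stop a job that has not yet started running.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Foreground:
    """Foreground attention-processor that hands long jobs to background workers."""

    def __init__(self):
        self.db = {}                      # DB-C, common to all processes
        self.lock = threading.Lock()
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.jobs = {}

    def submit(self, name, work):
        """Transfer a job to the background and return to the dialog at once."""
        def run():
            result = work()
            with self.lock:
                self.db[name] = result    # results are stored in DB-C
        self.jobs[name] = self.pool.submit(run)
        return f"{name} started in background"

    def status(self, name):
        job = self.jobs[name]
        if job.cancelled():
            return "killed"
        return "done" if job.done() else "running"

    def kill(self, name):
        # Limitation of this sketch: only jobs still waiting can be cancelled.
        return self.jobs[name].cancel()


fg = Foreground()
print(fg.submit("sum", lambda: sum(range(1000))))
fg.jobs["sum"].result()                   # wait here only to make the demo repeatable
print(fg.status("sum"), fg.db["sum"])
fg.pool.shutdown()
```

A real system would need a way to stop a running process as well; the sketch does not attempt this.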

Preferably, the attention processor allows all possible input devices to be used and accepts lexical tokens from them; these tokens are further processed by the syntax processor. The context control is left to the background processor.

Given that widely available hardware/software techniques are used, this dialog control may provide a good trade-off considering the flexibility of interactions, the ease of implementation and the ease with which the Operator can comprehend the system.

The consistency of overlapping models used for different aspects of the H-C interaction may need some explanation. The dialog processor, as a conceptual unit of the hardware configuration, is COMM-C (= I-device + O-device), which specifies the format of I/O sentences and the processors associated with them (possibly in some high-level language). In this model, the control (polling) function used for the I-device is the supervision of background processes which may be specified by DB-C + PROC-C. If more than one process may run in parallel, a sophisticated CTRL-C may allow them to cooperate.

Figure 4.1

4.3 Parallel processes in Human and Computer

Currently, we do not know how the Human performs processes in parallel but we can assume a well defined kind of synchronisation for the Computer (we will not define it more precisely here). The Human knows the form taken by this synchronisation and is able to use all the commands available that control processes - start new process, kill process, enquire status, wait for process, etc.

The Computer may ask difficult questions and the Human may not answer immediately. These are processed differently by the Human but, as we are assuming symmetry, the same synchronisation must apply to both Human and Computer.

5 INPUT OF COMPLEX AND STRUCTURED OBJECTS

This is not as simple as pushing a button. Logically, it consists of two parts: the input of structure information and the input of leaf-data at the end of a tree, bush or forest (if the same automatic scanning mechanism is used the structure information may already be in the Computer).

Considering the structure of input data, this may be transformed in the Computer ready for further processing. The input structure is satisfactory for output and, in this respect, as well as many others, input and output are symmetrical. The difference is in the Human and Computer's ability to understand the structure and thus use it in reasoning. Unfortunately this is necessary for checking or accepting the data. The Human is, perhaps, worse than the Computer in expressing things as he uses a relatively strict sequential method of sentence structure.

If you are telling the editor why your paper is late, er .. you may do it in an ill-formatted and redundant natural language conversation. However, you cannot say the height is 3600 feet and the height is an attribute of the hill in a better way than saying exactly that.

Input may consist of two passes - first the structure information followed by the leaf-data - or the two may be intermixed. If most inputs have the same structure it is best to input the structure information first and only use references to this when inputting the leaf-data.
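
The two-pass scheme, structure first and leaf-data by reference, can be sketched as follows. The function names, the structure name "hill" and the sample values are hypothetical.

```python
STRUCTURES = {}


def define_structure(name, fields):
    """First pass: input the structure information once."""
    STRUCTURES[name] = tuple(fields)


def input_leaves(structure, *values):
    """Second pass: input leaf-data with only a reference to the structure."""
    fields = STRUCTURES[structure]
    if len(values) != len(fields):
        raise ValueError(f"{structure} expects {len(fields)} values")
    return dict(zip(fields, values))


define_structure("hill", ["name", "height_ft"])
hills = [
    input_leaves("hill", "first hill", 3600),
    input_leaves("hill", "second hill", 2950),
]
print(hills)
```

Because the structure is entered once, each further object costs the Human only its leaf values, and a mismatch between the two can be detected at input time.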

The structure of Human input should be simple, not to help the Computer in understanding it, but to ensure that the Human understands what he is doing and to minimise errors. Large data bases may consist of several separately-maintained files and the Computer may automatically recognise interdependencies on the basis of some structure description.

Leaf-data may be provided by either digital or analog devices (keyboards, valuators, or on-line measuring devices). Structures may be defined using formal specification methods (as in data structures). Structure references may be input using function keyboards, menus or text strings.

A major difference between input and output is that while output is produced by some Computer algorithm and is, therefore, error free, the Human input is error-prone and, therefore, requires echoing and local editing. It may not be quite that simple. If the Computer does not have an algorithm for the output, the Human may help to compile the correct output and this is the same as the case where the Computer helps to find the next data value to be input.

6 CHARACTERISTICS OF THE INTERACTION PROCESS PER SE

The quality of an interactive system is subjective - I do not like this or that. This chapter is similar!

6.1 The Dialog is not Intelligent Enough

A natural limitation on the intelligence of the dialog is the intelligence of the Computer. However, the dialog may be less intelligent than the Computer (some intelligent people are poor at conversation). It is possible for the Computer to understand natural language but it may not be possible to implement it cost effectively.

There are two separate problems to solve:

The sentence "I have seen a girl with long blonde hair 170 cm tall" is easily understood. It is possible to allow a free order to the phrases in the sentence, and the Computer may even add words that are missing (for example, "tall").
The sentence "Yesterday ..... blonde having hair as long as you" is not as simple to understand. The Computer needs to know a wider context: it must recognise that the subject is girls, that the topic is hair, who "you" is and what your height is. Sentences of this type would not be understood by contemporary CAD systems.

6.2 The Control of the Dialog

If the nature of the subject dictates then let the Computer ask questions or vice versa. However, in a proper conversation, partners respond to each other rather than simply ask questions. How much must the two partners adapt to each other? Sometimes two people cannot adapt at all:

How do I react if my partner does not respond because he is involved in some other processing?
How often does this occur?
May I change the context and return to the current one later?
Do I accept if the Computer wishes to change the context?

In a good dialog, partners trust each other. For example, if one sees the other dreaming, he may repeat the request. Also you may answer a question that is posed as a result of a question posed by you.

6.3 Understanding Context

Context is defined with respect to the I-device which interprets the input rather than expecting the response to depend on the whole information available to the total system. To interpret natural languages, ever-widening circles of context are required. Automata theory suggests that a state table should be defined, but it may be that the Computer loses control of the complete system.

A possibility is that we define only a few states which reflect natural stages in the dialog.
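
A dialog with only a few states can be written as a small transition table. The states and tokens below are hypothetical and chosen only to illustrate the idea of natural stages in a dialog.

```python
# (current state, input token) -> next state; anything else leaves the state unchanged.
TRANSITIONS = {
    ("IDLE", "select"): "EDITING",
    ("EDITING", "value"): "EDITING",
    ("EDITING", "done"): "CONFIRM",
    ("CONFIRM", "yes"): "IDLE",
    ("CONFIRM", "no"): "EDITING",
}


def step(state, token):
    """Interpret one token in the current context (state)."""
    return TRANSITIONS.get((state, token), state)


def run(tokens, state="IDLE"):
    for token in tokens:
        state = step(state, token)
    return state


print(run(["select", "value", "done", "yes"]))
print(run(["select", "done", "no"]))
```

With three states the whole table fits on a few lines and remains comprehensible to the Human, whereas a state table covering every context of a natural-language dialog would not.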

6.4 Professional Competence

A program should not present an object as a mere rectangle or box if what it defines is, for example, a biharmonic exemplifier unit. Users, accepting iconography in graphics, may not be able to implement solutions which include current practice. Both implementors and users need to compromise.

6.5 Professional Completeness

The Computer should understand all possible solutions in a specific context. This is more important than understanding all possible logical solutions or mathematically equivalent ones.

6.6 The Human's Response Time

Man's response time is made up of three components: thinking, sentence forming and action. The action is as fast as the Computer's input devices will allow. The sentence is made up of words and time increases rapidly with the number of words (the magic number 7 is a kind of psychological barrier after which man tends to get confused).

Barkochba is a good game but single word answering gets boring after a while. Sentences with 2 to 4 words are best intermixed with a few single word ones and an occasional long one. The required thinking time might follow a similar distribution.

6.7 Matching input device and parameter

Analogue input should be used when the nature of the parameter is such that a high accuracy digital input is not required.

If something has to be moved, then move the object in preference to supplying the new position.

Inputting a high precision drawing directly might be worse than drawing a sketch first and then specifying accurate coordinates.

6.8 Ergonomical considerations

Factors that need to be considered include: chairs and tables, lighting, wallpaper, display screen phosphors, handles within easy reach, the arrangement of indicators, position of the drawing area etc.

7 SUMMARY

H and C
C = COMM-C + OTHERS
COMM-C = I-Device + O-Device
OTHERS = CTRL-C + DB-C + PROC-C  ?
Responding = perception -> processing -> action ?
LET Responding = I-Dev-action ->
                 Central action ->
                 O-Dev-action.
I-Device and O-Device described in a programming language.
And OTHERS  ? 
Knowledge, reasoning, adaptability. 
I-Device is like an i-device.
Foreground conversation with background task performance. 
Concurrent processing ?

8 BIBLIOGRAPHY

PASK-77 Pask, G.: Aspects of Machine Intelligence, Graphical Conversation Theory MIT Press, 1977

MARKUSZ-78 Markusz, Zs.: Theoretical Problems of Intelligent Interactive Systems (Manuscript in Hungarian)

GAINES-78 Gaines, B.R.: Man-computer communication - what next? Int. J. Man-Machine Studies (1978) 10, 225-232

GRENOBLE-78 Artificial Intelligence and Pattern Recognition in Computer Aided Design Proceedings of IFIP WG 5.2 Working Conference

SANDEWALL-78 Sandewall, E: A Survey of Artificial Intelligence Invited paper at GRENOBLE-78.

IFIP-77 Proceedings of IFIP Congress 77 North-Holland (1977)

ILP-78 Hagen, T., ten Hagen, P.J.W., Klint, P., Noot, H.: The Intermediate Language for Pictures, in IFIP-77

NEGROPONTE-78 Negroponte, N: On being creative with CAD, in IFIP-77

BUTLIN-76 Butlin, G.A.: Techniques for processing interactions in FORTRAN. Notes prepared for the Seillac-I Workshop

DISTAR-73 Forgacs, T., Krammer, G.: A dialogue generator (in Hungarian) MTA SZTAKI TANULMANYOK, 1973

BOS-77 van den Bos, J.: Definition and use of higher-level graphics input-tools, in IFIP-77

KUTSCHKE-78 Kutschke, K.: Beispiel einer interaktiven Programmentwicklung Rechentechnik-Datenverarbeitung 1978 Beiheft 4, pp. 25-28

CHANG-73 Chang, C.C., Keisler, H.J.: Model Theory, North Holland (1973, 1977)

OTHERS This bibliography is far from being complete and/or representative.