Chilton::ACD::DCS Projects: Newcastle University

DCS Projects: Newcastle University

DR P E LAUER

DESIGN AND ANALYSIS OF HIGHLY PARALLEL DISTRIBUTED SYSTEMS

January 1979 - December 1981

Many of the problems facing the designer of systems may be regarded as being, in essence, purely synchronic, in the sense that what is required is that the design establishes certain desired relationships between occurrences of certain events. An example of such a synchronic problem is the reader/writer problem, in which one is not concerned with what is being written or read but only with ensuring, as a consequence of a design specification, that occurrences of the (uninterpreted) events read and write satisfy certain synchronic restrictions. Other such problems are those involving the phenomena of deadlock and starvation. In general, these problems involve 1) isolating from a specification or description those aspects which have to do with determining behaviour considered merely as an ordered set of occurrences of uninterpreted events, 2) understanding the relationships between specification and that which is being specified, and 3) using this understanding to verify and/or modify one's specification vis a vis some given design desideratum. A formal study of such problems clearly requires some means of formally representing that part of a specification which has to do with the establishment of desired inter-relationships between occurrences of events, a way of formally representing behaviour as an ordered set of event occurrences and the possibility of formally relating the two for the purpose of analysis and verification.

Peter Lauer speaking at DCS Conference, University of Sussex, 1984

Full image ⇗
© UKRI Science and Technology Facilities Council

The COSY (COncurrent SYstem) notation is a formalism which may be used to describe systems in their synchronic aspects. A system from the COSY point of view consists of a number of sequential processes and a number of distinct resources. Each process is described by a process expression, essentially a regular expression which prescribes the order in which the process may execute an operation named in it. With each resource is associated a set of path expressions [3], also regular expressions, which express constraints on the order of execution of operations associated with the resource. Thus, the activity of the system will consist of the processes activating their particular operations according to their process description and in such a way as to obey, taken together, the constraints on execution order enshrined in the resource descriptions.

Furthermore, a number of ways have been developed to associate with each COSY program a collection of objects which formally model possible behaviours permitted by the program. This means that the notation and its interpretation provide a framework within which may be carried out a formal study of the specification of concurrent systems and of that which they specify, in abstraction from those aspects which are not essential to a consideration of synchronic structure - aspects such as implementation details. This is perhaps analogous to the difference between verifying that an algorithm does evaluate some function (a problem in pure mathematics) and verifying that some specific program in some specific language implements the algorithm (a problem in informatics). This should not be interpreted as meaning that the Newcastle group is not ultimately concerned with implementation - indeed implementation is a problem to which they intend to address themselves - but that it regards the two considerations as capable of being usefully studied independently.

It is, of course, of little use to know merely that an object, say a COSY program, may be regarded as an abstract specification because it defines objects which may be interpreted as describing behaviour. A system is supposed to do something in particular. We require to understand what particular specifications specify. Given the scope of concern of the group to be that of abstract specification, it follows that it has a commitment to the development of formal results for expressing this understanding. But not only that; the work of the group should not be merely the development of abstract mathematics for its own sake. If it is to be of use in the solving of real problems, the notation should be rich enough to express conventional structuring methods (abstraction. hierarchy. modularity) and should be appealing to use as a tool for specification and analysis [17][18].

Thus the group is interested in structure in a number of ways.

Structuring a program to improve ease of writing it, to represent such things as abstraction, hierarchy, modularity and so on; the research effort in this area has led to the development of a macro notation [7][9][17][18] to support concise definitions of regular structures in programs.
Structure of objects in the behaviour domain - as pertinent to important synchronic properties such as deadlock and starvation - and how this is determined by the structure of the specification objects. The formal theory of path expressions and adequacy is a development in this direction.

There is a fruitful interaction between these two initiatives. For example, (b) may influence (a) in suggesting certain structures as useful in helping the designer to see what his specification does; (a) influences (b) in suggesting directions in which to direct the analysis of the relationship between particular structures of a specification and useful properties of their corresponding behaviours.

Both the design notation and the theory have been used to analyse and compare some existing operating system components for parallel and distributed systems with regard to their adequacy, non-starvation, degree of concurrency and degree of distribution.

Furthermore, the design methodology has been used to develop new and more concurrent and distributed systems to perform the same function as more conventional operating systems [1] [4][9][10] [17][18][19]. The intention is to continue these endeavours and especially to explore mechanisms for directly implementing systems specified in the design notation without loss of concurrency or distribution. By the end of the project. it is hoped to have:

A fairly high-level design tool based on the COSY notation and of a high degree of flexibility and reliability.
An analytic apparatus capable of dealing with objects produced by this design tool.
Some notion of the ways in which a specification written using this methodology might be implemented. In particular it is expected to have generated a collection of non-trivial example systems capable of implementation.

Progress to September 1979

At that time the project consisted of Peter Lauer, Mike Shields and a Ph.D. student, J. Y. Cotronis. They have had a number of visitors from abroad who have spent between three weeks to two months with the project. These visitors have made a number of contributions to the development of the notation and results obtained during the project in the past and recent visits by P.R. Torrigiani (GMD Bonn) and R. E. Devillers (Universite Libre, Bruxelles) were particularly beneficial.

In the first nine months of the project since January 1979, the investigators have been occupied with the following tasks:

They have completely revised the final report of the previous project during the period 1976-78 and it has appeared as a 130 page technical report. It is an exhaustive record of the basic COSY notation and the formal results concerning it obtained during that period. It replaces all Newcastle departmental technical memos from ASM/0 - ASM/45. The report has been extensively circulated and in particular it formed the supporting text for a series of four lectures by Peter Lauer at the EEC/CREST Advanced Course on Abstract Software Specification, Denmark, January 22 February 2, 1979. This will form a basic text for new members of the project and visitors to the project [12].
They have written a 50 page paper [17] which has appeared in Acta Informatica this year and which is an Introduction to the COSY Notation developed in the period 1976-78. It contains over 50 example system descriptions.
They have written a 50 page paper which presents in a gentle and didactic manner both the notation and the formal results of the Final Report mentioned above [12].
They have begun a very careful reconsideration of the design notation and this has already led to several useful extensions of the notation. This work is documented in ASM/55 - ASM/63.
They have developed a number of entirely novel, highly concurrent and distributed Banker's algorithms which do much to convince them that the notation developed in the project really does encourage novel system design more than any other notational suggestion they are aware of at present. These Banker's algorithms are documented in ASM/55 - ASM/57 and in [19].
They have substantially strengthened their formal analytical tools and obtained a number of new results enhancing their ability to demonstrate the presence or absence of important global system properties like partial and total system deadlock [[14].

It has been mentioned that COSY programs may be interpreted as abstract system specifications in the sense that each such program determines a collection of objects which model behaviour. Behavioural properties, such as deadlock, could then be defined in terms of these behavioural objects, and rigorously studied. A large part of the work of the project during its initial period was concerned with the study of such a property, the property of adequacy, which corresponds to an absence of partial system deadlock. Several new results were obtained and applied to demonstrate the proof of adequacy of a number of programs written in the notation [8].

The development of these techniques involved treating programs in the notation as grammars generating string languages, whose elements, called firing sequences, could be thought of as sequentialised forms of possible histories of systems specified by these grammars. The problem then could be expressed as one of generating techniques of deducing properties of a string language from the textual structure of the program generating it.

This linguistic approach was used to devise and verify a grammar generating a subset of the notation which had the property that every program generated by this grammar would be adequate.

Technical details may be found in the final report of the previous period of the project.

The ideas underlying both the grammatical approach to programs and the syntactic generation of correct systems have been further developed and extended in the period since the end of the project's initial period. Take the grammar approach first.

Although a sequential representation of possible histories of a concurrent system is sufficient for certain purposes, such as the study of adequacy or freedom from deadlock, it fails to completely model behaviours; information about concurrency is lost. There is a difference between concurrency and arbitrary interleaving. Witness the existence of traffic lights which arbitrarily interleave streams of traffic as opposed to allowing them to attempt to cross the area at the centre of a junction concurrently. Indeed, traffic lights may be thought of as implementations of a solution of a synchronisation problem involving fair access to a shared resource by streams of processes from two classes, where two processes from the same class might be allowed concurrent access to the resource. (Compare with the various subspecies of the reader-writer problem.) It is hard to see how a synchronisation problem such as the above - and indeed any synchronisation problem whose point is that concurrency causes problems could be even stated formally using merely a sequential model of behaviour. To take another example, in [13], it was demonstrated that programs using only Agerwala's extended semaphore primitives [15] could be given a formal semantics in terms of COSY, it had to be shown that an ESP program and its corresponding COSY program gave rise to the same set of concurrent behaviours. A formal means of representing non-sequentiality was therefore necessary.

The model of COSY programs as grammars was therefore modified and they are now interpreted as defining languages whose objects are not strings but vectors whose coordinates are strings, the vector firing sequences. These may be shown to have the same modelling power as more conventional models of concurrent behaviour, such as occurrence graphs [16] but have the advantage that they may be manipulated in the same manner as strings. This has certain technical advantages also. Vector firing sequences were used in [13] and later in [14][18][20].

[14] contains results greatly extending those behind the syntactic generation of adequate programs. The central idea here is that of a compound substitution. Essentially, a compound substitution is a textual modification which involves the replacing of an operation at various points of a program by regular expressions. The operation may be replaced by two different expressions at two different points in the program. They are interested in those substitutions which preserve essential properties, such as adequacy. There are two advantages to be gained from an understanding of such matters: firstly. such results may be used to reduce large specifications to smaller ones having the same adequacy properties - leading to a simplification of verification by, say, simulation - and secondly, an understanding of adequacy preserving transformations of this kind allow one to structure one's specification in a manner that guarantees that it will possess certain properties. This illustrates the fruitfulness of the interaction between notational and theoretical development. The analysis of a specification has the effect of implicitly decomposing it into parts with specific properties. One might well conceive of a catalogue of such parts, together with rules determining how they might be composed together into systems - the reverse of the aforementioned decomposition. Using this catalogue, one may build only adequate systems. It is heartening that notational forms ideal for expressing such composition have already been defined - the class-like objects of [3][9][17][18].

Progress to September 1980

The earlier work on the Banker's problem [10] has led to the consideration of new additions to the formalism. Specifically it is now possible to introduce predicates on integers which may then be used to constrain replication of regular expression schemata. This work is discussed in [12][19][20].

Their interests in the formal aspect of the project have broadened.

In the past, they concentrated on the study of a property called adequacy; a system is adequate if it is free from partial deadlock. Work in this area has continued and further results have been obtained about compound substitutions. A compound substitution is a rather special kind of textual transformation.

P ⇒ S(P) ; P, a path expression.

Such a transformation induces a transformation of the behaviours specified by P.

VFS(P) ⇒ VFS(S(P)).

Here VFS(P) is a set of vectors whose coordinates are strings; such vectors may be used to model discrete asynchronous behaviour.

The interesting cases are those in which the nature of the transformation S allows one to construct a transformation of string vectors

VFS(P) ⇒ T(VFS(P))
with
T(VFS(P)) = VFS(S(P))

From such an equation, much may be deduced about the set VFS(S(P)), given a knowledge of VFS(P) and T. For example, for the right sort of substitutions S one may deduce the adequacy of S(P) from that of P.

As well as continuing their researches into the adequacy property, they have also looked at problems of the following type: does this program do what I want it to do? Examples of such problems follow.

In [13] they gave rules for translating any program E consisting of collections of deterministic cyclic processes synchronised only by Agerwala's extended semaphore primitives (ESP programs) into a COSY program, COSY(E). They ask the question: does the behaviour of COSY(E) accurately model that of E?
In [23] they gave a rule for constructing, for a given train set layout represented as an undirected graph G, and a set of m trains running asynchronously on it, a path program P(G,m). They ask the question: does P(G,m) accurately model the system?
In [20][21] they constructed a class of programs RRM(N), which for any integer N, describes a highly concurrent resource release mechanism. They ask the question: does the program correctly release resources and with the required degree of parallelism?

In all these problems, they have found useful a method of analysis based on the use of interpretive functions and predicates defined on vector firing sequences and operations.

They feel that a general, formal framework for such verification would be a useful part of the development of our formal understanding of the notation.

In addition to the above, work is currently going on to extend the notion of vector firing sequences to vectors whose coordinates may have infinite length. Such a model would be of use in formalising a notion of starvation: current formulations rely on infinite strings. It seems at present that sets of infinite strings have properties which make them unsuitable for describing infinite concurrent behaviour. Work completed so far suggests that sets of infinite vector firing sequences have useful structural properties - in certain circumstances, they are complete partial orders, for example - which would make them interesting objects in a study of asynchronous languages, of which languages defined by path expressions seem to be the regular ones.

In this period the group have also addressed themselves to the problem of the fixed priority symbol >, first introduced by Roy Campbell in his early work on path expressions. Roughly, the interpretation of something like path a > b end appearing in the COSY program was supposed to be that executions of a and b are mutually exclusive and that whenever both a and b are enabled, then a would always be chosen to execute. Following conversations with Dr Lauer, Dr Gerard Roucairol, of the Institue de Programmation, Paris VI. spent some time analysing the consequences of introducing fixed priority into path expressions and has obtained interesting results to do with the preservation of dynamic properties such as freedom from partial or total system deadlock. The analysis was, however, on an earlier, interleaving model of behaviour, the model using firing sequences. An attempt to apply the firing sequence definition of the modification of b behaviour induced by the introduction of > to vector firing sequences, which are based on a non-interleaving notion of concurrency, produces contradictions. the following example illustrates the problem:

path c;a end path a>b end

Here, one might argue that both b and c are enabled initially and since a may not execute until c has, the priority constraint does not prevent b from executing. On the other hand, once c has executed, both a and b are enabled and the priority constraint prevents b from executing. The trouble with this argument is that it implicitly assumes some order in the executions of c and b, which is nonsense, as the two operations are concurrent. This implicit assumption is not easy to see in the above informal description, but shows up very clearly when the description is couched formally in terms of vector firing sequences. This is an example of a situation in which the interleaving model fails to adequately handle concurrency.

Work by Dr Shields, in collaboration with Dr Roucairol, has gone on to devise a semantics for > by defining a translation which transforms a COSY program containing the construct into one not containing it. They are presently engaged in preparing their material for a technical report.

Another area in which the interleaving model seems to break down is that which concerns infinite concurrent behaviours. One might use the notion of an infinite behaviour to formally define concepts involving the idea of something or other never occurring, concepts such as starvation. Here the problem is that an infinite string may not correspond to a behaviour which is maximal and that using an interleaving model one may deduce, from the existence of an infinite string in which some operation does not occur, that there is a behaviour in which this operation never executes, when in fact this infinite string does not represent maximal behaviour and is indeed the beginning of a maximal history in which the operation does occur. The usual way of dealing with this problem is to assume some kind of finite delay property. It appears to the group, however, that it is more natural to use a model in which such assumptions emerge as properties of the model. Dr Shields has been looking into infinite vector firing sequences and has benefited greatly from conversations with Professor Maurice Nivat, from the University of Paris, VII, whose own work on rational relations is strikingly close to the vector firing sequence approach mentioned above.

Work is also under way on the study of a class of path expressions with a freechoice property and some initial results have been obtained [25].

Finally, in the analysis of certain specific examples [20][23], the group has begun to interest itself in the verification of what they call specific properties of COSY programs, properties, that is, which are to do with the program describing some specific system such as a parallel resource releasing mechanism or a train set. This work has led the group to consider the idea of interpreted COSY programs, in which the operations are equipped with interpretations, say as functions on some domain.

The train set is discussed in [23]. The parallel resource release mechanism, which is the kernel of the hyperfast banker described in [[19] was formally analysed in [20].

Progress to September 1981

The banker's problem has led to the consideration of new additions to the formalism. Specifically, it is now possible to introduce predicates on integers which may then be used to constrain replication of regular expression schemata. A paper, using this new notation to specify a highly parallel and distributed non-computational solution to this problem, written by Dr Lauer in collaboration with R Devillers and P R Torrigiani, two frequent collaborators with the project, was delivered at the 4th International Symposium on Programming [19]. These predicate replicators have also been discussed in [12] [20] [21] [18]. The ordinary replicators [17] have been redefined and have been implemented as macros in this new form. The redefinition of these replicators has led to greater power for expressing symmetry where it exists, and makes explicit the close relationship between distributors and replicators [24]. The syntax of the macro notation has now been defined by John Cotronis in such a way that:

any program which is correct under the production rules for the macro notation will after its expansion be a correct program in the basic notation.
the macro syntax rules are an extension of the syntax rules of basic COSY so that the relation between basic and macro COSY are more easily understood.
the macro elements (replicators and distributors) produced from it generate a large class of substructures of basic COSY. Some restrictions imposed on what replicators can generate where necessary to aid the understanding of the semantics (behaviour) of the generated substructures. Distributors were extended to be able to generate more sequences more economically than replicators.

The rules for expanding the macro elements are given by precise mathematical formulae based on iteration.

Dr Lauer has begun to concern himself with the problem of formalising the notation of distribution of control (decision capability) in highly parallel systems. There appears to be at present no formal definitions of this notation in the literature and in [25][26] Dr Lauer develops a number of possible formulations of this notation with the COSY formalism using the notion of free-choice paths developed by Roucairol and Shields.

A careful comparison of COSY with Prof Hoar's models of Communicating Sequential Processes (CSP's) has commenced. Initial thoughts on this topic have been formulated [27]. In particular it is pointed out that notationally COSY and CSP have the same power for expressing synchronisation of parallel processes, but that the semantics of the two notations is radically different. COSY uses a non-interleaving semantics of concurrency and CPS's are explained via an interleaving semantics. The paper points out some fundamentally important differences between these two approaches. The investigators believe they can give a vector firing sequence semantics of CSP's which may allow them to transfer their results concerning deadlock and starvation properties to CPS's [28].

The project has now entered its third phase. In this phase the main emphasis will be on the development of a prototype programming environment which will implement the programming, analysis and vindication tools developed during the project. The investigators will attempt to do this in a manner which will reduce the rules require for writing and analysing COSY systems to a minimum for ease of learning COSY, and which will use the computer to advantage to remove from the designer the tedium of detail (and largely mechanical) but necessary development of this understanding of systems specified in COSY.

To place the COSY effort within the context of software engineering environments, the investigators are concerned with extending the syntax-checking facilities of conventional programming environments to support the interactive design, analysis and implementation of potentially high concurrent and distributed systems.

John Cotronis has mainly concentrated on the development of an interactive program which aids the development of correct COSY programs. The Program, written in SIMULA, provides the user with commands invoking certain actions. The main actions that the program can perform are:

Accept a COSY program in a new form, i.e. no lower case letters or underlining either from a file or the terminal.
List, print or save in a file the given program.
Check if the syntax and the semantics of the grammar are obeyed giving when necessary appropriate error messages. The parsing technique is by recursive descent. The parsing tree is also constructed.
From the parsing tree a structured layout of the COSY program can be produced which tries to fill as much as possible lines of a given width. If however, a whole syntactic unit cannot fit on the same line it is broken into several units by indenting when necessary. This improves the readability of COSY programs, especially when they contain long paths or processes.
Either the original or the structured program can be pretty printed containing lower case letters (for simple operation) or underlining of reserved words.
The program can store several COSY programs, so that the user can develop different programs or several versions of the same program in the same run.

Up to now the program checks correctness of basic COSY programs. It will be extended to check correctness of the macro COSY programs.

Brian Hamshere, who joined the project in February 1981, has been taking some initial steps towards and implementation of the COSY language. To this end the development of a syntax analyser has begun which works with a representation of the source text which makes it randomly accessible rather than restricting one to the usual inflexible serial access enforced by purely linear character strings. The idea of implementing COSY by expanding COSY operations into a target language which is already implemented has begun, in contrast. for example. to Roy Campbell's approach of adding (open) paths to a serial language as done in Path-Pascal.

Staff

The project consists at present of Dr P. E. Lauer, J. Y. Cotronis, and Mr B. Hamshere. Mr Hamshere joined the project on 13 February 1981.

Visits

During September 1979, Dr Lauer visited the Institute de Programmation Paris VI, to collaborate with Professor C. Girault. Dr G. Roucairol and Dr G. Berthelot, who have been working on describing and analysing structural properties of Petri Nets. During March 1980, Dr Shields visited the Laboratoire Informatique Theoretique et Programmation Paris at the invitation of Professor Nivat. Dr Shields also visited Poitiers and IMAG, Grenoble.

References

1. R. E. Devillers, P. E. Lauer, A General Mechanism for avoiding Starvation with Distributed Control, Information Processing Letters, Vol 7, No 3, April 1978.

2. P. E. Lauer, E. Best, M. W. Shields, On the Problem of Achieving Adequacy of Concurrent Programs, Proceedings of IFIP Working Conference on Formal Description of Programming Concepts, Aug 1977, North Holland.

3. P. E. Lauer, R. H. Campbell, Formal Semantics for a Class of High-Level Primitives for Coordinating Concurrent Processes, Acta Informatica, 5, 1975, pp297-332.

4. P. E. Lauer, M. W. Shields, Abstract Specification of Resource Accessing Disciplines: Adequacy, Starvation, Priority and Interrupts, Tech Report 117, University of Newcastle upon Tyne, 1977. SIGPLAN Notices, Vol 12, No 13, 1978.

5. P. E. Lauer, M. W. Shields, Design and Analysis of Highly Parallel and Distributed Systems, ASM/46, University of Newcastle upon Tyne, April 1978.

6. P. E. Lauer, M. W. Shields, E. Best, On the Design and Certification of Asynchronous Systems of Processes: Final Report: Part 1: COSY - a System Specification Language based on Paths and Processes, ASM/49, June 1978. 'Part II: Formal Theory of the Basic COSY Notation'. ASM/45, March 1978, University of Newcastle upon Tyne,

7. P. E. Lauer, P. E. Torrigiani, Towards a System Specification Language based on Paths and Processes, Tech Report 120, University of Newcastle upon Tyne, 1978.

8. M. W. Shields, P. E. Lauer, On the Abstract Specification and Formal Analysis of Synchronisation Properties of Concurrent Systems, Proceedings of the International Conference on Mathematical Studies of Information Processing, Aug 1978, RIMS Kyoto, Japan, Lecture Notes in Computer Science, No 75, 1979, Springer Verlag.

9. P. E. Torrigiani, P. E. Lauer, An Object Oriented Notation for Path Expressions, AICA (Associazione Italiana per il Calcolo Automatico), 1977, Annual Congress 3rd Vol, Software Methodologies, pp349-371, Oct 1977.

10. R. E. Devillers, P. E. Lauer, P .R.Torrigiani, A Distributed Banker, Technical Report Series, Universite Libre de Bruxelles, Belgium, 1979.

11. P. E. Lauer, M. W. Shields, E. Best, Formal Theory of the Basic COSY Notation, Technical Report 143, Computing Laboratory, University of Newcastle upon Tyne, November 1979.

12. P. E. Lauer, M. W. Shields, E. Best, The Design and Analysis of Highly Parallel and Distributed Systems, Advanced Course on Abstract Software Specifications, Lyngby, Denmark, Lecture Notes in Computer Science No 86, Springer Verlag, 1979. Also as Tech. Report 142, Computing Laboratory, University of Newcastle upon Tyne, 1979.

13. M. W. Shields, P. E. Lauer, A Formal Semantics for Concurrent Systems, Proc 6th International Colloquium, on Automata, Languages and Programming, Graz, July 16-20, 1979, Lecture Notes in Computer Science No 71, 1979, pp 569-584. Also Tech Report 132, Computing Laboratory, University of Newcastle upon Tyne, 1979.

14. M. W. Shields, Adequate Path-Expressions, Proc. International Symposium, Semantics of Concurrent Computation, July 2-4, 1979, Evian-Ies Bains, France, Lecture Notes in Computer Science No 70, 1979, Springer Verlag. Also Tech. Report 141, Computing Laboratory, University of Newcastle upon Tyne, 1979.

15. T. Agerwala, Some Extended Semaphore Primitives, Acta Informatica Vol8, pp 201-220, 1977.

16. A. W. Holt, Information System Theory Project, Tech Report RADC-TR-68-305, Applied Data Research Inc., 1968.

17. P. E. Lauer, P. R. Torrigiani, M. W. Shields, COSY: a system specification language based on paths and processes, Acta Informatica, Vol 12, pp 109-158, 1979.

18. P. E. Lauer, M. W. Shields, COSY: An Environment for Development and Analysis of Concurrent and Distributed Systems, Proc. of Symp. on Software Engineering Environments, Lahnstein, June 1980, North Holland . Also as Tech. Report 153, Computing Laboratory, University of Newcastle upon Tyne, 1980.

19. P. E. Lauer, P. R. Torrigiani, R. Devillers, A COSY Banker: Specification of Highly Parallel and Distributed Resource Management, Proc. 4th Int. Symp. on Programming, Paris, 22-24 April, 1980. Lecture Notes in Computer Science No 83, Ed B.Robinet, Springer Verlag, 1980. Also as Tech. Report 151. Computing Laboratory, University of Newcastle upon Tyne, 1980.

20. M. W. Shields, P. E. Lauer, Verifying Concurrent System Specifications in COSY, Proc 8th Symp. on Mathematical Foundations of Computer Science, Rydzyna - Zamek, Poland, 1980. Lecture Notes in Computer Science, Springer Verlag (No 88, 1980). Also as Technical Report 152, Computing Laboratory, University of Newcastle upon Tyne, 1980.

21. M. W. Shields, P. E. Lauer, Programming and Verifying Concurrent Systems in COSY, Proc. 8th Spring School of Theoretical Computer Science, 'Petri-nets and Parallelism', Colleville-sur-mer, France, May 1980.

22. M. W. Shields, A Note on Free-Choice Paths, ASM/71, University of Newcastle upon Tyne, April 1980.

23. M. W. Shields, COSY Train Journies, ASM/67, University of Newcastle upon Tyne, November 1979.

24. J. Y. Cotronis and P. E. Lauer, A syntax for the Macro Notation, ASM/83, University of Newcastle upon Tyne, February 1981.

25. P. E. Lauer, Trip Report on Software Engineering Environments Symposium, held 16-20 June 1980, at Lahnstein, Federal Republic of Germany, ASM/75, University of Newcastle upon Tyne, June 1980.

26. P. E. Lauer, Project on Design and Analysis of Highly Parallel Distributed Systems, Interim Progress Report, December 1980, ASM/79, University of Newcastle upon Tyne, December 1980.

27. P. E. Lauer, Synchronisation of Concurrent Processes without Globality Assumptions, ACM SIGPLAN Notices, Volume 16 , Issue 9 (September 1981) and also Technical Report 163, Computing Laboratory, University of Newcastle upon Tyne, March 1981, to appear in SIGPLAN Notices.

DR P E LAUER

A COMPUTER BASED ENVIRONMENT FOR THE DESIGN AND ANALYSIS OF HIGHLY PARALLEL AND DISTRIBUTED OPERATING SYSTEMS

July 1982 - June 1984

Background. General Goals and Achievements

The advent of LSI and VLSI technology and microprogramming techniques has greatly increased the number of choices a digital system designer has for decomposing a system into ultimate subsystems which grow considerably more powerful and complex in function as technology progresses. Furthermore, designers have been learning how to combine such subsystems in new ways giving rise to systems which perform the same general functions as earlier systems but with a much greater degree of parallelism and distribution.

Concurrent systems are more difficult to specify and analyse than sequential ones, because they require the conceptualisation not only of their sequential subsystems. but also of the complex interactions between them. It follows that the programmer's intuition is not enough. being unreliable in cases of high complexity. Here solution of the problem of verification of correct behaviour of the design becomes crucial. and a satisfactory conceptual apparatus for rigorous verification is essential.

If a combination of subsystems is to cooperate coherently to perform a particular system function, synchronisation is necessary to ensure proper joint behaviours of subsystems with respect to any parts of the system they share. If the resulting system is distributed and capable of parallel behaviours in subsystems, then the required synchronisation must be specified without recourse to either a central clock, central control or global state. Many of the conventional tools the system designer has at his disposal, such as Automata Theory and Formal Language Theory, are only suitable for adequately expressing sequential systems and they do this under the assumption of a global system state. Hence, designers have had to support such formalisms with additional formal and informal notations when applying them to the specification and analysis of concurrent and distributed systems.

The project is developing and investigating a formalism (COSY from concurrent system) extending the conventional notations of automata theory and formal language theory. This extension permits one to formally express and analyse synchronisation of concurrent and distributed systems.

The conventional notions which COSY extends have been traditionally used as a basis for automating the specification and analysis of sequential and centralised systems. The extended notions of COSY are equally suited for automating the same for the concurrent and decentralised case. The project has developed a computer based environment for the COSY formalism called BCS for basic COSY system.

Specific Achievements and Results for the Year

The Basic COSY System has been refined and further developed throughout the year [6]. Incorporation of a central Command Interpreter Class has made it easier to define extensions to the command repertoire, and ensured that the command language syntax is the same for all the different dialogues that the system allows. The Environment Class provides a means of version control, by organising systematic names for the files in which descriptions of a concurrent system are kept.

The internal representation of a Basic COSY program has been revised to make possible a more direct implementation of the COSY semantics. The functions of the concurrent simulator have been modularised and extended; there are two styles of translation selection, two styles of transition echo (immediate trace of firing), and four styles of replaying the history of a system. Extending any of these choices is now a fairly straightforward programming task.

Some effort has been devoted to improving the portability and clarity of the program. An ICL Perq was delivered to the project in November 1982; unfortunately it has not proved practicable to install the Basic COSY software on the Perq, nor will it be until SIMULA is available on that machine. But the use of the Perq as a sophisticated terminal to the IBM/370 has brought more benefits than might have been expected - the production of lucid and well laid out coding has been made much more efficient by the facility to run a screen editor displaying 48 lines of program text (compared with 20 on a typical VDU), and by use of a mouse to speed the selection of text; paper consumption has been considerably reduced.

Dr Lauer has continued to make extensive use of BCS and has just completed a detailed and self-contained users' introduction to the BCS environment [2]. The document consists of three main parts; the first is the users' guide to the design environment; the second explains the syntax and formal semantics of the COSY specification notation; and the third contains a detailed analysis of a series of examples developed by means of the COSY environment. The latter serves as a vehicle for indicating to the user how he might develop this own type of analysis within the environment.

BCS has also been used both by the third year honours students in connection with their formal semantics of programming languages course, and by CSSD students and PhD students in connection with their lectures on concurrency and Petri Nets.

During the past year Dr Lauer has continued to work on some of the broader issues involved in producing self-documenting programming environments which has led him to suggest the notion of computer system dossier as a means of supporting such environments. Dr Lauer was invited to lecture at a conference on Synchronisation. Control and Communication in Distributed Computing Systems organised by the Polytechnic of Central London and INRIA. His contribution will appear as part of a book to be published by Academic Press [1].

During the period, Dr Lauer has also concerned himself with a number of more theoretical aspects of concurrency in general and the COSY formalism in particular. This activity was greatly enhanced by collaboration with visitors to the project. in particular Professor R Janicki of the Institute of Mathematics, Warsaw Technical University; and Professor R Devillers of the Universite Libre de Bruxelles, Belgium. The three individuals have separately and jointly authored a number of papers on the maximally concurrent evolution of non-sequential systems and they will be reporting on their work at the Fourth European Workshop on Applications and Theory of Petri Nets, Toulouse, France, September 1983 [4].

Finally. Dr Lauer, Professor Janicki and Dr J R Just, the latter at the Institute of Electronics Fundamentals, Warsaw Technical University. have been collaborating on the use of Petri Nets in the description of hardware configurations. They are reporting on their work in a joint paper, a shorter version of which will appear in the Proceedings of the International Conference on Microprocessors and Microprogramming, Budapest, Hungary, 1983 [3].

A visiting graduate student, Detlef Hillen has been concerned with the design and verification of concurrent systems. He considered rules which change the structure of such systems but preserve some dynamic properties such as adequacy and periodicity. He introduced five classes of such rules and investigated the applicability of these rules [5].

References

Items identified with an ASM number are project memoranda available from Dr Lauer.

1. P. E. Lauer, Computer System Dossiers, in Synchronisation, Control and Communication in Distributed Computing Systems, Academic Press .

2. P. E. Lauer, User Introduction to BCS, A Computer Based Environment for Specifying, Analysing and Verifying Concurrent Systems, ASM/107, February 1983.

3. P. E. Lauer, J. R. Just and R. Janicki, On the Description of Simple Microprocessor Configurations by Means of Petri Nets, International Conference on Microprocessors and Microprogramming Budapest, Hungary, 1983.

4. P. E. Lauer, R. Janicki and R. Devillers, Maximal Concurrent Evolution of Non-Sequential Systems, Proceeding of Fourth International Workshop on Applications and Theory of Petri Nets, Toulouse, France, 1983.

5. D. Hillen, Adequacy-Preserving Substitution and Reduction Rules, ASM/108. June 1983.

6. P. E. Lauer, B. C. Hemshere, Project on a Computer Based Environment for the Design and Analysis of Highly Parallel and Distributed Computing Systems: Interim Progress Report, ASM/112, June 1983.

DR I MITRANI

MODELLING AND PERFORMANCE EVALUATION OF DISTRIBUTED COMPUTING SYSTEMS

April 1979 - March 1981

Summary

The project addressed two problem areas in the performance evaluation and optimisation of distributed computing systems. One area has to do with the extent to which the performance of a distributed system can be influenced by the scheduling algorithms used (ie the decisions concerning the distribution of load, allocation of processing capacity, allocation of priorities to different job types. etc). The first problem studied in this connection concerned the trade-offs between centralised and distributed computing: for a star-like computer network with one central and several local sites, the optimal way of partitioning the available processing capacity, and the streams of demands, among the different sites was considered. Both congestion and communication costs were taken into account. Next, the scheduling of jobs belonging to different classes on a multi-processor system was examined. One can give better treatment to a particular job class at the expense of others, to varying degrees, by appropriately allocating the processors. A family of scheduling strategies, including the pre-emptive priority disciplines as extremes, was defined. analysed and evaluated numerically. As a by-product of that study, a solution method for a rather general class of two-dimensional Markov processes was developed.

Isi Mitrani speaking at DCS Conference, University of Sussex, 1984

Full image ⇗
© UKRI Science and Technology Facilities Council

The second problem area concerns the performance of multi-processor systems in the presence of breakdowns. One large processor is more efficient than many small ones, but a breakdown in a single-processor system is more catastrophic than in a multi-processor one. The dependence of the optimal number of processors on the parameters of demand, breakdown and repair was investigated in some detail.

Trade-offs between centralised and distributed computing

A central (large) computing family is connected to a number of geographically dispersed (small) computing sites. Each site is faced with an optimisation problem: what proportion of the jobs submitted locally should be executed locally and what proportion should be sent to the central facility? lf too many are executed locally, the congestion at the local site may lead to unacceptable increases in response time; if too many are sent to the central facility then there might be. (a) too much to pay in communication costs, and (b) (if all other sites did the same), congestion at the central facility. Another optimisation problem arises when planning the network. Each local site has a sum of money which it may either invest in hardware for itself, or contribute to the purchase of the central facility, or split in some proportion between the two. Assuming that the speed of a processor is a certain function of its price, what is the best money and load allocation among sites?

These trade-offs were studied in [5], using as an optimisation criterion a cost function that took into account congestion (as measured by the average response time) and communication costs. The optimisation can be either selfish (each site tries to do the best for itself. no matter what happens to the others), or joint (best for the whole community). The result depends, of course, on the parameters, but more often than not the best money allocation is extreme (either invest the whole sum locally or invest it all centrally). while the best load allocation is not extreme (process some of the load locally and some centrally).

This work was the result of a collaboration with Prof K. C. Sevcik. of the University of Toronto. It was presented at the First International Conference on Distributed Processing (Huntsville. 1979) and has subsequently been included in a tutorial book on Distributed Processing Systems.

Scheduling strategies for multi-processor systems

Consider a system of n parallel processors. where the demand consists of two job types, with different characteristics. It is usually desirable to provide different quality of service to the two job classes; better for one at the expense of the other. One way of doing this is to designate k of the n processors as type l and the other n-k processors as type 2. Type 1 processors normally execute class 1 jobs but, if there are less than k class 1 jobs in the system, they may be borrowed by class 2 jobs, and vice versa. A family of scheduling strategies is thus defined. ranging from pre-emptive priority class 1 jobs (k=n) to preemptive priority class 2 jobs (k=0).

The models that were used to analyse the performance of these scheduling strategies fall into the general category of two-dimensional Markov systems; that is, systems whose state is described by a pair of integers (in our case, the numbers of class 1 and class 2 jobs in the system). A solution method for a wide class of such systems was developed in [1] and applied to the cases k=1, 2 ... n-1. The pre-emptive priority disciplines (k=0 and k=n) are not covered by that method; since they are interesting in their own right, a different analysis was worked out for them [4].

The analytical solution methods were implemented and were compared with a brute force numerical approach based on solving efficiently very large sets of very sparse linear equations[2]. The comparisons gave an indication of the parameter region for which the analytical solution is more efficient and that for which the direct approach is more efficient (broadly speaking. the analysis wins for small and medium values of n, while force is better for large n). During the empirical investigations it was noticed that a conservation law holds, or nearly holds, for multi-processor systems: for the two classes it states that (W is the average response time for class i jobs and p is a constant depending on the load; i= 1,2)

p₁W₁ + p₂W₂ = const

regardless of the scheduling strategy. In other words, any attempt to reduce the average response time for class 1 jobs by preferential scheduling will necessarily result in a proportional increase of the class 2 average response time and vice versa. A similar conservation law had been known to hold for single processor systems.

The study reported in [1] benefited considerably from the participation in it of Dr G. Fayolle. of INRIA. France Most of the theoretical foundations of the analysis were laid during his visit to Newcastle in 1979. The numerical solution techniques were the topic of Dr P. J. B. King's Ph.D. thesis (1979); the package of programs that he implemented as part of that thesis has been useful to the investigators on several occasions since. Both papers [1] and [2] were presented at the international 'Performance 80' conference (Toronto, 1980).

One more point should be perhaps emphasised, namely that a pre-emptive priority scheduling discipline can be used to model a system with breakdowns: a breakdown can be treated as another type of job which has pre-emptive priority over the normal ones and whose execution represents the repair of the processor. This is relevant to the next problem considered.

Multi-processor systems subject to breakdowns

One of the basic questions that arise in connection with the installation of a multi-processor system is how many and what kind of processors should it contain?. One large processor? A few medium sized ones? Many small ones? Assuming that the total processing capacity of the system is fixed (limited by a financial constraint, perhaps), the classic answer to that question is that a single-processor is best. However, that answer is on the grounds of efficiency only, and ignores the possibility of breakdowns. A breakdown does more damage in a single-processor system than in a multi-processor one, since in the former case all processing stops, while in the latter service may continue, albeit at reduced rates. These conflicting tendencies imply that there is an optimal number (and hence size) of processors which depends on the characteristics of demand, breakdown and repair.

Breakdowns were modelled in two ways: in the first, each processor breaks down from time to time, independently of the others. Repair is started immediately and its duration is also independent of everything else (a possible generalisation of this model is to admit the possibility of waiting for a repairman to become available). In the second model, the system as a whole is subjected to a stream of breakdowns, which are treated as high pre-emptive, priority jobs. Then, even when all the processors are broken down, new breakdowns may arrive and are queued.

Both models display similar qualitative behaviour, although different quantitative ones. If the average response time is plotted against the number of processors, keeping the other parameters constant, then in general the curve has a decreasing portion, a minimum and an increasing tail (it should be remembered that the total processing capacity is constant, so that the more processors implies slower processors). The optimal number of processors may be arbitrarily large in heavily loaded systems (ie. when the breakdown rate is such that the operative processors are occupied nearly all the time), but under normal loading conditions it is usually quite small.

The paper describing the results of this study [3], will be included in the proceedings of 'Performance 81'.

Staff

Dr P. J. B. King was employed as an RA on this project.

References

1. G. Fayolle, P. J. B. King and I. Mitrani, The solution of certain two-dimensional Markov models, Performance Evaluation Review, pp 283-289, February 1980.

2. P. J. B. King and I. Mitrani, Numerical Methods for Infinite Markov Processes, Performance Evaluation Review, pp 277-282, 9 February 1980.

3. P. J. B. King and I. Mitrani, The Effect of Breakdowns on the Performance of Multi-processor Systems, Proc of Performance 81 Conference.

4. I. Mitrani and P. J. B. King, Multi-processor Systems with Pre-emptive Priorities, Performance Evaluation Journal, 1 February 1981.

5. I. Mitrani and K. C. Sevcik, Evaluating the Trade-offs between Centralised and Distributed Computing, Proc 1st International Conference on Distributed Computing, Huntsville, 1979.

DR I MITRANI

MODELLING AND PERFORMANCE EVALUATION OF DISTRIBUTED COMPUTING SYSTEMS

April 1981 - March 1983

The complexity of distributed computing systems makes the task of assessing and predicting their performance a non-trivial one. An efficient way of approaching that task is by the construction and analysis of mathematical models. The nature of these models, as that of the systems they represent, is probabilistic.

This project set out to examine two aspects of distributed computing systems. One concerns the performance of local area networks - bus networks and ring networks in particular. The other problem has to do with the benefits that can be gained from multiprocessing: to what extent can the execution of a program be speeded up by using many processors instead of one. Approximately two thirds of the project effort (the entire first year, and a few months of the second year) were devoted to the network investigation. Some of this was carried out in collaboration with colleagues from the University of Paris-Sud. namely Professor E Gelenbe and Dr Brigitte Plateau. Two papers have been published [1][2], on Ethernet and the Cambridge Ring, respectively; a third paper - on a token ring network - will be prepared when certain simulation and numerical results are obtained.

A study of program execution by many processors has benefitted from the collaboration with Dr G Fayolle, of INRIA. A paper describing the model and some theoretical results has been prepared [3].

References

1. E. Gelenbe and I. Mitrani, Control Policies in CSMA Local Area Networks: Ethernet Controls, in Performance Evaluation Review, Vol 11, No 4, pp233-240. 1982.

2. P. J. B. King and I. Mitrani. Modelling the Cambridge Ring, In Performance Evaluation Review, Vol 11. No 4. pp250-258, 1982.

3. P. J. B. King. G. Fayolle and I. Mitrani. On the Execution of Programs by many Processors, Proceedings of the 9th International Symposium on Computer Performance Modelling, Measurement and Evaluation, 1983.

4. P. J. B. King, I. Mitrani and B. Plateau, Modelling a Token Ring Network, Research Report 12. ISEM. University of Paris-Sud, 1983.

PROF B RANDELL

A PROJECT TO INVESTIGATE THE DESIGN OF HIGHLY CONCURRENT GENERAL PURPOSE COMPUTING SYSTEMS

July 1978 - September 1980

The project on the design of highly concurrent computing systems is investigating computer and program organisations for a tightly-coupled. multi-instruction-multi-data stream (MIMD) computing system which is aimed at supporting highly concurrent general purpose computation. Given such a computing system, its power may be exploited by multiprogramming (ie by running a number of independent programs concurrently) or by utilising the natural structure of a highly parallel program.

The project is in fact a fairly direct continuation of Dr Treleaven's early work on data flow computing, both at Manchester University [1] and subsequently at Newcastle [2][3]. However, investigations have not been restricted solely to organisations based on data flow computation. Initial work was based on a generalised concept of control flow [4] and the current belief is that various methods of evaluation, including data flow, control flow and demand driven schemes, can naturally co-exist in a single concurrent computer.

Progress to September 1979

A simple classification developed by the project for models of computation illustrates the inter-relationships of the various schemes. Two mechanisms are common to all models of computation, firstly the data mechanism which defines the way a particular argument is used by a number of instructions, and secondly the control mechanism which defines how one instruction causes the execution of other instructions. There are two types of data mechanism: by value where data is passed directly between instructions, and by reference where data is passed via a shared memory cell addressable by the source and destination instructions. Also there are two types of control mechanism: by availability where controls signal that input data is available for use and by need where controls signal the request for output data.

The following table shows the appropriate data and control mechanisms for various computation models:

Model of Computation	Data Mechanism		Control Mechanism		Mnemonic
Model of Computation	By Value	By Reference	By Availability	By Need	Mnemonic
Data Flow	Yes	-	Yes	-	Y/A
Control Flow	-	Yes	Yes	-	R/A
Data-Control Flow	Yes	Yes	Yes	-	VR/A
Reduction	Yes	-	-	Yes	Y/N
Combined Model	Yes	Yes	Yes	Yes	YR/AN

From this classification it can be seen that data flow schemas are by value/by availability, control flow schemas are by reference/by availability and demand driven reduction machines are by value/by need.

In reaching this classification a number of computer organisations have been investigated. Initially a slotted ring architecture based on a generalised control flow (GCF) model - class R/ A was studied. The slotted ring is a circular conveyor belt used for the communication of information between resources (processors, memories) that each have access to one slot on the ring.

To investigate the ring-based computer two simulators were constructed, one to study the logical behaviour of the computer [4], and the other to execute the computer's machine code programs. In addition a single assignment language called VORPAL [5] was designed and implemented. The conclusions from this exercise are that the ring-based computer provides a suitable basis for a MIMD architecture, but that the GCF model of computation, like data flow, is inadequate for general purpose program representation.

To overcome these inadequacies the project has designed a computer that integrates the concepts of data flow, control flow and updateable memory [6]. (This computer is in class VR/A.) The design can be used either as a pure data flow computer or as a control flow computer, or, more interestingly, as some combination of the two. The group is in the final stages of implementing a simple version of this computer using microprocessors and producing documentation [7] that will allow the machine to be quickly built by any other group interested in a research vehicle for concurrent computing.

The conclusions from this experiment are that the by value and by reference data mechanisms may co-exist, but that the by availability control mechanism should be enhanced with the by need mechanism. This leads to machines in class VR/ AN.

So as to fully understand the concepts of by need driven computing (ie class V /N), the group have gone through the exercise of designing a reduction machine [8]. In the next few months research students will implement this machine in the same way as the data flow-control machine has been implemented.

In summary, having studied the various models of computation and their implications in terms of computer architecture, they believe that the two data mechanisms and two control mechanisms naturally go together and that a suitable concurrent architecture for their support will be a ring-based computer. With this thesis the group is pursuing a computer design in class VR/AN which it is hoped will satisfy the aims of the project and allow an application to be made to the Science Research Council for a modest grant to build a prototype.

Progress to September 1980

The synthesised model of computation and the computer architecture to support it, are currently being developed. Newcastle expect that the model will form the basis for a number of novel parallel computer designs. One such parallel architecture based on a single-chip building block is discussed below.

In the parallel computer each of the chips, referred to as a computing element, is a computer in its own right containing a capability for memory (M), processing (P) and communications (C), as shown below.

Parallel Computer

Two levels of architecture exist in such a machine, namely that of the parallel computer and that of an individual computing element. The parallel computer is a linear sequence of computing elements, with each element connected to its left and right neighbours. Two adjacent computing elements have their memories connected together and their communications connected. Thus the memories and communications network of the parallel computer may be viewed as two bi-directional shift registers. The roles of the memories, processors and communications units are as follows: (i) the memories of the computer hold the program structure being processed, (ii) the processors search their respective memories for work to perform and also respond to accesses for information in these memories, and (iii) the communication units handle memory accesses to non-adjacent computing elements.

The program or information structure of the computer is binary strings of 0 and 1, delimited by left and right bracket symbols ( and ). For example, a vector of digits 0 to 9 is represented as

( (0) (1) (10) (11) (101) ... (1001) )

and an arithmetic expression is represented as:

( (5) (4) (+) ...)

The program structure is thus hierarchical consisting of possibly nested strings. When information is inserted into a part of the program structure in the local memory of a computing element and the storage capacity is exceeded, the outermost information will migrate to adjacent elements.

Addressing within the program structure is all done relative to the context or point of the reference (as in the telephone network). An address is a series of selectors which move the reference from context to context in the delimited strings. Selectors are based on a four character, context changing, alphabet: left brother (<-), right brother (->), superior (up-arrow), and inferior (down-arrow). The latter selector being a one-to-many mapping must be followed either by a <- character which is defined as selecting the leftmost son or by a -> which is defined as selecting the rightmost son.

When the selector of an address chooses a context contained within the current memory, the processor makes the appropriate access. However if the access is in some other computing element the address - the sequence of selectors - is passed to the communications and passed as a message to an adjacent computing element, which has the task of recognising whether the delimited string is within its memory or whether the message should be passed on. Thus a computing element must keep track of which part of the program structure it is holding in its memory.

The processor in a computing element, logically, obtains work by searching the part of the program structure in its memory. In fact the processor only needs to inspect the parts of the structure that are changed. Therefore it examines structures that immigrate from adjacent elements and also structures that are changed by memory accesses which it performs or at least supervises. It is the decentralised flow of messages in this parallel computer that controls execution.

Currently the group are starting the construction of two simulators to support the above work. The first, a software simulator, will be used to study the kernel computation model and the second, a hardware simulator using M6800 microcomputers will support the architecture investigation. The application to the SRC to extend the project to two research associates for three years from October 1980 has been granted and the second research associate appointed.

Visits

Reports on the project have been presented at three SRC-sponsored workshops on Computer Architecture in Newcastle; at the Universite de Paris and Rennes; at INMOS Limited; at one NATO and one CREST Summer School; and at some ten universities in the UK.

Staff

One full-time research assistant, Dr P.C.Treleaven is employed under this grant.

1. P. C. Treleaven, Exploitation of Parallelism in Computer Systems, Ph.D. Thesis University of Manchester (Feb. 1977).

2. P. C. Treleaven, Principle Components of a Data Flow Computer, Euromicro Symp. (Oct 1978) pp 366-374.

3. P. C. Treleaven, Exploiting Program Concurrency in Computing Systems, IEEE Computer, Vol 12, No.1 (Jan 1979) pp 42-50.

4. E. P. Farrell, N. Ghani and P. C. Treleaven, A Concurrent Computer Architecture and a Ring Based Implementation, 6th Annual Symp. on Computer Architecture (Apri11979) pp 1-11.

5. R. P. Hopkins, VORPAL - A Single Assignment Language for Programming the GCF Simulator, M.Sc. Dissertation, University of Newcastle upon Tyne (1978).

6. R. P. Hopkins, P. W. Rautenbach and P. C. Treleaven, A Data Flow Computer with Addressable Memory, Data Flow Workshop in Toulouse (Feb. 1979).

7. R. P. Hopkins, P. W. Rautenbach and P. C. Treleaven, A Computer Supporting Data Flow, Control Flow and Updateable Memory, Tech. Report No 144, Computing Laboratory, University of Newcastle upon Tyne (1979).

8. P. C. Treleaven and G. F. Mole, A Multi-processor Reduction Machine for User-defined Reduction Languages, Internal Report ARM/6, Computing Laboratory, University of Newcastle upon Tyne (1979).

9. P. C. Treleaven, Program Evaluation in Perspective, Internal Report ARM/5. Computing Laboratory, University of Newcastle upon Tyne, (presented at the Distributed Computing Workshop, Universite de Paris, Orsay, July 1979).

10. R. P. Hopkins et al. A Computer Supporting Data Flow, Control Flow and Updateable Memory, Technical Report 144, Computing Laboratory, University of Newcastle upon Tyne (September 1979).

11. D. Brownbridge, A Simulator for Concurrent Architectures, M.Sc. Dissertation, Computing Laboratory, University of Newcastle upon Tyne (1979).

12. T. L. Wat, The Implementation of a "JUMBO" Computer on Three M6800 Microcomputer Systems, M.Sc. Dissertation. Computing Laboratory, University of Newcastle upon Tyne (1979).

13. P. C. Treleaven and G. F. Mole, A Multiprocessor Reduction Machine for Userdefined Reduction Languages, Proc. Seventh Int. Symp. on Computer Architecture (May 1980).

PROFESSOR B RANDELL

THE RELIABILITY AND INTEGRITY OF DISTRIBUTED COMPUTING SYSTEMS

August 1981 - August 1984

Background and Nature of Research

The project is a continuation of a broad programme of research into the design of computing systems that can operate continuously and satisfactorily. despite suffering from faults. Distributed computing systems have some obvious advantages over conventional systems, with respect to the possibility of providing some forms of fault tolerance. However they can also suffer from reliability problems that do not occur in conventional systems - for example, one part of the system may be taking decisions based on data that another part has already found to be erroneous.

Research Goals

The overall goal is to investigate general techniques for designing distributed computing systems which can cope effectively with situations in which it is belatedly detected that erroneous data has been spreading from computer to computer within the system. Its short term goals are mainly focussed on the incorporation of the various fault tolerance techniques the project has developed to date into a distributed system based on the UNIX operating system. Its long term goal is to gain a deeper understanding of fault tolerance and fault avoidance applied to the design of both software and hardware.

Achievements

The project has developed, and documented in a number of books and papers, general principles and practical techniques for achieving fault tolerance. Most recently it has developed a software subsystem, called the Newcastle Connection, which can be added to each of a set of physically connected UNIX or UNIX-lookalike systems so as to construct a distributed system which can be programmed like an ordinary UNIX system, thus hiding the problems normally associated with using distributed systems. The first implementation of this UNIX United system was constructed at Newcastle on a set of PDP-11 computers connected by a Cambridge Ring. The Connection software is now being distributed to other organisations. and has already been used to construct UNIX United systems based on PDP11s, Motorola 68000s, VAX/750s, and ICL PERQs, variously using Cambridge Rings or Ethernets. Prototype extensions and additions to the system, for incorporating triple modular redundancy, stable storage, multi-level security and discless workstations have been completed, and various others are in progress. This work in fact constitutes a first series of explorations of a very powerful method of system implementation that has been developed. which is characterised by the use of recursive structuring.

Work in Hand

The currently available version of the Newcastle Connection is being extended so as to permit the construction of large scale UNIX United systems based on multiple local and/or wide area networks. Work is continuing on the incorporation of various fault tolerance schemes. More long term investigations are concentrated on general naming issues, and on the constructing of forward as well as backward error recovery in distributed systems.

Applicability

Many companies are actively interested in using and/or marketing systems incorporating the Newcastle Connection. The exploitation activity is being coordinated by the Newcastle-based Microelectronics Applications Research Institute (MARI), who are also responsible for distribution of the software at a nominal fee, to organisations eligible to obtain educational licences. To date MARI have negotiated commercial licences, in some cases for quite large numbers of systems, with various organisations including CERN (Switzerland), DEC (USA), ICL (UK), Research Triangle (USA), PCS (W.Germany), Portable Software Inc. (USA), SG2 (France) and Siemens (W.Germany), as well as a considerable number of educational licences. The University and MARI are also involved with several companies in major applied research and development projects that take advantage of the Newcastle Connection, and the University's work on fault tolerance and system security. These include a pilot Information Exchange System for the EEC ESPRIT Project, and a Distributed Secure System for the Ministry of Defence.

References

1. T. Anderson and P. A. Lee, Fault Tolerance: Principles and Practice, Prentice-Hall 1981.

12. D. R. Brownridge, L. F. Marshall and B. Randell, The Newcastle Connection - or UNIXes of the World Unite, in Software Practice and Experience, Vol. 12. No 12, December 1982.

3. J. M. Rushby and B. Randell, A Distributed Secure System, IEEE Computer. Vol. 16. No.7, pp. 55-67, July 1983.

4. B.Randell, Recursively Structured Distributed Computing Systems, in Proceedings of the Third Symposium on Reliability in Distributed Software and Database Systems, IEEE, Florida, October 1983.