The project arose originally out of the conjecture that a general-purpose computer system could be configured cost-effectively from a collection of interconnected minicomputers. It was taken as axiomatic that application areas exist in which most tasks can be processed within the resources of a single minicomputer, with a few tasks requiring greater power (faster speed) than a single processor can offer. A previous research contract investigated the feasibility of linking dissimilar minicomputers together while providing a uniform command-language interface. It also demonstrated that a twin-processor system in which the processors shared part of the memory could, for particular problem-solving algorithms, achieve execution speeds close to twice those of a single processor. The present research extends that investigation to demonstrate that it is feasible to link together cheaply both more machines and more sophisticated machines, without affecting their capability to act as uniprocessor systems, while still yielding a high percentage of the possible speed-up for the greater number of processors. Fundamental assumptions were that the work should be based on commercially available hardware; that a minimum of changes should be made to the operating system and utilities; and that special software should be avoided wherever possible.
The main goal was to establish a working multiprocessor system with support for parallel program development, and to make it available to the DCS community as soon as possible. Subsidiary goals were: to investigate alternative methods of managing the resources shared between the processors (specifically the shared memory); to repeat, on the new system, the demonstrations of reliable code given on the earlier system; and to provide algorithms for various tasks on the multiprocessor system, to ensure that the promise of such a system could in fact be realised in practice.
Despite problems in obtaining the hardware and suitable research staff, a working system was made available to the DCS community within nine months of the acceptance of the hardware. A seminar was held at Loughborough, and several external users have subsequently used the system to develop and test algorithms. A large number of algorithms have been written in the Department to exploit the parallelism of the system. Many of these have already been published as research papers in journals or presented at conferences, and they provide a convincing demonstration of the power and flexibility of the multiprocessor system. A new method of coordinating access to shared resources is also a worthwhile product of the research.
The work on different algorithms to exploit parallelism is being continued by several research students as well as research assistants. Improved algorithms for managing the shared memory and for multiprogramming multiprocessor and uniprocessor tasks are being developed, and an integrated management scheme for the shared disc is under investigation.
The system demonstrates the viability of assembling and programming multiprocessor systems for general-purpose use. Texas Instruments are the only firm to whom the results are directly useful, and they are showing some interest.
A Guide to Using the NEPTUNE Parallel Processing System, Internal Publication.
Many other publications. A list can be obtained from the investigators.
Parallel computers, that is, computers with more than one processing unit, are the only practical answer to the enormous demands for computing power made by extremely large problems or by tasks with severe real-time response constraints. A further impetus to the viability of parallel computing comes from noting that large-scale mass production of computing units could offer a more cost-effective, more flexible and potentially more reliable provision of computing resources through the coupling together of a number of basic units.
However, the provision of computing power through a number of such units does not automatically mean that all the units can be used effectively: the workload has to be shared out evenly among them. This applies particularly when more than one computing unit is applied to the same task, which is the usual interpretation of parallelism.
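The requirement that work be shared out evenly can be illustrated by a minimal sketch (in Python, purely for illustration; it is not code from the system described here) of dividing the iterations of a task into nearly equal contiguous chunks, one per processing unit. The function name and unit counts are hypothetical.

```python
def partition(n_items, n_units):
    """Split n_items of work into n_units nearly equal contiguous chunks."""
    base, extra = divmod(n_items, n_units)
    chunks = []
    start = 0
    for unit in range(n_units):
        # the first `extra` units take one item more, spreading the remainder
        size = base + (1 if unit < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

# Example: 10 items shared over 4 units -> chunk sizes 3, 3, 2, 2
sizes = [len(c) for c in partition(10, 4)]
```

With such a static scheme the largest and smallest chunks differ by at most one item, so no unit sits idle for long if the items cost roughly the same to process.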
Furthermore, there are overheads associated with loading the program into the various units, with communicating results between the units and, finally, with synchronising the states of subtasks on different units.
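The synchronisation overhead mentioned above can be sketched with a barrier: no subtask may proceed past the synchronisation point until every subtask has reached it. The sketch below uses Python threads as stand-ins for processing units; the unit count and the work done per unit are hypothetical, not taken from the system described in this report.

```python
import threading

N_UNITS = 4
barrier = threading.Barrier(N_UNITS)
results = [0] * N_UNITS

def subtask(unit):
    results[unit] = unit * unit   # each unit computes its partial result
    barrier.wait()                # block here until all units have finished
    # past the barrier, every entry of results[] is guaranteed to be filled

threads = [threading.Thread(target=subtask, args=(u,)) for u in range(N_UNITS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The time the fastest subtask spends waiting at the barrier for the slowest is pure overhead, which is one reason the even workload sharing discussed above matters.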
The existence of several accessible parallel computers has oriented the research towards both practical and theoretical aims, which include the following:
The work on sparse representation referred to above can be embedded into a well-known iterative method for the solution of linear systems (the Conjugate Gradient method) and into a method for eigenvalue problems (the Lanczos algorithm).
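The operation that such a sparse representation serves in both methods is the matrix-vector product, which dominates each Conjugate Gradient or Lanczos iteration. The sketch below is illustrative only: the (row, column, value) triplet storage is an assumption for the example, not the representation used in the work reported here.

```python
def sparse_matvec(triplets, x):
    """Compute y = A @ x, with A stored as (row, col, value) triplets;
    only the nonzero entries of A are stored or visited."""
    y = [0.0] * len(x)
    for i, j, a in triplets:
        y[i] += a * x[j]
    return y

# 2x2 example: A = [[4, 1], [1, 3]], x = [1, 2]  ->  y = [6, 7]
A = [(0, 0, 4.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 3.0)]
y = sparse_matvec(A, [1.0, 2.0])
```

Because each nonzero contributes independently to one entry of y, the triplets can be shared out among processing units, which is what makes this kernel attractive on a parallel system.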
1. R. H. Barlow, D. J. Evans and J. Shanehchi, Comparative Study of the Exploitation of Different Levels of Parallelism on Different Parallel Architectures, Proceedings of the 1982 International Conference on Parallel Processing (IEEE Computer Society Press), pp 34-40.
2. R. H. Barlow, D. J. Evans and J. Shanehchi, Sparse Matrix Vector Multiplication on the ICL-DAP, Internal Report No. 161.
3. R. H. Barlow, D. J. Evans and J. Shanehchi, Performance Analysis of Algorithms on Asynchronous Parallel Processors, Computer Physics Communications, Vol. 26, pp 233-236 (1982).
4. R. H. Barlow and D. J. Evans, Analysis of the Performance of a Dual Minicomputer Parallel Computer System, Proceedings Eurocomp 1978, Online Conferences, Uxbridge, pp 259-276.