Time Sharing Aspects of the Stretch Computer

J A Nash

July 1962

Proceedings of a Symposium held at the London School of Economics, Editor Peter Wegner

John Nash, IBM Hursley, was assigned to AWRE Aldermaston during the period 1962-63 and with Alick Glennie developed the S1 and S2 Fortran Systems on Stretch.

Introduction

Stretch has only one Central Processing Unit (CPU), and only one Instruction Counter (IC). In the strict sense of the words, it cannot therefore execute more than one program at once. Machines exist on which this can be done. Since there is only one Instruction Counter, a program must capture control of it in order to get anything done. Possession of the IC may be surrendered spontaneously by some other program, or it may be necessary to snatch control away forcibly by means of an interrupt, which is a mechanism for breaking into a program in order to perform some other urgent task. Timesharing on Stretch is therefore some sort of battle between two or more routines, not necessarily related, for possession of the Instruction Counter. A brief description of the hardware is necessary in order to appreciate the programming aspects.

Hardware

1. General Layout (Fig. 1)

The Memory Bus Control Unit (MBCU) is the communication centre of the whole system. The CPU, Core Storage, and the two input-output Exchanges are all hung on the MBCU. Each of these major units is autonomous and proceeds independently with any task it is required to perform. Even sub-units within core storage and the CPU operate asynchronously. The Basic Exchange services autonomous input-output channels serving magnetic tape units, card readers, card punches, line printers, and operating consoles. The High-Speed Exchange serves autonomous Disk channels. STRETCH is thus essentially an asynchronous system. A great deal of local overlapping is built into the hardware, so that, in a sense, many circuits are time-shared automatically, without any effort on the part of the programmer.

Fig 1

2. Core Storage (Fig. 2)

Each 16K storage box is autonomous, and has a cycle-time of 2.18µs. Words have 64 data bits, and eight check bits, which allow automatic correction of single bit errors, and detection of double bit errors. The MBCU can address any box even though other boxes may be executing access cycles due to previous requests. The effective information-rate is therefore much faster than one word every 2.18µs, and in practice storage access time is not a limitation on the speed of the system.

Fig 2

3. Central Processing Unit (Fig. 3)

The instruction processing unit (I-box) fetches instructions from storage under control of the IC, and processes them ready for execution. Instructions which can be executed entirely by the I-box, e.g. index arithmetic and some branch instructions, are then completed, provided this will not upset the logic of the program. Other instructions, which require execution by the arithmetic unit (E-box), e.g. floating-point arithmetic, are passed into one of 4 levels of Look Ahead, and a fetch request is sent to the MBCU for the data, if any. Look Ahead holds decoded instructions, together with their data, until the E-box is ready to process them, in correct logical sequence. Each of the units is autonomous, and up to 11 instructions may be in the CPU at any time. The interrupt circuits can break into a sequence of instructions when signals are received from various sources.

Fig 3

4. The Basic Exchange (Fig. 4)

All channels, and the Basic Exchange itself, are autonomous, and can operate simultaneously. On a tape channel, only one tape unit can be operating at any instant, but other units may be rewinding. Operators' Consoles are not intimately associated with the CPU, as in most computers, but are just input-output units. It is a programming function to respond appropriately to signals and messages given by the operator. I/O operations are initiated by the CPU, and on completion of the operation, or on discovery of errors, signals are sent back to the CPU to activate the interruption mechanism. Signals can also be injected manually at each channel by the operator, to cause an interrupt.

Fig 4

5. The High-Speed Exchange (Fig. 5)

Only one disk unit can be operating at any instant, but others may be in the process of locating access arms to the desired track on the disk. Each disk file has two sets of access arms, one addressing odd-numbered tracks and the other even-numbered tracks. When a Locate instruction is given by the CPU, the appropriate arms move to the specified track, and the other arms move to the next sequential track. When operating, one set of arms is transmitting information while the other is automatically moving to the next track, to provide continuous operation. Programming for disk operations is rather similar to programming magnetic tape units, except for the relatively quick access to any part of the disk area. For this reason we loosely refer to the disk as an I/O unit, although it is really backing storage.

Fig 5

6. The Interruption System (Fig. 6)

The successful implementation of time-sharing depends largely on the interruption system, and it may be worth spending time on a fairly detailed description.

The CPU can be in one of two states - enabled or disabled. When enabled, interruptions can occur as soon as signals are received from the source of interruption. When disabled, interruptions are inhibited, and are stacked in the CPU to be released in order of priority when the system is re-enabled. The instructions for enabling or disabling are Branch Enabled (BE) and Branch Disabled (BD).

Fig 6

Within the CPU there are some special registers and circuitry associated with the interrupt system. The Indicator Register has 64 bits, each of which reflects the state of the machine in some particular respect. The main types of indicator, in order of priority, are as follows.

Type of indicator                   Mask bit
1. Machine Malfunction              1
2. Time Signals                     1
3. I/O Reject                       1
4. I/O Status                       1
5. Memory protection                1
6. Program exception conditions     m
7. Data Flagging                    m
8. Arithmetic & Index Result        0

Types 3 and 4 are of particular interest for time-sharing. The I/O Reject indicators are set on if the CPU sends an instruction to the exchange which cannot be accepted. This is usually the result of programming errors. The I/O Status indicators are set on when a channel completes an I/O operation, to indicate to the program how that operation turned out. The most important are: the Unit Check indicator, which is set on when data errors or a unit malfunction have occurred; the End-Exception indicator, which is set on when the unit runs out of material during the operation (a tape mark in the case of magnetic tape); the End-of-Operation (EOP) indicator, which is set on when the operation is completed; and the Channel Signal indicator, which can be turned on by pressing the channel signal key at the channel concerned. The identification of the channel is entered automatically in another register at the time the I/O Status indicators are set up.

Another special register is the 64-bit MASK. This register specifies whether the corresponding indicator shall be able to cause an interrupt. The bits corresponding to categories 1-5 are always masked on. Types 6 and 7 can be set by the program. Type 8 is always masked off, and can never cause an interrupt.

When an indicator is turned on, e.g. EOP by a tape channel successfully completing an operation, provided the system is enabled and the corresponding mask bit is set to one (for EOP it always will be), an interrupt occurs.

The Interrupt Address Register can be set by the program to contain the base address of a so-called interrupt table. When an interrupt occurs, the instruction currently being executed is normally completed, and the serial number of the indicator causing the interrupt is automatically added to the address contained in the Interrupt Address Register. The instruction at the resulting address is sandwiched into the program at that point. That instruction may or may not change the contents of the instruction counter.
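
The interrupt decision just described can be modelled roughly in Python. This is only an illustrative sketch: the class name, the 0-based bit numbering, the rule that the lowest serial number wins, and the resetting of an indicator once it has been serviced are assumptions made for the example, not details taken from the STRETCH hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterruptModel:
    """Toy model of the indicator/mask interrupt logic (illustrative only)."""
    table_base: int          # contents of the Interrupt Address Register
    mask: int = 0            # bit i set => indicator i is allowed to interrupt
    indicators: int = 0      # bit i set => indicator i is currently on
    enabled: bool = True     # enabled/disabled state of the CPU

    def raise_indicator(self, serial: int) -> None:
        """Turn on the indicator with the given serial number."""
        self.indicators |= 1 << serial

    def next_interrupt(self) -> Optional[int]:
        """Return the interrupt-table address to execute next, or None."""
        if not self.enabled:
            return None      # interrupts stay stacked while disabled
        pending = self.indicators & self.mask
        if pending == 0:
            return None
        # Assumption: the lowest serial number has the highest priority.
        serial = (pending & -pending).bit_length() - 1
        # Assumption: the indicator is reset once its interrupt is taken.
        self.indicators &= ~(1 << serial)
        return self.table_base + serial


# Two indicators come on while the CPU is disabled; nothing happens until
# it is re-enabled, and they are then released in priority order.
cpu = InterruptModel(table_base=0o1000, mask=0b1111, enabled=False)
cpu.raise_indicator(3)
cpu.raise_indicator(1)
while_disabled = cpu.next_interrupt()
cpu.enabled = True
first = cpu.next_interrupt()
second = cpu.next_interrupt()
```

The address returned is simply the table base plus the serial number of the indicator, which mirrors the addition performed on the Interrupt Address Register.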

The memory protection feature is also of interest in the context of timesharing. The Memory Boundary Register is another special register in the CPU, and can be set by programming to specify the storage limits within which a given program is expected to operate. Any attempt to address storage outside that area while the system is enabled, will cause one of the memory protection indicators to be turned on. A supervisory routine can then gain control and throw the offending program off the machine.
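
The boundary check can be pictured with a short Python sketch. The function name and the choice of a half-open range (lower limit inclusive, upper limit exclusive) are assumptions for illustration; the text above states only that addressing outside the limits while the system is enabled sets a memory protection indicator.

```python
def violates_boundary(address: int, lower: int, upper: int, enabled: bool) -> bool:
    """Return True if this access would turn on a memory protection indicator.

    Assumption: the protected area is the half-open range [lower, upper).
    """
    if not enabled:
        # The check described above applies only while the system is enabled.
        return False
    return not (lower <= address < upper)


# A problem program confined to addresses 4096..8191 (hypothetical limits).
inside = violates_boundary(5000, 4096, 8192, enabled=True)
outside = violates_boundary(100, 4096, 8192, enabled=True)
supervisor = violates_boundary(100, 4096, 8192, enabled=False)
```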

Software

It is quite possible to use STRETCH without any supervisory routine. However, to do so for anything but the most trivial programs involves the programmer in a massive task of housekeeping and special routines. Apart from this, between-job delays would be quite intolerable from an operational point of view. For these reasons a supervisory program is a must for a system as large and complex as STRETCH.

The supervisory program provided by IBM for use on STRETCH is called the Master Control Program, MCP, which was devised and written by IBM in consultation with prospective users. Also part of the STRETCH Programming System are three language processors - the STRETCH Assembly Program, STRAP, which is a one-for-one assembler; the STRETCH Macro Processor, SMAC, and the FORTRAN IV Compiler. The processors are under the control of MCP. The main functions of MCP are (1) automatic operator; (2) job control and system supervisor; (3) system input; (4) system output; (5) operating messages; (6) simultaneous on-line utility programs; (7) input-output for the problem program (PP); (8) error control.

As an automatic operator, MCP accepts commands from the human operator, to alter its mode of operation or to perform various functions as required, e.g. to terminate the current output tape and begin a new one. In conjunction with Job Control it automatically loads jobs in sequence and types instructions and comments for the operator. As a System Supervisor, MCP provides the necessary system programs for each job (processors, loaders, subroutines), and monitors jobs during execution, when it throws off any jobs which don't obey the rules. The System Input Program inspects the stream of information entering the system up to 20 jobs ahead, in its so-called SCAN phase, and passes operational requirements to the automatic operator for advance instructions to be typed. The number of jobs to scan ahead is actually a parameter within MCP, and can be varied to meet varying operational requirements. It also buffers the input data in the execution phase, to provide rapid core-to-core response to requests for data from user programs. The System Output Program buffers system printer and punch output on to magnetic tape, for later processing by a separate utility program. The Commentator program types messages for the operator at the request of any other part of the system or the PP. MCP allows simultaneous utility programs to be run quite separately from the rest of the system, though using MCP facilities. Routines for actuating I/O units are part of MCP and other programs can invoke their use by executing suitable calling sequences. The Error Control package handles all error conditions, both machine and program, and takes appropriate action. Above all, MCP at all times retains control of the interrupt table. When an interrupt occurs, it is MCP's job to investigate it, decide who it belongs to, and transfer control appropriately.

Most of MCP operates asynchronously, and therefore MCP itself can be regarded as a number of separate, though intercommunicating routines, all time-sharing the machine. For example, the Output Program, when asked to put out some data, gathers the data into its buffers, and if a buffer becomes full, it initiates a tape write operation, using one of the actuator routines. Once this is initiated control is returned to the requesting program, which continues its operations (it might be a problem program (PP), or some other part of MCP, e.g. the logging routine). When the tape channel signals completion of the write operation by turning on the EOP indicator, the routine then in control is interrupted and an MCP program called the Receptor takes over. The Receptor receives all interrupts from I/O units, decides to whom they belong and the address at which that routine requires control when that particular interrupt occurs. In the present case, control would be given to a specified point in the Output program, which could then carry on with further work. When it no longer has any work to do, it gives control to a Return routine, also in MCP, which returns control to the interrupted routine.
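
The flow of control in this example can be sketched in Python. The class and method names, the channel number, and the use of a counter in place of a real tape write are all hypothetical; the sketch shows only the pattern of registering an entry point, being interrupted, and returning to the interrupted routine, not how MCP is actually coded.

```python
class Receptor:
    """Toy dispatcher: routes an I/O interrupt to the routine owning the channel."""

    def __init__(self):
        self._entry_points = {}   # channel number -> routine wanting control

    def claim(self, channel, entry_point):
        """Record which routine requires control when `channel` interrupts."""
        self._entry_points[channel] = entry_point

    def on_interrupt(self, channel, interrupted):
        """Give control to the owner, then hand back the interrupted routine."""
        self._entry_points[channel](channel)   # e.g. a point in the Output Program
        return interrupted                     # the role played by the Return routine


class OutputProgram:
    """Gathers data into a buffer and starts a write when the buffer fills."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = []
        self.writes_started = 0
        self.writes_completed = 0

    def put(self, item):
        self.buffer.append(item)
        if len(self.buffer) >= self.buffer_size:
            self.buffer.clear()
            self.writes_started += 1    # stands in for initiating a tape write
        # Control returns to the caller at once; the write proceeds on its own.

    def end_of_operation(self, channel):
        self.writes_completed += 1      # invoked via the Receptor on EOP


receptor = Receptor()
output = OutputProgram(buffer_size=2)
receptor.claim(channel=5, entry_point=output.end_of_operation)

output.put("line 1")
output.put("line 2")                    # buffer full: a write is started
resumed = receptor.on_interrupt(channel=5, interrupted="problem program")
```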

Fig 7

At the PP level, time-sharing may be going on between PP I/O and computing. Simultaneous utility routines may also be time-sharing the machine for their appointed tasks.

MCP has three operating modes, called ONLINE, OFFLINE and BYPASS. In the ONLINE mode (Fig. 7), a card reader is the system input source. Cards are read by the Input Program and blocked on to a magnetic tape called the WRITE tape. During this process, control cards at the head of each job are interpreted, and advance tape-mounting instructions are typed for the operator. All I/O units required by a PP are specified symbolically by these control cards, and the actual physical units to be allocated are decided by MCP depending on the availability of units and the requirements of previous jobs in the queue. Magnetic tape reels are labelled, and MCP checks to see the right reel is mounted. At the same time, the READ tape, which was earlier a WRITE tape, is being buffered into storage by the Input Program, and Job Control is loading jobs and giving them control in turn. There is a queue of some 10 to 20 jobs in the system at any time, waiting for execution.

The OFFLINE mode is similar to ONLINE, except that the input has been written on to tape as a separate operation by an off-line utility program, e.g. on an IBM 1401 (Fig. 8).

Fig 8

In the BYPASS mode a card reader is the system input source. There is no overlapped scanning of the input, and cards are buffered directly into core storage. This mode is intended for use by priority programs, and effectively enables jobs entered in the BYPASS mode to overtake any jobs already queuing in the system in one of the overlapped modes. There is no pre-assignment of I/O units for PP's and this mode is therefore very inefficient operationally.

Now consider what time-sharing is going on in a typical situation in the ONLINE mode (Fig. 9). The Input Program is time-sharing the card-to-WRITE tape and the READ tape-to-PP operations. The Commentator is time-sharing the typing of operator messages. The Output Program is timesharing the writing of the output tape. The simultaneous print and punch routine is processing an output tape, previously written by the Output Program, as a totally independent, time-shared function. The PP is timesharing its computation with its I/O, to magnetic tapes and disk.

Fig 9

It should be noted that at its present stage of development, MCP cannot handle comprehensive multiprogramming. Only one PP, or user program, can be in the execution phase at any time. Except in the limited sense of preprocessing control cards, and processing output tapes on a simultaneous basis, there is no provision in MCP for time-sharing between users. This does not prevent programmers from dressing up a number of logically separate programs to look like one job, to time-share their activity between them. But there is no automatic mechanism to provide user time-sharing.

Perhaps three main lessons have so far been learnt through experience with MCP. In its present form it may seem to some of you a rather modest and unambitious supervisory program. Nevertheless, the complexities that can arise are such that the implementation of MCP proved a major undertaking, although rather less than was originally estimated for the project. This indicates that the sophisticated programming systems of the future will have to be very carefully planned in order to keep cost and effort down to reasonable proportions.

Again due to the variety of interrupt situations that can occur, MCP spends a non-trivial percentage of CPU time over its housekeeping activities, saving and restoring index registers through a number of buffers, table-management, and so on. This has been done to relieve the problem programmer of the burden of doing this housekeeping himself. Such generality is bound to result in needless work being done for quite a lot of the time. With the prospect of multi-level programming, these complications will be even more acute, and it will be necessary to consider very carefully how much should be done for the user, and how much left to the user's discretion.

The third lesson we have learnt concerns the incidence of unrecoverable machine errors. In a system the size of STRETCH it can be shown on statistical grounds that at any time there is a certain number of components that have failed. For example, a given small number of transistors in a STRETCH system is probably unserviceable at any particular moment. Awareness of this problem led to the inclusion of extensive error checking and correction circuits in the STRETCH system, to reduce the number of unrecoverable errors and to eliminate, as far as possible, the occurrence of undiscovered errors. When operating under MCP, an unrecoverable error normally means that the system has to be re-initialized when the error has been investigated and corrected. It would be extremely risky to do otherwise, since parts of the machine may have been contaminated as a result of the error. In an overlapped mode, the operator then has quite a headache to sort out which jobs had been run and which were still waiting on the input tapes when the error occurred. This would normally take at least five minutes, and often more, to which must be added the time lost by the last job to be run, the time to initialize the system, perhaps a minute, and the time required to re-scan the jobs that had already been scanned. It is more than likely that all the tape assignments will now be different, and operators will have a jolly few minutes unloading and reloading tape reels. The total time penalty may be as much as 15 minutes of CPU time. To keep wastage down to acceptable limits, say 5%, this requires a mean error-free period of about 4-5 hours. This is a very stiff target to meet, from an engineering point of view, although STRETCH systems are achieving it after the initial settling-down period following installation.
With more elaborate supervisors in future, it will be necessary to design them so that a minimum of time is required to sort out the debris after a machine failure, and restart the system.
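
The figure of about 4-5 hours can be checked with a little arithmetic, here written out in Python. The paper does not say whether the 5% is reckoned against the error-free period alone or against the whole cycle including the lost time, so both readings are shown; the function names are illustrative.

```python
def period_if_loss_is_share_of_useful_time(penalty_min, wastage_percent):
    """Error-free minutes needed when penalty / period equals the wastage."""
    return penalty_min * 100 / wastage_percent


def period_if_loss_is_share_of_total_time(penalty_min, wastage_percent):
    """Error-free minutes needed when penalty / (period + penalty) equals the wastage."""
    return penalty_min * 100 / wastage_percent - penalty_min


PENALTY_MIN = 15       # total time penalty per unrecoverable error, from the text
WASTAGE_PERCENT = 5    # acceptable wastage, from the text

hours_useful = period_if_loss_is_share_of_useful_time(PENALTY_MIN, WASTAGE_PERCENT) / 60
hours_total = period_if_loss_is_share_of_total_time(PENALTY_MIN, WASTAGE_PERCENT) / 60
print(hours_useful, hours_total)   # 5.0 4.75
```

Either reading gives a mean error-free period of between four and three-quarter and five hours, in line with the estimate quoted above.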

REFERENCE

1. W. BUCHHOLZ (Ed.) (1962). "Planning a Computer System". McGraw-Hill, New York.