Further reading

Overview
Atlas computer technology (David Aspinall, 2001)
Atlas Computer Family (Bob Thorley, 2015)
Memories of Atlas Fortran (Ian Pyle, 2012)
My life on Atlas 1 and 2 (Brian Chapman, 2014)
Memories of Bernard Loach

The Atlas Computer - The Technology

David Aspinall, Emeritus Professor of Computation, University of Manchester Institute of Science and Technology

May 2001

Introduction

The Atlas computer burst onto the scene in Manchester during the early 1960s. It was the result of a project which began in the Department of Electrical Engineering of the University of Manchester in 1956. The group, led by Professor Tom Kilburn, combined electronic engineers investigating the new Transistor switching circuits and Magnetic Core Storage devices with programmers who were familiar with the problems and aspirations engendered by eight years of using earlier Manchester computing machines for scientific calculations. By 1959 Ferranti Ltd had joined the project, and their collaboration with the University led to the development of the ATLAS computer. Ferranti brought not only their computer design and manufacturing expertise to the project but also their experience in business data processing. The Manchester ATLAS began to provide a computer service in 1962, and went on to provide a reliable service for both scientific and commercial users until 1971.

It embodied many pioneering features, which we now take for granted. These include system features such as Timesharing of several concurrent computing and peripheral operations, Multiprogramming, and the One-Level Store (Virtual Store). Design features included, High-speed arithmetic, Asynchronous control, interleaved stores, paging, V-Store (Image Store), Fixed store (ROM), and autonomous transfer units. These both required and enabled Software developments such as the Supervisor (Operating System), the Compiler-Compiler and High level languages.

Ferranti manufactured two further versions of ATLAS. For a while in the early 1960s it was the fastest computer anywhere, until being caught by the early Control Data computers from Minneapolis. The other contenders from the United States, the UNIVAC LARC and IBM STRETCH computers, were left behind on their blocks. It is doubtful if any of these developments would have occurred in the UK but for the ability of the electronic engineers in Manchester, during the mid 1950s, to demonstrate that the new-fangled Germanium Junction Transistors and Diodes were capable of high performance digital circuits. In this lecture I hope to give you a flavour of the situation which prevailed in the Electrical Engineering Department of Manchester University during the time when the technology to produce what became known as the ATLAS computer was being developed, and of how the Semiconductor technology and the burgeoning Magnetic Core Storage technology enabled a complete rethink of computer design, or of Computer Architecture as it is now called.

Switching Circuits

By the mid 1950s the Thermionic Vacuum Tube, or Valve, had a competitor in the form of the Junction Transistor. It led to a boom in portable Transistor Radios before being able to replace the Valve in computers. The Transistor was much smaller in size and consumed much less power than the Valve. The absence of a heater meant that the Transistor was inherently more reliable and had a longer life than did the Valve. These factors allowed for the possibility of parallel computing machines.

In the earlier Manchester machines the word was processed in bit-serial form. The Ferranti Mercury was a production version of MEG, the second computer designed and built by the University. It had a random access magnetic core store but within the processor, individual words were held in temporary storage in electromagnetic delay lines. A parallel computer would hold an n-bit word on a register of n separate flip-flops instead of a single n-bit delay line. Whereas in the serial machine n separate clock beats would be required to transfer a word into a delay line, in a parallel machine a transfer would require only one beat. In computers where word lengths of 40 bits were common, an order of magnitude speed-up seemed possible, by going parallel.

Next let us consider the duration of the beat, the time taken to perform a basic operation. In the Ferranti Mercury the clock frequency was one megahertz. In other words the beat duration was one microsecond. Early experiments with Junction Transistors suggested that the beat duration could be of the order of one hundred nanoseconds, or one tenth of a microsecond. Thus a parallel machine based on these transistor circuits could be two orders of magnitude faster than the fastest available computer.

Ferranti engineers developed the logic circuits for the production of the ATLAS. They were based on germanium semiconductors, including Diodes and the OC170 Drift transistors, manufactured by Mullard in bulk for the transistor radio market. (Semiconductor manufacturers in Europe did not think there was sufficient market in computers to justify the development of the switching transistors they required). The typical delay through such a circuit was 25 nanoseconds. Two such circuits could be connected to form a flip-flop or single bit storage cell. A transfer from one cell to another required a pulse of 100 nanoseconds duration to set the data onto the receiving cell. Thus the earlier predictions were confirmed.

Whilst a parallel machine can be justified by rapid transfer between registers, performing arithmetic presents a problem. The addition operation includes not only the generation of a carry but also its propagation, which is intrinsically a serial operation. An addition circuit for 40 bit numbers, built from circuits similar to the basic circuits used elsewhere in the machine, could require at least 40 circuits in series to process the carry signal. With each basic circuit contributing a delay of 25 nanoseconds, the overall delay would be 1000 nanoseconds, or one microsecond: a loss of at least one order of magnitude. Now the addition operation is fundamental to each instruction cycle of a computer. The program counter must operate at least once in each cycle: normally one is added to select the next instruction, but occasionally a number is added to perform a relative jump. Furthermore, a computer used in scientific work demands long word lengths, and the addition and multiplication operations are used frequently. It was clear at the outset that the addition circuit required special consideration.

The High Speed Adder

Consider the operation z := x + y performed in a serial adder. The inputs x, y are presented sequentially to a half adder to produce a sum s and a carry c, as shown in the Left Hand Truth Table:

   Left Hand Truth Table       Right Hand Truth Table
   Input      Half Adder       Input       Half Adder Output
   x   y      s   c            s   Cin     z   k
   0   0      0   0            0   0       0   0
   0   1      1   0            0   1       1   0
   1   0      1   0            1   0       1   0
   1   1      0   1            1   1       0   1

The sum s must now be combined with the carry Cin from the previous, less significant bit, as shown in the Right Hand Truth Table. The output z is the true sum, whilst k is a component of the carry out of this bit, Cout, which becomes Cin for the next, more significant bit. (In a serial computer it would be delayed and presented as Cin during the next time period.)

Cout = (c OR k)
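The pair of truth tables combine into what is now called a full adder. A minimal Python sketch (function names mine, not the original hardware notation):

```python
def half_adder(x, y):
    """Return (sum, carry) for two input bits, as in the Left Hand table."""
    return x ^ y, x & y

def full_adder(x, y, c_in):
    """Two half adders in series: s combined with Cin gives the true sum z,
    and the carry out is (c OR k), as in the Right Hand table."""
    s, c = half_adder(x, y)      # first half adder on x, y
    z, k = half_adder(s, c_in)   # second half adder on s, Cin
    return z, c | k              # Cout = (c OR k)
```

For example, full_adder(1, 1, 1) gives a sum bit of 1 and a carry out of 1, matching the bottom row of the tables with Cin = 1.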

Now consider Z := X + Y

Let the operands X and Y both be stored on parallel registers of eight bits.

As they are presented to the Addition Unit, the least significant bit produces a half sum which is always equal to the final sum in that bit position. It also produces a Carry Out signal to the next more significant stage where it arrives as Carry In to be processed with its half sum to produce a final sum bit and a Carry Out signal to its next more significant bit position. And so on to the most significant bit position, which computes the most significant bit of the sum.

In the example below two eight-bit numbers X=85 and Y=43 are added to form the sum Z=128.

The initial Half Sum formed by X and Y only is shown. The Half Sum in the least significant bit position is the final sum in that position.

The initial Carry Out formed by X and Y only is shown next. Each Carry In is computed with the Half Sum to yield the final sum in the next more significant bit position, together with a new Carry Out which propagates up the adder producing further final sum bits and carries, until it eventually combines with the Half Sum in the most significant bit position to produce the last, most significant bit of the Sum.

0 1 0 1 0 1 0 1 X 85
0 0 1 0 1 0 1 1 Y +43
0 1 1 1 1 1 1 0 Half sum
0 0 0 0 0 0 0 1 Carry-out
0 0 0 0 0 0 1 0 Carry-in
0 0 0 0 0 0 1 1 Carry-out
etc
1 1 1 1 1 1 1 0 Final carry-in
1 0 0 0 0 0 0 0 SUM=Z 128

The carry signal generated in the least significant bit position propagates through all other bit positions to form the sum in the most significant bit. The addition is not complete until this most significant sum bit is set.

Furthermore, the Carry propagates through a bit position when (x OR y) AND (NOT (x AND y)) is true.

Also Carry = 1 when (x AND y)

Carry = 0 when (NOT x AND NOT y)

Thus it is possible to consider a chain of switches, with each bit position comprising three switches to produce the Carry Out (which in turn becomes the Carry In to the next bit position).

Switch Carry Propagate connects Carry In to Carry Out and is closed by (x OR y) AND (NOT (x AND y))

Switch Carry=1 connects 1 to Carry Out and is closed by (x AND y)

Switch Carry=0 connects 0 to Carry Out and is closed by (NOT x AND NOT y).

Each switch in every bit position would be set at the same time as the parallel transfer to set up X or Y occurred.

Carry signals would pass through the closed Carry Propagate switches at the speed of light, one foot per nanosecond.
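The three switch conditions amount to what designers now call generate, propagate and kill. A small Python sketch of the chain (naming mine), with an adder built on top of it as in the worked example:

```python
def carry_chain(x_bits, y_bits, carry_in=0):
    """x_bits, y_bits are lists of bits, least significant first.
    Returns the Carry In signal seen at each bit position."""
    carries, c = [], carry_in
    for x, y in zip(x_bits, y_bits):
        carries.append(c)
        if x and y:                  # switch Carry=1 closed
            c = 1
        elif not x and not y:        # switch Carry=0 closed
            c = 0
        # otherwise the Carry Propagate switch is closed: c passes through
    return carries

def add(x, y, width=8):
    """Final sum bit = half sum XOR Carry In, bit by bit."""
    xb = [(x >> i) & 1 for i in range(width)]
    yb = [(y >> i) & 1 for i in range(width)]
    cs = carry_chain(xb, yb)
    return sum(((a ^ b ^ c) << i) for i, (a, b, c) in enumerate(zip(xb, yb, cs)))
```

With the example values, add(85, 43) reproduces the sum 128; a carry beyond the fixed width is dropped, as in fixed-length hardware registers.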

Very early in the project experiments were carried out with special Surface Barrier Germanium Transistors, manufactured in the United States by a method of construction which led to a device better suited to a switch than the asymmetric Drift Transistors used elsewhere. Using these it was demonstrated that a carry path could be implemented in a parallel addition circuit of 40 bits, and that it would take 80 nanoseconds to set all the switches in parallel plus 34 nanoseconds to propagate the carry signal from the least to the most significant bit: 114 nanoseconds in all. Not quite achieving the speed of light. The extra time to perform the other logical operations led to an overall time of 250 nanoseconds. The order of magnitude speed improvement of parallel over serial computing had been recovered. The circuit was refined as an Adder/Subtractor and used throughout the machine, most notably in the Floating Point Arithmetic Unit. The power of this circuit is recognised, to this day, by the designers of VLSI semiconductor systems.

To them it is known as The Manchester Carry Chain.

The Floating Point Arithmetic Unit

In 1951 the University had pioneered the design of a floating-point arithmetic unit in the MEG computer. It was natural to plan to include one in the parallel machine: it was to have a 40 bit Fractional part with an eight bit Exponent. The problem with the Addition circuit was solved, but there remained the Multiplication problem. A 40 bit multiplier would require 40 ADD-SHIFT operations. Fortunately a new multiplication algorithm was invented. The multiplier was divided into three-bit groups; in other words, the 40 bit multiplier became 13 octal characters plus one bit. By forming 2D+D to give 3D, where D is the multiplicand, before the multiplication proper began, and providing D, 2D and 3D as possible inputs to an Adder/Subtractor, it was possible to reduce the number of ADD/SUBTRACT-SHIFT operations to 14.
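The grouping scheme can be sketched in Python. The text specifies D, 2D and 3D as adder/subtractor inputs but not the digit recoding, so the scheme below is an assumption of mine, not necessarily Atlas's rule: octal digits 5, 6 and 7 are handled by subtracting 3D, 2D or D and carrying one into the next group, with 2D and 4D obtained by shifting rather than by extra additions.

```python
def multiply_by_groups(multiplier, d):
    """One add or subtract (plus a 3-place shift) per octal group of the
    multiplier, using only D, 2D, 3D and 4D, where 2D and 4D are shifts."""
    multiples = {0: 0, 1: d, 2: d << 1, 3: (d << 1) + d, 4: d << 2}
    # split the multiplier into octal (3-bit) digits, least significant first
    digits, m = [], multiplier
    while m:
        digits.append(m & 7)
        m >>= 3
    # recode digits 5..7 as (digit - 8) plus a carry of 1 into the next group
    recoded, carry = [], 0
    for dig in digits:
        dig += carry
        if dig >= 5:
            recoded.append(dig - 8)
            carry = 1
        else:
            recoded.append(dig)
            carry = 0
    if carry:
        recoded.append(1)
    # accumulate: one ADD/SUBTRACT-SHIFT operation per group
    acc = 0
    for i, dig in enumerate(recoded):
        term = multiples[abs(dig)] << (3 * i)
        acc += term if dig >= 0 else -term
    return acc
```

For a 40 bit multiplier this gives 13 groups plus one bit, and at most one further operation for a final recoding carry, consistent with the figure of 14 operations.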

The programmers were quite pleased when they realised what the engineers were proposing, and gave their strong support, to the extent of requesting that the base for the floating point exponent should change from two to eight. Thus the normalisation of the fractional part could also be speeded up, since fewer shifts would be needed.

We were unable to find an algorithm to speed up the Division operation.

The central processor

The central processor and its instruction set began to take shape early on in the project. It comprised a floating point arithmetic unit for 48 bit words, a 24 bit control register (Program Counter) and 128 B-lines (Index Registers) in a special core store with an associated 24 bit fixed point arithmetic unit.

The Main Store

The economics of the available store technology was quite simple. One bit of Magnetic Core store cost three shillings (1 pound=20 shillings) whilst one bit of magnetic drum store cost six pennies. Core was six times more expensive than drum. Therefore the main storage would be a combination of the two technologies.

The Magnetic Core Store was being developed by the Plessey Company in the South of England. It consisted of several stacks, each of 4096 words of 48 bits operating with a cycle time of 2 microseconds. The Manchester ATLAS had four stacks. Arranging the store into pairs of stacks, with a selection mechanism for each stack, reduces the effective access time. Each pair consists of an Even and an Odd stack: the even stack contains words with even addresses and the odd stack the words with odd addresses. Consecutive words in a block are thus stored alternately in the even and odd stacks of the pair containing the block.
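The even/odd split amounts to selecting a stack by the lowest address bit, so that sequential accesses alternate between stacks and a new access can begin while the other stack's 2 microsecond cycle completes. A trivial sketch (naming mine):

```python
def stack_for(address):
    """Map a word address to (stack, row) within a pair of stacks:
    stack 0 holds even addresses, stack 1 holds odd addresses."""
    stack = address & 1      # lowest bit selects the even or odd stack
    row = address >> 1       # word position within that stack
    return stack, row
```

Consecutive addresses 100, 101, 102 fall in stacks 0, 1, 0, so no two successive accesses contend for the same stack.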

Each block consisted of 512 words and was contained in a page of the core store. There were 16 pages in each pair of stacks.

The Magnetic Drum Store, developed jointly by the University and Ferranti, was to act as the backing store. In all there were four drums each of 24k words, giving a total of 96k words. The revolution time was 12 milliseconds, a drum latency of six milliseconds. The rate of transfer was one block of 512 words per two milliseconds, nearly one word every four microseconds.

The Manchester users were familiar with this arrangement from the earliest days of the Mk 1 machine. Then the Drum was located on the floor above the rest of the computer, and blocks were brought down from the Drum into the CRT Store; today this would be called Downloading. Some users became skilful in writing their programs so that the request for a download would be made just as the requested block was about to come under the read heads of the drum, thus reducing the time spent waiting for the transfer to begin.

It was suggested that users of the ATLAS would rather concentrate on the numerical analysis aspects of their problems than on the idiosyncrasies of the hardware. Also, these new-fangled high level languages were being developed, and their compilers would have enough to worry about without using dynamic programming techniques to achieve efficient drum transfers. Something had to be done to hide the drum transfers from the user.

One Level Store

The Drum and Main Core Store were referred to as the Main Store of the machine. Words within this store were addressed in blocks of 512 words up to 192 blocks, the drum capacity. When a block was transferred into a page of the core store, the block address was recorded in a Page Address Register located in the core store controller. There were 32 such registers, one for each page in the core store. When an address was decoded as referring to a word in the store, the block address bits were compared with the Page Address Registers. If the block was in the core store an Equivalence signal would cause the word transfer to occur. If the block was not down in the core store a Non Equivalence signal would cause the main program to be held up, or interrupted, and a drum transfer routine entered to bring the block down from the drum. After the drum transfer the main program would continue and this time an Equivalence signal would permit the word transfer.
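The equivalence test can be sketched as follows. The class and names are mine, and the round-robin replacement is a crude stand-in: on Atlas the choice of block to discard was made by the supervisor's learning program, not by hardware.

```python
PAGE_WORDS = 512   # words per block/page
NUM_PAGES = 32     # one Page Address Register per core-store page

class OneLevelStore:
    """Toy model of the page address registers and equivalence test."""

    def __init__(self):
        self.page_block = [None] * NUM_PAGES   # which block each page holds
        self.next_victim = 0                   # stand-in replacement policy

    def access(self, address):
        """Return (page, offset) for a word address, faulting a block in
        from the drum on non-equivalence."""
        block, offset = divmod(address, PAGE_WORDS)
        for page, held in enumerate(self.page_block):
            if held == block:
                return page, offset            # Equivalence: word is in core
        # Non Equivalence: interrupt, choose a page, bring the block down
        page = self.next_victim
        self.next_victim = (self.next_victim + 1) % NUM_PAGES
        self.page_block[page] = block          # (drum transfer elided)
        return page, offset
```

The first access to a block simulates the interrupt and drum transfer; repeating the same access then succeeds on the equivalence test alone.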

Thus the programmer treated the drum and core store as a One Level Store of 98,304 words (576k bytes); massive by the standards of the day. Words were processed within the central processor when they were down in the core store. Transfers between the core and drum were organised not by hardware but by special routines we called software.

Coping with Software

The provision of hardware functions implemented by software supplied by the computer manufacturer was a significant step for the machine designers to take. Several questions were raised:

Where would the instructions be stored?
Would they use the same control register?
Would they use main storage?
Would they use the same fixed and floating point arithmetic units?
Would they need special functions in the instruction set?

For speed and safety the software routines were to be stored in a special fixed, read-only, store of 8192 words with an access time of 0.3 microseconds. They would have their own control register. Since these software routines would Interrupt the locus of control of the main program, this was termed the Interrupt Register.

For speed and safety a special System core store of 1024 words was provided.

By taking appropriate precautions, in the software, it was deemed possible to use the existing arithmetic units.

Specific new functions would be needed.

Extracode

The need for Interrupt routines to cope with the One-Level Store was seen as the thin end of the wedge. We did not know what special functions these routines would require in order to decide upon the choice of block to return to the drum to make way for the block to be brought down, and so on. Furthermore, the main programmers were asking for peculiar functions of the floating-point arithmetic unit, and several different types of peripheral equipment were to be catered for in the future, each requiring special functions.

The solution lay in the Fixed Store. The special functions were to be implemented as routines held in the fixed store. The function code would appear within the instruction format but interpreted as an address in the Fixed Store pointing to the routine which would be implemented under a separate locus of control managed by a separate Extracode Control Register using the existing accumulators, main one-level store and special system store. This liberated the hardware designers, enabling them to get on with the task of implementing a Restricted Instruction Set Computer (RISC) of high speed.
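The dispatch idea can be sketched as follows; the threshold, the entry-point rule and the codes are all invented for illustration, since the actual Atlas function encoding is not detailed here.

```python
HARDWARE_LIMIT = 0o1000   # hypothetical: codes below this are hardware functions

def fixed_store_entry(function_code):
    """Interpret an extracode function code as a fixed store address
    (toy rule: a fixed-length entry stub per code)."""
    return (function_code - HARDWARE_LIMIT) * 4

def dispatch(function_code):
    """Hardware functions run under main control; extracodes switch to a
    separate locus of control at an address in the fixed store."""
    if function_code < HARDWARE_LIMIT:
        return ("main control", None)
    return ("extracode control", fixed_store_entry(function_code))
```

The point of the scheme is visible even in this toy form: adding a new special function means writing a fixed store routine, not wiring new hardware.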

V-Store

It was conceivable that the special functions would require some special hardware. The one-level store learning program needed the hardware to keep a record of page utilisation, to assist in deciding which page to discard and so on.

The big worry was the peripheral equipment. Previous machines had limited peripheral equipment: a paper tape reader and punch were common, and special functions were wired into the hardware to control the operation of each peripheral. It was felt that this powerful machine would be expected to include not only a wide variety of known and unknown peripherals, such as line printers, but also several of each type, notably Magnetic Tape Decks to provide ancillary storage. The discussion as to the best solution went on for several weeks until someone had the bright idea of the V-Store. (It was seen as a victory, and V-Store seemed an obvious name; nowadays it is termed Image Store.)

Part of the available address space was reserved for this V-Store. For example the Non Equivalence signal from the page address registers would be stored on a flip-flop which would be allocated a position within the V-Store. This Interrupt flip-flop would cause the locus of control to switch from Main Control to Interrupt Control, the Interrupt Routine would then read the V-store location and transfer it to the fixed point accumulator for processing. In the case of peripheral equipment, an example would be the flip-flop, which tells the paper tape reader to start/stop. It would be given a location in the V-store where it could be accessed by the appropriate Extracode routine. The flip-flop would be set by writing to the V-Store from the fixed-point accumulator, using a standard Accumulator to Store operation.
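The V-Store idea can be sketched as memory-mapped device bits: a flip-flop occupies a reserved store address, so software sets or reads it with ordinary store operations. The addresses and the device bit named below are invented for illustration.

```python
V_STORE_BASE = 0o7000000                  # hypothetical reserved address region
TAPE_READER_RUN = V_STORE_BASE + 0o10     # hypothetical start/stop flip-flop

class VStore:
    """Toy model: device flip-flops addressed like store locations."""

    def __init__(self):
        self.flip_flops = {}              # address -> single device bit

    def write(self, address, value):
        """A standard Accumulator-to-Store operation sets a device bit."""
        self.flip_flops[address] = value & 1

    def read(self, address):
        """Reading transfers the device bit to the accumulator."""
        return self.flip_flops.get(address, 0)
```

An extracode routine starting the tape reader would simply do v.write(TAPE_READER_RUN, 1); no device-specific instruction is needed.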

Autonomous Transfer Units

Most of the slow-speed peripherals were routed through a special unit that was mainly an access mechanism to the peripheral V-Store, which was physically scattered throughout the suite of rooms housing the total ATLAS system. The Drum Store and Tape Decks were each treated as a special case, and autonomous transfer units were designed to cope with the volume of traffic. Each unit was allocated space in the V-Store to enable software control.

The Tape Coordinator unit was designed to cope with eight simultaneously transferring tape decks. The tape decks were Ampex TM2, each with a transfer rate of 90,000 six-bit characters per second, that is one 48-bit word every 88 microseconds. With all eight decks running, the maximum transfer rate between the unit and the main core store was 90,000 words per second.

The unit had to enable the pre-addressing of the fixed blocks, of 512 words, on the tape and the testing for flawless areas on the tape. Close collaboration between the hardware and software designers was essential to achieve success.

Performance

The machine, unlike its predecessors, did not have a clock in the central processor. It was felt that to tie down everything to the slowest operation, which was implied by the use of a clock, would be against the principle of the machine. Instead a single Pre-Pulse wandered round the various elements of the machine where it would initiate an action and wait for the self timing of the action to complete before wandering off to the next element. Occasionally it would initiate an action and move on to another element that could operate concurrently. For example the floating-point arithmetic unit would be completing a division operation whilst the program would be executing several fixed-point operations. Also there was a pipeline between the processor and the main core store. Here was an early example of asynchronous control, which is now commonly used in the VLSI circuits of modern computers.

It was difficult to give precise figures for the performance of the machine. An indication is given below:

Fixed point B-addition 1.59 microseconds
Floating point add, no modification 1.61 microseconds
Floating point add, double modify 2.61 microseconds
Floating point multiply, double modify 4.97 microseconds

Summary

The ATLAS project had passed several significant milestones, involving many engineers at the University and within Ferranti and the Plessey Company, before it could provide a reliable computing service. Mundane matters such as the provision of power supplies and the cooling of the fragile germanium transistor circuits posed new problems. Computer aided design programs were developed to cope with the complexity of the design and manufacturing information. When it all came together, a powerful computing machine set the benchmark for many machines in the future.

Early in 1962, when most of the Manchester ATLAS was operational and a few privileged, patient users were allowed on the machine, it was said that the computing power in the UK was halved when it was switched off.

Acknowledgements

The author has drawn heavily on A History of Manchester Computers by Simon Lavington, published by the British Computer Society in 1998, and on the PhD thesis Some Aspects of the Design and Construction of ATLAS, a High Speed Parallel Digital Computing Machine by David Aspinall, Computing Machine Laboratory, Department of Electrical Engineering of the University of Manchester, 1961.

© Chilton Computing and UKRI Science and Technology Facilities Council webmaster@chilton-computing.org.uk
Our thanks to UKRI Science and Technology Facilities Council for hosting this site