Statistical Fortran Programs (IBM 7090, IBM 7030, ICT Atlas)

B E Cooper

March 1965

ACL/R 2

© UKRI Science and Technology Facilities Council

1 INTRODUCTION ON THE STRUCTURE OF DATA DECKS
2 THE MULTIVARIATE ANALYSIS PROGRAM
3 THE REGRESSION-WITHIN-GROUPS PROGRAM
4 THE COMPLETE FACTORIAL EXPERIMENT PROGRAM
5 THE DIALLEL TABLE ANALYSIS PROGRAM

Sections 2 to 5 of this report describe four statistical programs written in FORTRAN for the IBM 7090, IBM 7030, and ICT Atlas. All four programs use a package of data input subroutines described in reference 1 and therefore have a number of features in common. Section 1 is an introductory section describing these common features and is relevant to all four programs. A reader interested in only one program should therefore read section 1 and the section describing that program.

Introduction and Acknowledgments

The programs described in this report are developments of programs written by members of the Statistics Section of the Atomic Energy Research Establishment, Harwell. The Multivariate Analysis program is at present written in S2 for the IBM-7030 and was written by Mrs. M. C. B. Russell. Mrs. Russell has continued the later debugging of this program, on a voluntary basis, since leaving the Atomic Energy Authority, and the author is particularly grateful to her for this very valuable work. The Complete Factorial Experiment program was written originally for the IBM-7090 by Mrs. C. M. Whiteside and has since been rewritten for the ICT-Atlas using new methods by T. Gover of the Atlas Computer Laboratory. The Regression-Within-Groups program was written by the author for the IBM-7090 and the Diallel program was written by the author in S2 for the IBM-7030. It is intended to have versions of all four programs available on the IBM-7090, the IBM-7030 and the ICT-Atlas. The data preparation will be the same on all three machines. Slight operational differences will exist, however, because of the different operating systems used on the machines.

The main difference is concerned with the use of a common working tape as temporary storage on the IBM-7090 and IBM-7030 which is not necessary on the ICT-Atlas. Brief details of these differences are given in section 1.5.

References

(1) DIP - A Package of Data Input Routines written in Package, by B. E. Cooper, Atlas Laboratory Report ACL/R 1 (in preparation).

(2) The Presentation of Experimental Data to Computer, by B. E. Cooper and Mrs. C. M. Whiteside, A.E.R.E. R4250.

(3) The Analysis of Variance of Diallel Tables, by B. I. Hayman, Biometrics Vol. 10, 1954.

(4) The Analysis of Continuous Variation in a Diallel Cross of Nicotiana rustica Varieties, by J. L. Jinks, Genetics, 39, 1954, p 767-788.

1 INTRODUCTION ON THE STRUCTURE OF DATA DECKS

The programs described in this report make use of a package of data input subroutines which is described in detail in a separate report (reference 1). A similar package of subroutines has been described in reference 2. Detailed knowledge of the package is not necessary to an understanding of the use of these programs, although a quick reading of section 1 of reference 1 or of reference 2 would be found helpful. The data presentations to these programs have a number of general features in common. These are described in this first section and the detailed structures of the data presentations are described in the remaining sections.

1.1 Data Deck Formation

The information required by a program for one case normally consists of a title for identification purposes, information specifying the amount of data and the analyses required, and the data to be analysed.

We define a case as that collection of information which is self-contained and which could be presented to the program without reference to any other information or data. A case may consist of information for several variables and more than one analysis may be requested. A user of a program (a customer) normally wishes to present several cases to the computer at one time, and two or more customers may wish to use the program at the same time. These programs allow this to be done and the data deck presented to the computer may consist of several subdecks originating from different customers. Each subdeck may consist of any number of separate cases. It is clear that the cases presented must be adequately labelled so that the solutions can be identified with the original data. It is also clear that the separate subdecks must also be labelled so that the solutions can be returned to the correct customer. This is achieved by the use of three control cards which are defined and used as follows:

A CUSTOMER card is the first card of a subdeck and it contains the word CUSTOMER followed by any information necessary to identify the customer. For example
```
CUSTOMER B. E. COOPER ATLAS COMPUTER LABORATORY. 
```
A TITLE card is the first card of a case and it contains the word TITLE followed by any information necessary to identify the case. For example
```
TITLE DATA FOR EXPERIMENT 14B PERFORMED 6/6/64. 
```
The FINISH card marks the end of the data deck and it contains the word FINISH only. The operation of the program is terminated on reading the FINISH card.

Thus the data deck is divided into subdecks by CUSTOMER cards, each subdeck is divided into cases by TITLE cards and the complete deck is terminated by a FINISH card. The following example, which consists of three cases for one customer followed by two cases for a second customer, shows the structure of a data deck.

Example Data Deck 1. 
CUSTOMER B. E. COOPER ATLAS COMPUTER LABORATORY  |
TITLE EXAMPLE CASE 1                             |
Cards for example case 1                         |
TITLE EXAMPLE CASE 2                             | 3 cases for customer B. E. COOPER
Cards for example case 2                         |
TITLE EXAMPLE CASE 3                             |
Cards for example case 3                         |
CUSTOMER A. N. OTHER XYZ UNIVERSITY              |
TITLE EXPERIMENT 12 DATE 4/5/64                  |
Cards for experiment 12                          | 2 cases for customer A. N. OTHER
TITLE EXPERIMENT 14 DATE 6/5/64                  |
Cards for experiment 14                          |
FINISH                                           | End of data deck
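
The deck structure above can be illustrated with a short Python sketch. It is not the DIP package itself: the function name `split_deck` and the nested-list return shape are assumptions made for illustration, and each card is treated as a plain string.

```python
def split_deck(cards):
    """Group card images into [(customer, [(title, [case cards]), ...]), ...].

    Hypothetical helper: the real data input package is described in
    reference 1 and may behave differently.
    """
    deck = []
    cases = None  # cases of the customer currently being read
    for card in cards:
        word, _, rest = card.strip().partition(" ")
        if word == "FINISH":
            break
        if word == "CUSTOMER":
            cases = []
            deck.append((rest.strip(), cases))
        elif word == "TITLE":
            if cases is None:
                raise ValueError("TITLE card before any CUSTOMER card")
            cases.append((rest.strip(), []))
        else:
            if not cases:
                raise ValueError("card found outside any case")
            cases[-1][1].append(card)
    else:
        # The loop ended without meeting a FINISH card.
        raise ValueError("deck has no FINISH card")
    return deck
```

Every card between one TITLE card and the next control card is simply attached to the current case; interpreting those cards is left to the individual program.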

1.2 The Structure of an Individual Case

The detailed structure of an individual case depends, of course, on the particular program but the following general structure is common to all programs. The presentation for one case consists of the following four sections.

The Title Section: The title section consists of the TITLE card only.
The Specification Section: The specification section consists of cards containing words introducing important information necessary to the input, the labelling, and the checking of the data. Such information as the number of points, the number of variables, and the variable names, for example, may be introduced by the following card:
```
POINTS 40 VARIABLES 3 NAMES ALPHA BETA GAMMA 
```
The specification section may also contain equation cards defining operations to be performed on the data. Variables may be transformed in this way or new variables may be defined in terms of those to be presented as data. Example equation cards are:
```
LOGA = LOG(A) 
DIFF = CHEST - WAIST 
```
The rules determining the types of equation depend on the equation interpretive sub-routine included in the data input package. The rules which apply to the use of equation cards in these programs are given in references 1 and 2, and briefly summarised in section 1.7. The rewriting of the equation interpretive sub-routine would introduce a fresh set of rules to all programs.
The Data Section: The data section contains the data to be analysed - presented in an arrangement which depends on the particular program.
The Instruction Section: The instruction section contains a number of separate instructions selecting particular analyses to be performed on the data. Each instruction is punched on a separate card and equation cards may also appear in this section. The same preparation rules apply to equation cards in this section as apply to those in the specification section. It must be remembered that an equation card defining a new variable must precede all instruction cards referring to the new variable. An example of an instruction card is:
```
COMPONENT ANALYSIS ON ALL VARIABLES 
```
In some programs the total absence of instruction cards implies that a standard analysis is to be performed. The Regression-Within-Groups program described in section 3 does not allow the use of equation cards in its present form, but transformation of the data is achieved by other, less elegant, means. Instruction cards are not used in this program since only one analysis is possible.

The specification and instruction sections use words to introduce the parameters, for example VARIABLES 3 is used to specify that three variables are to be presented. Since each parameter is introduced by a distinct word the omission of the word altogether may be taken to imply a standard value for the parameter. The omission of the word VARIABLES, for example, could be taken to imply that only one variable is to be presented. Description of the specification section and the instruction section of a program may therefore be made by listing the words which may be used and by describing the type of information introduced by each word.

1.3 The Commentary Output

Each program produces two output streams. The first stream contains all the results as well as any necessary diagnostic comments drawing attention to presentation errors that have been detected. The second output stream, or commentary output, contains a very brief record of the success or failure of each case and is divided up by customer. If a case is successful the case title and the comment CASE COMPLETE are recorded in the commentary output. If a case is not successful the case title, one or more diagnostic comments drawing attention to the error, or errors, and the comment CASE NOT COMPLETE are recorded in the commentary output. Thus the commentary output enables unsuccessful cases to be identified quickly and it provides a list of those cases that have been presented to the computer. The organisation of the commentary output depends on the machine on which the program is being run and these details are given later in section 1.5.

1.4 The CHECK and IGNORE Facility

The arrangement of data on the cards in the data section depends on the particular program but the following important features are independent of the data layout and are available in all programs. It is permissible to include a statement in the specification section to specify checks to be applied to the data. The example statement

CHECK 1 ASCENDING 4 IDENTICAL

specifies that the first item on each data card is expected to form an ascending sequence from card to card and that the fourth item on each data card is expected to be the same on all cards. Only one CHECK statement is allowed but any number of checks may be specified in the statement. The statement is introduced by the word CHECK and each check is specified by a number (the position of the item to be checked) and a word (the type of check to be performed). There are three types of check that can be performed and these are selected by the words

ASCENDING
DESCENDING
IDENTICAL

If an error is discovered on reading the data a comment is produced in the main output stream and in the commentary output stream specifying the card on which the check failed and the item being checked.
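
As an illustration of the three checks, the following Python sketch applies a list of (position, word) pairs to cards that have already been split into items. The name `run_checks` is hypothetical, and the sketch assumes that ASCENDING and DESCENDING mean strictly increasing and strictly decreasing; the report does not say how equal neighbouring values are treated.

```python
import operator

# Comparison applied to (current item, previous item) for each check word.
# Strict inequality is an assumption, not something the report states.
CHECKS = {
    "ASCENDING": operator.gt,
    "DESCENDING": operator.lt,
    "IDENTICAL": operator.eq,
}


def run_checks(checks, cards):
    """Return (card number, item position, check word) for every failure.

    `checks` is a list such as [(1, "ASCENDING"), (4, "IDENTICAL")] and
    `cards` is a list of item lists; positions and card numbers start at 1.
    """
    failures = []
    for position, word in checks:
        passes = CHECKS[word]
        for n in range(1, len(cards)):
            current = cards[n][position - 1]
            previous = cards[n - 1][position - 1]
            if not passes(current, previous):
                failures.append((n + 1, position, word))
    return failures
```

Each failure names the card and the item, which is the information the diagnostic comment is described as reporting.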

The second feature allows the customer to specify that certain items (numbers or words) are to be ignored since they do not form part of the data to be analysed. This facility is selected by the inclusion of an IGNORE statement in the specification section. The statement consists of the word IGNORE followed by a list of integers specifying the positions on the cards of the items to be ignored. The statement

IGNORE 1 2 7 10

specifies that the 1st, 2nd, 7th and 10th items on each card are to be ignored. This facility allows extra information, such as labels or data not required for the present analyses, to be included on the data cards so that these can form a complete and clearly labelled record of all the data that was collected.

The same item may be both checked and ignored so that an item, or items, may be included on each card solely as a check that the presentation order is correct. It is also possible to check an item which is not to be ignored.
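
A minimal Python sketch of the IGNORE facility follows. The function names are hypothetical, and the statement is assumed to arrive as a single string of blank-separated items.

```python
def parse_ignore(statement):
    """Turn 'IGNORE 1 2 7 10' into the set of positions {1, 2, 7, 10}."""
    tokens = statement.split()
    if not tokens or tokens[0] != "IGNORE":
        raise ValueError("statement must begin with the word IGNORE")
    return {int(token) for token in tokens[1:]}


def drop_ignored(items, ignored):
    """Return the items of one card with the ignored positions removed."""
    return [item for position, item in enumerate(items, start=1)
            if position not in ignored]
```

Because checking and ignoring are independent, a check can be run on the full item list before `drop_ignored` is applied, which matches the remark that an item may be both checked and ignored.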

1.5 Computer Differences

The main difference between the operation of these programs on the three computers concerns the commentary output stream. On Atlas the commentary output is produced automatically by the supervisor as output stream 7 and the main output is produced as output stream 0. The job description for Atlas must therefore include statements specifying that output streams 0 and 7 are required to appear on a line printer, that is, the statements

OUTPUT 
0 LINE PRINTER 3000 LINES
7 LINE PRINTER 500 LINES

The number of lines specified for each output stream depends on the particular program, the analyses required, and the number of cases presented.

On the 7090 and 7030 computers the commentary output is written on to tape 7 and copied from tape 7 on to the main output stream when the FINISH card is read at the end of the data deck. Thus tape 7 (B3 on the 7090) is used as a working tape which may be returned to common use at the end of the job. It is clear, therefore, that the omission of the FINISH card would cause the commentary output to be lost. The operating instructions for the 7090 and 7030 must include the use of a temporary working tape as tape B3 and tape 7 respectively. That is, the tape is a COMMON tape before and after the program execution.

1.6 Card Punching Rules

The following rules apply to the punching of words and numbers on specification, instruction and data cards.

1. A number may be made up of a sign, an integral part, a decimal point, a fractional part, an exponent (power of ten) introduced by the letter E, and an exponent sign. Any of these parts may be omitted if they are unnecessary. The complete number is normally punched in consecutive columns on the card, although blank columns are allowed between the sign and the beginning of the number itself and between the E for exponent and the actual exponent. The following are all legal numbers:
```
1.4, +4, -3.14, 27E2, 27.8E-1, -0.43E+6, +1.8, 44.39E-4 
```
2. A word is any collection of the letters A to Z punched in consecutive columns, with the exception listed below in rule 4.
3. A word and a number are normally terminated by a blank column or comma, but if a word follows a number or a number follows a word the blank column between them may be omitted. It is also possible to omit the blank column between two numbers provided the division between them is still clear. For example, the division between the two numbers 14.7-13.8 is clear because of the second sign, but the division between two numbers punched as 14.713.8 is not clear. The normal rule is to place a blank column (or more than one if required) between items, so that the print out of the cards is clear.
4. The letter E is used to introduce the exponent part of a number, so a little care must be exercised in using the letter E as a one letter word. There is no problem if E is the first letter of a longer word. If the letter E is punched immediately after a number it is taken as introducing an exponent. In all other contexts the letter E is taken as a word.
5. Only the first 8 letters (first 6 on the 7090) of a word are stored; the remainder are ignored.
6. One card may be continued on to the next, if context requires this, by punching the character $ on the end of the card (or cards) to be continued.
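
The punching rules above can be approximated by a regular-expression tokenizer. The Python sketch below is an illustration only, not the package's reading routine: the names `TOKEN` and `tokenize` are hypothetical, continuation cards marked with $ are not handled, and an E punched after a number is treated as an exponent only when digits follow it, which is an assumption about the ambiguous case.

```python
import re

# Optional sign, optional blanks, digits with optional fraction, then an
# optional exponent introduced by E immediately after the number.
NUMBER = r"[+-]?\s*(?:\d+\.?\d*|\.\d+)(?:E\s*[+-]?\s*\d+)?"
WORD = r"[A-Z]+"
TOKEN = re.compile(rf"(?P<number>{NUMBER})|(?P<word>{WORD})")


def tokenize(card, word_length=8):
    """Split one card image into floats (numbers) and strings (words).

    Words are cut to `word_length` letters (8, or 6 for the 7090 rule).
    Blanks, commas and other characters between items are skipped.
    """
    items = []
    for match in TOKEN.finditer(card):
        number = match.group("number")
        if number is not None:
            # Remove blanks allowed after the sign and around the exponent.
            items.append(float(re.sub(r"\s+", "", number)))
        else:
            items.append(match.group("word")[:word_length])
    return items
```

With this sketch the card `14.7-13.8` yields two numbers because the second sign marks the division, while `14.713.8` cannot be split in the intended way, which is the reason the rules recommend a blank column between items.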

1.7 Equation Card Rules

The basic rules for punching words and numbers on equation cards are the same as for other cards and these have been given in section 1.6.

The additional punching rules concern the special characters:

 =  (  )  *  /

as well as + and - signs punched before a word. Since words and numbers on an equation card are separated by one of these special characters the division between items is clear. Equation cards may therefore be punched with or without spaces as desired.

Equation cards specify arithmetic operations to be performed on variables or on single values (referred to as parameters). Certain equation cards define a parameter as a function of a variable. An example of this type of function would be

AMEAN = MEAN(A)

in which the parameter AMEAN is defined as the mean of the variable A. Such parameters may be used in further equation cards but their values are not available to the program so that their definition is only useful if they are used in further equation cards. Four types of equation cards are possible and these are defined below.

1.7.1 Type 1 Equations

Type 1 equations define simple arithmetic operations to be performed on variables or parameters.

Examples are:

DIFF = CHEST - WAIST 
A    = A - AMEAN

and the following rules apply:

The right hand side may consist of one or two terms only.
Either term on the right hand side may be a variable, a parameter, or a number.
The left hand side may not be a number.
If a variable is used on the right hand side the left hand side must also be a variable and if the left hand side has not been previously defined a new variable is created.
If no variable is used on the right hand side the left hand side must be a parameter and if the left hand side has not been previously defined a new parameter is created.
The first term may be preceded by a + or - sign if desired. If no sign is given the first term is taken as positive.
The second term must be preceded by one of the signs +, -, *, or / selecting addition, subtraction, multiplication or division respectively.

Violation of these rules causes the variable or parameter on the left hand side to be deleted so that analyses involving this variable will not be performed.
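
The Type 1 rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the equation interpretive subroutine: variables are held as lists of floats in a dictionary, parameters as single floats in a second dictionary, numbers with exponents are not accepted, and a violation raises an exception instead of deleting the left hand side. The names `TYPE1`, `lookup` and `apply_type1` are hypothetical.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
TERM = r"[A-Z]+|\d+\.?\d*|\.\d+"
# left = [sign] term [operator term]
TYPE1 = re.compile(
    rf"\s*([A-Z]+)\s*=\s*([+-]?)\s*({TERM})\s*(?:([-+*/])\s*({TERM})\s*)?$"
)


def lookup(term, variables, parameters):
    """Resolve a term to a list (variable) or a float (parameter or number)."""
    if term in variables:
        return variables[term]
    if term in parameters:
        return parameters[term]
    return float(term)  # a ValueError here means an undefined name


def apply_type1(card, variables, parameters):
    """Evaluate one Type 1 equation card and store the result."""
    match = TYPE1.match(card)
    if match is None:
        raise ValueError(f"not a Type 1 equation: {card!r}")
    left, sign, first, op, second = match.groups()
    terms = [lookup(first, variables, parameters)]
    if op:
        terms.append(lookup(second, variables, parameters))
    lists = [t for t in terms if isinstance(t, list)]
    n = len(lists[0]) if lists else 1
    # Spread single values so that every term has one value per point.
    columns = [t if isinstance(t, list) else [t] * n for t in terms]
    values = [-x if sign == "-" else x for x in columns[0]]
    if op:
        values = [OPS[op](a, b) for a, b in zip(values, columns[1])]
    if lists:
        variables[left] = values      # a variable on the right gives a variable
    else:
        parameters[left] = values[0]  # otherwise the result is a parameter
```

The final branch mirrors the rule that the left hand side is a variable whenever a variable appears on the right hand side, and a parameter otherwise.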

1.7.2 Type 2 Equations

Type 2 equations define functional relations between two variables, between two parameters, or between a parameter and a number. The following rules apply:

The right hand side is made up of a + or - sign followed by a function name followed by a variable name or a parameter name or a number enclosed in brackets. The sign may be omitted if a positive value is intended.
The left hand side must be a variable if a variable is given on the right hand side.
The left hand side must be a parameter if a parameter or a number is given on the right hand side.
The item within brackets may be preceded by a sign only if the item is a number.

The functions that are available are listed below with their mathematical definition. The definitions given apply to variables, similar definitions apply to parameters.

A = LOG(B)      a_i = \log_{10}(b_i)             i = 1, ..., n
A = ANGLEA(B)   a_i = \arcsin(\sqrt{b_i})        i = 1, ..., n
A = ANGLEB(B)   a_i = \arcsin(\sqrt{b_i/100})    i = 1, ..., n
A = RECIPE(B)   a_i = 1/b_i                      i = 1, ..., n
A = CUBERT(B)   a_i = b_i^{1/3}                  i = 1, ..., n
A = SIN(B)      a_i = \sin(b_i)                  i = 1, ..., n
A = EXP(B)      a_i = e^{-b_i}                   i = 1, ..., n

Violation of these rules causes the deletion of the variable or parameter on the left hand side.
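
The function list can be written as a small Python lookup table. Several points are assumptions of this sketch and are not stated in the report: angles are taken to be in radians, the cube root of a negative value is taken as the negative real root, and EXP follows the sign of the exponent as given in the listing. The names `FUNCTIONS` and `apply_function` are hypothetical.

```python
import math

# One entry per function name in the Type 2 listing.
FUNCTIONS = {
    "LOG": math.log10,
    "ANGLEA": lambda b: math.asin(math.sqrt(b)),
    "ANGLEB": lambda b: math.asin(math.sqrt(b / 100.0)),
    "RECIPE": lambda b: 1.0 / b,
    "CUBERT": lambda b: math.copysign(abs(b) ** (1.0 / 3.0), b),
    "SIN": math.sin,
    "EXP": lambda b: math.exp(-b),  # sign as given in the listing
}


def apply_function(name, argument, sign="+"):
    """Apply a Type 2 function to a variable (list) or a single value."""
    function = FUNCTIONS[name]
    factor = -1.0 if sign == "-" else 1.0
    if isinstance(argument, list):
        return [factor * function(b) for b in argument]
    return factor * function(argument)
```

A list argument gives a list result and a single value gives a single value, matching the rule that the left hand side is a variable or a parameter according to what is enclosed in the brackets.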

1.7.3 Type 3 Equations

Type 3 equations define the value of a parameter as a function of the values of a variable. The following rules apply:

The right hand side is made up of a + or - sign followed by a function name followed by a variable name enclosed in brackets. The sign may be omitted if a positive value is intended.
The left hand side must be a parameter and if the left hand side is not yet defined a new parameter is created.
The variable included in brackets on the right hand side must not be signed.

The functions that are available and their operation are listed below.

P = SUM(B)    The sum of the values of B   
P = MIN(B)    The lowest value of B   
P = MAX(B)    The maximum value of B   
P = MEAN(B)   The average value of B   
P = VAR(B)    The variance of B

Violation of these rules causes the parameter on the left hand side to be deleted.
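
The Type 3 functions reduce a variable to a single value, which the following Python sketch illustrates. The report does not say which divisor VAR uses, so the sample variance (divisor n - 1) used here is an assumption; the names `REDUCTIONS` and `apply_reduction` are hypothetical.

```python
import statistics

# Each function maps the list of values of a variable to one number.
REDUCTIONS = {
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "MEAN": statistics.mean,
    "VAR": statistics.variance,  # divisor n - 1: an assumption
}


def apply_reduction(name, values, sign="+"):
    """Return the parameter value defined by e.g. P = MEAN(B) or P = -SUM(B)."""
    result = REDUCTIONS[name](values)
    return -result if sign == "-" else result
```

For example, `apply_reduction("MEAN", a)` gives the value that the card AMEAN = MEAN(A) would assign to the parameter AMEAN.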

1.7.4 Type 4 Equations

Type 4 equations are a miscellany of equations and are listed below:

1. OUTPUT (A): Output the values of the variable A.
   OUTPUT (P): Output the value of the parameter P.
2. TEST (B,N1,N2): Test that all the values of variable B lie within the range N1 to N2. N1 and N2 are both numbers. If values are found outside this range the variable B is deleted.
3. NAME (A,B) or RENAME (A,B): Replace the variable name A by the name B. The name B will be used in future references to the variable and the subsequent use of the name A on the right hand side of an equation (without redefinition of A) will be diagnosed as an error.
4. DELETE (A): Delete the variable A. This instruction could be used to delete a variable which is no longer required so that the storage space occupied by the variable can be used to store a new variable. This instruction is only required if the storage space allocated to variables is completely allocated.

The rules listed above are those which apply because of the particular version of the equation interpretive subroutine that is included at present in the data input package. If this subroutine is rewritten, and it is hoped to rewrite it soon, a fresh set of rules will apply. The purpose of rewriting would be to implement more flexible rules, but a new set of rules would still allow the use of equations that are consistent with the rules given above.

1.8 Repeats or Replicates

Three programs allow a distinction between repeat and replicate observations. The Regression-Within-Groups program allows several observations of the dependent variable for each value of the independent variable, the Complete Factorial Experiment program allows several observations for each combination of the factor levels, and the Diallel Table program allows several observations for each parental combination. Observations are repeats rather than replicates when sources of variability present between points are not present between several observations for one point (the word point is used to refer to different values of the independent variable, different factor level combinations, or different parental combinations). If, for example, different points correspond to different solutions in a chemical problem and several observations are made on each solution, the variability due to differences between solutions will not be present between observations made on the same solution. If this is the situation then, although this source of variation is present in the analysis, it is not valid as a residual against which other sources may be tested, because a significant result may simply reflect the known differences between solutions. The programs recognise this situation by the use of the word REPEATS instead of the word REPLICATES.

1.9 The Arrangement of Data on Data Cards

Each program expects (with one exception described in section 3.4.2) each data card in the data section to contain the same number of items and defines a minimum number (n) of items (excluding items to be ignored) to be included on one card. The descriptions of the data sections begin by defining this minimum number and go on to explain that any integral multiple (k) of n items may be punched on one physical data card. We therefore define a conceptual card as one set of n items and we define a physical card to be a card, actually read, containing kn items. Thus one physical card may represent k conceptual cards. Each program expects a fixed number of conceptual cards arranged on any number of physical cards.

The positions on data cards of items to be ignored or checked refer to positions on physical cards. That is, the number of items expected on any one physical card is kn + i, where i is the number of items to be ignored. Thus if the number of items to be presented on a conceptual card is 6 and the items are of such size that 20 may be punched on a physical card then we may punch

a)   6 + i   where i ≤ 14   (k = 1)
b)  12 + i   where i ≤ 8    (k = 2)
c)  18 + i   where i ≤ 2    (k = 3)

items on any physical data card.
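
The arithmetic of this example can be reproduced with a few lines of Python. The function name `card_layouts` is hypothetical; `n` is the number of items on a conceptual card and `capacity` the number of items that fit on a physical card.

```python
def card_layouts(n, capacity):
    """List (k, data items kn, largest number of ignored items i) per layout."""
    layouts = []
    k = 1
    while k * n <= capacity:
        layouts.append((k, k * n, capacity - k * n))
        k += 1
    return layouts
```

Calling `card_layouts(6, 20)` returns the three layouts a), b) and c) listed above.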

2 THE MULTIVARIATE ANALYSIS PROGRAM

The multivariate analysis program has been written as a general program capable of performing a number of different analyses. In its present form only two types of analyses are included but two further analyses are being added. The presentation has been arranged so that other analyses can be added easily. The program can perform principal components analysis and a number of variants of regression analysis including multiple regression and variation descriptive regressions. Polynomial regression, canonical correlation and factor analysis are being added to the program and these are described in this report. It is intended to include group analyses such as Hotelling's T² and discriminant analysis in the near future and the presentation has already been arranged to accept several groups of data.

2.1 The Data Presentation

The specification section consists of cards containing the values of parameters necessary to the input of the data, such as the number of points, the number of variables and the names of each variable. Each parameter, or set of parameters, is introduced by a word, for example, POINTS 30, VARIABLES 6 and NAMES HEIGHT WEIGHT CHEST BACK LEG WAIST. The introductory words may appear in any order and may be punched in any columns on the cards in this section, with the only restriction that the value of the parameter, or set of parameters, must immediately follow the introductory word on the same card. The CHECK and IGNORE facilities described in section 1.4 are available in this program. Equation cards (section 1.7) defining new variables or transforming existing variables may be included in the specification section.

The data section consists of one card for each point and the values of each variable are punched across the card in the order in which the variables are named in the specification section. The analyses that can be performed by the program in its present form involve only one group of data. Analyses involving more than one group are planned and the data presentation has been arranged to accept more than one group. The data section thus consists of a number of separate groups of data all prepared in the same way. Each group of data is preceded by a GROUP card consisting of the word GROUP followed by a number. This number will be used in any diagnostic comments that are produced by the program. If only one group is presented the group card may be omitted. If several groups are presented to the program in its present form each analysis requested in the instruction section is performed separately for each group of data. It is possible to include an identification number for each point which will be used as a label during output. It is also possible to present the variable means and the variance-covariance matrix instead of the raw data.

The instruction section consists of any number of instructions requesting analyses to be performed on the data or on part of the data. Words are used to request analyses so that the instructions are given in a language which is very close to normal English. Equation cards are also allowed in the instruction section.

All cards are punched according to the card preparation rules given in section 1.6.

Equation cards conform to the additional rules given in section 1.7.

2.2 An Example Presentation

The following example presentation will form the starting point for the detailed description of the presentation rules.

CUSTOMER A. N. OTHER BUILDING A.9 
TITLE EXAMPLE 4 (MULTIVARIATE PROGRAM) 
VARIABLES 6 NAMES HEIGHT WEIGHT CHEST BACK LEG AND WAIST 
POINTS 10 IGNORE 1 2 6 CHECK 1 IDENTICAL 2 ASCENDING 
HEIGHT = HEIGHT * 12.0 
DIFF = CHEST - WAIST 
41   11   6.01   175   40.4  A  24.3  32.1  30.2     |
41   14   5.73   171   36.9  B  23.1  30.8  31.9     |
...............................                      |   10 lines of data
41   32   6.24   182   38.4  B  27.4  34.0  34.6     |
COMPONENTS ANALYSIS USING ALL VARIABLES EXCEPT WEIGHT AND DIFF 
REGRESSION ANALYSIS OF WEIGHT ON HEIGHT CHEST AND LEG 
REPEAT OMITTING LEG 
POLYNOMIAL REGRESSION ORDER 3 OF WEIGHT ON CHEST 
RATIO= CHEST/BACK 
REGRESSION OF RATIO ON WAIST 
TITLE EXAMPLE 5 (MULTIVARIATE PROGRAM) 
VARIABLES 7 NAMES A B C D E F G IDENTITY 3 
POINTS 30 IGNORE 1  2 CHECK 1 IDENTICAL 
LOGA = LOG(A) 
H=B-F 
TEST (B 10.0 20.0) 
7 29.4  11  1.43  12.42  3.04  0.413 4.31  8.29  11.31    |
7 27.2  23  2.91  17.49  3.09  0.427 5.79  3.94  10.98    |
.................................                         | 30 lines of data
7 24.1  53  7.83  15.28  4.17  0.719 12.21  6.21  20.01   |
CANONICAL CORRELATION OF LOGA, B, C WITH E AND G 
REPEAT OMITTING LOGA 
REGRESSION OF A ON BEST SINGLE OMITTING H 
REGRESSION OF A ON BEST PAIR OMITTING B AND F 
FINISH

2.3 The Specification Section

The parameters that are specified in this section are introduced by the words listed below, together with a description of the information they introduce. The use of some words is optional and their omission implies a standard value (or values) for the parameter (or parameters) introduced.

Introductory Word	Information following word	Description of information introduced
GROUPS	One number.	The number of groups. Optional word; one group is assumed if this word is not presented.
POINTS	One number or K numbers, where K is the number of groups.	The number of points. If one number is given all groups are taken to have the same number of points. If K numbers are given the ith group is taken to have the number of points given as the ith number.
VARIABLES	One number.	The number of variables to be read as data.
NAMES	r words, where r is the number of variables.	The words are the names of the r variables to be read as data.
VARIANCE or COVARIANCE or both	No item following.	The presence of either, or both, of these words specifies that the variance-covariance matrix is to be presented instead of the raw data.
MEANS	No item following.	The presence of this word specifies that the variable means are to be presented with the variance-covariance matrix. This word is ignored if the raw data is presented.
CHECK	Alternate numbers and words.	A number specifies the position on each data card of the item to be checked, the following word specifies the type of check to be made. See section 1.4.
IGNORE	A list of integers.	Each number specifies the position on each data card of an item to be ignored. See section 1.4.
IDENTITY	One number.	Optional heading. If not used the successive data points are labelled 1 to n. If used the number following the word IDENTITY specifies the position on the card occupied by a point identification number that will be used to label the points in the output.

The following words may be used in any position in the specification section to make the specification more like English. They have no information content for the program and may be used more than once if required.

MATRIX   AND   VALUE

The words in the list above may be punched in any columns and any order on any number of cards. The information introduced by a word must, however, be punched to follow that word on the same card. That is, for example, the variable names introduced by the word NAMES must follow the word NAMES and the complete list must appear on the same card.
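
The word-introduced layout of the specification section lends itself to a simple scanner. The Python sketch below is an illustration under stated assumptions and not the program's own reader: items are assumed to be separated by blanks or commas, the 8-letter truncation of words is not modelled, equation cards are assumed to have been set aside beforehand, and a list of names is taken to end at the next introductory word or at the end of the card. The name `parse_specification` is hypothetical.

```python
FILLERS = {"MATRIX", "AND", "VALUE"}         # words with no information content
FLAGS = {"VARIANCE", "COVARIANCE", "MEANS"}  # words followed by no item
KEYWORDS = {"GROUPS", "POINTS", "VARIABLES", "NAMES",
            "CHECK", "IGNORE", "IDENTITY"} | FLAGS


def parse_specification(cards):
    """Collect the parameters introduced by words on specification cards."""
    spec = {"GROUPS": 1}  # standard value when the word GROUPS is omitted
    for card in cards:
        tokens = [t for t in card.replace(",", " ").split()
                  if t not in FILLERS]
        i = 0
        while i < len(tokens):
            word = tokens[i]
            if word not in KEYWORDS:
                raise ValueError(f"unexpected item {word!r}")
            # The information for a word runs up to the next introductory
            # word on the same card.
            j = i + 1
            while j < len(tokens) and tokens[j] not in KEYWORDS:
                j += 1
            values = tokens[i + 1:j]
            if word in FLAGS:
                spec[word] = True
            elif word == "NAMES":
                spec[word] = values
            elif word == "CHECK":
                spec[word] = [(int(p), kind)
                              for p, kind in zip(values[0::2], values[1::2])]
            else:
                numbers = [int(v) for v in values]
                spec[word] = (numbers if word in {"POINTS", "IGNORE"}
                              else numbers[0])
            i = j
    return spec
```

Since each parameter is stored under its introductory word, the order of the words and their distribution over cards make no difference to the result.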

Equation cards must be prepared according to the rules given in section 1.7, and they may appear before, after, or mixed with the specification cards. No additional information may appear on an equation card. That is, parameters such as POINTS 30 must not be punched on the end of an equation card.

2.4 The Data Section

All groups of data are prepared in the same way so that it is sufficient to describe the preparation of one group of data. Checks specified to be applied to the data are made on one group of data at a time. The statement CHECK 4 ASCENDING specifies that the fourth item on each card within one group is expected to form an ascending sequence and not that the fourth item on each card for all groups is expected to form an ascending sequence. IGNORE statements refer to all groups of data.

The first card of a group is a GROUP card consisting of the word GROUP followed by a number. The number is used to refer to the group in diagnostic comments and in the output of results. If only one group of data is to be presented this card may be omitted.

2.4.1 The Presentation of the Raw Data

If the raw data is to be presented, that is if neither of the words VARIANCE or COVARIANCE has been included in the specification section, the following preparation rules apply. The values of all variables for one point are prepared on one card. The number of items on each card is expected to be the same and equal to the number of variables, plus the number of items to be ignored, plus one if a point identification number is given. In case one in the example presentation given above no point identification number is given, there are six variables and three items to be ignored, so that nine items are expected on each card. The 1st, 2nd and 6th items are to be ignored so that the 3rd, 4th, 5th, 7th, 8th and 9th items are the values of the six variables presented in the order given in the NAMES statement included in the specification section. In the second case the values of the seven variables A, ..., G are punched as the 4th to the 10th items respectively, since the 1st and 2nd items are to be ignored and the third item is the point identification number. The items to be ignored may be words so that word labels could be included on the data cards if desired. A check is made on the data cards that each card contains the correct number of items and that the ith item is either a number on all cards or a word on all cards. Error diagnostics are given in both the main output stream and in the commentary output stream.
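
The item-counting rule above can be sketched as follows. This is an illustrative Python fragment, not part of the FORTRAN program; the function name `variable_positions` and its arguments are hypothetical, and it assumes that every ignored position and the identification position lie within the card.

```python
def variable_positions(n_variables, ignore=(), identity=None):
    """Return (items expected per card, 1-based positions holding variables).

    Assumption: all ignored positions and the identity position fall
    inside the card, so the card length is simply the sum of the parts.
    """
    skipped = set(ignore)
    if identity is not None:
        skipped.add(identity)
    expected_items = n_variables + len(skipped)
    positions = [p for p in range(1, expected_items + 1) if p not in skipped]
    return expected_items, positions


# Case one: six variables, items 1, 2 and 6 ignored, no identification number.
case_one = variable_positions(6, ignore=(1, 2, 6))

# Case two: seven variables, items 1 and 2 ignored, identification in position 3.
case_two = variable_positions(7, ignore=(1, 2), identity=3)

print(case_one)
print(case_two)
```

The positions are returned in card order, which is also the order in which the names follow the word NAMES.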

2.4.2 The Presentation of the Variance-Covariance Matrix and Means

In some problems it is convenient to present the variance-covariance matrix instead of the raw data. If the variance-covariance matrix is to be presented, either or both of the words VARIANCE and COVARIANCE must be included in the specification section. If the variable means are to be presented as well as the variance-covariance matrix, the word MEANS must be included in the specification section. If the variable means are not presented in this way the value zero is assumed for all variable means.

The presentation of this information in the data section is as follows. The first card is a GROUP card consisting of the word GROUP followed by the group number. This GROUP card may be omitted if only one group is presented. The second card contains the variable means if these are to be presented. The variance-covariance matrix then follows with one row of the matrix per card. If the number of variables is n the data section consists of either n or n+1 cards each containing n values required by the program.

The CHECK and IGNORE facility is available in this situation so that extra information may be included on these cards if required. The point identification number facility described above does not apply here so that the number of items on each card is n plus the number of items to be ignored.

The presentation of the variance-covariance matrix and the variable means is now illustrated by the following example cases.

             
CUSTOMER   B. E. COOPER   ATLAS COMPUTER LABORATORY     
TITLE EXAMPLE WITH V-C MATRIX AND MEANS       
VARIABLES 6   NAMES WEIGHT HEIGHT SHOULDER CHEST WAIST AND HIP   
POINTS 120   CHECK 1 ASCENDING   IGNORE 1       
VARIANCE COVARIANCE MATRIX AND MEANS       
0   138.250    67.558   16.3658   35.1950   28.0700   35.3083   
1   194.4080   18.4842   5.4817   17.6895   15.5479   17.7164   
2    18.4842    6.4296   0.5142    1.0467    0.1488    1.4945   
3     5.4817    0.5142   0.5503    0.7929    0.4071    0.5617   
4    17.6895    1.0467   0.7929    2.9892    1.9221    1.7032   
5    15.5479    0.1488   0.4071    1.9221    3.1667    1.3634   
6    17.7164    1.4945   0.5617    1.7032    1.3634    2.5421   
TITLE EXAMPLE WITH V-C MATRIX BUT NO MEANS       
VARIABLES 6   NAMES WEIGHT HEIGHT SHOULDER CHEST WAIST AND HIP   
POINTS 120   CHECK 1 ASCENDING   IGNORE 1       
1   194.4000   18.4842   5.4817   17.6895   15.5479   17.7164   
2    18.4842    6.4296   0.5142    1.0467    0.1488    1.4945   
3     5.4817    0.5142   0.5503    0.7929    0.4071    0.5617   
4    17.6895    1.0467   0.7929    2.9892    1.9221    1.7032   
5    15.5479    0.1488   0.4071    1.9221    3.1667    1.3634   
6    17.7164    1.4945   0.5617    1.7032    1.3634    2.5421
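
As a hedged illustration of how the first example deck might be read, the Python sketch below splits the data cards into a means row and a matrix, dropping the item named in the IGNORE statement, and confirms that the matrix is symmetric. The names `read_vc_block` and `is_symmetric` are hypothetical and the sketch does not reproduce the program's own input subroutines; the CHECK facility is not modelled.

```python
def read_vc_block(cards, n, ignore=(1,), with_means=True):
    """Split data cards into (means, matrix) for n variables.

    Items whose 1-based position is listed in `ignore` are dropped.
    If no means card is present the means are taken to be zero.
    """
    rows = []
    for card in cards:
        items = card.split()
        kept = [float(v) for i, v in enumerate(items, start=1) if i not in ignore]
        if len(kept) != n:
            raise ValueError(f"expected {n} values, found {len(kept)}: {card!r}")
        rows.append(kept)
    expected_cards = n + 1 if with_means else n
    if len(rows) != expected_cards:
        raise ValueError(f"expected {expected_cards} cards, found {len(rows)}")
    means = rows.pop(0) if with_means else [0.0] * n
    return means, rows


def is_symmetric(matrix, tol=1e-9):
    """True if the matrix equals its transpose within the tolerance."""
    n = len(matrix)
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol
               for i in range(n) for j in range(i))


# Data cards of the first example (means on the card labelled 0).
cards = [
    "0   138.250    67.558   16.3658   35.1950   28.0700   35.3083",
    "1   194.4080   18.4842   5.4817   17.6895   15.5479   17.7164",
    "2    18.4842    6.4296   0.5142    1.0467    0.1488    1.4945",
    "3     5.4817    0.5142   0.5503    0.7929    0.4071    0.5617",
    "4    17.6895    1.0467   0.7929    2.9892    1.9221    1.7032",
    "5    15.5479    0.1488   0.4071    1.9221    3.1667    1.3634",
    "6    17.7164    1.4945   0.5617    1.7032    1.3634    2.5421",
]
means, matrix = read_vc_block(cards, n=6)
print(means[0], matrix[0][0], is_symmetric(matrix))
```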

2.5 The Instruction Set

Each instruction is punched on a separate card, but if one card is not sufficient to contain the instruction it may be continued onto a second (or more) card by use of the continuation character $. The character $ is punched onto the end of the card (or cards) to be continued and not on the continuation card unless that is also to be continued. Instructions may be made up of introductory words, listed below, and the information that these words introduce. Examples of the use of these words in instructions are given after this list.

Introductory Word	Information following word	Description of information introduced
REGRESSION	No item following	The presence of this word selects a regression analysis of some form.
COMPONENTS	No item following	The presence of this word selects a principal components analysis of some form.
CANONICAL	No item following	The presence of this word will select a canonical correlation analysis of some form when canonical correlation is included in the program.
FACTOR	No item following	The presence of this word will select a factor analysis of some form when factor analysis is included in the program.
OF FOR	List of variable names	Either of these words introduces a list of variable names referred to later as list 1. List 1 is a list of variables for one side of a canonical correlation analysis or the single name of the dependent variable in a regression analysis.
DESCRIBE	A variable name or the word VARIATION	This word may be used to introduce the dependent variable for a regression analysis in the same way as the words OF or FOR, or it may be used to introduce the word VARIATION for a form of components analysis (see description of component analysis forms in section 2.5.1).
AT	List of percentages or proportions	This word is used with the word DESCRIBE in statements specifying either that a variable is to be described in regression terms at each of a number of significance levels (section 2.5.2) or that the total variation is to be described at these levels by the first few principal components (section 2.5.1).
ON USING WITH	List of variable names or a selection from a list of special words described after this tabulation	Any one of these words introduces a list of variable names referred to later as list 2. List 2 is a list of independent variables for a regression analysis, a second list of variables for a canonical correlation analysis or a list of variables for components or factor analysis.
REPEAT	No item following	The presence of this word causes the previous type of analysis to be performed again. Other words included with the word REPEAT specify how the new analysis differs from the previous analysis (see next words). This word may not be used on the first instruction card.
EXCEPT OMIT OMITTING	List of variable names	Any one of these words introduces a list of names of variables to be excluded from an analysis. Used together with the word REPEAT or with a statement that all variables are to be included (EXCEPT those listed).
ADDING	List of variable names	This word introduces a list of variable names to be added to the list in the previous analysis. Used together with the word REPEAT.
FIRST	One integer	This word introduces the number of components required in a components analysis. If this is not specified all components are computed.
PRINT	No item following	The presence of this word selects additional printing in one type of regression analysis. This is described below.
GRADUATE	No item following	The presence of this word selects additional printing in regression analyses and in components analysis.

The following words may be used in instructions in any position to make the instruction more like English. They have no information content for the program and may be used more than once if required.

AND       THE      ANALYSIS       LEVEL
PRINCIPAL COLUMN   CORRELATION    LEVELS
PER       CENT     PERCENT        COMPUTE

In the table given above there are pairs or triples of words that perform the same function. The use of any one of these words is allowed so that the one which fits better from an English point of view may be chosen.
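
The continuation rule and the treatment of the words that carry no information can be sketched in Python as below. This is only an illustration under stated assumptions: the names `join_continuations` and `significant_words` are hypothetical, items are assumed to be separated by blanks, and the program's own card-reading subroutines are not reproduced.

```python
# Words listed above as having no information content in instructions.
NOISE_WORDS = {
    "AND", "THE", "ANALYSIS", "LEVEL",
    "PRINCIPAL", "COLUMN", "CORRELATION", "LEVELS",
    "PER", "CENT", "PERCENT", "COMPUTE",
}


def join_continuations(cards):
    """Merge cards ending in $ with the card (or cards) that follow."""
    instructions, pending = [], []
    for card in cards:
        text = card.rstrip()
        if text.endswith("$"):
            pending.append(text[:-1].rstrip())  # drop the $ itself
        else:
            pending.append(text)
            instructions.append(" ".join(pending))
            pending = []
    if pending:
        raise ValueError("last card ends with $ but no card follows")
    return instructions


def significant_words(instruction):
    """Drop the filler words, keeping the items the program would act on."""
    return [w for w in instruction.split() if w not in NOISE_WORDS]


deck = [
    "COMPUTE FIRST 3 PRINCIPAL COMPONENTS $",
    "USING ALL VARIABLES",
    "REGRESSION OF WEIGHT ON HEIGHT SHOULDER AND WAIST",
]
for line in join_continuations(deck):
    print(significant_words(line))
```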

2.5.1 Instructions for Components Analysis

The variables to be included in a components analysis may be listed in a number of different ways. For example, the following instructions are all legal:

 
1. COMPONENTS ANALYSIS USING ALL VARIABLES 
2. COMPONENTS ANALYSIS USING ALL VARIABLES EXCEPT HEIGHT 
3. COMPONENTS ANALYSIS USING WEIGHT CHEST HIP AND WAIST

The word EXCEPT may be replaced by either of the words OMIT or OMITTING and the word USING may be replaced by either of the words ON or WITH. In the third instruction the word USING is used to introduce a list of variables to be included in the analysis, whereas in the other instructions the word USING introduces the specially recognised words ALL VARIABLES. These two words are two of the words in the special list referred to above in the description of the use of the words USING, ON and WITH. The word EXCEPT in the second instruction introduces a list of variables to be excluded from the analysis.

The components analysis subroutine will normally compute all components unless instructed to the contrary. There are two methods by which the number of components to be calculated can be reduced. The first method specifies that only a fixed number are to be computed by including the word FIRST followed by the number of components required. A complete instruction of this type is:

 
COMPUTE FIRST 3 PRINCIPAL COMPONENTS USING ALL VARIABLES.

The second method specifies that sufficient components are to be computed to describe certain percentages (or proportions) of the total variance. This is achieved by using the words DESCRIBE and AT in the following way:

DESCRIBE VARIATION AT THE 95 PERCENT LEVEL 
DESCRIBE VARIATION AT THE 95 AND 99 PERCENT LEVELS 
DESCRIBE VARIATION AT THE 0.95 LEVEL

Notice that in these instructions the word COMPONENTS is not necessary. A components analysis is selected by the presence of the first two words, DESCRIBE VARIATION.
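
A minimal sketch of the second method follows, assuming the eigenvalues (component variances) are already available. The function names are hypothetical, and the rule used to tell a percentage from a proportion (values above 1 are read as percentages) is an assumption made for the illustration, not a statement of how the program decides.

```python
def as_proportion(level):
    """Assumption: a level above 1 is a percentage, otherwise a proportion."""
    return level / 100.0 if level > 1 else level


def components_needed(eigenvalues, level):
    """Smallest number of leading components describing `level` of the variance."""
    target = as_proportion(level) * sum(eigenvalues)
    running = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= target:
            return count
    return len(ordered)


# Hypothetical eigenvalues, chosen only to illustrate the rule.
eigenvalues = [6.0, 3.0, 0.8, 0.2]
for level in (95, 99, 0.95):
    print(level, components_needed(eigenvalues, level))
```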

The inclusion of the word GRADUATE with a components analysis instruction causes the additional output of a graduation of the values for each point on each component that is calculated. This graduation is produced in numerical order for each component and the point identification number is printed with each component value. If no point identification numbers were provided points are numbered sequentially from unity in the order of presentation of the points.

A components analysis may also be selected by a REPEAT instruction repeating a previous components analysis with additional variables or with fewer variables.

Examples are:

COMPONENTS ANALYSIS USING ALL VARIABLES EXCEPT HEIGHT 
REPEAT OMITTING WEIGHT 
REPEAT ADDING HEIGHT
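
The way the variable list evolves through the three instructions above can be sketched as follows. The helper `apply_repeat` is hypothetical, and the position at which an added variable is placed (the end of the list) is an assumption.

```python
def apply_repeat(previous, adding=(), omitting=()):
    """Variable list for a REPEAT instruction, starting from the previous list."""
    current = [name for name in previous if name not in omitting]
    current += [name for name in adding if name not in current]
    return current


ALL_VARIABLES = ["WEIGHT", "HEIGHT", "SHOULDER", "CHEST", "WAIST", "HIP"]

# COMPONENTS ANALYSIS USING ALL VARIABLES EXCEPT HEIGHT
first = apply_repeat(ALL_VARIABLES, omitting=["HEIGHT"])
# REPEAT OMITTING WEIGHT
second = apply_repeat(first, omitting=["WEIGHT"])
# REPEAT ADDING HEIGHT
third = apply_repeat(second, adding=["HEIGHT"])

print(first)
print(second)
print(third)
```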

2.5.2 Instructions for Regression Analysis

The normal regression analysis instructions use the word OF to introduce the dependent variable and the word ON to introduce a list of the independent variables, for example:

 
REGRESSION OF WEIGHT ON HEIGHT 
REGRESSION OF WEIGHT ON HEIGHT SHOULDER AND WAIST

The word ON (or USING or WITH) may be used with the words ALL VARIABLES as described in the previous section and the word EXCEPT (or OMIT or OMITTING) may be used to introduce a list of variables to be excluded from the analysis.

The use of the words ALL VARIABLES implies all variables except the dependent variable. Examples are:

 
REGRESSION OF HEIGHT ON ALL VARIABLES EXCEPT WAIST 
REGRESSION OF WEIGHT ON ALL VARIABLES

The word REPEAT may be used to specify a regression analysis if it follows a regression analysis instruction and the words ADDING and EXCEPT (and OMIT and OMITTING) may be used to change the list of independent variables as described in the previous section. The word FOR may be used with the word REPEAT to change the dependent variable.

Example instructions are:

REGRESSION OF WEIGHT ON HEIGHT SHOULDER AND CHEST 
REPEAT OMITTING CHEST AND SHOULDER 
REPEAT ADDING SHOULDER 
REPEAT FOR WAIST

The last instruction card above repeats the previous analysis with the dependent variable WAIST instead of WEIGHT.

The word DESCRIBE followed by a variable name will select a regression analysis with that variable taken to be the dependent variable. The word DESCRIBE is used with the word AT in an instruction of the type

DESCRIBE WEIGHT AT THE 0.99 LEVEL 
DESCRIBE WEIGHT AT THE 95 AND 99 PERCENT LEVELS

These instructions cause a large number of analyses to be performed. Firstly WEIGHT is regressed on each of the remaining variables in turn. If any of these variables produces a sum of squares due to regression which is more than the required percentage (or percentages) of the total sum of squares then no more analyses are performed. The results for all analyses satisfying the conditions are output. If no single variable is sufficient then WEIGHT is regressed on each of the possible pairs of variables and the results for all pairs satisfying the conditions are output. If no pair of variables is sufficient the process continues taking three variables at a time, then four and so on. If n variables are found to be necessary the results for all groups of n variables that satisfy the conditions are output.
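
The search just described can be sketched in Python working directly from a variance-covariance matrix. The sketch is illustrative only: the function names are hypothetical, the matrix is assumed to be non-singular for every subset tried, and the proportion of the total sum of squares due to regression is computed as s'S⁻¹s / s_yy, where S is the covariance matrix of the chosen independent variables and s their covariances with the dependent variable. A single level is handled; the method used inside the program is not documented here.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def explained(cov, y, xs):
    """Proportion of the total sum of squares of y due to regression on xs."""
    sxx = [[cov[i][j] for j in xs] for i in xs]
    sxy = [cov[i][y] for i in xs]
    beta = solve(sxx, sxy)
    return sum(b * s for b, s in zip(beta, sxy)) / cov[y][y]


def describe(cov, y, level):
    """Smallest subset size reaching `level`, with every subset that reaches it."""
    others = [i for i in range(len(cov)) if i != y]
    for size in range(1, len(others) + 1):
        hits = [xs for xs in combinations(others, size)
                if explained(cov, y, xs) >= level]
        if hits:
            return size, hits
    return None


# Hypothetical matrix: variable 0 is the sum of variables 1 and 2; 3 is unrelated.
cov = [
    [2.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(describe(cov, 0, 0.95))
```

The number of subsets grows rapidly with the number of variables, which is why the text warns that a large number of analyses may be performed.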

Several regression analyses may be requested at one time by the instructions

REGRESSION OF WEIGHT ON ALL SINGLES PAIRS AND TRIPLES 
REGRESSION OF WEIGHT ON BEST PAIRS

The first instruction causes output of the results for the regression of WEIGHT on each variable, followed by the regression of WEIGHT on all possible pairs of variables, followed by the regression of WEIGHT on all possible triples of variables. The second instruction causes the output of the results for the regression of WEIGHT on the two variables that give the best description of WEIGHT, that is, the two variables which give the highest sum of squares due to regression. In these two instructions we see further special words that may be introduced by the word ON (or USING or WITH). The special words are BEST and ALL, and each word introduces a list of words selected from the following ten possibilities:

SINGLES PAIRS TRIPLES FOURS FIVES SIXES SEVENS 
EIGHTS NINES TENS

Thus we see that the word ON may be used to introduce:

A list of variable names
ALL VARIABLES
ALL followed by a selection from the above ten possibilities.
BEST followed by a selection from the above ten possibilities.

A WARNING should be given that the terminal S in these ten words may not be omitted even though the English sense of an instruction using one of these words may suggest this.
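
The interpretation of the special words after ON may be sketched as follows. The function `subset_request` and the table `SIZE_WORDS` are hypothetical; the sketch covers only the forms ALL or BEST followed by size words, and it rejects a size word punched without its terminal S, in line with the warning above.

```python
SIZE_WORDS = {
    "SINGLES": 1, "PAIRS": 2, "TRIPLES": 3, "FOURS": 4, "FIVES": 5,
    "SIXES": 6, "SEVENS": 7, "EIGHTS": 8, "NINES": 9, "TENS": 10,
}


def subset_request(words):
    """Turn the items following ON into (mode, list of subset sizes)."""
    mode = words[0]
    if mode not in ("ALL", "BEST"):
        raise ValueError(f"expected ALL or BEST, found {mode!r}")
    sizes = []
    for word in words[1:]:
        if word == "AND":          # filler word, carries no information
            continue
        if word not in SIZE_WORDS:  # e.g. PAIR without the terminal S
            raise ValueError(f"unrecognised size word {word!r}")
        sizes.append(SIZE_WORDS[word])
    return mode, sizes


print(subset_request("ALL SINGLES PAIRS AND TRIPLES".split()))
print(subset_request("BEST PAIRS".split()))
```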

The inclusion of the word GRADUATE with a regression analysis instruction causes the additional output of a graduation of the fitted function at the original values of the dependent variable. The graduation consists of the point identification number if provided, the observed value of the dependent variable, the fitted value predicted by the fitted function, the difference between the observed and fitted values and the standard error of the fitted value for each point.

2.5.3 Instructions for Canonical Correlation

The instructions which will select a canonical correlation analysis when this is available use the word OF to introduce the first list of variables and the word WITH to introduce the second list of variables to be used in the analysis, for example:

CANONICAL CORRELATION OF WEIGHT AND HEIGHT WITH WAIST AND HIP

The REPEAT statement may be used with the word OMITTING to select further analyses for example:

 
CANONICAL CORRELATION OF WEIGHT AND HEIGHT WITH WAIST AND HIP 
REPEAT OMITTING HIP

The word ADDING normally used with the word REPEAT to add further variables to an analysis may not be used to add variables to a canonical correlation analysis because the normal ADDING statement does not include specification of which list of variables is to be increased.

The number of correlations to be computed may be restricted in the same way as the number of components in a components analysis by using the word FIRST. For example:

COMPUTE FIRST 2 CANONICAL CORRELATIONS OF WEIGHT HEIGHT AND BACK $
WITH WAIST HIP AND SHOULDERS

Use of the word GRADUATE selects additional output of a graduation of the correlation functions for each point in their original order, for each correlation computed. Each point will be labelled with its point identification number if these have been presented.

2.5.4 Instructions for Factor Analysis

The instructions which will select a factor analysis when this is available use the word ON (or USING or WITH) to introduce the list of variables to be included in the analysis, for example:

 
FACTOR ANALYSIS ON WEIGHT HEIGHT WAIST AND HIP

The word ON may also introduce the words ALL VARIABLES and the word EXCEPT (or OMIT or OMITTING) may be used to introduce a list of variables to be excluded from the analysis, for example:

FACTOR ANALYSIS ON ALL VARIABLES EXCEPT SHOULDER

The REPEAT statement may be used with the words ADDING or OMITTING (or OMIT or EXCEPT) to select further factor analyses in the way already described in sections 2.5.1 and 2.5.2 for example:

FACTOR ANALYSIS ON ALL VARIABLES 
REPEAT OMITTING SHOULDER AND HIP 
REPEAT ADDING HIP

The number of factors to be computed may be restricted, in the same way as the number of components in a components analysis, by using the word FIRST, for example:

 
COMPUTE FIRST 3 FACTORS USING WEIGHT HEIGHT WAIST AND HIP

Use of the word GRADUATE will select additional output of a graduation of the factor loadings for each point, in their original order, for each factor computed. Each point will be labelled with its point identification number if these have been presented.

2.5.5 Instructions for Other Analysis

Instructions selecting other analyses will be added later and the words making up these instructions will be published when the analyses become available. The introduction of further instructions will not affect the presentation rules described here. The new instructions will be punched on cards in the same way as those described above and may be included at any position in the instruction section.

2.5.6 Equation Cards in the Instruction Section

Equation cards prepared according to the rules given in section 1.7 may be included in the instruction section provided that equation cards defining new variables or redefining old variables appear before all instructions using the new or redefined variable.

2.6 Output

The output obtained from the program consists initially of the variance-covariance matrix, the variable means and the correlation matrix all clearly labelled with the variable names listed in the specification section. This output is produced just before any instruction cards are obeyed and the output which follows depends on the analysis specified:

The output for a regression analysis consists of:

The regression equation.
Multiple correlation coefficient.
Analysis of variance.
Graduation of the fitted function at the original data points if requested.

The output for a components analysis consists of:

Eigen values or variances of components.
Proportion of total variance separated by components.
Eigen value check matrix checking the accuracy of computation.
Eigen vectors or component coefficients.
Component coefficients corrected to original observation scale. This introduces a constant term.
A graduation of the components in numerical order if requested.

The output for a canonical correlation analysis will consist of:

Canonical correlation coefficients.
Coefficients of canonical variates.
A graduation of the canonical variates for each point in numerical order of the first variate values if requested.

All output is clearly labelled and the variable names are used to identify quantities associated with variables. It is believed, therefore, that detailed description of the output is unnecessary.

3 THE REGRESSION-WITHIN-GROUPS PROGRAM

The regression within groups program fits straight lines to each of a number of groups of data and tests for consistency between the fitted lines. Only one analysis is possible so that there is no instruction section in the data presentation. Only two variables are allowed but the dependent variable may be replicated evenly or unevenly. Since replication is allowed the number of observations in the two variables may be different so that the use of equation cards cannot be allowed in the program until the equation interpretive subroutine is rewritten. It is possible however to transform variables by selecting a transformation from a list of eight permitted transformations. It is an easy task to include further transformations if the present list is inadequate.

3.1 The Data Presentation

The specification section consists of cards containing the values of parameters necessary to the input of the data such as the number of points in each group of data, the names of the two variables, the number of replicates in each group of data, and the variable transformations required. Each parameter, or set of parameters, is introduced by a word, for example POINTS 10 20 30, or NAMES DOSE SFRACT. These words may appear in any order and may be punched in any columns on the cards in this section with the only restriction that the value of the parameter, or set of parameters, must immediately follow the introductory word on the same card. The CHECK and IGNORE facility described in section 1.3 is available in this program.

The data section can sometimes be prepared in two ways. The first arrangement is the normal and covers all situations. The second arrangement is a more compact form of presentation which is available if there is no replication and if the values of the independent variable are the same in all groups. The compact form can also be used if there is no replication and if the values of the independent variable differ from group to group in only one or two points since a value may be declared missing by the punching of the letter M instead of a genuine data value. The form of data presentation required is specified by the presence or absence from the specification section of the word COMPACT. The normal presentation consists of several groups of data prepared in the same way each introduced by a GROUP card. The compact presentation consists of the data for all groups punched as one block of data.

All cards in the data presentation are punched according to the card preparation rules given in section 1.6.

3.2 An Example Presentation

The following example presentation will form the starting point for the detailed description of the presentation rules.

CUSTOMER B. E. COOPER ATLAS COMPUTER LABORATORY 
TITLE EXAMPLE 1 NORMAL DATA SECTION 
POINTS 4 3 GROUPS 2 YTRANS 1 REPLICATES 4 2 
NAMES DOSE SFRACT DVALUE 
GROUP 1 
X₁₁   Y₁₁₁   Y₁₁₂   Y₁₁₃   Y₁₁₄   |
X₁₂   Y₁₂₁   Y₁₂₂   Y₁₂₃   Y₁₂₄   | Group 1 containing
X₁₃   Y₁₃₁   Y₁₃₂   Y₁₃₃   Y₁₃₄   | 4 points and 4 replicates
X₁₄   Y₁₄₁   Y₁₄₂   Y₁₄₃   Y₁₄₄   |
GROUP 2
X₂₁   Y₂₁₁   Y₂₁₂   |
X₂₂   Y₂₂₁   Y₂₂₂   | Group 2 containing
X₂₃   Y₂₃₁   Y₂₃₂   | 3 points and 2 replicates
TITLE EXAMPLE 2 COMPACT DATA SECTION 
GROUPS 3 POINTS 5 YTRANS 1 NAMES DOSE SFRACT 
DVALUE   COMPACT DATA     
X₁   Y₁₁₁   Y₂₁₁   Y₃₁₁   |
X₂   Y₁₂₁   Y₂₂₁   Y₃₂₁   | Same 4 x values for
X₃   Y₁₃₁   Y₂₃₁   Y₃₃₁   | all 3 groups and 4 y values
X₄   Y₁₄₁   Y₂₄₁   Y₃₄₁   | for each of the 3 groups
FINISH

3.3 The Specification Section

The parameters that are specified in this section are introduced by words as described in the following list. The use of some of these words is optional and their omission implies a standard value(s) for the parameter(s).

Introductory Word	Information following word	Description of information introduced
GROUPS	One number	This word must be presented. Introduces the number of groups.
POINTS	One number, or a list of numbers	This word must be presented. If one number is presented this implies that all groups have this number of points. If a list of numbers is given there must be one number for each group.
REPLICATES REPS	One number, or a list of numbers, or the word UNEVEN	If this word (or heading REPEATS below) is not presented it is assumed that there is no replication. If one number is presented it is assumed that all points in all groups have this number of replicates. If a list of numbers is presented there must be one number for each group and all points in the ith group are assumed to have the number of replicates given by the ith number in the list. If the word UNEVEN is presented it is assumed that the number of replicates varies from point to point.
REPEATS	One number, or a list of numbers, or the word UNEVEN	This word introduces the same information as the word REPLICATES but the subsequent analysis differs (see section 1.8) according to which word was used.
NAMES	Two variable names	This word introduces the names of the independent and dependent variables. If this heading is not used the names XVALUE and YVALUE are used respectively.
XTRANS	One number	Optional word used to introduce the number of the transformation to be applied to the independent variable. (See section 3.3.1 for transformation list). No transformation is assumed if this heading is omitted.
YTRANS	One number	Optional word used to introduce the number of the transformation to be applied to the dependent variable. (See section 3.3.1 for transformation list). No transformation is assumed if this heading is omitted.
COMPACT	No items following	The presence of this word selects the compact form of data presentation.
CHECK	Alternate numbers and words	A number specifies the position on the data cards of an item to be checked, the following word specifies the check to be made.
IGNORE	A list of numbers	Each number specifies the position on the data cards of an item to be ignored. See section 1.3.
PROBABILITY	One number	Optional word introducing the significance level to be taken in tests of significance. The value 0.05 is assumed if this heading is not used.
LIMITS	One number	Optional word introducing the confidence probability to be used in the computation of confidence limits. The value 0.95 is assumed if this heading is not used. This word anticipates a feature to be included later. In its present form the program can only supply 95% confidence limits.
DVALUE or D	No items following	The presence of this word selects additional print out of the D-value and its confidence limits. The D-value is the reciprocal of the slope.
GRADUATE	No items following	The presence of this word selects the additional print out of a graduation of each of the fitted lines at the original data points together with residuals, standard errors and confidence limits.
RESIDUALS	No items following	The presence of this word selects the additional print out of a graduation of the fitted combined line at the data points for each group.

The following words may be used in any position in the specification section to make the specification more like English. They have no information content for the program and may be used more than once if required.

VALUE   DATA   AND

The words in the list above may be punched in any columns and in any order on any number of cards. The information introduced by a word must, however, be punched to follow that word on the same card. That is, for example the variable names introduced by the word NAMES must follow the word NAMES on the same card.
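
The rule that POINTS and REPLICATES accept either one number for all groups or one number per group can be sketched as below. The names `per_group` and `card_layout` are hypothetical, the word UNEVEN is not handled, and "no replication" is represented as one dependent value per point.

```python
def per_group(values, groups, word):
    """Expand the numbers following POINTS or REPLICATES to one per group."""
    if len(values) == 1:
        return list(values) * groups
    if len(values) != groups:
        raise ValueError(
            f"{word}: expected 1 or {groups} numbers, found {len(values)}")
    return list(values)


def card_layout(groups, points, replicates=(1,)):
    """For each group: (number of points, data items per point).

    Each point carries one independent value followed by its replicates.
    """
    pts = per_group(points, groups, "POINTS")
    reps = per_group(replicates, groups, "REPLICATES")
    return [(p, 1 + r) for p, r in zip(pts, reps)]


# GROUPS 2  POINTS 4 3  REPLICATES 4 2, as in example 1 of section 3.2.
print(card_layout(2, [4, 3], [4, 2]))
```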

3.3.1 The Transformation List

The words XTRANS and YTRANS introduce the number of the transformation to be applied to the independent and dependent variables respectively. There are eight transformations, apart from no transformation, included in subroutine TRAN2 which performs these operations. These are selected by the numbers 0 to 8 as follows:

0	No transformation
1	Log₁₀(x)	Log
2	x^½	Square Root
3	ARCSIN(x^½)	Angular transformations for proportions
4	ARCSIN((x/100)^½)	Angular transformations for percentages
5	1/x	Reciprocal
6	x^⅓	Cube root
7	Sin(x)	Sine
8	e^x	Exponential

Additions are easily made to subroutine TRAN2 to allow further transformations selected by the numbers 9, 10 etc. Comments are produced on the output that a variable has been transformed and these are produced by a subroutine OUTRAN so that if additions are made to subroutine TRAN2 corresponding additions must be made to subroutine OUTRAN. There are tests in both these subroutines that the transformation number does not exceed 8. These tests must be updated if further transformations are added.
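
For readers who wish to check transformed values by hand, the list above may be sketched in Python as follows. This is not a transcription of subroutine TRAN2; the function name `transform` is hypothetical, and angles are assumed to be in radians, which the text does not state.

```python
import math


def transform(number, x):
    """Apply transformation `number` (0 to 8) from the list above to x."""
    if number == 0:
        return x                                  # no transformation
    if number == 1:
        return math.log10(x)                      # log to base 10
    if number == 2:
        return math.sqrt(x)                       # square root
    if number == 3:
        return math.asin(math.sqrt(x))            # angular, proportions
    if number == 4:
        return math.asin(math.sqrt(x / 100.0))    # angular, percentages
    if number == 5:
        return 1.0 / x                            # reciprocal
    if number == 6:
        return math.copysign(abs(x) ** (1.0 / 3.0), x)  # cube root
    if number == 7:
        return math.sin(x)                        # sine
    if number == 8:
        return math.exp(x)                        # exponential
    raise ValueError(f"transformation number {number} exceeds 8")


print(transform(1, 100.0), transform(2, 9.0), transform(5, 4.0))
```

The final test on the transformation number mirrors the tests the text says are present in TRAN2 and OUTRAN, and would likewise need updating if further transformations were added.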

3.4 The Data Section

If the word COMPACT is included in the specification section the compact form of data presentation is selected; otherwise the normal form of data presentation is assumed.

3.4.1 The Normal Data Section (Even replication)

The normal data section consists of several groups of data each introduced by a GROUP card consisting of the word GROUP followed by a number which will be used to identify that group. Each card normally contains the data for one value of the independent variable followed by all replicate (or repeat) values of the dependent variable. It is, however, acceptable to punch on to one physical card the observations for more than one point, so that each of the following arrangements is acceptable.

Arrangement 1 
X₁₁   Y₁₁₁   Y₁₁₂   Y₁₁₃   Y₁₁₄   
X₁₂   Y₁₂₁   Y₁₂₂   Y₁₂₃   Y₁₂₄  
X₁₃   Y₁₃₁   Y₁₃₂   Y₁₃₃   Y₁₃₄   
X₁₄   Y₁₄₁   Y₁₄₂   Y₁₄₃   Y₁₄₄   

Arrangement 2 
X₁₁   Y₁₁₁   Y₁₁₂   Y₁₁₃   Y₁₁₄  X₁₂   Y₁₂₁   Y₁₂₂   Y₁₂₃   Y₁₂₄  
X₁₃   Y₁₃₁   Y₁₃₂   Y₁₃₃   Y₁₃₄  X₁₄   Y₁₄₁   Y₁₄₂   Y₁₄₃   Y₁₄₄   

Arrangement 3 
X₁₁   Y₁₁₁   Y₁₁₂   Y₁₁₃   Y₁₁₄  X₁₂   Y₁₂₁   Y₁₂₂   Y₁₂₃   Y₁₂₄  
X₁₃   Y₁₃₁   Y₁₃₂   Y₁₃₃   Y₁₃₄   
X₁₄   Y₁₄₁   Y₁₄₂   Y₁₄₃   Y₁₄₄

Arrangement 3 above is acceptable but is not advised if the CHECK and IGNORE facilities are to be used. CHECK and IGNORE positions apply to positions on the physical card and position 10 for example is defined on card 1 but not on card 2. The statement IGNORE 10 causes the tenth item to be ignored if there is a tenth item so that if an item near the end of the card is to be ignored care must be exercised to ensure that it is in the same position (tenth) on each card. Checks to be made on the data apply to one block at a time. The statement CHECK 4 ASCENDING specifies that the fourth item on each card within one group is expected to form an ascending sequence and not that the fourth item on each card for all blocks is expected to form an ascending sequence. IGNORE statements refer to all groups of data.
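
Because the number of replicates is fixed in the even case, the items on a card can be divided into points whatever the arrangement. A minimal Python sketch follows; the function name and the numerical values are hypothetical, and the CHECK and IGNORE facilities are not modelled.

```python
def points_from_cards(cards, replicates):
    """Split each card into (x, [y values]) points of fixed width."""
    width = 1 + replicates  # one independent value plus its replicates
    points = []
    for card in cards:
        items = [float(v) for v in card.split()]
        if not items or len(items) % width != 0:
            raise ValueError(f"card does not hold whole points: {card!r}")
        for start in range(0, len(items), width):
            x = items[start]
            ys = items[start + 1:start + width]
            points.append((x, ys))
    return points


# Laid out like arrangement 3: two points on the first card, then one per card.
cards = [
    "1.0 0.91 0.89  2.0 0.75 0.78",
    "3.0 0.60 0.58",
    "4.0 0.41 0.44",
]
for point in points_from_cards(cards, replicates=2):
    print(point)
```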

3.4.2 The Normal Data Section (Uneven replication)

Uneven replication is specified by the statement REPLICATES UNEVEN included in the specification section. The presentation of unevenly replicated points is similar to the presentation in the even case, but data for only one point may be included on one physical card because the number of replicates for each point is computed from the number of items on each card. Each card thus consists of the value of the independent variable followed by all the corresponding dependent variable values. The following example will illustrate this arrangement.

Arrangement 1 
X₁₁   Y₁₁₁   Y₁₁₂   Y₁₁₃   Y₁₁₄   
X₁₂   Y₁₂₁    
X₁₃   Y₁₃₁   Y₁₃₂   Y₁₃₃    
X₁₄   Y₁₄₁   Y₁₄₂

The CHECK and IGNORE facilities are available for use with this form of data presentation and CHECK and IGNORE positions apply to positions on the physical card. Position 4 for example is defined on card 1 but not on card 2. The CHECK and IGNORE statements cause items to be checked and ignored if they exist on the card so that if an item near the end of the card is to be ignored care must be exercised to ensure that the item is in the same position (e.g. 4th) on each card. Items punched after the data in this form of presentation cannot be ignored since they would occupy different positions on each card. CHECK statements apply to one block of data at a time.
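The rule that the number of replicates is computed from the number of items on the card can be sketched as follows. This is an illustrative Python fragment, not the program's code; the name read_uneven_group is hypothetical and cards are assumed to be strings of items separated by spaces, with no ignored items.

```python
def read_uneven_group(cards):
    """Each card holds one X value followed by however many Y values were taken."""
    points = []
    for card in cards:
        items = [float(field) for field in card.split()]
        if len(items) < 2:
            raise ValueError("a card needs an X value and at least one Y value")
        # The replicate count for this point is simply len(items) - 1.
        points.append((items[0], items[1:]))
    return points
```

Because the count is taken per card, a second point punched on the same card would be read as further replicates of the first, which is why only one point per card is allowed in this form.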

3.4.3 The Compact Data Section

The compact form of presentation is available when there is no replication of the dependent variable and when the values of the independent variable are the same in all groups or when the sets of values of the independent variable differ in only one or two points from group to group. In the compact form there is only one block of cards presented and all groups are included in this block. The group card normally introducing a group of data may be omitted if only one group is to be presented. Each card contains firstly the value of the independent variable followed by the values of the dependent variable one from each group. If no measurement was made of the dependent variable in one of the groups the letter M may be punched on the card in the position that the data value would have occupied. The letter M is taken to denote a missing value. The data presentation is then:

X₁   Y₁₁₁   Y₂₁₁   Y₃₁₁  
X₂   Y₁₂₁   Y₂₂₁   Y₃₂₁   
X₃   Y₁₃₁   Y₂₃₁   Y₃₃₁    
X₄   Y₁₄₁   Y₂₄₁   Y₃₄₁    
X₅   Y₁₅₁   Y₂₅₁   Y₃₅₁

This presentation looks to be the same as the even version of the normal presentation. It is worth pausing here to consider the difference between the two presentations. The first column in both presentations contains the list of values of the independent variable. The remaining columns in the normal presentation contain the replicated values for one group whereas in the compact presentation they contain the single values for each of the several groups.
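A minimal sketch of how a compact block might be separated into its groups is given below. It is illustrative Python only; the name read_compact is hypothetical, and it assumes that every card carries one item per group and that the letter M is the only non-numeric item.

```python
def read_compact(cards, n_groups):
    """Return one list of (x, y) points per group from a compact block."""
    groups = [[] for _ in range(n_groups)]
    for card in cards:
        fields = card.split()
        if len(fields) != n_groups + 1:
            raise ValueError("expected one X value and one item per group")
        x = float(fields[0])
        for g, field in enumerate(fields[1:]):
            if field == "M":  # missing value: this group has no point at x
                continue
            groups[g].append((x, float(field)))
    return groups
```

The column position of an item decides the group it belongs to, whereas in the normal presentation every column after the first belongs to the same group.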

3.5 The Analysis of Variance Performed

A straight line is fitted, by the method of least squares, to the data for each group. An analysis of variance on the adequacy of each line is performed and its form depends on the type of data. If the data is replicated the analysis of variance table contains three sources of variation namely:

1. Due to slope.
2. Variation of point means about the fitted straight line.
3. Variation in the replicated measurements about the point means.

If source 1 is significantly greater than source 3 the slope is significantly different from zero. If source 2 is significantly greater than source 3 the straight line is regarded as being an inadequate description of the data.

If the data is not replicated the third source of variation listed above is not available. The second source becomes the residual and we lose the test of adequacy of fit.

A third situation occurs when the data is repeated rather than replicated. A discussion of the difference between repeats and replicates is given in section 1.8. The program recognises this third situation if the word REPEAT is used instead of the word REPLICATES (or REPS) in the specification section. The action taken by the program consists of deleting the third source of variation from the analysis of variance so that this becomes the same as if only one value of the dependent variable had been collected for each value of the independent variable.
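The three sources of variation for one group of replicated data can be sketched numerically as follows. This is an illustrative Python calculation using the usual least squares formulae, not a transcription of the program; the name line_anova is hypothetical.

```python
def line_anova(points):
    """points is a list of (x, [y1, y2, ...]) pairs for one group.

    Returns the fitted slope and a list of (source, sum of squares, d.f.).
    """
    n = sum(len(ys) for _, ys in points)  # total number of observations
    k = len(points)                       # number of distinct X values
    x_mean = sum(x * len(ys) for x, ys in points) / n
    y_mean = sum(sum(ys) for _, ys in points) / n
    sxx = sum(len(ys) * (x - x_mean) ** 2 for x, ys in points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, ys in points for y in ys)
    slope = sxy / sxx
    ss_slope = slope * sxy
    # Variation between point means, of which the slope explains ss_slope.
    ss_between = sum(len(ys) * (sum(ys) / len(ys) - y_mean) ** 2
                     for _, ys in points)
    ss_about_line = ss_between - ss_slope
    ss_within = sum((y - sum(ys) / len(ys)) ** 2
                    for _, ys in points for y in ys)
    table = [("slope", ss_slope, 1),
             ("point means about line", ss_about_line, k - 2),
             ("replicates about point means", ss_within, n - k)]
    return slope, table
```

Dividing each sum of squares by its degrees of freedom gives the mean squares that are compared in the tests described above. For unreplicated or repeated data the third line would be dropped and the second used as the residual.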

After all groups have been analysed a combined analysis is performed in which the sources of variability are:

1. Due to overall slope.
2. Difference between group slopes.
3. Difference between group means.
4. Variation of point means about the individual lines.
5. Variation in the replicated measurements about the point means.

The same comments made about replicated, not replicated, and repeated data apply to the combined analysis of variance. The fifth source of variability is present only if the measurements are replicated; it is deleted if the measurements are repeated. If the fifth source is absent the fourth source must be used as the residual. The following table gives the interpretation to be made of significant sources of variation:

Source  Interpretation 
1.      Slope of combined line is not zero.
2.      Slopes of individual group lines are different.
3.      Means of groups are different.
4.      The straight lines are not adequate descriptions of the data.

3.6 Output

The output for each group of data contains:

Equation of the fitted line.
Variance of slope and constant.
Means of both variables and the number of points.
The analysis of variance already described in section 3.5.
The D-value and confidence limits if requested (DVALUE).
A graduation of the fitted line at the original values of the independent variable including observed value, fitted value, residual, standard error, and two or three sets of confidence limits. The first set applies to the fitted value, the second to observations and the third to the means at each point. The third set of confidence limits is suppressed if the data is unreplicated.

The output for the combined analysis contains:

The combined analysis of variance described in section 3.5.
Equation of the fitted line.
Variance of the slope and constant.
Means of both variables and the number of points.
The D-value and confidence limits if requested (DVALUE).
A graduation of the fitted line at the data points for the first group.
A graduation of the fitted line at all the original data points if requested (RESIDUALS).

It is believed that the output is sufficiently clearly labelled for a detailed description of the output to be unnecessary.

4 THE COMPLETE FACTORIAL EXPERIMENT PROGRAM

This program performs the usual fixed-effect model analysis of variance of a complete factorial experiment. Several variables may be presented and equation cards may be used to redefine variables or to define new variables. Any number of analyses of subsets of the data, such as an analysis omitting certain factor levels or all but one level of a factor, may be selected. All analyses are univariate even though several variables may be presented. Several variables are allowed so that new variables may be defined as functions of more than one measured variable.

4.1 The Data Presentation

The specification section consists of cards containing the values of parameters necessary to the input of the data such as the number of replicates, the number of variables and the variable names. Each parameter, or set of parameters, is introduced by a word, for example REPLICATES 2 or VARIABLES 3. These words may appear in any order and may be punched in any columns on the card in this section with the only restriction that the value of the parameter, or set of parameters, must immediately follow the introductory word on the same card. The CHECK and IGNORE facilities described in section 1.3 are available in this program and equation cards consistent with the rules given in section 1.7 may be included.

The data section may be prepared in two different arrangements. In the first arrangement the data for all variables is presented together as one block of data. In the second arrangement the data for each variable is presented separately as distinct blocks of data. In both arrangements the data is assumed to be presented in a standard order unless it is specified in the specification section that the factor levels are punched on the data cards with the observations.

The instruction section consists of any number of instructions requesting analyses to be performed on the data or on parts of the data. Words are used to request analyses so that the instructions are given in a language which is very close to normal English. Equation cards are also allowed in the instruction section.

All cards are punched according to the card preparation rules given in section 1.6. Equation cards conform to the additional rules given in section 1.7.

4.2 An Example Presentation

The following example presentation will form the starting point for the detailed description of the presentation rules.

CUSTOMER B. E. COOPER ATLAS COMPUTER LABORATORY 
TITLE EXAMPLE 1 COMPACT PRESENTATION IN STANDARD ORDER 
DIMENSIONS 6 DOSES 4 CHEMICALS 2 EXPERIMENTS 
VARIABLES 3 NAMES A B C CHECK 4 ASCENDING IGNORE 4 
38.2  29.31  147.1   1     |
34.2  28.39  158.9   2     |  48 Data Cards presented in
39.8  21.37  172.1   3     |  standard order
......................     |
29.2  21.37  138.2   48    |
D=A/B
ANALYSE EACH VARIABLE EXCEPT B OMITTING CHEMICAL 3 
ANALYSE A AND B FOR EACH EXPERIMENT 
ANALYSE A FOR CHEMICALS 1 AND 2 AND DOSES 3 AND 4 
TITLE EXAMPLE 2 PRESENTATION BY VARIABLES IN RANDOM ORDER 
DIMENSIONS 3 DOSES 2 CHEMICALS REPLICATES 4 
VARIABLES 2 NAMES A AND B INTEGERS 1 AND 2 
DATA A 
1   1  28.3  28.7  29.4  29.1   |  Data for variable A
2   1  26.8  27.3  27.3  28.1   |  Factor levels specified but
3   1  24.8  26.1  25.2  25.3   |  standard order actually presented.


1   2  30.1  30.8  31.1  29.6   |
2   2  27.4  28.8  28.2  28.7   |
3   2  25.3  27.1  26.2  26.3   |

DATA B
1   2  30.7  31.2  29.8  30.4   |
3   2  25.8  26.1  26.2  27.9   |  Data for variable B
1   1  28.8  28.4  29.3  29.0   |  Factor levels specified and
3   1  26.0  24.3  25.0  25.5   |  a non-standard order
2   2  28.1  27.2  28.3  28.9   |  presented
2   1  26.8  28.3  27.5  27.8   |
ANALYSE EACH VARIABLE         
FINISH

4.3 Specification Section

Introductory Word	Information following word	Description of information introduced
DIMENSIONS	Alternate integers and words	There must be the same number of integers as words. The number of integers gives the number of factors. The integers give the number of factor levels and the words the factor names. For example the DIMENSIONS statement in example 1 above specifies that there are three factors DOSES, CHEMICALS and EXPERIMENTS and that these have 6, 4 and 2 levels respectively.
REPLICATES or REPS	One integer	Either of these words introduces the number of replicates. If neither word is used the standard value of one replicate is assumed. See REPEATS below and section 1.8.
REPEATS	One integer	This word introduces the number of repeats and is used if appropriate instead of the word REPLICATES. See REPLICATES above and section 1.8.
NAMES	A list of words	This word introduces the list of variable names. The number of names is taken as the number of variables. An alternative method of naming the variables is described in section 4.4.3.
VARIABLES	One integer	This word introduces the number of variables and is used if the NAMES statement is not used. See section 4.4.3.
CHECK	Alternate integers and words	Each integer specifies the position on the data cards of an item to be checked, the following word specifies the check to be made. See section 1.3.
IGNORE	A list of integers	Each integer specifies the position on the data cards of an item to be ignored. See section 1.3.
INTEGERS	A list of numbers	The use of this word specifies that the levels of each factor are recorded on each data card and the ith integer in the list of integers specifies the position on the card of the level for the ith factor. There must be, therefore, one integer for each factor. For example, the INTEGERS statement in example 2 above specifies that the levels of the two factors are recorded as the first and second items respectively on each data card. Omission of this word implies that the data is presented in standard order as in example 1.
COMPACT	No item	The presence of this word selects the compact form of presentation (sections 4.4.1 and 4.4.2) in which all variables are prepared together in the same block of data as in example 1. The absence of this word selects the non-compact form of presentation (sections 4.4.3 and 4.4.4) in which each variable is presented as a separate block of data.


The introductory words in the above list may be punched in any columns and in any order on any number of cards. The information introduced by a word must, however, be punched to follow that word on the same card. That is, for example the variable names introduced by the word NAMES must follow the word NAMES on the same card.

Equation cards prepared according to the rules given in section 1.7 may appear either before or after, or even mixed with the specification cards. No additional information may appear on an equation card. That is parameters such as VARIABLES 3 must not be punched on the end of an equation card.
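The rule that each introductory word is followed on the same card by the information it introduces can be sketched as below. This is illustrative Python only, not the program's input package; the names KEYWORDS, parse_specification and factors are hypothetical, equation cards are not handled, and the word AND is assumed to be ignorable because it appears in the specification cards of example 2.

```python
KEYWORDS = {"DIMENSIONS", "REPLICATES", "REPS", "REPEATS", "NAMES",
            "VARIABLES", "CHECK", "IGNORE", "INTEGERS", "COMPACT"}


def parse_specification(cards):
    """Collect the items following each introductory word, card by card."""
    spec = {}
    for card in cards:
        current = None  # information may not run on from the previous card
        for token in card.split():
            if token in KEYWORDS:
                current = token
                spec.setdefault(current, [])
            elif token == "AND":
                continue
            elif current is None:
                raise ValueError("item with no introductory word: " + token)
            else:
                spec[current].append(token)
    return spec


def factors(spec):
    """Pair up the alternate integers and words of a DIMENSIONS statement."""
    items = spec["DIMENSIONS"]
    return [(items[i + 1], int(items[i])) for i in range(0, len(items), 2)]
```

Applied to the two specification cards of example 2, this sketch yields the factors DOSES with 3 levels and CHEMICALS with 2 levels, the names A and B, and the integer positions 1 and 2.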

4.4 The Data Section

Two arrangements of the data are allowed. The presence of the word COMPACT in the specification section selects the compact presentation in which all variables are presented together as one block of data. The absence of the word COMPACT selects the non-compact presentation in which each variable is presented as a separate block of data. If items expected by the program to be on one card cannot be accommodated on one card the continuation character $ may be used to continue items on to the next card (or cards). The character $ is punched at the end of a card to be continued (see section 1.9).
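The effect of the continuation character can be sketched as follows. The full rules are those of section 1.9; this illustrative Python fragment assumes only that $ is the last non-blank character of a card to be continued, and the name join_continued is hypothetical.

```python
def join_continued(cards):
    """Merge physical cards ending in $ with the card (or cards) that follow."""
    logical, pending = [], []
    for card in cards:
        text = card.rstrip()
        if text.endswith("$"):
            pending.append(text[:-1].rstrip())  # drop the $ and keep reading
        else:
            pending.append(text)
            logical.append(" ".join(pending))
            pending = []
    if pending:
        raise ValueError("last card ends with a continuation character")
    return logical
```

After joining, the items are treated as if they had all been punched on one card, so CHECK and IGNORE positions would count across the joined cards under this assumption.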

4.4.1 Compact Data Section (First Arrangement)

Each card normally contains all observations for all variables for one combination of factor levels. The replicate (or repeat) observations for the first variable are punched first, followed by the observations for the second variable, followed by those for the third variable, and so on. It is possible to include observations for all variables for more than one combination of factor levels and this arrangement is described in the next section. The separate factor level combinations (separate cards) may be presented in random order if integers specifying the factor levels are included on the cards. The INTEGERS statement is used in the specification section to declare the positions on each data card occupied by these factor levels. If the factor levels are not specified on each card the factor level combinations (cards) must be presented in standard order. In standard order the most rapidly changing factor is the first factor. That is, the first factor passes through its levels first, followed by the second, and so on. Standard order for example 1 in the example presentation given above is as follows:

Card	Levels
Card	Doses	Chemicals	Experiments
1	1	1	1
2	2	1	1
3	3	1	1
4	4	1	1
5	5	1	1
6	6	1	1
7	1	2	1
8	2	2	1
9	3	2	1
10	4	2	1
11	5	2	1
12	6	2	1
13	1	3	1
...	...	...	...
24	6	4	1
25	1	1	2
...	...	...	...
36	6	2	2
37	1	3	2
38	2	3	2
39	3	3	2
40	4	3	2
41	5	3	2
42	6	3	2
43	1	4	2
44	2	4	2
45	3	4	2
46	4	4	2
47	5	4	2
48	6	4	2
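Standard order can be generated mechanically, as the following sketch shows. It is illustrative Python, not part of the program; the name standard_order is hypothetical. The standard library function itertools.product varies its last argument fastest, so the factors are supplied in reverse and each combination is reversed again.

```python
from itertools import product


def standard_order(levels):
    """List the factor level combinations with the first factor changing fastest.

    levels gives the number of levels of each factor in DIMENSIONS order.
    """
    reversed_ranges = [range(1, n + 1) for n in reversed(levels)]
    return [tuple(reversed(combo)) for combo in product(*reversed_ranges)]
```

For the dimensions 6, 4 and 2 of example 1, standard_order([6, 4, 2]) has 48 entries; the first is (1, 1, 1), the 25th is (1, 1, 2) and the last is (6, 4, 2).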

The CHECK and IGNORE facility as described in section 1.3 is available for this section. It is not necessary to include the factor level integer positions in an IGNORE statement. The statement INTEGERS 4 5 6 implies also IGNORE 4 5 6.

4.4.2 Compact Data Section (Second Arrangement)

It is possible within the COMPACT data section to include observations for all variables for more than one factor level combination on one card. The combinations included on one card must be consecutive in the standard order and be punched in the correct order. The cards may be presented in random order if the factor levels for the first combination on the card are included; otherwise standard order must be used. If the factor levels for the second and subsequent combinations presented on one card are punched, an IGNORE statement must be given to ignore these integers. A possible presentation for the first example given above and including the factor levels would be:

TITLE EXAMPLE 1 MORE COMPACT PRESENTATION WITH FACTOR LEVELS
DIMENSIONS 6 DOSES 4 CHEMICALS 2 EXPERIMENTS
VARIABLES 3   NAMES   A   B   C   INTEGERS   1   2   3   IGNORE 10
1   1   1   38.2   29.31     147.1     34.2   28.39   158.9   1
3   1   1   39.8   28.27     172.1     41.2   27.88   151.3   3     
5   1   1   40.8   29.37     151.2     37.2   20.97   146.2   5     
1   2   1   41.2   28.49     162.3     30.4   29.13   167.1   7     
---------------------------------------------------------------
5   4   2   37.3   27.14     163.9     39.2   26.37   158.2  47

Although the factor levels have been included in the above example the cards themselves have been placed in standard order.

The CHECK and IGNORE facility as described in section 1.3 is available for this section. As in the first arrangement it is not necessary to include the factor level integer positions for the first combination on each card in an IGNORE statement.

4.4.3 Non-Compact Data Section (First Arrangement)

Each variable is presented as a separate block of data and is preceded by a card containing the word DATA followed by the variable name. The NAMES statement may be omitted from the specification if the data is presented in the non-compact form. The data is prepared with all replicate (or repeat) observations for one factor level combination on one card. It is possible to include the data for more than one factor level combination on one card as described in the next section. The separate factor level combinations (separate cards) may be presented in random order if integers specifying the factor levels are included on the cards. The INTEGERS statement is used in the specification section to declare the positions on each data card occupied by these factor levels. If the factor levels are not specified on each card the factor level combinations (cards) must be presented in standard order. Standard order is described with an example in section 4.4.1.

The CHECK and IGNORE facility as described in section 1.3 is available for this section. It is not necessary to include the factor level integer position in an IGNORE statement. The statement INTEGERS 4 5 6 implies also IGNORE 4 5 6.

4.4.4 Non-Compact Data Section (Second Arrangement)

It is possible within the non-compact data section to include observations for more than one factor level combination on one card. The combinations included on one card must be consecutive in the standard order and be punched in the correct order. The cards may be presented in random order if the factor levels for the first combination on the card are included; otherwise standard order must be used. If the factor levels for the second and subsequent combinations presented on one card are punched, an IGNORE statement must be given to ignore these integers. A possible presentation for the second example given above, without factor levels, would be:

TITLE EXAMPLE 2 PRESENTATION BY VARIABLES IN STANDARD ORDER 
DIMENSIONS 3 DOSES 2 CHEMICALS REPLICATES 4 
VARIABLES 2 CHECK 1 ASCENDING  IGNORE 1 AND 2 
DATA A 
1 A 28.3 28.7 29.4 29.1 26.8 27.3 27.3 28.1 24.8 26.1 25.2 25.3 
2 A 30.1 30.8 31.1 29.6 27.4 28.8 28.2 28.7 25.3 27.1 26.2 26.3 
DATA B 
1 B 28.8 28.4 29.3 29.0 26.8 28.3 27.5 27.8 26.0 24.3 25.0 25.5 
2 B 30.7 31.2 29.8 30.4 28.1 27.2 28.3 28.9 25.8 26.1 26.2 26.9

4.5 The Instruction Section

Each instruction is punched on a separate card, but if one card is not sufficient to contain an instruction it may be continued on to a second card (or more) by the use of the continuation character $. The character $ is punched at the end of the card (or cards) to be continued. Instructions are made up of the introductory words listed below and the information that these words introduce. Examples of the use of these words in instructions are given after this list.

Introductory Word	Information following word	Description of information introduced
ANALYSE	1) A list of variable names, or 2) The words EACH VARIABLE.	This word introduces either a list of names of variables to be analysed or the words EACH VARIABLE, which imply that all variables are to be analysed.
EXCEPT	A list of variable names.	This word is used with the statement ANALYSE EACH VARIABLE to introduce a list of names of variables that are not to be analysed.
FOR	1) A list of factor names and levels.	This word introduces the factor levels for which separate analyses are required. The levels are introduced by the factor name and more than one factor name followed by its levels may be listed. If a factor is not listed in a FOR statement all levels are included. The statement FOR CHEMICALS 1 2 DOSES 3 4 given in example 1 in section 4.2 is an example of this type of FOR statement. See section 4.5.2.
FOR	2) The word EACH followed by factor names.	This word together with the word EACH introduces factors for which separate analyses for each combination of levels are required. For example if two factors, having 6 and 4 levels respectively, are listed 24 separate analyses including all other factors are produced. The statement FOR EACH EXPERIMENT given in example 1 in section 4.2 is an example of this type of FOR statement. See section 4.5.2.
FOR	3) Mixture of the two lists described above	This word may also introduce a mixture of types 1 and 2 above. For example, the statement FOR EACH EXPERIMENT AND DOSES 1 2 3 is legal. See section 4.5.2.
OMITTING or OMIT	A list of factor names and levels.	This word introduces the factor levels to be omitted from the analysis or analyses. This statement may be used with any of the FOR statements described above but the joint use must be carefully considered. See section 4.5.2.
PARTITIONS (Available later)	Alternate factor names and integers.	This word introduces the extent of polynomial partitioning required. The factors listed are those for which partitioning is required and the integers following each factor name specify the highest order partition required for that factor. Factors not listed in a PARTITIONS statement are not partitioned. This facility is not included in the first version of the program but will be included later. See section 4.5.3.

The following words may be used in instructions in any position to make the instructions more like English. They have no information content for the program and may be used more than once if required.

POLYNOMIAL VARIABLE FACTOR LEVEL 
LEVELS     AND      AT

4.5.1 The Analyses of Complete Variables

Analyses of complete variables are specified by using the ANALYSE statement, and the EXCEPT statement if required. Example instructions are

ANALYSE EACH VARIABLE 
ANALYSE VARIABLES A B AND C 
ANALYSE EACH VARIABLE EXCEPT C

Each of these statements may be qualified as described in the next two sections to specify analyses on subsets of the data for each variable.

4.5.2 The Analyses for Separate Factor Levels

The FOR statement described above may be used to specify separate analyses for various combinations of factor levels. If data for a five factor experiment is presented, for example, the FOR statement may be used to specify:

Four factor analyses for each level of one of the factors.
Four factor analyses for particular levels of one of the factors.
Three factor analyses for each combination of each level of two of the factors.
Three factor analyses for each combination of particular levels, of two of the factors.
Two factor analyses for each combination of each, or particular levels, of three of the factors.
One factor analyses for each combination of each, or particular levels, of four of the factors if replicated data is presented.

It is possible to produce analyses for each combination of each level of one factor with selected levels of a second factor. This is the third type of FOR statement described above.

The OMIT (or OMITTING) statement is used to delete particular factor levels from analyses. An OMIT statement used without a FOR statement will specify analyses involving the same number of factors as the original data unless all levels except one of a factor, or factors, are listed. The omission of all levels of a factor is clearly nonsense. The OMIT and FOR statements may be used together but factors listed in the FOR statement must not be included in the OMIT statement also. That is, the statement ANALYSE A FOR EACH SEX AND CHEMICAL OMITTING CHEMICAL 3 is not legal. If the factor CHEMICAL has four levels the correct instruction is ANALYSE A FOR EACH SEX AND CHEMICALS 1, 2 AND 4.

We may summarise the function of the first four words by saying that the variables to be analysed are specified by using the statements ANALYSE and EXCEPT, factor levels to be omitted from analyses are specified by using the statement OMIT (or OMITTING), and factor level combinations for which separate analyses are required are specified by using the statement FOR.

Examples of instructions using the first four words are:

ANALYSE EACH VARIABLE EXCEPT C OMITTING DOSES 3 AND 4 
ANALYSE A B AND C FOR EACH SEX AND CHEMICAL OMIT DOSES 2 3 
ANALYSE EACH VARIABLE FOR EACH SEX AND CHEMICALS 1 2 3 AND 4 
ANALYSE A FOR SEX  1 AND EXPERIMENTS 1 2 AND 3 OMIT CHEMICAL 1
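The way an instruction separates into its four kinds of information, and the number of separate analyses a FOR statement requests for each variable, can be sketched as follows. This is illustrative Python under stated assumptions and is not the program's interpreter: the names parse_instruction, factor_levels and count_analyses are hypothetical, words are truncated to eight characters as on the 7030 and Atlas, and PARTITION statements are not handled.

```python
NOISE = {w[:8] for w in ("POLYNOMIAL", "VARIABLE", "FACTOR", "LEVEL",
                         "LEVELS", "AND", "AT")}


def parse_instruction(card):
    """Split one instruction into the items following each introductory word."""
    clauses = {"ANALYSE": [], "EXCEPT": [], "FOR": [], "OMIT": []}
    current = None
    for raw in card.replace(",", " ").split():
        word = raw[:8]  # words are identified by their first eight characters
        if word in NOISE:
            continue
        if word in ("OMIT", "OMITTING"):
            current = "OMIT"
        elif word in clauses:
            current = word
        elif current is None:
            raise ValueError("instruction must start with an introductory word")
        else:
            clauses[current].append(word)
    return clauses


def factor_levels(tokens):
    """Group FOR or OMIT items by factor; an empty list means EACH level."""
    result, last = {}, None
    for token in tokens:
        if token == "EACH":
            continue
        if token.isdigit():
            if last is None:
                raise ValueError("level given before any factor name")
            result[last].append(int(token))
        else:
            last = token
            result[last] = []
    return result


def count_analyses(card, dimensions):
    """Number of separate analyses per variable requested by the FOR statement."""
    sizes = {name[:8]: n for name, n in dimensions.items()}
    count = 1
    for name, levels in factor_levels(parse_instruction(card)["FOR"]).items():
        count *= len(levels) if levels else sizes[name]
    return count
```

With the dimensions of example 1 in section 4.2, the instruction ANALYSE A AND B FOR EACH EXPERIMENT gives two analyses per variable and ANALYSE A FOR CHEMICALS 1 AND 2 AND DOSES 3 AND 4 gives four, while an instruction with no FOR statement gives one.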

4.5.3 Polynomial Partitioning

It is hoped to include facilities for polynomial partitioning in the program in the near future. The facilities planned for inclusion in the next version of the program allow the specification of the factors to be partitioned and the maximum degree of partitioning for each factor. This information will be introduced by the word PARTITION followed by a list of alternate words and numbers; the words being the names of the factors to be partitioned, and the numbers being the maximum degree of partitioning. It will be assumed that the levels of factors to be partitioned are EQUALLY SPACED although it is hoped to relax this condition eventually. If, for example, factors CHEMICALS and DOSES have 6 and 4 levels respectively the statement PARTITIONS CHEMICALS 3 DOSES 1 will cause the linear, the quadratic, and the cubic partitions for CHEMICALS and the linear effect for DOSES to be computed. The remaining partitions for each factor will be included together in sums of squares labelled CHEMICALS REMAINDER and DOSES REMAINDER. The interactions between partitions are computed as well as the partitions themselves so that the sums of squares (as far as CHEMICALS and DOSES are concerned) produced by the above statement would be:

SOURCE                                     D.F.
CHEMICALS LINEAR                            1
CHEMICALS QUADRATIC                         1
CHEMICALS CUBIC                             1
CHEMICALS REMAINDER                         2
DOSES LINEAR                                1
DOSES REMAINDER                             2
CHEMICALS LINEAR x DOSES LINEAR             1
CHEMICALS QUADRATIC x DOSES LINEAR          1
CHEMICALS CUBIC x DOSES LINEAR              1
CHEMICALS REMAINDER x DOSES LINEAR          2
CHEMICALS LINEAR  x DOSES REMAINDER         2
CHEMICALS QUADRATIC x DOSES REMAINDER       2
CHEMICALS CUBIC x DOSES REMAINDER           2
CHEMICALS REMAINDER x DOSES REMAINDER       4
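The degrees of freedom in this table follow from two rules: each polynomial partition has one degree of freedom, with the remainder taking what is left of the factor's levels minus one, and an interaction has the product of the degrees of freedom of its two components. The following Python sketch reproduces the table under those rules; the names factor_terms and partition_table are hypothetical, and partitions beyond the cubic are not covered.

```python
DEGREE_NAMES = ["LINEAR", "QUADRATIC", "CUBIC"]


def factor_terms(name, n_levels, max_degree):
    """Partitions of one factor as (label, degrees of freedom) pairs."""
    if not 0 <= max_degree <= min(len(DEGREE_NAMES), n_levels - 1):
        raise ValueError("unsupported degree of partitioning")
    terms = [(name + " " + DEGREE_NAMES[d], 1) for d in range(max_degree)]
    remainder = n_levels - 1 - max_degree
    if remainder > 0:
        terms.append((name + " REMAINDER", remainder))
    return terms


def partition_table(first, second):
    """Main effect partitions of two factors followed by their interactions."""
    a_terms = factor_terms(*first)
    b_terms = factor_terms(*second)
    table = a_terms + b_terms
    for b_label, b_df in b_terms:
        for a_label, a_df in a_terms:
            table.append((a_label + " x " + b_label, a_df * b_df))
    return table
```

Calling partition_table(("CHEMICALS", 6, 3), ("DOSES", 4, 1)) gives the fourteen sources listed, and their degrees of freedom total 23, one less than the 24 combinations of the two factors.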

A PARTITION statement may be added to any of the statements described so far, for example:

ANALYSE EACH VARIABLE EXCEPT C PARTITION DOSES 3 
ANALYSE A B AND C FOR EACH SEX AND CHEMICAL OMIT DOSES 
  2 3 PARTITION CHEMICALS 3

The use of the PARTITION statement with an OMIT statement needs careful consideration because of the assumption that the levels of factors to be partitioned are equally spaced. If levels are to be omitted from a factor for which partitioning is required the program makes the assumption that the remaining levels are equally spaced. This assumption may fail even when the original levels are equally spaced: omitting level 2 from four equally spaced levels, for example, leaves unequal gaps between the remaining levels.

It is possible of course, to partition sums of squares for factors with unequally spaced levels and it is hoped that the polynomial partitioning facility will be extended in this program to include unequally spaced levels. The necessary information for this to be done is the actual factor levels and this could be given in the specification section as a statement of the form FACTOR LEVELS DOSES 1.0 2.5 5.0 10.0. Details of polynomial partitioning will be published as new facilities become available.

4.5.4 Correct Identification of Factor Names

Each word used in instructions is truncated, if necessary, by the program to eight characters (six characters on the 7090) and factor names are correctly identified by the program if the first eight (six on the 7090) letters are correct. The English nature of the instructions may suggest in some contexts that the plural form of the factor name should be used whilst in other contexts the singular form may seem appropriate. It is important, however, for factors with names, in the singular form, of fewer than eight characters that the same form of the factor name is used in the specification section and in the instruction section. It is suggested that this form should be the plural form as naturally used in the DIMENSIONS statement in the specification section. This will lead to unnatural English in instructions when one level is referred to or when the word EACH is used. The correct reference to one sex in the examples used above is either

DIMENSIONS 2 SEXES - - - - 
followed by 
ANALYSE SEXES 1 - - - -

or, alternatively 

DIMENSIONS 2 SEX - - - -
followed by 
ANALYSE SEX 1

Factors with names, in the singular form, of eight or more letters (six or more on the 7090) are not affected and the plural and singular forms may be used interchangeably. The factor CHEMICALS used above is an example of such a factor.
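The identification rule amounts to comparing truncated words, as the following illustrative Python sketch shows. The name same_word is hypothetical; the width is eight characters, or six for the 7090.

```python
def same_word(a, b, width=8):
    """Two words are identified as the same if their truncated forms agree."""
    return a[:width] == b[:width]
```

Under this rule CHEMICAL and CHEMICALS agree at either width, whereas SEX and SEXES, or DOSE and DOSES, do not, since the plural ending falls inside the characters that are compared.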

It is hoped that the identification process in the program will be extended to make correct identification of factor names irrespective of the form of the name used. The simple addition of an S (or possibly also ES) to the singular form can be allowed eventually without much difficulty. Other plural forms, of course, exist and are more difficult to allow. The extension of the identification process, when programmed, will be confined to the S and ES plural forms.

4.6 Output

The output contains, initially, all main effect and interaction means for each variable. Each set of means is clearly labelled with the name of the variable and the name, or names, of the factors involved. Analyses of variance with F-ratios and probabilities are then output for each analysis selected in the instruction section. The sums of squares are labelled with the appropriate factor names and each analysis is identified by clear headings.

5 THE DIALLEL TABLE ANALYSIS PROGRAM

This program performs the analyses of diallel table data developed by Hayman (3) and Jinks (4). Several variables may be presented and equation cards may be used to redefine variables or to define fresh variables. The program is capable of fitting a straight line between any two variables and setting-up the residuals as a fresh variable which may be referred to in subsequent equation cards or analysed as a normal variable. Two forms of replication are allowed and these are referred to as normal replicates or genetic replicates. Normal replicates simply supply a residual sum of squares to the Hayman analysis of variance whereas genetic replicates interact with the other effects in the analysis.

5.1 The Data Presentation

The specification section consists of cards containing the values of parameters necessary to the input of the data such as the number of parents, the number of variables and the names of the variables. Each parameter, or set of parameters, is introduced by a word, for example, PARENTS 8, VARIABLES 3, and NAMES A B C. These words may appear in any order and may be punched in any columns on the cards in this section with the only restriction that the value of the parameter or set of parameters must immediately follow the introductory word on the same card. The CHECK and IGNORE facilities described in section 1.3 are available in this program.

Only one arrangement of the data is available in this program although other arrangements may be added if found necessary. The data is presented in blocks each consisting of the values for all variables for that block. The order of presentation within a block may be a standard order or if the parental number codes are included on the data cards the presentation order may be random.

All cards are punched according to the card preparation rules given in section 1.6. Equation cards conform to the additional rules given in section 1.7.

5.2 An Example Presentation

The following example presentation will form the starting point for the detailed description of the presentation rules.

CUSTOMER B. E. COOPER ATLAS COMPUTER LABORATORY 
TITLE EXAMPLE 1 
PARENTS 4 NORMAL REPLICATES 2 GENETIC REPLICATES 2 
BLOCKS 3 VARIABLES 2 NAMES CONTROL IRRAD 
INTEGERS 2 3 IGNORE 1 CHECK 1 IDENTICAL 
BLOCK 1 
1 1 1 C₁₁₁₁₁ C₁₁₁₁₂ C₁₁₁₂₁ C₁₁₁₂₂ I₁₁₁₁₁ I₁₁₁₁₂ I₁₁₁₂₁ I₁₁₁₂₂ |
1 1 2 C₁₁₂₁₁ C₁₁₂₁₂ C₁₁₂₂₁ C₁₁₂₂₂ I₁₁₂₁₁ I₁₁₂₁₂ I₁₁₂₂₁ I₁₁₂₂₂ |  16 cards
------------------------------------------------------------  |
1 4 4 C₁₄₄₁₁ C₁₄₄₁₂ C₁₄₄₂₁ C₁₄₄₂₂ I₁₄₄₁₁ I₁₄₄₁₂ I₁₄₄₂₁ I₁₄₄₂₂ |
BLOCK 2 
2 3 2 C₂₃₂₁₁ C₂₃₂₁₂ C₂₃₂₂₁ C₂₃₂₂₂ I₂₃₂₁₁ I₂₃₂₁₂ I₂₃₂₂₁ I₂₃₂₂₂ |
2 1 2 C₂₁₂₁₁ C₂₁₂₁₂ C₂₁₂₂₁ C₂₁₂₂₂ I₂₁₂₁₁ I₂₁₂₁₂ I₂₁₂₂₁ I₂₁₂₂₂ |  16 cards
------------------------------------------------------------  |
2 4 3 C₂₄₃₁₁ C₂₄₃₁₂ C₂₄₃₂₁ C₂₄₃₂₂ I₂₄₃₁₁ I₂₄₃₁₂ I₂₄₃₂₁ I₂₄₃₂₂ |
BLOCK 3
3 1 1 C₃₁₁₁₁ C₃₁₁₁₂ C₃₁₁₂₁ C₃₁₁₂₂ I₃₁₁₁₁ I₃₁₁₁₂ I₃₁₁₂₁ I₃₁₁₂₂ |  16 cards
------------------------------------------------------------  |
3 4 1 C₃₄₁₁₁ C₃₄₁₁₂ C₃₄₁₂₁ C₃₄₁₂₂ I₃₄₁₁₁ I₃₄₁₁₂ I₃₄₁₂₁ I₃₄₁₂₂ |
REGRESSION OF IRRAD ON CONTROL NAME RESIDUALS IRRRES 
ANALYSE VARIABLES IRRRES AND IRRAD 
FINISH

Data values for the variable CONTROL are represented by C in the above example and values for the variable IRRAD are represented by I. The five subscripts used above to label a data value represent, in order, the following:

Blocks
Female parent
Male parent
Genetic replicate
Normal replicate

The data layout is explained in greater detail in section 5.4.

5.3 The Specification Section

Introductory Word	Information following word	Description of information introduced
PARENTS	One integer	This word introduces the number of parents.
NORMAL	One integer	Optional word introducing the number of normal replicates. Standard value 1 is assumed if not used.
GENETIC	One integer	Optional word introducing the number of genetic replicates. Standard value 1 assumed if not used.
BLOCKS	One integer	Optional word introducing the number of blocks. Standard value 1 assumed if not used.
VARIABLES	One integer	Optional word introducing the number of variables. Standard value 1 assumed if not used.
NAMES	List of words	Optional word introducing the variable names. A blank name is assumed if not used. Use of this word is strongly advised if more than one variable is presented.
HALF	No information following	The presence of this word specifies that the data for half diallels is to be presented.
INTEGERS	Two integers	Optional word introducing the positions on the data cards occupied by the parent identification numbers. The first position is that of the female identification number. If this word is not used it is assumed that no identification integers are present and that all data is presented in standard order. This heading must be used for the introduction of half-diallel data. See section 5.4.
CHECK	Alternate numbers and words	A number specifies the position on the data cards of an item to be checked, the following word specifies the check to be made. See section 1.3.
IGNORE	A list of integers	Each integer specifies the position on the data cards of an item to be ignored. See section 1.3.
HAYMAN	No information following	The presence of this word selects Hayman's analysis of the data.
JINKS	No information following	The presence of this word selects Jinks' analysis of the data. The words HAYMAN and JINKS are used in the specification section when no instructions are given in the instruction section. The absence of both words implies that both analyses are to be performed.
SELFS	One number for each variable	This word introduces the variance of observations from like parent matings.
CROSSES	One number for each variable	This word introduces the variance of observations from matings between unlike parents.
REPEATS	No information following	This heading is used with the heading NORMAL (e.g. NORMAL REPEATS 2 instead of NORMAL REPLICATES 2) if the several observations for each parental combination are repeats rather than replicates. A discussion of the difference between repeats and replicates is given in section 1.8.

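The following words carry no information for the program and may be used in this section to make the cards read more like English:

AND         WITH   DIALLEL    TABLE
REPLICATES  FOR    ANALYSIS   ANALYSES
VARIANCE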

The introductory words in the above list may be punched in any columns and in any order on any number of cards. The information introduced by a word must, however, be punched to follow that word on the same card. That is, for example, the variable names introduced by the word NAMES must follow the word NAMES on the same card.
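
The rule that each value follows its introductory word on the same card can be illustrated with a short sketch in Python. It is not the program's own input routine, which is written in FORTRAN and is not reproduced in this report. The function name `parse_specification` is hypothetical, only the integer-valued words and NAMES are handled, and it is assumed that a list of names runs to the next introductory word or to the end of the card. Treating PARENTS as compulsory is also an assumption, based on its being the only integer word not described as optional.

```python
# Introductory words that take one integer, with the standard values from the table.
# PARENTS has no standard value (assumed compulsory in this sketch).
INTEGER_WORDS = {"PARENTS": None, "NORMAL": 1, "GENETIC": 1,
                 "BLOCKS": 1, "VARIABLES": 1}

# Words with no information content in the specification section.
FILLER = {"AND", "WITH", "DIALLEL", "TABLE", "REPLICATES",
          "FOR", "ANALYSIS", "ANALYSES", "VARIANCE"}


def parse_specification(cards):
    """Collect parameter values from a list of card images (strings)."""
    spec = dict(INTEGER_WORDS)
    spec["NAMES"] = []
    for card in cards:
        # Limitation of the sketch: a variable named like a filler word is lost.
        words = [w for w in card.split() if w not in FILLER]
        i = 0
        while i < len(words):
            word = words[i]
            if word in INTEGER_WORDS:
                if i + 1 >= len(words):
                    raise ValueError(f"{word} must be followed by an integer on the same card")
                spec[word] = int(words[i + 1])
                i += 2
            elif word == "NAMES":
                i += 1
                start = i
                while i < len(words) and words[i] not in INTEGER_WORDS:
                    i += 1
                spec["NAMES"] = words[start:i]
            else:
                raise ValueError(f"word not handled by this sketch: {word}")
    if spec["PARENTS"] is None:
        raise ValueError("PARENTS is required")
    return spec


example = parse_specification([
    "PARENTS 4 NORMAL REPLICATES 2 GENETIC REPLICATES 2",
    "BLOCKS 3 VARIABLES 2 NAMES CONTROL IRRAD",
])
print(example)
```

Words omitted from the cards keep their standard value of 1, so a deck containing only `PARENTS 8` describes one block, one variable and no replication.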

Equation cards punched according to the rules given in section 1.7 may be included with this section to redefine variables or to define new variables.

5.4 The Data Section

The data section consists of a number of separate blocks of data each introduced by a block card consisting of the word BLOCK followed by an integer identifying the block of data. Each card of each block normally contains all values for all variables for one parental combination. Two integers specifying the parental combination may be punched on each card and the positions on a data card occupied by these two integers are declared in the specification section by using the heading INTEGERS. If these integers are included the cards within a block may be presented in random order, but if these integers are omitted the cards are assumed to be in standard order. Standard order is illustrated in the following example of the presentation of a 4 × 4 table.

Card	Female	Male
1	1	1
2	2	1
3	3	1
4	4	1
5	1	2
6	2	2
7	3	2
8	4	2
9	1	3
10	2	3
11	3	3
12	4	3
13	1	4
14	2	4
15	3	4
16	4	4
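
The standard order tabulated above has the female parent number varying fastest and the male parent number slowest. A minimal sketch in Python, with the hypothetical function name `standard_order`, generates the same sequence for any number of parents:

```python
def standard_order(parents):
    """Return (female, male) pairs in standard order.

    The female number varies fastest and the male number slowest,
    as in the 4 x 4 table above.
    """
    return [(female, male)
            for male in range(1, parents + 1)
            for female in range(1, parents + 1)]


# Reproduce the Card / Female / Male table for a 4 x 4 presentation.
for card, (female, male) in enumerate(standard_order(4), start=1):
    print(card, female, male)
```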

The data for one parental combination may be classified according to normal replicates, genetic replicates and variables. Normal replicate values for the same genetic replicate and variable are punched consecutively. These groups of normal replicates are punched in genetic replicate order within each variable, so that all values for variable 1 are followed by all values for variable 2, and so on. The order is illustrated in the following diagram for an example consisting of 2 normal replicates, 2 genetic replicates and 2 variables:


27.4 27.9      30.4 29.1      29.1 28.8      29.9 29.7
---------      ---------      ---------      ---------
2 normal       2 normal       2 normal       2 normal
replicates     replicates     replicates     replicates
for genetic    for genetic    for genetic    for genetic
replicate 1    replicate 2    replicate 1    replicate 2
--------------------------    --------------------------
        Variable 1                    Variable 2
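
The nesting shown in the diagram (normal replicates innermost, then genetic replicates, then variables) can be sketched in Python as follows. The function name `label_card_values` and the variable names V1 and V2 are hypothetical, and the values are assumed to have been separated already from any identification integers on the card.

```python
from itertools import product


def label_card_values(values, variables, genetic, normal):
    """Map (variable, genetic replicate, normal replicate) to each value.

    The values are taken in punching order: the normal replicate varies
    fastest, then the genetic replicate, then the variable.
    """
    expected = len(variables) * genetic * normal
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    # product() varies its last argument fastest, which matches the card order.
    keys = product(variables, range(1, genetic + 1), range(1, normal + 1))
    return dict(zip(keys, values))


card = [27.4, 27.9, 30.4, 29.1, 29.1, 28.8, 29.9, 29.7]
labelled = label_card_values(card, ["V1", "V2"], genetic=2, normal=2)
for key, value in labelled.items():
    print(key, value)
```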

The CHECK and IGNORE facility as described in section 1.3 is available for this section and checks are applied to each block of data separately. It is not necessary to include the parental combination integer positions in an IGNORE statement. The statement INTEGERS 1 AND 2 implies also IGNORE 1 AND 2.

5.4.1 More Compact Data Arrangements

It is possible to present all the data for two or more parental combinations on one card, provided that the parental combination order on the card is consistent with the standard order. For example, if the data for two parental combinations are to be included on each card in a 4 × 4 presentation, the eight cards would contain the following parental combinations.

Card	Female (combination 1)	Male (combination 1)	Female (combination 2)	Male (combination 2)
1	1	1	2	1
2	3	1	4	1
3	1	2	2	2
4	3	2	4	2
5	1	3	2	3
6	3	3	4	3
7	1	4	2	4
8	3	4	4	4

If integers specifying the combination are to be included on the card only those relevant to the first combination are necessary. If integers for the remaining combinations are punched these positions must be included in an IGNORE statement.
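
The compact arrangement amounts to cutting the standard order into consecutive groups, one group per card. The sketch below, in Python, uses the hypothetical function name `compact_cards`. The requirement that the number of combinations per card divides the table size exactly is an assumption of the sketch and is not stated in this report.

```python
def compact_cards(parents, per_card):
    """List the (female, male) combinations carried by each card.

    Combinations follow the standard order (female varying fastest) and
    are cut into consecutive groups of per_card combinations.
    """
    order = [(female, male)
             for male in range(1, parents + 1)
             for female in range(1, parents + 1)]
    if len(order) % per_card != 0:
        # Assumption of this sketch: every card carries the same number.
        raise ValueError("per_card must divide the number of combinations")
    return [order[i:i + per_card] for i in range(0, len(order), per_card)]


# Two combinations per card in a 4 x 4 presentation gives eight cards.
for card, combinations in enumerate(compact_cards(4, 2), start=1):
    print(card, combinations)
```

With four combinations per card in a 4 × 4 presentation each card carries one complete line of the table, which is the one-row-per-card case.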

If there is no replication and only one variable, the data may therefore be presented with one row per card, so that the print out of such data would be in the normal diallel table form. If more than one variable is present, one row of each variable could be presented on each card and the print out would show several tables side by side.

5.5 The Instruction Section

If no instructions are given in this section, the analyses (HAYMAN or JINKS) specified in the specification section are performed on all variables.

Each instruction is punched on a separate card. If one card is not sufficient, the instruction may be continued onto a second card (or more) by use of the continuation character $. The character $ is punched at the end of each card to be continued; it is not punched on the continuation card unless that card is itself to be continued. Two types of instruction are allowed, specifying regression analyses and diallel table analyses respectively. Instructions are made up of introductory words, listed below, and the information that these words introduce. Examples of the use of these words in instructions are given after this list.

Introductory Word	Information following word	Description of information introduced
REGRESSION	No item following	The presence of this word selects a regression analysis.
OF	One word	This word introduces the name of the dependent variable.
ON	One word	This word introduces the name of the independent variable.
CALL or NAME	One word	Either of these words introduces the variable name to be given to the residuals produced by the regression analysis.
GRADUATE	No item following	The presence of this word selects a graduation of the fitted regression line including confidence limits.
SIGNIFICANCE	One number	This optional word introduces the significance level to be used in tests of significance. The value 0.05 is taken if this word is not used.
ANALYSE	1) A list of words 2) ALL VARIABLES 3) EACH VARIABLE	This word introduces either a list of names of the variables to be analysed or, by means of the words ALL VARIABLES or EACH VARIABLE, a specification that all variables are to be analysed.
EXCEPT	A list of words	This optional word introduces a list of names of variables that are not to be analysed. This word is intended for use with the words ANALYSE ALL VARIABLES EXCEPT - - -
HAYMAN	No item following	The presence of this word selects Hayman's analysis of the data.
JINKS	No item following	The presence of this word selects Jinks' analysis of the data. The absence of both words implies that both analyses are to be performed.
OMITTING or OMIT	The word BLOCKS followed by a list of integers and/or the words GENETIC REPLICATES followed by a list of integers	Either of these words introduces a list of blocks and/or a list of genetic replicates to be omitted from Hayman or Jinks analyses. This facility is not included in the program in its early form but will be made available later.

The following words may be used in instructions in any position to make the instruction more like English. They have no information content for the program and they may be used more than once if required.

LEVEL RESIDUAL   PROBABILITY LIMITS 
AND   REPLICATES ANALYSIS    ANALYSES
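
The continuation rule given at the start of this section can be sketched in Python as below. The function name `join_continued_cards` is hypothetical, and it is assumed that $ is the last non-blank character on a card that is to be continued.

```python
def join_continued_cards(cards):
    """Join card images into complete instructions.

    A card whose last non-blank character is $ is continued on the
    following card; the $ itself is not part of the instruction.
    """
    instructions = []
    pending = ""
    for card in cards:
        text = card.rstrip()
        if text.endswith("$"):
            pending += text[:-1].rstrip() + " "
        else:
            instructions.append((pending + text).strip())
            pending = ""
    if pending:
        raise ValueError("last card ends with $ but no continuation card follows")
    return instructions


deck = [
    "ANALYSE A AND B $",
    "HAYMAN ANALYSIS",
    "ANALYSE ALL VARIABLES EXCEPT ARES JINKS ANALYSIS",
]
for instruction in join_continued_cards(deck):
    print(instruction)
```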

5.5.1 Instructions for Regression Analyses

Instructions selecting regression analyses are made up of the first six words in the list above. Examples of these instructions are:

 
REGRESSION ANALYSIS OF Y ON X CALL RESIDUALS YRES GRADUATE 
REGRESSION OF A ON B NAME RESIDUALS ARES SIGNIFICANCE LEVEL 0.01
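
A regression instruction of this kind can be decoded by discarding the words with no information content and then reading each introductory word with the item that follows it. The Python sketch below is illustrative only: the function name `parse_regression` and the result keys are hypothetical. The word RESIDUALS is added to the list of ignorable words as an assumption, since the examples use it although the list in section 5.5 shows RESIDUAL.

```python
# Words with no information content in instructions (section 5.5).
# RESIDUALS is an assumption added because the examples use it.
FILLER = {"LEVEL", "RESIDUAL", "RESIDUALS", "PROBABILITY", "LIMITS",
          "AND", "REPLICATES", "ANALYSIS", "ANALYSES"}

# Introductory words followed by one item, and the key each one fills.
ONE_ITEM = {"OF": "dependent", "ON": "independent",
            "CALL": "residuals", "NAME": "residuals",
            "SIGNIFICANCE": "significance"}


def parse_regression(instruction):
    """Decode one regression instruction into a dictionary."""
    words = [w for w in instruction.split() if w not in FILLER]
    if "REGRESSION" not in words:
        raise ValueError("not a regression instruction")
    # The significance level is 0.05 unless SIGNIFICANCE is used.
    result = {"graduate": False, "significance": 0.05}
    i = 0
    while i < len(words):
        word = words[i]
        if word == "REGRESSION":
            i += 1
        elif word == "GRADUATE":
            result["graduate"] = True
            i += 1
        elif word in ONE_ITEM:
            if i + 1 >= len(words):
                raise ValueError(f"{word} must be followed by one item")
            item = words[i + 1]
            key = ONE_ITEM[word]
            result[key] = float(item) if key == "significance" else item
            i += 2
        else:
            raise ValueError(f"unexpected word: {word}")
    return result


print(parse_regression("REGRESSION ANALYSIS OF Y ON X CALL RESIDUALS YRES GRADUATE"))
print(parse_regression("REGRESSION OF A ON B NAME RESIDUALS ARES SIGNIFICANCE LEVEL 0.01"))
```

One limitation of discarding the ignorable words first is that a variable could not be given a name such as LEVEL or AND; whether the program itself has this restriction is not stated here.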

5.5.2 Instructions for Diallel Analyses

Instructions selecting analyses other than regression analyses are made up of the last five words in the list above. Examples of these instructions are:

 
ANALYSE A AND B HAYMAN ANALYSIS 
ANALYSE ALL VARIABLES EXCEPT ARES JINKS ANALYSIS

When the OMIT (or OMITTING) facility is included in the program the following instructions will be valid:

ANALYSE A OMITTING BLOCK 1 AND GENETIC REPLICATES 1 AND 4 
ANALYSE ARES OMIT GENETIC REPLICATE 3 HAYMAN ANALYSIS

5.6 Output

The output for the two types of analyses may be considered separately:

5.6.1 Output for Regression Analyses

The output for regression analyses consists of:

The equation of the fitted straight line.
The variable means, the variances of the slope and constant, and the number of observations.
An analysis of variance of the significance of the slope.
A graduation, if requested, of the observed values, fitted value, standard error of the fitted value and confidence limits for each of the original points of data.
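
The quantities listed above can be illustrated with the usual least-squares formulae for a straight line. The Python sketch below is an assumption about the calculation, since this report does not give the formulae used by the program. The function name `fit_line` and the result keys are hypothetical, and the graduation with confidence limits is not attempted.

```python
def fit_line(x, y):
    """Fit y = constant + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the independent variable does not vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    # The residuals are what the program sets up as a fresh variable.
    residuals = [yi - (constant + slope * xi) for xi, yi in zip(x, y)]
    # Residual mean square on n - 2 degrees of freedom.
    residual_ms = sum(r * r for r in residuals) / (n - 2)
    return {
        "slope": slope,
        "constant": constant,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_slope": residual_ms / sxx,
        "var_constant": residual_ms * (1 / n + mean_x ** 2 / sxx),
        "observations": n,
        "residuals": residuals,
    }


fit = fit_line([0.0, 1.0, 2.0], [0.0, 1.0, 3.0])
print(fit["slope"], fit["constant"], fit["var_slope"])
```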

5.6.2 Output for Diallel Analyses

The output for non-regression type analyses consists of Hayman and Jinks analyses, as requested, for each of the following:

Each genetic replicate for each block.
Each genetic replicate summed over blocks.
Each block summed over genetic replicates.
Combined analyses summed over genetic replicates and blocks.
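
The four levels listed above can be enumerated with a short Python sketch. The names `add_tables` and `analysis_levels` are hypothetical. It is assumed that each block and genetic replicate supplies one square table with a single value per cell, so the treatment of normal replicates is left out, and only the tables to be analysed are formed, not the analyses themselves.

```python
def add_tables(tables):
    """Sum a collection of equally sized square tables cell by cell."""
    tables = list(tables)
    size = len(tables[0])
    return [[sum(t[row][col] for t in tables) for col in range(size)]
            for row in range(size)]


def analysis_levels(tables):
    """Form the table analysed at each of the four levels.

    tables maps (block, genetic replicate) to a square table and is
    assumed to hold every combination of block and genetic replicate.
    """
    blocks = sorted({block for block, _ in tables})
    replicates = sorted({rep for _, rep in tables})
    levels = {}
    for (block, rep), table in sorted(tables.items()):
        levels[f"block {block}, genetic replicate {rep}"] = table
    for rep in replicates:
        levels[f"genetic replicate {rep}, summed over blocks"] = add_tables(
            tables[(block, rep)] for block in blocks)
    for block in blocks:
        levels[f"block {block}, summed over genetic replicates"] = add_tables(
            tables[(block, rep)] for rep in replicates)
    levels["combined"] = add_tables(tables.values())
    return levels


# Invented 2 x 2 tables for 3 blocks and 2 genetic replicates.
tables = {(b, g): [[b, g], [b + g, 1]] for b in (1, 2, 3) for g in (1, 2)}
levels = analysis_levels(tables)
for name in levels:
    print(name)
```

With B blocks and G genetic replicates this gives B × G + G + B + 1 tables for each variable, which is 12 for the 3 blocks and 2 genetic replicates of the example in section 5.2.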

The data analysed at each stage is recorded in the output with the analyses. All output is clearly labelled and the variable names are used to identify quantities associated with variables. It is believed, therefore, that detailed description of the output is unnecessary.

CONTENTS

Introduction and Acknowledgments

References

1 INTRODUCTION ON THE STRUCTURE OF DATA DECKS

1.1 Data Deck Formation

1.2 The Structure of an Individual case

1.3 The Commentary Output

1.4 The CHECK and IGNORE Facility

1.5 Computer Differences

1.6 Card Punching Rules

1.7 Equation Card Rules

1.7.1 Type 1 Equations

1.7.2 Type 2 Equations

1.7.3 Type 3 Equations

1.7.4 Type 4 Equations

1.8 Repeats or Replicates

1.9 The Arrangement of Data on Data Cards

2 THE MULTIVARIATE ANALYSIS PROGRAM

2.1 The Data Presentation

2.2 An Example Presentation

2.3 The Specification Section

2.4 The Data Section

2.4.1 The Presentation of the Raw Data

2.4.2 The Presentation of the Variance-Covariance Matrix and Means

2.5 The Instruction Set

2.5.1 Instructions for Components Analysis

2.5.2 Instructions for Regression Analysis

2.5.3 Instructions for Canonical Correlation

2.5.4 Instructions for Factor Analysis

2.5.5 Instructions for Other Analysis

2.5.6 Equation Cards in the Instruction Section

2.6 Output

3 THE REGRESSION-WITHIN-GROUPS PROGRAM

3.1 The Data Presentation

3.2 An Example Presentation

3.3 The Specification Section

3.3.1 The Transformation List

3.4 The Data Section

3.4.1 The Normal Data Section (Even replication)

3.4.2 The Normal Data Section (Uneven replication)

3.4.3 The Compact Data Section

3.5 The Analysis of Variance Performed

3.6 Output

4 THE COMPLETE FACTORIAL EXPERIMENT PROGRAM

4.1 The Data Presentation

4.2 An Example Presentation

4.3 Specification Section

4.4 The Data Section

4.4.1 Compact Data Section (First Arrangement)

4.4.2 Compact Data Section (Second Arrangement)

4.4.3 Non-Compact Data Section (First Arrangement)

4.4.4 Non-Compact Data Section (Second Arrangement)

4.5 The Instruction Section

4.5.1 The Analyses of Complete Variables

4.5.2 The Analyses for Separate Factor Levels

4.5.3 Polynomial Partitioning

4.5.4 Correct Identification of Factor Names

4.6 Output

5 THE DIALLEL TABLE ANALYSIS PROGRAM

5.1 The Data Presentation

5.2 An Example Presentation

5.3 The Specification Section

5.4 The Data Section

5.4.1 More Compact Data Arrangements

5.5 The Instruction Section

5.5.1 Instructions for Regression Analyses

5.5.2 Instructions for Diallel Analyses

5.6 Output

5.6.1 Output for Regression Analyses

5.6.2 Output for Diallel Analyses