
ASCOP - A Statistical Computing Procedure

Brian Cooper

1966

STATISTICIANS have long dreamt of being able to perform, quickly and easily, any analysis of data that occurs to them, whether this be an accepted analysis or a new analysis suggested by the data during the course of analysis. Until a few years ago statisticians were restricted to those analyses that could be performed using the strictly limited resources of a desk calculating machine, and an imaginative approach to data analysis was not possible. The increase in computing power during the last few years should make possible a completely new approach to data analysis. So far this has not happened. The needs of the statistician are very varied and none of the programs written so far possesses the considerable flexibility that is necessary to satisfy these needs. Within the last two years a number of statistical systems capable of performing a number of standard analyses have been written. These permit certain sequences of analyses to be performed without re-presentation of the data, and they have the advantage that the data is prepared in the same way for all analyses. Although these systems are making the work of the data analyst easier and are encouraging the more thorough analysis of data, they are found wanting in three respects: in ease of use, both for the statistician and for the experimentalist, who should also be encouraged to look more closely at his data; in the number and variety of analyses that may be performed; and in the ability to build up new analyses as sequences of instructions known to the system. For example, a number of systems can perform a regression analysis but none of them can reference the coefficients of the fitted function in a later analysis. ASCOP is a large system which in its first version suffered from the same faults as all the other systems, although it was easier to use than most. The second version, currently being debugged, is a major revision and attempts to allow much greater flexibility to the user.

An ASCOP Program

An ASCOP program consists of a sequence of instructions of two main types. Instructions of the first type are equations specifying arithmetic operations on variables, parameters (single values), and coefficients. New variables, parameters, and coefficients may be created and referred to in subsequent instructions of either type. Instructions of the second type are English-like sentences or phrases specifying particular analyses or making declarations to the system. Instructions of this type may similarly define new variables, parameters, and coefficients, which may be referred to later in instructions of either type. Both types of instruction may be labelled, and branching statements referring to these labels are allowed. This enables the user to make the performance of some analyses and arithmetic operations conditional on the satisfaction of one or more criteria. A number of data editing operations are available, including the amalgamation of several sets of data, the selective inclusion of points in a new set of data, and the inclusion of certain parameters, defined in an analysis, as a point in a new set of data. It is also possible to define subroutines made up of ASCOP instructions and equations and to call these many times over; their definition and call are very similar to those in FORTRAN. When a disc becomes available on ATLAS it will be possible to have a set of standard subroutines stored on the disc and hence available on call to ASCOP users. It will also be possible for users to add to the standard set or, of course, to their own private sets. Instructions are available to allow the user to specify that certain sets of data, including data derived during analysis, be written onto a private output tape in a form that can be presented again to ASCOP at a later time.

The Data Matrix

The basic organisational unit of data in ASCOP is the data matrix. The rows of a data matrix are referred to as POINTS and the columns as VARIABLES. A variable may occupy more than one column of the data matrix, and the number of columns for a variable is referred to as its replication. If variable A is replicated twice there will be two values of A in each point, that is, in each row of the matrix. A certain completeness of the data is thus implied, but missing values are allowed to accommodate incomplete data. Because variables may be replicated, it becomes possible to refer to point means, variances, standard deviations and numbers of replicates. Such references are allowed in arithmetic operations and in analyses. Arithmetic operations may also refer to a label associated with each point; the label may be read with the data or generated as the data is read.
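The point statistics for a replicated variable can be illustrated with a short sketch. It assumes, purely for illustration, that a point is held as a dictionary mapping each variable name to a list of its replicate values, with None standing for a missing value; the names point_stats and MISSING are hypothetical and are not part of ASCOP.

```python
import math

MISSING = None  # hypothetical marker for a missing value


def point_stats(point, name):
    """Point mean, variance, standard deviation and number of replicates
    for one replicated variable, skipping missing values."""
    values = [v for v in point[name] if v is not MISSING]
    n = len(values)
    if n == 0:
        return {"mean": None, "var": None, "sd": None, "n": 0}
    mean = sum(values) / n
    if n == 1:
        # A single replicate gives a mean but no estimate of spread.
        return {"mean": mean, "var": None, "sd": None, "n": 1}
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return {"mean": mean, "var": variance, "sd": math.sqrt(variance), "n": n}


# One point (row): variable A has 4 replicates, one of them missing,
# variable B has a single replicate, and the point carries a label.
point = {"label": 17, "A": [2.0, 4.0, MISSING, 6.0], "B": [1.5]}
print(point_stats(point, "A"))  # {'mean': 4.0, 'var': 4.0, 'sd': 2.0, 'n': 3}
```

The divisor n - 1 is the usual unbiased choice; whether ASCOP used it is not stated in the text.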

Data matrices are most commonly read from cards, but they may also be generated from other data matrices using edit operations, or generated using the random variable generation functions available as part of the arithmetic operations. Arithmetic operations may be used to define new variables at the reading stage or at the editing stage, and the inclusion of points in the data matrix may be made conditional on the values of the variables involved. Thus matrices may be formed containing only those points that show specified properties. Data to be analysed in several different arrangements need be presented to the system only once, the reorganisation being achieved using edit operations.

ASCOP Analysis Instructions

ASCOP analysis instructions are made up of units of information, each introduced by a particular word. A unit of information may be a list of numbers and words, or a single word whose presence has meaning. One particular unit of information defines the type of analysis to be performed and must appear first in the instruction. Other units may appear in any order, although some orders read more naturally as English than others. The analyses currently included in the ASCOP system are very briefly described below:

1. Read a data matrix

READ DATA MATRIX BEC 2 VARIABLE NAMES A B C D POINTS 84
LABEL IN POSITION 3 IGNORE ITEMS 1 AND 2 REPLICATES 4 1 1 1

This instruction would read a data matrix consisting of 84 points for each of 4 variables, the first of which has 4 replicates and the remainder 1 each. The first 2 items on each data card would be ignored and the 3rd would be taken as the point label. If the number of points is unknown, the word POINTS is omitted. The data itself may also be read from another input stream by adding STREAM n, where n is the stream number, to the above instruction. If this stream is a binary tape the word BINARY would also be included, probably, but not necessarily, before the word STREAM.
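The card layout described by the READ instruction can be sketched as follows. This assumes one card per point, items numbered from 1, and every item that is neither ignored nor the label being a data value taken in variable order; parse_card and read_matrix are hypothetical names, and missing-value handling is left out for brevity.

```python
def parse_card(items, names, replicates, ignore, label_pos):
    """Split one card's items into a point label and per-variable replicate lists.

    items      -- the card's items in order (positions numbered from 1)
    names      -- variable names, e.g. ["A", "B", "C", "D"]
    replicates -- replication of each variable, e.g. [4, 1, 1, 1]
    ignore     -- positions to skip (IGNORE ITEMS)
    label_pos  -- position of the point label (LABEL IN POSITION)
    """
    label = items[label_pos - 1]
    skip = set(ignore) | {label_pos}
    data = [float(x) for pos, x in enumerate(items, start=1) if pos not in skip]
    expected = sum(replicates)
    if len(data) != expected:
        raise ValueError(f"expected {expected} values, got {len(data)}")
    point, i = {"label": label}, 0
    for name, r in zip(names, replicates):
        point[name] = data[i:i + r]  # the r replicate columns of this variable
        i += r
    return point


def read_matrix(cards, names, replicates, ignore, label_pos, points=None):
    """Read the given number of points, or every card when POINTS is omitted."""
    selected = cards if points is None else cards[:points]
    return [parse_card(c, names, replicates, ignore, label_pos) for c in selected]


# Items 1 and 2 ignored, item 3 the label, then 4 replicates of A and one value
# each of B, C and D, as in the READ example.
card = ["X1", "99", "17", "2.0", "4.0", "3.0", "6.0", "1.5", "0.2", "7"]
print(parse_card(card, ["A", "B", "C", "D"], [4, 1, 1, 1], ignore=[1, 2], label_pos=3))
# {'label': '17', 'A': [2.0, 4.0, 3.0, 6.0], 'B': [1.5], 'C': [0.2], 'D': [7.0]}
```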

2. Output detailed summaries for some or all variables.

OUTPUT DETAILED SUMMARIES FOR ALL VARIABLES EXCEPT A

This instruction causes the output of a detailed analysis of each variable specified including the first 4 moments, moment ratios, serial correlations, runs up and down distributions, and histograms with fitted normal distributions and assessment of fit.
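Several of the quantities listed above can be computed in a few lines. The sketch below assumes central moments with divisor n, the moment ratios beta1 = m3²/m2³ and beta2 = m4/m2², a lag-1 serial correlation about the overall mean, and runs up and down counted as maximal rising or falling stretches with ties skipped; these are common textbook definitions, not necessarily the exact ones ASCOP used. Histograms and the fitted normal distribution are omitted, and the data must not be constant.

```python
def detailed_summary(xs):
    """Moments, moment ratios, lag-1 serial correlation and the distribution
    of runs up and down for one variable (assumes xs is not constant)."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments with divisor n.
    m2, m3, m4 = (sum((x - mean) ** k for x in xs) / n for k in (2, 3, 4))
    beta1 = m3 ** 2 / m2 ** 3  # skewness ratio
    beta2 = m4 / m2 ** 2       # kurtosis ratio (3 for a normal distribution)
    # Lag-1 serial correlation about the overall mean.
    r1 = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / (n * m2)
    # Direction of each step (True = up), skipping ties.
    directions = [b > a for a, b in zip(xs, xs[1:]) if b != a]
    # Count maximal runs of equal direction, keyed by run length.
    run_lengths = {}
    length = 1
    for prev, cur in zip(directions, directions[1:]):
        if cur == prev:
            length += 1
        else:
            run_lengths[length] = run_lengths.get(length, 0) + 1
            length = 1
    if directions:
        run_lengths[length] = run_lengths.get(length, 0) + 1
    return {"mean": mean, "moments": (m2, m3, m4), "beta1": beta1,
            "beta2": beta2, "r1": r1, "runs": run_lengths}


summary = detailed_summary([1, 3, 2, 4, 5, 3])
print(summary["runs"])  # {1: 3, 2: 1}: three runs of length 1, one of length 2
```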

3. Regression analysis.

REGRESSION OF A ON VARIABLES B C AND D

This instruction asks for the best description of A as a linear function of B, C and D.

REGRESSION OF A ON BEST 2 VARIABLES

This instruction asks for the best description of A as a linear function of 2 of the remaining variables. Omission of the word BEST causes output of all possible regressions describing A by 2 of the remaining variables.
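The difference between the BEST and all-possible forms of the instruction can be sketched by fitting every subset of k candidate variables. The sketch takes "best" to mean the smallest residual sum of squares, which for a fixed number of variables is the same as the largest multiple correlation; that criterion, the constant term, and the function names are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def regress(y, columns):
    """Least-squares fit of y on the given columns plus a constant term.
    Returns (coefficients, residual sum of squares)."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, float(resid @ resid)


def regressions_on_k(y, candidates, k):
    """Every regression of y on k of the candidate variables, best fit first.
    ON BEST k VARIABLES corresponds to the first entry; omitting BEST
    corresponds to the whole list."""
    fits = []
    for subset in combinations(sorted(candidates), k):
        coef, rss = regress(y, [candidates[v] for v in subset])
        fits.append((rss, subset, coef))
    fits.sort(key=lambda fit: fit[0])  # sort on the residual sum of squares only
    return fits


B = np.array([0, 1, 2, 3, 4, 5], dtype=float)
C = np.array([1, 0, 2, 1, 3, 2], dtype=float)
D = np.array([5, 3, 4, 1, 2, 0], dtype=float)
A = 1 + 2 * B - C  # A depends exactly on B and C
fits = regressions_on_k(A, {"B": B, "C": C, "D": D}, 2)
rss, best_subset, coef = fits[0]  # ('B', 'C'), constant 1, slopes 2 and -1
```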

4. Components and factor analyses.

COMPONENTS ANALYSIS USING ALL VARIABLES EXCEPT A 
FACTOR ANALYSIS WITH 3 FACTORS AND USING VARIABLES B AND C
AND D
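A components analysis can be sketched as an eigen-decomposition of the correlation matrix of the chosen variables. The text does not say whether ASCOP works from the correlation or the covariance matrix, so the use of correlations here is an assumption, and factor analysis, which needs an iterative fit, is not sketched.

```python
import numpy as np


def components(data, variables):
    """Principal components of the named variables from their correlation matrix.
    Returns eigenvalues (largest first), coefficient vectors (as columns) and
    the component values for every point."""
    X = np.column_stack([np.asarray(data[v], dtype=float) for v in variables])
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardised variables
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, Z @ eigvecs


# USING ALL VARIABLES EXCEPT A, with B, C and D as the remaining variables.
data = {"B": [0, 1, 2, 3, 4, 5], "C": [1, 0, 2, 1, 3, 2], "D": [5, 3, 4, 1, 2, 0]}
vals, coefs, scores = components(data, ["B", "C", "D"])
```

The eigenvalues sum to the number of variables, and the variance of each column of component values equals the corresponding eigenvalue.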

5. Analysis of variance of a complete factorial experiment.

DIMENSIONS 2 DOSES 6 EXPERIMENTS 5 TREATMENTS 
ANOVA OF VARIABLE A FOR EXPERIMENTS 4 AND 5 AND OMITTING
TREATMENT 5

The first instruction declares to the system the number of factors, the number of levels of each factor, and the factor names. This information remains available to the system until the next declaration and is referenced by all ANOVA instructions. This illustrates the ASCOP policy of placing structure on variables only when necessary, so that the structure may readily be changed as required. The second instruction selects an analysis of variance with factors and levels given by the previous DIMENSIONS statement. The word FOR selects separate analyses for levels 4 and 5 of the factor EXPERIMENTS, so that two two-way analyses will be performed. The word OMITTING (or OMIT) causes the deletion of specified factor levels from the analysis. It should be noted that the order of the analysis can be reduced by OMITTING all but one level of a factor.
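The effect of FOR and OMITTING can be sketched on a three-factor array. Two assumptions are made for illustration: the DIMENSIONS line is read as 2 levels of DOSES, 6 of EXPERIMENTS and 5 of TREATMENTS, and there is one observation per cell, so each two-way analysis has no separate interaction term. The names two_way_anova and anova_for are hypothetical.

```python
import numpy as np


def two_way_anova(table):
    """Sums of squares and degrees of freedom for a rows x columns table
    with one observation per cell (residual = interaction)."""
    r, c = table.shape
    grand = table.mean()
    ss_rows = float(c * ((table.mean(axis=1) - grand) ** 2).sum())
    ss_cols = float(r * ((table.mean(axis=0) - grand) ** 2).sum())
    ss_total = float(((table - grand) ** 2).sum())
    return {"rows": (ss_rows, r - 1), "cols": (ss_cols, c - 1),
            "residual": (ss_total - ss_rows - ss_cols, (r - 1) * (c - 1)),
            "total": (ss_total, r * c - 1)}


def anova_for(cube, for_levels, omit_levels):
    """cube[dose, experiment, treatment]; levels numbered from 1 as in ASCOP.
    One DOSES x TREATMENTS analysis per selected experiment, with the
    omitted treatment levels dropped."""
    keep = [t for t in range(cube.shape[2]) if t + 1 not in omit_levels]
    return {e: two_way_anova(cube[:, e - 1, keep]) for e in for_levels}


# ANOVA OF VARIABLE A FOR EXPERIMENTS 4 AND 5 AND OMITTING TREATMENT 5
cube = np.arange(60, dtype=float).reshape(2, 6, 5)  # illustrative, purely additive data
results = anova_for(cube, for_levels=[4, 5], omit_levels={5})
```

Each result is a 2 x 4 analysis (doses by the four remaining treatments); for this additive test array the residual sum of squares is zero.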

6. Diallel Table Analyses.

DIALLEL TABLE ANALYSIS OF VARIABLES A AND B PARENT1 
AND BLOCKS 4

This instruction assumes that the data has the structure implied by the analysis and by the values of the structure parameters PARENTS and BLOCKS. The diallel structure is thus relevant only to this analysis, which is another example of the structure-free policy. Two types of analysis are possible, referred to as the HAYMAN and JINKS analyses. Both are performed unless one of these names appears in the instruction, in which case only that analysis is performed.
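The HAYMAN and JINKS analyses themselves are not reproduced here. As a hedged illustration of the kind of quantity they work with, the sketch below computes, for each array (row) of a complete diallel table, the array variance Vr and the covariance Wr between the array and the non-recurrent parents, the statistics plotted in the Hayman-Jinks Wr/Vr graph. It assumes a single p x p table of means with blocks already averaged and the parents on the diagonal; array_statistics is a hypothetical name.

```python
import numpy as np


def array_statistics(table):
    """Vr and Wr for each array of a complete p x p diallel table.
    table[i, j] is the mean of the cross i x j; the diagonal holds the parents."""
    table = np.asarray(table, dtype=float)
    parents = np.diag(table)
    vr = table.var(axis=1, ddof=1)  # variance within each array
    wr = np.array([np.cov(row, parents, ddof=1)[0, 1] for row in table])
    return vr, wr


# A purely additive table: each cross is the mean of its two parents.
a = np.array([2.0, 4.0, 6.0, 8.0])
table = (a[:, None] + a[None, :]) / 2
vr, wr = array_statistics(table)  # every Vr is 5/3 and every Wr is 10/3 here
```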

7. Edit and file-keeping operations.

START DATA MATRIX BEC 6 VARIABLE NAMES A B C D
ADD 24.5 39.47 84 AND PA TO DATA MATRIX BEC 6
ADD POINTS FROM STREAM 4 TO DATA MATRIX BEC 6
ADD POINTS FROM DATA MATRIX BEC 4 TO DATA MATRIX BEC 6
   A = LOG(A) 
   IF (B-C) CONTINUE, CONTINUE, OMIT, ERROR 
COMPLETE AND SAVE DATA MATRIX BEC 6

These instructions illustrate some of the editing operations that can be performed. The first instruction declares a new data matrix to the system. The second adds one point to the new matrix, consisting of the three numerical values given as the values of variables A, B and C, and the value of parameter PA (declared as such by the instruction PARAMETER PA somewhere earlier in the program) as the value of variable D. The third instruction reads data from the specified input stream (taken as the main stream if none is given). The fourth instruction, together with the equations following it, selects points from data matrix BEC 4 and adds them to the new data matrix if the value of C is not less than that of B. The IF statement branches on the value of B-C: its first three successors are taken when the value is negative, zero or positive respectively, as in FORTRAN, and the fourth is taken if the value is undefined, that is, if either B or C is missing. Any of the successors may be labels referring to other equations. Finally, the last instruction causes the new data matrix to be completed and made available for other analyses. The word SAVE instructs ASCOP to write the completed matrix onto the user's private output tape.
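The selection performed by the fourth instruction and its equations can be sketched as a four-way branch. Points are assumed to be dictionaries with None for a missing value; taking the logarithm of a missing or non-positive A to give a missing value, and treating ERROR as "record the point and skip it", are assumptions, since the text does not say what ERROR does. The function names are hypothetical.

```python
import math


def arithmetic_if(value, negative, zero, positive, undefined):
    """FORTRAN-style three-way branch on the sign of value, plus a fourth
    successor taken when the value is undefined (None here)."""
    if value is None:
        return undefined
    if value < 0:
        return negative
    if value == 0:
        return zero
    return positive


def add_points(source, target):
    """Sketch of ADD POINTS FROM DATA MATRIX BEC 4 TO DATA MATRIX BEC 6 with
    A = LOG(A) and IF (B-C) CONTINUE, CONTINUE, OMIT, ERROR.
    Returns the labels of points that reached the ERROR successor."""
    errors = []
    for point in source:
        p = dict(point)  # leave the source matrix unchanged
        a = p["A"]
        # Assumption: LOG of a missing or non-positive value is missing.
        p["A"] = math.log(a) if a is not None and a > 0 else None
        b, c = p["B"], p["C"]
        diff = None if b is None or c is None else b - c
        action = arithmetic_if(diff, "CONTINUE", "CONTINUE", "OMIT", "ERROR")
        if action == "CONTINUE":
            target.append(p)  # C is not less than B
        elif action == "ERROR":
            errors.append(p["label"])  # hypothetical handling: record and skip
    return errors


bec4 = [{"label": 1, "A": 1.0, "B": 1.0, "C": 2.0},
        {"label": 2, "A": math.e, "B": 3.0, "C": 3.0},
        {"label": 3, "A": 1.0, "B": 5.0, "C": 2.0},
        {"label": 4, "A": 1.0, "B": None, "C": 2.0}]
bec6 = []
errors = add_points(bec4, bec6)  # points 1 and 2 kept, 3 omitted, 4 in error
```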

FORM DATA MATRIX BEC 8 FROM DATA MATRICES BEC 2 AND BEC 3

This instruction forms a new matrix by amalgamating two others. Where appropriate, an edit instruction may be qualified by equations defining new variables, parameters, and coefficients, as well as by phrases such as EXCLUDE POINTS 2, 5, 23-45 AND 87 or OMIT VARIABLE A.
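Amalgamation qualified by an EXCLUDE POINTS phrase can be sketched as below. The sketch assumes that point numbers refer to positions, counted from 1, in the amalgamated matrix, and that ranges are written without spaces around the hyphen; both are assumptions, as are the function names.

```python
def parse_point_list(text):
    """Turn a phrase such as '2, 5, 23-45 AND 87' into a set of point numbers."""
    points = set()
    for token in text.replace(",", " ").split():
        if token.upper() == "AND":
            continue
        if "-" in token:
            lo, hi = (int(x) for x in token.split("-"))
            points.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            points.add(int(token))
    return points


def form_matrix(*sources, exclude=""):
    """Amalgamate source matrices (lists of points) and drop excluded points."""
    drop = parse_point_list(exclude)
    combined = [p for src in sources for p in src]
    return [p for i, p in enumerate(combined, start=1) if i not in drop]


excluded = parse_point_list("2, 5, 23-45 AND 87")  # 26 point numbers
bec8 = form_matrix([10, 20, 30], [40, 50], exclude="2 AND 4")  # [10, 30, 50]
```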

8. Discriminant analysis.

DISCRIMINATE BETWEEN DATA MATRICES BEC 4 AND 5 USING A B C AND D

Two-population discriminant analyses are treated separately from those involving more than two populations, the former computing one linear function of the variables and the latter computing one linear function for each population.
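For the two-population case, the classical linear discriminant function can be sketched as follows. The text does not say which form ASCOP computes; this sketch uses Fisher's function with a pooled within-population covariance matrix and places the decision value midway between the two population means of the function, which assumes equal prior probabilities and costs. The function names are hypothetical.

```python
import numpy as np


def two_population_discriminant(X1, X2):
    """Coefficients of the linear discriminant function and its decision value.
    X1, X2: points (rows) by variables (columns) for the two populations."""
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled within-population covariance matrix.
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    coef = np.linalg.solve(S, m1 - m2)
    decision = coef @ (m1 + m2) / 2  # midpoint of the two population means
    return coef, decision


def classify(x, coef, decision):
    """Assign a new observation to population 1 or 2."""
    return 1 if np.asarray(x, dtype=float) @ coef > decision else 2


bec4 = [[4, 4], [5, 4], [4, 5], [5, 5]]
bec5 = [[0, 0], [1, 0], [0, 1], [1, 1]]
coef, dec = two_population_discriminant(bec4, bec5)  # coef (12, 12), decision 60
```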

9. Definition of new variables, parameters, and coefficients.

REGRESSION, COMPONENTS, FACTOR, ANOVA, and DISCRIMINANT analysis instructions can be followed by a NAMES instruction, which asks for the creation of new variables, parameters, and coefficients. The form of the instruction varies according to the analysis to which it refers. The NAMES instruction can set up variables consisting of the fitted values or residuals in regression analyses and analyses of variance, and of the values of the components or factors in components and factor analyses. It can also be used to set up the coefficients of the various functions computed in regression, components, factor, and discriminant analyses. The decision value in a two-population discriminant analysis can be set up as a parameter. Example instructions for regression, components, factor, variance, and discriminant analyses respectively are given below.

NAME FITTED VALUE AFIT RESIDUALS ARES AND COEFFICIENTS ACOF
NAME COMPONENTS CA CB AND COEFFICIENTS COFA AND COFB
NAME FACTORS FA FB AND COEFFICIENTS COFA AND COFB

NAME ARES = DATA - DOSES - DOSES - TREATMENTS 
NAME COEFFICIENTS DISC AND DECISION VALUE DEC
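What the regression form of the NAME instruction achieves can be sketched with a simple workspace in which analyses deposit named results for later instructions. The workspace layout (separate dictionaries of variables, coefficients and parameters), the constant term in the fit, and the function name are all hypothetical.

```python
import numpy as np


def name_regression(workspace, y_name, x_names, fitted, residuals, coefficients):
    """Fit y_name on x_names (with a constant term) and store the fitted values
    and residuals as new variables and the coefficients under the given names."""
    variables = workspace["variables"]
    y = np.asarray(variables[y_name], dtype=float)
    X = np.column_stack([np.ones(len(y))]
                        + [np.asarray(variables[v], dtype=float) for v in x_names])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    variables[fitted] = X @ coef
    variables[residuals] = y - X @ coef
    workspace["coefficients"][coefficients] = coef
    return workspace


# REGRESSION OF A ON VARIABLES B, then
# NAME FITTED VALUE AFIT RESIDUALS ARES AND COEFFICIENTS ACOF
ws = {"variables": {"A": [1.0, 3.0, 5.0, 7.0], "B": [0.0, 1.0, 2.0, 3.0]},
      "coefficients": {}, "parameters": {}}
name_regression(ws, "A", ["B"], "AFIT", "ARES", "ACOF")
```

Later instructions can then refer to AFIT, ARES and ACOF like any other variable or coefficient set, which is exactly the capability the introduction notes is missing from other systems.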

10. Computing the value of a linear function of specified variables, with coefficients set up by a previous analysis or by earlier equations. An example instruction is:

DEFINE VA AS LINEAR FUNCTION OF A B C AND D
USING COEFFICIENTS COFA

VA in this instruction may be a variable or a parameter, since these instructions are treated as equations. If VA is needed only as a temporary value, a parameter is adequate for this purpose.
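When VA is a variable, the instruction amounts to evaluating the linear function at every point. The sketch assumes that the coefficient set holds one coefficient per named variable, in the same order and with no constant term, and that a missing value in any variable makes the result missing; these conventions and the function name are assumptions.

```python
def define_linear_function(points, variables, coefficients):
    """Value of sum(c_i * x_i) at every point; None where any x_i is missing."""
    if len(variables) != len(coefficients):
        raise ValueError("one coefficient per variable is required")
    values = []
    for point in points:
        xs = [point[v] for v in variables]
        if any(x is None for x in xs):
            values.append(None)  # propagate missing values
        else:
            values.append(sum(c * x for c, x in zip(coefficients, xs)))
    return values


# DEFINE VA AS LINEAR FUNCTION OF A B C AND D USING COEFFICIENTS COFA
points = [{"A": 1, "B": 2, "C": 3, "D": 4}, {"A": 1, "B": None, "C": 0, "D": 0}]
COFA = [1.0, 0.5, -1.0, 2.0]  # illustrative coefficient values
VA = define_linear_function(points, ["A", "B", "C", "D"], COFA)  # [7.0, None]
```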

The Future

At present ASCOP works in the batch processing mode, taking instructions sequentially without intervention from the user. ASCOP has, however, been written with a view to the interactive mode made possible by remote consoles, and the English nature of the instructions is particularly important in this form of operation. Bulk storage on discs and the availability of remote consoles will enable the statistician to have his data available for analysis as soon as he wishes to study it. He may ask for analyses simply by typing sentences of the type illustrated above, he will be available to answer questions put to him by a later version of ASCOP, and he will be able to request further information as the need for it becomes apparent during the course of analysis. He will be able to try several analyses until he has built up an adequate picture of his data. This whole process will not have to be performed in one session: the statistician will be able to store his results at each stage while he thinks, perhaps for a few days, about his problem. He will be able to try sequences of regression analyses, for example, when searching for an adequate description of a particular variable. He will be able to carry out a similar process using discriminant analysis, searching for an adequate discriminant function involving as few variables as possible. Having found such a function, he will be able to use it to divide further observations into the different populations. He will be able to judge the advantages of the transformation of variables provided by a components analysis before deciding on the next stage of setting up the first few components. At present this must be done either in two runs or by deciding before the results of the analysis are seen. The potential of ASCOP, even in its present form, operating in the interactive mode is extremely exciting. The next developments of ASCOP will be aimed at making such use as convenient as possible, in anticipation of the availability of the necessary equipment.

The addition of further operations is not difficult, and the addition of quantal response, canonical correlation, polynomial regression and regression-within-groups analyses is already planned. Additional smaller operations, such as a PRINT statement allowing the user to arrange additional output on another stream, are also being designed. ASCOP will continue to develop towards what is needed to make the statistician's dream come true.

ACKNOWLEDGEMENTS

I would like to acknowledge the machine time allowed to me by Bell Telephone Laboratories Inc. and by the Health Sciences Computing Facility at UCLA for the debugging phase of the first version of ASCOP. The analysis of variance chapter was written by Mr. T. Gover, the factor analysis chapter by Mr. P. Charlton, and the discriminant analysis chapter by Miss S. Williams, all of this laboratory.