In recent years there has been an increasing interest in the human sciences, which has brought about the gathering in great quantity of numerical and qualitative information to test hypotheses about human behaviour and organisation. The summarising and grouping of this data has been gradually mechanised over the years; in fact it was in connection with the American Census of Population of 1890 that the punched card was introduced. Until quite recently, however, to obtain a comprehensive breakdown of the data was expensive in time, money and effort. The effect of the advent of the large fast computer was evident from the interest shown in this topic by nearly all British universities when they were asked in 1964 what work would be done in the future on the Science Research Council's Atlas. In practice it has turned out that the need had been underestimated rather than overestimated, and there is an ever increasing demand.
Initially most users want a straightforward counting of items together with ratios and simple statistical measures (means and standard deviations) calculated for different groups of categories. Since the typical user is not a programmer and the problems of one user are very much like those of another, one of the great advances that has been made in the past few years is the appreciation of the need for some general purpose program which would deal with a wide variety of requests for tabulating and presenting statistical information, and which an investigator lacking detailed computer experience could use after only a little study. This need has been met by several different organisations in the United Kingdom but the Multiple Variate Counter (MVC) Program written by A. Colin of London University is perhaps the most widely used. It is the standard program used on the Science Research Council Atlas and does satisfy the demands made on it. Most important, it is well documented and the ways of using the program are easily understood by users without previous computer experience.
Such users are able to start work on their survey within a few days of first studying the program manual and, with some expert assistance in getting over initial difficulties, progress can be very rapid. At the present time twenty-two surveys are being worked on in the Atlas Laboratory over a wide range of subjects - medical research, criminology, agriculture, sociology - and there is some preliminary work being done on a further ten.
To take two examples: a large-scale survey has been made of the incidence of bronchitis amongst steel workers in South Wales, for the Welsh National School of Medicine, in which the records of 10,000 cases were analysed; and the Department of Economic History at Leicester is undertaking a study of population movements, living conditions and family circumstances for samples taken from census records going back to 1841.
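The straightforward counting that most users first ask for, with percentages and simple statistical measures for different groups, can be illustrated with a minimal sketch in a present-day language. This is not the MVC program or its input format; the field names (`group`, `answer`, `age`), the function names and the six records are all hypothetical and serve only to show the kind of table involved.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical survey records; the fields and values are invented for illustration.
records = [
    {"group": "north", "answer": "yes", "age": 30},
    {"group": "north", "answer": "no", "age": 40},
    {"group": "north", "answer": "no", "age": 50},
    {"group": "south", "answer": "yes", "age": 20},
    {"group": "south", "answer": "yes", "age": 40},
    {"group": "south", "answer": "no", "age": 60},
]


def tabulate(records, row_key, col_key):
    """Count records in each (row category, column category) cell."""
    counts = defaultdict(lambda: defaultdict(int))
    for record in records:
        counts[record[row_key]][record[col_key]] += 1
    return {row: dict(cols) for row, cols in counts.items()}


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    result = {}
    for row, cols in table.items():
        total = sum(cols.values())
        result[row] = {col: 100.0 * n / total for col, n in cols.items()}
    return result


def group_stats(records, group_key, value_key):
    """Return (count, mean, sample standard deviation) for each group."""
    values = defaultdict(list)
    for record in records:
        values[record[group_key]].append(record[value_key])
    return {
        group: (len(v), mean(v), stdev(v) if len(v) > 1 else 0.0)
        for group, v in values.items()
    }


table = tabulate(records, "group", "answer")
percentages = row_percentages(table)
stats = group_stats(records, "group", "age")

for group in table:
    print(group, table[group], percentages[group], stats[group])
```

The sample (n - 1) standard deviation is used here as one reasonable choice; a survey treating its records as a complete population would use the population form instead.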
As experience is gained using the program, improvements are being made in several directions. First, the monitoring of the data, to check the validity of the recording and the consistency of the responses between different classes of questions, has been significantly improved. Second, the ability to carry out further work on assembled tables and distributions has been extended; this second feature is particularly significant because of the interest in on-line console operation, which will be increasingly important in the future. The MVC program will form the basis for immediate enquiry and response for tables and analyses of various kinds.
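The two kinds of monitoring mentioned above, validity of the recording and consistency between different classes of questions, can be sketched as follows. The rules, field names and code values are assumptions made for the example and are not taken from the MVC program: a validity check confirms that each coded answer is one of the permitted codes, and a consistency check compares the answers to two related questions.

```python
# Hypothetical permitted codes for two questions.
VALID_CODES = {
    "sex": {"M", "F"},
    "has_children": {"yes", "no"},
}


def check_record(record):
    """Return a list of problems found in one record (empty if it is clean)."""
    problems = []

    # Validity: every coded answer must be one of the permitted codes.
    for field, allowed in VALID_CODES.items():
        if record.get(field) not in allowed:
            problems.append(f"{field}: invalid code {record.get(field)!r}")

    # Validity: the number of children must be a non-negative whole number.
    n = record.get("n_children")
    if not isinstance(n, int) or n < 0:
        problems.append("n_children: not a non-negative whole number")
    # Consistency: the two questions about children must agree.
    elif record.get("has_children") == "no" and n > 0:
        problems.append("has_children is 'no' but n_children is above zero")
    elif record.get("has_children") == "yes" and n == 0:
        problems.append("has_children is 'yes' but n_children is zero")

    return problems


def split_records(records):
    """Separate clean records from rejected ones, keeping the reasons."""
    accepted, rejected = [], []
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            rejected.append((index, problems))
        else:
            accepted.append(record)
    return accepted, rejected


# Invented sample: one clean record, one invalid code, one inconsistency.
sample = [
    {"sex": "M", "has_children": "yes", "n_children": 2},
    {"sex": "X", "has_children": "no", "n_children": 0},
    {"sex": "F", "has_children": "no", "n_children": 3},
]

accepted, rejected = split_records(sample)
for index, problems in rejected:
    print(f"record {index} rejected: {problems}")
```

Keeping the reasons alongside each rejected record makes it possible to trace a fault back to the original form rather than simply discarding the input.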
Now that the problem of getting rapid results is being solved, the necessity for accurate and careful attention to the original data is becoming clear. The design of questionnaires so that ambiguous and inconsistent replies are not possible is becoming more important, since the computer program is being made more powerful in carrying out numerous and sometimes elaborate checks on the data: it is inefficient to have a lot of input rejected by these tests. It is also important to design the questionnaires in such a way that data can easily be recorded on a suitable computer input medium. A badly designed form can lead to many difficulties in transcribing data to, say, punched cards; and experience has shown that more time can be spent on correcting and checking faulty data than is needed to deal with the subsequent analysis. Another problem which needs attention is the order in which summarised information is required. The ease with which the investigator can now get several hundred tables of counts and percentages in a few minutes can lead to his being overwhelmed with material, and some care is necessary in developing a suitable strategy to get the best out of the enormous power now at our disposal.
The use of a large fast computer has been stressed above as necessary for this kind of work. Analyses can be carried out on small machines, but the larger machines have real advantages.
With such progress in dealing with the mechanics of elaborate data processing on a large scale, research workers using sample surveys will have to become much better educated in the use of mathematical and statistical methods. There is a danger that the lack of mathematical training in the human sciences will lead to an ever increasing gap between what can be done and the ability of the social scientist to take advantage of the new tool that he has at his disposal.