Contact us Heritage collections Image license terms
HOME ACL Associates Technology Literature Applications Society Software revisited
Further reading □ OverviewInformation retrievalWord-count and concordance generator (COCOA)Subject indexesChemistry-related graphicsList processing systemArchaeological classificationCollocation and collocabilitySymbolic language for simulation (SOL)Prehistory and explanatory hypothesesChess program (MASTER)Himmelbett games
ACD C&A INF CCD CISD Archives Contact us Heritage archives Image license terms

Search

   
ACLApplicationsCocoa :: Non-Numerical Applications
ACLApplicationsCocoa :: Non-Numerical Applications
ACL ACD C&A INF CCD CISD Archives
Further reading

Overview
Information retrieval
Word-count and concordance generator (COCOA)
Subject indexes
Chemistry-related graphics
List processing system
Archaeological classification
Collocation and collocability
Symbolic language for simulation (SOL)
Prehistory and explanatory hypotheses
Chess program (MASTER)
Himmelbett games

Manipulation of Chemical Graphs by Computer

Judith Harrison

1968

As Miss Armitage mentioned in her paper in this collection, Dr. M. F. Lynch is currently directing two research projects at the Postgraduate School of Librarianship and Information Science in Sheffield. The second of these is concerned with chemical structures. There are about four million chemical compounds known at present; various techniques have been developed for storing and searching large files of these structures in order to identify particular molecules or to identify molecules containing given partial structures. Techniques for dealing with the reactions between compounds, however, have received less attention.[1][2] We hope to develop methods of analysis of such chemical reactions which will enable machine-readable files of reactions to be organized and searched from a variety of viewpoints.

Chemical reactions are generally represented in the form of a structural diagram :-

Figure 1

Figure 1
Full image ⇗
© UKRI Science and Technology Facilities Council

The compounds involved in the reaction can be regarded as connected sets with nodes and lines of different values, or graphs with a chemical meaning. When a chemical compound is involved in a reaction, a change takes place in its structure at the reaction site but, in general, a large part of the structure is unchanged. Thus the product and reactant structures or graphs have common fragments. Our analysis aims to find these common fragments by comparison of the graphs of the product and the reactant, and then to identify the reaction site by subtraction of these common fragments from the product and reactant structures. Initially we used two approaches. One was to generate all the common linear chains from the reactant and product graphs. The common fragments were rebuilt by overlaying these chains one on another in common ways. The second approach was to map one graph against the other starting from pairs of nodes that could be extended commonly in only one direction, that is, from possible terminal nodes of the common fragment. This latter approach seems more favourable. The algorithm has been developed and a program written by Mrs. J. E. Crowe also on the staff of the School. The program, which uses recursion, has performed successfully on both acyclic and cyclic compounds, and is now being tested more extensively.

We are also studying an alternative approach, which would be to go straight to the differences between the graphs, that is, the reaction sites, without having to find the common fragments first. This has been found to be possible for many but not all reactions by comparing the sets of bonded pairs on each side of the reaction equation. Figure 2 shows such a reaction analysis. A bonded pair is a node-connection-node triplet with all the surrounding connections specified. The skeletal reaction scheme is built up from the non-common and the extra bonded pair types (see Figure 2) on each side of the reaction equation. It is possible to predict when this approach will fail, from the result of the comparison of the bonded pair sets before the reaction scheme is rebuilt. It is probable that a reaction will be analysed first into bonded pair sets and only passed through the full fragment generation algorithm if failure is detected. By a combination of these two methods the reaction sites in the reactant and product structures can be obtained. These reaction sites form the skeletal reaction scheme for a particular reactant and product. Files of reactions can then be built up on the basis of these skeletal reaction schemes.

Figure 2

Figure 2
Full image ⇗
© UKRI Science and Technology Facilities Council

Sample reactions are collected from the current literature in the form of structural diagrams and input to the machine in the form of compact lists (see fig. 3). For each node in the compound, the node number, the node type, the lowest numbered node to which it is attached (zero for node 1), and the type of connection, are specified. For every ring present in the structure this leaves one connection unspecified, and these are specified in the ring closure list at the end of the compact list. The bonded pairs algorithm has been manually tested on a sample of 500 reactions. The program is currently being written and it is hoped to test both it and the fragment generation program on larger samples of about 2,000 reactions. The two methods of analysis can then be compared and evaluated.

Figure 3

Figure 3
Full image ⇗
© UKRI Science and Technology Facilities Council

The programming language we use is SLIP, a list-processing extension to FORTRAN. This was written by Weizenbaum [3] and implemented on Atlas by Don Russell.[4] The language has symmetric lists, that is they can be read forwards or backwards, a feature we have found advantageous.

Another topic in our chemical structure project is being studied by Dr. Town and Mrs. Crowe. This is the development of screens for searching large files of chemical compounds. During a search, the specific screens for the file are used to exclude rapidly all compounds in the file which do not contain the structural fragment being sought. Ideally each screen is present in only a small proportion of the file. The screen can be, for example, a molecular formula, an atom, a bond, a substructure, etc.

I would like to offer our thanks to the Office of Scientific and Technical Information for their financial support, to Chemical Abstracts Service for their interest and provision of data tapes, and to the Atlas Laboratory for their advice and excellent service.

References

1. Concerning one system of classification and codification of organic reactions. G. E. Vleduts. Inform. Stor. Retr. 19631,146-177.

2. Documentation of chemical reactions by computer analysis of structural changes. J. E. Armitage, J. E. Crowe, P. N. Evans, M. F. Lynch, LMcGuirk. J. Chem. Doc. 1967,7,209-215.

3. Symmetric list processor. J. Weizenbaum. Com. A.C.M. (1963) 6 524-544.

4. On the implementation of SLIP. D. B. Russell. Comm. A.C.M. (1965) 8, (5) Letter to the Editor.

⇑ Top of page
© Chilton Computing and UKRI Science and Technology Facilities Council webmaster@chilton-computing.org.uk
Our thanks to UKRI Science and Technology Facilities Council for hosting this site