Chilton::INF::Implementing STEP

Implementing the emerging ISO Standard STEP into a relational database

Deborah Thomas

July 1990

Proceedings of the 8th British National Conference on Databases

Abstract

STEP is the emerging International Standard for exchange of product model data. This standard will cover all areas of engineering data throughout the life-cycle of a product. The methodology being used in its creation is based on separating the logical view of the data from the physical exchange file. The logical data model which forms the basis of STEP can also be used to generate database schemas which could aid applications integration. This paper presents the work done at the Rutherford Appleton Laboratory in implementing STEP into a relational database.

1. Introduction

In database applications, engineering data has lagged behind areas such as finance and administration. Any implementations which have been done have been specific to a certain engineering application area. There are many engineering packages on the market some of which have their own database but more which use flat files in a native format to store data between runs. These work well when the package is used in isolation but more and more there is a need for these to communicate both between the same type of package e.g. CAD (Computer Aided Design) systems at a company and its subcontractor, and between different types of package e.g. CAD and CAM (Computer Aided Manufacture).

Data exchange standards which specify neutral files go some way to solving this problem but a better solution would be a neutral database. This could be accessed by all the different applications packages. If the database conformed to a neutral schema then each package would have just one interface to read from the database and one to write to it. It would be easier to archive data and to find what data is available. Each package could obtain just the information it needed from the database.

There is an emerging International Standard for exchange of engineering data known as STEP. This paper presents the work done at the Rutherford Appleton Laboratory to determine whether this standard can be used as a basis for a neutral base for integration of engineering applications.

2. The STEP Methodology

The STandard for Exchange of Product model data, known colloquially as STEP, is being developed by the ISO committee TC184/SC4/WG1.STEP was issued as ISO Draft Proposal in December 1988 and a second issue is expected early in 1990. This standard is intended to cover exchange of all data which completely define an engineering product for all applications over its expected life cycle. STEP includes areas of general use, e.g. geometry, topology, presentation and features, and specific application areas, e.g. AEC (Architecture, Engineering and Construction), finite element analysis (FEA), electrical and drafting.

The development of this standard has used a methodology which includes the use of reference models, formal definition languages and a three layer architecture application, logical and physical). Many application area models have been developed by specialised committees, using a variety of modelling languages. These collectively form the applications layer. Once a data model is believed, by the relevant committee, to be complete and correct, it is integrated into the single Integrated Product Information Model (IPIM). In the process of integration, entities which represent the same idea in different application areas are removed from any single area, placed in a part called resources and made available to all areas. This IPIM forms the logical layer and is the core of the standard.

At present, the neutral file for data exchange is the only physical implementation layer in STEP. This neutral file is not defined as an absolute thing but is defined by mapping from the IPIM. This methodology of separating the logical data model from the physical implementation means that other physical implementations may be created simply by defining a new mapping from the logical layer.

3. Applications Integration

Applications integration requires the ability for different engineering packages, e.g. CAD, FEA, CAM, to communicate. At the user level this involves design environments and UIMS (User Interface Management Systems). However, these are not sufficient; they present a common interface to the user but the packages may still not be able to share data. At present STEP can achieve a limited degree of application integration through the use of neutral files.

The software to interface between the application packages and the neutral file can be split into two parts. First there is general software for reading and writing the neutral file; this takes care of such things as checking that the neutral file conforms to the syntax, that each entity exists in the data model and has the correct number and type of attributes. Also required is software specific to an application. This will take care of transforming entities and attributes from the form required by the neutral file to that specified by the application and vice versa. For example converting the specification of a circle from three points on the circumference to a centre point and a radius. Figure 1 illustrates this exchange of data.

Figure 1: Exchange of data between applications and neutral file

A better method of integration would be to enable data sharing by using a common database. However, having one database which several applications use for their data storage does not help applications integration if each package uses the database in its own way. It is necessary to impose a standard database schema so that each package knows in exactly what form it has to store its data and in what form it may find the data it requires. Of course, each enterprise could write their own data model for their business and achieve integration within the enterprise. However, more interfaces are then needed to deal with exchange formats when it comes to exchanging data with contractors and customers.

STEP is on the way to becoming an International Standard and is being encouraged by the US DoD, so if the STEP data model is used for both applications integration and data exchange then software requirements become much simpler. Figure 2 shows how a database based on a standard schema could work.

Figure 2: Exchange of data between applications and neutral database

In this method the Neutral File can be just like another application. When a neutral file is received it may be read directly into the database and any of the applications may then access the data as normal. It can be seen that this is very similar to figure 1, with specific interfaces for each application and general software for accessing the database.

4. EXPRESS

EXPRESS [2] is an information modelling language which is being used to describe the logical content of STEP. It is a formal language which describes entities, attributes, relationships and constraints.

Each entity is described in turn together with its set of attributes and their types. The STEP information model is written in EXPRESS and the current physical layer describes a mapping from EXPRESS to a neutral file syntax. Although the work described here is directed at STEP it would apply to any data model written in EXPRESS. Example 1 shows a small part of an EXPRESS schema.

ENTITY point;
       x : REAL;
       y : REAL;
       z : OPTIONAL REAL;
END-ENTITY;
ENTITY line;
      p0 : point ;
      p1 : point ;
END-ENTITY
ENTITY curve;
      points : LIST [1:#] OF point;
END-ENTITY

Example 1: Part of an EXPRESS Schema

It can be seen in the example that an entity is represented by a name, e.g. point, and a set of attributes. The attributes are denoted by names, e.g. x, y and z, and are given a type. This can be a base type, such as REAL, INTEGER or STRING, or could be a reference to another entity such as p0 which refers to a point entity. An attribute can also be multi-valued, as can be seen by the points attribute in the curve entity which consists of a list of 1 to many references to point. There are four types of aggregate available: array, set, list and bag.

This simple example does not show any of the more powerful facilities which are available. EXPRESS has constructs for categorisation (ie. sub- and supertypes It also includes powerful rules which may be applied to attributes, entities and relationships.

5. Applications Programming Interface

A significant proportion of the work in interface writing consists of transforming entities from the internally used form to the standard form and vice versa. Once code has been written to perform these transformations for a particular application's reading and writing of a neutral file, it should not be necessary to rewrite the code to read and write from a database. The code for this work should be kept separate from the code to access the neutral data.

One way of doing this is to define a standard applications programming interface. This would consist of a set of subroutine definitions covering the tasks of obtaining a piece of data and writing a piece of data. These should be completely independent of the storage mechanism and to ensure this they should be based on the EXPRESS model. For example, a subroutine to obtain a piece of data could require three parameters: the EXPRESS entity name, the attribute name and the identifier of the particular entity required. The subroutine should return the value or values found. Different sets of subroutines could be provided to access the data from whatever storage mechanism is being used. The program using these subroutines need have no knowledge of how the data is stored. When object oriented databases become more readily available then only these subroutines need to be rewritten to allow all existing interfaces to use the new databases. This is illustrated in figure 3.

Figure 3: Independence of interface software from data storage means

The only information that is fed to the subroutines is based on the data model, therefore for this to work it is necessary for the mapping from the data model to the physical storage mechanism to be well defined. A proposed mapping to an implementation in a relational database is defined in section 6.

The STEP physical file committee is just beginning to consider the definition of a standard interface [1] and this will eventually become part of the standard. Some of the functions which the interface is expected to provide are:

query capabilities
insert capabilities
modify capabilities
delete capabilities
navigation through entity instances
schema information
error handling
open and close data management services

6. EXPRESS to SQL Mapping

The ISO STEP physical file committee is starting work on other implementations which will eventually form part of the standard. One such is an implementation into a relational database. This work will be based on the ISO standard relational database query language, SQL [3]. Some initial work was done by Joe Eggers [4] on a mapping from EXPRESS to SQL. This was taken and expanded by RAL into a fully implementable mapping [5].

The mapping generates SQL commands to create a relational database schema automatically from an EXPRESS data model. The mapping is intended to be implemented into an EXPRESS compiler and used to automatically generate an empty database. It is not intended that there should be any human intervention.

This mapping is based on the first ISO standard of SQL. It does not take into account any variations in actual vendor SQL products. In practice, different vendors do not conform completely to the standard and slight modifications will need to be incorporated for specific packages. The mapping also does not take into account the efficiency of the resulting database structure; when experience has been gained using this mapping then ways to improve access speed will be considered.

It is believed that many of the problems encountered in the mapping are resolved by SQL Addendum 1 and by the proposed SQL2. However, as these are not yet widely available in commercial products they have not been considered for this initial work. Further versions will need to take these into account.

EXPRESS describes objects of interest (entities) in terms of a set of associated attributes, where each attribute has a data type. The RDBMS stores information as relations, or simply tables. Each entity definition in the EXPRESS corresponds to one or more tables in the RDBMS, each attribute usually corresponds to a column in the entity relation, and an instance of that entity corresponds to a tuple in the relation. The tables will be generated using the SQL command CREATE TABLE.

Example 2 shows the SQL commands which would be created from the EXPRESS given in example 1 using the mapping. This example will be used to illustrate the mapping which is described in more detail in the following paragraphs. Appendix 1 shows the database tables created from example 2 together with some sample data.

CREATE TABLE POINT         
(POINT_ID             SMALLINT   NOT NULL   UNIQUE,   
X                     REAL       NOT NULL,     
Y                     REAL       NOT NULL)     
CREATE TABLE POINT_Z         
(POINT_ID             SMALLINT   NOT NULL   UNIQUE,   
Z                     REAL       NOT NULL)     
CREATE TABLE LINE         
(LINE_ID              SMALLINT   NOT NULL   UNIQUE,   
P0                    SMALLINT   NOT NULL,     
P1                    SMALLINT   NOT NULL)     
CREATE TABLE CURVE         
(CURVE_ID             SMALLINT   NOT NULL   UNIQUE)   
CREATE TABLE CURVE_POINTS_LI1       
(CURVE_ID             SMALLINT   NOT NULL   UNIQUE,   
LIST_INDEX            INTEGER    NOT NULL,     
POINTS                SMALLINT   NOTNULL)

Example 2: Automatically Generated SQL Commands

6.1 Table and Column Names

Each EXPRESS entity will generate one main table and, for the purposes of this document, this table will take the name of the entity. Likewise attribute names usually map directly onto the respective column names. There are many instances where additional tables are generated, for example, to support list and array constructs; in these cases table names are generated which are formed from a combination of owning entity and attribute names plus, in some cases, a suffix. In fact, the convention used in this document does not guarantee uniqueness of table names (a combination of entity name plus suffix could match another entity name). It is recognised that, in practice, some transformation will be necessary between the names used in EXPRESS, which are long and purposely descriptive, and the names required by SQL, which are limited to 18 characters. The exact form of this transformation is open to discussion, the two main contenders being:

Names are generated automatically by the SQL writing code according to an appropriate algorithm. Generated names will not bear any resemblance to the originals. A proposal for this is detailed in a RAL internal report [6].
Names suitable for the RDBMS are decided by human intervention ('Aliased' from the EXPRESS). There is currently no facility in EXPRESS for this. An EXPRESS compiler could perhaps read a file containing a list of aliases before generating any SQL statements.

Both of these options could ensure uniqueness of table names. In either case, a table should be created in the RDBMS to describe this transformation. This table should be populated by the EXPRESS compiler as and when a new name is generated. Thus application codes using the data in the database will have access to the original names as decided by the EXPRESS schema writer. See section 7 for more detail.

6.2 Entity Identifiers, Keys

Although there is a UNIQUE construct in EXPRESS, entities may have any combination of attributes defined as UNIQUE or none at all. In a RDBMS, each tuple in a table must be unique from every other tuple in that table. However, because EXPRESS does not force each entity to have at least one UNIQUE attribute, a unique identifier must be added to each entity instance to ensure uniqueness. The creation of the tables includes an extra column for this identifier and a program loading information into the database must invent an identifier for each entity instance that is unique over that table. For convenience, we will use a short integer. The column will be given the name of the table plus the suffix _ID. For example, a column LINE_ID is created for the entity line as shown in example 2.

6.3 Attributes with Base Types

There is a good, although not exact, correspondence between base types allowed in EXPRESS and SQL. Their mapping is shown in table 1.

Note that SQL has no logical type so the values TRUE, FALSE, UNKNOWN are mapped as enumerations of these strings. The one problem area is with strings. In ISO standard SQL it is possible to omit the length parameter from the character type but it then takes a default value of 1. There is not a type to deal with variable length strings and hence nothing on which to map the EXPRESS base type STRING. In practice, most commercial systems have some sort of variable length string which may be used.

EXPRESS	SQL
Integer	INTEGER
Integer(n)	NUMERIC(N)
Real	REAL
REAL(n)	REAL
Logical	FLOAT(N)
Number	ENUMERATION
String	REAL
String(n)	CHARACTER(N)

Table 1: Mapping of base types

An attribute of base type appears directly in the entity relation. See the attribute X in the POINT table in example 2 which has a base type of REAL.

6.4 Attributes with Entity Types

When an entity references another entity, the owning entity has a column to hold the identifier of the entity referred to. For example, the column P0 in the LINE table in example 2 which contains a key to an entry in the POINT table. Unfortunately referential integrity is not yet supported in SQL, so it is not possible to enforce that the identifier actually exists in the referenced table.

The column is given the name of the owning attribute, not the name of the entity referred to. This is to allow the same entity to be referenced more than once in different roles within an entity. To record the fact that P0 is in fact a key to the POINT table a 'system' table is used (see section 7). This enables an application to find the correct table more easily.

6.5 Attributes with Aggregate Types

SQL does not have any base types to deal with aggregates, indeed every column in a table must be single valued. However, it is possible to implement aggregate types by creating extra tables.

Attributes with Array Types

The array is an indexed aggregation of data objects having the same data type. An array declaration in EXPRESS specifies a lower and upper index. The lower index must evaluate to an integer, while the upper index may take an undefined value greater than the lower index.

The array attribute itself does not appear in the owning table but instead, for each array declared in the EXPRESS schema, a table is created in the database for each dimension. The table names are generated by a combination of entity name, attribute name and a suffix denoting the level in the array. This algorithm is known to application programs so that values in the array can be searched for.

For a single dimension array, the extra generated table contains three columns. The first column contains the key of the owning table and the second contains the index of the array. The last column contains the actual item contained in the array and is given the name of the original attribute. This item could be a base type value or could be an integer reference to an instance of another entity. A tuple will exist in this table for each index in the array. If the array is of OPTIONAL items, for example, ARRAY [1:10] OF OPTIONAL REAL, then a tuple will only exist if the value is present.

The value in the index column must lie between the bounds declared for the array but this cannot be enforced by SQL. However, the array bounds could be stored in the generated 'system' table described in section 7 and this would allow an application program to easily check the values.

Attributes with List Types

Lists can be treated very much as arrays. The important difference is that lists do not have a fixed number of members, but this constraint (on arrays) cannot be supported by a RDBMS anyway. Neither can the required length of a list be supported. Lists should be treated as arrays, except that the lower bound is always 0 and the increment is always 1 (no gaps may appear as a result of deletions). The second column in the generated table will be named LIST_INDEX.

Example 2 shows the table CURVE_POINTS_LI1 which is created to hold the list of references to the points which make up the curve.

6.6 OPTIONAL Attributes

All non-optional attributes should be declared as NOT NULL columns in the generated SQL.

Optional attributes may be dealt with by simply omitting this NOT NULL qualifier. However, there are some problems with the implementations of NULLS [7] and also with space considerations for entities with sparsely populated optional attributes. Therefore, an extra table is generated to hold the optional attribute. The table is given the name of the owning entity plus the attribute name.

This table contains the key of the owning entity and the value of the attribute. The column holding the key is again declared as UNIQUE which ensures that, at most, there can only be one tuple in the optional attribute table for each tuple in the owning table. Using this mapping, the optional attribute docs not appear in the owning table, therefore the fact that it is an attribute of that table is stored in a system' table.

Example 2 shows the table POINT_Z which is created to hold the optional attribute Z from the point entity.

6.7 UNIQUE Attributes

Attributes mentioned in a UNIQUE clause should be specified as such in the corresponding CREATE TABLE operation. SQL requires that they also be declared NOT NULL. Joint uniqueness is applied to multiple columns by using the SQL UNIQUE construct at the end of the CREATE TABLE definition. Uniqueness on a single attribute may be done in the same way as joint uniqueness or may be included in the column definition. Uniqueness may be applied to an optional attribute by designating the actual value column as UNIQUE in the extra table created to hold that attribute. If one or more of the columns in an EXPRESS UNIQUE declaration is also optional then joint uniqueness may not be applied using this mapping.

6.8 Local Rules

SQL has no way of enforcing (or even 'understanding') local rules (EXPRESS WHERE clauses). It would be the task of an application program acting on the database to apply local (and global) rules. This problem is not considered further here.

7. System Tables

In most cases, the mapping from EXPRESS ENTITIES to SQL CREATE TABLE commands is quite straightforward but in some situations more information is required to enable subroutines to automatically access the database. One way of doing this is to provide one or more 'system' tables which contain information about the original EXPRESS model. These system tables will be created and filled by the compiler.

It is proposed to create two such system tables. The first will contain the mapping between the EXPRESS entity names and the compiler generated short table names (as mentioned in section 6.1). All the table names will be contained in this table, including extra tables which are created to support arrays, optionals etc. The short table names used in Example 3 are based on the name generation algorithm proposed by K Duffey [6]. There are two additional columns in this table which contain the attribute name, when an extra table has to be created, and a tag to denote the type of extra table. For example, for a three dimensional array there will be three tuples each with the same entity and attribute names but with different tags, to denote the levels of the array, and different generated table names.

The second table will contain one tuple for each column in the generated tables. In the case of aggregate attributes, a row is included for each of the columns generated to support the aggregate. The information included is as follows:

table name - short name generated by compiler.
attribute name - full EXPRESS attribute name.
data type - type of attribute.
sequence number - position of attribute within entity in EXPRESS schema, this is used by the name generation algorithm to generate the short column names. If a different system is used then this sequence number may be omitted but the mapped short column name must be given instead.
reference - if attribute is of entity type then this column indicates the entity to which it is a reference. See section 6.4. Note that the primary component of the short table name is given here, not the full EXPRESS name.
optional - switch to designate whether or not the attribute is optional, 1 for optional and 0 for not. See section 6.6.

Example 3 shows the SQL statements to create the two empty system tables. Appendix 2 shows the tables created by the SQL in example 3 together with some sample data corresponding to example 2.

CREATE TABLE SY _NAMES 
(TABLE_NAME           CHAR(8)    NOT NULL   UNIQUE,
ENTITY_NAME           CHARACTER  NOT NULL,
ATT_NAME              CHARACTER,
TAG_FIELD             CHARACTER(4))
CREATE TABLE SY_ATTRIB 
(TABLE_NAME           CHAR(8)    NOT NULL   UNIQUE, 
ATT_NAME              CHARACTER, 
DATA_TYPE             STRING(8), 
SEQ_NO                SMALLINT, 
REFERENCE             STRING, 
OPTIONAL              SMALLINT)

Example 3. System Tables

8. Current Status and Future Work

The project at RAL is still in its initial stages. It will be using an existing semi-conductor device modelling package to test out the feasibility of the ideas presented. This package at present uses the RALBIC neutral file [8] to transfer data to and from other packages. The STEP data model for finite element data is not yet in a stable enough form for implementation, so an EXPRESS model has been generated from the RALBIC neutral file [9]. As stated earlier, the methods described here will work for any EXPRESS data model.

The mapping from EXPRESS to SQL is now in a suitable form for initial implementations, and is being coded into the EXPRESS compiler written at RAL. A database schema has been generated from the RALBIC based EXPRESS data model using the mapping.

The initial coding of the subroutine interface is complete but not fully tested. This has been kept simple at first to prove the concepts. The first aim is to extract a piece of data from the database. The subroutines that have been written receive information based on the EXPRESS model only, ie. entity name, attribute name and identifier. They then use the EXPRESS to SQL mapping to determine in which table and column the information is stored. As it has been found impossible to write a general subroutine for this purpose using embedded SQL, because variables are not allowed for table and column names, dynamic SQL has been used within C. It was during this work that the necessity of the 'system' tables was realised.

The next stage of the work is to adapt the semi-conductor device modelling package to use the subroutine interface to obtain its data from the neutral database instead of the RALBIC neutral file. The results from this work will feed back to the development of the Application Programming Interface and to the refinement of the mapping from EXPRESS to SQL. At a later stage of the work the efficiency of the interface will be considered.

Close contact is being kept with the ISO STEP implementation committee and the results of the work are expected to influence the STEP database implementation and application programming interface.

9. References

1. ISO TC184/SC4/WG1 N448, 'Functional Requirements for an Application Software Interface to STEP Data', R Cheever and L Slovensky, December 1989

2. ISO TC184/SC4/WG1 N442, 'Information Modelling Language EXPRESS', D Schenck, Dccember21 1989.

3. ISO 9075:1987, Database Language SQL

4. Joe Eggers, 'Implementing EXPRESS in SQL', McDonnell Douglas internal report, October 1988.

5. D Thomas and M Mead, 'Mapping from EXPRESS to SQL', RAL internal report, January 1990.

6. K P Duffey, 'Proposed Name Generation Algorithm, suitable for implementation in an EXPRESS-SQL compiler', RAL internal report, October 12 1989.

7. C J Date, 'Relational Database- Selected Writings', Addison-Wesley, 1986

8. RALBIC - A Simple Neutral File for Finite Element Data, Emson, Greenough, Diserens & Duffey, RAL-87-102.

9. 'A Mapping from RALBIC to CAD*I Data Formats', KP Duffey, RAL-88-107.

Appendix 1.

This appendix shows the database tables created by the example in the main text together with some sample data.

POINT
POINT_ID	X	Y
1	12.5	14.7
2	23.1	84.3
3	24.76	22.1
4	13.2	4.3
POINT_Z
POINT_ID	Z
1	15.1
2	22.4
4	24.78
CURVE
CURVE_ID
1
2
3
CURVE_POINTS_L11
CURVE_ID	LIST_INDEX	POINTS
1	1	1
1	2	4
2	1	3
2	2	4

Appendix 2.

This appendix shows the system tables created by the example in the main text together with some sample data.

SY_NAMES
TABLE_NAME	ENTITY_NAME	ATTRIB_NAME	TAG_FIELD
AAAA0_000	POINT
AAAA0_003	POINT	Z
AAAB0_000	CURVE
AAAB1_001	CURVE	POINTS	L11

SY_ATTRIB
TABLE_NAME	ATTRIB_NAME	DATA_TYPE	SEQ_NO	REF	OPTL
AAAA0_000	X	REAL	1		0
AAAA0_000	Y	REAL	2		0
AAAA0_003	Z	REAL	3		1
AAAB1_001	POINTS	REFERENCE	1	AAAA	0
AAAB1_001	LIST_INDEX	SMALLINTL	1