INTEGRITY
A look at the declarative referential-integrity controls provided by leading DBMS vendors.
In this month's column, I investigate the merits of using the declarative referential-integrity controls provided by popular DBMSs, as opposed to implementing these integrity controls "manually" using triggers. I will first discuss referential integrity in generic terms, including what I see as the requirements of a declarative referential-integrity control mechanism. Then I'll review the declarative referential-integrity controls provided by some of the popular DBMSs, noting the options you have for do-it-yourself referential-integrity control.
Referential-integrity constraints are based on the concept of a foreign key. A foreign key is a column or set of columns in one table that refer to the primary key of another table. The table containing the foreign key is called the referencing, foreign, or child table, while the table containing the primary key is called the referenced, primary, look-up, or parent table. The reference from the foreign key to the primary key is the mechanism whereby the relationships between the occurrences of rows in the two tables are enforced.
As an example, consider the following tables used to store data about members borrowing books from a library:
BOOKS (BookNo, Title, Author).
MEMBERS (MemberID, MemberName, MemberType).
LOANS (BookNo, MemberID, LoanDate, ReturnDate).
MEMBERTYPES(MemberType, Privileges).
BookNo is a foreign key in the loans table because it is the primary key of the books table. loans is the foreign table and books the primary table. MemberType is a foreign key in the members table because it is the primary key of the membertypes table.
The relational referential-integrity constraint specifies that either of the following must be enforced for a foreign key (C.J. Date, An Introduction to Database Systems -- Volume I, 5th edition, Addison-Wesley, 1990):
* The value of the foreign key must be a key value in the corresponding primary table.
* The value of the foreign key must be undefined (that is, null). This only applies if the foreign key does not participate in the primary key of the foreign table and if the foreign-key columns in the foreign table are allowed to have null values.
In the previous example, each BookNo in the loans table must be a BookNo that is recorded in the books table. In this case the BookNo in the loans table may not take an undefined null value as it forms part of the key of the table. If MemberType is an optional column in the members table, it must either take a value that exists in the membertypes table, or it must have a null value.
When I define a referential-integrity constraint, I should be able to specify what action the DBMS must take when the constraint is violated in each of the following scenarios:
* when a row containing an invalid foreign-key value is inserted in the foreign table
* when a foreign key in the foreign table is updated to an invalid value
* when a row with a referenced primary key is deleted from the primary table
* when a referenced primary key is updated in the primary table
I should be able to specify the following actions when the constraint is violated:
* Restrict the operation, or roll back the operation that violated the referential integrity-constraint.
* Allow the operation, but warn the user of the violation. Relational purists may cringe at this requirement, but it is useful for archive tables and in data warehouse databases.
* In the case of operations performed on the primary table, set the foreign columns to null.
* In the case of operations performed on the primary table, set the foreign columns to a predefined default value.
* In the case of operations performed on the primary table, cascade the operations on the primary table down to the foreign tables. This includes deleting all the referencing rows (a cascading delete) and updating all the referencing foreign-key values (a cascading or collateral update).
ANSI has specified a standard SQL syntax for referential-integrity constraints as part of the SQL-92 specification. (See J. Melton & A.R. Simon, Understanding The New SQL: A Complete Guide, Morgan Kaufmann, 1993.)
constraint CONSTRAINTNAME
foreign key (ColumnList) references PRIMARYTABLE (ColumnList) [match { full | partial }]
on delete { no action | cascade | set default | set null | warning }
on update { no action | cascade | set default | set null | warning }
[initially { deferred | immediate }]
[[not] deferrable]
The match clause specifies how partial and null-valued foreign keys should be handled. The on clauses specify the referential actions that should be performed when the referenced rows in the primary table are deleted or updated. If the on clauses are not specified, the default no action is assumed. The initially and deferrable clauses are used to specify the time when the referential-integrity constraint should be enforced. Most of these clauses are part of the intermediate-level ANSI-92 standard. The entry-level standard only specifies the foreign key ... references ... clause. The intermediate-level ANSI specification addresses most of the requirements I described previously, except the option to allow a constraint violation and only warn the user about it.
In addition to these requirements, I want to specify a different error or warning message for each of the four scenarios when a constraint is violated. For example, I want to have different error messages when the constraint is violated because of the insertion of a row in the foreign table and when it is violated because of the deletion of a row from the primary table. Sometimes I want to specify conditional referential-integrity checking for some of the four scenarios. An example is when users want to use predefined "undefined" values rather than null values. When the predefined "undefined" value is inserted in a foreign-key column, the referential-integrity constraints should not be checked.
The two table names in the constraint definition let you specify the constraints independently from the corresponding tables. The alias clauses let you use abbreviated names in the where clauses. The where clauses let you define conditional constraint checking. The on clauses let you specify the different actions you want activated when the constraint is violated. The message clauses let you specify the messages you want returned when the corresponding action is taken.
Let's look at another example. The referential-integrity constraint from the BookNo foreign-key column in the loans table to the books table would then be defined in the following way, using the ANSI SQL-92 standard, as part of the definition of the loans table:
constraint LOANStoBOOKS
BookNo references BOOKS ( BookNo )
on delete no action
on update cascade
Although it is not part of the ANSI SQL-92 syntax, some DBMSs let you tailor the error messages returned to users. With these DBMSs you could implement the messages illustrated here. I will point out these DBMSs in the examples in the remainder of this column. Few DBMSs, however, provide you with the functionality to implement the conditional referential checking illustrated for the "referenced delete" case.
Most relational DBMSs support declarative integrity constraints, each with its own features and shortcomings. Let's look at the implementations of declarative referential-integrity constraints in CA-OpenIngres 1.2, Borland InterBase Server 4.0, Microsoft SQL Server 6.0, Oracle Enterprise Server 7.2, SQLBase 6.0 (also licensed as CA-OpenIngres/Desktop 1.1), and Sybase SQL Server 10. For each DBMS I describe the functionality provided, the user-friendliness of the implementation, and the available options for "do-it-yourself" referential-integrity control. Note that other DBMSs such as Informix-OnLine and Progress Database Server also support declarative referential-integrity constraints.