TRIGGER
By David S. Linthicum
DBMS, July 1996
Features, Functions, and Criteria You Should Consider When Choosing a DBMS.
Developers know that DBMSs are the heart of most client/server systems. The DBMS provides secure access to shared data for client applications and allows client/server developers to split the processing load between the client and the server.
The idea of client/server is to perform all of the database processing on the server. The client sends a request to the server, and the server responds with the requested data. The client and the server are usually (but not always) separate and distinct, running on different machines connected via a network.
This architecture reduces the load on the network and frees the clients to perform more important interface and application processing. DBMSs can even take on some of the application processing load through the use of stored procedures and triggers, bounce back from system crashes using roll-back/roll-forward recovery, and guard data through the use of rules and two-phase commits. DBMSs also provide locking mechanisms and multiuser access controls that protect data from the dangers of concurrent access, such as deadlock (the "deadly embrace").
There is no foolproof way to select the right DBMS for your client/server project. You must consider the power of the platform that hosts the DBMS software, SQL standards, features such as stored procedures and triggers, and database administration and monitoring tools. Let's see if I can shed some light on the features and functions of DBMSs, as well as examine what to look for when you select a DBMS for your client/server development project.
Although there are many database models out there, relational DBMSs still rule the world of client/server. The success of the relational database arises largely from the relational model's simplicity and its ability to successfully map to most problem domains. The use of two-dimensional tables linked by common fields is easier to understand and work with than the traditional network, CODASYL, or hierarchical database systems.
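The relational model's appeal is easy to demonstrate: two flat tables linked by a common field, queried declaratively with a join instead of navigated through pointers as in network or hierarchical systems. Here is a minimal sketch using Python's built-in SQLite engine as a stand-in for a relational server; the customer/order schema is invented for illustration.

```python
import sqlite3

# Hypothetical two-table schema: orders reference customers by a shared key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id  INTEGER,      -- common field linking the tables
                         amount   REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 99.50), (11, 1, 12.00), (12, 2, 45.25);
""")
# The join expresses the relationship declaratively -- no record pointers
# to navigate, just a match on the shared cust_id column.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON c.cust_id = o.cust_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 111.5), ('Globex', 45.25)]
```

The same query would work, give or take dialect details, on any of the relational servers discussed below.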
However, the newer OO and multidimensional DBMSs are catching up quickly in some niche application areas. These types of databases work well for systems that store complex data structures or are built for online analytical processing (OLAP). OODBMSs (such as GemStone from GemStone Systems) can store data as persistent objects. OO databases are a good fit for tools that use the OO development model in its pure form (such as Smalltalk-based development tools), and they are also better homes for complex data structures that include the storage of repository information and binary information. Multidimensional DBMSs provide a more natural way for end users to search through data using data mining tools.
Unless you have special needs such as those I just described, relational DBMSs are usually the best fit for client/server systems. Relational DBMSs expose your client/server development project to the least amount of risk. For instance, most popular client/server development tools communicate with most relational DBMSs using native database middleware or through ODBC. However, many popular tools do not yet communicate with the majority of the non-relational DBMSs available.
Seven major players dominate the relational market: Informix Software's Informix-OnLine Dynamic Server, Sybase Inc.'s Sybase SQL Server, Computer Associates Inc.'s CA-OpenIngres, IBM's DB2 family, Oracle Corp.'s Oracle Server, Microsoft's SQL Server, and Borland's InterBase. (For a closer look at InterBase, see Martin Rennhackkamp's column on page 74.) Each of these DBMSs can support various platforms, including many Unix flavors running on x86, RISC, and CISC processors, as well as Windows NT in both single- and multiprocessing incarnations.
In many respects, the platform on which the DBMS runs determines the overall performance of the database server. DBMSs also depend on the operating system's ability to provide disk, caching, and network services efficiently. Therefore, selecting the right platform for your DBMS is as important as selecting the DBMS itself.
Although exceptions exist, x86-, RISC-, and Sparc-based servers dominate the market, with many flavors of Unix and Windows NT providing host operating system services in both uni- and multiprocessing versions. As the old guard of the DBMS operating systems, Unix is able to provide true preemptive multitasking and multithreading to its native applications, including the DBMS. Unix is sold under many names for many processors, including SCO Unix, UnixWare, and Solaris for the x86, as well as Solaris for the Sparc, and HP-UX and AIX for RISC processors.
To exploit the best features of Unix, DBMS vendors such as Oracle, Informix, and Sybase have mastered Unix's ability to support numerous simultaneous client connections as standalone processes or threads (lightweight processes). They also capitalized on Unix's excellent I/O performance, memory management, and task management capabilities. All of this comes at the cost of having to manage the complexities of the Unix operating system. In addition, high-end Unix servers (such as IBM's RS/6000 and HP's HP-9000) can cost more than $50,000.
Although Unix is still a favored host operating system for DBMSs, Windows NT is rapidly gaining popularity. Windows NT provides most of the same features as Unix, but wraps them in a user-friendly, easily manageable package. Although Windows NT does not yet have Unix's following among database server vendors, the trend is clearly on its side. Products such as Microsoft's SQL Server can exploit the capabilities of Windows NT, including the ability to support many client connections as threads within a single process, which increases the server's ability to handle a large connection load. You can also see NT's growing importance in the DBMS products' release schedules -- the NT versions are appearing earlier and earlier in the porting and release cycles.
It's important to understand the ANSI/ISO standards before you select a DBMS. Currently there are three SQL standards with which you should concern yourself: SQL-89, SQL-92, and SQL3.
SQL-89 is the oldest and most useless of all ANSI SQL standards. SQL-89 is simply the least common denominator of SQL capabilities supported by existing products at the time the standard was adopted. Thus, existing products conformed to SQL-89 at the time of its release, but the label "SQL-89 compliant" is mostly meaningless. SQL-89 did not move the DBMS world any closer to a unified SQL.
SQL-92 compliance is desirable because of the many features it adds to your DBMS. For example, SQL-92 adds support for SQL agents, which are programs or interactive users that generate SQL. SQL-92 also supports SQL client/server connections, meaning that SQL agents must ask the SQL client code to connect to an available SQL server. In addition, SQL-92 provides embedded SQL support for modern languages, dynamic SQL, advanced data types such as BLOBs, and standardized error codes and diagnostics.
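Dynamic SQL -- one of the SQL-92 additions mentioned above -- simply means the statement text is not fixed until runtime. The sketch below approximates the idea through Python's DB-API, where a statement is assembled at runtime and values are bound as parameters (the rough analog of SQL-92's PREPARE/EXECUTE); the table and the runtime column choice are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (region TEXT, tax REAL)")
conn.execute("INSERT INTO rates VALUES ('east', 0.06), ('west', 0.0825)")

# Dynamic SQL in the SQL-92 sense: the statement is composed at runtime.
# The filter column is chosen dynamically; the value is bound as a
# parameter rather than spliced into the text.
column = "region"          # hypothetical runtime choice
stmt = f"SELECT tax FROM rates WHERE {column} = ?"
(tax,) = conn.execute(stmt, ("west",)).fetchone()
print(tax)  # 0.0825
```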
Although most DBMS vendors are not yet SQL-92 compliant, there is another standard on the horizon. SQL3 will add even more features, such as Object SQL capabilities that include encapsulation, methods, user-defined data types, and inheritance.
Performance is everything in the DBMS market. To maintain and improve performance, DBMS vendors learned to bypass the operating system to access system resources directly. This trick has some tradeoffs. For example, most DBMSs (including Informix, Sybase, and Oracle) can bypass an operating system's native file system and go directly to a physical disk partition called a raw disk partition. This way, the DBMS server does not have to absorb the overhead required to go through operating system calls to access the disk. The result is faster disk I/O performance and a faster DBMS.
You can use a raw disk partition as an option, or use the native file system of your host operating system (Windows NT, Unix, and so on). Although the raw disk partition usually provides the best performance, you'll need special utilities provided by the DBMS vendor to back up your database.
Another trick that DBMSs use to gain performance is running the DBMS server as a kernel-level process, sometimes in Ring 0. A few Unix-based database servers do this, but the best example is DBMSs running on NetWare file servers as NetWare Loadable Modules (NLMs). Because NetWare does not enforce memory protection by default (although you can turn it on), the DBMS can run in Ring 0. While this does increase the DBMS's performance, one false move by the server software could crash the server. Most NLM-based DBMSs (including Oracle and Sybase) run well and fast on NetWare. NLM-based DBMSs are usually a better fit for organizations that already have a relatively idle NetWare server.
The ability of a DBMS server to scale is directly related to how well the DBMS can handle client connections. A DBMS server that requires a high memory and processor capacity for each connection can only handle as many users as there are resources available. Sooner or later you hit a wall, and the operating system breaks (usually by running out of physical memory and crashing). For example, a DBMS server that uses 5 percent of a server's capacity for each connection can only handle approximately 20 users. Of course, this ratio depends largely on what the clients are actually doing.
DBMS servers handle client connections in three basic ways: process-per-client, threading, or a combination of the two. The process-per-client approach gives each client connection its own process. Thus, 20 clients would require 20 separate processes. The advantage of this approach is that connections operating in their own process address spaces are protected from other ill-behaved processes. The process-per-client approach also makes it easy for multiprocessing operating systems to allocate the connection processes to one or more processors to spread the load. The disadvantage is that it eats up resources, because each client requires a heavyweight process (sometimes two or three), and it incurs context-switching and interprocess communication overhead as well. DBMSs that use this approach include DB2/2, Oracle version 6, and Informix.
The threading approach runs all user connections and the database server in the same address space as threads. Threads are lightweight processes that run faster and require fewer resources than true processes, because there is less context switching. In addition, this approach provides its own internal scheduler, with no dependence on the operating system's process-protection mechanisms. This approach is also more portable from platform to platform, because there are usually few dependencies on native operating system services. Unfortunately, a single ill-behaved process or thread can crash the DBMS. Also, threading does not dole out the operating system resources as evenly as we would like. A single thread can saturate a processor and make the other threads wait in line. DBMSs that use this approach include Sybase System 11 and Microsoft SQL Server.
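The threading model is easy to sketch: every client connection becomes a lightweight thread inside one server process, all sharing the same address space. The example below is a toy illustration, not any vendor's architecture -- a dict guarded by a lock stands in for the shared database pages, and each thread plays the role of one client connection.

```python
import queue
import threading

# Shared state standing in for the database; all "connections" see it.
store = {"balance": 100}
lock = threading.Lock()
results = queue.Queue()

def connection(client_id, delta):
    # Each connection thread executes its request against shared state.
    with lock:                      # the server's own concurrency control
        store["balance"] += delta
        results.put((client_id, store["balance"]))

# Five client connections as five threads in one process.
threads = [threading.Thread(target=connection, args=(i, 10)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store["balance"])  # 150
```

Note the flip side described above: because everything shares one address space, a fault in any thread can take down the whole server.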
The new Sybase System 11 DBMS server, using threading, requires only about 60KB of RAM per connection (50KB for System 10). Other DBMSs that use process-per-client can require as much as 2MB of RAM per client. Sybase System 11 uses a SQL Server process (known as the dataserver) to create a thread for each client connection, for each logical database device, and for an error log.
Some DBMSs use a combination of the features of threading and process-per-client to create a DBMS server architecture that provides the best of all worlds. Oracle7, for instance, uses a multithreaded network listener that makes the initial connections by assigning the client to a component called a dispatcher. The dispatcher in turn places messages from the client on an internal queue. A shared server process takes the message off the queue, executes the request inside the database server engine, and returns the response back to the client again using the internal queue. An advantage to this architecture is that it can maintain a protected processing environment without having to dedicate a process for each user connection. There are, however, some latency issues with the use of the queue.
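The dispatcher-and-queue idea can be sketched in a few lines. This is a loose illustration of the shared-server pattern described above, not Oracle7's actual implementation: a dispatcher places client requests on an internal queue, and a small pool of shared server workers drains it, so no process is dedicated to any one client.

```python
import queue
import threading

request_q = queue.Queue()    # the internal request queue
response_q = queue.Queue()   # responses routed back to clients

def shared_server():
    # A shared server worker takes messages off the queue and "executes" them.
    while True:
        client_id, sql = request_q.get()
        if client_id is None:            # shutdown sentinel
            break
        response_q.put((client_id, f"ok: {sql}"))  # stand-in for engine work

# Two shared servers handle requests from ten clients.
workers = [threading.Thread(target=shared_server) for _ in range(2)]
for w in workers:
    w.start()

# The "dispatcher" queues one request per client.
for i in range(10):
    request_q.put((i, f"SELECT {i}"))
for _ in workers:
    request_q.put((None, None))
for w in workers:
    w.join()

print(response_q.qsize())  # 10
```

The queue is also where the latency mentioned above creeps in: every request waits its turn behind whatever is already queued.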
The battle cry of scalable DBMSs is parallel processing: the allocation of the database server's processing load across several processors. Most hardware and server operating system platforms (including x86, RISC, and CISC systems) support multiprocessing. However, they do so in many different ways.
DBMS servers approach multiprocessing by either letting the operating system allocate the database query processing across available processors (shared memory), or allowing the DBMS server to allocate the query processing (shared nothing). If you select the latter, you must invest in a special version of a DBMS built specifically for parallel query processing. These servers break up a database query into smaller subqueries that execute concurrently on different processors. A query manager receives the request and allocates its processing among available database engines. Usually, there is one database engine per processor, and it is functionally equivalent to an independent DBMS server. After the query processing is complete, the query manager reassembles the answer sets into a single set that is returned to the client.
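The decompose-and-reassemble flow described above can be sketched as follows. This is a toy model, not any vendor's parallel engine: a "query manager" splits one aggregate query into per-partition subqueries, runs them concurrently (one worker per invented data partition), and merges the partial answer sets into a single result.

```python
from concurrent.futures import ThreadPoolExecutor

# Invented data partitions, one per "database engine."
partitions = [
    [("widget", 3), ("gadget", 5)],
    [("widget", 7), ("sprocket", 2)],
    [("gadget", 1)],
]

def subquery(partition):
    # Each engine aggregates only its own partition of the data.
    totals = {}
    for item, qty in partition:
        totals[item] = totals.get(item, 0) + qty
    return totals

# The query manager runs the subqueries concurrently...
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(subquery, partitions))

# ...then reassembles the partial answer sets into one result for the client.
result = {}
for part in partials:
    for item, qty in part.items():
        result[item] = result.get(item, 0) + qty
print(sorted(result.items()))  # [('gadget', 6), ('sprocket', 2), ('widget', 10)]
```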
Allocating query processing like this was expensive and experimental just a few years ago. Today there are several DBMS vendors that provide such features (including Oracle's Parallel Query Option, Sybase's Sybase MPP [formerly known as Navigation Server], and Informix's Dynamic Server). Some of these servers still have high per-user price tags, but prices are dropping quickly. Parallel DBMS servers work best in data warehousing systems or other applications in which database response time is critical.
DBMS vendors offer stored procedures, triggers, and rules. These features allow developers to program the database itself, and therefore provide another partition for application processing. Stored procedures and triggers also reduce traffic on the network.
Developers have the option to run portions of a client/server application on the DBMS, and thus balance the processing load in a two-tier client/server application. Client/server tools continue to learn how to exploit the stored procedure capabilities of database servers. For example, Oracle's Developer 2000 client/server development tool lets developers move Oracle's SQL procedures from the client to the server, and back again, using a drag-and-drop mechanism on the client side.
Simply put, stored procedures are just a collection of SQL statements and procedural logic that client/server developers can compile and store on the DBMS as database objects. Client applications execute stored procedures by sending a message to the DBMS on which the stored procedures reside.
Triggers, in contrast, are stored procedures activated automatically by events, such as inserts, deletes, and updates, that the database server encounters in the normal course of database processing.
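A concrete trigger helps make this clear. The sketch below uses SQLite (via Python) rather than one of the big servers discussed here, and the audit-log schema is invented, but the mechanism is the same: the server fires the trigger automatically whenever the update event occurs, with no action by the client application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE audit_log (account_id INTEGER, old_balance REAL,
                            new_balance REAL);
    -- Fires automatically on every balance update; the client never
    -- touches audit_log directly.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON accounts
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
""")
conn.execute("INSERT INTO accounts VALUES (1, 100.0)")
conn.execute("UPDATE accounts SET balance = 75.0 WHERE id = 1")
log = conn.execute("SELECT * FROM audit_log").fetchall()
print(log)  # [(1, 100.0, 75.0)]
```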
Rules are a special type of trigger that verifies data before it is inserted into the database. Rules generally ensure that data adheres to certain business constraints before it can exist in the database -- for instance, that a tax rate is never zero.
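The tax-rate rule can be expressed as a BEFORE INSERT trigger that rejects the offending row before it reaches the table. Again the sketch uses SQLite via Python as a hedged stand-in for a production server, with an invented schema; the rejection mechanics vary by vendor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tax_rates (region TEXT, rate REAL);
    -- Rule-style trigger: reject the row before it enters the table.
    CREATE TRIGGER rate_never_zero
    BEFORE INSERT ON tax_rates
    WHEN NEW.rate = 0
    BEGIN
        SELECT RAISE(ABORT, 'tax rate may not be zero');
    END;
""")
conn.execute("INSERT INTO tax_rates VALUES ('east', 0.06)")     # accepted
try:
    conn.execute("INSERT INTO tax_rates VALUES ('west', 0.0)")  # rejected
except sqlite3.IntegrityError as e:
    print(e)  # tax rate may not be zero
count = conn.execute("SELECT COUNT(*) FROM tax_rates").fetchone()[0]
print(count)  # 1
```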
Stored procedures, triggers, and rules provide client/server developers with a lot of power, but they are proprietary in nature. What's more, the capabilities vary greatly among server vendors. SQL Server, for example, can fire only a single trigger for an insert operation, while other DBMSs (such as Computer Associates' CA-OpenIngres) can fire multiple triggers. Moreover, stored procedures and triggers are not portable from DBMS to DBMS. InterBase actually has the best implementation of modular, optionally ordered, pre- and post-operation triggers.
Whether or not a DBMS supports advanced features such as stored procedures, triggers, and rules, any DBMS should support referential integrity and two-phase commit. Referential integrity refers to a DBMS's ability to ensure that all foreign keys match their primary keys. Two-phase commit ensures that a transaction spanning multiple servers either commits everywhere or nowhere, so a server crash cannot leave the data in an inconsistent, unrecoverable state. A DBMS should also support all of the relational integrity constraints declaratively -- namely primary key, domain, column, and referential-integrity constraints -- without requiring physical structures such as indexes to enforce primary keys.
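Declarative referential integrity looks like this in practice: the foreign key is stated once in the schema, and the engine itself rejects orphan rows. The sketch below uses SQLite via Python with an invented department/employee schema; note that SQLite, unlike the servers discussed here, requires a pragma to turn enforcement on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific: enable enforcement
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY,
                            dept_id INTEGER REFERENCES departments);
""")
conn.execute("INSERT INTO departments VALUES (1, 'Sales')")
conn.execute("INSERT INTO employees VALUES (100, 1)")   # matches a primary key
ok = True
try:
    conn.execute("INSERT INTO employees VALUES (101, 99)")  # no such department
except sqlite3.IntegrityError:
    ok = False    # the engine, not the application, caught the orphan row
print(ok)  # False
```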
You should also look for database administration and monitoring tools that are bundled with the DBMS. Database administration tools include any software that can back up and restore a database and perform user administration, security administration, and performance tuning.
In addition to database administration tools, a good DBMS should provide tools for database performance monitoring. These tools allow you to watch all aspects of the DBMS, including cache, disk, and processor utilization. A few DBMS vendors (such as Oracle, Informix, and Sybase) bundle a complimentary performance monitoring tool; if you need additional capabilities, such as a customizable reporting mechanism, you'll find what you need in the third-party marketplace. Replication is also currently popular as a method for synchronizing data across distributed sites when immediate synchronization is not mandatory.
A DBMS needs to provide security to the database, table, column, and sometimes the row level, to assure that only authorized users can view or update data.
Although most DBMSs come with basic database and table-level security, they aren't hack-proof by default. There are, however, secure versions of DBMSs that make the DBMS comply with government security standards such as C2 and B1. This means that they will keep a close watch on user activity, age passwords, detect intruders, and perform other "database police" services. Informix-OnLine Secure 7.x provides such features, and there are secure versions of Oracle and Sybase as well. Although enhanced security is usually a good idea, there is a tradeoff: These DBMSs cost more and don't perform as well because of the overhead of the additional security processing.
Developers and architects should consider the following step-by-step procedure when selecting a DBMS. First, understand your requirements, including the business problem you're trying to solve. List the features that are important to you and rank them by priority. Consider your clients, the applications they will run against the database, and the type of middleware layer in use. Using this information, create a set of at least three DBMS product candidates. Using your DBMS candidates, select the operating systems and hardware platforms that best meet your requirements. Make sure you consider administrative costs, along with hardware and software costs, on a per-user basis. Create a pilot test bed using the DBMS candidates, and test each product's ability to meet the requirements of the application. And finally, may the best server win.