Database (DB2) performance considerations

The database is usually one of the potential areas for bottlenecks that makes WebSphere Commerce unable to scale and perform well. It is therefore crucial that the database is tuned for your implementation.

Note: HCL Commerce ships with default DB2 optimization settings, such as optimization levels and optimization profile registries. It is highly recommended that you thoroughly test any changes that are made to the optimization settings in a production-like environment, before you use them in a production system. Changing the optimization settings can affect the overall performance of the application, either immediately or later, such as when the data volume increases or the data distribution changes.

Physical environment considerations

Considerations for the physical environment are related to how the data is spread among the disks and how the memory is managed for the databases.

Layout on disk

Reading from the database and writing back to the database (disk I/O) can become a bottleneck for any application that is accessing a database. Proper database layout can help reduce the potential for this bottleneck. It is a significant effort to change the physical layout of the database once it is created, hence proper planning at the initial stages is important.

The first consideration is to ensure that the DB2 transaction logs reside on their own physical disk. Every update that is issued against the database is written to the logs (in addition to being updated in memory). Hence there is a lot of disk I/O in the location where the DB2 transaction logs reside. It is a good practice to try to ensure that all read/write activity on the disk is related only to the transaction logs, thus eliminating any I/O contention with other processes that might access the disk.

To set the location for the DB2 transaction logs, issue the following command:

db2 update db cfg for dbalias using NEWLOGPATH path

Before the logs are stored in the location that is specified, disconnect all sessions, or deactivate the database by issuing the db2 deactivate command.

The second consideration in terms of disk layout is to determine how to manage the table spaces efficiently. One performance principle in the management of Relational Database Management Systems (RDBMs) is to separate the database table data and database index data onto different physical disks. This separation enables better query performance, since index scans can execute in parallel with data fetch operations because they are on different physical disks.

In DB2, automatic storage table spaces are used by default. That is, the System Managed Storage (SMS) and Database Managed Storage (DMS) table space types are deprecated for user-defined permanent table spaces and might be removed in a future release.

Automatic storage table spaces are the easiest table spaces to set up and maintain, and are recommended for most applications. They are particularly beneficial when:
  • You have larger tables or tables that are likely to grow quickly
  • You do not want to have to make regular decisions about how to manage container growth.
  • You want to be able to store different types of related objects (for example, tables, LOBs, indexes) in different table spaces to enhance performance.

For more information, see Comparison of automatic storage, SMS, and DMS table spaces.

Memory

DB2 associates memory for the database through buffer pool objects. A buffer pool has a page size that is associated with it and is linked to one or more table spaces. Thus, if table spaces of different page sizes are created, then buffer pools corresponding to the different page sizes are required.

While you can create multiple buffer pools having the same page size, it is recommended that only one buffer pool per page size be created, for the most efficient usage of memory on the database server.

The question is always, how much memory to assign to the buffer pools. For DB2 32-bit implementations, there is a limit, based on the operating system, that can be available for buffer pools.

Assuming a dedicated database server, allocate a large proportion of memory available on the server, about 75% to 80%, but not exceeding the platform limits.

Note that for 64-bit implementations of DB2, the limits are increased. In this case, the buffer pool hit ratio would need to be monitored to determine the optimal setting for the buffer pools. You can also monitor the hit ratio for 32-bit implementation using database snapshots using the following command:

db2 get snapshot for database on <dbalias>

The output that is generated contains some statistics on buffer pool logical and physical reads:

Buffer pool data logical reads = DLR
Buffer pool data physical reads = DPR
...
Buffer pool index logical reads = ILR
Buffer pool index physical reads = IPR

In this output, DLR, DPR, ILR, and IPR have actual values. The hit ratio can be computed using the following formula:

(1 - (( DPR + IPR) / (DLR + ILR))) * 100%

The size of the buffer pool can be changed using the ALTER BUFFERPOOL command, or the BUFFPAGE parameter if the size of the buffer pool is set to -1.

Additional tuning and configuration database parameters

There are many parameters to consider for performance. This section describes a subset of these that are considered important for HCL Commerce implementations. To set the values for the parameters, the following command can be used:

db2 update db cfg for <dbalias> using <paramname> <paramvalue>

Parameters related to memory

The database heap (DBHEAP) contains control block information for database objects (tables, indexes, and buffer pools), as well as the pool of memory from which the log buffer size (LOGBUFSZ) and catalog cache size (CATALOGCACHE_SZ) are allocated. Its setting depends on the number of objects in the database and the size of the two parameters mentioned.

In general, the following formula can be used to estimate the size of the database heap:

DBHEAP=LOGBUFSZ + CATALOGCACHE_SZ + (SUM(# PAGES in each bufferpool) * 3%)

The log buffer is allocated from the database heap, and is used to buffer writes to the transaction logs for more efficient I/O. The default size of this setting 128 4K pages. A recommended starting point for the log buffer size (LOGBUFSZ) in HCL Commerce implementations is 256.

Parameters related to transaction logs

When you are considering values for the transaction log file size (LOGFILSIZ) and the number of primary (LOGPRIMARY) and secondary (LOGSECOND) logs, some generalizations for OLTP applications can be applied. A high number of short transactions are typical in OLTP systems, hence the size of the log file should be relatively large, otherwise more processing time is spent managing log files, rather than writing to the transaction logs. A good starting point for the size of the log file in HCL Commerce implementations is to set the value to 10000.

Primary log files are allocated when the database is activated, or on the first connect. If a long running transaction fills up all the primary logs, then secondary logs are allocated as needed until the LOGSECOND limit is reached. The allocation of a secondary log file is a significant performance hit, and should be minimized if it cannot be avoided.

To determine the right settings for these parameters, you need to monitor the database and see whether secondary log files are being allocated. If they are, then you need to increase the number of primary log files. You can monitor by taking a database snapshot and look for the following two lines:

Maximum secondary log space used (Bytes) = 0
Secondary logs allocated currently = 0

A good starting point for the number of primary log files (LOGPRIMARY) is anywhere from 6 - 10.

Parameters related to disk I/O

In addition to physical disk layout, several tuning parameters can be manipulated to affect disk I/O. Two key parameters are NUM_IOSERVERS and NUM_IOCLEANERS.

NUM_IOSERVERS specifies the number of processes that are launched to prefetch data from disk to the buffer pool pages. To maximize read parallelism, this parameter should be set to the number of physical disks that are being used by the database, to enable reading from each disk in parallel.

NUM_IOCLEANERS specifies the number of processes that are launched to flush dirty buffer pool pages to disk. To maximize usage of system resources, this parameter should be set to the number of CPUs on the system.

The frequency of how often dirty buffer pool pages are flushed to disk can be influenced by the CHNGPGS_THRESH parameter. Its value represents the limit, in the form of a percentage, that a buffer pool page can be dirty before a flush to disk is forced. For OLTP applications, a lower value is recommended. For HCL Commerce implementations, the value should be set to 40.

One final parameter to consider is MAXFILOP. It represents the maximum number of files DB2 can have open at any time. If this value is set too low, valuable processor resources are taken up to open and close files. This parameter needs to be monitored to be set to the correct value, but a good starting point is to set this value to 128. You can monitor by taking a database snapshot and looking at the following line:

Database files closed = 0

If the value monitored is greater than zero, then the value for this parameter should be increased.

Parameters related to locking

Reducing locking contention is key to performance. Several parameters exist to influence locking behavior. The total amount of memory available to the database for locks is defined by the LOCKLIST parameter. The MAXLOCKS parameter defines the maximum amount of memory available for each connection to the database. It is represented as a percentage of the LOCKLIST.

The size for both of these parameters need to be adjusted to avoid lock escalations. A lock escalation occurs when all of the memory available to a connection is used, and multiple row locks on a table are exchanged for a single table lock. The amount of memory that is used for the first lock on an object is 72 bytes, and each additional lock on the same object is 36 bytes.

A good starting value for LOCKLIST can be approximated by assuming that a connection requires about 512 locks at any time. The following formula can be used:

LOCKLIST = (512 locks/conn * 72 bytes/lock * # of database connections) / 4096 bytes/page

MAXLOCKS can be set to 10 - 20 to start. Further monitoring is necessary to adjust both of these values. In the database snapshot output, look for the following lines:

Lock list memory in use (Bytes) = 432 Lock escalations = 0 Exclusive lock escalations = 0

If lock escalations occur (value higher than 0), increase the locklist to minimize the escalations and increase the MAXLOCKS value to increase the limit of how much of the LOCKLIST a connection can use.

Best practices

Here are some of the most common best practices for any IBM DB2 UDB implementation.

Reorganizing data in table spaces

When a high number of inserts, updates, or deletes operations are performed against a table in the database, the physical placement of the rows and related indexes might not be optimal. DB2 provides a utility to reorganize data for a table:

db2 REORG TABLE <tabschema>.<tabname>;

DB2 also provides a utility to check whether a table or index data needs to be organized. While connected to a database, the following command can be issued:

db2 REORGCHK

This command checks all tables in the database and produce a listing, first by table and second by index. In the listing, an asterisk('*') in any of the last three columns implies that the table or index requires a REORG.

Collecting statistics

Each SQL statement that is submitted to the database is parsed, optimized, and a statement access plan is created for execution. To create this access plan, the optimizer relies on table and index statistics. In order for the optimizer to generate the best access plan, up-to-date statistics are required. Collecting statistics frequently (or at least when a significant amount of data changes) is a good practice.

To collect statistics for a table, the following command can be issued:

db2 RUNSTATS ON table <tabschema>.<tabname> WITH DISTRIBUTION AND DETAILED INDEXES ALL;

Statistics on the catalog tables should also be collected.