Automatic management of data distribution statistics

The HCL OneDB™ database server supports several mechanisms for automating some of the tasks that are involved in gathering, dropping, and refreshing data distribution statistics for tables, indexes, table fragments, and index fragments.

Automatic detection and refreshing of stale statistics during UPDATE STATISTICS operations

You can set the AUTO_STAT_MODE configuration parameter to enable the HCL OneDB database server to automatically detect which table and index statistics are stale, and only refresh the stale statistics when the UPDATE STATISTICS statement is run. The data distribution statistics that are automatically detected and refreshed are calculated at the table, fragment, or index level, not at the individual column level. If you set no value for this parameter, the automatic statistics mode is enabled by default. When automatic mode is enabled, the default threshold that defines stale statistics is reached when at least 10% of the rows in the table or fragment are changed by DML, LOAD, or TRUNCATE operations since the most recent calculation of data distribution statistics.

You can set another configuration parameter, STATCHANGE, to specify a nondefault change threshold for refreshing distribution statistics when automatic statistics mode is enabled. For example, if you set the STATCHANGE value to 15, statistics are refreshed if 15% of the rows in the table or fragment are changed. If the STATCHANGE parameter is not set, the system default value for STATCHANGE is 10.

You can override the STATCHANGE or AUTO_STAT_MODE configuration parameter setting for the current session by using the SET ENVIRONMENT statement to set session environment variables of the same names. The DBA can include SET ENVIRONMENT statements in the sysdbopen routine to enable or disable automatic statistics mode, or to change the stale distribution threshold (or both) at connection time. These settings are applied to UPDATE STATISTICS statements that are issued in the current session.

A table can be created with its own STATCHANGE table attribute, whose value overrides the setting of the STATCHANGE session environment variable or configuration parameter. For fragmented tables whose distribution statistics are calculated for each fragment, the value of its STATCHANGE attribute determines whether statistics are refreshed for individual fragments. The ALTER TABLE statement of SQL can reset the STATCHANGE attribute of a table.

You can also use (or disable for your current operation) the explicit or default AUTO_STAT_MODE and STATCHANGE settings during UPDATE STATISTICS statements that include the AUTO or the FORCE keyword:

AUTO

This keyword puts the UPDATE STATISTICS statement in automatic mode for detecting tables and fragments whose statistics are stale. Distribution statistics are not refreshed for tables or fragments whose STATCHANGE value is below the specified threshold.

FORCE

This keyword refreshes the statistics for all tables and columns within the specified scope. If automatic mode for detecting stale statistics is enabled, the FORCE keyword overrides automatic mode, so that values of the STATCHANGE attributes of tables and fragments are ignored, and statistics are recalculated for all database objects within the scope of the FOR TABLE specification.

The scope of AUTO or FORCE is limited to the UPDATE STATISTICS statement in which the keyword is specified. UPDATE STATISTICS statements that include neither of these keywords use the current AUTO_STAT_MODE setting of the database server (or for their session environment, if that is different). If AUTO_STAT_MODE is enabled, the STATCHANGE value is determined in the following (descending) order of precedence:

  1. The value of the STATCHANGE attribute of the table, if AUTO is not the specified value.
  2. The value that is set by the most recent SET ENVIRONMENT STATCHANGE statement in the same session.
  3. The explicit setting of the STATCHANGE configuration parameter.
  4. The system default STATCHANGE value is 10.

Automatic statistics maintenance in DDL operations

The HCL OneDB database server automatically creates, updates, or drops data distribution statistics during certain operations that create, alter, or destroy database objects.

ALTER FRAGMENT ATTACH operations

If the automatic mode for detecting stale distribution statistics is enabled, and the table that is being attached to has fragmented distribution statistics, the database server calculates the distribution statistics of the new fragment. Stale distribution statistics of existing fragments are also recalculated. This recalculation of fragment statistics runs in the background. After the database server calculates the fragment statistics, it merges them to form table distribution statistics, and stores the results in the system catalog.

Distribution statistics are not recalculated, however, unless explicit or default value of the AUTO_STAT_MODE configuration parameter or the AUTO_STAT_MODE session environment setting enables the automatic mode for detecting stale data distribution statistics.

ALTER FRAGMENT DETACH operations

Some ALTER FRAGMENT DETACH statements to attach a fragment can cause the database server to update the index structure. When an index is rebuilt in those cases, the database server also recalculates the associated column distributions, and these statistics are available to the query optimizer when it designs query plans for the table from which the fragment was detached:

  • For an indexed column (or for a set of columns) on which ALTER FRAGMENT DETACH automatically rebuilds a B-tree index, the recalculated column distribution statistics are equivalent to distributions created by the UPDATE STATISTICS statement in HIGH mode.
  • If the rebuilt index is not a B-tree index, the automatically recalculated statistics correspond to distributions created by the UPDATE STATISTICS statement in LOW mode.
If the automatic mode for detecting stale distribution statistics is enabled, and the table from which the fragment is being detached has fragment-level distribution statistics, the database server takes the following actions:
  • Uses the distribution statistics of the detached fragment to form a new table distribution.
  • Merges the distribution statistics of the remaining fragments to calculate distribution statistics for the surviving table
  • Stores the statistics that result from these operations in the system catalog.

This recalculation of fragment statistics runs in the background.

Distribution statistics are not recalculated, however, unless an explicit or default value of the AUTO_STAT_MODE configuration parameter or the AUTO_STAT_MODE environment setting enables the automatic mode for detecting stale data distribution statistics.

ALTER TABLE ADD CONSTRAINT operations

ALTER TABLE ADD CONSTRAINT statements that use the Single Column Constraint format to implicitly create an index on a non-opaque column also automatically calculate the distribution of the specified column. Similarly, if the Multiple-Column Constraint format specifies a list of columns as the scope of the new constraint, the database server implicitly creates an index on the same non-opaque column or set of columns as the referential constraint, distribution statistics are automatically calculated on the specified column, or on the lead column of a multiple-column constraint.

These distribution statistics are available to the query optimizer when it designs query plans for the table on which the constraint is defined:

  • For columns on which the new constraint is implemented as a B-tree index, the recalculated column distribution statistics are equivalent to distributions created by the UPDATE STATISTICS statement in HIGH mode.
  • If the new constraint is not implemented as a B-tree index, the automatically recalculated statistics correspond to distributions created by the UPDATE STATISTICS statement in LOW mode.

These distribution statistics are available to the query optimizer when it designs query plans for the table on which the new constraint was created.

The automatic calculation of column distribution statistics in ALTER TABLE MODIFY operations that define a constraint on a non-opaque column is not dependent on whether AUTO_STAT_MODE is enabled or disabled.

ALTER TABLE MODIFY operations

ALTER TABLE MODIFY statements that use the Single Column Constraint format or Multiple Column Constraint format to define constraints similarly cause the database server to calculate data distribution statistics for the indexes that are implicitly created to enforce the constraints. These distribution statistics have the same attributes as the statistics that are calculated automatically for an index on a non-opaque column, and that are also automatically calculated during ALTER TABLE ADD CONSTRAINT operations. These statistics are available to the query optimizer when it designs query plans for the table on which the constraints are defined.

The automatic calculation of column distribution statistics in ALTER TABLE MODIFY operations that define a constraint on a non-opaque column is not dependent on whether AUTO_STAT_MODE is enabled or disabled.

CREATE INDEX operations

The database server automatically calculates index statistics, equivalent to the statistics gathered by UPDATE STATISTICS in LOW mode, when you create a B-tree index on a UDT column of an existing table, or if you create a functional index or a virtual index interface (VII) index on a column of an existing table. Statistics that are collected automatically by this feature are stored in the system catalog and are available to the query optimizer, without the need for running the UPDATE STATISTICS statement manually. When B-tree indexes are created, column statistics are collected on the first index column, equivalent to what UPDATE STATISTICS generates in HIGH mode, with a resolution is 1% for tables of fewer than a million rows, and 0.5% for larger tables. (Tables with more than 1 million rows have a better resolution because they have more bins for statistics.)

The automatic calculation of column distribution statistics in CREATE INDEX operations is not dependent on whether AUTO_STAT_MODE is enabled or disabled.

Auto Update Statistics (AUS) maintenance system

This uses a combination of Scheduler sensors, tasks, thresholds, and tables to evaluate and update data distribution statistics. The system provides as built-in input criteria a set of configuration parameter values. The system administrator can modify these to reflect current requirements and workloads. The AUS system combines these criteria with information from the sysmaster database to automatically identify tables whose distributions are becoming stale, and generates the text of UPDATE STATISTICS statements to refresh the distribution statistics for those tables.

The list of generated UPDATE STATISTICS statements is run automatically each week at a designated period of low throughput, to update as many table distributions as can be recalculated during the designated maintenance period. Any UPDATE STATISTICS statements that do not complete are retained on the list for the next maintenance period.