Statistics held for the table and index

The accuracy with which the query optimizer can assess the execution cost of a query plan depends on the information available to the optimizer. Use the UPDATE STATISTICS statement to maintain simple statistics about a table and its associated indexes. Updated statistics provide the query optimizer with information that can minimize the amount of time required to perform queries on that table.

The database server starts a statistical profile of a table when the table is created, and the profile is refreshed when you issue the UPDATE STATISTICS statement. The query optimizer does not recalculate the profile for tables automatically. In some cases, gathering the statistics might take longer than executing the query.

To ensure that the optimizer selects a query plan that best reflects the current state of your tables, run UPDATE STATISTICS at regular intervals. For guidelines, see Update statistics when they are not generated automatically.

The optimizer uses the following system catalog information as it creates a query plan:
  • The number of rows in a table, as of the most recent UPDATE STATISTICS statement
  • Whether a column is constrained to be unique
  • The distribution of column values, when requested with the MEDIUM or HIGH keyword in the UPDATE STATISTICS statement

    For more information about data distributions, see Creating data distributions.

  • The number of disk pages that contain row data
The optimizer also uses the following system catalog information about indexes:
  • The indexes that exist on a table, including the columns that they index, whether they are ascending or descending, and whether they are clustered
  • The depth of the index structure (a measure of the amount of work that is needed to perform an index lookup)
  • The number of disk pages that index entries occupy
  • The number of unique entries in an index, which can be used to estimate the number of rows that an equality filter returns
  • Second-largest and second-smallest key values in an indexed column

Only the second-largest and second-smallest key values are noted, because the extreme values might have a special meaning that is not related to the rest of the data in the column. The database server assumes that key values are distributed evenly between the second largest and second smallest. Only the initial 4 bytes of these keys are stored. If you create a distribution for a column associated with an index, the optimizer uses that distribution when it estimates the number of rows that match a query.

For more information about system catalog tables, see the HCL OneDB™ Guide to SQL: Reference.