Resolution Clause

Use the Resolution clause in MEDIUM or HIGH mode to adjust the size of the distribution bins, and to avoid calculating data on indexes.

In MEDIUM mode only, you can also use the Resolution clause to specify a lower limit to the sampling size and to adjust the confidence level.
Resolution Clause

1 %Resolution Clause for MEDIUM Mode
1 %Resolution Clause for HIGH Mode
Resolution Clause for MEDIUM Mode

1?  SAMPLING SIZE  min ?   RESOLUTION percent?  confidence?  DISTRIBUTIONS ONLY
Resolution Clause for HIGH Mode

Element Description Restrictions Syntax
confidence Estimated likelihood that sampling in MEDIUM mode produces results equivalent to the exact HIGH mode. Default level is 0.95. Must be within the range from 0.80 (minimum) to 0.99 (maximum) Literal Number
percent Average percentage of the sample in each distribution bin. Default is 2.5 for MEDIUM and 0.5 for HIGH. Minimum value is 1/nrows, for nrows the number of rows in the table Literal Number
min The minimum integer number of randomly selected rows on which to generate the data distributions Must be greater than zero but cannot exceed nrows Literal Number

A distribution is a mapping of the data in a column into a set of column values, ordered by magnitude or by collation. The range of these sample values is partitioned into disjunct intervals, called bins, each containing an approximately equal portion of the sample of column values. For example, if one bin holds 2 percent of the data, approximately 50 such intervals hold the entire sample.

Some statistical texts call these bins equivalence categories. Each contains a disjunct subset of the range of the data values that are sampled from the column.

If you include the RESOLUTION keyword, it must be followed by a literal number, specifying the percent of values in each bin. In MEDIUM mode, it can be followed by either one or two literal numbers, with the optional second number specifying the confidence level, as in this example:

This specifies 4% of the data per bin, implying approximately 25 bins, and a confidence level of 90%, and no examination of index data. If the 0.90 value were omitted, then the default level of confidence would have been in effect. If the RESOLUTION keyword and both numeric values were omitted, then default values for percent (2.5%) and for confidence (0.95) would be used.

The query optimizer estimates the selectivity of a WHERE clause by examining, for each column included in the WHERE clause, the proportional occurrence of the data values contained in the column.

You cannot create distributions for BYTE or TEXT columns. If you include a BYTE or TEXT column in an UPDATE STATISTICS statement that specifies MEDIUM or HIGH distributions, no distributions are created for those columns. Distributions are constructed for other columns in the list, however, and the statement does not return an error.

Columns of the VARCHAR data type do not use overflow bins, even when multiple bins are being used for duplicate values.

You can use the first two parameters of the DBUPSPACE environment variable to constrain the disk space and memory resources that the UPDATE STATISTICS statement can use to sort data when it constructs column distributions. These settings affect performance, because they determine how many times the database server scans the specified table to construct each distribution. (A third DBUPSPACE parameter can control whether UPDATE STATISTICS sorts with indexes when calculating column distributions, and whether the explain output file stores the plan by which the column distributions are calculated.)