Update statistics

The operator class that is specified when you create an R-tree index defines the strategy functions that tell the query optimizer when to consider using an R-tree index when the strategy function appears in the WHERE clause of a query.

The query optimizer, however, might decide not to use an R-tree index when it calculates how to execute a query, even if a strategy function is specified in the WHERE clause. The query optimizer uses available statistics to calculate the cost of using or not using the index. If not using an R-tree index is less costly than using it, the query optimizer might decide to execute a table scan instead of an index scan.

Use the SQL statement UPDATE STATISTICS to ensure that the statistics on an R-tree indexed column are always correct and up to date. Incorrect statistics can cause a query to execute more slowly than if there are no statistics on the indexed column at all.

You should run UPDATE STATISTICS whenever you make extensive modifications to a table or whenever the distribution of the data in the indexed column changes significantly.
Important: Be sure to always run UPDATE STATISTICS after you load data into a table that has an R-tree index. Without the new statistics, the query optimizer might think the table is small and never consider using the R-tree index.
The following example shows how to update the statistics of the boxes column of the box_tab table:
UPDATE STATISTICS FOR TABLE box_tab (boxes);

When you run UPDATE STATISTICS on a column of user-defined type, the HCL Informix® server calls the statcollect() user-defined routine (if present) to gather statistics. See the Informix User-Defined Routines and Data Types Developer's Guide and the Informix DataBlade® API Programmer's Guide for more information about the statcollect() routine.

When you run UPDATE STATISTICS on a column with an R-tree index, the DataBlade module that implements the user-defined type determines how statistics are gathered to assess the cost of using the R-tree index.

If the DataBlade module provides functions to evaluate selectivity and per-row cost, the following formula is used to calculate the cost of using an R-tree index:
Cost = filtering cost + refinement cost + data-access cost
Where:
  • filtering cost = selectivity * (number of rows in table/average number of rows per page)
  • refinement cost = selectivity * number of rows * per-row cost
  • data-access cost = selectivity * number of data pages

This approach assumes that IO cost is significantly greater than the cost of evaluating the filters. See Selectivity and cost functions for information about adding selectivity and per-row cost functions.

If the DataBlade module does not provide functions to evaluate selectivity and per-row cost, the cost is set at 50. The documentation for the DataBlade module should state which method is used.

The following statistics are generated when the UPDATE STATISTICS command is executed on a column that has an R-tree index:
  • The number of levels in the R-tree index
  • An estimated number of entries in a branch page
  • An estimated number of entries in a leaf page
  • An estimated number of leaf pages
  • The number of unique values in the index
  • The number of clusters in the index

For more detailed information about the UPDATE STATISTICS statement, refer to the Informix Guide to SQL: Syntax.