User-defined statistics in a query

For SQL statements that use user-defined data types, the optimizer can call custom selectivity and cost functions. Selectivity and cost functions might need to use statistics about the nature of the data in a column. When you create the statcollect() function that collects statistics for a UDT, the database server executes this function automatically when a user runs the UPDATE STATISTICS statement with the MEDIUM or HIGH keyword.

The statistics that the database server collects might require a smart large object for storage. The configuration parameter SBSSPACENAME specifies an sbspace for storing this information. If SBSSPACENAME is not set, the database server might not be able to collect the specified statistics.

The query optimizer can use data distributions when it assesses the selectivity of a query filter. The selectivity is the number of rows that the filter will return. For queries that involve columns with built-in data types, the database server uses data distributions to automatically determine selectivity for the following kinds of filters:
  • Relational-operator functions (lessthan(), ...)
  • Boolean built-in operator functions: like(), matches()
Important: The query optimizer can only use data distributions if the UPDATE STATISTICS statement has collected these distributions in the sysdistrib system catalog table.
However, if the query involves columns with user-defined data types, you must provide the following information for the query optimizer to be able to determine the filter selectivity:
  1. Write a user-defined function to implement the appropriate operator function.

    For user-defined types, these built-in operator functions do not automatically exist. You must write versions of these functions that handle your user-defined type.

  2. Write a selectivity function for the operator function to provide the optimizer with a selectivity value.
    Selectivity and cost functions might need to use statistics about the nature of the data in a column. If you want these selectivity functions to use data distributions, take the following actions:
    • Provide user-defined statistics so that the UPDATE STATISTICS statement saves the data distributions in the sysdistrib system catalog table.
    • Access the sysdistrib table from within the selectivity function to obtain the data distributions for the column.

    For more information about how to write and register selectivity functions, see Write selectivity and cost functions.