Query plans that include an index self-join path

An index self-join is a type of index scan that you can think of as a union of many small index scans, each one with a single unique combination of lead-key columns and filters on non-lead-key columns.

The union of small index scans results in an access path that uses only subsets of the full range of a composite index. The table is logically joined to itself, and the more selective non-leading index keys are applied as index-bound filters to each unique combination of the leading key values.

An index self-join is beneficial for situations in which:

The lead key of an index has many duplicates, and
Predicates on the lead key are not selective, but predicates on the non-leading index keys are selective.

The query in SET EXPLAIN output for a query with an index self-join path shows the SET EXPLAIN output for a query plan that includes an index self-join path.

Figure 1: SET EXPLAIN output for a query with an index self-join path

QUERY:
------
SELECT a.c1,a.c2,a.c3 FROM tab1 a WHERE (a.c3 >= 100103) AND 
     (a.c3 <= 100104) AND (a.c1 >= 'PICKED      ') AND 
     (a.c1 <= 'RGA2        ') AND (a.c2 >= 1) AND (a.c2 <= 7) 
     ORDER BY 1, 2, 3

Estimated Cost: 155
Estimated # of Rows Returned: 1 
  1) informix.a: INDEX PATH 
    (1) Index Keys: c1 c2 c3 c4 c5   (Key-Only)  (Serial, fragments: ALL)
        Index Self Join Keys (c1 c2 )
          Lower bound: informix.a.c1 >= 'PICKED      ' AND (informix.a.c2 >= 1 ) 
          Upper bound: informix.a.c1 <= 'RGA2        ' AND (informix.a.c2 <= 7 ) 
        Lower Index Filter: (informix.a.c1 = informix.a.c1 AND 
          informix.a.c2 = informix.a.c2 ) AND informix.a.c3 >= 100103 
        Upper Index Filter: informix.a.c3 <= 100104 
        Index Key Filters:  (informix.a.c2 <= 7 ) AND
                            (informix.a.c2 >= 1 )

In SET EXPLAIN output for a query with an index self-join path, an index exists on columns c1, c2, c3, c4, and c5. The optimizer chooses c1 and c2 as lead keys, which implies that columns c1 and c2 have many duplicates. Column c3 has few duplicates and thus the predicates on column c3 (c3 >= 100103 and c3 <= 100104) have good selectivity.

As SET EXPLAIN output for a query with an index self-join path shows, an index self-join path is a self-join of two index scans using the same index. The first index scan retrieves each unique value for lead key columns, which are c1 and c2. The unique value of c1 and c2 is then used to probe the second index scan, which also uses predicates on column c3. Because predicates on column c3 have good selectivity:

The index scan on the inner side of the nested-loop join is very efficient, retrieving only the few rows that satisfy the c3 predicates.
The index scan does not retrieve extra rows.

Thus, for each unique value of c1 and c2, an efficient index scan on c1, c2 and c3 occurs.

The following lines in the example indicate that the optimizer has chosen an index self join path for this table, with columns c1 and c2 as the lead keys for the index self-join path:

Index Self Join Keys (c1 c2 )
          Lower bound: informix.a.c1 >= 'PICKED      ' AND (informix.a.c2 >= 1 ) 
          Upper bound: informix.a.c1 <= 'RGA2        ' AND (informix.a.c2 <= 7 )

The example shows the bounds for columns c1 and c2, which you can conceive of as the bounds for the index scan to retrieve the qualified leading keys of the index.

The following information in the example shows the self-join:

(informix.a.c1 = informix.a.c1 AND informix.a.c2 = informix.a.c2 )

This information represents the inner index scan. For lead key columns c1 and c2 the self- join predicate is used, indicating the value of c1 and c2 comes from the outer index scan. The predicates on column c3 serve as an index filter that makes the inner index scan efficient.

Regular index scans do not use filters on column c3 to position the index scan, because the lead key columns c1 and c2 do not have equality predicates.

The INDEX_SJ directive forces an index self-join path using the specified index, or choosing the least costly index in a list of indexes, even if data distribution statistics are not available for the leading index key columns. The AVOID_INDEX_SJ directive prevents a self-join path for the specified index or indexes. Also see Access-method directives and the HCL OneDB™ Guide to SQL: Syntax.