Use join filters and post-join filters

The database server provides support for using a subset of the ANSI join syntax.

This syntax that includes the following keywords:
  • ON keyword to specify the join condition and any optional join filters
  • LEFT OUTER JOIN keywords to specify which table is the dominant table (also referred to as outer table)

For more information about this ANSI join syntax, see the HCL OneDB™ Guide to SQL: Syntax.

In an ANSI outer join, the database server takes the following actions to process the filters:
  • Applies the join condition in the ON clause to determine which rows of the subordinate table (also referred to as inner table) to join to the outer table
  • Applies optional join filters in the ON clause before and during the join

    If you specify a join filter on a base inner table in the ON clause, the database server can apply it prior to the join, during the scan of the data from the inner table. Filters on a base subordinate table in the ON clause can provide the following additional performance benefits:

    • Fewer rows to scan from the inner table prior to the join
    • Use of index to retrieve rows from the inner table prior to the join
    • Fewer rows to join
    • Fewer rows to evaluate for filters in the WHERE clause

    For information about what occurs when you specify a join filter on an outer table in the ON clause, see the HCL OneDB Guide to SQL: Syntax.

  • Applies filters in the WHERE clause after the join

    Filters in the WHERE clause can reduce the number of rows that the database server needs to scan and reduce the number of rows returned to the user.

    The term post-join filters refers to these WHERE clause filters.

When distributed queries that use ANSI-compliant LEFT OUTER syntax for specifying joined tables and nested loop joins are executed, the query is sent to each participating database server for operations on local tables of those servers.

For example, the demonstration database has the customer table and the cust_calls table, which tracks customer calls to the service department. Suppose a certain call code had many occurrences in the past, and you want to see if calls of this kind have decreased. To see if customers no longer have this call code, use an outer join to list all customers.

SET EXPLAIN ON output for an ANSI join shows a sample SQL statement to accomplish this ANSI join query and the SET EXPLAIN ON output for it.
Figure 1: SET EXPLAIN ON output for an ANSI join
QUERY:
------
SELECT c.customer_num, c.lname, c.company,
c.phone, u.call_dtime, u.call_code, u.call_descr
FROM customer c
LEFT JOIN cust_calls u ON c.customer_num = u.customer_num
ORDER BY u.call_dtime

Estimated Cost: 14
Estimated # of Rows Returned: 29
Temporary Files Required For: Order By

1) virginia.c: SEQUENTIAL SCAN

2) virginia.u: INDEX PATH

   (1) Index Keys: customer_num call_dtime (Serial, fragments: ALL)
     Lower Index Filter: virginia.c.customer_num = virginia.u.customer_num

  ON-Filters:virginia.c.customer_num = virginia.u.customer_num
  NESTED LOOP JOIN(LEFT OUTER JOIN)
Look at the following lines in the SET EXPLAIN ON output in SET EXPLAIN ON output for an ANSI join :
  • The ON-Filters: line lists the join condition that was specified in the ON clause.
  • The last line of the SET EXPLAIN ON output shows all three keywords (LEFT OUTER JOIN) for the ANSI join even though this query specifies only the LEFT JOIN keywords in the FROM clause. The OUTER keyword is optional.
SET EXPLAIN ON output for a join filter in an ANSI join shows the SET EXPLAIN ON output for an ANSI join with a join filter that checks for calls with the I call_code.
Figure 2: SET EXPLAIN ON output for a join filter in an ANSI join
QUERY:
------
SELECT c.customer_num, c.lname, c.company,
c.phone, u.call_dtime, u.call_code, u.call_descr
FROM customer c LEFT JOIN cust_calls u
ON c.customer_num = u.customer_num
AND u.call_code = 'I'
ORDER BY u.call_dtime

Estimated Cost: 13
Estimated # of Rows Returned: 25
Temporary Files Required For: Order By

  1) virginia.c: SEQUENTIAL SCAN

  2) virginia.u: INDEX PATH

     Filters: virginia.u.call_code = 'I'

      (1) Index Keys: customer_num call_dtime   (Serial, fragments: ALL)
          Lower Index Filter: virginia.c.customer_num = virginia.u.customer_num

ON-Filters:(virginia.c.customer_num = virginia.u.customer_num
               AND virginia.u.call_code = 'I' )
NESTED LOOP JOIN(LEFT OUTER JOIN)
The main differences between the output in SET EXPLAIN ON output for an ANSI join and SET EXPLAIN ON output for a join filter in an ANSI join are as follows:
  • The optimizer chooses a different index to scan the inner table.

    This new index exploits more filters and retrieves a smaller number of rows. Consequently, the join operates on fewer rows.

  • The ON clause join filter contains an additional filter.

The value in the Estimated # of Rows Returned line is only an estimate and does not always reflect the actual number of rows returned. The sample query in SET EXPLAIN ON output for a join filter in an ANSI join returns fewer rows than the query in SET EXPLAIN ON output for an ANSI join because of the additional filter.

SET EXPLAIN ON output for the WHERE clause filter in an ANSI join shows the SET EXPLAIN ON output for an ANSI join query that has a filter in the WHERE clause.
Figure 3: SET EXPLAIN ON output for the WHERE clause filter in an ANSI join
QUERY:
------
SELECT c.customer_num, c.lname, c.company,
   c.phone, u.call_dtime, u.call_code, u.call_descr
FROM customer c LEFT JOIN cust_calls u
ON c.customer_num = u.customer_num
   AND u.call_code = 'I'
WHERE c.zipcode = "94040"
ORDER BY u.call_dtime

Estimated Cost: 3
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By

  1) virginia.c: INDEX PATH

    (1) Index Keys: zipcode   (Serial, fragments: ALL)
        Lower Index Filter: virginia.c.zipcode = '94040'

  2) virginia.u: INDEX PATH

     Filters: virginia.u.call_code = 'I'

    (1) Index Keys: customer_num call_dtime   (Serial, fragments: ALL)
        Lower Index Filter: virginia.c.customer_num = virginia.u.customer_num

ON-Filters:(virginia.c.customer_num = virginia.u.customer_num
      AND virginia.u.call_code = 'I' )
NESTED LOOP JOIN(LEFT OUTER JOIN)

PostJoin-Filters:virginia.c.zipcode = '94040'
The main differences between the output in SET EXPLAIN ON output for a join filter in an ANSI join and SET EXPLAIN ON output for the WHERE clause filter in an ANSI join are as follows:
  • The index on the zipcode column in the post-join filter is chosen for the dominant table.
  • The PostJoin-Filters line shows the filter in the WHERE clause.