Estimating temporary space for dbspaces and hash joins

You can estimate and increase the amount of temporary space for dbspaces and for hash joins. If you do this, you can prevent the possible overflow of memory to temporary space on disk.

You can use the following guidelines to estimate the amount of temporary space to allocate:
  • For OLTP applications, allocate temporary dbspaces that equal at least 10 percent of the table.
  • For DSS applications, allocate temporary dbspaces that equal at least 50 percent of the table.

A hash join, which works by building a table (the hash table) from the rows in one of the tables in a join, and then probing it with rows from the other table, can use a significant amount of memory and can potentially overflow to temporary space on disk. The hash table size is governed by the size of the table used to build the hash table (which is often the smaller of the two tables in the join), after applying any filters, which can reduce the number of rows and possibly reduce the number of columns.

Hash-join partitions are organized into pages. Each page has a header. The header and tuples are larger in databases on 64-bit platforms than in builds on 32-bit platforms. The size of each page is the base page size (2K or 4K depending on system) unless a single row needs more space. If you need more space, you can add bytes to the length of your rows.

You can use the following formula to estimate the amount of memory that is required for the hash table in a hash join:
hash_table_size = (32 bytes + row_size_smalltab) * num_rows_smalltab
where row_size_smalltab and num_rows_smalltab refer to the row size and the number of rows, respectively, in the smaller of the two tables participating in the hash join.

For example, suppose you have a page head that is 80 bytes in length and a row header that is 48 bytes in length. Because each row must be aligned to 8 bytes, you might need to add up to 7 bytes to the row length, as shown in these formulas:

per_row_size = 48 bytes + rowsize + mod(rowsize, 8)
page_size = base_page_size (2K or 4K)
rows_per_page = round_down_to_integer((page_size - 80 bytes) / per_row_size)  

If the value of rows_per_page is less than one, increase the page_size value to the smallest multiple of the base_page_size, as shown in this formula:

size = (numrows_smalltab / rows_per_page) * page_size

You can use the DS_NONPDQ_QUERY_MEM configuration parameter to configure sort memory for all queries except PDQ queries. Its setting has no effect, however, if the PDQ priority setting is greater than zero.

For more information, see Hash join and Configuring memory for queries with hash joins, aggregates, and other memory-intensive elements.