Clustering

Clustering is a method for arranging the rows of a table so that their physical order on disk closely corresponds to the sequence of entries in the index.

When you know that a table is ordered by a certain index, you can avoid sorting. You can also be sure that when the table is searched on that column, it is read effectively in sequential order, instead of nonsequentially. These points are covered in Queries and the query optimizer.

Tip: For information about eliminating interleaved extents by altering an index to cluster, see Creating or altering an index to cluster.

In the stores_demo database, the orders table has an index, zip_ix, on the postal-code column. The following statement causes the database server to put the rows of the customer table in descending order by postal code:

ALTER INDEX zip_ix TO CLUSTER

To cluster a table on a nonindexed column, you must create an index. The following statement reorders the orders table by order date:

CREATE CLUSTER INDEX o_date_ix ON orders (order_date ASC)

To reorder a table, the database server must copy the table. In the preceding example, the database server reads all the rows in the table and constructs an index. Then it reads the index entries in sequence. For each entry, it reads the matching row of the table and copies it to a new table. The rows of the new table are in the desired sequence. This new table replaces the old table.

Clustering is not preserved when you alter a table. When you insert new rows, they are stored physically at the end of the table, regardless of their contents. When you update rows and change the value of the clustering column, the rows are written back into their original location in the table.

Clustering can be restored after the order of rows is disturbed by ongoing updates. The following statement reorders the table to restore data rows to the index sequence:

ALTER INDEX o_date_ix TO CLUSTER

Reclustering is usually quicker than the original clustering because reading out the rows of a nearly clustered table is similar in I/O impact to a sequential scan.

Clustering and reclustering take a lot of space and time. To avoid some clustering, build the table in the desired order initially.