Fragmentation: Storage distribution strategies

The performance of data warehousing applications can typically benefit from distributed storage allocation designs for partitioning a database table into two or more fragments. Each fragment has the same schema as the table, and stores a subset of the rows in the table (rather than a subset of its columns).

The fragments of a table can be stored in dbspaces on different devices, or in dbspaces on the same physical storage device. The fragments can also be stored in named partitions within a single dbspace.

A database can include both fragmented and nonfragmented tables. Index storage can also be fragmented, either in the same storage spaces as their table (called attached indexes) or in a different storage distribution scheme (detached indexes).

Potential performance and security advantages of distributed storage include these:

For frequently-accessed tables, fragmentation can reduce the overhead of I/O contention for data that resides on a single storage device.
The GRANT FRAGMENT and REVOKE FRAGMENT statements of SQL can specify the access privileges that users, roles, or the PUBLIC group hold on specified fragments of the table. With appropriate fragmentation strategies, these statements can selectively restrict user access to subsets of the records in a table.
For databases that enables parallel-database queries (PDQ), multiple scan threads require less time to scan the fragments than to scan the same rows in a nonfragmented table.
Input operations that distribute new rows across multiple fragments run more quickly (using multiple INSERT threads) that if a single table extent stores the same rows.
For fragmentation strategies where the storage allocation of rows is correlated with data values, query execution plans can ignore fragments that are logically excluded by predicates in the query. Defining fragments to improve selectivity is called fragment elimination.
In cluster environments, fragmentation can reduce the time required for recovery from hardware failure, because restoring only a subset of the fragments imposes a smaller data load than restoring the entire table.
For tables fragmented by interval, the database server create new fragments automatically, simplifying management of the data.

Note: Do not confuse table fragmentation strategies, which can improve the efficiency and throughput of database operations, with the various pejorative meanings of fragmentation in reference to file systems that waste storage space or increase retrieval time through inefficient storage algorithms, or through insufficient use of defragmentation tools to store files in contiguous disk partitions.

HCL OneDB™ fragmentation options

HCL OneDB supports the following storage fragmentation strategies that can be applied to database tables:

By Round-robin: A specified number of fragments is defined for the table. Inserted rows are automatically distributed for storage in these fragments, without regard to data values in the row, in order to balance the number of rows in each fragment. Such fragments are called round-robin fragments.
By Expression: Each fragment is defined by a Boolean expression that can be evaluated for one or more columns of the table. Inserted rows are stored in a fragments for which the expression that defines the fragment is true for the data in that row. Rows that match the expression for more than one fragment are stored in the first matching fragment within the ordered list of fragments that the system catalog maintains for the table. Such fragments are called expression fragments.
By List: Each fragment is defined by a list of one or more constant values that correspond to one or more columns in the table. No two fragments can share the same value in their lists. These values must be categories on a nominal scale that has no quantified order within the set of categories. Inserted rows are stored in the fragment that matches the data value of one or more columns. Such fragments are called list fragments.
By Interval: At least one fragment must be defined for values less than a numeric, DATE, or DATETIME column in the table. An interval size, specifying the range of fragment key values assigned to a single fragment, must also be defined. You can optionally specify a list of dbspaces to store interval fragments. The fragments created by the user when the fragmentation strategy is defined are called range fragments. The database server automatically creates new fragments of the same interval size to store rows whose fragment key values are outside the range of the user-defined range fragments. Fragments created by the database server are called interval fragments.

Each user-defined permanent or temporary database table can either be nonfragmented or else can have exactly one fragmentation scheme. You cannot, for example, define a table in which some fragments use a round-robin strategy, and other fragments use a list or interval strategy.

You can use the ALTER FRAGMENT statement of SQL, however, to modify the fragmentation scheme of a table in various ways, including these:

to change the fragmentation strategy of a fragmented table,
to define a fragmentation strategy for a nonfragmented table,
to change a fragmented table to a nonfragmented table,
to add another fragment to an existing fragmented table,
to combine two tables that have identical structures into a single fragmented table,
to drop one or more dbspaces from the list of dbspaces that store interval fragments.
to detach one fragment from a fragmented table and store the rows in a new nonfragmented table.

For more information about the ALTER FRAGMENT statement and some of the tasks that it can accomplish in data warehousing operations, see the Change the storage distribution strategy.

Storage fragmentation terms

The following terms are useful for understanding and using the various strategies available for the distributed storage of table and index fragments.

Fragment key

The column or a set of columns on which the table or index is fragmented. Depending on the chosen fragmentation strategy, the fragment key can be a column, or a single column expression, or a multi-column expression. For a row inserted into a table for which a fragment key is defined, the value of the column (or the set for values in the fragment key columns) determines which fragment stores the row. A synonym for fragment key is partitioning key. Tables partitioned by round-robin have no fragment key.

Fragment list

An ordered list of the fragments that the database server maintains for every fragmented table or index. By default, the ordinal positions of each fragment on this list reflects the sequence in which the fragments were created. The system catalog stores this integer value in the sysfragments.evalpos column of the row that describes the fragment. Queries that do not use fragment elimination read the fragments in ascending order of their evalpos values. The database server automatically updates evalpos values to reflect changes to the fragment list. Updates to the list are required, for example, when the database server creates an interval fragment, or when the ALTER FRAGMENT statement of SQL adds new fragments, or drops or modifies existing fragments.

Fragment expression

An expression that defines a specific fragment. For example, if the fragment key is colA of data type SMALLINT, a fragment could be defined by the expression

colA <=8 OR colA IN
(9,10,21,22,23)

in an expression based fragmentation strategy.

Expression-based fragments are defined by a Boolean expression.
List-based fragments are defined by one or more constant expressions.
Range fragments (in interval fragmentation) are defined by a range expression. The only valid operator in the range expression is the less-than ( < ) operator. (For example, VALUES < 100).
System-defined interval fragments (in interval fragmentation) are defined by a system-generated expression that includes the greater-than-or-equal >= relational operator, the AND Boolean operator, and the less-than ( < ) relational operator. (For example, VALUES >= 100 AND VALUES < 300 specifies an interval that includes fragmentation key values ranging from 100 to the (non-inclusive) upper limit of 300. )

Tables partitioned by round-robin have no fragment expressions.

NULL fragment

A fragment that stores NULL values (either because its range fragment or list fragment expression is IS NULL, or because a list-based or expression-based fragment is defined with NULL as its fragment expression). For all fragmentation strategies except round-robin, the database server returns an exception if you insert a row whose fragment key value is missing, but no NULL fragment is defined (and for list or expression strategies, no REMAINDER fragment is defined). You do not need to define a NULL fragment if the fragment key column enforces a NOT NULL constraint.

REMAINDER fragment

A fragment that stores any row whose fragment key value does not match the fragment expression of any fragment. If you attempt to insert a row that does not match any fragment key value for a table or index that is fragmented by expression or by list, and no REMAINDER fragment is defined, the database server issues an exception. You cannot define a REMAINDER fragment for tables fragmented by a round-robin or interval strategy.

Transition fragment

In an interval fragmentation scheme, the range fragment whose upper limit in its VALUES clause is larger than the upper limit for any other range fragment. If no interval fragments have been created for the table, inserting a row whose fragment-key value exceeds that upper limit requires the database server to create a new interval fragment. The upper limit of the transition fragment VALUES clause is called the transition value for the table.

The MODIFY INTERVAL TRANSITION option to the ALTER FRAGMENT statement can increase the transition value for a table. This can result in a different fragment becoming the new transition fragment. This and other ALTER FRAGMENT operations can cause changes to column values in the sysfragments system catalog table for the transition fragment, including these:

its position relative to other fragments (the evalpos column),
its fragment expression (the exprtext and exprbin columns),
and its name (the partition column).