Hierarchical index structure

An R-tree index is arranged as a hierarchy of pages. The topmost level of the hierarchy is the root page. Intermediate levels, when needed, are branch pages. Each branch page contains entries that refer to a subset of pages, or a subtree, in the next level of the index. The bottom level of the index contains a set of leaf pages. Each leaf page contains a list of index entries that refer back to rows in the indexed table. Each index entry also includes a copy of the bounding-box of the indexed key from the table, or data object. The pages of an R-tree index do not usually contain the maximum possible number of index entries.

An R-tree index is height-balanced, which means that all paths down the tree, from the root page to any leaf page, traverse the same number of levels. This also means that all leaf nodes are at the same level.

Each page in an R-tree index structure is a physical disk page. The R-tree index is designed to minimize the number of pages that need to be fetched from disk during the execution of a query, since disk I/O is often the most costly part.

The upper section of R-tree index structure shows how the data objects and the bounding boxes (described in Bounding boxes) stored in an R-tree index structure are related. The root page contains entries for bounding boxes R1 and R2. Together, these two bounding boxes enclose all the objects in the index.

Tip: Use the rtreeRootBB() function to return coordinates of the bounding box that enclose all objects in an R-tree index. For detailed instructions on how to use this function, refer to Manage databases that use the R-tree secondary access method.

The bounding boxes of an index page can overlap. However, a data object appears only once in the index even if it falls inside more than one bounding box at the branch levels. For example, data object D2 appears only once in the index that R-tree index structure shows, even though it falls inside bounding boxes R9, R3, R4, and R1.

The reason data objects appear only once in an R-tree index is to keep the index small. If each object had to be replicated in several index pages, the size of the R-tree index would be larger than it needs to be.

An index entry in a leaf page consists of:

A copy of the key, or data object, from the table
A pointer back to the row in the indexed table (also known as a row ID)

The size of an index entry in a leaf page is the size of the data object plus 20 bytes.

An index entry in a root or branch page consists of:

A bounding box that contains all the objects in its child pages
A page number that points to a lower-level (branch or leaf) page in the index

The size of an index entry in a root or branch page is the size of the bounding box plus 12 bytes.

Each type of page in an R-tree index (leaf, branch, or root) also has an overhead of 20 bytes plus the size of the overall bounding box of the page.

The number of levels needed to support an R-tree index depends on the number of index entries each index page can hold. The number of entries per index page depends, in turn, on the size of the key value. The number of entries per page determines the branching factor of the tree. More entries per page, or a higher branching factor, means that fewer levels are needed for the same number of leaf pages as well as fewer leaf pages for a given number of base table keys. For any reasonable branching factor, almost all the space that an R-tree index needs is used by leaf pages.