Indexes on patterns

You can create a pattern matching index on a time series to speed pattern matching queries.

If you do not create pattern matching indexes, pattern matching queries run sequential scans to find matching data.

A pattern matching index is used during a pattern matching query if the query and the index match in the following ways:

  • The time series instance (the primary key value) in the query and the index must be the same.
  • The column in the TimeSeries subtype in the query and the index must be the same.
  • The search method (whole pattern match or subsequence pattern match) in the query and the index must be the same.
  • The search time range must be within the index time range.
  • The query pattern length must be an integer multiple of the sliding window size.

A pattern matching index processes time series values by a Discrete Fourier Transformation (DFT) into vector points, calculates bounding boxes around the vector points, and indexes the bounding boxes with an R-tree index structure. During a query, the pattern match index evaluates the bounding boxes for the target time series against the query pattern.

Before you create a pattern matching index, you must understand your data and the type of search queries that you plan to run. Consider the following factors:

  • The time series instance: You must create an index for every time series instance in which you want to search for patterns.
  • The search method: Either whole pattern match for searching consecutive sequences of values, or subsequence pattern match for searching every possible subsequence of values. You can create a whole pattern match index and a subsequence match index on the same column.
  • The accuracy of the index: A more accurate index results in faster queries but requires more storage space. The accuracy of the index depends on the number of DFT points that are included in a bounding box:
    • For a whole pattern match, set the partition length.
    • For a subsequence pattern match, set the sliding window size.
  • The time range: You specify the beginning and ending time stamps of the data that you want to index. A pattern match index is used only if the time range that is specified in the query is within the time range that is indexed.

For each time series instance, you can create one whole pattern match index and one subsequence pattern match index for each numeric column in the TimeSeries row type.

When you create a pattern matching index, an internal table is created to contain the bounding boxes. The internal table names have the following format: TS_Index_table_name_col_position, where table_name is the name of the time series table that is indexed, and col_position is the numeric position of the column in the TimeSeries data type that is indexed. The internal table TS_INDEX_MAPPING tracks the pattern matching indexes.

Index maintenance

To extend the time range of an existing index, run the TSCreatePatternIndex function with a new beginning point and ending point and with identical index properties of the existing index. You can extend the index time range in either direction, or both directions, but the existing and new time ranges must overlap.

To change the index properties of an existing index, replace an existing index with a new index by running the TSCreatePatternIndex function with the scope parameter set to 1.

To drop a particular index, run the TSDropPatternIndex function.

To drop all pattern matching indexes and the associated internal tables, run the TSPIndexCleanup function.

Storage requirements

The following formula gives an approximate number of disk pages for a pattern matching index:

disk pages = entries/((pagesize - 88)/(colsize +16))*.6
entries
The number of entries in the index depends on the search method.
Whole pattern match: entries=elements/partitionlen, where elements is the number of time series elements that are indexed and partitionlen is the value of the partitionlen index property.
Subsequence pattern match: entries=((elements/aggwin)-slidewin)/4, where elements is the number of time series elements that are indexed, aggwin is the value of the aggwin index property, and slidewin is the value of the slidewin index property.
pagesize
The page size of the dbspace, in bytes.
colsize
The size of the column in the TimeSeries subtype, in bytes.

The .6 multiplier is because R-tree index leaf pages are usually just over half full.