Advanced analytics

Advanced analytics functions provide specialized methods of analyzing time series data for patterns or abnormalities.

You use advanced analytics functions to perform the following types of analysis:

  • Search based on specific measures like similarity, distance, and correlation. Find the portions of a sequence which are related to a given pattern.
  • Quantify similarity, distance, and correlation between two sequences using the Lp-norm, Dynamic Time Warping, or Longest Common Subsequence method.
  • Detect anomalies within time series data.

The following illustration shows different methods of comparing the same sequences. The Euclidean distance method, or the L2-norm method, does not find a match. Both the dynamic time warping and the longest common subsequence methods detect matches.

Figure 1: Comparison of Euclidean distance, dynamic time warping, and longest common subsequence matching

The image shows three graph of the same data with different matching methods.

Lp-norm

The Lp-norm method measures the linear distance between two patterns. The Lp-norm method has the following characteristics:

  • One-to-one comparisons of sequences
  • Sequences must have equal length
  • Results can be indexed in multi-dimensional indexes

The standard Euclidean distance is expressed as L2-norm, where the value of p is 2 in the following formula:

Figure 2: The Lp-norm formula

The Lp-norm formula is shown

Dynamic time warping

Dynamic time warping (DTW) is an algorithm for measuring similarity between two time-based sequences where the frequency of the observations is not synchronized between the two sequences. You can apply the DTW algorithm to linear numerical sequences of data, such as video, audio, and graphics data. The DTW algorithm calculates an optimal match between two given sequences with the sequences warped non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension.

DTW has the following characteristics:

  • One-to-many mapping on sequence points
  • Sequences are not required to have equal lengths
  • Tolerance for noise so that two sequences can be similar even if a small part of the sequences are significantly divergent

For an input time series of C1, C2, ..., Ci, and a pattern sequence of Q1, Q2, ..., Qj, the DTW score is computed with the following formula:

D(i,j)=|Ci – Qj|+min{D(i – 1,j),D(i – 1,j – 1),D(i,j – 1)}

The score can be visualized as a mapping between C and Q.

Figure 3: DTW mapping

The figure shows a curve for C and a curve for Q and their unconstrained DTW mapping.

You can constrain the DTW algorithm in the following ways:

Sakoe-Chiba constraint
The Sakoe-Chiba constraint forms a band to constrain the scoring. For an input time series of C1, C2, ..., Ci, and a pattern sequence of Q1, Q2, ..., Qj, the DTW score with a Sakoe-Chiba constraint is computed with the following formula:
D(i,j)=|Ci – Qj|+min{D(i – 1,j),D(i – 1,j – 1),D(i,j – 1)}, 
      where  |i-j| < norm_mconstraint *  min(lengthof(C),lengthof(Q))
    

The following illustration shows how the Sakoe-Chiba constraint forms a band where the DTW score is computed.

Figure 4: C | Q mapping with an mConstraint

The images show the Sakoe-Chiba band on C | B mappings.
Itakura Parallelogram constraint
The Itakura Parallelogram constraint forms a parallelogram-shaped region where the DTW score is computed, as shown in the following illustration.
The graphics show the Itakura parallelogram.

Longest common subsequence

The Longest common subsequence (LCSS) measures the similarity between two time series sequences.

Figure 5: The longest common subsequence formula

The longest common subsequence formula.

Pearson correlation coefficient

The Pearson correlation measures the degree of linear interrelation between two time series sequences. The Pearson correlation coefficient is obtained by dividing the covariance of the two variables by the product of their standard deviations. A correlation coefficient of 1 indicates an exact match.

Figure 6: Pearson correlation coefficient formula

The image shows the Pearson correlation coefficient formula.

Abnormal sequence detection

Abnormal sequence detection compares sequences to identify any sequences that are abnormal compared to other sequences. Given a long time series sequence, abnormal sequence detection provides the ability to tell which part of the time series is significantly different from the portion of data nearby in time order. For example, the tallest sequence in the following illustration is abnormal compared to the other sequences.

Figure 7: Abnormal sequence detection

The image show three line values, one of which is abnormal compared to the others.

You specify a sliding window length, the size of the step to move the comparison, and the size of the sequence. This method checks how many neighbors of a sequence's top-k nearest neighbor sequences also consider the sequence as a top-k nearest neighbor.

Figure 8: Abnormal sequence detection flow

The image shows how sequences are evaluated sequentially by the step size for a sliding window.

For sequence 3, only its neighboring sequence 6 regards it as a top-3 neighbor, which implies that sequence 3 is a relative outlier in the set of sequences 1 - 6.


The image shows a matrix of numbers.