Storage of dimension values

About this task

When dimensions are created, by default they are defined with the following settings:

  • Values to Record: Whitelist + Observed Values
  • Max Values Per Hour: 1000

The above configuration means the following:

  • Any observed value is saved to the dimension and thus is stored in the database.
  • For each Canister, up to 1000 unique values can be detected and recorded per hour. The set of unique values is cleared each hour.
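
The behavior described above can be modeled roughly as follows. This is a minimal sketch, not the product's implementation: the `Canister` class, its `observe` and `flush` methods, and the `database` list are hypothetical names used only for illustration, and the sketch assumes that new values seen after the hourly limit is reached are simply not recorded.

```python
class Canister:
    """Hypothetical model of one Canister's hourly unique-value tracking."""

    def __init__(self, max_values_per_hour=1000):
        self.max_values_per_hour = max_values_per_hour
        self.known_values = set()  # unique values seen in the current hour

    def observe(self, value):
        """Return True if the value was newly recorded for this hour."""
        if value in self.known_values:
            return False  # already known this hour
        if len(self.known_values) >= self.max_values_per_hour:
            return False  # assumption: values over the limit are ignored
        self.known_values.add(value)
        return True

    def flush(self, database):
        """Hourly step: write the observed values out, then clear the set."""
        written = len(self.known_values)
        database.extend(sorted(self.known_values))
        self.known_values.clear()
        return written


database = []  # stand-in for the dimension's stored values
canister = Canister(max_values_per_hour=1000)

# One hour in which 1500 distinct values are observed.
recorded = sum(canister.observe(f"value-{i}") for i in range(1500))
canister.flush(database)

print(recorded)                     # 1000
print(len(database))                # 1000
print(len(canister.known_values))   # 0
```

In this model only the first 1000 distinct values of the hour are kept, and the Canister starts the next hour with an empty set.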

Under these settings, data volumes can grow very quickly for the following reasons:

Procedure

  1. The set of unique values is reset each hour. Once per hour, the observed values are collected from the Canister and written to the database, and the list of values known to the Canister is cleared. As a result, the total number of stored values can exceed the hourly limit. For example, if the limit is 1000 unique values per hour and hour 1 yields 1000 unique values while hour 2 yields a completely different set of 1000 unique values, 2000 unique values are written to the database.
  2. The limit to the number of values is applied per Canister. Each Canister is permitted to capture 1000 unique values per hour. In an environment with 20 Canisters, the maximum potential number of values is 20,000 per hour by default. While it is unlikely that every Canister captures 1000 unique values, the volume of data grows with the number of Canisters capturing values.
  3. The default capture limit may not seem large, but the values accumulate over time. In a one-Canister environment that is configured to capture 1000 unique values per hour, up to 24,000 values could be captured in a day. In a multi-Canister environment, that daily total is multiplied by the number of Canisters.
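
The figures in the list above follow from simple multiplication. The sketch below reproduces them; the function name `max_recorded_values` and its parameters are hypothetical and chosen only for this illustration. The result is an upper bound that assumes every Canister reaches its hourly limit in every hour, which the text above notes is unlikely in practice.

```python
def max_recorded_values(canisters, max_values_per_hour=1000, hours=1):
    """Upper bound on values written to the database.

    Assumes every Canister reaches its hourly limit in every hour.
    """
    return canisters * max_values_per_hour * hours


per_hour_20_canisters = max_recorded_values(canisters=20)
per_day_1_canister = max_recorded_values(canisters=1, hours=24)
per_day_20_canisters = max_recorded_values(canisters=20, hours=24)

print(per_hour_20_canisters)  # 20000
print(per_day_1_canister)     # 24000
print(per_day_20_canisters)   # 480000
```

Under the default settings, a 20-Canister environment could therefore write up to 480,000 values in a single day.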

Results

The above data management issues are most significant in dimensions that are configured to use Whitelist + Observed Values. Since observed values are typically unfiltered or highly dynamic, the maximum number of values permitted each hour can be reached quickly.

Note: When creating a dimension, Discover recommends immediately converting it to a Whitelist Only dimension and then using the recommended workflow to populate the dimension. This workflow is especially important for high-volume dimensions. See Recommended workflow below.