Data Collector

These settings define the characteristics for the Discover Data Collector, including buffer and batch sizes, intervals and time-outs, and the gathering of statistics on Discover events.

  • The Data Collector scans each active Canister every five minutes for updated information. It is a Windows service on the Report Server.
Table 1. Data Collector
Setting Description Default
Data Aggregation Set this value to Enabled to enable data aggregation of collected data for reporting purposes.
Note: Don't change this setting unless directed by Discover.
Enabled
Data Aggregation - Usability Daily Data Processing Defines the frequency of daily Usability aggregations and the date range over which they occur. Daily through Previous Day
Data Aggregation - Daily Data Processing Defines the frequency of daily aggregations and the date range over which they occur.
Note: Configuring this setting to a value other than Daily through Start of Hourly Retention Period retains overlapping aggregated data in the database and may impact system performance during data aggregation.

See Data Aggregation and Retention.

Daily through Start of Hourly Retention Period
Data Aggregation - Daily Data Time of Day This setting defines the hour of the day when the daily data aggregation run is performed, if the daily data aggregation is set to occur on a daily basis.
Note: Discover recommends configuring the daily data aggregation run to be performed during an off-peak hour.
02:00
Data Aggregation - Max Concurrent Daily Threads The maximum number of concurrent threads that can be used for aggregating daily reporting data. 4
Data Aggregation - Max Concurrent Performance Threads The maximum number of concurrent threads that can be used for aggregating performance data. 3
Data Aggregation - Max Concurrent Threads The maximum number of concurrent threads that can be used for aggregating base reporting data. 4
Data Aggregation - Max dimension extraction threads The maximum number of concurrent threads that can be used for aggregating dimension data. 4
Data Aggregation - Performance Daily Data Processing When data aggregation is performed on hourly performance data, this parameter defines the scope of the data that is aggregated at the daily level. Available options:
  • Hourly through Current Hour - Performance data is aggregated to the daily level through the current hour.
  • Daily through Previous Day - Performance data is aggregated at the daily level through the previous day.
  • Daily through Start of Hourly Retention Period - Performance data is aggregated at the daily level for dates before the start of the hourly data retention period, after which the hourly data applies.
Note: Configuring this setting to a value other than Daily through Start of Hourly Retention Period retains overlapping aggregated data in the database and may impact system performance during data aggregation.
Daily through Start of Hourly Retention Period
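To make the three scopes concrete, the sketch below computes where daily aggregation would stop for each option. It is an illustration under stated assumptions, not Discover's logic: the `hourly_retention_days` parameter is hypothetical (the hourly retention period is configured elsewhere), and the returned value is treated as an exclusive upper bound.

```python
from datetime import datetime, timedelta
from enum import Enum


class DailyScope(Enum):
    # Option names mirror the setting values listed above.
    HOURLY_THROUGH_CURRENT_HOUR = "Hourly through Current Hour"
    DAILY_THROUGH_PREVIOUS_DAY = "Daily through Previous Day"
    DAILY_THROUGH_START_OF_HOURLY_RETENTION = "Daily through Start of Hourly Retention Period"


def daily_aggregation_end(scope: DailyScope, now: datetime, hourly_retention_days: int) -> datetime:
    """Exclusive upper bound of data rolled up to the daily level (sketch).

    hourly_retention_days is a hypothetical stand-in for the hourly
    retention period configured elsewhere in Discover.
    """
    start_of_today = datetime(now.year, now.month, now.day)
    if scope is DailyScope.HOURLY_THROUGH_CURRENT_HOUR:
        # Include the current hour: stop at the start of the next hour.
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    if scope is DailyScope.DAILY_THROUGH_PREVIOUS_DAY:
        # Stop at midnight, so yesterday is the last full day included.
        return start_of_today
    # Only days before hourly retention begins; hourly data covers the rest.
    return start_of_today - timedelta(days=hourly_retention_days)
```

With the default option, daily and hourly data do not overlap, which is consistent with the note about overlapping aggregated data.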
Data Aggregation - Performance Daily Data Time of Day The time of day when the data aggregation run aggregates hourly performance data to daily data. 4:00
Data Aggregation - Performance Data When Enabled, data on client performance, response times, and connection times is aggregated for reporting purposes. Enabled
Data Collection Set this value to Enabled to enable the Data Collector service to collect data from the Long Term Canister for insertion into the Discover database.
Note: Don't change this setting unless directed by Discover.
Enabled
Data Collection - Canister Connection Timeout (seconds) 300
Data Collection - Chunk Size 2000
Data Collection - Max batches per collection run 10
Data Collection - Max Concurrent Canisters The number of canisters from which the Data Service will collect in parallel. 2
Data Collection - Max Concurrent Connections to a Single Canister 2
Max Extraction Table Queue Size 100
Data Collection - Max Sequential Canister Connection Failures Before Logging to Event Viewer 5
Data Collector - Log Entry Max Wait Time (minutes) Defines the frequency in minutes for writing accumulated log entries to the log file. 5
Data Collector - Log Entry Threshold (rows) Defines the number of log entries that are required before writing to the log file. Log entries are saved when one of the following occurs:
  • The log entry threshold that is configured for this setting is met.
  • The amount of time that is defined in Data Collector - Log Entry Max Wait Time (minutes) has expired.
1000
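The write rule above (whichever of the row threshold or the maximum wait time is reached first) can be sketched as a small buffer. This is an illustration only, not the Data Collector's code: the class name, the in-memory `written` list standing in for the log file, and the injectable clock are assumptions.

```python
import time
from typing import Callable, List


class LogEntryBuffer:
    """Hypothetical buffer: write entries when either limit is reached."""

    def __init__(self, threshold_rows: int = 1000, max_wait_minutes: float = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold_rows = threshold_rows
        self.max_wait_seconds = max_wait_minutes * 60
        self.clock = clock
        self.pending: List[str] = []
        self.written: List[str] = []  # stands in for the log file
        self.last_flush = clock()

    def add(self, entry: str) -> None:
        self.pending.append(entry)
        self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Write pending entries if the row threshold or max wait time is met."""
        elapsed = self.clock() - self.last_flush
        if len(self.pending) >= self.threshold_rows or elapsed >= self.max_wait_seconds:
            self.written.extend(self.pending)
            self.pending.clear()
            self.last_flush = self.clock()
            return True
        return False
```

Because the time-based trigger only fires when something checks it, a long-running service would also call `maybe_flush` from a timer rather than relying on new entries arriving.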
Data Collector Logging Level Specify the logging level for the Data Collector only.
Note: Do not change this setting unless directed by Discover.

Levels:

  • 0 - None
  • 1 - Error (default value)
  • 2 - Warning
  • 3 - Info
  • 4 - Detail
  • 5 - Status
  • 6 - Trace
  • 7 - All
    Note: Status level messages always appear in the log for any non-zero logging level.

    This value overrides the system logging level, which can be configured through Manage Services. See "Manage Services WorldView Tab" in the Unica Discover Administration Manual.

Error
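The level list can be read as a filter on outgoing messages. The sketch assumes that a message is written when its level is at or below the configured level, which the ordering implies but the text does not state; the Status exception follows the note above. `LogLevel` and `is_logged` are hypothetical names.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    NONE = 0
    ERROR = 1
    WARNING = 2
    INFO = 3
    DETAIL = 4
    STATUS = 5
    TRACE = 6
    ALL = 7


def is_logged(message_level: LogLevel, configured_level: LogLevel) -> bool:
    # Level 0 disables logging entirely, including Status messages.
    if configured_level == LogLevel.NONE:
        return False
    # Status messages appear at any non-zero logging level.
    if message_level == LogLevel.STATUS:
        return True
    # Assumption: messages at or below the configured level are written.
    return message_level <= configured_level
```

Under the default (Error), only errors and Status messages would be written.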
Data Trim - Canister Data Staging Tables Enabled
Data Trim - Canister Data Staging Tables Immediate Drop Enabled
Data Trim - Interval Determines the interval at which the Data Service trims data from the database. Values: 0 - None, 1 - Hourly, 2 - Daily, 3 - Weekly. Hourly
Data Trim - Max Batch Size The maximum number of records trimmed from the reporting or canister data tables in any single delete statement executed as part of a trimming operation. 100000
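Capping each delete statement keeps individual transactions small. The sketch below illustrates that batching idea with SQLite standing in for the Discover database; the table layout (an `event_time` column) and the `trim_in_batches` helper are hypothetical, not the product's actual trimming queries.

```python
import sqlite3


def trim_in_batches(conn: sqlite3.Connection, table: str, cutoff: str,
                    max_batch_size: int = 100000) -> int:
    """Delete rows older than cutoff, at most max_batch_size rows per DELETE.

    The table name is interpolated into SQL, so it must come from trusted
    configuration, never from user input.
    """
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN ("
            f"SELECT rowid FROM {table} WHERE event_time < ? LIMIT ?)",
            (cutoff, max_batch_size),
        )
        conn.commit()  # commit each batch separately
        total += cur.rowcount
        # A short batch means no qualifying rows remain.
        if cur.rowcount < max_batch_size:
            return total
```

A smaller batch size means more statements but shorter locks per statement; a larger one finishes in fewer round trips.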
Data Trim - Reporting Data When Enabled, the reporting data is trimmed based on the configured Data Collector settings.
Note: When this value is set to Disabled, no trimming occurs at all. The Reports database can grow without limit.
Enabled
Data Trim - Statistics When Enabled, the Discover statistics data is trimmed based on the configured Data Collector settings.
Note: When this value is set to Disabled, no trimming occurs at all. The Statistics database can grow without limit.
Enabled
Data Trim - System When Enabled, the user activity logs in the DC_SYSTEM database are trimmed based on the configured Data Collector settings.
Note: When this value is set to Disabled, no trimming occurs at all. The related System database tables can grow without limit.
Enabled
Data Trim - Anomaly Detection When Enabled, the Anomaly Detection data is trimmed based on the configured Data Collector settings.
Note: When this value is set to Disabled, no trimming occurs at all. Anomaly Detection data in the database can grow without limit.
Enabled
Database Connection - Timeout (seconds) The timeout in seconds when connecting to the database.
  • If the Data Collector aggregation operation times out (exceeds this setting), the timeout is doubled for the next run. If it times out again, the timeout continues to be doubled, up to the number of times defined in the Data Collection Processes - Max Tries per Staging Table setting. The temporarily extended timeout is kept until the service is restarted, after which it reverts to the original value defined for this setting.
60
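The doubling rule can be written as a one-line formula. This is a sketch of the behavior as described, assuming that the Max Tries per Staging Table value caps the number of doublings; the helper name is hypothetical, and a service restart corresponds to resetting `consecutive_timeouts` to 0.

```python
def effective_timeout(base_seconds: int, consecutive_timeouts: int, max_tries: int) -> int:
    """Connection timeout for the next run after repeated aggregation timeouts.

    Assumption: each timeout doubles the value, with at most max_tries
    doublings. Restarting the service resets consecutive_timeouts to 0.
    """
    doublings = min(consecutive_timeouts, max_tries)
    return base_seconds * 2 ** doublings
```

For example, with the default of 60 seconds and a hypothetical Max Tries value of 3, successive timeouts would produce 120, 240, and then 480 seconds, after which the value stops growing.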
Database Growth Calculation Time of Day The time of day when the database growth report is populated with current size information
  • This report is available through the Portal. See System Status.
5:00
Dimension Log Aggregation When Enabled, Discover dimension values are aggregated from log entries at predefined intervals. Enabled
Dimension Log Aggregation Interval When Dimension Log Aggregation is enabled, this setting defines the time interval between checks of the logs for reference values. Hourly
Dimension Log Aggregation Time When Dimension Log Aggregation is enabled and Dimension Log Aggregation Interval is set to Daily, this setting defines the 24-hour time when the review of the logs is executed. 3:00
Dimension Log Capture Days 40
Dimension Trim - Day of Week If Dimension Trim - Frequency is set to Weekly, this setting defines the day of the week when the trim operation is executed.
  • If Dimension Trim - Frequency is set to Monthly, the trim operation is executed on the first occurrence of this day in the month.
  • See "Data Management for Dimensions" in the Unica Discover Event Manager Manual.
Sunday
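For the Monthly case, "first occurrence of the day in the month" can be computed as below. The helper is a hypothetical illustration using Python's weekday numbering (Monday is 0, Sunday is 6), not part of Discover.

```python
import calendar
from datetime import date


def first_weekday_of_month(year: int, month: int, weekday: int) -> date:
    """Date of the first occurrence of weekday (0=Monday ... 6=Sunday) in a month."""
    first = date(year, month, 1)
    # Days to advance from the 1st until the requested weekday is reached.
    offset = (weekday - first.weekday()) % 7
    return date(year, month, 1 + offset)


# If Frequency were Monthly with the default day (Sunday), June 2025's trim date:
print(first_weekday_of_month(2025, 6, calendar.SUNDAY))  # prints 2025-06-01
```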
Dimension Trim - Frequency Set this value to how frequently the dimension trimming operation is executed: Daily, Weekly, or Monthly.
  • See "Data Management for Dimensions" in the Unica Discover Event Manager Manual.
Daily
Dimension Trim - Time of Day The time of day when the dimension trimming operation occurs
Note: Discover recommends setting this value to occur during an off-peak hour, as early as possible after the end of peak usage and after the Scheduling Service has cycled services.
  • If services are cycled during a dimension trim operation, the Data Collector is forced to restart. See "Configuring the Scheduling Service" in the Unica Discover Configuration Manual.
  • Because the Database Table Size report is updated at 2:00 AM, under the default setting changes to table sizes are not reflected in the report until its next update at 2:00 AM the following night.
  • See "Data Management for Dimensions" in the Unica Discover Event Manager Manual.
3:00
Dimension Trim - Update "Others" Fact Counts When Enabled, the dimension trimming operation also updates the counts of trimmed dimension values to [others] in all reporting data, in addition to trimming those values from the dimension data.
  • When Enabled, updating of the counts for trimmed dimension values requires making the change to every instance of the dimension value in all reporting data. Depending on the number of instances and the number of trimmed values, this process can take a few minutes to multiple hours to complete.
  • When Disabled, reporting data is not updated during dimension trimming operations. As a result, discrepancies can be introduced between the sum of event counts not filtered by the trimmed dimension and the sum of event counts filtered by the trimmed dimension.
    Note: If you do not update counts in the reporting data as part of your dimension trimming, updates for previously reviewed dimension values are not subsequently applied to the reporting data if the option is enabled at a later time.
  • See "Data Management for Dimensions" in the Unica Discover Event Manager Manual.
Enabled
Dimension Value Tracking - Max Concurrent Threads The maximum number of concurrent threads that can be spawned when timestamps for individual dimension values are being updated.
  • The Data Collector updates the timestamps for each value in each dimension when they are detected. During a dimension trimming operation, the Data Collector reads the timestamps associated with each value to determine the most recently occurring ones.
  • This setting defines the maximum number of threads that can be spawned during the timestamp updating process, which runs independently of the dimension trimming process.
    Note: Discover recommends leaving this value unchanged from the default setting.
4
Fact Limits - Check Interval (minutes) Defines the interval at which the number of facts written for each event within the past hour is compared to the permitted maximum. Accepted values are 15, 30, or 60 minutes. 60
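The check described above compares a trailing one-hour count against a limit, and the interval only controls how often that comparison runs. The sketch below assumes facts are available as (event name, write time) pairs and treats the permitted maximum as a parameter; both, and the helper names, are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

ALLOWED_CHECK_INTERVALS = (15, 30, 60)  # minutes, per the accepted values above


def validate_check_interval(minutes: int) -> int:
    if minutes not in ALLOWED_CHECK_INTERVALS:
        raise ValueError(f"check interval must be one of {ALLOWED_CHECK_INTERVALS}, got {minutes}")
    return minutes


def events_over_limit(facts: Iterable[Tuple[str, datetime]], now: datetime,
                      max_facts_per_hour: int) -> Set[str]:
    """Events whose fact count in the trailing hour exceeds the limit."""
    window_start = now - timedelta(hours=1)
    counts = Counter(event for event, written_at in facts if window_start < written_at <= now)
    return {event for event, n in counts.items() if n > max_facts_per_hour}
```

A shorter interval detects an over-limit event sooner, at the cost of running the count more often.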
Index Maintenance Disabled
Index Maintenance - Check if usage stats for > (days) 7
Index Maintenance - Check tables older than (days) 180
Index Maintenance - Day of Week Saturday
Index Maintenance - Drop if < than (uses) 1
Index Maintenance - Hour of Day 3:00
PreAggregation - Canister Polling Interval (seconds) Frequency, in seconds, of polling canisters to check for new data. 60
PreAggregation - Staging Table Threshold (rows) The number of rows that are required for triggering staging table creation and writing the pre-aggregated data to the staging table.
Note: Data is also written to the staging table if minimum required rows are not accumulated but the amount of time that is configured in PreAggregation - Staging Table Write Max Wait Time has expired.
100000
PreAggregation - Staging Table Write Max Wait Time (minutes) The maximum amount of time, in minutes, that the pre-aggregator waits before writing the pre-aggregated data to the staging table.
Note: If the value set in PreAggregation - Staging Table Threshold (rows) has not been reached during this time period, the pre-aggregated data accumulated during the period is written to the staging table.
5
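Taken together, the two PreAggregation settings define a single write condition. The predicate below is a hypothetical sketch of that condition; skipping the write when nothing is pending is an assumption the text does not state.

```python
def should_write_staging_table(pending_rows: int, minutes_since_last_write: float,
                               threshold_rows: int = 100000,
                               max_wait_minutes: float = 5) -> bool:
    """True when pre-aggregated rows should be written to the staging table."""
    if pending_rows >= threshold_rows:
        return True
    # Assumption: the wait-time trigger only matters if there is data to write.
    return pending_rows > 0 and minutes_since_last_write >= max_wait_minutes
```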
Refresh Frequency (minutes) - Event Definitions The frequency, in minutes, of refreshing event definitions from the database. 1
Refresh Frequency (minutes) - Control Settings The frequency, in minutes, of refreshing control settings from the database. 1
Send Report Schedules When Enabled, scheduled reports are delivered according to their configured settings. Enabled
Service Monitor Enabled
Service Monitor - Alert when a dimension count of new values in past hour is > [count] 1000
Service Monitor - Alert when a fact table grows by > [count] rows in an hour 50000
Service Monitor - Alert when fact data > [hours] old is being aggregated 2
Service Monitor - Alert when logged errors is > [count] 2
Service Monitor - Alert when logged warnings is > [count] 2
Service Monitor - Check errors/warnings from current time through [minutes] back 60
Service Monitor - Check frequency [minutes] 30
Service Monitor - De-activate a fact when a table grows by > [count] rows in an hour 75000
Service Monitor - Display messages with alert(s) in portal to Admin level users Enabled
Service Monitor - Display messages with alert(s) in portal to all users Disabled
Service Monitor - Email address for Data Collector Service Monitor alerts
Service Monitor - Email top [count] of log entries per DC component 10
Service Monitor - Errors/Warnings must span at least [minutes] to trigger alert 10
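The Service Monitor thresholds above can be read as simple rules. The sketch below covers two of them: fact table growth (alert above 50000 rows per hour, deactivate above 75000) and logged errors (more than 2 in the last 60 minutes, spanning at least 10 minutes). Interpreting "span" as the time between the first and last error in the window is an assumption, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def fact_growth_action(rows_added_last_hour: int, alert_threshold: int = 50000,
                       deactivate_threshold: int = 75000) -> str:
    """Classify hourly fact table growth against the two thresholds."""
    if rows_added_last_hour > deactivate_threshold:
        return "deactivate"
    if rows_added_last_hour > alert_threshold:
        return "alert"
    return "ok"


def error_alert(error_times: List[datetime], now: datetime, lookback_minutes: int = 60,
                max_count: int = 2, min_span_minutes: int = 10) -> bool:
    """True when enough errors occur in the lookback window over a long enough span."""
    window_start = now - timedelta(minutes=lookback_minutes)
    recent = sorted(t for t in error_times if window_start <= t <= now)
    if len(recent) <= max_count:
        return False
    # Assumption: "span" is the time between the first and last error in the window.
    return recent[-1] - recent[0] >= timedelta(minutes=min_span_minutes)
```

The span requirement keeps a single burst of errors, such as one failed batch, from triggering an alert on its own.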
Anomaly Detection When Enabled, Anomaly Detections that have been configured and enabled are calculated. Hourly Anomaly Detections are calculated once per hour, and daily Anomaly Detections are calculated once per day. Enabled
Anomaly Detection - Auto-calculate Daily Anomaly Detection When Enabled, this setting forces the creation and calculation of daily Anomaly Detections for all current and newly created events and event + dimension combinations.
Changing this setting changes the status of all Anomaly Detections in the system. You can still manually enable or disable individual Anomaly Detections.
Note: Enabling this setting can have a significant impact on data storage and performance. See "Data Management for Anomaly Detection" in the Unica Discover Event Manager Manual.
Disabled
Anomaly Detection - Auto-calculate Hourly Anomaly Detection When Enabled, this setting forces the creation and calculation of hourly Anomaly Detections for all current and newly created events and event + dimension combinations.
Changing this setting changes the status of all Anomaly Detections in the system. You can still manually enable or disable individual Anomaly Detections.
Note: Enabling this setting can have a significant impact on data storage and performance. See "Data Management for Anomaly Detection" in the Unica Discover Event Manager Manual.
Disabled
Anomaly Detection - Maximum data points for calculations The maximum number of data points used to calculate average and deviation values for event and dimension Anomaly Detections. If more data points are available, no more than this number are used in the calculation of averages and standard deviations. 16
Anomaly Detection - Minimum data points for calculations The minimum number of data points required to calculate average and deviation values for event and dimension Anomaly Detections. If there are too few data points, the deviation is not calculated for the period.

Discover recommends that you do not set this value below the default value (4). The minimum accepted value is 2.

4
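The minimum and maximum settings bound the sample used for the average and standard deviation. The sketch assumes the most recent points come last in the sequence and that a sample standard deviation is used; neither is stated above, and `baseline_stats` is a hypothetical helper. It also shows why 2 is the lowest sensible minimum: a standard deviation needs at least two values.

```python
import statistics
from typing import Optional, Sequence, Tuple


def baseline_stats(points: Sequence[float], min_points: int = 4,
                   max_points: int = 16) -> Optional[Tuple[float, float]]:
    """Mean and sample standard deviation, or None if there are too few points."""
    if min_points < 2:
        raise ValueError("min_points must be at least 2")
    # Assumption: the most recent points are at the end; keep at most max_points.
    recent = list(points)[-max_points:]
    if len(recent) < min_points:
        return None  # deviation is not calculated for this period
    return statistics.mean(recent), statistics.stdev(recent)
```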
Anomaly Detection - Number of Threads used for calculations The number of threads that the Data Collector uses when performing Anomaly Detection hourly and daily calculations.
Note: You can raise this setting to attempt to improve performance of Anomaly Detection calculations. However, depending on the system load at the time of calculation, raising this setting can negatively impact system performance. See "Data Management for Anomaly Detection" in the Unica Discover Event Manager Manual.
4
Anomaly Detection - Time for Daily Calculation When Anomaly Detection is enabled, this value specifies the time when deviation calculations are performed for daily Anomaly Detection. Time is based on the Discover system time zone. It should be configured for an off-peak hour. 4:30
Anomaly Detection - Calculation Mode This setting configures how Anomaly Detections are computed.
  • Consecutive Days - Anomaly Detections are calculated for consecutive days. For example, the data reported for a single Anomaly Detection might contain entries for each day of last week and this week.
  • Same Days - Anomaly Detection deviations are calculated from the same day of the week in previous weeks. For example, deviation values for Wednesday are computed using the preceding Wednesdays.
  • See "Manage Events - Anomaly Detection Tab" in the Unica Discover Event Manager Manual.
Consecutive Days
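The difference between the two modes is which earlier dates feed the baseline. The sketch below lists those dates for a target day; the `baseline_dates` helper and the `count` parameter (how many earlier points to use) are hypothetical illustrations.

```python
from datetime import date, timedelta
from typing import List


def baseline_dates(target: date, mode: str, count: int) -> List[date]:
    """Earlier dates used to compute deviations for target, newest first."""
    if mode == "Consecutive Days":
        step = 1  # each preceding day
    elif mode == "Same Days":
        step = 7  # the same weekday in each preceding week
    else:
        raise ValueError(f"unknown calculation mode: {mode}")
    return [target - timedelta(days=step * i) for i in range(1, count + 1)]
```

For a Wednesday target, Same Days returns only earlier Wednesdays, so weekly traffic patterns do not inflate the deviation; Consecutive Days mixes weekdays and weekends.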