Configuring the ETL process

After you install the Unica Interact ETL process, you need to configure the process by modifying files on the ETL process server, and in the Unica Platform configuration pages.

About this task

To configure the ETL process, there are files in the Interact home directory on the ETL process server to indicate where the necessary Java runtime files are found, as well as other environment variables. Then, you need to connect to the Platform server associated with this installation and use the configuration pages there to set up the necessary properties to run the ETL process.

Procedure

  1. On the server on which you installed the ETL process, open the following file in any text editor:
    <Unica Interact_home>\PatternStateETL\bin\setenv.bat on Microsoft Windows, or <Unica Interact_home>\PatternStateETL\bin\setenv.sh on a UNIX-like operating system.
    1. Complete the line that reads set JAVA_HOME=[CHANGE ME], changing [CHANGE ME] to the actual path to the 64-bit Java runtime you want to use.
      Note: Although the Unica installer provides a Java runtime in <Unica Interact_home>\..\jre, such as C:\HCL\UNICA\jre, this is a 32-bit Java runtime used only for installation. This runtime is not suitable for running the ETL process. Install a supported 64-bit Java runtime if one is not already installed, and update the setenv file to use that runtime.
    2. Complete the line that reads set JDBCDRIVER_CP= with the actual location of the JDBC driver for the connection to the database that contains the system tables.
      For example, if you were connecting to an Oracle database, you might specify the path to a local copy of ojdbc8.jar, in case of MariaDB, you may specify the path to local copy of mariadb-java-client-2.3.0.jar.
  2. In a supported web browser, connect to the Unica Platform server associated with this installation and log in using administer-level credentials.
  3. Open the Configuration page by clicking Settings > Configuration in the toolbar.

    The Configuration page shows the Configuration Categories tree.

  4. Navigate to interact | ETL in the Configuration Categories tree.
  5. Click PatternStateETLConfig Template under patternStateETLin the tree to create a new Pattern State ETL configuration.

    In the right pane, complete the following information:

    • New category name. Provide a name that uniquely identifies this configuration. Note that you must provide this exact name when you run the ETL process. For convenience in specifying this name on the command line, you might want to avoid a name that contains spaces or punctuation, such as ETLProfile1.
    • runOnceADay. Determines whether the ETL process in this configuration should run once each day. Valid answers are Yes or No. If you answer No here, the processSleepIntervalInMinutes determines the run schedule for the process.
    • preferredStartTime. The preferred time at which the ETL process should start. Specify the time in the format HH:MM:SS AM/PM, as in 01:00:00 AM.
    • preferredEndTime. The preferred time at which the ETL process should stop. Specify the time in the format HH:MM:SS AM/PM, as in 08:00:00 AM.
    • processSleepIntervalInMinutes. If you have not configured the ETL process to run once a day (as specified in the runOnceADay property), this property specifies the interval between ETL process runs. For example, if you specify 15 here, the ETL process will wait for 15 minutes after it stops running before starting the process again.
    • maxJDBCInsertBatchSize. The maximum number of records of a JDBC batch before committing the query. By default, this is set to 5000. Note that this is not the maximum number of records that the ETL processes in one iteration. During each iteration, the ETL processes all available records from the UACI_EVENTPATTERNSTATE table. However, all those records are broken into maxJDBCInsertSize chunks.
    • maxJDBCFetchBatchSize. The maximum number of records of a JDBC batch to fetch from the staging database.

      You might need to increase this value to tune the performance of the ETL.

    • communicationPort. The network port on which the ETL process listens for a stop request. Under normal circumstances, there should be no reason to change the network port from the default value.
    • queueLength. A value used for performance tuning. Collections of pattern state data are fetched and transformed into objects that are added to a queue to be processed and written to the database. This property controls the size of the queue.
    • completionNotificationScript. Specifies the absolute path to a script to run when the ETL process is completed. If you specify a script, three arguments are passed to the completion notification script: start time, end time, and total number of event pattern records processed. The start time and end time are numeric values representing number of milliseconds elapsed since 1970.
    • requireAuthentication. Specifies whether a password is required to run the Pattern State ETL command. The platform user who is running the Pattern State ETL command must have administrative privileges.
  6. When you finish completing the configuration, click Save.
    When you save the configuration, three additional categories are automatically created in the tree underneath the new configuration: Report, RuntimeDS, and TargetDS. Use the Report category to configure the report aggregation ETL integration. Use the RunteimDS and TargetDS categories to specify the data source where the ETL process should retrieve the data that it will process (the database that contains the Interact runtime tables), and the data source where the results are stored.
  7. Configure the Interact | ETL | patternStateETL | <patternStateETLName> | Report categories for the report aggregation ETL integration configuration.
    In the right pane, click Edit Settings and complete the following fields:
    • enable . Enable or disable the report integration with ETL. This property is set to disable by default.
    • retryAttemptsIfAggregationRunning. The number of times the ETL attempts to check whether the report aggregation is completed if the lock flag is set. This property is set to 3 by default.
    • sleepBeforeRetryDurationInMinutes. Sleep time in minutes between consecutive attempts. This property is set to 5 minutes by default.
    • aggregationRunningCheckSql. Use this property to define a custom SQL, which can be run to see whether the report aggregation lock flag is set. By default this property is empty.
      When this property is not set, the ETL runs the following SQL to get the lock flag.
      select count(1) AS ACTIVERUNS from uari_pattern_lock where islock='Y'
      => If ACTIVERUNS is > 0, lock is set
    • aggregationRunningCheck. Enable or disable the check if the report aggregation is running before the ETL run is performed. This property is set to enable by default.

    Save the changes when you are done.

  8. Configure the Interact | ETL | patternStateETL | <patternStateETLName> | RuntimeDS and Interact | ETL | patternStateETL | <patternStateETLName> | TargetDS categories for the ETL configuration.

    The two categories determine the data sources for retrieving and storing the event pattern data used by the ETL process.

    Note: The data source that you specify for the TargetDS configuration might be the same data source in which the Unica Interact runtime tables are stored, or it might be a different data source for performance reasons.
    1. Click the category (RuntimeDS or TargetDS) that you want to configure.
    2. In the right pane, click Edit Settings and complete the following fields:
      • type. A list of the supported database types for the data source you are defining.
      • dsname. The JNDI name of the data source. This name must also be used in the user's data source configuration to ensure that the user has access to the target and runtime data sources.
      • driver. The name of the JDBC driver to use, such as any of the following:

        Oracle: oracle.jdbc.OracleDriver

        Microsoft SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver

        IBM DB2: com.ibm.db2.jcc.DB2Driver

        MariaDB: org.mariadb.jdbc.MySQLDataSource

      • serverUrl. The data source URL, such as any of the following:

        Oracle: jdbc:oracle:thin:@ <your_db_host>:<your_db_port>:<your_db_service_name>

        Microsoft SQL Server: jdbc:sqlserver:// <your_db_host>:<your_db_port> ;databaseName= <your_db_name>

        IBM DB2: jdbc:db2:// <your_db_host>:<your_db_port>/<your_db_name>

        MariaDB: jdbc:mariadb://<your_db_host>:<your_db_port>/<your_db_name>

      • connectionpoolSize. A value that indicates the size of the connection pool, provided for performance tuning. Pattern state data is read and transformed concurrently depending upon the available database connections. Increasing the connection pool size allows for more concurrent database connections, subject to limitations of memory and database read/write capabilities. For example, if this value is set to 4, four jobs run concurrently. If you have a large amount of data, you might need to increase this value to a number such as 10 or 20, as long as sufficient memory and database performance is available.
      • schema. The name of the database schema to which this configuration is connecting.
      • connectionRetryPeriod. The ConnectionRetryPeriod property specifies the amount of time in seconds Unica Interact automatically retries the database connection request on failure. Unica Interact automatically tries to reconnect to the database for this length of time before reporting a database error or failure. If the value is set to 0, Unica Interact retries indefinitely; if the value is set to -1, no retry is attempted.
      • connectionRetryDelay. The ConnectionRetryDelay property specifies the amount of time in seconds Unica Interact waits before it tries to reconnect to the database after a failure. If the value is set to -1, no retry is attempted.
    Save the changes when you are done specifying both the runtime and target data sources.
  9. Still in Unica Platform server, click Settings > Users in the toolbar.
  10. Edit the user that is running the ETL process and click Edit Data Sources.
  11. Define data sources for the user to match the TargetDS and RuntimeDS categories you defined for the ETL category. The Data Source name that you specify for the user data source must match the value of the dsname property for TargetDS or RuntimeDS configuration.
    The event pattern state ETL read the user name and password you specify here to connect to the database during processing.

Alternate procedure to configure the ETL process

A property file can be used to configure Pattern State ETL.
  1. Copy all the configuration settings from Platform into a property file. A sample of such file is included in {{$INTERACT_HOME/PatternStateETL/conf}}.
  2. In the command that starts pattern state ETL, remove both username and password and add the following JVM parameter{{-Dcom.ibm.interact.evpatetl.conf=<absolute path of the property file>}}.
  3. Start the ETL process.

Results

You have now configured the Unica Platform for use with the event pattern ETL process. Be aware that any changes you make to the ETL configuration other than the communication port are automatically implemented in the next run of the ETL process. There is no need to restart the ETL process after you change the configuration, unless you specify a new communication port.

What to do next

With the installation and configuration of the event pattern ETL process that is completed, you are now ready to run the process.