Exporting data from Unica Campaign to an Impala-based Hadoop system

You can send data from Unica Campaign to your Impala-based Hadoop big data system.

About this task

To send data from Unica Campaign to your Impala-based Hadoop big data system, create a flowchart that pulls user data from one or more data sources, such as DB2® and Oracle databases. Configure the Snapshot process in a flowchart to export the data to your big data instance. When you run the flowchart, the snapshot data is exported to the Impala database.

The Unica Campaign configuration settings for the Impala dataSource determine how the data is transferred from Unica Campaign to Impala.

Procedure

  1. An administrator must configure the Impala data source (in Campaign | Partitions | Partition[n] | dataSources) to specify the required SCP and SSH commands:
    • The LoaderPreLoadDataFileCopyCmd value uses SCP to copy data from Unica Campaign to the location specified by the DataFileStagingFolder configuration property on your Impala-based Hadoop system. This location must be an HDFS location on the Impala server. This value can either specify the SCP command directly or call a script that specifies the SCP command. See the example below.
    • The LoaderPostLoadDataFileRemoveCmd value must specify the SSH "rm" command to remove the temporary file after it is loaded into Impala.
    To support this functionality, SSH must be configured on the Unica Campaign listener server. For instructions, see the Unica Campaign Installation Guide.
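
    The SSH configuration mentioned above can be sketched as follows. This is a minimal, hypothetical example of enabling passwordless (key-based) SSH from the Campaign listener machine to the Hadoop host, assuming the user "cloudera" and host "example.company.com" used elsewhere in this topic; consult the Unica Campaign Installation Guide for the supported procedure.

```shell
# One-time setup on the Unica Campaign listener machine (hypothetical sketch).
# Generate a key pair with no passphrase, if one does not already exist:
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Install the public key on the Hadoop host so scp/ssh run without a prompt:
ssh-copy-id cloudera@example.company.com
# Verify that a non-interactive login works before configuring the data source:
ssh -o BatchMode=yes cloudera@example.company.com "echo ok"
```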
  2. Configure the Snapshot process in a flowchart to obtain input data from one or more data sources and export the data to your Impala database. Design the flowchart as you normally would, including any desired processes such as Select and Merge.
  3. Run the flowchart.

    The entire dataset is exported to a temporary data file at <Campaign_Home>/partitions/partition[n]/tmp. The temporary file is copied to the Impala server using LoaderPreLoadDataFileCopyCmd and the data is loaded into an Impala table. The temporary file is then removed from the Impala server using LoaderPostLoadDataFileRemoveCmd.

Example

Example: Configuring export to Cloudera using a script: Using a script can be useful to avoid file permission issues. If there are any issues related to file permissions, the LOAD command cannot access the data file and the command fails. To avoid this type of issue, you can write your own shell or command-line script to SCP the data file to the Impala-based system and update the file permissions of the data file. The following example shows Unica Campaign configured to use a script for export to Cloudera. LoaderPreLoadDataFileCopyCmd calls a script that uses SCP to copy the data file from the local machine running Unica Campaign to an HDFS directory on the remote Cloudera machine. LoaderPostLoadDataFileRemoveCmd removes the file.

Campaign | Partitions | Partition[n] | dataSources | Impala_Cloudera | LoaderPreLoadDataFileCopyCmd = /opt/HCL/CampaignBD/Campaign/bin/copyToHadoop.sh <DATAFILE>

Campaign | Partitions | Partition[n] | dataSources | Impala_Cloudera | LoaderPostLoadDataFileRemoveCmd = ssh cloudera@example.company.com "rm /tmp/<DATAFILE>"

Here is the script that is called by LoaderPreLoadDataFileCopyCmd (copyToHadoop.sh):

#!/bin/sh
scp $1 cloudera@example.company.com:/tmp
ssh cloudera@example.company.com "chmod 0666 /tmp/`basename $1`"

The script is on the Unica Campaign listener machine. The script executes the SCP command as the user "cloudera" on the destination server (example.company.com) to copy the file to the HDFS directory. The SSH command connects as the same user to make sure that the permissions are correct for the load and removal processes that will follow.
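
One detail in the script worth calling out is the backquotes around basename $1: they are command substitution, so the chmod targets the copied file's name rather than a literal string. The self-contained sketch below illustrates both operations locally (using cp in place of scp, with hypothetical paths) so you can see the effect of the basename substitution and the 0666 permissions without a remote host:

```shell
#!/bin/sh
# Local illustration of what copyToHadoop.sh does remotely: copy a data file,
# then open its permissions with chmod 0666. Uses cp instead of scp so it can
# run on any machine; all paths are hypothetical.
src=$(mktemp /tmp/datafile.XXXXXX)   # stands in for the <DATAFILE> argument
dest=/tmp/stage.$$                   # stands in for the staging directory
mkdir -p "$dest"
cp "$src" "$dest/"
# Backquote command substitution resolves to the copied file's name:
chmod 0666 "$dest/`basename $src`"
ls -l "$dest/`basename $src`"        # shows -rw-rw-rw-
```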