Exporting data from Unica Campaign to a Hive-based Hadoop system

You can send data from Unica Campaign to your Hive-based Hadoop big data system.

About this task

To send data from Unica Campaign to your Hive-based Hadoop big data system, create a flowchart that pulls user data from one or more data sources, such as DB2® and Oracle databases. Configure the Snapshot process in a flowchart to export the data to your big data instance. When you run the flowchart, the snapshot data is exported to the Hive database.

The Unica Campaign configuration settings for the Hive dataSource determine how the data is transferred from Unica Campaign to Hive.

Procedure

  1. An administrator must configure the Hive data source (in Campaign | Partitions | Partition[n] | dataSources) to specify the required SCP and SSH commands:
    • The LoaderPreLoadDataFileCopyCmd value uses SCP to copy data from Unica Campaign to a temp folder called /tmp on your Hive-based Hadoop system. The location must be called /tmp and it must be on the Hive server (the file system location, not the HDFS location). This value can either specify the SCP command or call a script that specifies the SCP command. See the two examples below.
    • The LoaderPostLoadDataFileRemoveCmd value must specify the SSH "rm" command to remove the temporary file after it is loaded into Hive.
    To support this functionality, SSH must be configured on the Unica Campaign listener server. For instructions, see the Unica Campaign Installation Guide. A minimal key-based setup sketch also follows this procedure.
  2. Configure the Snapshot process in a flowchart to obtain input data from one or more data sources and export the data to your Hive database. Design the flowchart as you normally would, including any desired processes such as Select and Merge.
  3. Run the flowchart.

    The entire dataset is exported to a temporary data file at <Campaign_Home>/partitions/partition[n]/tmp. The temporary file is copied to the Hive server using LoaderPreLoadDataFileCopyCmd and the data is loaded into a Hive table. The temporary file is then removed from the Hive server using LoaderPostLoadDataFileRemoveCmd.
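
As noted in step 1, the listener must be able to run SCP and SSH against the Hive server without being prompted for a password. The following commands are a minimal key-based setup sketch, run as the OS user that owns the Unica Campaign listener; the user name cloudera and the host example.company.com are hypothetical, and the Unica Campaign Installation Guide remains the authoritative procedure:

#!/bin/sh
# Generate a key pair if one does not already exist. An empty passphrase
# lets the listener run SCP/SSH non-interactively.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Install the public key for the Hive user on the Hive server.
ssh-copy-id cloudera@example.company.com
# Verify that the connection no longer prompts for a password.
ssh cloudera@example.company.com "echo connected"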

Example

Example 1: Configuring export to MapR: This example shows Unica Campaign configured for export to MapR, using a datasource called Hive_MapR. LoaderPreLoadDataFileCopyCmd uses SCP to copy the data file from the local machine running Unica Campaign to a temp directory on the remote machine running the Hive server (the MapR machine). LoaderPostLoadDataFileRemoveCmd uses SSH rm to remove the file.

Campaign | Partitions | Partition[n] | dataSources | Hive_MapR | LoaderPreLoadDataFileCopyCmd = scp <DATAFILE> mapr@example.company.com:/tmp

Campaign | Partitions | Partition[n] | dataSources | Hive_MapR | LoaderPostLoadDataFileRemoveCmd = ssh mapr@example.company.com "rm /tmp/<DATAFILE>"
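
Before relying on values like these, you can verify them by hand from the Unica Campaign listener machine. The following check is a sketch that uses a throwaway file in place of <DATAFILE>, with the same hypothetical user and host as above:

# Create a small test file, then run the same copy and remove steps manually.
echo "test" > /tmp/test.dat
scp /tmp/test.dat mapr@example.company.com:/tmp
ssh mapr@example.company.com "rm /tmp/test.dat"

If either command prompts for a password or fails, fix the SSH configuration before configuring the data source.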

Example 2: Configuring export to Cloudera using a script: Using a script can help you avoid file permission issues: if the permissions on the copied data file are wrong, the Hive LOAD command cannot access the file and fails. To avoid this type of issue, you can write your own shell or command-line script that uses SCP to copy the data file to Hive and then updates the file permissions of the data file. The following example shows Unica Campaign configured to use a script for export to Cloudera. LoaderPreLoadDataFileCopyCmd calls a script that uses SCP to copy the data file from the local machine running Unica Campaign to a temp directory on the remote Cloudera machine. LoaderPostLoadDataFileRemoveCmd removes the file.

Campaign | Partitions | Partition[n] | dataSources | Hive_Cloudera | LoaderPreLoadDataFileCopyCmd = /opt/HCL/CampaignBD/Campaign/bin/copyToHadoop.sh <DATAFILE>

Campaign | Partitions | Partition[n] | dataSources | Hive_Cloudera | LoaderPostLoadDataFileRemoveCmd = ssh cloudera@example.company.com "rm /tmp/<DATAFILE>"

Here is the script that is called by LoaderPreLoadDataFileCopyCmd:
copyToHadoop.sh:
#!/bin/sh
# Copy the exported data file (passed as $1) to /tmp on the Hive server.
scp $1 cloudera@example.company.com:/tmp
# Make the copied file readable and writable so that the Hive LOAD command
# and the later remove step can access it.
ssh cloudera@example.company.com "chmod 0666 /tmp/`basename $1`"

The script is on the Unica Campaign listener machine. The script executes the SCP command as the user "cloudera" on the destination server (example.company.com) to copy the file to the /tmp directory. The SSH command connects as the same user to make sure that the permissions are correct for the load and removal processes that will follow.
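
If you want a more defensive version, the script can stop on the first error and tolerate file names that contain spaces. This variant is a sketch under the same assumptions (user cloudera, host example.company.com), not part of the product:

#!/bin/sh
# Defensive variant: -e exits on the first failing command, so a failed
# copy does not silently proceed to the chmod step.
set -e
DEST=cloudera@example.company.com
scp "$1" "$DEST:/tmp"
# Quote $1 and single-quote the remote path so file names that contain
# spaces are handled correctly on both sides of the connection.
ssh "$DEST" "chmod 0666 '/tmp/$(basename "$1")'"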