Dividing contacts into sample groups

To create target and control groups, use the Sample process. The Sample process offers three sampling methods: Random creates statistically valid control groups or test sets. Every Other X allocates records to the sample groups in rotation, one record at a time. Sequential Portions allocates consecutive blocks of records to successive samples.
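Once the sample sizes are known, the three methods differ only in how they assign records. The following minimal Python sketch illustrates that difference; it is not IBM Campaign's implementation. The function names `random_sample`, `every_other_x`, and `sequential_portions` are hypothetical. Records are assumed to be an in-memory list, and the `key` and `descending` parameters stand in for the Ordered By field and sort direction. Python's `random.Random` stands in for Campaign's own random number generator, so a given seed will not reproduce Campaign's allocation.

```python
import random
from typing import Any, Callable, List, Sequence


def _cut(ordered: List[Any], sizes: Sequence[int]) -> List[List[Any]]:
    """Split a list into consecutive slices of the requested sizes."""
    samples: List[List[Any]] = []
    start = 0
    for size in sizes:
        samples.append(ordered[start:start + size])
        start += size
    return samples


def random_sample(records: Sequence[Any], sizes: Sequence[int], seed: int) -> List[List[Any]]:
    """Shuffle the records with a seeded generator, then cut consecutive slices.

    Assumption: random.Random stands in for Campaign's generator. A seed of 0
    draws fresh OS entropy on every call (mirroring the documented meaning of
    a seed of 0); any other seed gives the same split for the same input.
    """
    rng = random.Random() if seed == 0 else random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return _cut(shuffled, sizes)


def every_other_x(records: Sequence[Any], n_samples: int,
                  key: Callable[[Any], Any], descending: bool = False) -> List[List[Any]]:
    """Sort by the Ordered By key, then deal records round-robin into n samples.

    Size limits are ignored here for brevity.
    """
    ordered = sorted(records, key=key, reverse=descending)
    # Record 0 goes to sample 0, record 1 to sample 1, ..., then wrap around.
    return [ordered[i::n_samples] for i in range(n_samples)]


def sequential_portions(records: Sequence[Any], sizes: Sequence[int],
                        key: Callable[[Any], Any], descending: bool = False) -> List[List[Any]]:
    """Sort by the Ordered By key; the first sizes[0] records form sample 1, and so on."""
    ordered = sorted(records, key=key, reverse=descending)
    return _cut(ordered, sizes)
```

For example, `every_other_x([12, 95, 40, 77, 3, 61], 3, key=lambda s: s)` sorts the scores to `[3, 12, 40, 61, 77, 95]` and returns `[[3, 61], [12, 77], [40, 95]]`. Two calls to `random_sample` with the same nonzero seed return identical samples. A seed of 0 gives a different split on each call.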

Procedure

  1. Open a campaign and click a flowchart tab.
  2. Click the Edit icon (pencil) in the flowchart window.
  3. Drag the Sample process (beaker icon) from the palette to your flowchart.
  4. Connect at least one configured process (such as a Select process) as input to the Sample process box.
  5. Double-click the Sample process in the flowchart.

    The process configuration dialog appears.

  6. Use the Input list to select the cells that you want to sample. The list includes all output cells from any process connected to the Sample process. To use more than one source cell, select the Multiple Cells option. If more than one source cell is selected, the same sampling is performed on each source cell.
    Note: All selected cells must be defined at the same audience level, such as Household or Customer.
  7. Use the # of Samples/Output Cells field to specify how many samples to create for each input cell. By default, three samples are created for each input cell, with the default names Sample1, Sample2, and Sample3.
  8. To change the default sample names, double-click a sample in the Output Name column, then type a new name. You can use any combination of letters, numbers, and spaces. Do not use periods (.) or slashes (/ or \).
    Important: If you change the name of a sample, you must update all subsequent processes that use this sample as an input cell. Changing a sample name might unconfigure subsequent connected processes. In general, you should edit the names of samples before connecting subsequent processes.
  9. Use one of the following methods to define the sample size:
    • To divide records by percentage: Select Specify Size By %, then double-click the Size field to enter the percentage of records to use for each sample. Use the Max Size field if you want to limit the size of the sample. The default is Unlimited. Repeat for each sample listed in the Output Name column, or use the All Remaining check box to assign all remaining records to that sample. You can select All Remaining for only one output cell.
    • To specify the number of records for each sample: Select Specify Size By # Records, then double-click the Max Size field to specify the maximum number of records to allocate to the first sample group. Specify the Max Size for the next sample in the Output Name column, or use the All Remaining check box to assign all remaining records to that sample. You can select All Remaining for only one output cell.

      Optional: Click Sample Size Calculator, then use the calculator to determine the optimal sample size. Copy the value from the Min. Sample Size field in the calculator, click Done to close the calculator, then paste the value into the Max Size field for Specify Size By # Records.

  10. Ensure that each sample in the Output Name list has a Size defined or has All Remaining checked.
  11. In the Sampling Method section, specify how to build the samples:
    • Random Sample: Use this option to create statistically valid control groups or test sets. This option randomly assigns records to sample groups using a random number generator based on the specified seed. Seeds are explained later in these steps.
    • Every Other X: This option puts the first record into the first sample, the second record into the second sample, and so on, up to the number of samples specified. This process repeats until all records are allocated to a sample group. To use this option, you must specify the Ordered By options to determine how records are sorted into groups. The Ordered By options are explained later in these steps.
    • Sequential Portions: This option allocates the first N records into the first sample, the next set of records into the second sample, and so on. This option is useful for creating groups such as the top decile (or another portion) of records sorted by a field (for example, cumulative purchases or model scores). To use this option, you must specify the Ordered By options to determine how records are sorted into groups. The Ordered By options are explained later in these steps.
  12. If you selected Random Sample, in most cases you can accept the default seed. The random seed represents the starting point that IBM Campaign uses to select IDs randomly.

    To generate a new seed value, click Pick or enter a value in the Seed field. Examples of when you might need to use a new seed value are:

    • You have exactly the same number of records in the same sequence, so reusing the same seed value allocates the records to the same samples every time.
    • The random sample produces undesired results (for example, all males are being allocated to one group and all females to another).
    Note: The same random set of records will be used for each subsequent run of the Sample process (unless the input to the process changes). This is important if you intend to use the results for modeling purposes, because different modeling algorithms must be compared across the same set of records to determine each model's effectiveness. If you do not intend to use the results for modeling, you can make the Sample process select a different random set of records each time it runs. To do this, use a Random Seed of zero (0). A value of 0 ensures that a different random set of records will be selected each time the process runs.
  13. If you selected Every Other X or Sequential Portions, you must specify a sort order to determine how records will be allocated to sample groups:
    1. Select an Ordered By field from the drop-down list or use a derived field by clicking Derived Fields.
    2. Select Ascending to sort numeric fields in increasing order (low to high) and sort alphabetic fields in alphabetical order. If you choose Descending, the sort order is reversed.
  14. Click the General tab if you want to modify the default Process Name and Output Cell Names. By default, output cell names consist of the process name followed by the sample name and a digit. You can accept the default Cell Codes or clear the Auto Generate Cell Code check box and assign codes manually. Enter a Note to clearly describe the purpose of the Sample process.
  15. Click OK.
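The following small Python sketch illustrates the sizing rules in step 9. It is a hypothetical model of the dialog, not IBM Campaign's code. `SampleRow`, `resolve_sizes`, and `by_percent` are illustrative names. The sketch also relies on two assumptions that this procedure does not state: percentages are rounded down to whole records, and samples are filled in Output Name order, so a later sample receives only the records that are still unallocated.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SampleRow:
    """One row of the Output Name list (a hypothetical model of the dialog)."""
    name: str
    size_pct: Optional[float] = None  # Size column, used with Specify Size By %
    max_size: Optional[int] = None    # Max Size column; None stands for Unlimited
    all_remaining: bool = False       # All Remaining check box


def resolve_sizes(total: int, rows: Sequence[SampleRow], by_percent: bool) -> List[int]:
    """Return the number of records each sample receives, in row order."""
    if sum(r.all_remaining for r in rows) > 1:
        raise ValueError("All Remaining can be selected for only one output cell")
    sizes = [0] * len(rows)
    left = total
    for i, row in enumerate(rows):
        if row.all_remaining:
            continue  # filled in after every other sample is sized
        if by_percent:
            if row.size_pct is None:
                raise ValueError(f"{row.name}: define a Size or check All Remaining")
            # Assumption: a percentage is rounded down to whole records.
            wanted = math.floor(total * row.size_pct / 100)
            if row.max_size is not None:
                wanted = min(wanted, row.max_size)
        else:
            if row.max_size is None:
                raise ValueError(f"{row.name}: define a Max Size or check All Remaining")
            wanted = row.max_size
        # Assumption: samples are filled in order from the unallocated records.
        sizes[i] = min(wanted, left)
        left -= sizes[i]
    for i, row in enumerate(rows):
        if row.all_remaining:
            sizes[i] = left
    return sizes
```

Consider an input of 1,000 records. A 10% sample, a 25% sample capped at a Max Size of 200, and an All Remaining sample resolve to `[100, 200, 700]`. In # Records mode, three samples with a Max Size of 400 each resolve to `[400, 400, 200]` under the fill-in-order assumption.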

Results

The process is configured and enabled in the flowchart. You can test run the process to verify that it returns the results you expect.
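When you test run the process, confirm three properties. Every sample should draw only on the input cell. No record should land in more than one sample. Each sample should have the size you configured. The following minimal Python sketch checks those properties. It assumes that the audience IDs of the input cell and of each output cell are available as Python collections. `check_samples` is a hypothetical helper, not part of IBM Campaign.

```python
from typing import Dict, Hashable, Iterable, List, Mapping, Sequence


def check_samples(input_ids: Iterable[Hashable],
                  samples: Mapping[str, Sequence[Hashable]],
                  expected_sizes: Mapping[str, int]) -> List[str]:
    """Return the problems found in a test run's output cells (empty if none)."""
    pool = set(input_ids)
    owner: Dict[Hashable, str] = {}  # the sample in which each ID was first seen
    problems: List[str] = []
    for name, ids in samples.items():
        expected = expected_sizes.get(name)
        if expected is not None and len(ids) != expected:
            problems.append(f"{name}: size {len(ids)}, expected {expected}")
        for rid in ids:
            if rid not in pool:
                problems.append(f"{name}: ID {rid!r} is not in the input cell")
            elif rid in owner:
                problems.append(f"{name}: ID {rid!r} is also in {owner[rid]}")
            else:
                owner[rid] = name
    return problems
```

For example, suppose Sample3 reuses ID 2 from Sample1. Then `check_samples(range(1, 7), {"Sample1": [1, 2], "Sample2": [3, 4], "Sample3": [5, 2]}, {"Sample1": 2, "Sample2": 2, "Sample3": 2})` returns `['Sample3: ID 2 is also in Sample1']`.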