Sample-related provisioning problems

To handle large volumes of data without sacrificing the quality of the results, and still return those results in an acceptable amount of time, Unica Optimize places certain requirements on the makeup of the proposed contacts in a session.

One of the strategies Unica Optimize uses is to break the proposed contact data into random subsets of approximately equal numbers of customers; it then optimizes the proposed contacts of each of these samples independently. If multiple threads are configured and supported by your hardware, these customer samples are processed concurrently.

A side effect of the customer sample approach is a class of problems that can result in errors or suboptimal results. The number of customer samples that are used for a session run is determined by dividing the number of customers in the Proposed Contacts Table (PCT) by the value of the configuration parameter Optimize|AlgorithmTuning|CustomerSampleSize. It is important that there are enough proposed contacts matching each capacity rule to allow each random customer sample to be statistically similar relative to every feature used by a capacity rule.
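
The sample count described above can be sketched in a few lines of Python. This is only an illustration: the function name customer_sample_count is hypothetical, and rounding up when the division is not exact is an assumption, because the text says only that the customer count is divided by CustomerSampleSize.

```python
def customer_sample_count(num_customers: int, customer_sample_size: int) -> int:
    """Estimate how many customer samples a session run uses.

    Assumption: a partial sample counts as one more sample (round up).
    The product's actual rounding behavior is not stated in this topic.
    """
    if num_customers < 0 or customer_sample_size <= 0:
        raise ValueError("counts must be non-negative and sample size positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-num_customers // customer_sample_size)


# 1 million customers with a sample size of 1000 gives 1000 samples.
print(customer_sample_count(1_000_000, 1000))
```

A feature that appears in fewer proposed contacts than this sample count cannot be present in every sample.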

For example, say that we have 1 million customers, and have a configured customer sample size of 1000. This configuration implies that we have 1000 customer samples. Imagine that we have a capacity rule that is set up as: minimum 1 email, maximum 5000 emails. What Unica Optimize does in this example is to take the rule constraints and modify them to spread that rule across the customer samples. In this example, the maximum 5000 emails constraint is divided by the number of samples, so that each sample is processed with a maximum 5 emails constraint. But what do we do with the minimum 1 email constraint? We cannot have each sample requiring a minimum 1/1000 of an email!

Instead, we randomly pick one sample to process with a minimum 1 email constraint, while the other 999 samples are processed with no minimum email constraint. This process works fine, unless there are not enough proposed contacts using email to make sure that all 1000 samples get at least one email. If your proposed contacts contain only 500 contacts using email, any particular sample, including the one that was picked to carry the minimum, has a smaller than 50% chance of containing an email. That means you have a greater than 50% chance that the session exits with an error because the minimum cannot be satisfied, even though the proposed contacts contained 500 times that minimum.
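
The example above can be reproduced with a short Python sketch. It is not the product's implementation. The function names are hypothetical, the floor division of the maximum and the handling of minimums larger than 1 are assumptions, and the probability estimate assumes that each email contact belongs to a different customer who lands in any sample with equal, independent probability.

```python
import random


def split_capacity_rule(minimum, maximum, num_samples, rng=None):
    """Spread one capacity rule across samples as (min, max) pairs.

    Assumptions: the maximum is divided with floor division, and any part
    of the minimum that cannot be divided evenly is given, one unit each,
    to randomly chosen samples.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    rng = rng or random.Random()
    per_sample_max = maximum // num_samples
    base_min, leftover = divmod(minimum, num_samples)
    # Samples that carry one extra unit of the minimum.
    chosen = set(rng.sample(range(num_samples), leftover))
    return [(base_min + (1 if i in chosen else 0), per_sample_max)
            for i in range(num_samples)]


def chance_sample_has_feature(num_matching_contacts, num_samples):
    """Chance that one given sample holds at least one matching contact."""
    return 1.0 - (1.0 - 1.0 / num_samples) ** num_matching_contacts


rules = split_capacity_rule(1, 5000, 1000, random.Random(0))
print(sum(lo for lo, _ in rules))   # exactly one sample carries the minimum
print(rules[0][1])                  # each sample gets a maximum of 5

p = chance_sample_has_feature(500, 1000)
print(f"chance the chosen sample has an email: {p:.2f}")   # about 0.39
print(f"chance the minimum cannot be met: {1 - p:.2f}")    # about 0.61
```

Under these assumptions, 500 email contacts spread over 1000 samples leave the chosen sample without an email roughly 61% of the time, which is consistent with the greater than 50% failure chance described above.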

To avoid this situation, any feature that is used in a capacity rule should be well-represented relative to the number of samples. The following scenario is an example of this. You have an input cell 1 that contains 100,000 contact IDs, each with a distinct audience ID. Offer 1 is assigned to input cell 1. You also have an input cell 2 that contains one contact ID, whose audience ID is not in input cell 1. Offer 2 is assigned to input cell 2. The one capacity rule sets the minimum number of offers to 1, and the minimum number of contact IDs is set to 100,000.