Skipping duplicate IDs in process output

The Extract, Call list, Mail list, and Snapshot processes let you specify how duplicate IDs are treated in the process output. By default, duplicate IDs are allowed in the output.

About this task

Follow these steps to exclude records with duplicate IDs from the output.

Note: This feature can affect performance because the application must download all of the data before it can de-duplicate it. As a best practice, ensure that the data does not contain duplicates in the first place: use an ETL process to remove duplicates, or choose audience key columns that make the key unique.
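To judge whether the source data already meets this best practice, you can count how often each ID occurs before you build the flowchart. The following is a minimal sketch in Python, not part of the product. The function name find_duplicate_ids and the sample values are hypothetical, and it assumes that the IDs were already read into a list.

```python
from collections import Counter

def find_duplicate_ids(ids):
    """Return a dict of each ID that occurs more than once and its count."""
    counts = Counter(ids)
    return {id_value: n for id_value, n in counts.items() if n > 1}

# Hypothetical sample of audience IDs read from a source table.
sample_ids = [101, 102, 101, 103, 102, 101]
duplicates = find_duplicate_ids(sample_ids)
print(duplicates)
```

An empty result suggests that the audience key is already unique, in which case the Skip records with duplicate IDs option is unnecessary for that source.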

Procedure

  1. From the configuration window of the process, click More.

    You see the Advanced Settings window.

  2. Select Skip records with duplicate IDs, and specify the criteria to determine which record to retain if duplicate IDs are returned. For example, select MaxOf and Household_Income to export only the record with the highest household income for each ID.

    Note: This option removes duplicates only within the same input field. Your data can still contain duplicate IDs if the same ID appears in multiple fields. To remove all duplicate IDs, you must use a Merge or Segment process upstream of the Extract process to purge duplicate IDs or create mutually exclusive segments.

  3. Click OK to close the Advanced Settings window.

    Your duplicate ID settings are displayed in the configuration window.

    Note: In the Mail list or Call list process box, the Skip records with duplicate IDs option applies only to the fulfillment table that the process creates, not to the records that are written to contact history. The contact history tables can handle only unique IDs. The flowchart designer must ensure that the result set contains the correct records before it reaches the contact history tables. To ensure that the correct records are written to both the fulfillment table and contact history, use an Extract process upstream of the Mail list or Call list process box to de-duplicate the result set.
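The effect of the MaxOf and Household_Income example in the procedure can be illustrated outside the product. The following Python sketch is an assumption about the selection logic, not the application's actual implementation. The function name skip_duplicate_ids, the field name Customer_ID, and the sample rows are hypothetical. When two records tie on the ranking field, the sketch keeps the first one it sees. The product's behavior for ties is not stated here.

```python
def skip_duplicate_ids(records, id_field, rank_field):
    """Keep one record per ID: the one with the largest rank_field value."""
    best = {}
    for record in records:
        key = record[id_field]
        # Retain this record if the ID is new or it outranks the stored one.
        if key not in best or record[rank_field] > best[key][rank_field]:
            best[key] = record
    return list(best.values())

# Hypothetical input in which ID 101 appears twice.
rows = [
    {"Customer_ID": 101, "Household_Income": 52000},
    {"Customer_ID": 102, "Household_Income": 48000},
    {"Customer_ID": 101, "Household_Income": 75000},
]
deduped = skip_duplicate_ids(rows, "Customer_ID", "Household_Income")
for row in deduped:
    print(row)
```

In this sample, the output contains one record for ID 101, the one with a household income of 75000, and the single record for ID 102.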