The X_PERCENTILE user-defined aggregate

Suppose you want to create a user-defined aggregate that calculates the x-percentile for a group of values. The x-percentile is the number within the group of values that separates x percent of the values below and (100-x) percent above. The median is a special case of the x-percentile. It represents the 50th-percentile:
X_PERCENTILE(y, 50)

That is, the above aggregate returns the value within a sample of y values that has an equal number of values (50 percent) above and below it in the sample.

The implementation of this UDA uses the following aggregate features:
  • Uses a setup argument to enable the end user to specify the x-percentile to obtain
  • Uses a pointer-valued state to hold the state information, and allocates and initializes the state in the INIT support function
  • Uses a COMBINE function that must do more than just call the ITER support function
  • Handles NULL values as aggregate arguments, including returning an SQL NULL value if the aggregate argument to return was NULL
The X_PERCENTILE user-defined aggregate needs the following state information:
  • The user-specified setup argument
  • The current number of values processed
  • The current list of values processed
  • The current list of whether the values processed are NULL.
Therefore, X_PERCENTILE uses a C structure named percentile_state_t to hold the aggregate state:
#define MAX_N 1000

typedef struct percentile_state
   {
   mi_integer percentile;
   mi_integer count;
   mi_integer value_array[MAX_N];
   mi_integer value_is_null[MAX_N];
   } percentile_state_t;
Important: The percentile_state_t structure stores the number of values processed in an in-memory array within the state. You can also choose to store these values elsewhere, such as in an operating-system file or in a separate location in memory. Each of these locations has advantages and disadvantages. Choose the structure that best fits your application needs.
The size of the percentile_state_t structure depends on the number of aggregate arguments stored in the value_array array; that is, values less then or equal to the MAX_N constant. On a system with four-byte mi_integer values, the size of this structure is:
8 + 4(MAX_N)

If X_PERCENTILE used an opaque-type state, this structure must be less than the maximum opaque-type size. For systems that have a 32 KB maximum opaque-type size, the X_PERCENTILE aggregate can use an opaque-type state as long as it is called in a query that finds 8190 or fewer rows. If the query finds more than 8190 rows, the state would not fit into an opaque type. To avoid this restriction, X_PERCENTILE implements the aggregate state as a pointer-valued state.

The following code shows the INIT aggregate support function that handles an INTEGER argument for the X_PERCENTILE aggregate:
/* X_PERCENTILE INIT support function on INTEGER */
mi_pointer init_xprcnt(dummy, prcntile, fparam)
   mi_integer dummy;
   mi_integer prcntile;
   MI_FPARAM *fparam;
{
   percentile_state_t *state;

   /* Allocate memory for the state from the PER_COMMAND
    * pool
    */
   state = (percentile_state_t *)
      mi_dalloc(sizeof(percentile_state_t), PER_COMMAND);

   /* Initialize the aggregate state */
   if ( mi_fp_argisnull(fparam, 1) )
      state->percentile = 50; /* median */
   else
      state->percentile = prcntile;
   state->count = 0;

   return ((mi_pointer)state);
}
This INIT support function performs the following tasks:
  • Handles a setup argument

    This setup argument is the value that the end user specifies so that the aggregate can determine the value that has x percent values below and (100-x) percent above. If the end user provides an SQL NULL for the setup argument, X_PERCENTILE assumes a value of 50 and therefore calculates the median.

  • Allocates PER_COMMAND memory for the pointer-valued state

    The database server does not perform state management for pointer-valued states. Therefore, the INIT function must allocate the memory for the state. It also assigns the appropriate values to the percentile_state_t structure to initialize the state.

The following code implements the ITER aggregate support function that handles an INTEGER argument for the X_PERCENTILE aggregate:
/* X_PERCENTILE ITER support function on INTEGER */
mi_pointer iter_xprcnt(state_ptr, value, fparam)
   mi_pointer state_ptr;
   mi_integer value;
   MI_FPARAM *fparam;
{   
   mi_integer i, j;
   mi_integer is_null = 0;
   percentile_state_t *state = 
      (percentile_state_t *)state_ptr;

   /* Check for NULL-valued 'value' */
   if ( mi_fp_argisnull(fparam, 1) )
      {
      value = 0;
      is_null = 1;
      }

   /* Find position of 'value' in ordered 'value_array' */
   for ( i=0; i < state->count; i++ )
      {
      if ( state->value_array[i] > value )
         break;
      }

   /* Increment number of values (count) */
   ++state->count;

   /* Put value into ordered list of existing values */
   for (j=state->count - 1; j > i; j--)
      {
      state->value_array[j] = state->value_array[j-1];
      state->value_is_null[j] = state->value_is_null[j-1];
      }
   state->value_array[i] = value;
   state->value_is_null[i] = is_null;

return ((mi_pointer)state);
}
The ITER support function updates the aggregate state in-place with the following information:
  • Increments the number of aggregate arguments processed (count)
  • Stores the new aggregate argument in increasing sorted order in the value_array array
  • Stores the is-NULL flag that corresponds to each aggregate argument in its corresponding position in the value_is_null array

The ITER function also handles a possible NULL-valued aggregate argument. Because the X_PERCENTILE aggregate is defined to handle NULL values (see Registering the X_PERCENTILE user-defined aggregate), the database server calls ITER for NULL-valued aggregate arguments.

The following COMBINE aggregate support function handles an INTEGER argument for the X_PERCENTILE aggregate:
/* X_PERCENTILE COMBINE support function on INTEGER */
mi_pointer combine_xprcnt(state1_ptr, state2_ptr)
   mi_pointer state1_ptr, state2_ptr;
{
   mi_integer i;
   percentile_state_t *state1 = 
      (percentile_state_t *)state1_ptr;
   percentile_state_t *state2 = 
      (percentile_state_t *)state2_ptr;

   /* Merge the two ordered value arrays */
   for ( i=0; i < state2->count; i++ )
      (void) iter_xprcnt(state1_ptr, 
         state2->value_array[i]);

   /* Free the PER_COMMAND memory allocated to the state of
    * the 2nd parallel thread (state2). The memory
    * allocated to the state of the 1st parallel thread
    * (state1) holds the updated state. It is in the FINAL
    * support function.
    */
   mi_free(state2);

   return (state1_ptr);
}
This COMBINE support function merges two aggregate states, as follows:
  • Two ordered lists are merged into a single ordered list.
  • Two counts are added together.
  • Memory for one of the partial states is freed.
  • A pointer to the merged aggregate state is returned.
The following FINAL aggregate support function handles an INTEGER argument for the X_PERCENTILE aggregate:
/* X_PERCENTILE FINAL support function on INTEGER */
mi_integer final_xprcnt(state_ptr, fparam)
   mi_pointer state_ptr;
   MI_FPARAM *fparam;
{
   mi_integer index, trunc_int;
   mi_integer x_prcntile;
   percentile_state_t *state = 
      (percentile_state_t *)state_ptr;

/* Obtain index position of x-percentile value */
   trunc_int = (state->count) * (state->percentile);
   index = trunc_int/100;
   if ( (trunc_int % 100) >= 50 )
      index++;

/* Obtain x-percentile value from sorted 'value_array' */ 
   x_prcntile = state->value_array[index];

/* Check for NULL value so it can be returned as such */
   if ( state->value_is_null[index] )
      mi_fp_setreturnisnull(fparam, 0, MI_TRUE);

/* Free the PER_COMMAND memory allocated to the state */
   mi_free(state);

/* Return retrieved x-percentile value */
   return (x_prcntile);
}
This FINAL support function performs the following tasks:
  • Calculates the x-percentile for the values in the sorted array

    The FINAL function must obtain the index position for the value that represents the x-percentile, where x is the percentage that the end user has passed in as a setup argument.

  • Deallocates PER_COMMAND memory for the pointer-valued state

    The database server does not perform any state management for pointer-valued states. Therefore, the FINAL function must deallocate the PER_COMMAND state memory that the INIT function has allocated.

After you successfully compile and link the aggregate support functions, you can define the PERCENT_GTR aggregate in the database. The following code fragment shows the CREATE AGGREGATE statement that defines the X_PERCENTILE user-defined aggregate. This statement specifies the registered SQL names of the required aggregate support functions. It also includes the HANDLESNULLS modifier to indicate that the PERCENT_GTR aggregate does process NULL values as aggregate arguments. By default, the database server does not pass a NULL value to an aggregate.
Figure 1: Registering the X_PERCENTILE user-defined aggregate
CREATE AGGREGATE x_percentile
   WITH (INIT = init_x_prcntile,
        ITER = iter_x_prcntile,
        COMBINE = combine_x_prcntile,
        FINAL = final_x_prcntile,
        HANDLESNULLS);

Suppose that the INIT, ITER, COMBINE, and FINAL aggregate support functions for the X_PERCENTILE aggregate are compiled and linked into a shared-object module named percent.

On UNIX™ or Linux™, the executable code for the X_PERCENTILE aggregate support functions would be in a shared library named percent.so.

The following CREATE FUNCTION statements register the X_PERCENTILE aggregate support functions for an aggregate argument of the INTEGER data type:
CREATE FUNCTION init_x_prcntile(dummy INTEGER, x_percent INTEGER)
   RETURNING POINTER
   WITH (HANDLESNULLS)
   EXTERNAL NAME 
      '/usr/udrs/aggs/percent/percent.so(init_xprcnt)'
   LANGUAGE C;

CREATE FUNCTION iter_x_prcntile(agg_state POINTER, 
      one_value INTEGER)
   RETURNS POINTER
   WITH (HANDLESNULLS)
   EXTERNAL NAME 
      '/usr/udrs/aggs/percent/percent.so(iter_xprcnt)'
   LANGUAGE C;

CREATE FUNCTION combine_x_prcntile(agg_state1 POINTER, 
      agg_state2 POINTER)
   RETURNS POINTER
   WITH (HANDLESNULLS)
   EXTERNAL NAME 
   '/usr/udrs/aggs/percent/percent.so(combine_xprcnt)'
   LANGUAGE C;

CREATE FUNCTION final_x_prcntile(agg_state POINTER)
   RETURNS INTEGER
   WITH (HANDLESNULLS)
   EXTERNAL NAME 
      '/usr/udrs/aggs/percent/percent.so(final_xprcnt)'
   LANGUAGE C;

These CREATE FUNCTION statements use the SQL data type, POINTER, to indicate that the aggregate support functions accept a generic C pointer and perform their own memory management. They must all include the HANDLESNULLS routine modifier because the X_PERCENTILE aggregate handles NULL values.

The following query uses the X_PERCENTILE aggregate function on the INTEGER column, col3, to determine the quartile (the 25th percentile) for the values in col3:
SELECT X_PERCENTILE(col3, 25) FROM tab1;
For the tab1 rows that A table with a complexnum_t column has inserted, X_PERCENTILE creates the following sorted list for the col3 values:
5, 9, 13, 19, 24, 31

Because 25 percent of 6 values is 1.5, X_PERCENTILE obtains the list item that has 2 values (1.5 rounded up to the nearest integer) below it. The preceding query returns 13 as the quartile for col3.

Suppose you add the following row to the tab1 table:
INSERT INTO tab1 (7, NULL:complexnum_t, NULL);
This INSERT statement adds a NULL value to the col3 column. Because X_PERCENTILE handles NULLs, the database server calls the X_PERCENTILE aggregate on this new row as well. After this seventh row is inserted, X_PERCENTILE would generate the following sorted list for col3:
(NULL), 5, 9, 13, 19, 24, 31

Twenty-five percent of 7 values is 1.75, so X_PERCENTILE obtains the list item that has 2 (1.75 truncated to the nearest integer) values below it. Now the quartile for col3 would be 9. If X_PERCENTILE was not registered with the HANDLESNULLS modifier, however, the database server would not call X_PERCENTILE for this newest row and the quartile for col3 would have been 13 (the quartile for 6 rows, even though col3 actually has 7 rows).

If you called the X_PERCENTILE aggregate with an x-percentile that would return the first value in the list (the NULL value), the FINAL support function uses the DataBlade® API function mi_fp_setreturnisnull() to set the aggregate result to NULL. For example, suppose you execute the following query on the col3:
SELECT X_PERCENTILE(col3, 5) FROM tab1;

This query asks for the 5th percentile for the seven values in col3. Because 5 percent of 7 values is 0.35, X_PERCENTILE obtains the list item that has zero values (0.35 truncated to the nearest integer) below it. The preceding query returns NULL as the quartile for col3. The ITER function has stored NULL values as zeros in the sorted value_array. For the FINAL support function to determine when a value of zero indicates a NULL and when it indicates zero, it checks the value_is_null array. If the zero indicates a NULL value, FINAL calls the DataBlade API function mi_fp_setreturnisnull() to set the aggregate result to NULL.