Apply function

The Apply function queries one or more time series and applies a user-specified SQL expression or function to the selected time series elements.

Syntax

Apply(sql_express  lvarchar, 
     ts           TimeSeries, ...) 
returns TimeSeries;

Apply(sql_express  lvarchar, 
     multiset_ts  multiset(TimeSeries)) 
returns TimeSeries;

Apply(sql_express  lvarchar, 
     filter       lvarchar, 
     ts           TimeSeries, ...) 
returns TimeSeries;

Apply(sql_express lvarchar, 
     filter      lvarchar, 
     multiset_ts multiset(TimeSeries)) 
returns TimeSeries;

Apply(sql_express lvarchar, 
     begin_stamp datetime year to fraction(5), 
     end_stamp   datetime year to fraction(5), 
     ts          TimeSeries, ...) 
returns TimeSeries with (handlesnulls);

Apply(sql_express lvarchar, 
     begin_stamp datetime year to fraction(5), 
     end_stamp    datetime year to fraction(5), 
     multiset_ts multiset(TimeSeries)) 
returns TimeSeries with (handlesnulls);

Apply(sql_express lvarchar, 
     filter      lvarchar, 
     begin_stamp datetime year to fraction(5), 
     end_stamp   datetime year to fraction(5), 
     ts          TimeSeries, ...) 
returns TimeSeries with (handlesnulls);

Apply(sql_express lvarchar, 
     filter      lvarchar, 
     begin_stamp datetime year to fraction(5), 
     end_stamp   datetime year to fraction(5), 
     multiset_ts multiset(TimeSeries)) 
returns TimeSeries with (handlesnulls);

sql_express: The SQL expression or function to evaluate.
filter: The filter expression used to select time series elements.
begin_stamp: The begin point of the range. See Clip function for more detail about range specifications.
end_stamp: The end point of the range. See Clip function for more detail about range specifications.
ts: The first ts argument is the first series, the second ts argument is the second series, and so on. This function can take up to eight ts arguments. The order of the arguments must correspond to the desired order in the SQL expression or function. There is no limit to the number of $ parameters in the expression.
multiset_ts: A multiset of time series.

Description

This function runs a user-specified SQL expression on the given time series and produces a new time series containing the result of the expression at each qualifying element of the input time series.

You can qualify the elements from the input time series by specifying a time period to clip and by using a filter expression.

The sql_express argument is a comma-separated list of expressions to run for each selected element. There is no limit to the number of expressions you can run. The results of the expressions must match the corresponding columns of the result time series minus the first time stamp column. Do not specify the first time stamp as the first expression; the first time stamp is generated for each expression result.

The parameters to the expression can be an input element or any column of an input time series. Use the following format:

$ts_position.col_number

The ts_position is the position of a given time series on the input time series list. The col_number is the number of the column in the TimeSeries data type. Both the position number and column number are zero-based. For example, $0 means the element of the first input time series, $0.0 represents its time stamp column, and $0.1 is the column following the time stamp column. Another way to refer to a column is to use the column name directly, instead of the column number. Suppose the second time series has a column called high then you can use $1.high to refer to it. If the high column is the second column in the element, $1.high is equivalent to $1.1.

If the column of the input time series is a BSON column, you can specify a field with the following format:

$ts_position.bson_column_name.path.field

The ts_position is the position of a given time series on the input time series list. You cannot specify the column number. If the BSON top-level field name is the same as another column in the TimeSeries data type, you must include the bson_column_name. Otherwise, omit the bson_column_name. The path is the path to the field within the BSON document, with each part of the path separated by a period. The field is the field name. For example, $1.employee.id represents the second input time series and if the TimeSeries data type does not contain an employee column, employee.id is processed as a path to the value in the BSON column in the TimeSeries data type. The BSON field is automatically cast to a FLOAT data type, however, you can explicitly cast a BSON field to an INTEGER, BIGINT, or LVARCHAR data type. For example, you can cast the employee.id field to an INTEGER data type: $1.employee.id::integer. For examples, see Example: Run Apply on a BSON column.

If Apply has only one time series argument, you can refer to the column name without the time series position part; hence, $0.high is the same as $high. Notice that $0 always means the whole element of the first time series. It does not mean the first column of the time series, even if there is only one time series argument.

If you use a function as your expression, then it must take the subtype of each input time series in that order as its arguments and return a row type that corresponds to the subtype of the result time series of Apply. In most cases, it is faster to evaluate a function than to evaluate a generic expression. If performance is critical, you should implement the calculation to be performed in a function and use the function syntax. See Example: Run Apply without a filter or a clipped range for how to achieve this.

The following examples show valid expressions for Apply to apply. Assume two argument time series with the same subtype daybar(t DATETIME YEAR TO FRACTION(5), high REAL, low REAL, close REAL, vol REAL). The expression could be any of:

"$0.high + $1.high)/2, ($0.low + $1.low)/2"
"($0.1 + $1.1)/2, ($0.2 + $1.2)/2"
"$0.high, $1.high"
"avghigh"

The signature of avghigh is:

"avghigh(arg1 daybar, arg2 daybar) returns (one_real)"

The syntax for the filter argument is similar to the previous expression, except that it must evaluate to a single-column Boolean result. Only those elements that evaluate to TRUE are selected.

"$0.vol > $1.vol and $0.close > ($0.high - $0.low)/2"

Apply with the multiset_ts argument assigns parameter numbers by fetching TimeSeries values from the set and processing them in the order in which they are returned by the set management code. Since sets are unordered, parameters might not be assigned numbers predictably. Apply with the multiset_ts argument is useful only if you can guarantee that the TimeSeries values are returned in a fixed order. There are two ways to guarantee this:

Write a C function that creates the set and use the function as the multiset_ts argument to Apply. The C function can return the TimeSeries values in any order you want.
Use ORDER BY in the multiset_ts expression

Apply with the multiset_ts argument evaluates the expression once for every timepoint in the resulting union of time series values. When all the data in the clipped period has been exhausted, Apply returns the resulting series.

Apply uses the optional clip time range to restrict the data to a particular time period. If the beginning timepoint is NULL, then Apply uses the earliest valid timepoint of all the input time series. If the ending timepoint is NULL, then Apply uses the latest valid timepoint of all the input time series. When the optional clip time range is not used, it is equivalent to both the beginning and ending timepoints being NULL: Apply considers all elements.

If both the clip time range and filter expression are given, then clipping is done before filtering.

If you use a string literal or NULL for the clip time range, you should cast to DATETIME YEAR TO FRACTION(5) on at least the beginning timepoint to avoid ambiguity in function resolution.

When more than one input time series is specified, a union of all input time series is performed to produce the source of data to be filtered and evaluated by Apply. Hence, Apply acts as a union function, with extra filtering and manipulation of union results. For details on how the Union function works, see Union function.

Returns

A new time series with the results of evaluating the expression on every selected element from the source time series.

Example: Run Apply without a filter or a clipped range

The following example uses Apply without a filter argument and without a clipped range:

select Apply('$high-$low',
      datetime(2011-01-01) year to day,
      datetime(2011-01-06) year to day,
      stock_data)::TimeSeries(one_real)
   from daily_stocks
   where stock_name = 'IBM';

Example: Run Apply without a filter and with a clipped range

The following example shows Apply without a filter and with a clipped range:

select Apply(
   '($0.high+$1.high)/2, ($0.low+$1.low)/2, ($0.final+$1.final)/2,
($0.vol+$1.vol)/2',
   datetime(2011-01-04) year to day, 
   datetime(2011-01-05) year to day,
   t1.stock_data, t2.stock_data)
   ::TimeSeries(stock_bar)
from daily_stocks t1, daily_stocks t2
where t1.stock_name = 'IBM' and t2.stock_name = 'HWP';

Example: Create a function with a filter without a clip range

The following example shows Apply with a filter and without a clip range. The resulting time series contains the closing price of the days that the trading range is more than 10% of the low:

create function ts_sum(a stock_bar)
    returns one_real;
    return row(null::datetime year to fraction(5),
   (a.high + a.low + a.final + a.vol))::one_real;
end function;

select Apply('ts_sum',
   '2011-01-03 00:00:00.00000'::datetime year 
      to fraction(5),
   '2011-01-03 00:00:00.00000'::datetime year 
      to fraction(5),
   stock_data)::TimeSeries(one_real) 
   from daily_stocks
      where stock_id = 901;

Example: Use a function as an expression

The following example uses a function as the expression to evaluate to boost performance. The first step is to compile the following C function into applyfunc.so:

/* begin applyfunc.c */
#include "mi.h"
MI_ROW *
high_low_diff(MI_ROW *row, MI_FPARAM *fp)
{
    MI_ROW_DESC            *rowdesc;
    MI_ROW            *result;
    void            *values[2];
    mi_boolean            nulls[2];
    mi_real            *high, *low;
    mi_real            r;
    mi_integer            len;
    MI_CONNECTION            *conn;
    mi_integer               rc;
    
    nulls[0] = MI_TRUE;
    nulls[1] = MI_FALSE;
    conn = mi_open(NULL,NULL,NULL);
    if ((rc = mi_value(row, 1, (MI_DATUM *) &high, 
      &len)) == MI_ERROR)
   mi_db_error_raise(conn, MI_EXCEPTION,
      "ts_test_float_sql: corrupted argument row");
    if (rc == MI_NULL_VALUE)
   goto retisnull;
    
    if ((rc = mi_value(row, 2, (MI_DATUM *) &low, 
      &len)) == MI_ERROR)
   mi_db_error_raise(conn, MI_EXCEPTION,
      "ts_test_float_sql: corrupted argument row");
    if (rc == MI_NULL_VALUE)
   goto retisnull;
    
    r = *high - *low;
    values[1] = (void *) &r;
    rowdesc = mi_row_desc_create(mi_typestring_to_id(conn, 
      "one_real"));
    result = mi_row_create(conn, rowdesc, (MI_DATUM *) 
      values, nulls);
    mi_close(conn);
    return (result);
 retisnull:
    mi_fp_setreturnisnull(fp, 0, MI_TRUE);
    return (MI_ROW *) NULL;
}
/* end of applyfunc.c */

Then create the following SQL function:

create function HighLowDiff(arg stock_bar) returns one_real
external name '/tmp/applyfunc.bld(high_low_diff)'
language C;


select stock_name, Apply('HighLowDiff', 
        stock_data)::TimeSeries(one_real)
from daily_stocks;

The following query is equivalent to the previous query, but it does not have the performance advantages of using a function as the expression to evaluate:

select stock_name, Apply('$high - $low', 
   stock_data)::TimeSeries(one_real)
from daily_stocks;

Example: Run Apply on a BSON column

The following statements create a time series with a BSON column, create a virtual table on the time series, and insert data into the time series:

create row type if not exists iot_row(
   tstamp datetime year to fraction (5),
   value bson);
Row type created.

create table if not exists cf (pid int primary key, ts timeseries(iot_row));
Table created.

EXECUTE PROCEDURE TSContainerCreate("ctn1" ,
        "iot_dbs",
        "iot_row",256,512);
Routine executed.

execute procedure TSCreateVirtualTab('iot_vti', 'cf', 'elem_insert');
Routine executed.

insert into cf values(1,
'origin(2016-01-01 00:00:00.00000),calendar(ts_1min),container(ctn1),
        threshold(0), regular');
1 row(s) inserted.

insert into iot_vti values(1,
        '2016-01-01 00:00:00.00000',
        '{"outside":{"temp":21,"pressure":1003.2},"engine":{"temp":140,
                     "pressure":2001}}'::json::bson);
1 row(s) inserted.

insert into iot_vti values(1,
        '2016-01-01 00:01:00.00000',
        '{"outside":{"temp":23,"pressure":1003.2},"engine":{"temp":145,
                     "pressure":2020}}'::json::bson);
1 row(s) inserted.

The following statements create row data types for the output of the Apply function:

create row type if not exists apply_row(tstamp datetime year to fraction (5), 
        value integer);
Row type created.

create row type if not exists bson_row(tstamp datetime year to fraction (5), 
        value bson);
Row type created.

The following statement selects the values of the engine.pressure field from the BSON column:

select Apply('$0.engine.pressure::integer',
        null::datetime year to fraction(5),
        null::datetime year to fraction(5),
      ts)::TimeSeries(apply_row)
   from cf;


(expression)  origin(2016-01-01 00:00:00.00000), calendar(ts_1day), container()
              , threshold(0), regular, [(2001       ), (2020       )]

1 row(s) retrieved.

The following statement returns the contents of the BSON column as a BSON document:

select Apply('$0.1',
        null::datetime year to fraction(5),
        null::datetime year to fraction(5),
      ts)::TimeSeries(bson_row)
   from cf;


(expression)  origin(2016-01-01 00:00:00.00000), calendar(ts_1day), container()
              , threshold(0), regular, [('{"outside":{"temp":21,"pressure":1003
              .2},"engine":{"temp":140,"pressure":2001}}'), ('{"outside":{"temp
              ":23,"pressure":1003.9},"engine":{"temp":145,"pressure":2020}}')]

1 row(s) retrieved.

The following statement returns the engine document from the BSON column:

select Apply('$engine::bson',
        null::datetime year to fraction(5),
        null::datetime year to fraction(5),
      ts)::TimeSeries(bson_row)
   from cf;


(expression)  origin(2016-01-01 00:00:00.00000), calendar(ts_1day), container()
              , threshold(0), regular, [('{"engine":{"temp":140,"pressure":2001
              }}'), ('{"engine":{"temp":145,"pressure":2020}}')]

1 row(s) retrieved.