Aggregates

Expressions in the HCL Detect Expression Language can also contain aggregates. Aggregates are summary statistics maintained at different temporal granularities. An aggregate is referenced by its name, zero or more group-by attributes (taken from the tuple being processed), a period, and, except for the AllTime period, a window unit. Aggregates with the Last period also take a window length.

The available periods are Current, Last, and AllTime. With the Current period, the available window units are Hour, Day, Month, and Year. With the Last period, Minute can also be used as the window unit. The AllTime period takes no window unit. Computed aggregates are always of type Double.

The following examples illustrate the use of aggregates as part of expressions:

  • The aggregate numCallsMade, grouped by callingNumber, returns the number of calls made by a given number within the last hour, where callingNumber is an attribute of the current tuple: aggregate(numCallsMade[callingNumber],Last,1,Hour).
  • Aggregates can be used in arithmetic operations like any other numeric value: aggregate(numCallsMade[callingNumber],Last,1,Hour) + aggregate(numCallsMade[calledNumber],Last,1,Hour)
  • Some aggregates may have no group by attributes: aggregate(totalCalls,Current,Month)
  • Some aggregates may have multiple group by attributes: aggregate(numCallsMade[callingNumber,callingCellTower],Last,1,Hour)
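The syntax rules above (name, optional group-by attributes, period, window length, window unit) can be made concrete with a small string builder. This is a minimal sketch for illustration only: `aggregate_expr` and its parameters are hypothetical and are not part of HCL Detect; it simply produces the expression forms shown in the examples and rejects the period/unit combinations the text does not allow.

```python
CURRENT_UNITS = ("Hour", "Day", "Month", "Year")
LAST_UNITS = ("Minute",) + CURRENT_UNITS


def aggregate_expr(name, group_by=(), period="AllTime", window_length=None, unit=None):
    """Build an aggregate(...) expression string (hypothetical helper, not an HCL Detect API)."""
    # Group-by attributes go in square brackets after the aggregate name.
    ref = f"{name}[{','.join(group_by)}]" if group_by else name
    if period == "AllTime":
        if window_length is not None or unit is not None:
            raise ValueError("AllTime takes no window length or unit")
        return f"aggregate({ref},AllTime)"
    if period == "Current":
        if window_length is not None or unit not in CURRENT_UNITS:
            raise ValueError("Current takes only a unit: Hour, Day, Month or Year")
        return f"aggregate({ref},Current,{unit})"
    if period == "Last":
        if unit not in LAST_UNITS:
            raise ValueError("Last takes Minute, Hour, Day, Month or Year")
        if not isinstance(window_length, int) or window_length < 1:
            raise ValueError("Last needs a positive integer window length")
        return f"aggregate({ref},Last,{window_length},{unit})"
    raise ValueError(f"unknown period: {period}")
```

For example, `aggregate_expr("numCallsMade", ["callingNumber"], "Last", 1, "Hour")` yields the first expression listed above, while asking for Current with Minute raises `ValueError`.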

Aggregates can also specify the number of most recent time units to be used for the aggregation. By default, an hourly aggregate is computed from six 10-minute aggregates, a daily aggregate from 24 hourly aggregates, a monthly aggregate from the daily aggregates within the month, and a yearly aggregate from 12 monthly aggregates. With the Last period, the number given immediately after Last (the window length) sets how many time units are aggregated. The time units used are always the most recent ones. For instance:

  • The following aggregate gets the number of calls made during the last week (7 days): aggregate(numCallsMade[callingNumber],Last,7,Day)
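The default composition described above (coarser aggregates built from finer ones, and a window length selecting the most recent units) can be sketched with plain lists of bucket values. This sketch assumes an additive aggregate such as a call count; a non-additive statistic (for example an average) would need different merge logic. The helper names `roll_up` and `sum_most_recent` are hypothetical and do not describe how HCL Detect stores its aggregates.

```python
def roll_up(fine, factor):
    """Sum consecutive groups of `factor` fine-grained buckets into coarser buckets,
    e.g. six 10-minute counts into one hourly count. Buckets are ordered oldest first."""
    if factor < 1 or len(fine) % factor:
        raise ValueError("bucket count must be a multiple of factor")
    return [sum(fine[i:i + factor]) for i in range(0, len(fine), factor)]


def sum_most_recent(buckets, n):
    """Aggregate the n most recent buckets (the end of the list)."""
    if not 1 <= n <= len(buckets):
        raise ValueError("n must be between 1 and the number of buckets")
    return sum(buckets[-n:])


# One day of 10-minute call counts (144 buckets), 5 calls per bucket.
ten_minute = [5] * 144
hourly = roll_up(ten_minute, 6)  # 24 hourly counts of 30 each
daily_total = sum(hourly)        # 720

# 30 daily counts, oldest first; the last 7 days correspond to Last,7,Day.
daily = list(range(1, 31))
last_week = sum_most_recent(daily, 7)  # 24 + 25 + ... + 30 = 189
```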

Aggregates with the Current and AllTime periods return exact results:

  • aggregate(<aggregate>,Current,Hour): an exact aggregate value over all the activities within the current hour. E.g.: If the current time is 14:20, then the activities since 14:00 (the last 20 minutes) are included.
  • aggregate(<aggregate>,Current,Day): an exact aggregate value over all the activities within the current day. E.g.: If the current time is 14:20, then the activities since midnight (00:00) are included.
  • aggregate(<aggregate>,Current,Month): an exact aggregate value over all the activities within the current month. E.g.: If the current time is 14:20 on 14 May, then the activities from 1 May 00:00 up to 14:20 on 14 May are included.
  • aggregate(<aggregate>,Current,Year): an exact aggregate value over all the activities within the current year. E.g.: If the current time is 14:20 on 14 May 2017, then the activities from 1 January 2017 00:00 up to 14:20 on 14 May 2017 are included.
  • aggregate(<aggregate>,AllTime): an exact aggregate value over all the activities, irrespective of time.
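The Current windows listed above all start at the beginning of the calendar unit that contains the current time and end at the current time. The following sketch computes that start with Python's `datetime` and checks whether an activity falls inside the window. It assumes naive local timestamps; `current_window_start` and `in_current_window` are hypothetical names used only for illustration.

```python
from datetime import datetime


def current_window_start(now, unit):
    """Start of the calendar unit that contains `now` (hypothetical helper)."""
    if unit == "Hour":
        return now.replace(minute=0, second=0, microsecond=0)
    if unit == "Day":
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "Month":
        return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "Year":
        return now.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit for Current: {unit}")


def in_current_window(event_time, now, unit):
    """True if an activity at `event_time` counts toward aggregate(...,Current,unit) at `now`."""
    return current_window_start(now, unit) <= event_time <= now


now = datetime(2017, 5, 14, 14, 20)
print(current_window_start(now, "Month"))  # 2017-05-01 00:00:00
```

Because the window boundaries are exact calendar boundaries, no approximation is involved, which matches the statement that Current and AllTime aggregates are exact.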

There are four possible ways of computing aggregates with the Last period:

  • Aggregate over the last hour: an approximate aggregate value over the activities within the last 60 minutes, that is, the last six 10-minute intervals. The value is approximate because it is assembled from whole 10-minute intervals: if at least half of the current 10-minute interval has elapsed, the aggregate covers the current interval plus the previous five 10-minute intervals; otherwise, it covers the current interval plus the previous six. E.g.: If the current time is 14:29, then the activities within the interval [13:30 - 14:29] are included. If the current time is 14:21, then the activities within the interval [13:20 - 14:21] are included.
  • Aggregate over the last day: an approximate aggregate value over the activities within the last 24 hours. The value is approximate because it is assembled from whole calendar hours: if at least half of the current hour has elapsed, the aggregate covers the current hour plus the previous 23 calendar hours; otherwise, it covers the current hour plus the previous 24 calendar hours. E.g.: If the current time is Tuesday 14:50, then the activities within the interval [Monday 15:00 - Tuesday 14:50] are included. If the current time is Tuesday 14:10, then the activities within the interval [Monday 14:00 - Tuesday 14:10] are included.
  • Aggregate over the last month: an approximate aggregate value over the activities within the last 30 days. The value is approximate because it is assembled from whole calendar days: if at least half of the current day has elapsed, the aggregate covers the current day plus the previous 29 calendar days; otherwise, it covers the current day plus the previous 30 calendar days. E.g.: If the current time is 14 May 22:00, then the activities within the interval [15 April 00:00 - 14 May 22:00] are included. If the current time is 14 May 02:00, then the activities within the interval [14 April 00:00 - 14 May 02:00] are included.
  • Aggregate over the last year: an approximate aggregate value over the activities within the last 12 months. The value is approximate because it is assembled from whole calendar months: if at least half of the current month has elapsed, the aggregate covers the current month plus the previous 11 calendar months; otherwise, it covers the current month plus the previous 12 calendar months. E.g.: If the current time is 28 May 2017 14:00, then the activities within the interval [1 June 2016 00:00 - 28 May 2017 14:00] are included. If the current time is 2 May 2017 14:00, then the activities within the interval [1 May 2016 00:00 - 2 May 2017 14:00] are included.
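The half-elapsed rule in the examples above can be checked with a short sketch that computes where each Last window starts. It assumes that "at least half past" means the elapsed fraction of the current bucket is at least 0.5, that calendar months have their actual length, and that timestamps are naive local times with no daylight-saving shifts. All names (`WINDOWS`, `bucket_start`, `last_window_start`, and so on) are hypothetical and only mirror the behaviour described in the text.

```python
import calendar
from datetime import datetime, timedelta

# Bucket granularity and default bucket count behind each Last window.
WINDOWS = {"Hour": ("10min", 6), "Day": ("hour", 24), "Month": ("day", 30), "Year": ("month", 12)}


def bucket_start(now, gran):
    """Start of the bucket of granularity `gran` that contains `now`."""
    if gran == "10min":
        return now.replace(minute=now.minute - now.minute % 10, second=0, microsecond=0)
    if gran == "hour":
        return now.replace(minute=0, second=0, microsecond=0)
    if gran == "day":
        return now.replace(hour=0, minute=0, second=0, microsecond=0)
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)  # month


def bucket_length(start, gran):
    """Length of the bucket beginning at `start`; months use their real number of days."""
    if gran == "month":
        return timedelta(days=calendar.monthrange(start.year, start.month)[1])
    return {"10min": timedelta(minutes=10), "hour": timedelta(hours=1), "day": timedelta(days=1)}[gran]


def shift_back(start, gran, k):
    """Move a bucket start back by k whole buckets."""
    if gran == "month":
        index = start.year * 12 + (start.month - 1) - k
        return start.replace(year=index // 12, month=index % 12 + 1)
    return start - k * bucket_length(start, gran)


def last_window_start(now, window, n=None):
    """Start of the approximate Last window: the current bucket plus n-1 or n previous ones."""
    gran, default_n = WINDOWS[window]
    n = default_n if n is None else n
    start = bucket_start(now, gran)
    elapsed = (now - start) / bucket_length(start, gran)  # fraction of the current bucket
    previous = n - 1 if elapsed >= 0.5 else n
    return shift_back(start, gran, previous)
```

Passing `n` covers the case where fewer than the default number of units is requested, e.g. `last_window_start(datetime(2017, 5, 14, 22), "Month", 4)` starts at 11 May 00:00.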

The same kind of approximation applies when the number of most recent time units is specified explicitly. For instance, if the last 4 days are requested from the last month, then the result includes the current day plus the previous 3 calendar days if at least half of the current day has elapsed, or the current day plus the previous 4 calendar days otherwise.
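The rule above can be written compactly. The notation here is introduced for illustration and is not taken from the product documentation: $t$ is the current time, $u$ the time unit, $s(t)$ the start of the unit containing $t$, $\ell$ the length of that unit, $n$ the number of requested units, and $k$ the number of previous whole units included. "At least half past" is read as $f \ge \tfrac{1}{2}$.

```latex
f = \frac{t - s(t)}{\ell}, \qquad
k = \begin{cases} n - 1 & \text{if } f \ge \tfrac{1}{2} \\ n & \text{if } f < \tfrac{1}{2} \end{cases}, \qquad
\text{window} = \bigl[\, s(t) - k\,u,\; t \,\bigr]

\text{covered length} = k + f \in
\begin{cases} [\,n - \tfrac{1}{2},\; n\,) & \text{if } f \ge \tfrac{1}{2} \\ [\,n,\; n + \tfrac{1}{2}\,) & \text{if } f < \tfrac{1}{2} \end{cases}

% Worked example: last 4 days (n = 4, u = 1 day)
t = \text{14 May 22:00}: \; f = \tfrac{22}{24} \ge \tfrac{1}{2} \Rightarrow k = 3, \;
\text{window} = [\text{11 May 00:00},\; \text{14 May 22:00}] \approx 3.92 \text{ days}
t = \text{14 May 02:00}: \; f = \tfrac{2}{24} < \tfrac{1}{2} \Rightarrow k = 4, \;
\text{window} = [\text{10 May 00:00},\; \text{14 May 02:00}] \approx 4.08 \text{ days}
```

The second line explains why these aggregates are called approximate: the covered span always lies within half a unit of the requested $n$ units.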

These computations are done by using the following aggregate expressions:

  • aggregate(<aggregate>,Last,<windowLength>,Minute): computes the aggregation over the last windowLength minutes, taken from the last hour.
  • aggregate(<aggregate>,Last,<windowLength>,Hour): if windowLength is greater than 1, computes the aggregation over the last windowLength hours, taken from the last day; otherwise, computes the aggregation over the last hour.
  • aggregate(<aggregate>,Last,<windowLength>,Day): if windowLength is greater than 1, computes the aggregation over the last windowLength days, taken from the last month; otherwise, computes the aggregation over the last day.
  • aggregate(<aggregate>,Last,<windowLength>,Month): if windowLength is greater than 1, computes the aggregation over the last windowLength months, taken from the last year; otherwise, computes the aggregation over the last month.
  • aggregate(<aggregate>,Last,<windowLength>,Year): computes the aggregation over the last year; windowLength must be equal to 1.
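The dispatch rules in the list above can be summarized as a small lookup: given a window unit and window length, which Last window the value is read from. This is a hypothetical helper (`source_window` is not an HCL Detect function) that encodes only what the list states; it does not enforce any upper bound on windowLength, since the text does not give one.

```python
def source_window(unit, window_length):
    """Name the Last window from which aggregate(...,Last,window_length,unit) is read.
    Hypothetical helper mirroring the documented rules."""
    if not isinstance(window_length, int) or window_length < 1:
        raise ValueError("windowLength must be a positive integer")
    if unit == "Minute":
        return "last hour"
    if unit == "Year":
        if window_length != 1:
            raise ValueError("windowLength must be 1 for Year")
        return "last year"
    # For Hour, Day and Month, a length above 1 is read from the next coarser window.
    coarser = {"Hour": "last day", "Day": "last month", "Month": "last year"}
    same = {"Hour": "last hour", "Day": "last day", "Month": "last month"}
    if unit not in coarser:
        raise ValueError(f"unknown window unit: {unit}")
    return coarser[unit] if window_length > 1 else same[unit]
```

For example, the one-week aggregate aggregate(numCallsMade[callingNumber],Last,7,Day) maps to `source_window("Day", 7)`, which returns `"last month"`.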