VARIANCE Function

The VARIANCE function returns an estimate of the population variance, as the standard deviation squared.

VARIANCE calculates the following value:
(SUM(Xi2) - (SUM(Xi)2)/N)/(N - 1)
In this formula,
  • Xi is each value in the column,
  • and N is the total number of non-NULL values in the column (unless all values are NULL, in which case the variance is logically undefined, and the VARIANCE function returns NULL).

You can apply the VARIANCE function only to numeric columns.

The following query estimates the variance of age values for a population:
SELECT VARIANCE(age) FROM u_pop WHERE u_pop.age > 0;
As with the other aggregates, the VARIANCE function applies to the rows of a group when the query includes a GROUP BY clause, as in this example:
SELECT VARIANCE(age) FROM u_pop GROUP BY birth
   WHERE VARIANCE(age) > 0;

As previously noted, VARIANCE ignores NULL values unless every qualified row is NULL for a specified column. If every value is NULL, then VARIANCE returns a NULL result for that column. (This typically indicates missing data, and is not necessarily a good estimate of underlying population variance.)

If N, the total number of qualifying non-NULL column values, equals 1, then the VARIANCE function returns zero (another implausible estimate of the true population variance). To omit this special case, you can modify the query. For example, you might include a HAVING COUNT(*) > 1 clause.
Important: All calculations for the VARIANCE function are performed in 32-digit precision, which should be sufficient for many sets of input data. The calculation, however, loses precision or returns incorrect results when all of the input data values have 16 or more digits of precision.

Although DATE values are stored internally as an integer, you cannot use the VARIANCE function on columns of data type DATE.