varPop
This page covers the varPop and varPopStable functions available in ClickHouse.
varPop
Calculates the population variance of a data column. The population variance measures how far the values are spread around their mean. It is calculated as Σ((x - x̅)^2) / n, where n is the sample size and x̅ is the average value of x.
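The formula above can be checked by hand with a few lines of Python. This is only an illustrative sketch of the definition Σ((x - x̅)^2) / n; the function name `population_variance` is hypothetical and is not part of ClickHouse.

```python
def population_variance(xs):
    """Population variance by definition: sum((x - mean)^2) / n."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(xs) / n
    # Sum of squared deviations from the mean, divided by n (not n - 1).
    return sum((x - mean) ** 2 for x in xs) / n


print(population_variance([1, 2, 3, 4, 5]))  # 2.0
```

Dividing by n rather than n - 1 is what distinguishes the population variance from the sample variance.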
Syntax
varPop(x)
Parameters
- x: Population of values to find the population variance of. (U)Int*, Float*, Decimal*.
Returned value
Returns the population variance of x as a Float64.
Implementation details
This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the slower but more stable varPopStable function.
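To see why a numerically unstable variance algorithm can matter, the sketch below compares a textbook one-pass formula based on sums of squares with a two-pass computation. It is an illustration under assumptions: the function names are hypothetical, and the one-pass formula shown is a generic example of an unstable method, not a statement about how ClickHouse implements varPop.

```python
def var_pop_naive(xs):
    """One-pass formula (sum(x^2) - sum(x)^2 / n) / n; prone to cancellation."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    # Two huge, nearly equal numbers are subtracted here, so most
    # significant digits cancel and rounding error dominates.
    return (total_sq - total * total / n) / n


def var_pop_two_pass(xs):
    """Two-pass reference: compute the mean first, then squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


# The values 1..5 shifted by a large offset; the true variance is still 2.
data = [1e9 + k for k in range(1, 6)]

print(var_pop_two_pass(data))  # 2.0
print(var_pop_naive(data))     # not 2.0: the squares exceed Float64 precision
```

On small values such as 1 through 5 both functions agree; the difference only appears when the values are large relative to their spread.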
Example
Query:
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
    x Int32,
    y Int32
)
ENGINE = Memory;
INSERT INTO test_data VALUES (1, 2), (2, 3), (3, 5), (4, 6), (5, 8);
SELECT
    varPop(x) AS var_pop
FROM test_data;
Result:
2
varPopStable
Calculates the population variance of a data column using a numerically stable algorithm. It is slower than varPop but is designed to give reliable results even with large datasets or with values that might cause loss of precision in other implementations.
Syntax
varPopStable(x)
Parameters
- x: Population of values to find the population variance of. (U)Int*, Float*, Decimal*.
Returned value
Returns the population variance of x as a Float64.
Implementation details
Unlike varPop(), this function uses a stable, numerically accurate algorithm to calculate the population variance to avoid issues like catastrophic cancellation or loss of precision. This function also handles NaN and Inf values correctly, excluding them from calculations.
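One well-known way to avoid catastrophic cancellation in a single pass is Welford's online algorithm, sketched below. Treat it as an assumption-laden illustration: the name `var_pop_welford` is hypothetical, the sketch does not claim to mirror ClickHouse's internal code, and it does not implement the NaN and Inf handling described above.

```python
def var_pop_welford(xs):
    """Population variance via Welford's online update (single pass)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        # Uses the deviation from the old mean and from the updated mean,
        # so no large, nearly equal quantities are ever subtracted.
        m2 += delta * (x - mean)
    if count == 0:
        raise ValueError("need at least one value")
    return m2 / count


# Values 1..5 shifted by a large offset; the true variance is 2.
print(var_pop_welford([1e9 + k for k in range(1, 6)]))  # 2.0
```

Because the running mean stays close to the data, the intermediate values remain small even when the inputs themselves are large.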
Example
Query:
DROP TABLE IF EXISTS test_data;
CREATE TABLE test_data
(
    x Int32,
    y Int32
)
ENGINE = Memory;
INSERT INTO test_data VALUES (1, 2), (2, 9), (9, 5), (4, 6), (5, 8);
SELECT
    varPopStable(x) AS var_pop_stable
FROM test_data;
Result:
7.76