While Variance measures the spread of a single random variable, Covariance and Correlation measure the degree to which two random variables move together.
Covariance indicates the direction of the linear relationship (positive or negative). A positive covariance suggests that as one variable tends to be above its mean, the other also tends to be above its mean. A negative covariance suggests they move in opposite directions relative to their means.
Correlation (specifically, Pearson correlation coefficient \(\rho\)) standardizes covariance to a unitless value between -1 and +1, making it easier to interpret the strength and direction of the linear relationship, regardless of the variables' scales.
Notation: Covariance \(Cov(X, Y)\) or \(\sigma_{XY}\). Correlation \(Corr(X, Y)\), \(\rho\), or \(\rho_{XY}\).
Definition: The expected value of the product of the deviations of two random variables \(X\) and \(Y\) from their respective means (\(\mu_X = E[X]\), \(\mu_Y = E[Y]\)):
$$ Cov(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] $$
Computational Formula: Derived using linearity of expectation:
$$ Cov(X, Y) = E[XY] - E[X]E[Y] = E[XY] - \mu_X \mu_Y $$
Requires calculating \(E[X]\), \(E[Y]\), and \(E[XY]\) (the expected value of the product, found using the joint distribution).
Interpretation of Sign:
\(Cov(X, Y) > 0\): Indicates a positive linear relationship (X and Y tend to move in the same direction relative to their means).
\(Cov(X, Y) < 0\): Indicates a negative linear relationship (X and Y tend to move in opposite directions relative to their means).
\(Cov(X, Y) = 0\): Suggests no linear relationship. (Important: Could still have a non-linear relationship).
Units: The units of covariance are the units of X multiplied by the units of Y, making it hard to compare across different pairs of variables.
If \(X\) and \(Y\) are independent, then \(Cov(X, Y) = 0\) and \(Corr(X, Y) = 0\).
Proof Sketch: If independent, \(E[XY] = E[X]E[Y]\), so \(Cov(X, Y) = E[XY] - E[X]E[Y] = 0\).
Important: The converse is not generally true. \(Cov(X, Y) = 0\) (or \(\rho = 0\)) does not imply independence. It only implies the absence of a linear relationship. Variables can have a strong non-linear relationship (e.g., \(Y = X^2\) for \(X \sim N(0,1)\)) and still have zero covariance/correlation.
(Exception: If X and Y are jointly Normally distributed, then zero correlation does imply independence).
Critical Distinction: Correlation measures the statistical association between two variables. It does not imply that one variable causes the other. A correlation might arise due to:
Causation (X causes Y, or Y causes X)
A third, unobserved confounding variable affecting both X and Y.
Feature Selection & Engineering: Correlation analysis helps identify redundant features (high correlation) or features strongly related to a target variable. Used in understanding feature interactions.
Multicollinearity in Regression: High correlation between predictor variables is problematic for interpreting linear models.
Portfolio Theory (Finance): Covariance and correlation are essential for calculating portfolio risk and diversification benefits. Combining assets with low or negative correlation can reduce overall portfolio variance.
Variance of Sums: Covariance appears in the formula \(Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)\).
Dimensionality Reduction: Techniques like Principal Component Analysis (PCA) rely heavily on the covariance matrix of the data.
Covariance \(Cov(X, Y)\) measures the direction of the linear relationship between \(X\) and \(Y\). Units depend on X and Y. \(E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - E[X]E[Y]\).
Correlation \(Corr(X, Y)\) or \(\rho\) standardizes covariance (\(\frac{Cov(X,Y)}{\sigma_X \sigma_Y}\)) to be between -1 and +1, measuring the strength and direction of the linear relationship. It is unitless.
\(\rho = +1\): Perfect positive linear relation. \(\rho = -1\): Perfect negative linear relation. \(\rho = 0\): No linear relation.
Independence implies zero correlation, but zero correlation does not imply independence (unless jointly Normal).