Independence of Events

Definition / Introduction

In probability, two events $A$ and $B$ are independent if the occurrence of one event does not affect the probability of the other event occurring.
Conversely, events are dependent if the occurrence of one does change the probability of the other.
Understanding independence is crucial as it often allows for significant simplification in calculating joint probabilities and is a key assumption in many statistical models (like Naive Bayes).

Events A and B are independent if knowing that B occurred does not give you any new information about the likelihood of A occurring (and vice-versa).

There are three equivalent ways to mathematically define independence for events A and B (the two conditional forms are equivalent to the product form whenever the conditioning event has positive probability):

- Using Conditional Probability: A and B are independent if and only if: $$ P(A|B) = P(A) $$ (Requires $P(B) > 0$). This directly reflects the intuition: the probability of A remains the same even when we know B happened.
- Equivalently, with the roles swapped: $$ P(B|A) = P(B) $$ (Requires $P(A) > 0$).
- Using Joint Probability (Multiplication Rule for Independent Events): A and B are independent if and only if: $$ P(A \cap B) = P(A) P(B) $$ This form needs no positivity condition, so it is usually taken as the formal definition, and it is often the most practical way to check for or use independence. The probability of both independent events occurring is simply the product of their individual probabilities.

Events A and B are dependent if they are not independent. Whenever the conditional probabilities are defined, this means all of the following hold:
- $P(A|B) \neq P(A)$
- $P(B|A) \neq P(B)$
- $P(A \cap B) \neq P(A) P(B)$
For dependent events, knowing that one occurred provides information about the other.

Independent Events:
- Experiment: Flipping a fair coin twice.
- Event A: Getting Heads on the first flip ($P(A) = 0.5$). Event B: Getting Heads on the second flip ($P(B) = 0.5$).
- These are independent. $P(A|B) = 0.5 = P(A)$. The outcome of the first flip doesn't influence the second.
- $P(A \cap B)$ (Heads on both) = $P(HH) = 0.25$. Also, $P(A) P(B) = 0.5 \times 0.5 = 0.25$. The multiplication rule holds.
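
The coin-flip check above can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming a fair coin so that all four ordered outcomes are equally likely; the helper names (`prob`, `first_heads`, `second_heads`) are illustrative and not part of any library.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair coin flips, each equally likely.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Exact probability of an event (a predicate on outcomes) under equal likelihood."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

p_a = prob(first_heads)
p_b = prob(second_heads)
p_ab = prob(lambda o: first_heads(o) and second_heads(o))

print(p_a, p_b, p_ab)      # 1/2 1/2 1/4
print(p_ab == p_a * p_b)   # True: the multiplication rule holds
```

Using `Fraction` keeps the comparison exact, which avoids floating-point rounding when testing equality.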
Dependent Events:
- Experiment: Drawing two cards from a standard deck without replacement.
- Event A: The first card is a King ($P(A) = 4/52$). Event B: The second card is a King.
- These are dependent. If the first card was a King, then $P(B|A) = 3/51$. If the first card was not a King, then $P(B|\neg A) = 4/51$. Since $P(B|A) \neq P(B|\neg A)$, the probability of B depends on the outcome of A.
- Here, $P(B)$ is actually $4/52$ (by symmetry), but $P(B|A) = 3/51 \neq 4/52$.
- $P(A \cap B)$ (both Kings) = $P(A) P(B|A) = (\frac{4}{52}) \times (\frac{3}{51}) = \frac{1}{221} \approx 0.0045$. This is not equal to $P(A) P(B) = (\frac{4}{52}) \times (\frac{4}{52}) = \frac{1}{169} \approx 0.0059$.
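
The card-drawing numbers can be verified the same way. This sketch assumes every ordered pair of distinct cards is equally likely and reduces the deck to "King" versus "other", since that is all the events depend on; the names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# A deck reduced to what matters here: 4 Kings ("K") and 48 other cards ("x").
deck = ["K"] * 4 + ["x"] * 48

# Drawing without replacement: every ordered pair of distinct positions is equally likely.
draws = list(permutations(range(len(deck)), 2))

def prob(event):
    """Exact probability of an event (a predicate on ordered draws)."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

def first_king(d):
    return deck[d[0]] == "K"

def second_king(d):
    return deck[d[1]] == "K"

p_a = prob(first_king)
p_b = prob(second_king)
p_ab = prob(lambda d: first_king(d) and second_king(d))
p_b_given_a = p_ab / p_a          # definition of conditional probability

print(p_a, p_b)                   # 1/13 1/13  (both equal 4/52)
print(p_b_given_a)                # 1/17       (equal to 3/51)
print(p_ab, p_a * p_b)            # 1/221 1/169, so A and B are dependent
```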

Sometimes, two events A and B might be dependent, but become independent given a third event C. This is called conditional independence.
Definition: A and B are conditionally independent given C if $P(A \cap B | C) = P(A|C) P(B|C)$, or equivalently (when $P(B \cap C) > 0$) $P(A | B, C) = P(A | C)$.
Importance: This concept is fundamental in graphical models and Bayesian networks.
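
A small worked example may help. The setup below is hypothetical and not from the text above: one of two biased coins is chosen at random and flipped twice. The joint distribution is built so that the flips are independent given the chosen coin (that is an assumption of the model, not something the code discovers); the sketch then shows that the flips are nevertheless dependent once the coin is unknown.

```python
from fractions import Fraction
from itertools import product

# Hypothetical setup: pick one of two biased coins with equal probability, flip it twice.
heads_prob = {"coin1": Fraction(9, 10), "coin2": Fraction(1, 10)}
coin_prior = {"coin1": Fraction(1, 2), "coin2": Fraction(1, 2)}

# Joint distribution over (coin, first flip, second flip).
# By construction, the two flips are independent once the coin is fixed.
joint = {}
for coin, f1, f2 in product(heads_prob, "HT", "HT"):
    p = heads_prob[coin]
    p1 = p if f1 == "H" else 1 - p
    p2 = p if f2 == "H" else 1 - p
    joint[(coin, f1, f2)] = coin_prior[coin] * p1 * p2

def prob(event):
    """Exact probability of an event (a predicate on (coin, f1, f2) outcomes)."""
    return sum((pr for outcome, pr in joint.items() if event(outcome)), Fraction(0))

is_a = lambda o: o[1] == "H"       # A: first flip is Heads
is_b = lambda o: o[2] == "H"       # B: second flip is Heads
is_c = lambda o: o[0] == "coin1"   # C: coin1 was chosen

# Marginally, A and B are dependent.
p_a, p_b = prob(is_a), prob(is_b)
p_ab = prob(lambda o: is_a(o) and is_b(o))
print(p_ab, p_a * p_b)             # 41/100 1/4

# Given C, they are independent.
p_c = prob(is_c)
p_a_c = prob(lambda o: is_a(o) and is_c(o)) / p_c
p_b_c = prob(lambda o: is_b(o) and is_c(o)) / p_c
p_ab_c = prob(lambda o: is_a(o) and is_b(o) and is_c(o)) / p_c
print(p_ab_c, p_a_c * p_b_c)       # 81/100 81/100
```

Intuitively, seeing Heads on the first flip makes it more likely that the Heads-biased coin was chosen, which raises the probability of Heads on the second flip. Once the coin is known, the first flip carries no further information.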

For more than two events (e.g., A, B, C) to be mutually independent, the multiplication rule must hold for every subset of two or more of the events; pairwise independence alone is not sufficient. For three events this means:
- $P(A \cap B) = P(A)P(B)$
- $P(A \cap C) = P(A)P(C)$
- $P(B \cap C) = P(B)P(C)$
- $P(A \cap B \cap C) = P(A)P(B)P(C)$
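
To see why the last condition cannot be dropped, here is a standard counterexample sketched in code, assuming two fair coin flips: the three events below satisfy every pairwise condition but fail the three-way one. The event choices and helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Two fair coin flips; all four ordered outcomes are equally likely.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Exact probability of an event (a predicate on outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

a = lambda o: o[0] == "H"      # A: first flip is Heads
b = lambda o: o[1] == "H"      # B: second flip is Heads
c = lambda o: o[0] == o[1]     # C: both flips show the same face

p_a, p_b, p_c = prob(a), prob(b), prob(c)

# All three pairwise conditions hold.
pairwise_ok = (
    prob(lambda o: a(o) and b(o)) == p_a * p_b
    and prob(lambda o: a(o) and c(o)) == p_a * p_c
    and prob(lambda o: b(o) and c(o)) == p_b * p_c
)
print(pairwise_ok)             # True

# The three-way condition fails: A and B together force C.
p_abc = prob(lambda o: a(o) and b(o) and c(o))
print(p_abc, p_a * p_b * p_c)  # 1/4 1/8
```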

This distinction is crucial when applying the Multiplication Rule: use $P(A \cap B) = P(A)P(B)$ only if A and B are independent; otherwise use the general form $P(A \cap B) = P(A|B)P(B)$, which holds whenever $P(B) > 0$.
The assumption of feature independence (often conditional independence given the class) is the core idea behind the Naive Bayes classifier, making computations tractable.
Independence is related to the concept of uncorrelated variables in Covariance and Correlation, but it is a stronger condition: independence implies zero correlation, while zero correlation does not in general imply independence. A notable exception is jointly normally distributed variables, for which zero correlation does imply independence.
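
A small example of uncorrelated but dependent variables, as a sketch under assumed values: let X be uniform on {-1, 0, 1} and let Y = X². Y is completely determined by X, yet their covariance is exactly zero. The specific distribution is chosen for illustration only.

```python
from fractions import Fraction

# Assumed distribution: X is uniform on {-1, 0, 1}, and Y = X**2.
p = Fraction(1, 3)
pairs = [(x, x * x) for x in (-1, 0, 1)]   # each (x, y) pair has probability 1/3

e_x = sum(p * x for x, _ in pairs)
e_y = sum(p * y for _, y in pairs)
e_xy = sum(p * x * y for x, y in pairs)
cov = e_xy - e_x * e_y
print(cov)                                  # 0, so X and Y are uncorrelated

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_x0 = sum(p for x, _ in pairs if x == 0)
p_y0 = sum(p for _, y in pairs if y == 0)
p_x0_y0 = sum(p for x, y in pairs if x == 0 and y == 0)
print(p_x0_y0, p_x0 * p_y0)                 # 1/3 1/9, so X and Y are dependent
```

Correlation only measures linear association, so a purely symmetric, nonlinear relationship such as this one goes undetected.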

Independent Events: Occurrence of one does not affect the probability of the other.
Key Tests: $P(A|B) = P(A)$ or $P(A \cap B) = P(A) P(B)$.
Dependent Events: Occurrence of one does affect the probability of the other.
Independence simplifies calculations significantly ($P(A \text{ and } B) = P(A) P(B)$).
It's a critical assumption in many statistical models and algorithms.