Conditional Probability

Definition / Introduction

Conditional probability measures the likelihood of an event $A$ occurring given that another event $B$ has already occurred or is known to be true.
It allows us to update our probability estimates based on new information or conditions.
It's a fundamental concept in probability theory with wide applications in inference, machine learning (especially Bayesian methods), and decision making.

The conditional probability of event $A$ occurring given that event $B$ has occurred is denoted by $P(A|B)$.
Read as: "The probability of A given B".

If $P(B) > 0$, the conditional probability $P(A|B)$ is defined as:

$$ P(A|B) = \frac{P(A \cap B)}{P(B)} $$

where:
- $P(A|B)$ is the probability of A given B.
- $P(A \cap B)$ (or $P(A \text{ and } B)$) is the probability that both event A and event B occur (their intersection).
- $P(B)$ is the probability of the conditioning event B occurring.

When we know that event $B$ has occurred, we effectively restrict our sample space to only those outcomes included in $B$.
The conditional probability $P(A|B)$ is then the proportion of those outcomes in $B$ that also belong to event $A$.
The formula divides the probability of both happening ($P(A \cap B)$) by the probability of the condition happening ($P(B)$) to re-normalize the probability within the new, restricted sample space $B$.
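The restricted-sample-space view can be sketched in a few lines of Python. This is a minimal illustration, assuming a finite sample space of equally likely outcomes; the function name `conditional_probability` and the die-roll events are hypothetical and chosen only for this example.

```python
from fractions import Fraction


def conditional_probability(event_a, event_b):
    """P(A|B) for equally likely outcomes, with events given as sets of outcomes."""
    if not event_b:
        # The definition requires P(B) > 0, so B must contain at least one outcome.
        raise ValueError("P(B) must be positive: event B has no outcomes")
    # Restrict the sample space to B, then count the outcomes that also lie in A.
    return Fraction(len(event_a & event_b), len(event_b))


# Hypothetical example: one roll of a fair six-sided die.
even = {2, 4, 6}            # event A: the roll is even
greater_than_3 = {4, 5, 6}  # event B: the roll is greater than 3

p_even_given_gt3 = conditional_probability(even, greater_than_3)
print(p_even_given_gt3)  # 2/3
```

Because all outcomes are equally likely, dividing the counts $|A \cap B| / |B|$ gives the same result as dividing the probabilities $P(A \cap B) / P(B)$.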

Consider data on 100 patients regarding a disease (D) and a symptom (S):

| | Symptom (S) | No Symptom ($\neg S$) | Total |
|---|---|---|---|
| Disease (D) | 15 | 5 | 20 |
| No Disease ($\neg D$) | 10 | 70 | 80 |
| Total | 25 | 75 | 100 |

Let's find $P(D|S)$: The probability a patient has the disease given they have the symptom.
- $P(D \cap S)$ (Disease and Symptom): $\frac{15}{100} = 0.15$
- $P(S)$ (Symptom): $\frac{25}{100} = 0.25$
- $P(D|S) = \frac{P(D \cap S)}{P(S)} = \frac{0.15}{0.25} = 0.60$ or 60%.
Alternatively, using intuition: Restrict the sample space to the 25 patients with the symptom. Within this group, 15 have the disease. So, $P(D|S) = \frac{15}{25} = 0.60$.
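The same calculation can be reproduced from the raw counts. The sketch below uses the four cell counts from the patient table; the dictionary layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Cell counts from the 100-patient table: (disease status, symptom status) -> count.
counts = {
    ("D", "S"): 15,
    ("D", "not S"): 5,
    ("not D", "S"): 10,
    ("not D", "not S"): 70,
}
total = sum(counts.values())  # 100 patients

# Route 1: apply the definition P(D|S) = P(D and S) / P(S).
p_d_and_s = Fraction(counts[("D", "S")], total)
p_s = Fraction(
    sum(n for (_, symptom), n in counts.items() if symptom == "S"), total
)
p_d_given_s = p_d_and_s / p_s

# Route 2: restrict the sample space to the patients who have the symptom.
with_symptom = counts[("D", "S")] + counts[("not D", "S")]
p_restricted = Fraction(counts[("D", "S")], with_symptom)

print(float(p_d_given_s), float(p_restricted))  # 0.6 0.6
```

Both routes agree, which is exactly the re-normalization argument: dividing by $P(S)$ is equivalent to counting only within the symptom column.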

Rearranging the conditional probability formula gives a way to calculate the probability of the intersection of two events:

$$ P(A \cap B) = P(A|B) P(B) $$

$$ P(A \cap B) = P(B|A) P(A) $$
This is useful for calculating the probability of a sequence of events.
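As a quick check, both factorizations can be evaluated on the patient data from the earlier example. This is a small sketch; the variable names are illustrative, and the probabilities are read directly from the table counts.

```python
from fractions import Fraction

# Marginal and conditional probabilities read from the 100-patient table.
p_s = Fraction(25, 100)          # P(S): 25 of 100 patients have the symptom
p_d = Fraction(20, 100)          # P(D): 20 of 100 patients have the disease
p_d_given_s = Fraction(15, 25)   # P(D|S): 15 of the 25 symptomatic patients
p_s_given_d = Fraction(15, 20)   # P(S|D): 15 of the 20 diseased patients

# Multiplication rule, conditioning on S and then on D.
joint_via_s = p_d_given_s * p_s
joint_via_d = p_s_given_d * p_d

print(joint_via_s, joint_via_d)  # 3/20 3/20
```

Both products equal $15/100 = 0.15$, the joint probability $P(D \cap S)$ taken directly from the table.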

For a sequence of events $A_1, A_2, \dots, A_n$, the chain rule extends this to:

$$ P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) P(A_2|A_1) P(A_3|A_1 \cap A_2) \dots P(A_n|A_1 \cap \dots \cap A_{n-1}) $$

The chain rule is crucial for modeling sequences. In Natural Language Processing, for example, the probability of a sentence can be computed as the product of each word's probability given the words before it. The same idea applies to other sequential processes.
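A minimal sketch of the chain rule as code: multiply the first event's probability by each later event's probability conditioned on everything before it. The helper `chain_rule` and the card-drawing scenario are hypothetical examples, not part of the original notes.

```python
from fractions import Fraction
from math import prod


def chain_rule(conditionals):
    """Multiply P(A_1), P(A_2|A_1), ..., P(A_n|A_1 and ... and A_{n-1})."""
    return prod(conditionals, start=Fraction(1))


# Hypothetical example: drawing three aces in a row from a standard
# 52-card deck without replacement.
p_three_aces = chain_rule([
    Fraction(4, 52),  # P(A_1): first card is an ace
    Fraction(3, 51),  # P(A_2|A_1): second is an ace, given the first was
    Fraction(2, 50),  # P(A_3|A_1 and A_2): third is an ace, given the first two were
])

print(p_three_aces)  # 1/5525
```

Each factor is conditioned on the full history so far, which is why the number of remaining aces and remaining cards both shrink from one draw to the next.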

Conditional probability connects to several related concepts:
- It leads directly to the concept of Independence: events $A$ and $B$ are independent when $P(A|B) = P(A)$.
- It forms the basis for Bayes' Theorem, which relates $P(A|B)$ to $P(B|A)$.
- It is used extensively in Classification algorithms (e.g., Naive Bayes assumes that features are conditionally independent given the class).
- It is fundamental to understanding [[Markov Chains]].
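The first two connections can be illustrated on the patient data from the earlier example. The sketch below assumes the standard statement of Bayes' Theorem, $P(B|A) = \frac{P(A|B) P(B)}{P(A)}$, which is not derived in these notes; the variable names are illustrative.

```python
from fractions import Fraction

# Probabilities read from the 100-patient table.
p_d = Fraction(20, 100)         # P(D)
p_s = Fraction(25, 100)         # P(S)
p_d_given_s = Fraction(15, 25)  # P(D|S)

# Independence check: D and S are independent only if P(D|S) == P(D).
independent = p_d_given_s == p_d
print(independent)  # False

# Bayes' Theorem (assumed standard form): P(S|D) = P(D|S) * P(S) / P(D).
p_s_given_d = p_d_given_s * p_s / p_d
print(p_s_given_d)  # 3/4
```

Knowing the symptom is present raises the disease probability from 0.20 to 0.60, so the two events are dependent. The Bayes result of $3/4$ matches the direct count of 15 symptomatic patients among the 20 with the disease.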

In summary:
- Conditional probability $P(A|B)$ is the probability of event A happening, given that event B has already happened.
- Formula: $P(A|B) = \frac{P(A \cap B)}{P(B)}$, defined when $P(B) > 0$.
- It represents updating probability based on new information by restricting the sample space.
- The Multiplication Rule ($P(A \cap B) = P(A|B)P(B)$) helps find joint probabilities.
- The Chain Rule extends this to sequences of events.