The Chain Rule of Conditional Probabilities


The Chain Rule of Conditional Probabilities is also called the general product rule. It permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities.

The Chain Rule is very helpful in the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities. In this article, we will look at the Chain Rule in detail.


Conditional probability arises when the probability of a specific event changes once one or more conditions are satisfied. These conditions are themselves events.

  • In technical terms, if A and B are two events, then the conditional probability of A with respect to B is denoted by P(A|B).
  • A statement about conditional probability therefore has the form "the probability of event A given that B has already happened".

If A and B are independent events

By the definition of independent events, the occurrence of event A does not depend on event B. Therefore, P(A|B) = P(A).

If A and B are mutually exclusive

As A and B are disjoint events, the probability that A will occur when B has already occurred is 0. Therefore, P(A|B) = 0.

The conditional probability P(A|B) is undefined when P(B) = 0. This is reasonable: if P(B) = 0, event B never occurs, so it does not make sense to talk about the probability of A given B.

The Chain Rule of Conditional Probabilities

If A and B are two events in a sample space S, then the conditional probability of A given B is defined as

P(A|B) = P(A∩B) / P(B), when P(B) > 0.
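
The ratio definition P(A|B) = P(A∩B) / P(B) and the special cases discussed earlier (independent events, mutually exclusive events, and P(B) = 0) can be checked on a small example. The following is a minimal sketch, assuming a single roll of a fair six-sided die; the helper names `prob` and `cond_prob` and the chosen events are illustrative only and do not come from the article.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
OUTCOMES = frozenset(range(1, 7))

def prob(event):
    """P(event) under the uniform distribution on OUTCOMES."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def cond_prob(a, b):
    """P(A|B) = P(A∩B) / P(B); returns None when P(B) = 0 (undefined)."""
    if prob(b) == 0:
        return None
    return prob(a & b) / prob(b)

even = frozenset({2, 4, 6})
at_most_four = frozenset({1, 2, 3, 4})
odd = frozenset({1, 3, 5})
impossible = frozenset()

# "even" and "at most four" are independent here, so conditioning changes nothing.
independent_case = cond_prob(even, at_most_four)

# "even" and "odd" are mutually exclusive, so the conditional probability is 0.
exclusive_case = cond_prob(even, odd)

# Conditioning on an event of probability 0 is undefined.
undefined_case = cond_prob(even, impossible)

print(independent_case, prob(even), exclusive_case, undefined_case)
```

Exact fractions are used instead of floats so that the equalities can be compared without rounding error.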

The Chain rule

We can rearrange the formula for conditional probability to get the Chain rule:

P(A, B) = P(A|B) P(B)

We can extend this to three variables:

P(A,B,C) = P(A| B,C) P(B,C) = P(A|B,C) P(B|C) P(C)

and in general to n variables:

P(A1, A2, …, An) = P(A1| A2, …, An) P(A2| A3, …, An) … P(An-1|An) P(An)

This factorization is what we refer to as the chain rule.
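
The three-variable factorization can be verified numerically. The sketch below assumes a hypothetical joint distribution over three binary variables A, B and C; the weights are arbitrary, and the helper names `marginal` and `chain_rule_product` are illustrative. It computes each conditional from the joint and confirms that the product P(A|B,C) P(B|C) P(C) recovers every joint probability.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three binary variables (A, B, C).
# The weights are arbitrary illustration values; dividing by their sum
# makes the probabilities add up to 1.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
joint = {
    abc: Fraction(w, sum(weights))
    for abc, w in zip(product((0, 1), repeat=3), weights)
}

def marginal(fixed):
    """Sum the joint over all assignments consistent with `fixed`,
    a dict mapping variable position (0=A, 1=B, 2=C) to a value."""
    return sum(
        p for abc, p in joint.items()
        if all(abc[i] == v for i, v in fixed.items())
    )

def chain_rule_product(a, b, c):
    """Return P(A=a | B=b, C=c) * P(B=b | C=c) * P(C=c)."""
    p_c = marginal({2: c})
    p_bc = marginal({1: b, 2: c})
    p_abc = marginal({0: a, 1: b, 2: c})
    p_a_given_bc = p_abc / p_bc   # definition of conditional probability
    p_b_given_c = p_bc / p_c
    return p_a_given_bc * p_b_given_c * p_c

# The product of conditionals should equal the joint for every assignment.
chain_rule_holds = all(
    chain_rule_product(a, b, c) == joint[(a, b, c)]
    for a, b, c in joint
)
print(chain_rule_holds)
```

All weights are positive, so no conditioning event has probability zero and every division is well defined. With zero-probability events, the corresponding conditionals would be undefined and would need separate handling.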

This formula is particularly important for Bayesian Belief Nets, where it provides a means of calculating the full joint probability distribution. For a fixed event B with P(B) > 0, conditional probability is itself a probability measure and therefore satisfies the probability axioms. Specifically,

Axiom 1: For any event A, P (A|B) ≥0.

Axiom 2: Conditional probability of B given B is 1, i.e., P (B|B) =1.

Axiom 3: If A1,A2,A3,⋯ are disjoint events, then P(A1∪A2∪A3⋯|B)=P(A1|B)+P(A2|B)+P(A3|B)+⋯
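
The three axioms can be spot-checked on a concrete case. This is a minimal sketch, assuming a fair six-sided die and hand-picked events; the names `cond_prob`, `b`, `a1`, `a2` and `a3` are illustrative. It checks the axioms for one choice of events and is not a proof.

```python
from fractions import Fraction

def cond_prob(a, b):
    """P(A|B) for events that are subsets of {1, ..., 6} on a fair die.
    Under the uniform distribution this is |A∩B| / |B|; requires P(B) > 0."""
    if not b:
        raise ValueError("P(B) = 0, so P(A|B) is undefined")
    return Fraction(len(a & b), len(b))

b = frozenset({1, 2, 3, 4})

# Pairwise disjoint events, chosen for illustration.
a1, a2, a3 = frozenset({1}), frozenset({2, 5}), frozenset({4, 6})
events = (a1, a2, a3)

# Axiom 1: conditional probabilities are non-negative.
axiom1 = all(cond_prob(a, b) >= 0 for a in events)

# Axiom 2: P(B|B) = 1.
axiom2 = cond_prob(b, b) == 1

# Axiom 3: additivity over disjoint events.
axiom3 = cond_prob(a1 | a2 | a3, b) == sum(cond_prob(a, b) for a in events)

print(axiom1, axiom2, axiom3)
```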