The Chain Rule is used to find the derivative of composite functions, that is, functions formed by nesting one function inside another, such as \(f(g(x))\).
Think of layers: the Chain Rule tells you how a change in the input \(x\) affects the outer function \(f\) by considering how \(x\) affects the inner function \(g\), and how \(g\) in turn affects \(f\). It links the rates of change through the "chain" of functions.
If \(y = f(u)\) is a differentiable function of \(u\), and \(u = g(x)\) is a differentiable function of \(x\), then the composite function \(y = f(g(x))\) is a differentiable function of \(x\), and its derivative is:
$$ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} $$
In Lagrange's notation: If \(h(x) = f(g(x))\), then
$$ h'(x) = f'(g(x)) \cdot g'(x) $$
In words: The derivative of the composite function is the derivative of the outer function (evaluated at the inner function) multiplied by the derivative of the inner function.
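A quick worked example: for \(y = \sin(x^2)\), take the outer function \(f(u) = \sin u\) and the inner function \(u = g(x) = x^2\). Then \(f'(u) = \cos u\) evaluated at \(u = x^2\), multiplied by \(g'(x) = 2x\), gives
$$ \frac{dy}{dx} = \cos(x^2) \cdot 2x $$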
The rule extends to multiple compositions: If \(y = f(g(h(x)))\), then
$$ \frac{dy}{dx} = f'(g(h(x))) \cdot g'(h(x)) \cdot h'(x) $$
You multiply the derivatives of each "layer", with each outer derivative evaluated at the value produced by the layers inside it.
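As a sanity check, the sketch below uses SymPy (assuming it is available) to confirm that the hand-assembled product of layer derivatives for \(y = \sqrt{\sin(x^2)}\) matches the derivative SymPy computes directly.

```python
# Symbolic check of the three-layer chain rule using SymPy
# (illustrative sketch; assumes SymPy is installed).
import sympy as sp

x = sp.symbols('x')

# Layers: h(x) = x**2, g(u) = sin(u), f(v) = sqrt(v)
inner = x**2
middle = sp.sin(inner)
outer = sp.sqrt(middle)

# Derivative computed directly by SymPy
direct = sp.diff(outer, x)

# Derivative assembled by hand from the chain rule:
# f'(g(h(x))) * g'(h(x)) * h'(x)
f_prime = 1 / (2 * sp.sqrt(sp.sin(inner)))   # f'(v) = 1/(2*sqrt(v)) at v = sin(x**2)
g_prime = sp.cos(inner)                      # g'(u) = cos(u) at u = x**2
h_prime = 2 * x                              # h'(x) = 2x
by_hand = f_prime * g_prime * h_prime

print(sp.simplify(direct - by_hand))  # prints 0: both expressions agree
```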
Backpropagation in Neural Networks: The Chain Rule is the mathematical heart of backpropagation, the algorithm used to train deep neural networks. The loss function depends on the output layer, which depends on the hidden layers, which depend on the weights and inputs. Backpropagation uses the chain rule repeatedly to calculate the gradient of the loss with respect to each weight and bias throughout the network, allowing Gradient Descent to update the parameters.
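As an illustration, here is a minimal sketch of backpropagation through a one-hidden-unit network in plain Python. The input, target, weights, and learning rate are arbitrary made-up values; the point is that every line of the backward pass is one application of the chain rule.

```python
import math

# Tiny network: one input, one tanh hidden unit, one linear output, squared-error loss.
x, target = 0.5, 1.0          # a single training example (made-up values)
w1, b1 = 0.8, 0.1             # hidden-layer weight and bias (arbitrary)
w2, b2 = -0.4, 0.2            # output-layer weight and bias (arbitrary)

# Forward pass: the loss depends on y, which depends on a, which depends on the weights.
z = w1 * x + b1               # hidden pre-activation
a = math.tanh(z)              # hidden activation
y = w2 * a + b2               # network output
loss = (y - target) ** 2      # squared-error loss

# Backward pass: each gradient is a product of local derivatives (the chain rule).
dloss_dy = 2 * (y - target)                     # dL/dy
dloss_dw2 = dloss_dy * a                        # dL/dw2 = dL/dy * dy/dw2
dloss_db2 = dloss_dy * 1.0                      # dL/db2 = dL/dy * dy/db2
dloss_da = dloss_dy * w2                        # dL/da  = dL/dy * dy/da
dloss_dz = dloss_da * (1 - math.tanh(z) ** 2)   # dL/dz  = dL/da * da/dz  (tanh')
dloss_dw1 = dloss_dz * x                        # dL/dw1 = dL/dz * dz/dw1
dloss_db1 = dloss_dz * 1.0                      # dL/db1 = dL/dz * dz/db1

# One gradient-descent step on each parameter (learning rate is arbitrary).
lr = 0.1
w1 -= lr * dloss_dw1
b1 -= lr * dloss_db1
w2 -= lr * dloss_dw2
b2 -= lr * dloss_db2
print(loss, dloss_dw1, dloss_dw2)
```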
Related Rates Problems: Used in physics and engineering to find the rate of change of one quantity in terms of the rate of change of another related quantity.
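A standard example: if a circle's radius grows at \(2\ \text{cm/s}\), the area \(A = \pi r^2\) grows at
$$ \frac{dA}{dt} = \frac{dA}{dr} \cdot \frac{dr}{dt} = 2\pi r \cdot \frac{dr}{dt}, $$
so when \(r = 5\ \text{cm}\), the area is increasing at \(20\pi\ \text{cm}^2/\text{s}\).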
Implicit Differentiation: A technique that relies on the chain rule to find derivatives of implicitly defined functions.
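For instance, differentiating both sides of \(x^2 + y^2 = 25\) with respect to \(x\), and applying the chain rule to \(y^2\) because \(y\) is treated as a function of \(x\), gives
$$ 2x + 2y \frac{dy}{dx} = 0 \quad\Longrightarrow\quad \frac{dy}{dx} = -\frac{x}{y}. $$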