When dealing with a function of multiple variables (like \(z = f(x, y)\)), a partial derivative measures the rate of change of the function's output with respect to just one of the input variables, while holding all other input variables constant.
Think of standing on a hillside (surface \(z=f(x,y)\)). The partial derivative with respect to \(x\) tells you how steep the hill is if you walk only in the x-direction (East-West). The partial derivative with respect to \(y\) tells you the steepness if you walk only in the y-direction (North-South).
For a function \(z = f(x_1, x_2, ..., x_n)\), the partial derivative of \(f\) with respect to the variable \(x_i\) at a point \((a_1, ..., a_n)\) is found by treating all variables except \(x_i\) as constants and taking the ordinary derivative with respect to \(x_i\).
Notation:
\(\frac{\partial f}{\partial x_i}\) (Leibniz-style, "del f del x_i") - Most common.
\(f_{x_i}(x_1, ..., x_n)\) or \(D_i f\) (Subscript notation).
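A minimal sketch of "treat the other variables as constants" using SymPy (assumed available); the function \(f(x, y) = x^2 y + \sin(y)\) is an illustrative choice, not from the notes above:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)

# Partial w.r.t. x: y is held constant, so sin(y) differentiates to 0.
df_dx = sp.diff(f, x)   # 2*x*y
# Partial w.r.t. y: x is held constant, so x**2 acts as a coefficient.
df_dy = sp.diff(f, y)   # x**2 + cos(y)

print(df_dx, '|', df_dy)
```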
Limit Definition (for \(f(x, y)\) w.r.t \(x\) at \((a, b)\)):
$$ \frac{\partial f}{\partial x}(a, b) = \lim_{h \to 0} \frac{f(a+h, b) - f(a, b)}{h} $$
(Notice only \(x\) changes by \(h\), \(y\) stays fixed at \(b\)).
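The limit definition translates directly into a finite-difference approximation: perturb only \(x\) by a small \(h\) while \(y\) stays fixed at \(b\). A sketch with an illustrative function \(f(x, y) = x^2 y\), whose exact partial is \(\partial f/\partial x = 2xy\):

```python
def f(x, y):
    return x**2 * y

def partial_x(f, a, b, h=1e-6):
    # Forward-difference approximation of df/dx at (a, b):
    # only the first argument changes by h, matching the limit definition.
    return (f(a + h, b) - f(a, b)) / h

# Exact value at (3, 2) is 2 * 3 * 2 = 12.
approx = partial_x(f, 3.0, 2.0)
print(approx)  # close to 12
```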
For \(z = f(x, y)\), \(\partial f / \partial x\) at point \((a, b)\) is the slope of the curve formed by intersecting the surface \(z=f(x,y)\) with the plane \(y=b\), evaluated at \(x=a\). It's the slope parallel to the xz-plane.
Similarly, \(\partial f / \partial y\) is the slope of the curve formed by intersecting the surface with the plane \(x=a\), evaluated at \(y=b\). It's the slope parallel to the yz-plane.
(Visual Idea: Excalidraw showing a surface with tangent lines drawn in the x and y directions at a point).
\(\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y} \left(\frac{\partial f}{\partial x}\right)\) (Differentiate w.r.t. \(x\) first, then \(y\))
\(\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x} \left(\frac{\partial f}{\partial y}\right)\) (Differentiate w.r.t. \(y\) first, then \(x\))
Clairaut's Theorem (Equality of Mixed Partials): If the second partial derivatives are continuous in a region, then the order of differentiation doesn't matter: \(\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial^2 f}{\partial x \partial y}\).
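Clairaut's Theorem can be checked symbolically. A sketch with SymPy (assumed available) on an illustrative function \(f(x, y) = x^3 y^2 + e^{xy}\), whose second partials are continuous everywhere:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 * y**2 + sp.exp(x * y)

f_xy = sp.diff(f, x, y)  # d/dy (df/dx)
f_yx = sp.diff(f, y, x)  # d/dx (df/dy)

# Continuous second partials, so the mixed partials agree.
print(sp.simplify(f_xy - f_yx))  # 0
```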
Gradient: The gradient of a multivariable function is a vector containing all of its first-order partial derivatives. \(\nabla f = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, ..., \frac{\partial f}{\partial x_n} \right]^T\). This vector points in the direction of the function's steepest ascent.
Gradient Descent & Optimization: Partial derivatives are essential for finding the gradient of the loss function with respect to model parameters. Gradient descent uses this gradient to update parameters. \(w_i \leftarrow w_i - \eta \frac{\partial \text{Loss}}{\partial w_i}\) (where \(\eta\) is learning rate).
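The update rule above can be sketched on a toy loss whose partial derivatives are known in closed form. The loss \(\text{Loss}(w_1, w_2) = (w_1 - 3)^2 + (w_2 + 1)^2\) is an illustrative choice (minimum at \((3, -1)\)), not from these notes:

```python
def grad(w1, w2):
    # Partial derivatives of Loss = (w1 - 3)**2 + (w2 + 1)**2
    return 2 * (w1 - 3), 2 * (w2 + 1)

eta = 0.1          # learning rate
w1, w2 = 0.0, 0.0  # initial parameters

for _ in range(100):
    g1, g2 = grad(w1, w2)
    w1 -= eta * g1   # w_i <- w_i - eta * dLoss/dw_i
    w2 -= eta * g2

print(w1, w2)  # converges toward (3, -1)
```

Each parameter is nudged opposite its own partial derivative, which is exactly why gradient descent needs one partial per parameter.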
Directional Derivatives: Measure the rate of change in any direction (not just parallel to the axes). For a unit vector \(\mathbf{u}\), it is computed from the partial derivatives via \(D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}\).
Hessian Matrix: A square matrix of all second-order partial derivatives. Used in second-order optimization methods and analyzing concavity/convexity.
Jacobian Matrix: Generalizes the gradient to vector-valued functions (functions with multiple outputs). Its entries are partial derivatives.
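The gradient, Hessian, and Jacobian are all assembled from partial derivatives. A sketch with SymPy (assumed available), using illustrative functions:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + y**3

# Gradient: vector of first-order partials.
grad_f = [sp.diff(f, v) for v in (x, y)]   # [2*x*y, x**2 + 3*y**2]

# Hessian: matrix of all second-order partials.
hess_f = sp.hessian(f, (x, y))             # [[2*y, 2*x], [2*x, 6*y]]

# Jacobian of a vector-valued F(x, y) = (x*y, x + y**2):
# row i holds the partials of output i.
F = sp.Matrix([x * y, x + y**2])
jac_F = F.jacobian([x, y])                 # [[y, x], [1, 2*y]]

print(grad_f, hess_f, jac_F)
```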
Partial Derivative (\(\partial f / \partial x_i\)) measures the rate of change of a multivariable function \(f\) with respect to one variable \(x_i\), holding others constant.
Calculated using standard differentiation rules, treating other variables as constants.
Geometrically, represents the slope of the function's surface in the direction parallel to an axis.
The components of the gradient vector (\(\nabla f\)) are the partial derivatives.
Essential for multivariable optimization (Gradient Descent) in ML/AI.