The L₁ norm of a vector measures its "length" by summing the absolute values of all its components.
Imagine moving along a city grid (like Manhattan) where you can only travel horizontally or vertically. The L₁ norm is the total distance you travel along the grid lines from the origin to the vector's endpoint.
In 2D, for a vector \([x_1, x_2]^T\), \(||\mathbf{x}||_1 = |x_1| + |x_2|\). This represents the distance from the origin \((0,0)\) to the point \((x_1, x_2)\) if movement is restricted to horizontal and vertical paths.
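As a quick check of the formula above, here is a minimal sketch in plain Python. The helper names `l1_norm` and `l2_norm` are illustrative choices, not part of any library.

```python
def l1_norm(x):
    """L1 norm: sum of the absolute values of the components."""
    return sum(abs(v) for v in x)


def l2_norm(x):
    """L2 (Euclidean) norm, shown only for comparison."""
    return sum(v * v for v in x) ** 0.5


x = [3.0, -4.0]
print(l1_norm(x))  # 7.0: three blocks along one axis plus four along the other
print(l2_norm(x))  # 5.0: the straight-line distance
```

The grid distance (7) is never shorter than the straight-line distance (5), which matches the city-block picture.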
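The set of all vectors with \(||\mathbf{x}||_1 = 1\) (the boundary of the L₁ unit ball, sometimes called the L₁ "unit circle") forms a square rotated by 45 degrees (a diamond) in 2D, with vertices at \((\pm 1, 0)\) and \((0, \pm 1)\). In 3D it is the surface of an octahedron.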
(Visual Idea: An Excalidraw showing the L₁ unit "circle" (diamond) vs the L₂ unit circle (circle) would highlight the difference).
Lasso Regression (L₁ Regularization): This is probably the best-known application in machine learning. Lasso adds a penalty term to the loss function proportional to the L₁ norm of the model's weight vector (\(\lambda ||\mathbf{w}||_1\)), where \(\lambda \ge 0\) controls the strength of the penalty.
Sparsity: A key effect of L₁ regularization is that it tends to produce sparse solutions, meaning many of the weights (\(w_i\)) are driven to exactly zero. This effectively performs automatic feature selection, as features corresponding to zero weights are removed from the model. Geometrically, the corners of the L₁ "ball" lie on the coordinate axes, so the point where a level set of the loss function first touches the ball is often a corner or an edge, where one or more components are exactly zero.
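To see the sparsity effect numerically, consider a simplified one-dimensional setting (an assumption made here for illustration, not the full Lasso problem): each weight \(w\) is fit to a target value \(z\) with a squared-error term plus a penalty. With the L₁ penalty \(\lambda |w|\) the minimizer is the soft-thresholding rule, while a squared penalty \(\frac{\lambda}{2} w^2\) only rescales \(z\). The function names below are hypothetical.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w) over w (L1 penalty)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small inputs are set to exactly zero


def ridge_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2 over w (squared penalty)."""
    return z / (1.0 + lam)


z = [2.0, -0.3, 0.1, -1.5]  # illustrative unpenalized weights
lam = 0.5

l1_solution = [soft_threshold(v, lam) for v in z]
l2_solution = [ridge_shrink(v, lam) for v in z]

print(l1_solution)  # [1.5, 0.0, 0.0, -1.0]
print(l2_solution)  # every entry is shrunk, but none is exactly zero
```

The two small weights vanish under the L₁ penalty and merely shrink under the squared penalty, which is the behaviour the geometric argument predicts.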
Robustness: The L₁ norm is generally more robust to outliers than the L₂ norm when used in loss functions. For example, Mean Absolute Error (MAE), \(\frac{1}{n}||\mathbf{y}_{\text{pred}} - \mathbf{y}_{\text{actual}}||_1\), uses the L₁ norm, whereas Mean Squared Error (MSE), \(\frac{1}{n}||\mathbf{y}_{\text{pred}} - \mathbf{y}_{\text{actual}}||_2^2\), uses the squared L₂ norm. Because each error contributes linearly rather than quadratically, minimizing the sum of absolute errors is less sensitive to large individual errors.
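A small sketch of the robustness claim, using made-up data with one outlier. It relies on a standard fact: the best constant prediction under absolute error is the median, and under squared error it is the mean. The `mae` and `mse` helpers are illustrative.

```python
from statistics import mean, median


def mae(y_pred, y_actual):
    """Mean Absolute Error: (1/n) * L1 norm of the error vector."""
    return sum(abs(p - a) for p, a in zip(y_pred, y_actual)) / len(y_actual)


def mse(y_pred, y_actual):
    """Mean Squared Error: (1/n) * squared L2 norm of the error vector."""
    return sum((p - a) ** 2 for p, a in zip(y_pred, y_actual)) / len(y_actual)


y_actual = [1.0, 2.0, 3.0, 4.0, 100.0]  # the last value is an outlier

best_for_mae = median(y_actual)  # 3.0, stays with the bulk of the data
best_for_mse = mean(y_actual)    # 22.0, dragged toward the outlier

n = len(y_actual)
print(best_for_mae, best_for_mse)
print(mae([best_for_mae] * n, y_actual) < mae([best_for_mse] * n, y_actual))  # True
print(mse([best_for_mse] * n, y_actual) < mse([best_for_mae] * n, y_actual))  # True
```

The single outlier moves the MSE-optimal prediction from the middle of the data to 22, while the MAE-optimal prediction does not move at all.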
Compressed Sensing: L₁ minimization is used in signal processing to reconstruct signals from incomplete measurements, leveraging the assumption that the original signal is sparse in some basis.
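The idea can be illustrated with a toy problem (not a practical solver): one measurement, \(x_1 + 2x_2 = 2\), and two unknowns, so infinitely many signals fit the data. The sketch below does a brute-force search along the solution line and compares the candidate with the smallest L₁ norm to the one with the smallest L₂ norm. The measurement equation, grid resolution, and function names are all assumptions chosen for illustration; real applications use linear programming or iterative methods.

```python
def solutions_on_line(steps=100):
    """Candidate solutions of x1 + 2*x2 = 2, parametrized by x2 = t."""
    for k in range(-steps, 2 * steps + 1):
        t = k / steps
        yield (2.0 - 2.0 * t, t)


def l1(x):
    """L1 norm of a candidate solution."""
    return sum(abs(v) for v in x)


def l2_squared(x):
    """Squared L2 norm (same minimizer as the L2 norm itself)."""
    return sum(v * v for v in x)


candidates = list(solutions_on_line())
min_l1 = min(candidates, key=l1)
min_l2 = min(candidates, key=l2_squared)

print(min_l1)  # (0.0, 1.0): one component is exactly zero
print(min_l2)  # approximately (0.4, 0.8): both components nonzero
```

Among all signals consistent with the measurement, the L₁-minimal one is sparse and the L₂-minimal one is not, which is the property that sparse reconstruction exploits.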