Least Squares Approximation#
Find the least squares approximation of the system \(A \boldsymbol{x} \approx \boldsymbol{b}\) by minimizing the distance \(\| A \boldsymbol{x} - \boldsymbol{b}\|\). There are several methods to compute the approximation, including the normal equations and the QR equations.
Definition#
Let \(A\) be an \(m \times n\) matrix with \(m > n\) and \(\mathrm{rank}(A) = n\). The least squares approximation of the system \(A \boldsymbol{x} \approx \boldsymbol{b}\) is the vector \(\boldsymbol{x}\) which minimizes the distance \(\| A\boldsymbol{x} - \boldsymbol{b} \|\).
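As a quick illustration of the definition, here is a minimal NumPy sketch (the values of \(A\) and \(\boldsymbol{b}\) are made up for this example) that computes the minimizer of \(\| A \boldsymbol{x} - \boldsymbol{b} \|\) directly with `np.linalg.lstsq`:

```python
import numpy as np

# A small overdetermined system A x ≈ b (example values for illustration only)
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
b = np.array([1., 2., 2.])

# np.linalg.lstsq returns the vector x minimizing ||Ax - b||
x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)         # least squares approximation
print(residual)  # squared norm ||Ax - b||^2
```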
Normal Equations#
Let \(A\) be an \(m \times n\) matrix with \(m > n\) and \(\mathrm{rank}(A) = n\). The least squares approximation of the system \(A \boldsymbol{x} \approx \boldsymbol{b}\) is the solution of the system

\[ A^TA \boldsymbol{x} = A^T \boldsymbol{b} \]

The system is called the normal equations.
Proof
If \(\boldsymbol{x} \in \mathbb{R}^n\), then \(A \boldsymbol{x} \in R(A)\). The projection theorem states that the point in \(R(A)\) nearest to \(\boldsymbol{b} \in \mathbb{R}^m\) is the orthogonal projection of \(\boldsymbol{b}\) onto \(R(A)\). If \(\boldsymbol{x}\) is the vector such that \(A\boldsymbol{x} = \mathrm{proj}_{R(A)}(\boldsymbol{b})\), then \(A\boldsymbol{x} - \boldsymbol{b}\) is in \(R(A)^{\perp} = N(A^T)\) and therefore

\[ A^T ( A\boldsymbol{x} - \boldsymbol{b} ) = \boldsymbol{0} \hspace{5mm} \Rightarrow \hspace{5mm} A^TA \boldsymbol{x} = A^T \boldsymbol{b} \]
We assume \(\mathrm{rank}(A) = n\), therefore \(A^TA\) is nonsingular and the solution exists and is unique.
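For example, a short sketch (with hypothetical values for \(A\) and \(\boldsymbol{b}\)) that forms and solves the normal equations with NumPy:

```python
import numpy as np

# Example overdetermined system A x ≈ b (values chosen for illustration)
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
b = np.array([1., 2., 2.])

# Solve the normal equations (A^T A) x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)
```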
QR Equations#
Let \(A\) be an \(m \times n\) matrix with \(m > n\) and \(\mathrm{rank}(A) = n\). The least squares approximation of the system \(A \boldsymbol{x} \approx \boldsymbol{b}\) is the solution of the system of equations

\[ R_1 \boldsymbol{x} = Q_1^T \boldsymbol{b} \]

where \(A = Q_1 R_1\) is the thin QR decomposition. The system is called the QR equations. Furthermore, the residual is given by

\[ \| A \boldsymbol{x} - \boldsymbol{b} \| = \| Q_2^T \boldsymbol{b} \| \]

where \(A = QR\) is the (full) QR decomposition with \(Q = [ Q_1 \ \ Q_2 ]\).
Proof
The matrix \(Q\) is orthogonal, therefore \(\| A \boldsymbol{x} - \boldsymbol{b} \| = \| Q^T ( A \boldsymbol{x} - \boldsymbol{b} ) \|\) and

\[ \| Q^T ( A \boldsymbol{x} - \boldsymbol{b} ) \|^2 = \left\| \begin{bmatrix} R_1 \boldsymbol{x} - Q_1^T \boldsymbol{b} \\ - Q_2^T \boldsymbol{b} \end{bmatrix} \right\|^2 = \| R_1 \boldsymbol{x} - Q_1^T \boldsymbol{b} \|^2 + \| Q_2^T \boldsymbol{b} \|^2 \]
where we use the Pythagoras theorem in the last equality. The vector \(Q_2^T \boldsymbol{b}\) does not depend on \(\boldsymbol{x}\) therefore the minimum value of \(\| A \boldsymbol{x} - \boldsymbol{b} \|\) occurs when \(R_1 \boldsymbol{x} = Q_1^T \boldsymbol{b}\) and the residual is \(\| A \boldsymbol{x} - \boldsymbol{b} \| = \| Q_2^T \boldsymbol{b} \|\).
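A minimal sketch of the QR approach, assuming SciPy is available and using made-up values for \(A\) and \(\boldsymbol{b}\):

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

# Example overdetermined system (values for illustration only)
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
b = np.array([1., 2., 2.])

# Thin QR decomposition: A = Q1 R1 with Q1 (m x n) and R1 (n x n) upper triangular
Q1, R1 = qr(A, mode='economic')

# Solve the QR equations R1 x = Q1^T b by back substitution
x = solve_triangular(R1, Q1.T @ b)
print(x)

# The residual ||Ax - b|| equals ||Q2^T b||
print(np.linalg.norm(A @ x - b))
```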
Fitting Models to Data#
Suppose we have \(m\) points

\[ (t_1,y_1) , \dots , (t_m,y_m) \]

and we want to find a line

\[ y = c_0 + c_1 t \]

that “best fits” the data. There are different ways to quantify what “best fits” means but the most common method is called least squares linear regression. In least squares linear regression, we want to minimize the sum of squared errors

\[ SSE = \sum_{i=1}^m \left( y_i - (c_0 + c_1 t_i) \right)^2 \]

In matrix notation, the sum of squared errors is

\[ SSE = \| A \boldsymbol{c} - \boldsymbol{y} \|^2 \]

where

\[ A = \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix} \hspace{5mm} \boldsymbol{c} = \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} \hspace{5mm} \boldsymbol{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} \]

We assume that \(m \geq 2\) and \(t_i \not= t_j\) for all \(i \not= j\) which implies \(\mathrm{rank}(A) = 2\). Therefore the vector of coefficients

\[ \boldsymbol{c} = \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} \]
is the least squares approximation of the system \(A \boldsymbol{c} \approx \boldsymbol{y}\). See Wikipedia:Simple linear regression.
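For instance, a short sketch (with hypothetical data values) that builds the design matrix and solves the normal equations for \(c_0\) and \(c_1\):

```python
import numpy as np

# Hypothetical data points (t_i, y_i)
t = np.array([0., 1., 2., 3., 4.])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix with rows [1, t_i]
A = np.column_stack([np.ones(t.size), t])

# Least squares coefficients c = [c0, c1] via the normal equations
c = np.linalg.solve(A.T @ A, A.T @ y)
print(c)
```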
More generally, given \(m\) data points

\[ (t_1,y_1) , \dots , (t_m,y_m) \]

and a model function \(f(t,\boldsymbol{c})\) which depends on parameters \(c_1,\dots,c_n\), the least squares data fitting problem consists of computing parameters \(c_1,\dots,c_n\) which minimize the sum of squared errors

\[ SSE = \sum_{i=1}^m \left( y_i - f(t_i,\boldsymbol{c}) \right)^2 \]

If the model function is of the form

\[ f(t,\boldsymbol{c}) = c_1 f_1(t) + \cdots + c_n f_n(t) \]

for some functions \(f_1(t),\dots,f_n(t)\) then we say the data fitting problem is linear (but note the functions \(f_1,\dots,f_n\) are not necessarily linear). In the linear case, we use matrix notation to write the sum of squared errors as

\[ SSE = \| A \boldsymbol{c} - \boldsymbol{y} \|^2 \]

where

\[ A = \begin{bmatrix} f_1(t_1) & \cdots & f_n(t_1) \\ \vdots & & \vdots \\ f_1(t_m) & \cdots & f_n(t_m) \end{bmatrix} \hspace{5mm} \boldsymbol{c} = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} \hspace{5mm} \boldsymbol{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} \]

We assume that \(m \geq n\) and \(f_1,\dots,f_n\) are linearly independent (which implies \(\mathrm{rank}(A) = n\)). Therefore the vector of coefficients \(\boldsymbol{c}\) is the least squares approximation of the system \(A \boldsymbol{c} \approx \boldsymbol{y}\).
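As an illustration, here is a sketch with hypothetical data and an assumed choice of basis functions \(f_1(t) = 1\), \(f_2(t) = t\), \(f_3(t) = t^2\), which builds \(A\) column by column and computes \(\boldsymbol{c}\):

```python
import numpy as np

# Hypothetical data points (t_i, y_i)
t = np.array([0., 0.5, 1., 1.5, 2., 2.5])
y = np.array([0.1, 0.4, 1.1, 2.2, 4.1, 6.2])

# Column j of A is f_j evaluated at the data points t_i
A = np.column_stack([np.ones(t.size), t, t**2])

# Coefficients minimizing ||Ac - y||
c, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c)
```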
Exercises#
Let \(A = QR\) where
Find the least squares approximation \(A\boldsymbol{x} \approx \boldsymbol{b}\) where
Solution
Set up (but do not solve) a linear system \(A \boldsymbol{c} = \boldsymbol{y}\) where the solution is the coefficient vector
such that the function
best fits the data \((0,1),(1/4,3),(1/2,2),(3/4,-1),(1,0)\).