# Appendix 2: The Principle of Least Squares

January 1, 2025
## Least Squares and Sample Means

Suppose we make $N$ measurements, $x_i$, of a quantity. To find the value $X$ whose deviations from our measurements are minimized according to the principle of least squares, we require

$$\sum (x_i - X)^2 = \text{minimum}$$

Let $\bar{x}$ denote the mean of the measurements. We can rewrite the sum of squared deviations as

$$\sum (x_i - X)^2 = \sum\left[(x_i - \bar{x}) + (\bar{x} - X)\right]^2$$

Expanding the squared term:

$$\sum (x_i - X)^2 = \sum\left[(x_i - \bar{x})^2 + (\bar{x} - X)^2 + 2(x_i - \bar{x})(\bar{x} - X)\right]$$

The cross term vanishes because $\sum (x_i - \bar{x}) = 0$ by definition of the mean, so

$$\sum (x_i - X)^2 = \sum (x_i - \bar{x})^2 + N(\bar{x} - X)^2$$

The first term on the right does not depend on $X$, and the second is never negative, so the sum is smallest when $X = \bar{x}$: the least-squares estimate of a repeatedly measured quantity is simply the sample mean.
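As a quick numerical check, here is a minimal Python sketch (NumPy assumed; the simulated data and the helper name `sum_sq_dev` are just for illustration) that scans candidate values of $X$ and confirms the minimum lands at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)  # N = 50 simulated measurements

def sum_sq_dev(X, x):
    """Sum of squared deviations of the measurements x from a candidate value X."""
    return np.sum((x - X) ** 2)

# Scan candidate values of X and pick the one minimizing the sum of squares.
candidates = np.linspace(x.min(), x.max(), 10001)
best = candidates[np.argmin([sum_sq_dev(X, x) for X in candidates])]

print(best, x.mean())  # the two values agree to within the scan resolution
```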
## Fitting a Straight Line Using Least Squares

Consider a set of observations $(x_i, y_i)$ that we wish to fit with a linear relationship:

$$y = mx + b$$

We'll assume that uncertainty exists only in the $y$ values, and that all measurements have equal weight (we'll address weighted least squares later).
For each observation, the deviation from our proposed line is:
$$\delta y_i = y_i - (m x_i + b)$$

According to the principle of least squares, we want to minimize the sum of the squares of these deviations:

$$\sum (\delta y_i)^2 = \sum\left[y_i - (m x_i + b)\right]^2$$

Expanding this expression:

$$\sum (\delta y_i)^2 = \sum\left[y_i^2 + m^2 x_i^2 + b^2 - 2 m x_i y_i - 2 b y_i + 2 m b x_i\right]$$

Or more compactly:

$$M = \sum y_i^2 + m^2 \sum x_i^2 + N b^2 + 2 m b \sum x_i - 2 m \sum x_i y_i - 2 b \sum y_i$$

where $M$ represents the sum of squared deviations that we want to minimize.
To minimize $M$, we set its partial derivatives with respect to $m$ and $b$ equal to zero:

$$\frac{\partial M}{\partial m} = 0, \qquad \frac{\partial M}{\partial b} = 0$$

From the first condition:
$$2 m \sum x_i^2 + 2 b \sum x_i - 2 \sum x_i y_i = 0$$

From the second condition:

$$2 N b + 2 m \sum x_i - 2 \sum y_i = 0$$

Solving these equations simultaneously gives us:
$$m = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - \left(\sum x_i\right)^2}$$

$$b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{N \sum x_i^2 - \left(\sum x_i\right)^2}$$
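As a numerical illustration, here is a minimal Python sketch (NumPy assumed; the data values and the helper name `line_fit` are invented for the example) that evaluates these closed-form expressions and checks them against NumPy's polynomial fit:

```python
import numpy as np

def line_fit(x, y):
    """Unweighted least-squares slope and intercept from the closed-form sums."""
    N = len(x)
    D = N * np.sum(x**2) - np.sum(x)**2          # common denominator
    m = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / D
    b = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / D
    return m, b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])         # roughly y = 2x
m, b = line_fit(x, y)
print(m, b)
print(np.polyfit(x, y, 1))                        # should agree: [slope, intercept]
```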
Having determined the "best fit" line, we need to quantify the uncertainty in our calculated parameters. Since $m$ and $b$ are computed from measurements with uncertainty, we can calculate their standard deviations. For the standard deviation of the $y_i$ values about our fitted line, we use:
$$S_y = \sqrt{\frac{\sum (\delta y_i)^2}{N - 2}}$$

The standard deviations of the slope and intercept are then:
$$S_m = S_y \sqrt{\frac{N}{N \sum x_i^2 - \left(\sum x_i\right)^2}}$$

$$S_b = S_y \sqrt{\frac{\sum x_i^2}{N \sum x_i^2 - \left(\sum x_i\right)^2}}$$

These expressions provide statistical measures of uncertainty in our fitted parameters. When reporting results, we typically state values as $m \pm S_m$ and $b \pm S_b$, indicating that the true parameter has about a 68% probability of falling within one standard deviation of our estimate.
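Continuing the sketch above, the parameter uncertainties follow directly from these formulas (again a sketch under the same assumptions, reusing the `x`, `y`, `m`, and `b` from the previous block):

```python
def line_fit_errors(x, y, m, b):
    """Standard deviations of slope and intercept for an unweighted fit."""
    N = len(x)
    resid = y - (m * x + b)                       # deviations from the fitted line
    S_y = np.sqrt(np.sum(resid**2) / (N - 2))     # scatter about the line
    D = N * np.sum(x**2) - np.sum(x)**2
    S_m = S_y * np.sqrt(N / D)
    S_b = S_y * np.sqrt(np.sum(x**2) / D)
    return S_y, S_m, S_b

S_y, S_m, S_b = line_fit_errors(x, y, m, b)
print(f"m = {m:.3f} +/- {S_m:.3f}, b = {b:.3f} +/- {S_b:.3f}")
```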
## Weighted Least Squares

### Weighted Mean of Observations

If we have independently measured quantities $x_i$, each with a standard deviation $S_i$, the weighted mean is:
$$\bar{x} = \frac{\sum (x_i / S_i^2)}{\sum (1 / S_i^2)}$$

The standard deviation of this weighted mean is:
$$S^2 = \frac{\sum \left[(x_i - \bar{x})^2 / S_i^2\right]}{(N - 1) \sum (1 / S_i^2)}$$
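A minimal sketch of these two formulas (NumPy assumed; the measurement values and the helper name `weighted_mean` are invented for the example), combining measurements of the same quantity with different quoted uncertainties:

```python
import numpy as np

def weighted_mean(x, S):
    """Weighted mean and its standard deviation for measurements x with errors S."""
    w = 1.0 / S**2                                # weights from quoted uncertainties
    xbar = np.sum(w * x) / np.sum(w)
    N = len(x)
    S2 = np.sum(w * (x - xbar)**2) / ((N - 1) * np.sum(w))
    return xbar, np.sqrt(S2)

x = np.array([10.1, 9.8, 10.4, 10.0])
S = np.array([0.2, 0.1, 0.4, 0.2])                # smaller error -> larger weight
xbar, S_mean = weighted_mean(x, S)
print(f"{xbar:.3f} +/- {S_mean:.3f}")
```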
### Straight-Line Fitting with Weighted Least Squares

For observations with unequal precision, we modify our least-squares approach by assigning weights. If the $y$ values have varying precision but the $x$ values are considered exact, the equations for the slope and intercept become:

$$m = \frac{\sum w_i \sum w_i x_i y_i - \sum w_i x_i \sum w_i y_i}{\sum w_i \sum w_i x_i^2 - \left(\sum w_i x_i\right)^2}$$

$$b = \frac{\sum w_i y_i \sum w_i x_i^2 - \sum w_i x_i \sum w_i x_i y_i}{\sum w_i \sum w_i x_i^2 - \left(\sum w_i x_i\right)^2}$$

where $w_i$ represents the weight of each observation, calculated as:
$$w_i = \frac{1}{S_{yi}^2}$$

The weighted standard deviation about the best-fit line is:
$$S_y = \sqrt{\frac{\sum w_i \delta_i^2}{N - 2}}$$

And the standard deviations of the slope and intercept are:
$$S_m^2 = \frac{S_y^2}{W}$$

$$S_b^2 = S_y^2 \left(\frac{1}{\sum w_i} + \frac{\bar{x}^2}{W}\right)$$

where:
$$W = \sum w_i (x_i - \bar{x})^2$$

and $\bar{x}$ is the weighted mean of the $x$ values:
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
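Pulling the weighted-fit formulas together, here is a minimal sketch (NumPy assumed; the data and the helper name `weighted_line_fit` are invented). Note that this weighted $S_y$ is dimensionless, since each residual is scaled by its own uncertainty:

```python
import numpy as np

def weighted_line_fit(x, y, S_yi):
    """Weighted least-squares line fit: x exact, each y_i with standard deviation S_yi."""
    w = 1.0 / S_yi**2
    D = np.sum(w) * np.sum(w * x**2) - np.sum(w * x)**2
    m = (np.sum(w) * np.sum(w * x * y) - np.sum(w * x) * np.sum(w * y)) / D
    b = (np.sum(w * y) * np.sum(w * x**2) - np.sum(w * x) * np.sum(w * x * y)) / D

    # Uncertainties in the fitted parameters.
    N = len(x)
    resid = y - (m * x + b)
    S_y = np.sqrt(np.sum(w * resid**2) / (N - 2))   # weighted scatter about the line
    xbar = np.sum(w * x) / np.sum(w)                # weighted mean of the x values
    W = np.sum(w * (x - xbar)**2)
    S_m = S_y * np.sqrt(1.0 / W)
    S_b = S_y * np.sqrt(1.0 / np.sum(w) + xbar**2 / W)
    return m, b, S_m, S_b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.2, 9.8])
S_yi = np.array([0.3, 0.1, 0.2, 0.4, 0.2])          # per-point uncertainties in y
m, b, S_m, S_b = weighted_line_fit(x, y, S_yi)
print(f"m = {m:.3f} +/- {S_m:.3f}, b = {b:.3f} +/- {S_b:.3f}")
```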