
Least Squares and Sample Means

Let’s say we make $N$ measurements, $x_i$, of a quantity. To find the value $X$ whose deviations from our measurements are minimized according to the principle of least squares, we need:

$$\sum(x_i - X)^2 = \text{minimum}$$

Let’s denote the mean of the measurements as $\bar{x}$. We can rewrite the sum of squared deviations as:

$$\sum (x_i - X)^2 = \sum[(x_i - \bar{x}) + (\bar{x}-X)]^2$$

Expanding the squared term:

$$\sum (x_i - X)^2 = \sum[(x_i - \bar{x})^2 + (\bar{x}-X)^2 + 2(x_i - \bar{x})(\bar{x}-X)]$$

The cross-term $2(\bar{x}-X)\sum(x_i - \bar{x})$ vanishes because $\sum(x_i - \bar{x}) = 0$ by definition of the mean, so:

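$$\sum (x_i - X)^2 = \sum(x_i - \bar{x})^2 + N(\bar{x}-X)^2$$

The first term does not depend on $X$, and the second is non-negative and vanishes only when $X = \bar{x}$, so the sample mean is the least-squares estimate.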
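The decomposition can be checked numerically. The following is a minimal sketch in plain Python; the function names `sum_sq_dev` and `decomposition` and the sample measurements are illustrative assumptions, not part of the original text.

```python
def sum_sq_dev(xs, X):
    """Left-hand side: sum of squared deviations of xs from a trial value X."""
    return sum((x - X) ** 2 for x in xs)


def decomposition(xs, X):
    """Right-hand side: sum (x_i - mean)^2 + N * (mean - X)^2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) + n * (mean - X) ** 2


# Made-up measurements, used only for illustration.
measurements = [9.8, 10.1, 10.0, 9.9, 10.3]
mean = sum(measurements) / len(measurements)

# Both sides agree for any trial value, and the sum is smallest at the mean.
for trial in (mean - 0.5, mean, mean + 0.5):
    print(trial, sum_sq_dev(measurements, trial), decomposition(measurements, trial))
```

Printing both columns side by side shows the two expressions agreeing up to floating-point rounding, with the smallest value at the trial equal to the mean.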

Fitting a Straight Line Using Least Squares

For each observation $(x_i, y_i)$, the deviation from our proposed line $y = mx + b$ is:

$$\delta y_i = y_i - (mx_i + b)$$

According to the principle of least squares, we want to minimize the sum of the squares of these deviations:

$$\sum(\delta y_i)^2 = \sum[y_i - (mx_i + b)]^2$$

Expanding this expression:

$$\sum(\delta y_i)^2 = \sum[y_i^2 + m^2 x_i^2 + b^2 - 2m x_i y_i - 2b y_i + 2m x_i b]$$

Or more compactly:

$$M = \sum y_i^2 + m^2\sum x_i^2 + Nb^2 + 2mb\sum x_i - 2m\sum x_i y_i - 2b\sum y_i$$

where $M$ represents the sum of squared deviations that we want to minimize.

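To minimize $M$, we set its partial derivatives with respect to $m$ and $b$ to zero. From the first condition, $\partial M/\partial m = 0$: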

$$2m\sum x_i^2 + 2b\sum x_i - 2\sum(x_i y_i) = 0$$

From the second condition, $\partial M/\partial b = 0$:

$$2Nb + 2m\sum x_i - 2\sum y_i = 0$$

Solving these equations simultaneously gives us:

$$m = \frac{N \sum(x_i y_i) - \sum x_i\sum y_i}{N\sum x_i^2 - (\sum x_i)^2}$$

$$b = \frac{\sum x_i^2 \sum y_i - \sum x_i\sum (x_i y_i)}{N\sum x_i^2 - (\sum x_i)^2}$$
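The closed-form expressions for $m$ and $b$ translate directly into code. The following is a minimal sketch in plain Python, not a definitive implementation; the function name `fit_line` and the example data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b (unweighted)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sx = sum(xs)                                # sum of x_i
    sy = sum(ys)                                # sum of y_i
    sxx = sum(x * x for x in xs)                # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))    # sum of x_i * y_i
    denom = n * sxx - sx ** 2                   # shared denominator
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = (n * sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    return m, b


# Illustrative data lying exactly on y = 2x + 1.
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)
```

The denominator is zero when every $x_i$ is the same, in which case no unique line exists, so the sketch raises an error rather than dividing by zero.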

For the standard deviation of each $y_i$ value from our fitted line, we use:

$$S_y = \sqrt{\frac{\sum(\delta y_i)^2}{N-2}}$$

The standard deviations of the slope and intercept are then:

$$S_m = S_y \sqrt{\frac{N}{N\sum x_i^2 - (\sum x_i)^2}}$$

$$S_b = S_y \sqrt{\frac{\sum x_i^2}{N\sum x_i^2 - (\sum x_i)^2}}$$
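As a rough illustration of how $S_y$, $S_m$, and $S_b$ follow from the fit, the sketch below recomputes the slope and intercept and then evaluates the three standard deviations. The function name `fit_line_with_errors` and the sample data are hypothetical, chosen only for this example.

```python
import math


def fit_line_with_errors(xs, ys):
    """Return (m, b, s_y, s_m, s_b) for an unweighted straight-line fit."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points so that N - 2 > 0")
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = (n * sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    # Sum of squared deviations of the y_i from the fitted line.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s_y = math.sqrt(ss_res / (n - 2))
    s_m = s_y * math.sqrt(n / denom)
    s_b = s_y * math.sqrt(sxx / denom)
    return m, b, s_y, s_m, s_b


# Made-up noisy data scattered around y = 2x + 1.
m, b, s_y, s_m, s_b = fit_line_with_errors([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
print(f"m = {m:.3f} +/- {s_m:.3f}, b = {b:.3f} +/- {s_b:.3f}, S_y = {s_y:.3f}")
```

The $N-2$ in the denominator reflects the two parameters estimated from the data, which is why the sketch refuses inputs with fewer than three points.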

Weighted Least Squares

Weighted Mean of Observations

Straight-Line Fitting with Weighted Least Squares

The weighted standard deviation about the best-fit line is:

$$S_y = \sqrt{\frac{\sum w_i \delta_i^2}{N-2}}$$

And the variances (squared standard deviations) of the slope and intercept are:

$$S_m^2 = \frac{S_y^2}{W}$$

$$S_b^2 = S_y^2\left(\frac{1}{\sum w_i} + \frac{\bar{x}^2}{W}\right)$$

Where:

$$W = \sum w_i (x_i-\bar{x})^2$$

And $\bar{x}$ is the weighted mean of the $x$ values:

$$\bar{x} = \frac{\sum w_i x_i }{\sum w_i}$$
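The weighted quantities can be put together in a short sketch. The formulas for $S_y$, $S_m$, $S_b$, $W$, and $\bar{x}$ follow the text; the expressions used for the weighted slope and intercept ($m = \sum w_i (x_i-\bar{x})(y_i-\bar{y})/W$ and $b = \bar{y} - m\bar{x}$, with $\bar{y}$ the weighted mean of the $y$ values) are the standard weighted least-squares result and are an assumption here, since they do not appear in this section. The function name `weighted_fit` and the data are likewise illustrative.

```python
import math


def weighted_fit(xs, ys, ws):
    """Return (m, b, s_y, s_m, s_b) for a weighted straight-line fit."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points so that N - 2 > 0")
    if any(w <= 0 for w in ws):
        raise ValueError("weights must be positive")
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw   # weighted mean of x
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw   # weighted mean of y (assumed)
    big_w = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    if big_w == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Assumed standard weighted least-squares slope and intercept.
    m = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys)) / big_w
    b = y_bar - m * x_bar
    # Weighted sum of squared deviations about the fitted line.
    ss_res = sum(w * (y - (m * x + b)) ** 2 for w, x, y in zip(ws, xs, ys))
    s_y = math.sqrt(ss_res / (n - 2))
    s_m = math.sqrt(s_y ** 2 / big_w)
    s_b = math.sqrt(s_y ** 2 * (1 / sw + x_bar ** 2 / big_w))
    return m, b, s_y, s_m, s_b


# Made-up data; the weights are arbitrary and chosen only for illustration.
print(weighted_fit([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8], [1.0, 2.0, 2.0, 1.0]))
```

With all weights equal to 1, $W$ reduces to $\left[N\sum x_i^2 - (\sum x_i)^2\right]/N$, and the expressions for $S_m$ and $S_b$ reduce to the unweighted ones, which gives a quick consistency check.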