
The Equation of the Gaussian Distribution Curve

Let’s derive the equation that describes the Gaussian distribution, beginning with a fundamental model of random variation.

Consider a quantity whose true value is $X$ but whose measurement is subject to random uncertainty. We model this uncertainty as the sum of many small, independent fluctuations, each equally likely to be positive or negative.

Specifically, imagine that our measurement is affected by $2n$ small fluctuations, each with magnitude $E$. Each fluctuation has equal probability of being positive or negative. The measured value $x$ can therefore range from $X-2nE$ (if all fluctuations are negative) to $X+2nE$ (if all are positive).

We want to determine the probability of observing a particular deviation $R$ within this range of possible values. This probability depends on how many different combinations of fluctuations produce that deviation.
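A short Monte Carlo sketch makes the model concrete: each simulated measurement adds $2n$ independent fluctuations of $\pm E$ to $X$. The helper `simulate_measurement` and the parameter values are hypothetical, chosen only for illustration. The variance comparison uses the fact that each fluctuation contributes variance $E^2$, so the total variance is $2nE^2$.

```python
import random

def simulate_measurement(true_value, n, E, rng):
    """Hypothetical helper: true value plus 2n fluctuations, each +E or -E with probability 1/2."""
    total = sum(E if rng.random() < 0.5 else -E for _ in range(2 * n))
    return true_value + total

# Illustrative parameters (assumptions, not from the text).
X, n, E = 10.0, 25, 0.1
rng = random.Random(42)  # fixed seed so the run is reproducible
samples = [simulate_measurement(X, n, E, rng) for _ in range(20000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"allowed range:  [{X - 2 * n * E:.2f}, {X + 2 * n * E:.2f}]")
print(f"observed range: [{min(samples):.2f}, {max(samples):.2f}]")
print(f"mean = {mean:.4f}, variance = {var:.4f}, 2nE^2 = {2 * n * E ** 2:.4f}")
```

Values near the extremes of the allowed range essentially never appear; the counting argument that follows explains why.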

Understanding the Combinatorial Basis

Think about extreme deviations first. A deviation of exactly $+2nE$ can happen in only one way: when all $2n$ fluctuations are positive. Similarly, a deviation of $-2nE$ also happens in only one way.

A deviation of $(2n-2)E$ is more likely because it occurs whenever exactly one of the fluctuations is negative and the rest are positive. Since any one of the $2n$ fluctuations could be the negative one, there are $2n$ different ways this deviation can occur.

More generally, suppose we want a total deviation $R = 2rE$, where $r$ is an integer with $-n \le r \le n$. Out of our $2n$ fluctuations, $(n+r)$ must then be positive and $(n-r)$ negative, since $(n+r)E - (n-r)E = 2rE$. The number of ways to select the $(n+r)$ positive positions from $2n$ positions is:

$$\frac{(2n)!}{(n+r)!\,(n-r)!}$$

This quantity represents the number of possible arrangements that yield our desired deviation. To convert it to a probability, we multiply by the probability of getting any specific arrangement of $(n+r)$ positive and $(n-r)$ negative fluctuations, which is:

$$\left(\frac{1}{2}\right)^{n+r}\left(\frac{1}{2}\right)^{n-r} = \left(\frac{1}{2}\right)^{2n}$$

The probability of deviation $R$ is therefore:

$$\frac{(2n)!}{(n+r)!\,(n-r)!}\left(\frac{1}{2}\right)^{2n}$$
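The formula can be checked for small $n$ by listing all $2^{2n}$ equally likely sign patterns and counting those that sum to $2r$ (in units of $E$). This is a minimal Python sketch; `prob_deviation` and `brute_force` are hypothetical helper names, and `math.comb` computes the binomial coefficient $(2n)!/\big((n+r)!\,(n-r)!\big)$.

```python
from itertools import product
from math import comb

def prob_deviation(n, r):
    """Formula from the text: (2n)! / ((n+r)! (n-r)!) * (1/2)^(2n)."""
    return comb(2 * n, n + r) / 2 ** (2 * n)

def brute_force(n, r):
    """Enumerate every +/-1 sign pattern and return the fraction that sums to 2r."""
    patterns = list(product((1, -1), repeat=2 * n))
    hits = sum(1 for p in patterns if sum(p) == 2 * r)
    return hits / len(patterns)

n = 4
for r in range(-n, n + 1):
    print(r, prob_deviation(n, r), brute_force(n, r))

# The probabilities over all allowed r should sum to 1.
print(sum(prob_deviation(n, r) for r in range(-n, n + 1)))
```

For $n = 4$ the two columns agree exactly, including $1/256$ for $r = \pm 4$ and $8/256$ (that is, $2n$ arrangements) for $r = \pm 3$.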

Simplifying with Stirling’s Approximation

To evaluate our expression for large $n$, we need Stirling’s approximation. Here’s why this approximation works:

Consider that

$$\int_1^n \ln x \, dx = \Big[x\ln x - x\Big]_1^n = n\ln n - n + 1$$

The integral approximates the sum $\ln 1 + \ln 2 + \ln 3 + \cdots + \ln n$, which equals $\ln(n!)$.

Figure 1: The area under the curve of $\ln x$ approximates the sum of logarithms, forming the basis for Stirling’s approximation of $n!$.

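Therefore, dropping the constant $1$, which is negligible for large $n$:

$$\ln(n!) \approx n\ln n - n$$

$$n! \approx e^{-n}n^n$$

This gives us the basic form. The complete approximation includes an extra factor, $n! \approx \sqrt{2\pi n}\,n^n e^{-n}$, and that factor is needed to obtain the correct normalization of the result below.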
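A quick numerical comparison shows how good the basic form is and what the $\sqrt{2\pi n}$ factor adds. This sketch uses `math.lgamma(n + 1)`, which returns $\ln\Gamma(n+1) = \ln(n!)$ without overflow; the test values of $n$ are arbitrary.

```python
import math

def log_factorial(n):
    """Exact ln(n!) via the log-gamma function, since Gamma(n + 1) = n!."""
    return math.lgamma(n + 1)

for n in (10, 100, 1000, 10000):
    exact = log_factorial(n)
    basic = n * math.log(n) - n                      # ln n! ~ n ln n - n
    full = basic + 0.5 * math.log(2 * math.pi * n)   # adds ln sqrt(2 pi n)
    print(f"n={n:>5}  exact={exact:.4f}  basic={basic:.4f}  full={full:.4f}  "
          f"rel.err(basic)={(exact - basic) / exact:.2e}")
```

The relative error of the basic form shrinks as $n$ grows, but its absolute error stays close to $\ln\sqrt{2\pi n}$, which is why the extra factor matters whenever absolute probabilities (not just ratios) are needed.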

The Continuous Limit

Applying the complete form of Stirling’s approximation to each factorial in our probability expression, and assuming $n$ is large and $|r| \ll n$, several algebraic steps lead to:

$$\frac{(2n)!}{(n+r)!\,(n-r)!}\left(\frac{1}{2}\right)^{2n} \approx \frac{1}{\sqrt{\pi n}}\,e^{-\frac{r^2}{n}}$$
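The omitted algebra can be sketched as follows. Write $P(r)$ for the probability of deviation $2rE$, apply $\ln m! \approx m\ln m - m + \tfrac{1}{2}\ln(2\pi m)$ to each factorial, and assume $|r| \ll n$. The $-m$ terms cancel because $(n+r) + (n-r) = 2n$; expanding $\ln(1 \pm u) \approx \pm u - \tfrac{1}{2}u^2$ with $u = r/n$ gives the quadratic exponent, and the $\sqrt{2\pi m}$ factors give the prefactor once $(n+r)(n-r) \approx n^2$:

$$
\begin{aligned}
\ln P(r) &= \ln(2n)! - \ln(n+r)! - \ln(n-r)! - 2n\ln 2 \\
&\approx -(n+r)\ln\!\left(1+\frac{r}{n}\right) - (n-r)\ln\!\left(1-\frac{r}{n}\right) + \frac{1}{2}\ln\frac{4\pi n}{(2\pi)^2\,(n+r)(n-r)} \\
&\approx -\frac{r^2}{n} - \frac{1}{2}\ln(\pi n)
\end{aligned}
$$

Exponentiating the last line recovers the result above.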

The result already has the essential Gaussian form: the probability decreases exponentially with the square of the deviation. To convert to standard notation, let $x$ now denote the deviation from the true value $X$, so that $x = 2rE$. Because $x$ changes in steps of $2E$, dividing by $2E$ turns the probability of each step into a probability density, and the exponent becomes $-r^2/n = -h^2x^2$ with $h = 1/(2E\sqrt{n})$, a parameter that sets the width of the distribution:

$$P(x)\,dx = \frac{h}{\sqrt{\pi}}\,e^{-h^2x^2}\,dx$$

where $P(x)\,dx$ is the probability of finding a deviation between $x$ and $x+dx$.
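To see how the discrete result maps onto the continuous form, the sketch below divides the exact probability of each $r$ by the step width $2E$ and compares it with $\frac{h}{\sqrt{\pi}}e^{-h^2x^2}$ at $x = 2rE$, taking $h = 1/(2E\sqrt{n})$ so that $h^2x^2 = r^2/n$. The function names and the values of $n$ and $E$ are illustrative assumptions.

```python
import math

def exact_density(n, E, r):
    """Exact probability of deviation x = 2rE, spread over the step width 2E."""
    return math.comb(2 * n, n + r) / 4 ** n / (2 * E)

def gaussian_density(x, h):
    """Continuous form P(x) = h / sqrt(pi) * exp(-h^2 x^2)."""
    return h / math.sqrt(math.pi) * math.exp(-(h * x) ** 2)

n, E = 200, 0.05                 # hypothetical values for illustration
h = 1 / (2 * E * math.sqrt(n))   # width parameter matching exp(-r^2 / n)

for r in (0, 5, 10, 20):
    x = 2 * r * E
    print(f"x={x:.2f}  exact={exact_density(n, E, r):.5f}  "
          f"gaussian={gaussian_density(x, h):.5f}")
```

The agreement degrades only slowly as $r$ grows, consistent with the approximation requiring $|r| \ll n$.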

Standard Deviation of the Gaussian Distribution

The standard deviation provides a measure of the typical spread of values in the distribution. If $N$ measurements are made, about $N P(x)\,dx$ of them have deviations between $x$ and $x+dx$; since the mean deviation is zero, the variance is the average of $x^2$ over these measurements:

$$\sigma^2 = \frac{1}{N}\int_{-\infty}^{\infty}\frac{Nh}{\sqrt{\pi}}e^{-h^2x^2}x^2\,dx = \frac{h}{\sqrt{\pi}}\int_{-\infty}^{\infty}x^2e^{-h^2x^2}\,dx$$

This integral equals $\frac{\sqrt{\pi}}{2h^3}$, giving us:

$$\sigma^2 = \frac{1}{2h^2}$$

Therefore:

$$\sigma = \frac{1}{\sqrt{2}\,h}$$
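The value quoted for the integral follows from the standard Gaussian integral $\int_{-\infty}^{\infty} e^{-a x^2}\,dx = \sqrt{\pi/a}$ (for $a > 0$): differentiating both sides with respect to $a$ and then setting $a = h^2$ gives

$$
\begin{aligned}
\int_{-\infty}^{\infty} x^2 e^{-a x^2}\,dx &= -\frac{d}{da}\sqrt{\frac{\pi}{a}} = \frac{\sqrt{\pi}}{2}\,a^{-3/2}, \\
\int_{-\infty}^{\infty} x^2 e^{-h^2 x^2}\,dx &= \frac{\sqrt{\pi}}{2h^3},
\qquad
\sigma^2 = \frac{h}{\sqrt{\pi}}\cdot\frac{\sqrt{\pi}}{2h^3} = \frac{1}{2h^2}.
\end{aligned}
$$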

Substituting $h = 1/(\sqrt{2}\,\sigma)$ lets us rewrite the probability function in terms of the standard deviation:

$$P(x)\,dx = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{x^2}{2\sigma^2}}\,dx$$
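As a numerical sanity check, the sketch below integrates the $\sigma$-form with composite Simpson’s rule to confirm that the total probability is 1 and the variance is $\sigma^2$, then checks that the $h$-form gives the same density when $h = 1/(\sqrt{2}\,\sigma)$. The helpers `gaussian_pdf` and `simpson`, the width $\sigma = 1.7$, and the cutoff at $\pm 12\sigma$ are arbitrary choices for illustration.

```python
import math

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density exp(-x^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / math.sqrt(2 * math.pi * sigma * sigma)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule for the integral of f over [a, b]; steps must be even."""
    dx = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * dx)
    return total * dx / 3

sigma = 1.7                 # arbitrary width chosen for illustration
half_width = 12 * sigma     # tails beyond 12 sigma contribute negligibly
norm = simpson(lambda x: gaussian_pdf(x, sigma), -half_width, half_width)
variance = simpson(lambda x: x * x * gaussian_pdf(x, sigma), -half_width, half_width)
print(f"total probability = {norm:.10f}, variance = {variance:.10f}, sigma^2 = {sigma ** 2:.10f}")

# The h-form and the sigma-form agree when h = 1 / (sqrt(2) * sigma).
h = 1 / (math.sqrt(2) * sigma)
x = 0.8
h_form = h / math.sqrt(math.pi) * math.exp(-(h * x) ** 2)
print(h_form, gaussian_pdf(x, sigma))
```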

Areas Under the Gaussian Distribution Curve

A key practical question is: what fraction of measurements will fall within certain limits? To answer this, we need to find the area under portions of the Gaussian curve.

The probability that a deviation falls between $0$ and $x$ is (using $t$ as the integration variable):

$$\int_0^x \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{t^2}{2\sigma^2}}\,dt$$

This integral has no closed form in elementary functions, so it has been calculated numerically and tabulated. The table below shows these probabilities for different values of $x/\sigma$:

| $x/\sigma$ | Probability of deviation between 0 and $x$ |
|---|---|
| 0.0 | 0.0 |
| 0.5 | 0.19 |
| 1.0 | 0.34 |
| 1.5 | 0.43 |
| 2.0 | 0.48 |
| 3.0 | 0.499 |
Figure 2: The shaded area under the Gaussian distribution curve represents the probability of a deviation falling between 0 and $x$. This integral cannot be expressed in elementary functions and must be computed numerically.
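Instead of a printed table, the integral can be written with the error function as $\tfrac{1}{2}\,\mathrm{erf}\!\left(\frac{x}{\sigma\sqrt{2}}\right)$, which Python’s standard library provides as `math.erf`. A minimal sketch that reproduces the tabulated values (the helper name `prob_zero_to` is hypothetical, and $k$ stands for $x/\sigma$):

```python
import math

def prob_zero_to(k):
    """P(0 < deviation < k * sigma) for a Gaussian: (1/2) * erf(k / sqrt(2))."""
    return 0.5 * math.erf(k / math.sqrt(2))

for k in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"{k:.1f}  {prob_zero_to(k):.4f}")
```

Rounded to the table’s precision, these reproduce 0.19, 0.34, 0.43, 0.48, and 0.499.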

For the probability that a deviation falls within $\pm x$ of the mean (the symmetric interval), we double these values.

These probabilities form the foundation of statistical inference. When we make statements about the uncertainty of measurements, we often use these standard intervals, particularly the 68% confidence interval ($\pm 1\sigma$) and the 95% confidence interval ($\pm 2\sigma$, which strictly covers about 95.4%; the exact 95% interval is $\pm 1.96\sigma$).
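The symmetric-interval probabilities, and the multiplier that gives exactly 95%, can be computed directly. In this sketch, `prob_within` uses $\mathrm{erf}(k/\sqrt{2})$ for the probability of lying within $\pm k\sigma$, and `multiplier_for` (a hypothetical helper) inverts it by bisection, which works because the probability increases with $k$.

```python
import math

def prob_within(k):
    """Probability that a Gaussian deviation lies within +/- k sigma: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

def multiplier_for(p, lo=0.0, hi=10.0, tol=1e-12):
    """Find k with prob_within(k) = p by bisection on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_within(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"+/-1 sigma: {prob_within(1):.4f}")                    # about 0.6827
print(f"+/-2 sigma: {prob_within(2):.4f}")                    # about 0.9545
print(f"95% interval: +/-{multiplier_for(0.95):.3f} sigma")   # about 1.960
```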