
Understanding Random Variation

When we make measurements, we often observe that repeated measurements of the same quantity show random variations. This is a fundamental aspect of experimental science that we must understand and account for.

Consider measuring the radioactivity of a sample. Even with perfect equipment, the number of counts in a fixed time interval will vary randomly due to the inherent stochastic nature of radioactive decay. Similarly, optical measurements might show fluctuations due to air currents causing refractive index variations or thermal effects causing mechanical instabilities in the apparatus.

The Gaussian Distribution

When we make many measurements of the same quantity, the results often follow a bell-shaped curve known as the Gaussian or normal distribution. This distribution is fundamental to understanding measurement uncertainty.

The mathematical form of the Gaussian distribution is:

$$y = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where $\mu$ is the population mean and $\sigma$ is the population standard deviation.

For a Gaussian distribution, about 68% of measurements fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three. These percentages are crucial for understanding measurement uncertainty. The 68-95-99.7 rule (sometimes called the empirical rule) provides a quick way to assess the likelihood that a measurement falls within certain bounds of the true value.
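
The empirical rule can be checked numerically; this is a sketch using Python's standard library, where the sample size and seed are arbitrary choices:

```python
import random

random.seed(42)  # reproducible draws

# Draw many samples from a standard Gaussian (mu = 0, sigma = 1)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]

def fraction_within(k):
    """Fraction of samples within k standard deviations of the mean."""
    return sum(abs(x) <= k for x in samples) / len(samples)

for k, expected in [(1, 0.683), (2, 0.954), (3, 0.997)]:
    print(f"within {k} sigma: {fraction_within(k):.3f} (theory ~{expected})")
```

The simulated fractions land close to 68%, 95%, and 99.7%, as the rule predicts.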

This distribution allows us to make meaningful statements about our measurements. For example, if we measure a length multiple times and find a mean of 10.5 cm with a standard deviation of 0.1 cm, we can say:

- about 68% of measurements fall between 10.4 cm and 10.6 cm
- about 95% fall between 10.3 cm and 10.7 cm
- about 99.7% fall between 10.2 cm and 10.8 cm
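
These percentages follow directly from the Gaussian cumulative distribution, since the probability of landing within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. A minimal sketch, reusing the hypothetical 10.5 cm example:

```python
import math

mean, sigma = 10.5, 0.1  # cm, from the example above

def prob_within(k):
    """Probability a Gaussian value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Probability a single measurement falls in mean +/- k*sigma
for k in (1, 2, 3):
    lo, hi = mean - k * sigma, mean + k * sigma
    print(f"{lo:.1f}-{hi:.1f} cm: {100 * prob_within(k):.1f}%")
```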

Sample Statistics and Population Parameters

When we make measurements, we’re typically working with a sample from a larger population of possible measurements. Understanding the relationship between sample statistics and population parameters is essential.

The sample mean is calculated as:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

The sample standard deviation is:

$$s = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N-1}}$$

where $N$ is the sample size and $x_i$ are individual measurements.

Note the $(N-1)$ in the denominator, known as Bessel’s correction, which makes $s^2$ an unbiased estimate of the population variance.

The standard error of the mean, $\text{SE} = s/\sqrt{N}$, tells us how precisely we’ve determined the population mean. As our sample size increases, this uncertainty decreases as $1/\sqrt{N}$.
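
These quantities can be computed together; a minimal sketch using Python's `statistics` module, with a hypothetical set of length readings:

```python
import math
import statistics

# Hypothetical repeated length measurements (cm)
data = [10.4, 10.6, 10.5, 10.5, 10.3, 10.6, 10.5, 10.4, 10.7, 10.5]

mean = statistics.mean(data)
s = statistics.stdev(data)     # sample standard deviation (N-1 denominator)
se = s / math.sqrt(len(data))  # standard error of the mean

print(f"mean = {mean:.3f} cm, s = {s:.3f} cm, SE = {se:.3f} cm")
```

Note that `statistics.stdev` already applies Bessel's correction; `statistics.pstdev` would use the $N$ denominator instead.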

Distinction Between Standard Deviation and Standard Error

This distinction is crucial and frequently misunderstood:

- The **standard deviation** describes the spread of individual measurements; it does not shrink as we take more data.
- The **standard error** describes how precisely the sample mean estimates the population mean; it decreases as $1/\sqrt{N}$.

For example, if we measure the same quantity 25 times and get $s = 2.0$ units:

- the standard deviation of the individual measurements remains about 2.0 units
- the standard error of the mean is $2.0/\sqrt{25} = 0.4$ units

Propagation of Statistical Uncertainty

When we calculate derived quantities from multiple measurements, we need to understand how the uncertainties combine. The propagation formulas depend on whether we’re dealing with estimated uncertainties or statistical uncertainties.

General Error Propagation Rules

For a function $z = f(x, y)$ where $x$ and $y$ are independent variables with standard deviations $\sigma_x$ and $\sigma_y$:

$$\sigma_z^2 = \left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2 + \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2$$
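
One way to apply this formula without differentiating by hand is to approximate the partial derivatives numerically; a sketch, where the step size `h` is an arbitrary choice:

```python
import math

def propagate(f, x, y, sx, sy, h=1e-6):
    """Numerically propagate independent uncertainties through z = f(x, y)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)  # central-difference partials
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return math.sqrt((dfdx * sx) ** 2 + (dfdy * sy) ** 2)

# Check against a case with a known answer: z = x + y gives sz = sqrt(sx^2 + sy^2)
sz = propagate(lambda x, y: x + y, 3.0, 4.0, 0.3, 0.4)
print(sz)  # ~0.5
```
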
Example
If we measure a rectangle's length as $(25.4 \pm 0.2)$ cm and width as $(18.6 \pm 0.2)$ cm, what is the uncertainty in the area?
**Solution**:
1. Calculate the area: $A = L \times W = 25.4\ \text{cm} \times 18.6\ \text{cm} = 472.44\ \text{cm}^2$
2. Calculate the relative uncertainties:
   - $\frac{\sigma_L}{L} = \frac{0.2}{25.4} \approx 0.00787$
   - $\frac{\sigma_W}{W} = \frac{0.2}{18.6} \approx 0.01075$
3. Combine the relative uncertainties using the multiplication rule:
   $$\left(\frac{\sigma_A}{A}\right)^2 = (0.00787)^2 + (0.01075)^2 \approx 0.0000619 + 0.0001156 = 0.0001775$$
   $$\frac{\sigma_A}{A} = \sqrt{0.0001775} \approx 0.0133$$
   $$\sigma_A = 0.0133 \times 472.44\ \text{cm}^2 \approx 6.3\ \text{cm}^2$$
4. Final result: $A = (472 \pm 6)\ \text{cm}^2$
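
The worked example can be verified in a few lines (a sketch; the numbers are the ones from the example above):

```python
import math

L, sL = 25.4, 0.2  # cm
W, sW = 18.6, 0.2  # cm

A = L * W
rel = math.sqrt((sL / L) ** 2 + (sW / W) ** 2)  # multiplication rule
sA = rel * A

print(f"A = ({A:.0f} ± {sA:.0f}) cm^2")  # A = (472 ± 6) cm^2
```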

Statistical vs. Estimated Uncertainties

When combining different types of uncertainties (e.g., some estimated, some statistical), we need to ensure compatibility. If one uncertainty represents a 68% confidence interval (1 standard deviation) and another represents outer limits (~100% confidence), they cannot be directly combined using the standard propagation formulas.

Central Limit Theorem and Sampling

The Central Limit Theorem explains why the Gaussian distribution is so prevalent in measurement science. It states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the original population distribution.

This theorem justifies our use of Gaussian statistics even when individual measurements might not follow a perfect Gaussian distribution.
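
A small simulation illustrates the theorem: means of samples drawn from a uniform distribution, which is far from bell-shaped, still scatter in a nearly Gaussian way. A sketch, with arbitrary sample and trial counts:

```python
import math
import random
import statistics

random.seed(1)

# Population: uniform on [0, 1) -- decidedly non-Gaussian
# (mean 0.5, standard deviation 1/sqrt(12))
def sample_mean(n):
    return statistics.mean(random.random() for _ in range(n))

means = [sample_mean(30) for _ in range(20_000)]

# If the means are ~Gaussian, about 68% should lie within one
# standard error, (1/sqrt(12))/sqrt(30), of the population mean
se = (1 / math.sqrt(12)) / math.sqrt(30)
within = sum(abs(m - 0.5) <= se for m in means) / len(means)
print(f"fraction within 1 SE: {within:.3f}")
```

The fraction comes out close to 0.68 even though the underlying population is uniform.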

Identifying and Handling Outliers

Sometimes our measurements include values that seem unusually different from the others. These outliers require careful consideration and systematic analysis.

Chauvenet’s Criterion

Chauvenet’s criterion provides a statistical method for identifying potential outliers. The criterion states that a measurement should be rejected if the probability of obtaining a deviation as large or larger is less than $1/(2N)$, where $N$ is the total number of measurements.

Procedure for Chauvenet’s Criterion:

  1. Calculate the sample mean ($\bar{x}$) and standard deviation ($s$)
  2. For each measurement $x_i$, calculate the deviation: $d_i = |x_i - \bar{x}|$
  3. Express this as a number of standard deviations: $t_i = d_i/s$
  4. Find the probability that a measurement would deviate by $t_i$ or more standard deviations (using Gaussian tables)
  5. If this probability is less than $1/(2N)$, the measurement is a candidate for rejection

Example: For $N = 10$ measurements, reject if the probability is less than 0.05 (about $2\sigma$). For $N = 20$ measurements, reject if it is less than 0.025 (about $2.2\sigma$).
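
The five-step procedure above can be sketched in code, using $\operatorname{erfc}(t/\sqrt{2})$ for the two-sided tail probability; the data set is hypothetical:

```python
import math

def chauvenet_candidates(data):
    """Flag measurements whose deviation is improbable under Chauvenet's criterion."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    threshold = 1 / (2 * n)
    flagged = []
    for x in data:
        t = abs(x - mean) / s
        # Two-sided probability of a deviation of t sigma or more
        prob = math.erfc(t / math.sqrt(2))
        if prob < threshold:
            flagged.append(x)
    return flagged

# Nine consistent readings and one suspicious one
data = [10.1, 10.2, 10.0, 10.1, 10.3, 10.2, 10.1, 10.0, 10.2, 12.5]
print(chauvenet_candidates(data))  # [12.5]
```

Remember that a flagged value is only a *candidate* for rejection; the systematic checks below still apply.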

Systematic Approach to Outliers

When handling potential outliers, follow this systematic approach:

  1. Verify the measurement was recorded correctly
  2. Check for obvious experimental problems (equipment malfunction, environmental disturbance)
  3. Apply statistical assessment (such as Chauvenet’s criterion)
  4. Document thoroughly the reasoning for any rejected measurements
  5. Never reject data simply because it doesn’t fit expectations

The probability guidelines from the Gaussian distribution help us make these decisions: roughly 32% of measurements are expected to deviate from the mean by more than $1\sigma$, about 5% by more than $2\sigma$, and only about 0.3% by more than $3\sigma$.

Confidence Intervals and Uncertainty Statements

Understanding how to make proper uncertainty statements is crucial for communicating experimental results.

Confidence Intervals

A confidence interval provides a range of values that likely contains the true population parameter. For a 95% confidence interval of the mean:

$$\text{CI}_{95\%} = \bar{x} \pm 1.96 \times \frac{s}{\sqrt{N}}$$

This means we’re 95% confident the true population mean lies within this range.
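
A sketch of the calculation, reusing the hypothetical readings from earlier (for very small samples a Student's $t$ multiplier would be more appropriate than the large-sample value 1.96):

```python
import math
import statistics

data = [10.4, 10.6, 10.5, 10.5, 10.3, 10.6, 10.5, 10.4, 10.7, 10.5]

mean = statistics.mean(data)
se = statistics.stdev(data) / math.sqrt(len(data))

# 95% confidence interval using the large-sample z value of 1.96
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI: ({lo:.2f}, {hi:.2f}) cm")
```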

Proper Uncertainty Statements

When reporting results:

- state the best value, its uncertainty, and the units together, e.g. $(10.50 \pm 0.04)$ cm
- make clear what the uncertainty represents (one standard error, a 95% confidence interval, or outer limits)
- round the uncertainty to one or two significant figures, and round the value to the same decimal place

Sample Size Effects

The size of our sample dramatically affects the reliability of our statistical estimates.

Effect on Standard Error

The standard error of the mean decreases as $1/\sqrt{N}$: quadrupling the sample size only halves the uncertainty in the mean, so gaining precision by brute repetition becomes expensive quickly.
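
A short illustration, reusing the earlier example with $s = 2.0$ units:

```python
import math

s = 2.0  # sample standard deviation, as in the earlier example

def standard_error(n):
    return s / math.sqrt(n)

# Quadrupling the sample size halves the standard error
for n in (25, 100, 400):
    print(f"N = {n:3d}: SE = {standard_error(n):.2f}")
```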

Reliability of Standard Deviation Estimates

For small samples, our estimate of the population standard deviation is quite uncertain. The standard deviation of the standard deviation is approximately:

$$\sigma_s \approx \frac{\sigma}{\sqrt{2(n-1)}}$$

This means:

- for $n = 5$, the standard deviation estimate is itself uncertain by about 35%
- for $n = 10$, by about 24%
- only for $n \gtrsim 50$ does the relative uncertainty in $s$ drop below about 10%
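
The formula translates into concrete percentages as follows (a sketch):

```python
import math

def rel_uncertainty_of_s(n):
    """Approximate relative uncertainty of the sample standard deviation itself."""
    return 1 / math.sqrt(2 * (n - 1))

for n in (5, 10, 30, 100):
    print(f"n = {n:3d}: sigma_s / sigma ~ {100 * rel_uncertainty_of_s(n):.0f}%")
```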

Combining Different Types of Uncertainty

In practice, we often need to combine uncertainties that have different statistical meanings (e.g., some estimated, some statistical).

Making Uncertainties Compatible

If combining a statistically-based uncertainty (68% confidence) with an estimated range uncertainty (~100% confidence), we need to make them compatible: convert the outer-limit range to an equivalent standard deviation first, for example by treating it as a uniform distribution and dividing the half-range by $\sqrt{3}$, and only then combine the two in quadrature.
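
A minimal sketch of this conversion; the numerical values are hypothetical, and the $\sqrt{3}$ factor assumes the outer-limit range is modeled as a uniform (rectangular) distribution:

```python
import math

# Statistical uncertainty already expressed as one standard deviation
s_statistical = 0.10  # e.g. standard error from repeated readings

# Estimated outer-limit uncertainty: the value is believed to lie within
# +/- 0.20 with near certainty. Modeled as a uniform distribution over
# that range, the equivalent standard deviation is a / sqrt(3).
a = 0.20
s_equivalent = a / math.sqrt(3)

# Now both are 1-sigma quantities and can be combined in quadrature
s_total = math.sqrt(s_statistical**2 + s_equivalent**2)
print(f"combined uncertainty: {s_total:.3f}")
```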

Root Sum of Squares

For independent uncertainties of the same confidence level:

σtotal=σ12+σ22+σ32+\sigma_{total} = \sqrt{\sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \ldots}

This assumes the uncertainties are:

- independent of one another
- expressed at the same confidence level (e.g. all as one standard deviation)
- random rather than systematic

Distribution Shapes and Assumptions

While we often assume Gaussian distributions, real measurements may deviate from this ideal.

When Gaussian Assumptions Fail

Common situations include low-count data (which follow a Poisson distribution), quantities bounded by a physical limit (which produce asymmetric distributions), and slow drifts in the apparatus (which produce correlated rather than random variation).

Checking Gaussian Assumptions

Simple tests for Gaussian behavior:

- plot a histogram of the data and check that it is roughly symmetric and bell-shaped
- compare the mean and the median, which should nearly coincide
- check that roughly 68% of the data lie within one standard deviation of the mean and roughly 95% within two
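
These checks can be automated; a sketch that applies them to synthetic Gaussian data (the sample size and seed are arbitrary):

```python
import random
import statistics

def gaussian_checks(data):
    """Quick descriptive checks for roughly Gaussian behavior."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    s = statistics.stdev(data)
    within_1s = sum(abs(x - mean) <= s for x in data) / len(data)
    within_2s = sum(abs(x - mean) <= 2 * s for x in data) / len(data)
    return {
        "mean_vs_median": mean - median,  # near 0 for a symmetric distribution
        "within_1_sigma": within_1s,      # expect ~0.68
        "within_2_sigma": within_2s,      # expect ~0.95
    }

random.seed(0)
checks = gaussian_checks([random.gauss(10.5, 0.1) for _ in range(5000)])
print(checks)
```

Large departures from the expected values in any of the three entries suggest the data are not well described by a Gaussian.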

Practical Measurement Strategy

A well-planned measurement strategy can minimize uncertainties and improve data quality.

Before starting measurements:

  1. Estimate expected uncertainty based on instrument resolution and known fluctuations
  2. Determine required sample size based on target precision
  3. Choose measurement sequence to minimize systematic effects
  4. Plan for outlier detection and handling procedures

Glossary

Gaussian distribution
A bell-shaped probability distribution that describes many random phenomena in nature.
standard deviation
A measure of the spread of a distribution, indicating how much variation exists from the mean.
standard error
The standard deviation of the sampling distribution of a statistic, most commonly the standard error of the mean.
outlier
A data point that appears to deviate markedly from other observations in a dataset.
confidence interval
A range of values that is likely to contain the true value of a parameter with a specified level of confidence.
propagation of uncertainty
The process of determining how uncertainties in individual measurements affect the uncertainty in a calculated result.
Chauvenet’s criterion
A statistical rule for identifying potential outliers based on the probability of observing such deviations.
sample mean
The arithmetic average of a set of sample values, used to estimate the population mean.
population mean
The true average value of a quantity if we could measure it an infinite number of times.
central limit theorem
A statistical theorem stating that sample means approach a normal distribution as sample size increases, regardless of the original population distribution.

Problems