Experimental data alone tells only part of the story. The ability to properly evaluate results is what transforms raw measurements into meaningful scientific knowledge - a fundamental skill that distinguishes casual observations from rigorous scientific inquiry. This chapter focuses on why this evaluation process is crucial: it validates your measurements through uncertainty analysis, connects physical systems with theoretical models, and ultimately determines whether your findings represent substantiated scientific knowledge or merely unverified observations.
The Essential Final Analysis¶
Your primary objective in conducting an experiment is to make substantive statements about relationships between physical systems and theoretical models.
The evaluation process involves several key analytical steps. First, identify patterns and trends in your data that either align with or deviate from theoretical predictions. Next, quantify the strength of relationships between your measured variables to understand how they interact. You’ll need to determine whether any observed effects are statistically significant, that is, unlikely to be due simply to random chance. Accounting for experimental uncertainties and understanding how they affect your conclusions is crucial for reliable results. Finally, critically assess whether your results support, refute, or suggest necessary modifications to existing theoretical models. This comprehensive analysis ensures your experimental findings contribute meaningful insights to scientific understanding.
Approaching Evaluation with the Right Mindset¶
The Evaluation Process: Four Essential Stages¶
Let’s examine each stage in detail.
Stage 1: Calculating Elementary Quantities¶
Your approach depends on whether you’re working with estimated uncertainties or statistical treatment of random fluctuations.
Working with Estimated Uncertainties¶
Similarly, if you’ve counted oscillations and timed them with a stopwatch, you might express time measurements with their uncertainty ranges. However, the oscillation period (your actual variable of interest) must be calculated from these measurements. If you counted 15 oscillations that took a total time t ± δt seconds, the period for a single oscillation would be:

T = t/15 ± δt/15 seconds
Notice that both the central value and uncertainty must be calculated through this division. This significant modification of uncertainty values is necessary whenever you perform arithmetic operations on basic measurements.
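As a minimal sketch, this scaling of both the central value and the uncertainty can be written in a few lines of Python. The stopwatch readings below are hypothetical, not values from the text:

```python
def divide_with_uncertainty(value, uncertainty, divisor):
    """Dividing a measurement by an exact count scales both the
    central value and its uncertainty by the same factor."""
    return value / divisor, uncertainty / divisor

# Hypothetical example: 15 oscillations timed at 30.2 s, read to +/- 0.2 s.
period, period_unc = divide_with_uncertainty(30.2, 0.2, 15)
# period is about 2.013 s, with an uncertainty of about 0.013 s
```

The count of 15 is treated as exact, so it contributes no uncertainty of its own.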
Your final result will be a set of x and y values with their associated uncertainties, preparing you for graphical analysis.
Working with Statistical Uncertainties¶
Consider how uncertainty regions will be interpreted on your graph. If both variables have similar statistical character, each point’s uncertainty rectangle will have clear interpretation. If variables have different uncertainty types (estimated versus statistical), interpretation becomes problematic. You might need to standardize them—perhaps using twice the standard deviation of the mean (95% probability) to make them comparable to estimated uncertainties.
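A short sketch of this standardization, assuming repeated readings of a single quantity (the sample values in the comment are hypothetical):

```python
from math import sqrt

def two_sigma_of_mean(samples):
    """Return 2 * (standard deviation of the mean): an interval with
    roughly 95% probability, comparable to an estimated outer limit."""
    n = len(samples)
    mean = sum(samples) / n
    # sample variance with the usual n - 1 denominator
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return 2 * sqrt(var / n)

# e.g. two_sigma_of_mean([1.0, 2.0, 3.0, 4.0, 5.0]) for five repeated readings
```

The factor of two converts the roughly 68% interval of one standard deviation of the mean into a roughly 95% interval.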
At this stage, every experimental quantity should have a central value and uncertainty, but you’re not quite ready for graphing. If you need to plot derived variables (like T² vs. ℓ for a pendulum), you must calculate these through arithmetic operations. Remember to properly propagate uncertainties—if plotting T² values, uncertainty bars must represent the actual interval over which T² is uncertain.
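For instance, the T² error bar can be taken as the actual interval that T ± δT maps to under squaring. A minimal sketch, with made-up pendulum numbers:

```python
def square_with_uncertainty(t, dt):
    """Propagate an uncertainty interval through squaring: the error
    bar on T**2 must span the interval [(t - dt)**2, (t + dt)**2]."""
    lo, hi = (t - dt) ** 2, (t + dt) ** 2
    center = (lo + hi) / 2        # equals t**2 + dt**2
    half_width = (hi - lo) / 2    # equals exactly 2 * t * dt
    return center, half_width

# Hypothetical pendulum period: T = 2.01 +/- 0.01 s
t2, t2_unc = square_with_uncertainty(2.01, 0.01)
```

For small relative uncertainties this half-width coincides with the familiar propagation rule δ(T²) ≈ 2T·δT.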
Stage 2: Creating Effective Graphs¶
There are exceptions where preserving the origin is important—when examining behavior near zero or when illustrating variation relative to baseline values. Generally, however, maximize use of graph space.
If plotting multiple datasets on one graph, differentiate them clearly through different symbols, colors, or other distinguishing features.
Stage 3: Comparing Models with Experimental Data¶
Scenario 1: Fully Specified Model¶
How do you judge correspondence quality? This is where uncertainty intervals become crucial. Without them, the inevitable scatter in experimental points would make meaningful comparison impossible—what are the chances of a theoretical line passing exactly through multiple scattered points? When points represent possible value intervals rather than single values, logical assessment becomes possible.
Also recognize that agreement exists only at your current precision level. At higher precision, discrepancies might appear that weren’t detectable in your experiment.
Scenario 2: Partial Correspondence¶
Scenario 3: Unexpected Intercepts¶
Scenario 4: Unexpected Data Scatter¶
Don’t leave such discrepancies unaddressed. Check your apparatus to identify potential fluctuation sources—perhaps a loose electrical connection or unstirred heating bath. Resolving such issues is always satisfying. If continuing the experiment isn’t possible, work with your existing results and make the best assessment you can of correspondence between model and system, perhaps noting that observations distribute uniformly around the model line.
Scenario 5: Complete Non-correspondence¶
Such complete correspondence failure usually indicates experimental error—misinterpreting variables, incorrectly transforming equations, improper equipment setup, or mistakes in observation, calculation, or graphing. If possible, review everything from the beginning. If equipment access isn’t possible, check all analytical and arithmetic processes. If all error-finding attempts fail, report your results honestly and objectively. You may have discovered something novel, and an honest account of puzzling results from well-checked equipment will interest others in your field.
Stage 4: Determining Values from Straight-Line Analysis¶
In these cases, the model contains initially unknown quantities, so you cannot draw a complete model graph for comparison with experimental points. Your graph initially contains only the points themselves, as shown in Figure 6.2(a).
Consider measuring current through and potential difference across a resistor to test Ohm’s Law (V = IR). Without knowing resistance R, the model behavior encompasses all lines through the origin on the I-V plane described by:

V = RI

where the constant R could be any positive value. In principle, you could draw all possible lines on your graph and determine: (1) the extent to which system and model behaviors overlap, and (2) the range of R values appropriate for your system (as illustrated in Figure 4.11).
In practice, this is complicated by the fact that, based on measurements shown in Figure 6.2(a), you cannot assume system behavior passes through the origin. It’s best to defer the intercept question and simply determine which straight lines are consistent with your observations.
Finding the “Best” Line and Uncertainty Range¶
Identify several significant lines: the “best” straight line by your judgment, plus the limiting lines representing how far you can reasonably rotate your “best” line before it no longer acceptably fits the data. These extremes provide uncertainty values for your slope.
If wide point scatter makes identifying best-fit and limiting lines difficult, remember that your measured points represent samples from a continuous distribution band. The sparse population of this band (due to limited observations) can complicate line selection. Visualize the band populated by millions of potential readings your apparatus might produce, then estimate the center and edges of that distribution, allowing you to select appropriate lines.
In Figure 6.2(b), you might choose AB as your “best” line and determine that lines CD and EF would contain almost all possible points from an infinite measurement set. Lines CF and ED (not shown) would represent the steepest and shallowest slopes consistent with your observations.
Once you’ve selected appropriate lines, determine their slopes numerically to calculate your desired parameter (like resistance R in our Ohm’s Law example). For slope calculation, angle is irrelevant—you need the quantitative relationship between measured variables. For a line like AB in Figure 6.3, identify precise coordinates where it crosses graph grid intersections near its endpoints. If these coordinates are (I₁, V₁) and (I₂, V₂), calculate:

slope = (V₂ − V₁)/(I₂ − I₁)
For our example, R equals this slope directly. In more complex cases, you might need additional calculations involving other measured quantities to determine your final answer.
Perform this process three times: once for your “best” line (AB) and once each for your upper and lower limiting lines (CF and ED). This gives your best value for R plus upper and lower limits beyond which you’re “almost certain” the true value doesn’t lie. Typically, these extreme values are roughly equidistant from your central value, allowing you to express your result as:

value ± uncertainty
Sometimes your “best” line and limiting lines won’t appear equally spaced, usually because too few points prevent good line assessment. While sometimes experimenters feel compelled to express asymmetric uncertainties as:

value (+δ₁, −δ₂)
visual graph judgment rarely justifies such precision. If identifying a clear “best” line proves genuinely difficult, you can simply delineate the edges of the value band (lines ED and CF in Figure 6.3), calculate maximum and minimum slopes, and express your experimental result as the interval between these slopes, or as their average ± half their difference.
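The whole procedure can be sketched numerically. The read-off coordinates below are hypothetical, standing in for the grid intersections where the drawn lines AB, CF, and ED are read off:

```python
def slope(p1, p2):
    """Slope from two read-off coordinate pairs on a drawn line."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

# hypothetical endpoint coordinates for the three lines
m_best = slope((0.10, 0.52), (0.50, 2.48))   # "best" line AB
m_max = slope((0.10, 0.45), (0.50, 2.55))    # steepest acceptable line
m_min = slope((0.10, 0.58), (0.50, 2.42))    # shallowest acceptable line

# quote the result as the average of the extremes +/- half their difference
m_value = (m_max + m_min) / 2
m_unc = (m_max - m_min) / 2
```

With these numbers the best slope is 4.9 and the limiting slopes are 5.25 and 4.6, giving 4.925 ± 0.325 before rounding to an appropriate number of figures.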
If your desired answer isn’t directly equal to the slope, but requires calculation using additional quantities with their own uncertainties, combine the slope uncertainty with these other uncertainties using methods described in Chapter 2.
The significance of uncertainty values obtained from graphs depends on how you marked uncertainty on your original data points. If your bars represented outer limits of possible variation (either subjectively assessed or 2Sₘ for statistical fluctuations), your slope limits have similar interpretation. If points were marked with 1Sₘ limits, the limiting slopes probably represent better than 68% probability because of the conservative approach used in drawing limiting lines.
This analysis assumes that actual data scatter falls within predicted uncertainty ranges. If scatter greatly exceeds expected uncertainty (due to unforeseen fluctuation sources), you may have difficulty establishing lines that contain “almost all” possible values with confidence. In such cases and for all precision work, least squares analysis (discussed later) becomes essential.
When selecting your three lines, deliberately exclude the origin from consideration, as system behavior at the origin may be one aspect you wish to examine. If your model should pass through the origin, check whether the area between your limiting lines includes the origin. If so, your model and system show consistency at your precision level. Only if both limiting lines clearly intersect an axis on the same side of origin can you confidently identify an unexpected intercept.
If your model predicts an intercept from which you hope to determine some quantity, the intersection of your three lines with the relevant axis directly provides that intercept as: value ± uncertainty.
Handling Imperfect Model-System Correspondence¶
Obviously, restrict your slope evaluations to regions where system and model are compatible. Points systematically deviating from the straight line reflect physical circumstances not included in your model, making them inappropriate for model-based calculations. Disregard all points deviating systematically from straight-line behavior by amounts clearly exceeding estimated uncertainties and observed scatter, limiting your slope and uncertainty calculations to the linear region.
Therefore, design experiments so answers come from graph slopes, while quantities potentially subject to undetermined systematic errors appear as intercepts. This capability to provide answers free from many systematic error types represents one of graphical analysis’s principal advantages.
The Principle of Least Squares¶
The method meeting these needs is based on the statistical principle of least squares. We’ll focus primarily on its application to straight-line fitting, though it can be extended to other functions.
:::{note}
Consider a set of N (x, y) measurement pairs where uncertainty is confined to the y-dimension—we’ll assume x values are exactly known, or sufficiently more precise than y values that x-dimension uncertainty can be neglected. This assumption is reasonable for many experimental situations where one variable is significantly more precise than the other. If both variables have comparable uncertainty, more complex treatments are needed (see Wilson’s text in the Bibliography).
:::
Our mathematical procedure must answer: Which line on the x-y plane is “best,” and what does “best” mean? Least squares makes this determination based on vertical deviations of points from a candidate line. For line AB in Figure 6.4, consider vertical intervals between points and line (like P₁Q₁ and P₂Q₂). The “best” line minimizes the sum of squares of these deviations.
This criterion offers no automatic path to “truth” or “correct” answers—it’s simply one optimization criterion among many possibilities (we could minimize third powers or first powers of intervals, etc.). However, it can be proven that minimizing squared deviations produces smaller variance in resulting parameters (like slope) upon repeated sampling than any alternative criterion. This provides greater confidence in least squares results than competing approaches, explaining its near-universal adoption.
Mathematically, we define the best line as that which minimizes:

Σ(δyᵢ)²

giving parameters (slope m and intercept b) for that line.
If our line equation is y = mx + b, each deviation δyᵢ equals the difference between the measured y value and the corresponding point on the line:

δyᵢ = yᵢ − (mxᵢ + b)
The least squares criterion seeks to minimize:

M = Σ(yᵢ − mxᵢ − b)²   (sum over i = 1, …, N)
with conditions:

∂M/∂m = 0

and

∂M/∂b = 0
Solving these equations (full derivation in Appendix A2) yields formulas for the best-fit line parameters:

m = (N Σxᵢyᵢ − Σxᵢ Σyᵢ) / (N Σxᵢ² − (Σxᵢ)²)

b = (Σxᵢ² Σyᵢ − Σxᵢ Σxᵢyᵢ) / (N Σxᵢ² − (Σxᵢ)²)
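These standard closed-form expressions translate directly into code; a minimal pure-Python sketch:

```python
def least_squares_line(xs, ys):
    """Best-fit slope m and intercept b for y = m*x + b,
    minimizing the sum of squared vertical deviations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    m = (n * sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    return m, b
```

Applied to points lying exactly on a line, the fit recovers that line; with scattered data it returns the line minimizing the squared vertical deviations.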
We’ve now replaced potentially questionable visual judgment with a mathematical procedure yielding results of well-defined significance and universal acceptability. Since this method has statistical foundations, we can expect more precise uncertainty calculations. The least squares principle immediately provides standard deviations for slope and intercept, giving uncertainties with known statistical significance.
These standard deviations are calculated using the standard deviation of y-value deviations from the best line, Sy:

Sy = √[ Σ(δyᵢ)² / (N − 2) ]
Don’t worry about the N − 2 denominator rather than the familiar N or N − 1; it results from applying the standard deviation definition to line positioning on a plane. The standard deviations for slope and intercept are:

Sm = Sy √[ N / (N Σxᵢ² − (Σxᵢ)²) ]

Sb = Sy √[ Σxᵢ² / (N Σxᵢ² − (Σxᵢ)²) ]
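These quantities can be computed directly from the data; a self-contained sketch using the standard formulas:

```python
from math import sqrt

def line_fit_with_uncertainties(xs, ys):
    """Least-squares slope and intercept plus their standard
    deviations, derived from the scatter about the fitted line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    m = (n * sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    # scatter of the y values about the line, with the N - 2 denominator
    s_y = sqrt(sum((y - m * x - b) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    s_m = s_y * sqrt(n / denom)
    s_b = s_y * sqrt(sxx / denom)
    return m, b, s_m, s_b
```

Note that the slope and intercept uncertainties come entirely from the observed scatter, not from any claimed precision of the individual measurements.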
Full derivations appear in Appendix A2.
These standard deviations, combined with m and b values, determine intervals with normal statistical interpretation—one standard deviation gives 68% probability of containing the true value, two standard deviations 95%, etc. A key least squares advantage is providing statistically significant uncertainty values for slope and intercept. These values derive objectively from actual point scatter, independent of any optimistic claims about measurement precision.
Appendix A2 also describes an extension for unequally precise data points, allowing greater weight for more precise measurements. This “weighting” procedure applies whenever we combine observations of unequal precision, even for simple tasks like finding the mean of unequal-precision values. Weighted mean and weighted least-squares calculation formulas appear in Appendix A2.
Least-Squares Fitting for Nonlinear Functions¶
Frequently, however, these equations resist straightforward solution. In such cases, we abandon analytical approaches in favor of iterative computer solutions. We construct trial functions, calculate squared-difference sums, and progressively vary function parameters until finding the minimum sum. Computer-based methods for this process are described in Draper and Smith’s text (Bibliography). When possible, testing models in linear form remains simpler.
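As a toy illustration of the idea (a crude coordinate search, not the more efficient methods described in Draper and Smith), the following sketch nudges two parameters of a trial function until the squared-difference sum stops improving:

```python
def iterative_fit(f, xs, ys, a=0.0, b=0.0, step=1.0, shrink=0.5, rounds=60):
    """Vary the parameters of a trial function f(x, a, b), keeping any
    change that lowers the sum of squared differences; shrink the step
    size whenever no change helps."""
    best = sum((y - f(x, a, b)) ** 2 for x, y in zip(xs, ys))
    for _ in range(rounds):
        improved = False
        for da, db in ((step, 0), (-step, 0), (0, step), (0, -step)):
            s = sum((y - f(x, a + da, b + db)) ** 2 for x, y in zip(xs, ys))
            if s < best:
                a, b, best, improved = a + da, b + db, s, True
        if not improved:
            step *= shrink   # refine the search around the current best
    return a, b

# sanity check on hypothetical data that could also be fitted directly
a_fit, b_fit = iterative_fit(lambda x, a, b: a * x + b,
                             [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

For a model that can be linearized, the iterative answer should agree with the closed-form least-squares fit, which is a useful check on the search.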
In all cases, experimenters are responsible for choosing appropriate functions—least squares merely determines which parameter values within a chosen function class best fit the observations.
Important Cautions When Using Least Squares¶
Only after carefully considering the entire situation graphically and visually, and confirming linear fitting’s appropriateness over all or part of the observation range, are you justified in applying least squares. Ignoring this warning can cause serious experiment interpretation errors.
Finding Functions When No Model Exists¶
One option is finding functions with some correspondence to your observations. This can be valuable in complex systems where theoretical modeling seems hopeless. Even if your “model” is merely a mathematical function restating the system’s behavior, it facilitates computer processing and enables interpolation, extrapolation, and similar operations. Such empirical models help predict national economic responses to taxation changes or determine temperatures from resistance thermometer calibration curves.
With appropriate caution regarding potential limitations, here are some common function-finding approaches:
Power Law Functions¶
Such graphs can use ordinary paper (plotting actual log x and log y values) or logarithmic graph paper (with rulings proportional to logarithms, allowing direct plotting of original values).
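The slope-reading step can equally be done numerically: fit a straight line to the (log x, log y) points, and its slope is the power-law exponent. A sketch with hypothetical data following y = 3x² exactly:

```python
from math import log, exp

def fit_power_law(xs, ys):
    """Fit y = A * x**n by least squares on (log x, log y):
    the log-log slope is the exponent n, the intercept is log A."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n_pts = len(lx)
    sx, sy = sum(lx), sum(ly)
    sxx = sum(v * v for v in lx)
    sxy = sum(u * v for u, v in zip(lx, ly))
    denom = n_pts * sxx - sx * sx
    n = (n_pts * sxy - sx * sy) / denom
    A = exp((sy - n * sx) / n_pts)
    return A, n

# hypothetical observations lying exactly on y = 3 * x**2
A, n = fit_power_law([1.0, 2.0, 4.0, 8.0], [3.0, 12.0, 48.0, 192.0])
```

The same transformation underlies logarithmic graph paper; the numerical fit simply replaces the ruler.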
Exponential Functions¶
Polynomial Approximations¶
Finding appropriate coefficients for such expansions typically employs the least squares principle. As noted earlier, computational difficulty increases rapidly with the number of terms needed for satisfactory correspondence. Fuller discussion appears in Draper and Smith’s text (Bibliography).
Similar approaches apply when observation scatter isn’t severe and highest precision isn’t essential. Finite difference calculus techniques can be applied to observations, and difference tables used for interpolation, extrapolation, or polynomial fitting. Comprehensive discussion appears in texts by Whittaker and Robinson and by Hornbeck (Bibliography), with elementary treatment in Appendix A3.
Assessing Overall Experimental Precision¶
Known systematic error contributions should be excluded at this stage, as appropriate measurement corrections should already have been applied. However, suspected systematic error sources whose contributions cannot be accurately evaluated should be described with appropriate allowances in overall uncertainty ranges. Final statement format depends on circumstances:
For Results Based on Measurement Sets¶
For Results from Single Calculations¶
If graphical analysis wasn’t possible and results come algebraically from several measured quantities, use Chapter 3 methods to calculate either outer uncertainty limits or standard deviations.
For Results from Graphical Analysis¶
If you’ve drawn your line by eye, the limiting possibility lines will give slope and intercept ranges. This slope uncertainty may need combining with other quantity uncertainties before stating final answer uncertainty.
Once you’ve determined overall answer uncertainty, consider how many significant figures to retain. This was covered in Section 2.11, but bears repeating in the context of experiment evaluation.
When reporting percentage precision, significant figures are automatically implied. A measurement reported as 527.64182 ± 1% implies an absolute uncertainty of 5.2764182. However, since the precision is quoted to just one significant figure (1%, not 1.000%), the uncertainty itself warrants only one significant figure. Calling it 5 implies the units digit of the measurement is uncertain by 5, making all subsequent digits meaningless. The measurement should be quoted as 528 ± 5 or 528 ± 1%.
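This rounding rule is mechanical enough to sketch in code. The helper below is a simple illustration, assuming the uncertainty is to be quoted to one significant figure:

```python
from math import floor, log10

def quote(value, uncertainty):
    """Round the uncertainty to one significant figure, then round
    the value to the same decimal place."""
    place = floor(log10(abs(uncertainty)))   # decimal place of leading digit
    unc = round(uncertainty, -place)
    val = round(value, -place)
    digits = max(0, -place)
    return f"{val:.{digits}f} ± {unc:.{digits}f}"

# quote(527.64182, 5.2764182) produces the properly rounded "528 ± 5"
```

The same helper also keeps the value and uncertainty consistent with each other, the requirement noted at the end of this section.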
For sample means, significant figures depend on the mean’s standard deviation, which in turn depends on the standard deviation’s standard deviation.
Finally, always ensure answer and uncertainty expressions are consistent—neither “16.2485 ± 0.5” nor “4.3 ± 0.0002” represents good practice.
Understanding Correlation¶
Many scientific fields deal with subtle phenomena where effects can be partially or completely masked by statistical fluctuations or other perturbations. In these scenarios, detailed model-system comparisons may be impossible—you might struggle even to establish whether the effect you’re studying exists at all. This scenario commonly occurs in biological, medical, and environmental studies.
Consider familiar public health debates about smoking’s role in lung cancer, low-level radiation’s relationship to leukemia, or dietary influences on cardiovascular disease. In these contexts, “proof” frequently enters discussion: “We haven’t proved smoking causes lung cancer” or “Can we prove heart attacks are less frequent with margarine versus butter consumption?”
These scenarios operate in fundamentally different domains from our earlier experimental approaches. Understanding what we mean by terms like “proof” and “cause” becomes critical.
Philosophers have warned for centuries that simultaneous events aren’t necessarily causally related. However, accumulated experience with this experiment, involving multiple repetitions and careful control of other variables, gradually convinces us potential difference and current are genuinely related. Only philosophical purists would dispute that potential difference causes current flow.
The situation differs dramatically in less clear-cut cases. Another experiment might yield results like Figure 6.6(b), typical when studying, for instance, university student cold incidence versus daily vitamin C consumption. Can we conclude cold frequency depends on vitamin C dosage? We might conduct a well-designed experiment with 100 students receiving vitamin
Glossary¶
- least squares method
- A mathematical procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the residuals.
- residual
- The difference between an observed value and the predicted value from a model.
- graphical analysis
- The use of graphs to identify patterns, trends, and relationships in data.
- linearization
- The process of transforming a non-linear relationship into a linear form for easier analysis.
- correlation coefficient
- A numerical measure of the strength and direction of the linear relationship between two variables.
- weighted least squares
- A variant of the least squares method that gives different weights to different data points based on their precision.
- goodness of fit
- A measure of how well a mathematical model fits a set of observations.
- degrees of freedom
- The number of values in the final calculation that are free to vary.