  • Statistics and Uncertainty
When we measure the same quantity repeatedly, we can use statistics to tell us how much uncertainty we have in the quantity.  Statistics also gives us a natural choice for our best estimate of the quantity.

  • The best estimate of the quantity is the average or mean value.  The average of a series of measurements is the sum of the values, x, divided by the number of measurements taken, N.
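Written as a formula, the definition above looks like this. The symbols x_i (the i-th measurement) and \bar{x} (the average) are notation assumed here for illustration; the lab book may use different letters.

```latex
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i = \frac{x_1 + x_2 + \cdots + x_N}{N}
```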

The uncertainty in the value should have something to do with how far the individual data points deviate from this average value.

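First, let me define the word deviation.  The deviation of an individual measurement is the difference between that measurement and the average of all such measurements:

$$ d_i = x_i - \bar{x} $$

Here x_i is an individual measurement, \bar{x} is the average, and d_i is the deviation of that measurement.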

The common sense idea of error suggests that the amount of deviation from the average should be the reported error.  Since each value has a different deviation, perhaps we should report the average of the deviations.

Example: Measuring a pen

Length (cm)      Deviation from average (cm)
18.1             -0.1
18.2              0.0
18.1             -0.1
18.4             +0.2
18.0             -0.2
average: 18.2    average deviation: 0.0 (!!)

The average deviation turns out to be zero!  For random errors, the average deviation approaches zero as more measurements are done. This is a re-statement of what I said earlier -- that we are just as likely to overestimate a value as underestimate it.
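The cancellation can be checked numerically. The following is a minimal Python sketch using the pen lengths from the table; the function name average_deviation is made up for this illustration. It computes the average deviation twice: once from the rounded average (18.2, as in the table) and once from the unrounded average.

```python
# Pen lengths in cm, copied from the table above.
lengths = [18.1, 18.2, 18.1, 18.4, 18.0]

exact_mean = sum(lengths) / len(lengths)  # unrounded average
rounded_mean = round(exact_mean, 1)       # average rounded to 0.1 cm, as in the table


def average_deviation(values, center):
    """Average of the signed deviations (value - center). Hypothetical helper."""
    return sum(v - center for v in values) / len(values)


avg_dev_rounded = average_deviation(lengths, rounded_mean)
avg_dev_exact = average_deviation(lengths, exact_mean)

print(f"average = {exact_mean:.2f} cm (rounded: {rounded_mean:.1f} cm)")
print(f"average deviation from rounded average: {avg_dev_rounded:.3f} cm")
print(f"average deviation from exact average:   {avg_dev_exact:.3f} cm")
```

Measured from the unrounded average, the deviations cancel exactly (up to floating-point noise), because the sum of (x_i - average) is the sum of the x_i minus N times the average. The small leftover in the table, about -0.04 cm, comes only from rounding the average to 18.2.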

So what should we do?  The problem we had when we found the average deviation was that the positive values canceled out the negative values.  If we could make all the deviations positive, then we wouldn't get a zero average.  There is an easy way to do this: square each deviation!  Then we can average the squared deviations and take the square root of that average to get the uncertainty.
 
Length (cm)      Deviation from average (cm)    Square deviation (cm^2)
18.1             -0.1                           0.01
18.2              0.0                           0.00
18.1             -0.1                           0.01
18.4             +0.2                           0.04
18.0             -0.2                           0.04
average: 18.2    average deviation: 0.0         average square deviation: 0.02

root of average square deviation: 0.1 cm
(This is the standard deviation!)
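The arithmetic in the table can be reproduced with a few lines of Python. This is only a sketch: it uses the rounded average 18.2 to match the table, then compares against statistics.pstdev from the standard library, which divides by N but works from the unrounded average.

```python
import math
import statistics

# Pen lengths in cm, copied from the table above.
lengths = [18.1, 18.2, 18.1, 18.4, 18.0]
mean_used_in_table = 18.2  # the rounded average used in the table

# Square each deviation so that positive and negative values cannot cancel.
squared_devs = [(x - mean_used_in_table) ** 2 for x in lengths]
avg_squared_dev = sum(squared_devs) / len(lengths)  # average square deviation
rms_dev = math.sqrt(avg_squared_dev)                # root of the average square deviation

# Same "divide by N" recipe, but computed about the unrounded average.
pstdev_exact = statistics.pstdev(lengths)

print(f"average square deviation: {avg_squared_dev:.2f} cm^2")
print(f"root of average square deviation: {rms_dev:.2f} cm")
print(f"statistics.pstdev (unrounded average): {pstdev_exact:.2f} cm")
```

Both results are about 0.14 cm, which is 0.1 cm when quoted to one significant figure, as in the table.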


The "root of the average square deviation" is a bit of a mouthful, so we have another name for this quantity: the standard deviation.
  • The standard deviation is the uncertainty in the value.
Unfortunately, this is not the definition given in the lab book.  In the lab book, the standard deviation is defined to be the square root of the sum of the squared deviations divided by one less than the number of values (N - 1 instead of N).  Both are valid definitions and both are used.  To avoid confusion, we will always use the lab book (N - 1) definition in this lab.

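The reason for the N - 1 is simple.  If you have only one measurement (N = 1), what should the statistical uncertainty be?  If you used the first definition (dividing by N), you would get zero.  But that is a silly statement.  The uncertainty in a single measurement is not zero.  Dividing by N - 1 gives 0/0 instead, which is undefined: a single measurement tells you nothing about the spread.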
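The difference between the two definitions can be seen with Python's standard statistics module, where pstdev divides by N and stdev divides by N - 1. This is an illustrative sketch using the pen lengths from the example, not the lab's prescribed procedure.

```python
import statistics

# Pen lengths in cm from the example above.
lengths = [18.1, 18.2, 18.1, 18.4, 18.0]

sd_n = statistics.pstdev(lengths)         # divides by N
sd_n_minus_1 = statistics.stdev(lengths)  # divides by N - 1

print(f"divide by N:     {sd_n:.3f} cm")
print(f"divide by N - 1: {sd_n_minus_1:.3f} cm")

# With a single measurement the two definitions behave very differently.
single = [18.1]
sd_single_n = statistics.pstdev(single)   # zero spread, the "silly" answer

try:
    statistics.stdev(single)              # needs at least two data points
    single_n_minus_1_defined = True
except statistics.StatisticsError:
    single_n_minus_1_defined = False

print(f"one measurement, divide by N: {sd_single_n} cm")
print(f"one measurement, divide by N - 1 is defined: {single_n_minus_1_defined}")
```

For five measurements the N - 1 version is somewhat larger (about 0.15 cm against 0.14 cm); the gap shrinks as N grows.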

The standard deviation, then, is the uncertainty you would report on all of your data values. You should write in your report:

Length (cm)
18.1 +/- 0.1
18.2 +/- 0.1
18.1 +/- 0.1
18.4 +/- 0.1
18.0 +/- 0.1
average:18.2 +/- ???

But what about the uncertainty in the average? We'd like to report that too.

If you think about it a little, you can see that the uncertainty in our average value should be related to the number of data points we have.  If we only measured the value two times, we expect that our average value would not be as good as if we measured the value two hundred times.
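A small simulation makes this plausible. The sketch below invents a "true" pen length and a measurement scatter (both values are hypothetical, chosen only for illustration), draws random measurements from a Gaussian, and compares how much the average wobbles when it is built from 2 measurements versus 200.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

TRUE_LENGTH = 18.2  # hypothetical true length in cm (assumption)
SIGMA = 0.15        # hypothetical scatter of one measurement in cm (assumption)


def spread_of_averages(n_measurements, n_repeats=1000):
    """Repeat the whole experiment many times; return the scatter of the averages."""
    averages = []
    for _ in range(n_repeats):
        sample = [rng.gauss(TRUE_LENGTH, SIGMA) for _ in range(n_measurements)]
        averages.append(sum(sample) / n_measurements)
    return statistics.stdev(averages)


spread_2 = spread_of_averages(2)
spread_200 = spread_of_averages(200)

print(f"scatter of the average, N = 2:   {spread_2:.3f} cm")
print(f"scatter of the average, N = 200: {spread_200:.3f} cm")
print(f"ratio: {spread_2 / spread_200:.1f}")
```

Under these assumptions the average of 200 measurements scatters roughly ten times less than the average of 2, which is the square root of 200/2.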

To solve this problem we find the standard deviation of the mean (SDM).  The standard deviation of the mean is the standard deviation of the values divided by the square root of the number of values.  (Why?  See Chapter 4 in the 151/170 lab manual, our third experiment.)
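One common way to see where the square root comes from is sketched below. It assumes the N measurements are independent and that each has the same standard deviation \sigma; this is a standard textbook argument and not necessarily the one given in the lab manual. The symbols \bar{x} (the average) and \sigma_{\bar{x}} (the SDM) are notation chosen here.

```latex
\begin{aligned}
\operatorname{Var}(\bar{x})
  &= \operatorname{Var}\!\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)
   = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(x_i)
   = \frac{N\sigma^2}{N^2}
   = \frac{\sigma^2}{N} \\
\sigma_{\bar{x}}
  &= \sqrt{\operatorname{Var}(\bar{x})}
   = \frac{\sigma}{\sqrt{N}}
\end{aligned}
```

The variances of independent measurements add, while the factor 1/N in the average contributes 1/N^2, so the variance of the average shrinks like 1/N and its square root like 1/\sqrt{N}.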

If we do this, we have found the uncertainty in the average.  This is our reported value:

18.16 +/- (0.1/sqrt(5)) cm = 18.16 +/-  0.04 cm
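As a check on this arithmetic, here is a short Python sketch. It first repeats the calculation exactly as written above (standard deviation rounded to 0.1 cm), and then, for comparison, carries the unrounded N - 1 standard deviation through to the end.

```python
import math
import statistics

# Pen lengths in cm from the example above.
lengths = [18.1, 18.2, 18.1, 18.4, 18.0]
n = len(lengths)
mean = statistics.mean(lengths)

# As in the text: standard deviation rounded to 0.1 cm before dividing.
sdm_as_in_text = 0.1 / math.sqrt(n)

# For comparison: unrounded N - 1 standard deviation carried through.
sd = statistics.stdev(lengths)
sdm_unrounded = sd / math.sqrt(n)

print(f"as in the text:      {mean:.2f} +/- {sdm_as_in_text:.2f} cm")
print(f"unrounded, N - 1 sd: {mean:.2f} +/- {sdm_unrounded:.2f} cm")
```

The first line reproduces 18.16 +/- 0.04 cm. Carrying the unrounded N - 1 standard deviation (about 0.15 cm) through gives roughly 0.07 cm instead, so with only five measurements the quoted uncertainty is sensitive to when you round.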



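The best estimate is the average or mean value:

$$ \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i $$

The definition of the standard deviation:

$$ \sigma_N = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2} $$

An alternate definition of the standard deviation, and the one we use in this lab.  This is the uncertainty in a single data value:

$$ \sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2} $$

The standard deviation of the mean is the uncertainty in your reported (average) value:

$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} $$

You should report values as follows.

Report single data values as:

$$ x_i \pm \sigma $$

Report average values as:

$$ \bar{x} \pm \sigma_{\bar{x}} $$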