Kornilios Ampatzis/ April 17, 2019/ Tutorials/ 0 comments

One of the most useful measures in statistics is that of variance. It is a measure that is used to quantify the amount of variation or dispersion of a set of data values. I’ll try to explain what this means with an example, but there are plenty of other online sources if you want to explore the math behind it.

Practice File:

Transcript

Let’s assume the following population of numbers. These numbers are the exam grades for a student.

The average of this grades is 6.5. And the variance is 2.3. Variance and the mean average are not expressed in the same unit of measurement. To have a comparable measure in the same unit we have to calculate the square root of the variance. This provides us with the standard deviation, which we discuss in another of our lessons.

When we have only a sample of the data from a population, then the variance we calculate is called sample variance and when the data from the whole population is available then it is called population variance.

Excel has 6 functions to calculate variance. 3 of them calculate the sample variance and 3 the population variance.

Although VAR and VARP functions can still be used in office 2016 for compatibility, they have been replaced by VAR.S and VAR.P respectively, and should be avoided, since they might not be supported in future versions of Excel.

VAR.S calculates the sample variance and VAR.P the population variance.

VARA and VARPA also calculate the sample and population variance respectively. Their difference to the previous functions is to the way they treat logical values and text when they are part of our population.

VAR.S and VAR.P ignore text and logical values, while VARA and VARPA count the text as zero and the logical values as 1 and zero for TRUE and False respectively.

Let’s see an example.

This example has mostly the same data with some minor differences. We have already calculated the mean average as 6.4.

Since this table consists of all of the grades for a certain student we assume this table is our whole population.

So in cell B10 we will calculate the Variance using the VAR.P and in cell B11 its alternative VARPA.

In B10 we type the name of the function and select the data set. The values of SICK and FALSE are ignored during the calculation.

In B11 as before we type the function and the cell range. The value FALSE is calculated now as 0 and the same goes with value SICK. This is the reason for the different result. Otherwise both the functions should produce the same result.

Let’s assume, for the sake of this example, that we only had the exam results for the months from August to December. That means we would have only a sample of the whole population.

So in cells B12 and B13 we will calculate the sample variance using the data for these months.

In B12 we type the name of the function same as before and the cell range, I3 to M6.

We do the same in B13 but using the VARA function instead.

Notice the difference between the 2 sets of functions. The ones that calculate the sample and the ones that calculate the population variance.

If this was a real life statistical problem, we should choose to use the functions that consider the absences of the student as zero-graded instead of ignoring them. So we should go with VARA and VARPA functions