It is important to note that this discussion of variance is written by a statistician, rather than a meteorologist.
What is variance? Variance, as discussed here, is the difference between a single datum and an average.
For the purpose of this discussion, we will consider a very simple data set comprising the values 9, 4, 1, 5, 7, 6, 3, 8, 2, which we will assume to be ratings of the weather for a recent period of time. The sum of the data is 45. There are data for 9 days. The average (arithmetic mean) of the data is 45/9 = 5. The middle day of the period was our birthday - how did it compare?
The middle day scored 7, which, compared with the average, shows a variance of +2. So, our birthday was an above-average day in terms of good weather.
But what happens if we take our birthday out of the data set and compare it again? If we look at the remaining eight days, the sum of the data is 38 and the average is now 38/8 = 4.75.
Our birthday scored 7 and it now has a variance of +2.25 ... even more above average than before. So, by comparing our birthday with a period of time which excludes the day in question, we have enhanced the variance.
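The arithmetic above can be checked with a short Python sketch. The function names are illustrative only, and the ratings list is taken here to be nine values that sum to 45, as the text states; treat it as a check of the reasoning rather than anything more.

```python
# Assumed ratings: nine values summing to 45, with the birthday in the middle.
ratings = [9, 4, 1, 5, 7, 6, 3, 8, 2]
birthday_index = len(ratings) // 2  # index 4, the middle day


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def variance_including(data, i):
    """Difference between data[i] and the average of ALL the data."""
    return data[i] - average(data)


def variance_excluding(data, i):
    """Difference between data[i] and the average of the data WITHOUT item i."""
    others = data[:i] + data[i + 1:]
    return data[i] - average(others)


inc = variance_including(ratings, birthday_index)
exc = variance_excluding(ratings, birthday_index)
print(f"including the birthday: {inc:+.2f}")
print(f"excluding the birthday: {exc:+.2f}")
```

The two figures correspond to the +2 and +2.25 worked out by hand. More generally, for n data, leaving the day in question out of the average multiplies its difference from the average by n/(n-1), here 9/8, so the effect shrinks as the period gets longer but never quite vanishes.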
Why might we want to make such a comparison? We might want to compare current weather, for example, with the average weather for a decade representing the central period of our school days. By doing so, we could test our belief that it was much sunnier then, or that the snow was deeper, or both. This would be a perfectly valid, and interesting, comparison to make. But for comparing current weather with the average for normal purposes, we would really want to compare with a period which includes the present data, as in the simple example above.
It is common to compare current weather with a thirty-year average as this is considered a long enough period to smooth out the variations which inevitably occur year on year. But why do we choose an apparently arbitrary thirty-year average from the past to compare with? This almost inevitably has the effect of enhancing the variance. Why do we want to do that?
At the time of writing, the most recent complete month is May 2007. If I want to compare that with the thirty-year average, that average should be constructed from the May data for the thirty years up to and including May 2007, not from a period selected for whatever purpose starting in the sixties or seventies of the last century. This would obviate the need to explain why we choose any particular period of thirty years over any other - we choose the last thirty. It also removes the possibility of exaggerating the variance, be that by accident or design.
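As a rough sketch of the two choices of baseline, the Python below compares one year against the thirty years up to and including it, and against a fixed earlier thirty-year period. Everything in it is hypothetical: the function names, the 1961-1990 baseline, and above all the May values, which are made-up numbers that simply rise steadily so that the two baselines differ. It is not real weather data.

```python
def window_average(by_year, first_year, last_year):
    """Average of the values for first_year..last_year inclusive."""
    values = [by_year[y] for y in range(first_year, last_year + 1)]
    return sum(values) / len(values)


def variance_from_trailing(by_year, year, window=30):
    """Compare `year` with the `window` years up to and INCLUDING `year`."""
    return by_year[year] - window_average(by_year, year - window + 1, year)


def variance_from_fixed(by_year, year, first_year, last_year):
    """Compare `year` with a fixed baseline period chosen in advance."""
    return by_year[year] - window_average(by_year, first_year, last_year)


# Made-up May values, NOT measurements: a steady rise of 0.5 per year from 1961.
may_values = {year: 10.0 + 0.5 * (year - 1961) for year in range(1961, 2008)}

trailing = variance_from_trailing(may_values, 2007)        # baseline 1978-2007
fixed = variance_from_fixed(may_values, 2007, 1961, 1990)  # baseline 1961-1990
print(f"against the last thirty years: {trailing:+.2f}")
print(f"against 1961-1990:             {fixed:+.2f}")
```

With these invented numbers the fixed, older baseline gives the larger figure, which is the enhancement described above. With values that do not drift over time, the two baselines would differ far less.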