Sample variance

Author: Elliot Malkin

Topic: Probability, Random Variables, Variance

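Set up

We have a random variable $X$ with unknown mean $\mu$ and variance $\sigma^2$. We have taken a sample from the distribution: $x_1, x_2, \ldots, x_n$. The sample has mean $\bar{x}$. We want to estimate $\sigma^2$ using the sample.

Question

Why do we use $\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ as an estimate for $\sigma^2$, instead of $\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$? i.e. why divide by $n-1$ instead of by $n$? Dividing by $n$ is what we usually do when calculating variance...

A suggestion...

Let $S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$ and let $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$. Instead of thinking that $s^2 = \frac{n}{n-1}S^2$, think about it as $s^2 = S^2 + \frac{s^2}{n}$ (an equivalent statement). Why? Play with the app below to get a feel for what it is doing. Click "sample" to take different samples. Then read on...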
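A small simulation gives a numerical feel for the question. The sketch below is an illustration only and is not taken from the app: it assumes a normal distribution with mean 0 and standard deviation 2 (so $\sigma^2 = 4$), samples of size $n = 5$, and a hypothetical helper called `mean_of_estimates`. It averages the divide-by-$n$ and divide-by-$(n-1)$ estimates over many samples.

```python
import random


def mean_of_estimates(n, trials, mu=0.0, sigma=2.0, seed=0):
    """Average both candidate variance estimates over many random samples.

    The normal distribution and the default parameters are assumptions
    chosen for illustration.
    """
    rng = random.Random(seed)
    total_div_n = 0.0
    total_div_n_minus_1 = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        # Sum of squared deviations of the x_i from the sample mean.
        ss = sum((x - x_bar) ** 2 for x in xs)
        total_div_n += ss / n
        total_div_n_minus_1 += ss / (n - 1)
    return total_div_n / trials, total_div_n_minus_1 / trials


biased, unbiased = mean_of_estimates(n=5, trials=100_000)
print("divide by n:    ", biased)
print("divide by n - 1:", unbiased)
print("true variance:  ", 2.0 ** 2)
```

Under these assumptions the divide-by-$n$ average should settle near $\frac{n-1}{n}\sigma^2 = 3.2$, while the divide-by-$(n-1)$ average should settle near 4.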

We want an estimate for the population variance: $\sigma^2$. This relates to the green lines: how the $x_i$ vary around $\mu$. Let $s^2$ be our estimate for $\sigma^2$.

If we knew $\mu$, we could use $\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2$ (the mean of the squares of the green lines). But we don't know $\mu$.

We have $S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$ (the mean of the squares of the blue lines). This is something we can calculate: we have values for $\bar{x}$ and for each $x_i$. This is the variance of the sample itself and does give a measure of how the $x_i$ vary around $\bar{x}$.

We also know that if the mean of our sample is treated as a random variable, $\bar{X}$, then its variance is given by the expression: $\mathrm{Var}(\bar{X}) = \ldots = \frac{\sigma^2}{n}$ (the "..." is left as an exercise). So if we're estimating $\sigma^2$ by $s^2$, then an estimate for $\mathrm{Var}(\bar{X})$ is $\frac{s^2}{n}$. This relates to the purple line: how $\bar{x}$ varies around $\mu$.

Putting it together...

The green arrows are equivalent to following the purple arrow and then the blue arrows: how the $x_i$ vary around $\mu$ depends on how $\bar{x}$ varies around $\mu$, and then how the $x_i$ vary around $\bar{x}$. So roughly, $s^2 = \frac{s^2}{n} + S^2$. (... the maths checks out on this; again, an exercise, with a starting point given below*.) Solving this equation for $s^2$ gives: $s^2 = \frac{n}{n-1}S^2$, i.e.

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$
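The "purple then blue" picture can be checked numerically. The sketch below assumes a normal distribution with $\mu = 10$ and $\sigma = 3$ and samples of size $n = 6$; the helper name `decompose` and these parameter values are illustrative choices, not part of the original app. For a single sample it compares the mean square of the green lines, $\frac{1}{n}\sum(x_i-\mu)^2$, with the purple term $(\bar{x}-\mu)^2$ plus the mean square of the blue lines, $\frac{1}{n}\sum(x_i-\bar{x})^2$. It then averages the purple term over many samples and compares it with $\frac{\sigma^2}{n}$.

```python
import random


def decompose(xs, mu):
    """Return the (green, purple, blue) mean-square quantities for one sample."""
    n = len(xs)
    x_bar = sum(xs) / n
    green = sum((x - mu) ** 2 for x in xs) / n     # x_i around mu
    purple = (x_bar - mu) ** 2                     # x_bar around mu
    blue = sum((x - x_bar) ** 2 for x in xs) / n   # x_i around x_bar
    return green, purple, blue


# Assumed illustrative parameters.
mu, sigma, n = 10.0, 3.0, 6
rng = random.Random(1)

# One sample: green equals purple + blue, up to floating-point rounding.
xs = [rng.gauss(mu, sigma) for _ in range(n)]
green, purple, blue = decompose(xs, mu)
print(green, purple + blue)

# Many samples: the average purple term approaches sigma^2 / n.
trials = 50_000
avg_purple = sum(
    decompose([rng.gauss(mu, sigma) for _ in range(n)], mu)[1]
    for _ in range(trials)
) / trials
print(avg_purple, sigma ** 2 / n)
```

Within one sample the decomposition is exact, but the purple term cannot be calculated because $\mu$ is unknown. The argument replaces it with its average value $\frac{\sigma^2}{n}$, estimated by $\frac{s^2}{n}$, which is why the relationship holds only roughly for any single sample.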

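*The detail

To consider algebraically why $s^2$ should satisfy $s^2 = \frac{s^2}{n} + \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2$, start here: $E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2\right]$, aiming at: $\sigma^2 - \frac{\sigma^2}{n}$. Recall that, as the $X_i$ are sampled independently, $E[X_iX_j] = E[X_i]E[X_j] = \mu^2$ for $i \neq j$.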
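For a small discrete distribution, the target of this exercise can be confirmed exactly by listing every possible sample. The sketch below assumes a fair six-sided die and samples of size $n = 3$, both chosen purely for illustration, and uses exact fractions so that no rounding is involved. It checks that $E[X_1X_2] = \mu^2$ for two independent draws, and that the expected value of $\frac{1}{n}\sum(X_i-\bar{X})^2$ equals $\sigma^2 - \frac{\sigma^2}{n}$.

```python
from fractions import Fraction
from itertools import product

# Assumed illustrative distribution: one fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
n = 3  # sample size (an arbitrary small choice)

mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

# Every ordered sample of size n is equally likely under independent draws.
samples = list(product(values, repeat=n))

# Expected product of two different draws.
e_cross = Fraction(sum(s[0] * s[1] for s in samples), len(samples))


def mean_sq_dev(sample):
    """Mean squared deviation of a sample from its own mean (divide by n)."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample) / len(sample)


# Expected value of the divide-by-n quantity over all samples.
e_S2 = sum(mean_sq_dev(s) for s in samples) / len(samples)

print(e_cross == mu ** 2)            # independence: E[X_1 X_2] = mu^2
print(e_S2 == sigma2 - sigma2 / n)   # E[S^2] = sigma^2 - sigma^2 / n
```

Both comparisons print `True` for this die example. This confirms the result in one case only and does not replace the algebra.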
