Quandaries and Queries
 


Hi,
I'm a teacher trying to find a way to make some stats work from first principles. The topic is expectation algebra and it is for the top age level in high school.

Using a set of data I can show that the mean is the same as the expected value, i.e. for 2, 2, 3, 4, 4 the mean is 15/5 = 3, and also 2×0.4 + 3×0.2 + 4×0.4 = 3. I can also show this works for the variance in the same style, using Sum(x - mean)²/n and the Var(X) version of squaring each x, multiplying by its probability and subtracting the square of the mean (all this for random independent samples).
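As a quick check of the two calculations described above, here is a short Python sketch (written for this answer, not part of the original exchange). It uses exact fractions so rounding never obscures the comparison, builds P(X = x) from the frequencies in the list, and assumes the "formula way" for the variance means E(X²) - [E(X)]².

```python
from fractions import Fraction
from collections import Counter

data = [2, 2, 3, 4, 4]
n = len(data)

# "Long way": mean and variance straight from the list, dividing by n.
mean = Fraction(sum(data), n)
var_long = sum((x - mean) ** 2 for x in data) / n

# "Formula way": build P(X = x) from the frequencies, then use
# E(X) = sum of x P(X = x) and V(X) = E(X^2) - E(X)^2.
prob = {x: Fraction(c, n) for x, c in Counter(data).items()}
e_x = sum(x * p for x, p in prob.items())
e_x2 = sum(x * x * p for x, p in prob.items())
var_formula = e_x2 - e_x ** 2

print(mean, e_x)              # 3 3
print(var_long, var_formula)  # 4/5 4/5
```

Both routes give a mean of 3 and a variance of 4/5 (that is, 0.8).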

Now comes the tricky bit. When I try to show E(X + Y) = E(X) + E(Y) by setting up two data sets, I can only get it to work if I add each item from X to each item from Y. However, I can't get V(X + Y) = V(X) + V(Y) to work from two sets of data.

There must be something missing in my knowledge of how the sets are required to add together or my knowledge of expectation algebra. I am familiar with proofs but still want to show my students that the basic formulae work from groups of data whether done the long way or by use of formulae. Can you help please?

Thanks,
Reuben



Hi Reuben,

When you wrote the mean of 2, 2, 3, 4, 4 as 2×0.4 + 3×0.2 + 4×0.4 you were using the expression

E(Y) = Sum(y × P(Y = y)), where P(Y = y) is the probability that the random variable Y takes on the value y, and the sum extends over all possible values y. In a similar fashion, E(X + Y) = Sum(z × P(X + Y = z)), where the sum extends over all possible values z. To find P(X + Y = z) you have to look at all possible combinations of values of X and Y that add to z. This is why you needed to "add each item from X to each item from Y" when verifying that E(X + Y) = E(X) + E(Y).
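To see why adding each item from X to each item from Y is the right construction, the Python sketch below (written for this answer) forms every pair from two data sets and compares both the means and the variances. The second data set, 1 and 5, is made up for illustration. Because every X value meets every Y value exactly once, the pairs behave like independent X and Y, so the variances add as well as the means.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    return Fraction(sum(values), len(values))

def var(values):
    # Population variance: divide by n, as in Sum(x - mean)^2 / n.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

x_data = [2, 2, 3, 4, 4]
y_data = [1, 5]  # hypothetical second data set, chosen for illustration

# Add each item from X to each item from Y: every pair appears once,
# which is exactly the situation where X and Y are independent.
sums = [x + y for x, y in product(x_data, y_data)]

print(mean(sums), mean(x_data) + mean(y_data))  # 6 6
print(var(sums), var(x_data) + var(y_data))     # 24/5 24/5
```

The exact match relies on dividing by n. With the n - 1 divisor used for sample variance, the two sides no longer agree exactly (for these data, 16/3 on the left against 9 on the right), which is one possible source of a mismatch when checking V(X + Y) = V(X) + V(Y) from data. Pairing the two lists item by item, rather than forming every pair, is another.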

The case of variance is more complicated; in fact, the equation V(X + Y) = V(X) + V(Y) is not always valid. If the random variables X and Y are independent then V(X + Y) = V(X) + V(Y), but in general it is not true. Here is a situation that may help.

Suppose that you give your students a test that is divided into two parts. Each part is graded out of 50% and the two parts are added to give a final score. Let X be the score on the first half and Y the score on the second half.

The first half of the test is very easy and almost all of the students score between 45% and 50%. The second half is very hard and most of the students score between 0% and 5%. In this situation V(X) and V(Y) are small, so V(X) + V(Y) will be small. There are, however, a few students who do poorly on the first half and a few who do well on the second half. If the students who do well on the second half are also strong on the first half, and those who do poorly on the first half also do poorly on the second, then the two scores move together and V(X + Y) can be much larger than V(X) + V(Y). V(X) measures the variation in the first half and V(Y) measures the variation in the second half. What is missing is a measure of how the scores on the two halves vary together. There is such a measure: it is called the covariance of X and Y and is written Cov(X, Y). The correct statement about V(X + Y) is

V(X + Y) = V(X) + V(Y) + 2Cov(X, Y)
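The covariance formula can be checked from data in the same way. The Python sketch below uses made-up scores for five students on the test described above (not real data): the strongest student does well on both halves and the weakest does poorly on both, so the two halves move together. Each first-half score is added only to the same student's own second-half score, and every average divides by n.

```python
from fractions import Fraction

def mean(values):
    return Fraction(sum(values), len(values))

def cov(xs, ys):
    # Cov(X, Y): average of (x - mean of X)(y - mean of Y) over matched pairs.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(values):
    return cov(values, values)  # V(X) = Cov(X, X)

# Hypothetical scores for five students: first half, second half.
x = [46, 48, 50, 50, 36]  # easy first half
y = [2, 4, 4, 20, 0]      # hard second half
totals = [a + b for a, b in zip(x, y)]  # each student's own total

print(var(totals))                      # 120
print(var(x) + var(y))                  # 392/5
print(var(x) + var(y) + 2 * cov(x, y))  # 120
```

Here V(X) + V(Y) = 392/5 = 78.4 falls well short of V(X + Y) = 120; the term 2Cov(X, Y) = 208/5 makes up the difference.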

Cheers,
Penny

