Quandaries and Queries
By using a set of data I can show how the mean is the same as the expected value, i.e. the mean of 2,2,3,4,4 is 15 / 5 = 3 and also 2 × 0.4 + 3 × 0.2 + 4 × 0.4 = 3. I can also show this to work for the variance in the same style, using Sum(x - mean)^2 / n and the Var(X) version of squaring x then multiplying by the probability (all this for random independent samples).
Now comes the tricky bit: when I try to show E(X+Y) = E(X) + E(Y) by setting up two data sets, I get it to work only if I add each item from X to each item from Y. However, I can't get it to work for V(X+Y) = V(X) + V(Y) from two sets of data.
There must be something missing in my knowledge of how the sets are required to add together or my knowledge of expectation algebra. I am familiar with proofs but still want to show my students that the basic formulae work from groups of data whether done the long way or by use of formulae. Can you help please?
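When you wrote the mean of 2,2,3,4,4 as 2 × 0.4 + 3 × 0.2 + 4 × 0.4 you were using the expression E(X) = Σ x P(X = x), where the sum runs over the distinct values x and P(X = x) is the relative frequency of x in the data.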
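As a classroom check, here is a minimal Python sketch of the two routes to the mean and the variance for the data set 2,2,3,4,4. It assumes the probabilities are taken to be the relative frequencies in the data and that the variance divides by n (not n - 1); the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

data = [2, 2, 3, 4, 4]
n = len(data)

# Long way: work directly from the raw data.
mean = Fraction(sum(data), n)
variance = sum((x - mean) ** 2 for x in data) / n  # divide by n, not n - 1

# Formula way: probability of each distinct value = its relative frequency.
probs = {x: Fraction(count, n) for x, count in Counter(data).items()}
expected = sum(x * p for x, p in probs.items())          # E(X)
expected_sq = sum(x ** 2 * p for x, p in probs.items())  # E(X^2)
var_formula = expected_sq - expected ** 2                # E(X^2) - E(X)^2

print(mean, expected)         # 3 3
print(variance, var_formula)  # 4/5 4/5
```

Exact fractions are used so that the two routes agree exactly rather than only up to rounding.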
The case of variance is more complicated; in fact, the expression V(X + Y) = V(X) + V(Y) is not always valid. If the random variables X and Y are independent then V(X + Y) = V(X) + V(Y), but in general this expression is not true. Here is a situation that may help.
Suppose that you have a test that you give students and it is divided into two parts. Each part is graded out of 50% and then the two parts are added to give a final score. I want X to be the score on the first half and Y the score on the second half.
The first half of the test is very easy and almost all of the students score between 45% and 50%. The second half is very hard and most of the students score between 0% and 5%. In this situation V(X) and V(Y) are small, so V(X) + V(Y) will be small. There are, however, a few students who do poorly on the first half and a few students who do well on the second half, and this can result in V(X + Y) being much larger than V(X) + V(Y). V(X) measures the variation in the first half and V(Y) measures the variation in the second half. What is missing is a measure of the variation between the first half and the second half. There is such a measure; it is called the covariance of X and Y and is written Cov(X,Y). The correct statement about V(X + Y) is
V(X + Y) = V(X) + V(Y) + 2 Cov(X,Y).
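The identity V(X + Y) = V(X) + V(Y) + 2 Cov(X,Y) can be checked from two data sets in the same "long way" style. The sketch below is only an illustration: the Y values are made up, the helper names (`mean`, `var`, `cov`, `check`) are hypothetical, and all variances divide by n. It compares two ways of combining the data: adding each item of X to each item of Y (which treats the two as independent), and adding them pair by pair as if each pair belonged to one student.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    return Fraction(sum(values), len(values))

def var(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)  # divide by n

def cov(pairs):
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)

def check(pairs):
    """Return V(X+Y), V(X)+V(Y)+2Cov(X,Y) and Cov(X,Y) for a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    sums = [x + y for x, y in pairs]
    c = cov(pairs)
    return var(sums), var(xs) + var(ys) + 2 * c, c

X = [2, 2, 3, 4, 4]
Y = [1, 3, 5, 5, 6]  # hypothetical second data set, chosen for illustration

all_pairs = list(product(X, Y))  # every x with every y: 25 equally likely pairs
matched = list(zip(X, Y))        # first x with first y, and so on: 5 pairs

lhs_ind, rhs_ind, cov_ind = check(all_pairs)
lhs_dep, rhs_dep, cov_dep = check(matched)

print("all pairs:", lhs_ind, rhs_ind, cov_ind)
print("matched:  ", lhs_dep, rhs_dep, cov_dep)
```

With these lists V(X) = 4/5 and V(Y) = 16/5. Using all 25 pairs the covariance is 0 and V(X + Y) = 4 = V(X) + V(Y). Using the matched pairs the covariance is 7/5 and V(X + Y) = 34/5, which equals V(X) + V(Y) + 2 Cov(X,Y) but not V(X) + V(Y).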