Quandaries and Queries

If I have a set of data points (14 to be exact) of unknown pedigree from a large population, what tests can I apply to see whether they constitute a random sample from that population? A related question: if I knew a priori that the larger population followed a general distribution (say lognormal), and if a lognormal probability plot of this sub-data (14 points, unknown pedigree) fit rather snugly, could I safely assume that my 14 data points were randomly selected? Or would I be making a potentially damaging assumption?

Thanks,
Stu

Hi Stu,

Randomness of the sample is a question to ask at the stage of COLLECTING the data, not after the data is collected. Can you tell us how you designed the experiment for collecting the data?

Andrei and Penny

Stu wrote back

The procedure described seems sufficient to say that the sample is really random. We can safely draw some statistical inferences, although a sample size of 16 is small for a population of 4 million. The results derived may not be very precise, and I would have serious concerns about that problem, but not about randomness or bias. A small sample can appear to support a strong inference, but the result might be statistically insignificant and inaccurate.

Yes, if we did know something a priori about the general population of widgets (say, that it should follow a lognormal distribution), then these 16 widgets should fit a lognormal probability plot. But I do not know of any result that lets us reason backward from a good fit to randomness. A good fit may be an indication that the sample was really random, but I have never heard that it PROVES randomness.

Once again, from our point of view this is a question about how we design the experiment. After the sample has been collected, nothing more can be done.

Andrei and Penny
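
To illustrate why a snug lognormal plot cannot prove randomness, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population parameters are made up, the function names are hypothetical, and the plotting positions (i - 0.5)/n are just one common convention. It measures how straight a lognormal probability plot is by the correlation between the sorted log-values and the matching normal quantiles, then compares a genuinely random sample of 16 with 16 values that were hand-picked, from a differently centred lognormal, to sit exactly on a straight line.

```python
import math
import random
from statistics import NormalDist, mean


def normal_quantiles(n):
    """Standard normal quantiles at plotting positions (i - 0.5) / n."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def lognormal_plot_correlation(values):
    """Correlation between sorted log-values and normal quantiles.

    A value close to 1 means the points lie close to a straight line
    on a lognormal probability plot.
    """
    logs = sorted(math.log(v) for v in values)
    q = normal_quantiles(len(logs))
    mq, ml = mean(q), mean(logs)
    sxy = sum((x - mq) * (y - ml) for x, y in zip(q, logs))
    sxx = sum((x - mq) ** 2 for x in q)
    syy = sum((y - ml) ** 2 for y in logs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: lognormal with made-up parameters mu=0, sigma=1.
population = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]

# A genuinely random sample of 16 items from that population.
random_sample = rng.sample(population, 16)

# A deliberately NON-random sample: 16 values placed exactly on the
# quantiles of a different lognormal (mu=1, sigma=0.5). Nothing about
# these values was selected at random.
handpicked = [math.exp(1.0 + 0.5 * z) for z in normal_quantiles(16)]

r_random = lognormal_plot_correlation(random_sample)
r_handpicked = lognormal_plot_correlation(handpicked)

print(f"random sample:     r = {r_random:.4f}")
print(f"hand-picked values: r = {r_handpicked:.4f}")
```

The hand-picked values give an essentially perfect straight line even though they are neither random nor representative of the population, so a good fit on its own says nothing about how the data were selected.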