
ForgetTheWords t1_jedzfh1 wrote

Generally speaking, the more observations you make (e.g. survey responses), the easier it is to detect an effect. Probably what you heard is that, for the kind of effect sizes one usually sees in whatever context was being discussed, it takes ~30 responses to be reasonably sure (conventionally, a 5% or smaller chance of being wrong) that the difference you observed reflects a true difference in the population rather than mere chance (i.e. that you didn't just happen to draw a sample where your hypothesis looks true, even though it isn't true for the population).
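
To make that concrete, here is a rough simulation (not from the original comment; the effect size and thresholds are invented for illustration) of how often ~30 responses per group detect a fairly large true difference at the 5% level:

    # Hypothetical illustration: how often does a sample of ~30 per group
    # detect a real difference at the 5% significance level? Numbers invented.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, trials = 30, 5_000
    effect = 0.8          # assumed "large" true difference, in standard deviations

    detections = 0
    for _ in range(trials):
        control = rng.normal(0.0, 1.0, n)      # group with no effect
        treated = rng.normal(effect, 1.0, n)   # group shifted by the true effect
        _, p = stats.ttest_ind(control, treated)
        detections += p < 0.05                 # "reasonably sure" threshold

    print(f"Detected the effect in {detections / trials:.0%} of samples of size {n}")
    # With a large (~0.8 SD) effect, 30 per group catches it most of the time;
    # shrink `effect` and that rate drops off quickly.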

The classic example is pulling coloured balls from a bag. How many balls do you have to pull to get a good idea of what percentage of the balls in the bag are each colour? It depends, of course, on how many balls there are and how the colours are distributed. You have to at least estimate those numbers before you decide what kind of test to do. If there are only ten balls, you could probably just do a census - i.e. look at every ball. If there are 500k balls, you'll only be able to observe a sample. But how big a sample do you need? If you expect the distribution to be ~evenly divided between two colours, you may be able to get away with only 30. If, however, you expect ~25 colours, or expect some colours to show up only ~1% of the time, you'll need a lot more observations before you can be reasonably confident your sample resembles the population (every ball in the bag).
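
A quick sketch of that intuition (bag sizes and colour mixes below are made up, not anything from the original comment):

    # Bag-of-balls intuition: 30 draws are fine for an even two-colour split,
    # but badly undersample a colour that makes up only ~1% of the bag.
    import random

    random.seed(1)

    def sample_error(population, sample_size, colour):
        """Absolute gap between a colour's true share and its share in one sample."""
        true_share = population.count(colour) / len(population)
        sample = random.sample(population, sample_size)
        return abs(sample.count(colour) / sample_size - true_share)

    # Case 1: two colours, roughly 50/50 -- a sample of 30 is usually close.
    even_bag = ["red"] * 250_000 + ["blue"] * 250_000
    print("even bag, n=30:  ", round(sample_error(even_bag, 30, "red"), 3))

    # Case 2: a colour that is only ~1% of the bag -- 30 draws often miss it
    # entirely; you need far more draws before the sample resembles the bag.
    rare_bag = ["gold"] * 5_000 + ["grey"] * 495_000
    print("rare bag, n=30:  ", round(sample_error(rare_bag, 30, "gold"), 3))
    print("rare bag, n=3000:", round(sample_error(rare_bag, 3_000, "gold"), 3))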

Bear in mind that most statistical tests assume the sample was drawn randomly. In practice, it is very hard, if not impossible, to randomly sample humans for a survey. So you will generally want to get more responses to make your statistical tests more powerful (more likely to detect a true effect) while keeping your significance level (the chance of declaring an effect that is really just noise) reasonably low.
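
One hedged way to turn "more responses means more power" into numbers is a standard power calculation; this sketch uses statsmodels and invented effect sizes, not anything specific to the survey being discussed:

    # How many responses per group for 80% power at alpha = 0.05,
    # for a few textbook effect sizes (Cohen's d)?
    from statsmodels.stats.power import TTestIndPower

    power_calc = TTestIndPower()
    for effect_size in (0.8, 0.5, 0.2):   # "large", "medium", "small"
        n = power_calc.solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
        print(f"d={effect_size}: ~{n:.0f} responses per group")
    # A large effect needs ~26 per group; a small one needs closer to 400.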

If you could get a truly random sample, you'd need fewer observations to have a good chance that your sample is representative. If it's only mostly random, there's a higher chance that any effect you observe is due to a bias in the sampling. Thus, you will probably want to be stricter before declaring that an observed effect is genuinely present in the population.

But by choosing to reject more findings that could have happened by chance, you also make it harder to accept findings that reflect a genuine effect in the population. A real but small effect in the population is not easily distinguishable from a small effect in the sample caused by nonrandom sampling.
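
A rough simulation of that trade-off (again with invented numbers): the stricter you are about chance findings, the more real-but-small effects slip by.

    # Tightening the significance threshold (alpha) makes a small true effect
    # harder to detect. Effect size and sample size here are hypothetical.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n, trials, effect = 100, 5_000, 0.2   # small true effect

    p_values = []
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect, 1.0, n)
        p_values.append(stats.ttest_ind(a, b).pvalue)

    p_values = np.array(p_values)
    for alpha in (0.05, 0.01, 0.001):
        rate = (p_values < alpha).mean()
        print(f"alpha={alpha}: detected the small effect in {rate:.0%} of samples")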
