5 Things You Need to Know About Sampling & Statistics

17 October 2022

Whether you like it or not, if you're doing research with customers or staff you are relying on the methods of sampling and statistics. These can be pretty scary topics, but we think there's no reason why anyone shouldn't be able to understand the basics.

In this post we're going to outline the 5 things you need to know about sampling, followed by the 5 things you need to know about statistics. Let us know if you want to hear more!


1. We use a sample to find out what a population (e.g. all customers) thinks

It's really important to be clear which population you're targeting, and think about anything that may bias your sample away from being a fair representation of it. Don't, for example, assume that everyone has a telephone number, an email address, or a Facebook account.

When it comes to analysis, remember that the people who take part in your survey are representative of a wider group of people in the population. Don't just fix their problems, fix the underlying cause for everyone.

2. The sample must be representative, so it accurately reflects the population

If your sample is unrepresentative (whether in terms of demographics or how people think) then you will draw incorrect conclusions. Bad research can be worse than no research at all.

In some cases you may want to deliberately weight your sample by oversampling from certain groups (for instance, if you want to make sure you get an accurate read of the views of a minority group). This is a sensible thing to do, but you will then need to weight your analysis to make sure your overall results remain representative.
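As a minimal sketch of how that weighting works, suppose a minority group makes up 10% of the population but, because we deliberately oversampled it, 30% of our responses. All of the figures below are hypothetical, chosen purely for illustration:

```python
# Weighting an oversampled group back to its population share.
# All figures here are hypothetical.

# A minority group is 10% of the population but 30% of our 200 responses.
pop_share = {"minority": 0.10, "majority": 0.90}
sample_counts = {"minority": 60, "majority": 140}
mean_score = {"minority": 70.0, "majority": 80.0}  # group satisfaction scores

n = sum(sample_counts.values())

# The unweighted mean over-represents the minority group's views.
unweighted = sum(mean_score[g] * sample_counts[g] for g in sample_counts) / n

# Weight each response by population share / sample share for its group.
weights = {g: pop_share[g] / (sample_counts[g] / n) for g in sample_counts}
weighted = sum(mean_score[g] * sample_counts[g] * weights[g]
               for g in sample_counts) / n

print(round(unweighted, 1))  # 77.0 - pulled down by the oversampled group
print(round(weighted, 1))    # 79.0 - representative of the population
```

The weighted figure matches what you'd get by combining the two group scores in their true population proportions, which is exactly the point of the correction.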

3. Everyone in the population must have an equal chance of being sampled (i.e. it's a random sample)

On paper a simple random sample is the best way to make sure that your survey is fair, but the need to use a sample cost-effectively and account for different response rates often means that a stratified random sample is a better choice in practice.
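A proportionate stratified random sample can be sketched in a few lines. The strata (regions) and sampling frame below are invented for illustration:

```python
import random

# A minimal sketch of a proportionate stratified random sample.
# The strata and sampling frame here are made up for illustration.
random.seed(0)

# Sampling frame grouped by region (the stratifying variable).
frame = {
    "north": [f"north-{i}" for i in range(600)],
    "south": [f"south-{i}" for i in range(400)],
}
total = sum(len(members) for members in frame.values())
target = 100  # overall sample size we can afford

sample = []
for stratum, members in frame.items():
    # Each stratum contributes in proportion to its size in the frame.
    k = round(target * len(members) / total)
    sample.extend(random.sample(members, k))

print(len(sample))  # 100: 60 from north, 40 from south
```

In practice you might also adjust each stratum's allocation for its expected response rate, then weight the results as described in point 2.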

But what if you can't draw a random sample? You can improve convenience samples by thinking about all the things which may affect who takes part. What times of day are you interviewing? What methods are you using? How accessible is your research?

4. The sample must be big enough to be robust

The sample size you need for a reliable result at an overall level is often surprisingly small - just a few hundred people will give you a pretty reliable overall Satisfaction Index. What drives the need for larger samples is wanting robust results at a lower level. You can work out exactly what sample sizes you need if you're able to specify the size of real difference you want to be able to detect (this is known as power analysis).
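As a sketch of what a power analysis looks like, the standard normal-approximation formula gives the sample size per group needed to detect a difference of a given size between two mean scores. The standard deviation (11 points) and the smallest difference worth detecting are hypothetical values chosen for illustration:

```python
import math

# A minimal power-analysis sketch using the standard two-sample formula
# n = 2 * ((z_alpha + z_beta) * sd / delta)^2 (normal approximation).
# The standard deviation and differences below are hypothetical.

def n_per_group(delta, sd, z_alpha=1.96, z_beta=0.8416):
    """Sample size per group for 95% confidence and 80% power."""
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

print(n_per_group(delta=5, sd=11))  # ~76 per group to detect a 5-point gap
print(n_per_group(delta=2, sd=11))  # a smaller difference needs far more
```

Note how the required sample grows with the square of the shrinking difference: halving the detectable difference roughly quadruples the sample you need.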

As an illustration, a sample size of 30 would give you a margin of error of around ±4 on your Satisfaction Index (the margin for an NPS would be much higher), a sample of 50 would give you a margin of error around ±3, and a sample of 100 would give you a margin of error around ±2.
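Those figures can be reproduced with the usual margin-of-error formula, assuming the Satisfaction Index is a 0-100 score with a standard deviation of roughly 11 points (an assumed value chosen to be consistent with the numbers above):

```python
import math

# Checking the margins of error quoted above, assuming a 0-100 score
# with a standard deviation of about 11 points (an assumed value).

def margin_of_error(n, sd=11, z=1.96):
    """95% margin of error for a mean, normal approximation."""
    return z * sd / math.sqrt(n)

for n in (30, 50, 100):
    print(n, round(margin_of_error(n), 1))
# 30 -> ~±3.9, 50 -> ~±3.0, 100 -> ~±2.2
```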

5. A low response rate risks introducing bias if the people who take part are different from the people who don't

Imagine you asked a question in your survey along the lines of "Do you normally respond to surveys?" How reliable would the results be? This is an extreme example, but whenever a big proportion of the people you invite to take part in your survey don't respond there is a risk of non-response bias (this can affect the whole survey, or just individual questions).

The more likely it is that the decision to take part is related to the answers they would give, the bigger the problem. Are extremely satisfied or dissatisfied customers more likely to respond? What might that do to your score, or to the conclusions you draw about what to improve?

There are lots of tactics you can use to improve your response rate, but the most powerful of all is to show customers that you are listening and taking action off the back of the survey.


1. All sample estimates have a confidence interval or margin of error

As we saw above, when we run a survey we're using a sample to draw conclusions about the whole population - we create sample estimates of population parameters. Those estimates come with a margin of error, and the clever bit of statistical estimation is that we can work out how precise our estimates are.

We normally work at the 95% confidence level, which means that the intervals we construct will include the true population parameter 95% of the time.
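That "95% of the time" claim can be checked by simulation: draw many samples from a population with a known satisfaction rate, build a 95% interval from each, and count how often the interval contains the true rate. The population rate and sample size below are hypothetical:

```python
import math
import random

# A minimal simulation of what "95% confidence" means. The true rate
# and sample size are hypothetical.
random.seed(42)

TRUE_P = 0.7   # true population proportion (hypothetical)
N = 200        # sample size per simulated survey
TRIALS = 2000

covered = 0
for _ in range(TRIALS):
    sample = [random.random() < TRUE_P for _ in range(N)]
    p_hat = sum(sample) / N
    moe = 1.96 * math.sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - moe <= TRUE_P <= p_hat + moe:
        covered += 1

print(round(covered / TRIALS, 3))  # close to 0.95
```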

2. Bigger samples mean smaller margins of error, but you get diminishing returns

The precision of our estimates sits on a curve: the margin of error halves every time you quadruple your sample size. That means there's a lot of benefit in moving from 30 responses to 50, but much less benefit in moving from 100 to 120.
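Because the margin of error shrinks with the square root of the sample size, this relationship falls straight out of the formula (again assuming a hypothetical standard deviation of 11 points):

```python
import math

# Demonstrating the diminishing-returns curve. The margin of error
# shrinks with the square root of n, so 4x the sample halves the error.
# The standard deviation is a hypothetical value.

def moe(n, sd=11, z=1.96):
    return z * sd / math.sqrt(n)

print(round(moe(100) / moe(25), 2))   # 0.5: quadruple the sample, halve the error
print(round(moe(30) - moe(50), 2))    # big gain from 30 -> 50 responses
print(round(moe(100) - moe(120), 2))  # much smaller gain from 100 -> 120
```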

Plan your sampling scheme so that you maximise the robustness of scores that you need to report by targeting the sample to the places where it will bring most benefit.

3. The absolute size of your sample is what matters, not the proportion of the population it represents

There's a common misconception that a reliable sample is one that hits a certain threshold as a percentage of the population. That's not true: it's the absolute size of the sample that counts.

The only exception is if you have a very small population and nearly everyone responds, in which case we can use something called the Finite Population Correction to adjust the margins of error accordingly.
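The Finite Population Correction is a simple multiplier on the margin of error. As a sketch, with hypothetical population and sample sizes:

```python
import math

# A minimal sketch of the Finite Population Correction: when the sample
# is a large fraction of a small population, the margin of error shrinks.
# Population size, sample size, and standard deviation are hypothetical.

def moe(n, sd=11, z=1.96):
    return z * sd / math.sqrt(n)

def moe_fpc(n, pop, sd=11, z=1.96):
    """Margin of error with the Finite Population Correction applied."""
    return moe(n, sd, z) * math.sqrt((pop - n) / (pop - 1))

# Surveying 80 people from a population of just 100:
print(round(moe(80), 2))           # uncorrected
print(round(moe_fpc(80, 100), 2))  # corrected: much tighter
```

At the extreme, if you survey the entire population the correction drives the margin of error to zero, which is what you'd expect: a census has no sampling error.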

4. If the margins of error for two groups don't overlap, then we can be confident there's a real difference in the population

Confidence interval estimation is a simple and intuitive way to understand the degree of uncertainty we have about the true score for different groups, and the degree of confidence we should have that there's a real difference between them, or that there's been a real change from year to year.
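The overlap check itself is simple to sketch. The scores, sample sizes, and standard deviation below are hypothetical:

```python
import math

# A minimal sketch of the overlap rule: build each group's 95% interval
# and check whether the intervals are clear of each other.
# Scores, sample sizes, and the standard deviation are hypothetical.

def interval(score, n, sd=11, z=1.96):
    moe = z * sd / math.sqrt(n)
    return (score - moe, score + moe)

group_a = interval(score=72.0, n=150)  # e.g. this year's score
group_b = interval(score=78.0, n=150)  # e.g. last year's score

# If the top of one interval sits below the bottom of the other,
# we can be confident the difference is real.
no_overlap = group_a[1] < group_b[0] or group_b[1] < group_a[0]
print(no_overlap)  # True: the 6-point gap exceeds the combined uncertainty
```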

5. When margins of error do overlap, we can't say whether or not there is a real difference

If the margins of error overlap, it's a lot less clear what's really going on. We can't say that there's no difference (after all, the scores could be as far apart as the furthest tips of the error bars), only that if there is a difference it was too small for us to detect it with the sample sizes we've got. A larger sample might have found a significant difference.

Needless to say, there's a lot more to all of these points than we've had time to go into here, and we've linked to some more resources below. These 10 things are all important for you to know and understand if you're working with research data.

Get in touch if you'd like any tips or pointers, if you have any questions, or if you think there's something we've missed!
