Our Thoughts

Finding the right statistical tool for you

29th April 2021

Statistical tools are incredibly powerful, but there’s such an array of options nowadays that it can be hard to know where to start. At TLF Research we use a range of techniques depending on the nature of the data and the needs of each client. This article gives a quick overview of what we see as best practice tools, and where we’d use them.


Statistical testing

We don’t advocate using statistical tests to conduct “fishing expeditions” for statistically significant differences, because that generates lots of false positives.

When you have a specific hypothesis to test we can use traditional tools such as t-tests, ANOVA, and Chi-square tests to test whether differences in the sample are likely to translate to real differences in the population.

Increasingly we find that confidence interval estimation is both more informative and more robust than statistical testing, and it's often easier for people to understand.
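To make this concrete, here's a minimal sketch in Python using scipy. The satisfaction scores and group labels are invented for illustration: it runs a Welch's t-test for a difference in mean scores, then builds the 95% confidence interval for that same difference.

```python
import numpy as np
from scipy import stats

# Illustrative satisfaction scores for two groups of customers
group_a = np.array([7.2, 8.1, 6.9, 7.5, 8.0, 7.8, 6.5, 7.1])
group_b = np.array([6.1, 6.8, 5.9, 6.4, 7.0, 6.2, 5.8, 6.6])

# Hypothesis test: Welch's t-test for a difference in means
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Estimation: a 95% confidence interval for the difference in means
diff = group_a.mean() - group_b.mean()
va, vb = group_a.var(ddof=1), group_b.var(ddof=1)
na, nb = len(group_a), len(group_b)
se = np.sqrt(va / na + vb / nb)
# Welch-Satterthwaite degrees of freedom
dof = (va / na + vb / nb) ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
lower, upper = stats.t.interval(0.95, dof, loc=diff, scale=se)
```

The interval (`lower`, `upper`) tells you not just whether there's a difference, but how big it plausibly is — which is why we often find it more informative than the bare p-value.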

Key driver analysis

As standard we include in our reporting an impact score based on correlation. This is a robust technique which gives an upper bound on the strength of the relationship between each predictor and the outcome, and in many cases it is the best tool for the job.
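As an illustration of the idea (not our actual scoring method), a correlation-based impact score can be computed in a few lines of Python with pandas. The driver names and data below are simulated:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "staff": rng.normal(7, 1, n),
    "product": rng.normal(8, 1, n),
})
# In this toy data the outcome depends mostly on the staff driver
df["satisfaction"] = 0.7 * df["staff"] + 0.2 * df["product"] + rng.normal(0, 0.5, n)

# Correlation of each driver with the outcome, as a simple impact score
impact = df[["staff", "product"]].corrwith(df["satisfaction"])
```

Here `impact` ranks the drivers by the strength of their (linear) relationship with satisfaction.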

We also offer more advanced techniques, which require a degree of judgement in choosing the best fit for the data at hand. Detailed below, these include multiple regression, logistic regression, latent class regression, relative importance analysis, and structural modelling.

Regression

We turn to regression when we want to understand the relationship between different variables (often linking survey data with other data about customer behaviour or from internal MI). This allows us to answer questions like “If we improve our satisfaction index by 5 points, what would happen to customer retention?”

We use multiple (linear) regression when the outcome variable is interval or continuous (e.g. spend) and logistic regression when it’s dichotomous (e.g. retention/defection).

Relative importance analysis

Relative importance techniques sit alongside multiple regression, and allow us to measure how much of an outcome variable (such as NPS or overall satisfaction) is explained by each of the drivers. This is something that multiple regression cannot reliably do because of the problem of massive multicollinearity in satisfaction data. We normally use the lmg algorithm.
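The lmg algorithm averages each driver's incremental R-squared over every order in which the drivers could enter the model. In practice we'd use a dedicated implementation (such as the relaimpo package in R, where lmg comes from); this toy Python version, run on simulated and deliberately collinear data, is just to show the averaging:

```python
from itertools import permutations
import numpy as np

def r_squared(X, y, cols):
    """R-squared of a regression of y on the given columns of X (with intercept)."""
    if not cols:
        return 0.0
    Xs = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return 1 - resid.var() / y.var()

def lmg(X, y):
    """Average each predictor's incremental R-squared over all entry orders."""
    p = X.shape[1]
    scores = np.zeros(p)
    orders = list(permutations(range(p)))
    for order in orders:
        seen = []
        for c in order:
            scores[c] += r_squared(X, y, seen + [c]) - r_squared(X, y, seen)
            seen.append(c)
    return scores / len(orders)

rng = np.random.default_rng(1)
n = 300
staff = rng.normal(0, 1, n)
product = 0.8 * staff + 0.6 * rng.normal(0, 1, n)  # correlated with staff
y = 1.0 * staff + 0.5 * product + rng.normal(0, 1, n)
X = np.column_stack([staff, product])
importance = lmg(X, y)  # shares of R-squared attributed to each driver
```

A nice property of lmg is that the importance scores sum to the full model's R-squared, so they decompose "variance explained" across the drivers even when those drivers are correlated.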

Factor analysis and principal components analysis

Another approach to dealing with the problem of multicollinearity is to use statistical techniques to combine variables which share a lot of information. It’s common, for example, for questionnaires to have many items about topics such as staff or product quality which are closely related.

By analysing patterns of correlation in the data, tools such as factor analysis and principal components analysis can absorb much of the multicollinearity and give a clearer picture of how customers think about the items on the questionnaire.

Principal components regression is a common approach to key driver analysis which helps to address the weaknesses of multiple regression. It is less rigorous than the structural modelling techniques it resembles, however, and it is very sensitive to the way the components are constructed.
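To show what principal components regression looks like in practice, here's a small scikit-learn sketch on simulated data: three closely related "staff" items are compressed into one component before regressing satisfaction on it. All names and numbers are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 400
staff_factor = rng.normal(0, 1, n)
# Three closely related questionnaire items sharing one underlying factor
items = np.column_stack([staff_factor + rng.normal(0, 0.3, n) for _ in range(3)])
satisfaction = staff_factor + rng.normal(0, 0.5, n)

# Standardise, extract one component, then regress on it
pcr = make_pipeline(StandardScaler(), PCA(n_components=1), LinearRegression())
pcr.fit(items, satisfaction)
r2 = pcr.score(items, satisfaction)
```

The sensitivity mentioned above shows up in choices like `n_components` and whether to standardise: change either and the components, and hence the regression, can look quite different.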

Structural modelling

Techniques such as Structural Equation Modelling (SEM) and Partial Least Squares path modelling (PLS-PM) are the gold standard when it comes to testing causal relationships in your data. They need a lot of data, and can be very difficult to build, but do represent best practice.

These structural models work on the same principles as factor analysis or principal components analysis to extract shared information from questionnaire items, but they do so within the context of a theoretical causal model at the structural level (for example looking at how 4 or 5 factors relate to satisfaction and loyalty). They also provide a test of how well that theoretical model fits the data, which principal components regression does not.
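For a flavour of what a structural model specification looks like, here's a hypothetical lavaan-style model description (the syntax used by SEM tools such as R's lavaan and Python's semopy). All factor and item names are invented:

```python
# Hypothetical lavaan-style SEM specification; every name here is illustrative.
model_desc = """
# Measurement model: latent factors extracted from questionnaire items
Staff        =~ staff_helpful + staff_knowledge + staff_friendly
Product      =~ product_quality + product_range + product_value
Satisfaction =~ sat_overall + sat_vs_expectations
Loyalty      =~ likely_to_renew + likely_to_recommend
# Structural model: the theoretical causal paths between factors
Satisfaction ~ Staff + Product
Loyalty      ~ Satisfaction
"""
```

The `=~` lines define how each latent factor is measured by its items; the `~` lines state the causal paths the model will test against the data.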

Latent class regression

Multiple regression, and techniques such as relative importance analysis which are based on it, find the average strength of drivers across the entire sample of customers. In some cases this can fail to capture important differences, for example if some customers are very price driven while others are very quality driven.

Latent class regression segments customers according to the strength of the drivers, allowing us to identify where these important differences exist.
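To illustrate the idea, here's a toy two-class mixture of regressions fitted by EM in numpy, with a simulated price-driven segment and a quality-driven segment. It's a sketch of the principle, not production code — real analyses use dedicated software and multiple random starts:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
price = rng.normal(0, 1, n)
quality = rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), price, quality])
segment = rng.integers(0, 2, n)
# Segment 0 is price driven, segment 1 is quality driven
y = np.where(segment == 0, 2.0 * price, 2.0 * quality) + rng.normal(0, 0.5, n)

# Start the two classes from distinct guesses (real software uses restarts)
betas = np.array([[0.0, 1.5, 0.0], [0.0, 0.0, 1.5]])
weights = np.array([0.5, 0.5])
sigma = 1.0
for _ in range(50):
    # E-step: responsibility of each class for each customer
    dens = np.stack([
        weights[k] * np.exp(-(y - X @ betas[k]) ** 2 / (2 * sigma ** 2))
        for k in range(2)
    ])
    resp = dens / dens.sum(axis=0)
    # M-step: weighted least squares per class, then update weights and sigma
    for k in range(2):
        W = resp[k][:, None]
        betas[k] = np.linalg.solve((X * W).T @ X, (X * W).T @ y)
        weights[k] = resp[k].mean()
    residsq = np.stack([(y - X @ betas[k]) ** 2 for k in range(2)])
    sigma = np.sqrt((resp * residsq).sum() / n)
```

After fitting, each class has its own coefficients, and the responsibilities assign each customer to the segment whose drivers best explain their answers — exactly the differences a single averaged regression would smooth over.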

Cluster analysis

In the same way that factor analysis groups variables together, cluster analysis groups customers together based on how similar they are on a range of variables (which can be survey scores, demographics, or behavioural data). This can be used as a basis for customer segmentation, or simply to get a better understanding of the ways in which customers clump together.
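A minimal clustering sketch with scikit-learn's KMeans, on simulated spend and visit-frequency data (the segments and numbers are invented):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
# Two simulated segments: high-spend frequent visitors vs low-spend occasional
seg_a = rng.normal([80, 12], [5, 2], size=(100, 2))
seg_b = rng.normal([20, 2], [5, 1], size=(100, 2))
customers = np.vstack([seg_a, seg_b])  # columns: annual spend, visits per year

# Standardise so no single variable dominates the distance calculation
X = StandardScaler().fit_transform(customers)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster assignment for each customer
```

Standardising first matters: without it, the variable with the biggest numeric range would dominate the distances, and the clusters would mostly reflect units of measurement rather than customer behaviour.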

A closely related technique, archetypal analysis, is sometimes useful as it searches for extremes rather than central representations of its clusters, which can more accurately reflect the way we think about clustering customers.

When faced with large quantities of customer data it can sometimes be difficult to know what to do with it. Analysis techniques such as these are helpful tools when it comes to identifying trends, and understanding what is happening and why.

Get in touch if you'd like to find out more about any of these techniques. If you're struggling to make sense of your customer data and would like some advice, we're sure we can help. Whether you're interested in consultancy or a standalone data analysis project, we'd love to find out more about your data challenges.

Call us for a friendly chat on 01484 517 575 or complete our short online form and we'll get back to you. 
