By Stephen Hampshire, Client Manager, TLF Research
If you’re anything like me, you’ll have come to be suspicious of technologies that are long on marketing and short on case studies. Text analytics, unfortunately, is a typical example. I could list you a whole raft of suppliers who offer it, either as a sole software offering or as an adjunct to their research or consultancy work. It’s much harder to give you a list of organisations who are getting lots of value from using text analytics with customers, though many are trying.
That doesn’t mean that text analytics has nothing to offer. Far from it. I’m extremely excited about the potential of these techniques. But it’s important to use them in the right place and in the right way, rather than simply throwing them into the mix because they sound cutting-edge. Getting value from text analytics requires you to be clear about what you’re trying to achieve, and prepared for the investment of time and resources needed for the best results.
In this article I’m going to map out the landscape of text analytics, explain what it can and can’t do, and discuss some of the particular challenges of using it with customer survey data. I’ll also show a couple of examples that may whet your appetite.
What is text analytics?
Text analytics, or text mining, is about using words as raw data for the types of analysis that have traditionally been used with numbers. To do that we need to teach our computers a limited understanding of ‘natural language’ text. In the past, researchers often approached this from a relatively technical, linguistic standpoint.
Does grammar matter?
Computers can, for example, be taught grammar: they can figure out the subject and object of a sentence. You can see this in action online using the Reed-Kellogg Diagrammer tool - try a sentence of your own and see how well the computer does.
In practice, this approach might be used to set up a programme to scan financial news feeds for acquisition news. From an article reporting, say, that Whirlpool is in talks to buy Aga Rangemaster, our programme could figure out that the prospective buyer is Whirlpool and the prospective acquisition is Aga Rangemaster. It could also pick out the likely price. Scraping articles from across the web, we could construct a database of prospective deals.
‘Bag of words’
Most applications of text analytics use a simpler ‘bag of words’ approach. This means that we’re making no attempt to understand the grammar of a sentence or the way words in a document relate to each other. It turns out that, for many practical applications, you can do very well without going any further than looking at the words that appear in a document.
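To make the idea concrete, here is a minimal sketch (in Python, with invented example sentences) of what a ‘bag of words’ representation actually is: a document reduced to word counts, with grammar and word order discarded entirely.

```python
from collections import Counter

def bag_of_words(document):
    """Reduce a document to word counts, ignoring grammar and order."""
    return Counter(document.lower().split())

print(bag_of_words("the service was slow and the staff were slow to respond"))

# Word order is lost: these two sentences produce identical bags
assert bag_of_words("dog bites man") == bag_of_words("man bites dog")
```

That lost ordering is exactly the trade-off the ‘bag of words’ approach makes: simplicity and robustness in exchange for ignoring structure.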
Let’s say we were trying to classify newspaper articles by subject. One includes the words ‘football’, ‘Chelsea’, ‘midfielder’, ‘referee’; another includes words like ‘EBITDA’, ‘FTSE’, ‘shareholders’. Pretty easy to categorise them, isn’t it? With enough examples to learn from, we can successfully categorise much more difficult cases. Understanding grammar is not always necessary; we can often do a good job just by looking at which words appear. Let’s look at an example of how text analytics can help us.
Spam, spam, spam, spam
The text analytics tool that most people use most often, without realising it, is their spam filter. Spam filters use a mixture of indicators (or ‘features’) to figure out which messages are spam and which are real. Those features might include things about the provenance of the message, like whether the sender matches the ‘from’ address, but also specific words that are highly correlated with spam (like ‘viagra’).
This is a really good example of what we should try to emulate:
- A very clearly-defined task
- A high success rate (tuneable to balance false-positives and false-negatives)
- A combination of types of features
- Learning from mistakes to improve.
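The feature-combination idea above can be sketched very simply. The feature names, spam words, weights and threshold here are all invented for illustration; a real filter learns its weights from large volumes of labelled mail.

```python
import re

# Hypothetical spam-correlated words; real filters learn these from data
SPAM_WORDS = {"viagra", "winner", "free"}

def spam_score(sender, from_header, body):
    """Combine a provenance feature and word features into one score."""
    score = 0
    if sender != from_header:                # provenance feature
        score += 2
    words = set(re.findall(r"[a-z]+", body.lower()))
    score += 2 * len(words & SPAM_WORDS)     # word features
    return score

def is_spam(sender, from_header, body, threshold=2):
    # The threshold is the tuneable dial that trades off
    # false positives against false negatives
    return spam_score(sender, from_header, body) >= threshold
```

Raising the threshold makes the filter more cautious (fewer real messages junked, more spam let through); lowering it does the reverse - the same balance any classifier we build will need to strike.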
How might we go about doing something similar with our data?
Understanding what data we have
The first job, in most text mining tasks, is to process our raw data to make it better suited for later steps. Even if you never pursue any other text analytics (and, as we’ll see in a moment, there are good reasons you might not), this sort of analysis is well worth doing. It gives us easy access to quantitative information about our text data, something we often forget to consider.
Size and shape, tidying up
Start with the basics. How many ‘documents’ do we have and how many words? In the case of survey text, we’d normally treat each person’s response to a single question as a document. This means that our documents are much shorter than some used in other tasks, but there are good reasons to treat each open-ended question as a separate text analysis task.
Processing the text in those documents starts with some tasks to tidy them up. More or less universally:
- Remove punctuation
- Convert everything to lowercase
- Remove ‘stopwords’, such as ‘the’ and ‘and’, which don’t add meaning. Lists of stopwords in various languages are readily available.
And many people would also:
- Expand contractions (e.g. ‘don’t’ to ‘do not’)
- Invert negatives (e.g. ‘not good’ to ‘bad’, although this is not always as simple as it first seems)
- Remove custom stopwords, known to be common and uninformative for that task (e.g. ‘account’ for bank customers)
- Stem words (so that ‘customer’ and ‘customers’ become ‘custom’)
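The steps above can be sketched as a small pipeline. The tiny stopword list and the crude suffix-stripping ‘stemmer’ here are simplified stand-ins; in practice you would use a published stopword list and a proper stemmer (such as the Porter stemmer).

```python
import re

# Tiny illustrative stopword list; real lists run to hundreds of words
STOPWORDS = {"the", "and", "a", "to", "was", "were", "i", "it"}

def clean(document):
    text = document.lower()                     # lowercase
    text = re.sub(r"[^\w\s']", " ", text)       # remove punctuation
    text = text.replace("n't", " not")          # expand contractions (crudely)
    words = [w for w in text.split() if w not in STOPWORDS]
    # Crude stemming: strip common suffixes so 'customers' -> 'custom'
    words = [re.sub(r"(ers|er|s)$", "", w) for w in words]
    return words

print(clean("The customers weren't happy, and the service was slow."))
```

Note that the order of steps matters: contractions must be expanded before stopword removal, or ‘not’ (which carries real meaning) can be lost along the way.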
After that we should have a set of data with the same number of documents, but far fewer words. The words that remain are more interesting, and much more likely to differ between types of customer.
The next step is to think about what words we have left, and the best way is usually to look at the ones that appear most often.
You may be thinking that this is just a long-winded way of getting to a word cloud, and you’d be right.
Word cloud software often does some of the processing we’ve just talked about behind the scenes. The difference is that we’ve made all these decisions ourselves, and this is just the start.
The frequency of occurrence of words is not just the basis for a pretty picture. It’s a quantitative measure that we can use to understand our data. We can compare subgroups based on how customers feel (e.g. ‘promoters’ vs ‘passives’ and ‘detractors’) or customer type (e.g. gender). And we can monitor trends and changes over time.
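A subgroup comparison of this kind needs nothing more than counting words within each group. The comments below are invented examples standing in for real promoter and detractor verbatims.

```python
from collections import Counter

def word_frequencies(comments):
    """Count word occurrences across a group of comments."""
    counts = Counter()
    for comment in comments:
        counts.update(comment.lower().split())
    return counts

promoters = ["friendly staff", "quick delivery friendly service"]
detractors = ["slow delivery", "slow response rude staff"]

promoter_counts = word_frequencies(promoters)
detractor_counts = word_frequencies(detractors)

# Words over-represented among detractors relative to promoters
for word, n in detractor_counts.most_common():
    if n > promoter_counts[word]:
        print(word, n)
```

Run the same counts month by month and you have a trend line for each word - a genuinely quantitative view of what would otherwise stay buried in verbatim comments.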
As well as looking at which words appear, it can be instructive to look at which words tend to appear together.
n-grams and co-occurrence
Like much of the jargon to do with text analytics, ‘n-grams’ sound frighteningly complex, but they are actually a very simple idea. There are many concepts which are made up of more than one word, but which refer to a single idea. Some examples: ‘account manager’, ‘Newcastle upon Tyne’, ‘customer service’. You may well have come across this phenomenon when making word clouds. When two or more words consistently appear together like this, you have an n-gram. Two words together is a bigram, three is a trigram.
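Extracting bigrams is just a matter of pairing each word with the one that follows it - a minimal sketch, with invented comments:

```python
from collections import Counter

def bigrams(words):
    """Pair each word with the word that follows it."""
    return list(zip(words, words[1:]))

comments = [
    "the account manager was helpful",
    "my account manager never calls",
]
counts = Counter()
for comment in comments:
    counts.update(bigrams(comment.split()))

print(counts.most_common(1))  # [(('account', 'manager'), 2)]
```

A bigram that keeps recurring across comments, like ‘account manager’ here, is a strong hint that two words are really one concept and should be treated as a single feature.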
These can be interesting to look at in the same way as the most frequent words, and often form a useful feature for any models we may go on to build. For example, bigram features such as ‘cheap watches’ are useful for spam filters. More generally, we can look to see if there are any patterns in the way words occur. Do some words tend to occur in the same document (e.g. ‘slow’ and ‘service’) across many comments with similar meaning?
‘Co-occurrence’ can be used, in a large enough data set, in very much the same way as you would look at the correlation between numerical variables. It gives us a quantitative basis for understanding how strongly two concepts are linked together in customers’ minds.
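Co-occurrence can be counted directly: for each document, note every pair of distinct words that appears together. The comments here are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(documents):
    """Count how often each pair of distinct words shares a document."""
    pairs = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))  # sort so pairs are canonical
        pairs.update(combinations(words, 2))
    return pairs

comments = [
    "service was slow",
    "slow service again",
    "friendly service",
]
pairs = cooccurrence(comments)
print(pairs[("service", "slow")])  # 2: they share two of the three documents
```

With a large enough data set, these raw counts can be normalised by each word’s individual frequency (pointwise mutual information is one common choice) to give a correlation-like measure of how strongly two concepts are linked.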
Looking at the patterns of correlation between words is interesting in its own right, but it also underpins the work we may go on to do looking for ways to categorise the topic of each comment.
What we’ve done so far is interesting, sometimes even insightful, and it’s useful to be able to think quantitatively about text data. But it isn’t giving you what I know you’re reading this article to get to. You want a computer that can do all the hard work of reading loads of comments and coding them up into themes. So... is that possible? Sort of. Let’s start by considering what happens when you read through comments and code them:
1. You use your knowledge of English (or whatever language) to figure out what a comment is ‘about’, and then assign it a code or theme. To a large extent this is based on the words in the comment.
2. Usually there’s a bit of back-and-forth between comments and coding frame while you figure out to what level of detail each theme needs to be broken down, striking the right balance between usability and precision.
3. Often you combine the text of the comment with your knowledge about the business to make ‘intelligent’ inferences about meaning.
In those terms, a summary of current text analytics is:
1. It’s relatively easy to train a computer to categorise based on words.
2. The person training the model will refine the categories as they go, striking the same balance between usability and precision.
3. This step is, as it stands, very difficult. It requires an understanding of what causes issues, not just a grouping of words, and that requires more complexity. It’ll happen one day, but the tools that will help here (e.g. ‘deep learning’ networks) are cutting-edge and still being developed.
In basic terms, the best you can expect is for a computer to do roughly as well as a human who knows absolutely nothing about your business.
The question now becomes one of cost-benefit. If text analytics maxes out at ‘as good as a cheap human’, maybe it’s cheaper to pay a human than it is to train and refine software. Why do you think Amazon employs people in its warehouses? Because they bring a quintessential humanity to the picking and packing; or because it’s cheaper (for now) than automation? The decision criterion here is not quality, but quantity. If you have 200 comments to analyse then text analytics is a pointless waste of time and money. If you have 200 million it’s a no-brainer. What we can say for sure is that if you have fewer than 1,000 documents (i.e. 1,000 comments for an individual question probed in a survey, not for the whole survey) it will be quicker to read and code the comments than it will to teach a machine how to analyse them.
In most practical applications, setting up classification models still requires a surprising amount of human effort. Most models use ‘supervised learning’, which basically means that the computer needs a sample of human-coded documents from which to learn. In principle you can cheat the learning process by using off-the-shelf models that may suit your situation (e.g. one trained on comments from a customer satisfaction survey in your industry), but most organisations prefer to develop their own.
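To show what ‘learning from a sample of human-coded documents’ means in practice, here is a minimal Naive-Bayes-style classifier. The training comments and themes are invented, and a real model would need far more examples and more careful engineering; this is a sketch of the principle, not a production tool.

```python
import math
from collections import Counter, defaultdict

def train(labelled_comments):
    """Learn word counts per theme from human-coded examples."""
    word_counts = defaultdict(Counter)
    theme_counts = Counter()
    for text, theme in labelled_comments:
        theme_counts[theme] += 1
        word_counts[theme].update(text.lower().split())
    return word_counts, theme_counts

def classify(text, word_counts, theme_counts):
    """Pick the theme with the highest (smoothed) log-probability."""
    vocab = {w for counts in word_counts.values() for w in counts}
    best, best_score = None, float("-inf")
    for theme, count in theme_counts.items():
        score = math.log(count / sum(theme_counts.values()))
        total = sum(word_counts[theme].values())
        for word in text.lower().split():
            # Laplace smoothing: unseen words don't zero out the probability
            score += math.log((word_counts[theme][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = theme, score
    return best

sample = [
    ("delivery was slow", "delivery"),
    ("late delivery again", "delivery"),
    ("staff were rude", "staff"),
    ("friendly helpful staff", "staff"),
]
wc, tc = train(sample)
print(classify("the delivery was late", wc, tc))  # delivery
```

The point to notice is where the human effort sits: every row of `sample` is a judgement a person had to make, and the model can only be as good as that coded sample.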
How far away are we from getting computers to do the whole process? Perhaps not all that far. It is possible to get the computer to look at the words that appear in documents and, based on patterns of which words tend to occur together, group documents together. ‘Topic modelling’ is the broad term for this, and there are a number of technical approaches that can be quite successful.
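Proper topic models (Latent Dirichlet Allocation being the best-known) are beyond a short sketch, but the underlying intuition - group documents whose words overlap, with no human labels at all - can be shown with a crude greedy clustering on word-set similarity. The comments are invented, and this stand-in is far simpler than a real topic model.

```python
def jaccard(a, b):
    """Word-set overlap between two documents (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b)

def group_documents(documents, threshold=0.25):
    """Greedily assign each document to the first group it resembles."""
    groups = []  # each group is a list of word sets
    for doc in documents:
        words = set(doc.lower().split())
        for group in groups:
            if any(jaccard(words, member) >= threshold for member in group):
                group.append(words)
                break
        else:
            groups.append([words])
    return groups

comments = [
    "delivery was slow",
    "slow delivery again",
    "staff were rude",
    "rude staff on the phone",
]
groups = group_documents(comments)
print(len(groups))  # 2: a 'delivery' cluster and a 'staff' cluster emerge
```

Notice that nobody told the computer what the themes were; they emerged from co-occurrence alone. That is the promise of topic modelling - and, with very short survey comments, also its weakness, since there are so few words per document for the overlap to work with.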
Unfortunately there are, as yet, few good examples when it comes to using topic modelling with survey open-ends. Part of the problem is that our ‘documents’ are much shorter than newspaper articles, which means there are relatively few terms for us to base our classification on. We’ll solve this problem, though. Either with topic modelling, or with even more sophisticated tools such as ‘deep learning’ neural networks. It’s only a matter of time. I think a lot of progress will have been made by 2020.
Figuring out what customers are talking about, and grouping comments into themes, is useful. No question about that, particularly if we can start to look at how these themes change over time or look at differences between types of customer. But what difference does it make?
Perhaps the most exciting thing made possible by thinking analytically about words is that we can start to incorporate them in predictive models. This means that we can combine the words people use with the scores they give us, and potentially other information such as behaviour, to build more complete predictive models. What can we predict? The starting point is often overall satisfaction or NPS, but there’s no reason to stop there. Why not look at actual defections? Or sales? The more insight we have into each individual’s emotional landscape, the more likely we are to be able to predict their behaviour.
Scores are useful for this, and (unlike some) I doubt we’ll be getting rid of them any time soon, but why not take advantage of the verbatim comments as well? Think about the difference between these two customers:
“6 out of 10 - Yeah, no problems. Did what it said on the tin.”
“6 out of 10 - It was just okay. I’ll probably try someone else next time.”
Neither of these customers is particularly happy, but if we can make use of their words we’re much more likely to spot that the second is the one at risk of straying.
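The two responses above can be turned into feature rows for a predictive model by combining the score with simple word-derived flags. The ‘churn phrases’ here are invented for illustration; a real model would learn which words and phrases matter from outcomes such as actual defections.

```python
# Hypothetical phrases that might signal intent to leave
CHURN_PHRASES = ["someone else", "cancel", "switch", "last time"]

def features(score, comment):
    """Combine the survey score with a word-derived churn signal."""
    text = comment.lower()
    churn_signal = sum(phrase in text for phrase in CHURN_PHRASES)
    return {"score": score, "churn_signal": churn_signal}

a = features(6, "Yeah, no problems. Did what it said on the tin.")
b = features(6, "It was just okay. I'll probably try someone else next time.")
print(a, b)  # same score, different churn signal
```

On score alone the two customers are indistinguishable; once the words are folded in as features, a model has something to separate them with.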
This has been a brief summary of a very deep field. We’re flirting with Artificial Intelligence, and embracing its more respectable cousin Machine Learning. There’s no question that as machine learning tools develop they will take over more and more tasks that were previously the preserve of humans. Understanding text is just one example, although it’s certainly a big one.
So what should you do about it? If I could choose three take-aways, they’d be these:
- Start thinking quantitatively about survey open-ends. There’s a lot of potential value in doing so, and some of it is relatively easy to get to.
- Don’t get excited by ‘text analytics’, the sexy buzzword. Think hard about why you want to do it, and honestly about whether it’s the right tool for the job. Then, when you have a cogent answer, get excited about being able to solve a real problem that was impossible before.
- If your problem is interesting enough, then it’s worth taking the time to solve it properly. Be prepared for a considerable investment of time to refine your model.