Believe Data

We were sailing back to our home port when a dense fog descended. Suddenly we couldn’t see more than a boat length ahead. My father, a mariner by profession, plotted a course and steered by it, sending my brother and me forward as lookouts.

My mother was convinced we were sailing in the wrong direction, that we’d steered off course (and this was before the reassurance of GPS). “No,” said my father, “you must trust your instruments.”

We made it safely home; it was an early lesson in believing data.

The amount of data produced and collected every day continues to grow. “Big Data” is a well-known, although poorly understood, term. In many companies we’ve moved on to “data-driven decisions”. But we’re not always good at believing the data.

I was in a meeting recently where the most senior person in the room looked at a graph of Twitter follower growth and said, “I just don’t believe this data”. The data showed that the goals for follower numbers would not be met. Leaving aside the argument over whether follower numbers are a good goal, the data don’t lie. If there’s a straight line of progress that won’t reach the goal, then you need to change something or accept missing the goal.
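
That kind of straight-line projection is easy to check yourself. Here is a minimal sketch, with made-up follower counts and a hypothetical goal, that fits a line to the observed growth and extrapolates it to the deadline:

```python
# Sketch: extrapolate linear follower growth to a deadline.
# All numbers below are illustrative, not from the meeting in question.
weeks = [0, 1, 2, 3, 4]                      # weeks elapsed
followers = [1000, 1040, 1075, 1115, 1150]   # observed counts

# Least-squares slope and intercept, computed by hand.
n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(followers) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, followers)) \
        / sum((x - mean_x) ** 2 for x in weeks)
intercept = mean_y - slope * mean_x

goal, deadline_week = 2000, 12               # hypothetical target
projected = intercept + slope * deadline_week
print(f"Projected at week {deadline_week}: {projected:.0f} (goal {goal})")
# If projected < goal, the data says: change something or accept the miss.
```

Disbelieving the printout doesn’t change the slope; only changing what you do changes the slope.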

It made me think about when we believe data and when we should be sceptical.

We tend to measure progress against an expected path, and in a large organisation we invariably report that progress upwards. In our plans and projections that progress follows a nice upward curve. But the reality is different: every project encounters setbacks, and the graph is more jagged than smooth.

In fact a smooth graph, where targets are always met, should raise questions.

Years ago I was chatting to a guy who had left his previous company after about four months. He left because the targets for the quarter were increased by 25%, and everyone met them. As an experienced business person he knew that a situation where every business unit meets a stretch goal in the first quarter it is applied is very, very unlikely. His suspicions were raised and he left as quickly as he could. A year later the company collapsed under its own lies. The company? Enron.

In his articles (and books) Ben Goldacre campaigns for greater journalistic care in reporting data, and better education on scientific method. He points to the dangerous habit of pharmaceutical companies of cherry-picking their data, choosing studies that support their product and ignoring those that don’t.

I said earlier that we should trust the data, but we also need to know how the data was collected, what errors might be inherent in the data collection methodology, and what limits there might be to interpreting the data. This should be part of everyone’s mental toolkit. It would help us evaluate all those advertising claims, refute 90% of the nonsense on the internet, be honest about progress to goals, and finally make data-driven decisions.


Image: Data via pixabay

Big Data

Big Data is often touted as a solution to all our problems, a panacea for all ills, often by people who struggle to define it. So what is big data, and what kind of problems has it solved?

Big data refers to sets of data so big and complex that they cannot be analysed with traditional methods and tools, but which yield new value when analysis is achieved.

Google Translate is an example of a problem solved by the use of big data. Although the translations are imperfect, they are often good enough to understand what the writer intended, whatever language it was written in. Google does this by statistically analysing millions of online documents that exist in multiple languages and working out the most likely correct translation. The more documents available that have been accurately translated by humans, the more accurate the Google translation will be.

Big data analysis has been used to predict maintenance needs for UPS, New York City Council and various car manufacturers. It’s been used in healthcare to predict the onset of infections in newborns, and outbreaks of flu.

So it sounds like it could solve some tough business problems, and it can. But it has limits.

  • Messiness of data makes it tricky to analyse and interpret. Google Translate occasionally gets a Dutch–English translation completely wrong, and that is a language pair with millions of documents. You need good analytical expertise and data governance to get the valuable insights out of the data.
  • Hidden biases in data collection: if you’re relying on smartphone data, for example, you are probably selecting against the lowest income earners.
  • It identifies correlation, but correlation doesn’t explain causality and doesn’t necessarily tell you what to do.
  • Privacy concerns relating to the collection, use and reuse of data. People may not realise that if enough anonymised data is combined it is possible to identify an individual.
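
The correlation point is worth seeing in numbers. This toy simulation uses the classic ice-cream-and-drownings example: both synthetic series are driven by temperature, so they correlate strongly even though neither causes the other. All the data here is invented for illustration.

```python
# Sketch: correlation without causation, on synthetic data.
# Ice-cream sales and drowning incidents both track temperature;
# neither causes the other.
import random

random.seed(0)
temperature = [random.uniform(10, 35) for _ in range(200)]
ice_cream = [2.0 * t + random.gauss(0, 3) for t in temperature]
drownings = [0.5 * t + random.gauss(0, 1) for t in temperature]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(f"r = {pearson(ice_cream, drownings):.2f}")  # strong correlation
# Acting on this correlation (ban ice cream?) would miss the real driver.
```

A big data pipeline will happily surface correlations like this one; deciding what to do about them still needs a human asking what the underlying driver is.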

And sometimes all that extra data may induce a sort of paralysis by analysis, a belief that you could make the perfect decision with just a little more data.

Right now we’re only beginning to unlock the value of big sets of data, and it’s still very much in the hands of the experts. It’s going to take some re-learning for managers and business leaders to ask questions that big data can answer, and to understand that correlation does not imply causation.

Image: geralt via pixabay