The era of big data, machine learning, artificial intelligence and algorithms is here. We are told that the future belongs to decisions driven by insights from data, whose objectivity is supposedly superior to human subjectivity.
Sport has not been immune to this spreading ideology. Basing decisions on data, from player recruitment to evaluating individual performance, has gained widespread acceptance. This was helped by Moneyball, the book and later the film starring Brad Pitt, which told the story of how a baseball team used statistical insights to achieve success. Data scientists now work at top teams.
The truth is that, in all the excitement about what data promises, its capabilities have been overplayed. It can certainly help improve decisions, but it cannot completely eliminate subjectivity, nor is human judgement obsolete if we want our decisions to be as effective as possible.
The world is hugely complex, and this is equally true of sport. Even games with minimal rules and limited choices, like chess, have more possible games than there are atoms in the observable universe!
Making decisions is about predicting future outcomes, but the environment comprises so many variables and possible situations. How can we account for such a vastly complicated world that is impossible to fully comprehend?
The truth is we can’t. So, we simplify. We convert the reality of a world of complex systems with interconnections of numerous elements and feedback loops into one based on basic linear cause-and-effect relationships; i.e. changing this here will consistently produce that effect there.
But this comes with a cost. In reducing the complexity we risk losing the nuance and arriving at conclusions that might not provide a true understanding of reality.
In the filtering process we try our best to consider only the relevant data (signal) that impacts the outcome of our decisions and dismiss the irrelevant (noise). However, it is unavoidable that we unintentionally keep irrelevant data and omit some of the relevant.
With the information captured, we try to identify patterns that point to likely cause-and-effect chains, which we can then use to guide our future decisions and actions. To give a straightforward example, maybe we find a link between having more ball possession and an increased chance of winning.
This belief becomes a useful rule of thumb. We apply these shortcuts to guide us as we go through the world. They aim to reveal underlying truths while preventing us from becoming overwhelmed by having to rigorously assess the best option every time we encounter a new situation.
But they come with a risk. In the process of arriving at these conclusions, either the quality of the input data wasn’t good enough, or the cause-and-effect relationship observed was misjudged and our assumption is false. This type of error is known as a bias.
For the ball possession example, maybe information about the opponent’s tactics was not considered in the model, so having more possession against certain styles of play might have no influence or even make us worse off. Or possibly the link between more possession and winning wasn’t causal: one doesn’t directly lead to the other. So players were encouraged to increase possession, but they stopped making forward passes and the team’s performance deteriorated.
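The non-causal case in the possession example can be made concrete with a toy simulation. This is only a sketch under invented assumptions: the hidden "stronger side" factor and every probability below are made up for illustration, not taken from real match data. A hidden factor drives both possession and results, while possession itself has no effect on winning.

```python
import random

def simulate_matches(n_matches, seed=0):
    """Toy model: a hidden factor drives both possession and results.

    All probabilities are invented for illustration only.
    """
    rng = random.Random(seed)
    matches = []
    for _ in range(n_matches):
        stronger = rng.random() < 0.5  # hidden factor: are we the stronger side?
        # Stronger sides tend to keep the ball more...
        high_possession = rng.random() < (0.8 if stronger else 0.2)
        # ...and win more, but the win chance depends ONLY on strength.
        won = rng.random() < (0.7 if stronger else 0.3)
        matches.append((stronger, high_possession, won))
    return matches

def win_rate(matches):
    """Share of matches won in the given list."""
    return sum(won for _, _, won in matches) / len(matches)

matches = simulate_matches(20000)

# Naive comparison: high-possession matches versus low-possession matches.
high = [m for m in matches if m[1]]
low = [m for m in matches if not m[1]]
naive_gap = win_rate(high) - win_rate(low)

# Like-for-like comparison: look only at matches where we were the stronger side.
strong_high = [m for m in matches if m[0] and m[1]]
strong_low = [m for m in matches if m[0] and not m[1]]
adjusted_gap = win_rate(strong_high) - win_rate(strong_low)

print(f"Naive win-rate gap:    {naive_gap:.3f}")
print(f"Adjusted win-rate gap: {adjusted_gap:.3f}")
```

In this made-up world the naive comparison shows a clear advantage for high possession, yet the gap all but disappears once like is compared with like. A coach who acted on the naive figure would be chasing a pattern, not a cause.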
A good example of how we simplify but lose important context is the NFL Draft. Teams assess players through extensive testing that incorporates speed, strength, power, agility, various technical skills, psychometric ability, physical measurements, injury evaluations and cognitive capacity.
Yet the process is still consistently unreliable at identifying who will be successful. In fact, possibly the greatest quarterback in NFL history was the 199th pick, and that was among only the players entering the league in a single year (which calls into question those who claim they can identify children who will become elite players, but that’s a whole other topic).
The fact is a player’s true playing ability depends on those areas tested and many more, plus how they all interact with each other. This is the real world of complex systems with interconnectedness of numerous elements that can’t be accurately replicated when simplifying.
The general process of arriving at an assumed cause-and-effect relationship between two things is the same for computers as it is for humans. The difference is that a computer uses a database, mathematics and computer processing, while humans use memory and the brain’s processing power to subconsciously identify patterns.
Both are very good at giving us a generally accurate picture of how things work but both have biases and subjectivity that occasionally lead to errors. This is because, due to the complexity mentioned earlier, there is always a gap between simplified models of the world and reality.
It is not a case of objective data versus subjective human intuition as is often claimed.
So, if both have biases, which method should be preferred for making decisions? Neither. Instead, they should be used together because their differences can cancel each other out.
When people at a fair guess the number of sweets in a jar, the average of everyone’s guesses is usually very close to the actual amount. Randomness means the errors that overestimate are cancelled out, more or less, by the errors that underestimate. What remains is the information based on true beliefs.
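The sweets-in-a-jar effect is easy to reproduce in a small simulation. The sketch below assumes each guess is the true count plus independent, unbiased random error; the jar size, number of guessers and spread are hypothetical values chosen for illustration.

```python
import random
import statistics

def simulate_crowd(true_count, n_guessers, spread, seed=0):
    """Return guesses equal to the true count plus independent, unbiased noise.

    Assumption for illustration: errors are independent and centred on zero.
    """
    rng = random.Random(seed)
    return [true_count + rng.gauss(0, spread) for _ in range(n_guessers)]

true_count = 500  # hypothetical number of sweets in the jar
guesses = simulate_crowd(true_count, n_guessers=1000, spread=150)

# Error of the crowd's average guess.
crowd_error = abs(statistics.mean(guesses) - true_count)

# Typical error of a single guesser.
typical_individual_error = statistics.mean(abs(g - true_count) for g in guesses)

print(f"Error of the crowd average:  {crowd_error:.1f}")
print(f"Typical individual error:    {typical_individual_error:.1f}")
```

The cancellation relies on the errors pointing in different directions. If everyone shares the same bias, for example all underestimating because the jar looks smaller than it is, averaging will not remove it, which is why combining sources with different kinds of error matters.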
Further, each has different strengths and weaknesses. There is a version of chess called freestyle where players are allowed to use computers. Amateur players with everyday computers won tournaments against the best supercomputers. While computers have the advantage of immense calculation power to explore data for patterns, humans can apply a strategic outlook whilst integrating broad concepts and principles.
Many fields outside sport have attempted to rely exclusively on data-based decision making. The initial optimism about better outcomes was misplaced, and it was realised that incorporating human judgement is essential. Examples include court judges deciding whether suspected criminals should be granted bail after assessing the chances they will reoffend, and pathologists trying to identify whether human tissue is cancerous.
The push for decisions based only on data suggests this lesson is yet to be learned in sport. The simplification from the real world means the utopian ideal of perfectly accurate decisions is unattainable, but using a hybrid method will get us as close as possible.
At a time when it is popular to adopt dogmatic positions at the extremes, in this case either the old-school traditionalist who is out of touch or the numbers geek who doesn’t understand the game, there needs to be some balance. Truth often lies in the less glamorous position, nearer the middle of the continuum.
In sport, just as in life, the approach we choose to take is determined by our beliefs about the relationships that exist in the world. If we take this action, we’ll get this result. In recent years the increased capabilities of computers and data collection have provided us with a new tool that can help us arrive at better conclusions and decisions. But it would be foolish to discount the value of human intuition, a tool that has evolved over millions of years for the same purpose.