Nowadays, data is often described as the most valuable commodity in the world, having taken the crown from oil. As a result, companies invest far more heavily in technology and data science than in most other parts of the business.
How to handle all that data then becomes a challenge, not only for data scientists, who have to analyse it objectively, but also for customers, who care about their privacy. In other words, the advance of data is a double-edged sword. It can help businesses optimise their services and target problems more accurately, but it can also be a weapon for manipulation, aimed at both customers and the company itself.
One of the biggest pitfalls for a data scientist is survivorship bias. When I was working in the Fast-Moving Consumer Goods (FMCG) industry, I found that brand managers and category managers were selective about the data they used to present brand performance. Of course, it was wrong, but they thought they were doing the right thing at the time. This is how bias works: it affects people without them realising it.
The most representative example is a study conducted on aircraft during WWII. The United States armed forces faced a dilemma during the war because returning bomber planes were riddled with bullet holes and they needed better ways to protect them.
The army knew they needed armour to protect their planes, but the question was, “Where should they put it?” When they plotted out the damage these planes were incurring, it was spread out, but largely concentrated around the tail, body and wings.
But Abraham Wald, a statistician at the Statistical Research Group (SRG), made a crucial observation: the military would be making a terrible mistake by upgrading the armour along these sections of the plane. Why? Because the military was only looking at the damage on returned planes. They hadn’t factored in the damage on planes that didn’t return.
The planes that didn’t return were the ones that had been hit in a place not seen on the returned planes: their engines. Unlike the body, tail and wings, the engine was extremely vulnerable. Once hit there, planes went down, and they never made it back home to have their damage charted.
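To see how the returned planes can mislead, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the four sections, the equal chance of being hit in each one, the three hits per plane and the loss probabilities are invented numbers, not historical data.

```python
import random
from collections import Counter

SECTIONS = ["engine", "tail", "body", "wings"]

# Hypothetical chance that a single hit in a section brings the plane down.
LOSS_PROB = {"engine": 0.8, "tail": 0.1, "body": 0.1, "wings": 0.1}


def simulate(n_planes=10_000, hits_per_plane=3, seed=0):
    """Return hit counts per section for all planes and for returned planes only."""
    rng = random.Random(seed)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Every section is equally likely to be hit (an assumption).
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        # The plane returns only if it survives every one of its hits.
        survived = all(rng.random() > LOSS_PROB[s] for s in hits)
        all_hits.update(hits)
        if survived:
            returned_hits.update(hits)
    return all_hits, returned_hits


def share(counter, section):
    """Fraction of the recorded hits that landed on the given section."""
    return counter[section] / sum(counter.values())


all_hits, returned_hits = simulate()
for section in SECTIONS:
    print(f"{section:>6}: all planes {share(all_hits, section):.1%}, "
          f"returned planes {share(returned_hits, section):.1%}")
```

Under these made-up numbers, engine hits account for about a quarter of all hits but only a small fraction of the hits visible on the planes that came back, which is exactly the gap Wald pointed out.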
So, how relevant is survivorship bias to our business? The most straightforward example is entrepreneurship. Many young people with good ideas look up to Amazon, Facebook and Tesla. They believe they can achieve the same if they turn their ideas into a real business. However, a commonly cited figure is that 90% of new businesses fail within 10 years. The success stories we have all been told are rare cases. For this post, let’s have a look at Warren Buffett’s illustration of survivorship bias in “The Superinvestors of Graham-and-Doddsville”. Early on, he tells a wonderful short story:
Let’s assume we get 225 million Americans up tomorrow morning and we ask them all to wager a dollar. They go out in the morning at sunrise, and they all call the flip of a coin. If they call correctly, they win a dollar from those who called wrong. Each day the losers drop out, and on the subsequent day the stakes build as all previous winnings are put on the line. After ten flips on ten mornings, there will be approximately 220,000 people in the United States who have correctly called ten flips in a row. They each will have won a little over $1,000.
Now this group will probably start getting a little puffed up about this, human nature being what it is. They may try to be modest, but at cocktail parties they will occasionally admit to attractive members of the opposite sex what their technique is, and what marvellous insights they bring to the field of flipping.
Assuming that the winners are getting the appropriate rewards from the losers, in another ten days we will have 215 people who have successfully called their coin flips 20 times in a row and who, by this exercise, each have turned one dollar into a little over $1 million. $225 million would have been lost, $225 million would have been won.
By then, this group will really lose their heads. They will probably write books on “How I turned a Dollar into a Million in Twenty Days Working Thirty Seconds a Morning.” Worse yet, they’ll probably start jetting around the country attending seminars on efficient coin-flipping and tackling skeptical professors with, “If it can’t be done, why are there 215 of us?”
By then some business school professor will probably be rude enough to bring up the fact that if 225 million orangutans had engaged in a similar exercise, the results would be much the same — 215 egotistical orangutans with 20 straight winning flips.
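The arithmetic in the story is easy to check. The sketch below assumes a fair coin, so half of the remaining players drop out each morning and each winner's stake doubles; the function names are mine, chosen for illustration.

```python
PARTICIPANTS = 225_000_000  # starting population quoted in the story


def expected_survivors(participants: int, flips: int) -> float:
    """Expected number of people who call `flips` fair coin tosses correctly in a row."""
    return participants / 2 ** flips


def stake_after(flips: int, initial_stake: int = 1) -> int:
    """Stake in dollars after `flips` wins, doubling each round as winnings are re-wagered."""
    return initial_stake * 2 ** flips


for flips in (10, 20):
    print(f"{flips} flips: about {expected_survivors(PARTICIPANTS, flips):,.0f} winners, "
          f"each holding ${stake_after(flips):,}")
```

This gives roughly 219,727 winners holding $1,024 each after ten flips, and roughly 215 winners holding $1,048,576 each after twenty, in line with the "approximately 220,000" and "215" in the story. No skill is involved at any point: the survivors exist purely because the starting population was large.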
Survivorship bias frequently happens in data analysis as well. For example, companies value customer feedback, and rightly so, since success is tied to customer relationships. When businesses get negative feedback, they’re eager to dig in and figure out what went wrong. But studies suggest that only the most vocal customers tell you how they feel. Everyone else will either give the company another chance or just leave.
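Consider this: of the 90% of customers who don’t say anything, 78% leave. The 10% who complain are 90% more likely to stay. So the feedback you collect comes from the customers who are least likely to leave. Therefore, instead of focusing only on the unhappy customers who speak up, look at the behaviour of the rest of your customers as well, including your happiest ones.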
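To put those percentages into headcounts, here is a small worked example. The cohort of 1,000 customers is a hypothetical number picked to keep the arithmetic round; the two percentages are the ones quoted above. The "90% more likely to stay" figure is left out because it can be read in more than one way.

```python
COHORT = 1_000         # hypothetical cohort size, chosen for round numbers
SILENT_PCT = 90        # share of customers who say nothing (figure from the text)
SILENT_LEAVE_PCT = 78  # share of silent customers who leave (figure from the text)

silent = COHORT * SILENT_PCT // 100
complainers = COHORT - silent
silent_leavers = silent * SILENT_LEAVE_PCT // 100

print(f"Customers who complain (visible in feedback data): {complainers}")
print(f"Customers who stay silent: {silent}")
print(f"Silent customers who leave without a trace: {silent_leavers}")
```

In this cohort the feedback data covers 100 people, while 702 customers walk away without ever appearing in it.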
That being said, how do we avoid being duped by survivorship bias? Firstly, before making any judgment, my rule of thumb is to consider what you don't see. What are the limitations of the data? What is missing from the data and the analysis? You probably won't find the answer right away, but being aware that there is something you don't know is always a good start.
Secondly, we have to embrace openness and transparency, and accept the fact that both good and bad things can happen in the business. So ask your team members for their opinions and perspectives, including those who disagree with you most of the time.
Thirdly, a team from diverse backgrounds can spark more valuable insights than a more traditional one. However, it is worth noting that without openness and transparency, companies cannot benefit from diversity. They will only get conflict.
Last but not least, data scientists have to communicate with relevant stakeholders continuously. Data is more than numbers: it is numbers with meaning, generated by different activities and events. Thus, a good business analyst doesn't sit in front of the computer all day but keeps asking the right questions of the right people to show the right picture of the data.