In data science and analytics today, we all face the same key question: which technique or methodology solves which problem? In the many interviews I have conducted over the last several years, I have heard every fancy algorithm paraded around by candidates who do not really understand why it should be used. The ones I hear most often are support vector machines, without an understanding of what support vectors are; deep learning, without an understanding of what neural networks are; random forests, without an understanding of what actually makes them random; and the naive Bayes classifier, without an understanding of why it is called ‘naive.’
Package-driven data science languages such as R, Python, and SAS have made it extraordinarily easy to use complex algorithms without understanding the underlying statistics, mathematics, and optimization principles. This is the famous “hammer and nail analogy”: when you have a hammer, everything looks like a nail. It has become commonplace for people to open R or Python, try every algorithm, and simply pick the one with the highest accuracy, without understanding what the business and the problem actually require. Not all problems require deep learning. No, really, they don’t! Some common challenges in the insurance industry, for example, may need nothing more than association rule mining or decision trees. Others may need more complex modeling and simulation for risk-based analyses.
One of the first exercises I used to give my doctoral research assistants, and my teams in industry, was to code an entire algorithm in R or Python without using any packages. This gives them a deep understanding of how algorithms work internally. Data science and analytics are part art and part science, so use them wisely. Your goal is to solve a business challenge and drive business value, not to show off whichever technique is the latest trend on social media. Do not use a chainsaw where a scalpel will do.
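As one illustration of that exercise, here is a minimal sketch of a categorical naive Bayes classifier written in plain Python with only the standard library (`math`, `collections`). The function names, the add-one smoothing choice, and the toy insurance-style data are hypothetical and chosen for illustration; they are not taken from any particular course or project. Writing it by hand makes the ‘naive’ part explicit: features are assumed to be independent of one another once the class is known, so each class’s score is just a prior multiplied by a product of per-feature probabilities.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Fit a categorical naive Bayes model purely by counting.

    rows:   list of feature tuples, e.g. ("young", "urban")
    labels: list of class labels, one per row
    """
    class_counts = Counter(labels)
    # feature_counts[label][i][value] = times feature i took `value` within class `label`
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    # vocab[i] = distinct values seen for feature i (needed for smoothing)
    vocab = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            vocab[i].add(value)
    return class_counts, feature_counts, vocab


def predict(model, row):
    """Return the class with the highest posterior log-probability."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class): the prior
        score = math.log(count / total)
        # The "naive" step: features are treated as conditionally independent
        # given the class, so the joint likelihood is a product (a sum of logs).
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing so an unseen value cannot zero out the product
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented purely for illustration
rows = [
    ("young", "urban"),
    ("young", "urban"),
    ("young", "rural"),
    ("senior", "rural"),
    ("senior", "rural"),
    ("senior", "urban"),
]
labels = ["claim", "claim", "claim", "no_claim", "no_claim", "no_claim"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("young", "urban")))   # claim
print(predict(model, ("senior", "rural")))  # no_claim
```

Working through the arithmetic by hand for `("young", "urban")` gives 0.5 × 0.8 × 0.6 = 0.24 for `claim` versus 0.5 × 0.2 × 0.4 = 0.04 for `no_claim`, which is exactly the kind of understanding a package call hides.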