Bigdata will soon be just—data.
Here’s the video from our talk at the Spark Summit in San Francisco, 2014, given while I was a Senior Data Scientist at Impetus (collaborative work with Dr. Vijay Agneeswaran, currently Director of Big Data Analytics at Sapient). [I am the guy in the white shirt, starting at the 1:45 mark.]
In the world of data science and analytics these days, we are all faced with the key question of which technique or methodology solves which problem. During the many interviews I have conducted over the last several years, I have heard all kinds of fancy algorithms being paraded around, without candidates really understanding why they should be used. The most common ones I hear from candidates are: support vector machines without understanding what support vectors are, deep learning without understanding what neural networks are, random forests without understanding what really makes them random, and naive Bayes classifiers without understanding why they are called ‘naive.’
The package-driven languages of data science like R, Python, SAS, etc. have made it extraordinarily easy for people to use all these complex algorithms without actually understanding the underlying statistics, mathematics, and optimization principles. This is the famous “hammer and nail” problem: when you have a hammer, everything looks like a nail. It has become commonplace for people to use R or Python to try every algorithm and simply pick the one with the highest accuracy measures, without really understanding what the business and the problem need. Not all problems require deep learning. No really, they don’t! Some of the common challenges in the insurance industry, for example, may simply need association rule mining or decision trees. Some may need more complex modeling and simulation for risk-based analyses.
One of the first exercises I used to give to my doctoral assistants in research or my team in industry was to code an entire algorithm without using any packages in R or Python. This gives candidates a deep understanding of the internal workings of algorithms. Data Science and Analytics are part art and part science. Use them wisely. Your goal is to solve a business challenge and drive business value, not to show off whichever technique is the latest and greatest trend on social media. Do not use a chainsaw where a scalpel will do.
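To give a flavor of that exercise, here is a minimal sketch of a naive Bayes classifier for categorical features, written with only the Python standard library. It is an illustration under stated assumptions, not the exact exercise I handed out: the function names (`train_naive_bayes`, `predict`), the toy weather data, and the choice of add-one (Laplace) smoothing are all made up for this example. What it does show is where the ‘naive’ comes from: each feature's probability is multiplied in separately, as if the features were independent of one another given the class.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> number of occurrences
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the class with the highest log-posterior score for one row."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior P(class)
        for i, value in enumerate(row):
            count = value_counts[label][i][value]
            # The "naive" step: add log P(value | class) for each feature on its
            # own, ignoring any interaction between features.
            # Add-one smoothing (an assumption here) avoids log(0) for unseen values.
            score += math.log((count + 1) / (n + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> did we go out?
rows = [
    ("sunny", "hot"), ("sunny", "mild"),
    ("rainy", "mild"), ("rainy", "cool"),
    ("overcast", "hot"), ("overcast", "cool"),
]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "hot")))   # no
print(predict(model, ("rainy", "cool")))  # yes
```

Writing even this much by hand forces you to confront the prior, the conditional probabilities, and the independence assumption, which is exactly what a one-line package call hides.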
Image Source: http://on.thestar.com/2jcpuYI
Have you seen the movies The Minority Report, The Terminator series, The Net, and Live Free or Die Hard (Die Hard 4.0)? This book is a print version of all those movies and numerous other flicks rolled into one. Sure, it’s written by the head honchos of Google, so I guess it deserves (or rather expects) to be read by people all over. But it could just as well have been written by someone anonymous down the street. The reason I say that is that, unless you have been living under a rock, it is common knowledge that there are no disconnected devices or disconnected individuals, at least in the developed world. And in the developing world, generations of people are leapfrogging into the connected era with smartphones, having skipped the personal computer revolution entirely. We are already in the era of smart money (bitcoins), smart homes, smartphones, smart cars, Amazon Echo, and Apple Siri, with the hyperloop and Mars colonization on the horizon, along with whatever else Schmidt and his team at Google are thinking up.
Don’t get me wrong, it is a decent book to understand the inherent dualities when it comes to everything around us going digital. Each chapter of the book examines how the many facets of our lives will be fundamentally transformed: ourselves, the people around us, institutions, and governments. Schmidt and Cohen also theorize on how the digitized world would influence terrorism and counter-terrorism efforts, how it can influence repressive regimes and the people who would rebel. There is also a dedicated chapter about how environmental and man-made catastrophes in the digitized world can unleash innovation to speed up the reconstruction efforts. A chapter that stood out was the one on the “Future of Revolution.” It discusses how ordinary citizens in the Arab Spring used technology to spread the message of freedom and brotherhood, and to coordinate peaceful protests despite technological and physical oppression by their respective regimes.
Each chapter in the book examines the pros and cons of the digital world. Each chapter has a “protagonist”: ourselves, governments, good people, or bad people. By the end of Chapter 2 or 3, it gets rather repetitive and, quite frankly, a little depressing. I am sure the book was intended to be thought-provoking as we step into the connected digital future, and it did its job! At the end of the book, I was wondering if I should relocate to a small village in a serene corner of the world, disconnect from the internet, grow my own food, and live out a simpler life with my family.
My rating is 3.0/5.0. I just had to finish it since I had started it. It was not a compelling read.
Originally written in 2016, edited in Jan 2017.
I cannot turn a page in a newspaper or browse for news on the internet without reading (and rolling my eyes) about the emergence, reemergence, takeover, new era, new age, and deluge of Big Data Analytics, or how machine learning algorithms like deep learning or some other cognitive, neuro, or learning technique are going to save the world. We all have heard some banalities being bandied about quite a bit…
My personal favorite rebuttal for all these is “Not everything that counts can be counted, and not everything that can be counted counts.” This quote was said to be hanging in Einstein’s office in Princeton (not sure if this is true or not, but the saying makes sense). With all due respect to data scientists and other analytics professionals (I am one of them), can we all please go easy on the hype and not make it all sound so cheesy? It’s like the dotcom bubble déjà vu all over again. Every startup I hear about is using some fancy ‘new’ algorithm, and every company is talking about how analytics will change the world. Sure, some will and should, for the better.
Let’s get a few things in order. Analytics and data science have been around for decades. They were just known by different, less appealing names: statistics, optimization, computer science, algorithms, etc. Clearly, none of them sounds as appealing as “Data Science” or “Analytics.” We should all be thankful that industry as well as academia woke up and took notice of “smart decision-making,” and I guess some amount of branding was necessary for it to be taken seriously. Duly noted.
Now, can we get back to doing good work and stop selling snake oil? All of us end up sounding ridiculous, naïve, and quite frankly a little annoying. The field runs the risk of being turned into a sham by some used-car salesmen (no offense to them). Let me give you a personal anecdote. I approached a conference organizer (in India) about submitting a proposal to speak at a conference. He unabashedly sent me a brochure with a detailed price list of how I could buy slots to talk about my ideas. Never once did he ask about my proposal, what the idea was, or even what the model / algorithm / application was. All he cared about was $$$.
The brochure even said I could pay extra to talk more (buy an entire session, that is). This is what knowledge in our world has come to. Who can sell the snake oil better… who can market things and make them sound better… who can come up with more cool-sounding jargon… who can create entire fake conferences where people pay to talk and ideas go to die. Sure, conferences cost money, but they need a rigorous review process, like the ones at KDD or most of the IEEE conferences.
So how do we stop this madness? I have a few pointers that some of you may agree with. I have already spoken to a few serious data scientists and they share my views.
If we do not give any value to our own profession, trust me, no one else will. We will all end up looking like used-car salespeople.
The key take-home message that Altman delivers through this book is that we live in an interconnected world and we neglect this axiomatic truth at our economic peril. Altman does a nice job of explaining some very complex concepts (credit default swaps, currency futures, and trading, among other complex financial jargon) in terms that any educated layperson can understand. The author essentially picks a random day in 2005, June 15th, and discusses how business transactions around the world are all interlinked and how a ripple in East Timor’s energy economy has an impact in Italy or India and vice versa.
What is conspicuously missing is any stargazing about the then-impending global economic depression. A chapter discussing subprime mortgages would have been prescient. Of course, that wasn’t the intent of the book, and in fairness to the author, hindsight is 20/20, but it did feel like a miss.
image source: http://images.macmillan.com/
The most interesting chapter of the book is the one on credit markets and currencies. With simple yet illustrative examples, the author paints a great picture of how truly connected the world has become. If there is inflation in the UK, people around the world choose to buy products from elsewhere. With reduced demand, the value of the Pound falls further. It is a slippery slope from there on. Such phenomena were seen around the world in countries such as Turkey. The last chapter, about how disruptive shocks can sometimes strengthen the economy, is quite interesting as well.
As the author admits, it is a piecing together of some events around the globe on that randomly chosen day. I was not entirely riveted by the book, but it’s a decent read nevertheless. My rating is 2.5/5.
While this was not the first book I started reading to understand Bitcoin, I am glad I gravitated towards it. I started with “Mastering Bitcoin” by Andreas Antonopoulos, but it turned out to be a heavy hitter, and with all the code in the book I was lost (I will certainly get back to it soon). I browsed through Amazon and the web for other books on Bitcoin and was led to this listing, which I had tweeted about in December 2016, and I picked up the book by Dominic Frisby. [image source: https://www.cryptocompare.com/coins/guides/the-best-bitcoin-blockchain-and-crypto-books-our-top-picks/]
Some people on Amazon complained about it being libertarian rambling, but I beg to differ. Frisby’s book read less like non-fiction and more like a crime-scene thriller, and I was hooked. I am glad I read the book, and now I am on to other books on Bitcoin. Frisby does a very nice job of introducing the topic to readers like me, who were interested and had read about Bitcoin in blogs and articles but had never read a complete book on it. It left me wanting to delve deeper into the world of Bitcoin (Satoshi, your paper is next on my reading list: https://bitcoin.org/bitcoin.pdf).
Frisby, I believe, leverages his varied life experiences as a comedian and sports commentator, among others, to make this book a great read. It begins by explaining what Bitcoins are and how they are made / mined. One of the most interesting phrases of the book appears toward the beginning, where Frisby says “In ordinary life…no one can create money, we only earn it. Bitcoin is different and it is possible to make them yourself…” This is perhaps the most fascinating aspect of Bitcoin, which has gotten millions around the world excited about its potential. Frisby then goes on to explain how Bitcoins are mined. He spares us the intricacies of the code and algorithms involved, but keeps the reader engaged with Bitcoin, the entire ecosystem, and all the “organisms” it has spawned.
There is a fair bit of a detective feel to the entire book, with Frisby (among others) trying to uncover who the real Satoshi is. He describes linguistic analyses, including of coding styles, of the people most often considered potential Satoshis. There are no firm conclusions drawn though, only some hints. There are also his interesting interactions with Ethereum’s Charles Hoskinson, who likens the Bitcoin revolution to the one that was unleashed by the Internet. Bitcoin at the end of the day is a decentralized monetary system, disconnected and independent from boundaries, borders, the bureaucracy, and to some extent the control of governments. That should enthrall everyone, shouldn’t it?
Read it; I am sure you will thank me for the recommendation…