In this ESG Video Blog, I discuss the current and future state of the Hadoop market.
The Hadoop hysteria is heating up lately, so I thought I'd chime in on what's real and what isn't. First of all, Hadoop is not a panacea. It won't replace every database, it won't eliminate the need for every other analytics platform, and it's not going to create world peace. Second, it's not just a science project anymore. It's being used widely, and in places you might not expect.
Let's remember what Hadoop is. It started around 2006 as an open source project built on ideas Google had published, and it essentially pushed the idea of an über data repository, where data and compute sit right next to each other in a massively scalable system. It consists primarily of two big chunks of technology. The first is MapReduce, a programming model that makes outlandish volumes of data tractable by splitting a job into "map" tasks that run in parallel on the machines holding the data and "reduce" tasks that combine their results. The second is the Hadoop Distributed File System, or HDFS.
In short, the design goal was to create a place where you could store outrageous amounts of data, with distributed compute power local to that data and a file system to manage it all, so you could do interesting things, like analytics, across massive data sets. Without it, you end up sampling your data, because you simply can't realistically run jobs that span petabytes of data the traditional way. That combination of cheap scale-out storage plus compute frameworks within the same system is really the magic of Hadoop.
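To make the map/reduce idea a bit more concrete, here is a minimal sketch of the classic word-count job, written as plain single-process Python. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the list of "blocks" stands in for file blocks that HDFS would spread across nodes, each of which Hadoop would map locally before grouping and reducing the results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(block: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one block of input.
    for word in block.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(blocks: List[str]) -> Dict[str, int]:
    # Hypothetical stand-in for a cluster: in Hadoop each block would be
    # mapped on the node that stores it; here everything runs in one process.
    intermediate = [pair for block in blocks for pair in map_phase(block)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    blocks = ["big data big compute", "data sits next to compute"]
    print(word_count(blocks))
```

The point of the split is that the map step needs only its own block, so it can run wherever that block lives; only the much smaller grouped intermediate results have to move between machines.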
A few companies have sprung up around Hadoop. Cloudera was the first to commercialize it. Their deal is to take the open source community science project and add enterprise features and functions, so that real companies that make real money can, ideally, run it for real profit. MapR is going after the same thing, but they use their own file system, not HDFS. And finally there's Hortonworks, the pure-play open source distribution of all this stuff. So why the huge valuations in this space? I think it's because early market traction has moved from tire-kicking science projects to commercial production. Cloudera alone has over 600 paying customers, and there are thousands and thousands of free installations out there.
What's interesting to me is that not everyone running this in production is a Fortune 1000 shop, where you'd expect it to be. Plenty are, but maybe half of all the companies running Hadoop in production are medium-sized businesses, and I didn't expect to see that at this point.
Where this business is likely to really take off is when we move to the next level. Right now, we're still using Hadoop primarily as the world's biggest bucket to store data in, and we need data scientists to get value out of it. No offense, but any time I hear the word "scientist" applied to a commercial endeavor, I think we're still way too early. Scientists created the iPhone. App developers made it worth a trillion dollars.
So the next wave of value from Hadoop will come when we stop talking about scientists and start talking about data artists: those who write the brilliant, easy-to-use apps that take advantage of the giant repository of data Hadoop provides. And it will happen. I just don't know when yet.