A curious reversal is happening around cloud-based big data

Remember when the cloud was just getting going? Somehow it morphed from the traditional concepts of hosting and XSP (I always hated that acronym) into a new class of service. Virtualization and management tools were a big part of facilitating this transformation, as was the sudden shift in economics. The problem with hosting and "other" managed services was that they usually weren't any more agile or any less expensive than doing it yourself.

Then when cloud computing got going properly, there was a widespread perception that the main advantages were:

  1. Renting is going to be cheaper than buying stuff with CapEx budgets.
  2. Deploying and managing infrastructure will become somebody else's problem.
  3. There wasn't really a number three, but all lists need at least three items.

The disadvantage of cloud that everyone moaned about was security, or the perceived lack thereof. Some people called this issue "control", but if it works, then who controls it really isn't so important. Availability was also regularly questioned, until businesses looked at their own IT departments' track records for outages and decided the cloud wasn't so bad at staying alive after all. There was also concern about the prospects of the cloud providers, i.e. "what happens to my data when they fail?", but as most customers gravitated to the winners and those winners consolidated their strengths, this issue faded away too.

So cloud is winning, because the advantages proved (mostly) true and the disadvantages proved (mostly) false. 

Enter the modern era of big data and analytics. Same situation as other applications and data, only more so, right? Wrong. Look at this chart from recent ESG research about the top advantages of cloud-based big data.




There were more choices on the list, but clearly security and availability are ranked very highly, meaning cloud is generally seen as safer than DIY. Time to value and time to deploy are also up there. I'd say this is unfair to all the hardworking folks in your IT departments, but they are the ones who answered the survey, so good on them for being self-critical and honest. "Data and applications already in the cloud" is interesting too, because it speaks to the gravity of big data. No one wants to ETL shuffle petabytes across a WAN or the Internets.

So what are the concerns about cloud-based big data and analytics? Would you look at that: security and availability again, in the same ranked positions. Clearly there are two schools of thought here, and roughly as many sceptics as optimists. Fair enough. More surprisingly, cost of compute and cost of storage are now seen as top disadvantages. I'm not sure how many can actually beat the price of a mega-scale cloud provider, but it seems some still believe they can. This is probably a direct result of the intensive workloads and extreme volumes of data in most big data use cases. And the worries about being tied to a specific vendor have resurfaced here.



Where does this leave us? Cloud-based big data is now frequently thought of as more expensive, and it is judged better about as often as it is judged worse on the security and availability fronts. That's a very significant shift in the market. Look at the major players:

Those are some of the biggest players, but there are many focused startups rapidly innovating in this space too:

I haven't even mentioned the "cloud-based data platform extensions approach" for vendor proprietary solutions. This camp has Oracle Cloud, Teradata Cloud, Splunk Cloud, and many others, all of which are worthy offerings in their own right.

So in summary, there is a lot going on here. Further, I fully expect the momentum toward cloud-based big data to accelerate greatly as IoT reinforces the need for analytics everywhere.


Topics: Data Platforms, Analytics, & AI; Cloud Services & Orchestration