In mid-2012, ESG found that only about 10% of organizations working on “big data” projects used public cloud services and infrastructure in some fashion as part of those projects. Few companies ran production big data instances in the cloud, and usage was experimental or for initial discovery purposes. Much has been written about security being an adoption hurdle for public cloud in enterprises; in fact, security was cited as the #1 concern for big data projects too, a double whammy for cloud plus big data. The next greatest challenge cited for big data was integration, and SaaS apps typically ran in silos for the first decade of the 2000s; in the last few years, cloud demand for integration exploded and many SaaS providers were caught napping. Looking at the evidence from six months ago, it sure seemed bleak for big data on cloud.
But, due to the appeal of quick provisioning enabled by cloud providers offering Hadoop-as-a-Service, and augmented by several analytics databases made available as-a-service, the latter half of 2012 saw a rapid change of fortune for big data on the cloud. Mind you, full-scale cloud-based big data implementations remain few and far between: security, data movement, and integration all remain key concerns. But let’s face it, the notion of SaaS big data is every bit as appealing, in terms of provisioning and ease of access/distribution, as SaaS apps. Also, at least in some cases, IT departments found that the type of infrastructure required for Hadoop and parallelized analytics fit nicely into the notion of cloud elasticity.
While many organizations have tapped into Google Analytics for web marketing analyses, it is a specialized use case, and therefore Google doesn’t make it to my finalists for Big Data Cloud Provider of the Year for 2012. Microsoft's Windows Azure, however, does make the short list, not just because it offers a healthy set of Microsoft’s and others’ relevant data, integration, and analytics services, but because, after a long quiet period, Microsoft started offering Hadoop-compatible big data services through its HDInsight offering. HDInsight includes a Hadoop connector for Microsoft SQL Server as well as Apache Hive drivers for Excel and ODBC. While HDInsight didn’t appear until October, it certainly places Microsoft in the midst of the larger big data movement going forward.
Just as HDInsight offers a big data bridge for Microsoft-oriented developers and data analysts, Joyent seems to be the cloud of choice for the on-the-cutting-edge analytics developer and data scientist. What I find particularly appealing is Joyent’s flexibility: either use your own analytics database, as long as it runs under Linux (and almost all do) or under SmartOS, Joyent’s own distribution of the skinny OS/hypervisor, or fire up one of three pre-defined cloud databases for big data. And for those wanting blazing bare-metal implementations of Hadoop, Joyent is a top performance choice. Hadoop ISVs, take note.
Winner: Amazon Web Services (AWS), however, is by far the big daddy of big data in the cloud. Let us attempt to count some of the ways. Hadoop on AWS through Amazon Elastic MapReduce? Yes. The most NoSQL databases for analytics and related apps on AWS? Check, including its own DynamoDB. Enhanced storage and data movement services optimized for big data purposes on AWS? Yes and yes. Plus, AWS runs several BI/analytics offerings, offers a variety of third-party public data feeds to plug into your analytics, and of course offers the type of native elastic infrastructure that all big data implementations would hope to find. One could legitimately argue that, with the exception of IBM, AWS has the most comprehensive set of offerings and partnerships of any big data provider.
Big Data Cloud in 2013: In Search of Safety, Speed, Scalability, and Simplicity
While the latter half of 2012 showed an uptick in interest in and use of public cloud for big data purposes, 2013 will still rate primarily as a “prove it to me” year for most enterprises. I am frankly a little shocked that some of the top security vendors in the world, and you know who you are, have done little to offer targeted solutions and partnerships to better secure big data in the cloud. Take a look at the Cloud Security Alliance’s big data working group and you will find a low level of activity and few participants. AWS has a great big data web page, but as you parse through it you will find barely a mention of security. Is there an opportunity for vendors here? Absolutely, and despite the lack of activity thus far, expect more focus on cloud security for big data in 2013.
Cloud competes primarily with appliances and the promise of on-premises “commodity hardware” associated with Hadoop implementations, but because it sits outside enterprises' data centers, public cloud is susceptible to network speed and capacity concerns. Maintaining healthy bandwidth for data movement is an essential ingredient of big data lifecycles. Data comes from everywhere: various on-premises applications, machine logs, SaaS apps, third parties, social media, sensors, etc. Cloud providers have an opportunity in 2013 to convince enterprises that the cost of bandwidth for data movement is not punitive, and that the rapid provisioning and flexible virtual infrastructure advantages of cloud are not offset by data movement costs and latency. Similarly, the scaling and memory requirements of big data analytics are unique, driven by query, not just data volume, calling for more carefully configured clusters. AWS began to attack the scalability and data pipeline issues late in 2012, some at re:Invent, and we expect to see others following suit during 2013, as well as additional forays from AWS to address these concerns.
And finally, today cloud service providers, whether offering their own solutions or partner solutions, treat the entire set of big data services as “throw it over the wall.” ESG believes, however, that the do-it-yourselfers will move into the big data minority in 2013, and enterprises will increasingly want productivity tools and applications for big data; that is, simplicity. ESG suggests, therefore, that cloud service providers invest in creating a clear set of services, managed through equally clear workflow and configuration options, and commit to partner services that also deliver productivity. It would be ironic if all the momentum built for big data on cloud ceased because cloud-based offerings didn’t keep up with on-premises and appliance-based offerings in terms of productivity.