Boiling the Ocean of Control Points in the Hadoop Big Data Market

First a Nod to Datameer and Mr. Popescu for the Diagram Above
For the past few weeks my mind has swirled around the question, "What are the control points in the Big Data market, and who owns them?" Why? Customers, investors, and the suppliers themselves need to understand who has or may accrue market power in Big Data.

Fortunately I spotted a pertinent tweet by Alex Popescu, who (1) had used a script to capture basic partnership data in the Big Data space, and (2) had turned that data into a compelling Big Data partnership visualization using Datameer, resulting in the diagram above. For more info see The Hadoop Ecosystem Relationships. And yes, I downloaded the Datameer tool and played with the visualizations, but frankly couldn't improve on Alex's. Though certainly not a perfect control point metric, the number of partnerships a Big Data supplier has does reflect its influence in the Big Data ecosystem. Thus the diagram offers a jumping-off point to consider control points in the Hadoop Big Data market.
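For readers who want to reproduce the "how many partnerships" count on their own data, here is a minimal sketch of how such a tally might be computed. It is not Popescu's script or Datameer's method. The function names and the placeholder vendors (VendorA through VendorD) are hypothetical and stand in for real partnership pairs.

```python
from collections import Counter


def partnership_counts(edges):
    """Count distinct partners per vendor from undirected (a, b) pairs."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-references
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return Counter({vendor: len(p) for vendor, p in partners.items()})


def top_partnered(edges, n=25):
    """Return the n most-partnered vendors, ties broken alphabetically."""
    counts = partnership_counts(edges)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Placeholder data only; these are not real vendors or real partnerships.
edges = [
    ("VendorA", "VendorB"),
    ("VendorA", "VendorC"),
    ("VendorB", "VendorC"),
    ("VendorA", "VendorD"),
    ("VendorB", "VendorA"),  # duplicate pair, counted once
]

for vendor, count in top_partnered(edges, n=3):
    print(vendor, count)
```

Using sets means a partnership announced by both sides is counted once. A raw count like this treats every partnership as equal, which is one reason it is only a rough proxy for influence.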

What is a Control Point?
Despite the incredible diversity of Big Data technologies, there are some critical technology and service control points, just as in any IT segment. For example, Apple (iOS), Google (Android), and perhaps Microsoft (Windows 8) own the key control points for smartphones: the operating systems and the primary app stores. All mobile roads eventually pass through those vendors. Similarly, VMware is the leader in a key control point for virtualization, which itself is a control point for Cloud. Typically in any technology market the suppliers who lead in the key control point(s) have the opportunity to reap the most influence and revenue out of the segment. Tech buyers tend to flock towards control points.

The Long List: Big Data Control Points Track the Big Data Solution Stack
Big Data requires its own solution stack and appropriately trained humans to develop, manage, and render benefit from that stack, but it also deeply integrates with other IT stacks and technologies. Servers, storage, bandwidth, databases of all kinds, data integration tools, algorithm development, software development tools, management software, machine learning, analytics engines, visualization tools, applications and services of various flavors all harmonize within the Big Data ecosystem. Check out all the suppliers in Popescu's diagram (some well-known suppliers are missing, but the list is nearly comprehensive). Out of this long list, which suppliers own the control points? For the exercise I primarily drew control point nominees from the top 25 most-partnered list represented in the bar chart on the right side of the graphic, though I made some exceptions.

Control Points Short List 1: Independent Hadoop Distribution Providers
At least in the Hadoop-oriented Big Data solution community, those suppliers who are the primary providers of "Hadoop distributions" are clearly nominees for control point winners. As suggested by the diagram, among the "independent" vendors, Cloudera, Hortonworks and MapR stand out.

Control Points Short List 2: Infrastructure and HaaS Providers
IBM (including Netezza), Oracle, EMC, HP, Cisco, and NetApp all show up on the top 25 partnered list. Companies adopting Big Data are concerned about obtaining enough servers, storage, and often-overlooked bandwidth for Big Data projects, and certainly these providers address those needs. In addition, these infrastructure providers partner quite directly with the independent Hadoop distribution providers, and even with one another (though not always happily) - forming control point powerhouses. Though VMware doesn't sell hardware per se, they do so as a proxy, so we would include VMware in this control point nominee list as well. Though not appearing in the top 25 bar chart, Dell certainly belongs on this list, and ESG also sees Supermicro as a perhaps less well-known infrastructure provider who exhibits no less commitment to Hadoop than their larger cousins. We also see firms like Emulex creating options for the bandwidth supply side for Big Data beyond the usual suspects.

ESG believes that the Cloud offers a nice infrastructure alternative for Big Data projects, and we refer to these as "HaaS" or Hadoop-as-a-Service offerings. Even though Amazon didn't make it to the top 25, we know they are there on a number of key fronts, both in terms of EMR and some key Hadoop distribution partnerships. We include Microsoft in this control point segment too with their Azure-based Hadoop services, and we are closely watching Google here as well. The aforementioned IBM and HP also offer HaaS.

Control Points Short List 3: Analytics Databases/Platforms
What kind of database, and related tools, do you use for Big Data? While certainly Hadoop and related open source options have been and will be used for many projects, a variety of next gen analytical databases/platforms, often columnar and/or in-memory in architecture, may rate as the focal point for many Big Data style analytics projects. They may operate in conjunction with or instead of Hadoop.

Here we see Vertica (HP), ParAccel, Aster Data (Teradata), Greenplum (EMC), SAP’s HANA and Attivio. Note that all these vendors have developed rich tool sets around their respective databases, such as data integration and visualization features, either on their own and/or through partnerships. Thus we could categorize these Big Data offerings as "Analytic Platforms." Typically we would associate MicroStrategy with "analytics platform" rather than with being a DB provider. We also cannot ignore some of the scale-out specialty databases here such as EnterpriseDB and DataStax, which offers a commercially comfortable version of the Hadoop-related open source database, Apache Cassandra. In fact, there is quite an explosion of new database entries into the market to deal with Web-scale and/or Big Data use cases, far too many to list here, which will be the focal point of a coming blog post. We also cannot forget the classical RDBMS and their related data warehouses as Big Data options, or at least data sources for Big Data, from the likes of Oracle, IBM, Microsoft and SAP/Sybase.

Control Points Short List 4: Data Integration
How do we connect the Big Data stack to all the potential data sources out there, and do so in a flexible fashion? Data Integration and DB tools vendors now can step forward as potential control points for the Big Data market. While the already mentioned IBM, Oracle, Microsoft, and SAP have offerings in data integration, the data integration specialists called out in Big Data partnership analysis include Informatica, Talend and data virtualization/integration specialist Composite Software. I found it curious that Tibco, typically a key player in the integration space and recently given glowing treatment by stock market witch doctor Cramer and its own CEO for cashing in on Big Data, is not particularly well-connected in the Popescu visualization. In fact Tibco showed fewer connections than smaller integration specialist Syncsort or Big Data DB tools specialist Quest Software, each of which has exhibited a deep Big Data commitment. While the Popescu sample is certainly not perfect, it still seems fishy that Tibco seems so isolated.

Control Points Short List 5: Integrators, Analytic Apps, and Visualization
Why throw together Big Data services suppliers (aka "Integrators"), plus those vendors that produce Big Data apps or offer development technologies to do so, and those that focus on offering Visualization solutions? All of these services and/or technologies lead to the finished Big Data analytical results for the business user - the last mile of Big Data if you will! For those who believe that the budget-holders for Big Data primarily come from line-of-business, this final segment of providers can certainly be considered for potential control point owners. Revolution Analytics fits here nicely, but many of the aforementioned providers, particularly the analytic platform providers, can take Big Data solutions all the way to "application." Of course the visualization tool used to create the partnership graphic comes from Datameer, and Alteryx has emerged as a promising Big Data app platform. In terms of integrators, IBM has jumped in with both feet, as have some of the usual large system integrator types, but notice that Think Big Analytics made the top 25 cut due to its strong focus on Big Data projects. We also see medium-size integrators with a strong analytics heritage, like Lilien LLC, as strong "go to" options for enterprises and Web 2.0 companies interested in Big Data. One of the most fascinating hybrids between services specialist and application generation is Opera Solutions.

Who Has the Control?
We have listed many Big Data suppliers, some true specialists, some with wide breadth, some in-between. And unfortunately we have also left out a huge number of vendors who are focused on Big Data. The field is literally littered with newcomers, with existing BI/analytics specialists adapting for Big Data and with large vendors turning their attention to Big Data. This segmentation into five "control point" short lists offers only a rough guide, since many providers bridge several of these segments. It seems ironic perhaps that two of the biggest contributors to the Hadoop movement, Yahoo and Google, do not stand out on this list of potential Big Data market control points.

Now the hard question: Who has control today out of all of those short lists? I nominate three categories: (1) Certainly that first category of "Hadoop distribution provider," with Cloudera having the most partnerships, seems most plugged into the ecosystem. (2) Some of the large players that offer breadth are in strong positions: they have strong partnerships with the Hadoop value-added distributors, cover infrastructure/HaaS and analytics platforms, include existing enterprise technologies that supply data sources, and offer related services. IBM, EMC, Teradata, and Oracle are good examples. (3) Enterprises with strong past commitments to BI/analytics, and with preferred suppliers in that domain, whether pure BI/analytics or apps related, may choose to continue those relationships for Big Data. We need to add SAP and Microsoft here of course, and a true best-of-breeder in the analytics space, SAS.

We see, over time, those big breadth players (categories 2 and 3) augmenting their offerings in data integration, database, analytics platform, visualization, and application development through acquisition. We also see some of the services organizations obtaining a key foothold in certain vertical industries, where whoever has "the most data scientists with a vertical bent" may win. Market consolidation must happen as the Big Data supply-side attempts to simplify Big Data for the everyday enterprise. And perhaps most fascinating, we will watch to see who snaps up the likes of Cloudera, MapR, and Hortonworks, or if one or more of them survives on their own and becomes what Google is to search, SAP is to enterprise apps, or Apple is to the smartphone/tablet market.
