The Big Data Platform Metamorphosis

Many key big data suppliers have moved beyond the piece-part larval form, shed their respective skins to varying degrees, and are in the process of blooming into big data platform butterflies. Which vendor(s) will emerge as the stunning Blue Morpho of big data butterflies remains to be seen. The elements of what a big data platform consists of, while still somewhat hazy and varied based on what an organization thinks it wants to accomplish with big data, seem more sharply defined than six months ago. Suffice it to say that a transmutative race down a path of platform evolution is well under way, driven by the recognition that most enterprises will not invest in too many big data piece-parts. Consider some of the offerings and announcements, many of which have appeared around or during the past crazy data conference week:

  • In Teradata's Big Analytics Appliance you will find Hortonworks Hadoop sitting next to the Aster MPP analytics database and Teradata data warehouse. Note that the appliance does not contain visualization, but Teradata typically leaves that "V" to partners like Tableau and MicroStrategy.
  • Greenplum already offers a variety of platform choices, including the modular and non-modular versions of its Data Computing Appliance; either way, the appliance includes both Greenplum databases and a plug-in architecture for third-party visualization. Since MPP analytics database and Hadoop combinations seem in vogue, it should be noted that Greenplum MR, in partnership with Cisco UCS, offers a "reference architecture" implementation for MapR Hadoop. While Greenplum just announced the availability of the now open source Chorus through Kaggle, we will address analytics developer productivity in subsequent postings.
  • Pentaho, which combines data connectors and data integration with analytics modeling and an expansive visualization layer (it feels like a platform to me), in conjunction with the Cloudera Hadoop distribution, just obtained a C round of funding to the tune of $23 million.
  • Newcomer Platfora, one of the numerous big data investments made by VC Andreessen Horowitz, augments Hadoop with an in-memory data management engine using what they call Fractal Cache. Platfora adds HTML5-based memory-intensive visualization, and works with all the major Hadoop distributions.
  • MapR's latest version, M7, transforms the heretofore not-enterprise-class HBase, Hadoop's default non-relational database, into something far more enterprise-friendly. MapR's argument: if you are going to load Hadoop, why not use it for compute as well, rather than computing one step further down the big data process, post-extraction? In short, how many databases do you really want involved in big data?
  • Cloudera unveiled Impala, a real-time alternative to the native but typically slow and batch-oriented warehouse/query Hive layer of Hadoop. Impala already picked up a variety of visualization layer partners promising Impala support, including the aforementioned Pentaho, Tableau, and MicroStrategy, plus several others.

Winners of the Week

Between the Teradata Partners Conference, IBM's Information on Demand, the combined NYC Strata/Hadoop World event, and the announcements surrounding these conferences, which announcements really struck me as major steps forward? Here are my top three, with ties.

Bronze: Cloudera and MapR both stepped off the ledge and directly addressed commonly understood weaknesses in the Hadoop platform. Fortunately, they did not address the same parts: Hive in the case of Cloudera, HBase in the case of MapR. Hive and HBase are essential ingredients of Hadoop, but frankly both pale in comparison to the several commercially available warehousing and Not-Only-SQL database offerings swarming the big data market. Both major Hadoop distribution vendors had the nerve to make large R&D bets and investments to keep other parts of Hadoop relevant, which, while helping their own positions, will also help the many customers already using or about to use Hadoop. ESG believes Cloudera will gain traction in the more general BI/analytics space because of Impala, and that MapR M7 will entice customers and partners in the embedded and industry-specific analytics big data arenas.

Silver: Also a tie, between IBM and Teradata. IBM, which already offers all the pieces that, when combined carefully, yield a fully formed big data platform, is taking dead aim at industry- and role-specific big data packages and, among other announcements, unveiled the IBM Digital Analytics Accelerator. The accelerator, when packaged with IBM's PureData offering, takes on the task of helping Global 2000 CMOs and marketing professionals climb the complex big data curve. It is generally accepted that customer and consumer analytics is a sweet spot for big data (it certainly has the greatest cross-industry span of appeal), and IBM has the relationships, technologies, and now a specific offering to appeal to large enterprise CMOs.

Teradata, one of the two largest pure-play data warehousing/analytics vendors along with SAS, has about as much to win or lose as any supplier in the big data movement. Combining data warehousing, an MPP analytics database, and Hadoop under the same appliance hood is a major step in the right direction (you will need two separate Oracle appliances to accomplish the same thing). Adding the appliance to Teradata's new big data roadmap services, called Analytic Architecture Services, will help keep Teradata directly in front of its big-data-curious customers.

Gold: And the winner is, tah dah, well, not yet Pentaho or Platfora, though both may show gold in the near future. Pentaho now has the funding to take its platform to the next phase, and Platfora has the opportunity to prove that it is indeed the cutting edge offering for quasi-Hadoop based big data platforms. But the gold medal for crazy data conference week goes to Hortonworks, which landed both the aforementioned Teradata and Microsoft as distribution partners. We have long awaited Microsoft's big data position, and Microsoft came out with authority on the subject, with Hortonworks as the key partner. Cloudera, already with a wide swath of distribution partners, and MapR, which has lined up Greenplum as well as many of the key cloud providers including Google, had begun to put some distance between themselves and Hortonworks. Adding Microsoft and Teradata to the Hortonworks column, however, immediately closes the gap.
