Ascent to the Hadoop Summit

This week San Jose was home to the Hadoop Summit, which boasted 3,000 live attendees, 88 sponsoring companies, and many, many streaming viewers around the world. It was a microcosm of the big data industry, and the overall energy, breadth of solutions, and new announcements reflected the momentum of a very disruptive, competitive, and exciting area of IT.

A quick roundup of notable views and news then…

Hortonworks, one of the main hosts, spoke about four pillars of their strategy: (1) develop the open source core of Hadoop, (2) extend features to fill functional gaps in the platform, (3) build up a coherent partner ecosystem to facilitate integration of complete solutions, and (4) retain their open source principles to engage and benefit the community as a whole. Alongside this, Hortonworks emphasized the compelling economics of Hadoop, the ability to unlock new insights, and the chance to optimize existing investments in IT generally and in analytics specifically. This matches the maturing of Hadoop solutions, which are moving from new analytics apps to new data architectures (whether lakes, hubs, or other marketing metaphors…).

Microsoft talked about “making data work for everyone” and described this approach as linking data, people, and analytics as interlocking circles of equal importance. The gist of this is to have all data made usable without friction and easy for people to get their minds around. This easy access was exemplified by using standard Excel to take data from other external sources, not least HDFS (yay!), with Power Query, a freely available add-on (assuming you aren’t using a Mac, boo!). Microsoft is hedging their bets on deployment models, with consumption via software, appliances, or the cloud, and with flexibility provided by a federated query layer on structured and unstructured data in all these locations, giving single instance access from anywhere. The simplicity of needing only 15 clicks to have a Hadoop cluster in the Azure cloud reflects their ideals.

Red Hat told a different story of big data, with a focus on a variety of handy tools to accommodate the interstitial spaces of a solution. These included JBoss Data Grid for in-memory performance, Enterprise Linux, JBoss Fuse works for ETL, JBoss Data Virtualization for a common data model, and JBoss BRMS for metadata extraction. They don’t aspire to be the platform in the same way others talk about it, but do have the virtues of being all open source, with a relatively low cost of ownership, and helping to make Hadoop enterprise ready.

SAP implored the audience to move from just experimenting with big data to seeking real business results. The message was to learn from ERP and speak to both the lines of business and IT. SAP brings four offerings to assist: experts, analytics, applications, and (surprise!) a data platform. The proposed idea was to use Hadoop as a data lake in combination with in-memory HANA analytics to mine that lake, mixed metaphors be damned. SAP plays nicely with Hortonworks, Cloudera, and MapR alike, and has worked to integrate Hive, Pig, and MapReduce with HANA. They also already have many of the data sources, in the form of SAP ERP, SAP Sybase IQ, and more, in many enterprises.

SAS focused on the idea of high performance analytics, and with real-time analysis becoming a common goal, it’s worth heeding their warning that “data movement will kill you.” Violence aside, it does make sense to bring the analytics to the data rather than fight physics at volume on the network. SAS prefers to score models directly on the Hadoop cluster, lift data in-memory, run the model, and provide the visualization of the statistics. Like SAP, they declared that their heritage better prepared them to deliver high-scale, enterprise-ready solutions.

Teradata waved a similar flag of “bringing Hadoop to the enterprise” and called out their portfolio of purpose-built, high-end Hadoop appliances, featuring Hortonworks 2.1. The simplified deployment of a ready appliance, the safety of NameNode failover, the easy administration of disk replacement, and management through Viewpoint with a single pane of glass were all held up as examples of why this approach can be faster, safer, and more mature than DIY on commodity hardware. Aster and Hadoop are offered with expanded professional services to help, and it’s clear Teradata won’t let themselves be marginalized as yesterday’s data warehouse.

MapR also told a story of maturity, with their enterprise-grade Hadoop distribution offering meaningful differentiation in the operational requirements of production environments. This is complemented by a new App Gallery of valuable partnerships, showcasing ready-to-use add-ons to the platform from 30+ adjacent vendors and trumping the glib “logos on slides” approach of some ecosystem stories.

While there were many, many other interesting plays, I think these companies are all good examples of the tone of the event. Big data is growing up and getting real, and both the industry dominators and disruptors are upping their game to the benefit of their customers.

Topics: Data Platforms, Analytics, & AI