I'm just home from the latest Strata+Hadoop World in NYC, which drew over 6,700 participants and at least 150 vendors, and I wanted to share some reflections on the event and the big data market as a whole.
My grandfather had an axe. He handed it down to my father, and one day the handle broke and had to be replaced. I've had it a while now and eventually had to fit a new, sharper head. Is it still my grandfather's axe? This is the question around Hadoop, as you can now choose HDFS or a variety of other storage engines, file systems, and databases for the handle, and also swap out MapReduce for Spark or specialized analytics engines at the head end. Is it still Hadoop? And if Databricks' Spark, MapR's FS or DB, Cloudera's Kudu, or Hortonworks' Dataflow becomes the focus today, for how long? Can the new ODPi set a standard and keep it long enough for IBM, Pivotal, and many new members to build solutions before it's obsolete again? Will Apache Flink be the next hot flavor of batch and stream processing? How heterogeneous and multi-faceted does your data platform need to be? The major distributions are in a rush to move beyond the basics. This rapid innovation is exciting for everyone in the industry, but for some customers, it's exhausting. Traditional enterprise infrastructure and application lifecycles operate on time scales of years, not weeks.
That said, there is growing interest in moving beyond architecture concerns and finding applications that offer real value in a focused use case. The success of the Oracle Big Data Appliance (BDA) and Teradata's competitive offering suggests that data warehouse optimization and offload remain common strategies, but many other ways to derive value are emerging. BI on or in Hadoop is hot, with quite a few companies connecting SQL engines or user-friendly dashboards to make insights accessible to self-service business users. One example I liked was Actian's role in reducing churn among cell phone customers: identifying those at risk based on contract expiration, older device hardware, and call quality issues, then making the right offer to retain them. Security and fraud analytics also carry very compelling business value.
Another big theme was the emergence of what I see as cross-platform, enterprise-quality offerings, by which I mean tools that improve a characteristic of multiple Hadoop and NoSQL environments at once. This makes solutions far more acceptable for mission-critical production use cases, independent of any one distribution. Some examples include: BlueTalon for role-based data access controls, BlueData for on-demand cluster provisioning, Pepperdata for heterogeneous cluster workload and performance optimization, GridGain for in-memory speed, and Zaloni for providing a well-named Bedrock foundation of data management for big data.
A last consideration is how to get faster time to value and reduce the burden of hardware and software systems integration. Big data remains a very complex proposition with many layers to the technology stack. If a company wants to evaluate multiple vendors at each layer, it will have not just a couple dozen products to test, but also an NxM compatibility matrix to solve. Not easy at all. This is where you see people looking for ways to consolidate tiers of the solution or skip building them entirely. Appliances (already mentioned) are appealing here, as are cloud services like those from Microsoft (Azure, Data Lake, HDInsight, ML), Rackspace, Amazon (EC2, S3, Redshift, EMR), IBM (BigInsights on Cloud, BlueMix), and the spanking-new Google service (Dataproc). Or at least look for ways to collapse the stack using broader solutions like Platfora, Arcadia, or Interana as more complete packaged offerings.
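To make the compatibility-matrix point concrete, here's a minimal sketch of the arithmetic. The layer names and vendor counts are invented for illustration, not from any real evaluation: the individual products you test grow additively with the stack, but the end-to-end combinations you'd need to validate grow multiplicatively.

```python
# Hypothetical stack layers with an assumed number of candidate
# vendors at each layer (illustrative numbers only).
layers = {
    "storage": 3,       # e.g. HDFS vs. alternative file systems/databases
    "processing": 3,    # e.g. MapReduce, Spark, a specialized engine
    "security": 2,
    "bi_frontend": 3,
}

# Individual product evaluations grow additively across layers.
products_to_test = sum(layers.values())

# Full-stack compatibility combinations grow multiplicatively.
combinations = 1
for count in layers.values():
    combinations *= count

print(products_to_test)  # 11 products evaluated in isolation
print(combinations)      # 54 end-to-end stack combinations to validate
```

Even with these modest made-up numbers, eleven product evaluations turn into fifty-four possible stacks, which is exactly why collapsing tiers with appliances, cloud services, or broader packaged solutions is so attractive.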
As a wise man once said, "There's a lot going on there." Drop me a line if you'd like to dig in deeper or have your own observations to share...