Pentaho's Three-Legged Race to Big Data (with HDS)

Pentaho World 2015 was held in sunny Orlando this year, with over 500 attendees, and was by all accounts a friendly and informative affair. About the only question no one could answer is why the company is called Pentaho, but a rose by any other name is still very nice. One thing that was quite clear is that the team is hitting its stride with HDS as a powerful running mate.

I see three major angles that will play out well for Pentaho:

  1. Data integration remains the most common challenge in big data and analytics. Here Pentaho is in an interesting position: it has none of the baggage that traditional data management companies carry. Informatica, IBM, and Oracle are all rapidly modernizing their offerings, but they have to catch up to the requirements of the big data era, with Hadoop and NoSQL variants taking hold, and it's not always easy to reshape older tools. Some of the efforts in this area by others are just plain odd. For example, who wants to try to manipulate a 1,000-column table on a smartphone? Other Pentaho competitors are young start-ups still working on entry-stakes functionality or picking a smaller niche. By contrast, Pentaho was born and built in the current era, yet it has also had enough development time and market success to build in the operational maturity enterprises require, leaving rough edges around upgrades and maintenance behind. Not only does the product work well with Hadoop and Spark, it also smoothly handles the data flow and blending requirements. Pentaho Labs wants both to stay super fresh as the surrounding (often open source) data platforms rapidly evolve and to act as a "heat shield" that protects its customers from getting burned by too fast a pace of change.
  2. Developing a business-user-oriented business intelligence platform is another frequent goal, and here Pentaho has some strength as well. Better user experience is critical to mass adoption in a customer organization, and integration, visualization, and model building have all improved. The ability to auto-learn, blend, refine, and standardize data and metadata helps a lot, as does auto-modeling to set up likely analytics for the user. Even if these approaches aren't 100% perfect, they certainly move the starting line much closer to the final goal of getting insights quickly. The easier you make it to understand the data, analyze it, and get answers, the more people will want to interact with the tools.
  3. HDS is Pentaho's best friend and will hugely accelerate results. Not only does it increase the friendly installed base available for selling by at least 10x, it also brings a discipline of operational excellence that should speed acceptance not only among data scientists and analysts but also with the IT infrastructure and operations teams, whose blessing is critical. I'm vocal about the need for an HDS big data computing platform, like the UCP 6000 but including Pentaho and partner Cloudera too, fully integrated and optimized. Deployment models are getting more varied, not more constrained, and there is no reason Oracle and Teradata should lead in appliances unchallenged. HDS clearly has the know-how and the products to build this, with huge market success around Oracle database infrastructure, for example. Having Pentaho in-house should make this an easy decision, though the company is being cagey about when it might happen.

Importantly, the bigger Hitachi story for social innovation remains compelling. I love the idea of changing the world for the better, and I especially like the fact that Hitachi now has real-world, repeatable offerings for problems around smart cities and safety, smart power grids, smart instrumented cars, and yes, the obvious smart IT operations analytics. This kind of play puts HDS and Pentaho into a very small cohort of IT vendors who can deliver comprehensive vertical industry and government solutions. More excitement to come in the race…

