ESG Delta-V awards recognize the top 20 companies that made an impact in big data and analytics in 2015. This post focuses on the companies to congratulate for their success in the category of Big Data Platforms. This covers the software systems that seek to organize big data into something much more usable in the real world. If you wanted hardware appliances, please see the Engineered Systems category instead.
To learn a little more about the awards, check out the overview post.
DataTorrent — A rose by any other name would be a real-time stream processing platform, and that's what you get with these guys (and gals). Given enough data scientists and enough time, you could probably cobble together some of these capabilities from freely available packages. Of course, that would mean hiring a bunch of really expensive talent and making them work on plumbing for a year or more. Then they'd need to keep administering and upgrading it, as all that open source stuff keeps changing. Instead, DataTorrent gives you a fast, reliable, scalable, fast, secure, supported, and automated processing platform. Did I mention fast? You focus on the logic; let them do the tricky bits.
Interana — One of the most common questions in big data is "why?" Why did that happen? Why did the customers do that? Why is it so freaking hard to understand cause and effect? If you want behavioral analytics on event data, Interana puts all the pieces together. Designed to be self-service for everyone, you get lightning-quick answers to questions on all your data, not a sample. Built-in database? Check. Rich analytics? Check. Easy-to-read visualization? Check. Runs anywhere? Check. Answers on demand? Check out Interana.
Platfora — I've ranted before about "end-to-end" solutions being only a question of where you define the end points. Platfora starts from the raw data in Hadoop and handles preparation, processing, and analytics, all in one integrated platform. Lenses offer views into a catalog of all data for secure self-serve discovery and collaboration. Business analysts can get what they need without having to hassle the data scientists or go to the bother of becoming data scientists themselves. Hadoop actually becomes useful for everyone.
Trifacta — Going one click deeper into data preparation, we find Trifacta as the champion of wranglers everywhere. Garbage in, caviar out. Trifacta's platform helps analysts, business users, and IT teams span a range of data-improving activities: discovery, definition, cleansing, enriching, validating, and sharing. Skip any of these steps at the peril of getting answers based on bad data. The common joke is "80% of data scientists' time is spent preparing data, 20% complaining about preparing data." Trifacta destroys the joke's premise, which is less funny, but way more productive.
Zaloni — When I took this job, I wanted to focus on the excitement of "big data," not the boredom of "data management." Zaloni shows how you can combine the benefits of both; in fact, you can't separate the two and still expect to be successful. Bedrock sets up your data pipeline to deliver a well-managed, fully operational data lake. It also works natively with all branches of the Hadoop ecosystem, and given that many organizations are not standardizing on one distribution, this is pretty dang important.
Maybe these platforms aren't directly comparable, but what they all have in common is a clear and coherent approach to making big data work for the enterprise. This is a critical path for the maturity of big data. These companies are moving the whole industry forward.