Nice data you have there. We wouldn’t want anything to happen to it now, would we?
Like many newer technologies, big data's initial focus has been on the flashy bits: speed and capabilities. Vendors have been keen to show they are significantly better, faster, and cheaper than traditional approaches, even "blue ocean," if you like MBA terminology. That's great for getting attention from a possibly complacent installed base, but it can't be the full story.
As they say in golf, "driving for show, putting for dough." In this context, that means it's not only the exciting parts seen in a demo or POC that matter; nuanced enterprise operational requirements must also be met for the overall project to be considered a success in production.
One of the bigger gaps today is around the tightly linked concepts of high availability, data protection, disaster recovery, and even business continuity. While Hadoop has some redundancy built in, much more must be done to keep the lights on in the event of the many and varied catastrophes that lurk. Or should I say "datastrophes"?
As businesses become truly data-driven, they will need the promise of uninterrupted operations, both for the data itself and for the complex architectures and platforms used for advanced analytics.
ESG has recently completed some research into best practices for data protection and disaster recovery for big data. If you want to know more, please holler.
UPDATE: there is a new brief here, now available to everyone: http://www.esg-global.com/briefs/data-protection-and-disaster-recovery-for-big-data