In theory, one of the appealing aspects of Hadoop is that you can use commodity servers with direct-attached storage (DAS) to spin up clusters cheaply and quickly. However, if you have invested in storage management, and have concerns about compliance, availability, governance, and information management, you might not find the bare-bones Hadoop approach appealing at all.
That is, if you run a tight data center, why litter it with a commodity server farm that introduces additional headaches from a cost, management, and risk perspective? Symantec, whose Veritas Cluster File System (CFS) has long been a standard-bearer for storage management best practices, just made a fascinating and free option available to customers who want to dive into Hadoop without leaving behind the investments they have made in optimizing storage, governance, and security. Symantec calls its offering Symantec Enterprise Solution for Hadoop, and it has a three-minute video on how it works.
How specifically does Symantec's Hadoop solution help customers? First, IT has been trying to escape server sprawl for many years; the interest in virtualization, appliances, and cloud/SaaS is proof of that. While the commodity server/DAS approach works for Web 2.0 companies that have plenty of deep in-house technical staff to make it work and keep it working, most enterprises want to deal with fewer boxes. Symantec's approach eliminates or reduces the need to add hardware for Hadoop.
Second, Hadoop depends on a fair amount of data replication between nodes to work, and Hadoopists should be prepared for plenty of data migration to populate Hadoop in the first place. The information management requirements, both inside and outside the cluster, are considerable. Again, the Symantec approach of using data in place reduces the overall data footprint and the amount of data movement required.
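To put rough numbers on the footprint point, here is a back-of-the-envelope sketch. It assumes HDFS's default replication factor of 3, a hypothetical 50 TB dataset, and that an in-place approach needs no additional copies of the data for Hadoop; the function name and figures are mine for illustration, not Symantec's.

```python
def hdfs_footprint_tb(dataset_tb: float, replication: int = 3) -> float:
    """Raw capacity consumed when a dataset is loaded into HDFS.

    HDFS stores each block `replication` times; 3 is the default
    value of dfs.replication.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return dataset_tb * replication


# Hypothetical dataset size, chosen only for illustration.
DATASET_TB = 50.0

# Separate commodity cluster: the data is migrated in, then replicated.
separate_cluster_tb = hdfs_footprint_tb(DATASET_TB)
migrated_tb = DATASET_TB

# In-place approach: assumed to add no new copies and move no data.
in_place_extra_tb = 0.0
in_place_migrated_tb = 0.0

print(f"Separate HDFS cluster: {separate_cluster_tb:.0f} TB raw, "
      f"{migrated_tb:.0f} TB migrated")
print(f"In place (assumed):    {in_place_extra_tb:.0f} TB extra, "
      f"{in_place_migrated_tb:.0f} TB migrated")
```

Real numbers will depend on the replication settings you choose and on whatever protection the existing storage layer already applies, so treat this as an illustration of the arithmetic rather than a sizing guide.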
Finally, Hadoop has a bit of an availability reputation problem, due to the single points of failure associated with the NameNode and the JobTracker. Symantec supplants the NameNode with CFS to make Hadoop highly available, and uses VCS (Veritas Cluster Server) to cover for the frailties of the JobTracker.
Symantec primarily accomplishes "instant Hadoop" by (a) adding a connector for CFS and (b) rerouting HDFS through CFS. Is what Symantec is doing pure Hadoop? Well, MapReduce runs fine in CFS mode, and Symantec has partnered with Hortonworks for Hadoop distribution and support (the Symantec piece is free to CFS customers, but you will pay for the Hortonworks distribution as you normally would). Rather incredibly, if you are an existing CFS customer you can download the connector for free from Symantec, download the Hortonworks distribution, and literally be running Hadoop in a day or less.
Are there limitations? Potentially. Because Hadoop may be running down the production paths to your data, this CFS-based reuse of your existing data infrastructure may require some careful timing. You may need to run Hadoop during off hours; if you can, you will have attained hardware-less Hadoop. If you don't want Hadoop traveling down production data paths, you could instead, with minimal hardware investment, still benefit from Enterprise Solution for Hadoop by taking advantage of native CFS/VCS storage optimization, protection, high availability, and disaster recovery for Hadoop, in either a dedicated cluster or a hybrid mode (CFS enables you to mix the old with the new).
Note that Symantec Enterprise Solution for Hadoop does not strictly require Hortonworks. If you have already chosen another Hadoop distribution, or have settled on pure, open source Hadoop, you still receive support from Symantec for CFS/VCS, but you will need to make other support arrangements for Hadoop itself. At least with Hortonworks the implementation path is clear.
That said, ESG expects Symantec will try to add other Hadoop distributions, and to partner with other big data ecosystem providers, particularly on the visualization layer. For example, Datameer and Tableau on top of CFS-based Hadoop with one of the popular distributions would make for an enticing end-to-end fast path for companies to dip their toes and trunks into Hadoop-based analytics.
I am not suggesting that using CFS will replace the need for analytics databases, data warehouses, and dedicated infrastructure for analytics. However, for customers of CFS, and for potential Hadoopists looking to ride the elephant in a safer and more protected manner, Symantec Enterprise Solution for Hadoop introduces a tantalizing, and speedy, option.