A lot of people talk about "democratizing big data" and providing more analytics-based insights to more of the business. This has obvious value, and being data-driven in decision making is a fundamental principle of most initiatives. There are differing approaches to this, however. Some believe the best way to achieve democratization is through pretty dashboards and pictures. As Arlo Guthrie once sang, they had "twenty-seven eight-by-ten color glossy photographs with circles and arrows and a paragraph on the back of each one, explaining what each one was, to be used as evidence against us." While I love a persuasive data visualization as much as the next guy, this only solves part of the problem.
More often, the issue is simply getting enough system resources to apply to the questions, along with access to the various data sets at hand. That is not so easy. ESG finds that many businesses are now selecting purpose-built appliances or public cloud-based solutions for big data, simply to avoid the lengthy, multi-disciplinary effort of provisioning an appropriate hardware environment on-premises. One of my blogs last year talked about how Alpine Data Labs and MarkLogic are addressing some issues in this area, and you can still read it (analytic-power-to-the-people-alpine-data-labs-marklogic/index.html).
Now BlueData is coming out of stealth mode and is promising the next big thing in easy access to big data clusters and data sources. The startup has built a nifty tool for on-demand provisioning: it takes only a few clicks to provision and allocate nodes, choose a Hadoop distribution, point at data, and even start, monitor, and manage jobs. The offering is available now as the Epic One community edition, with Epic Enterprise coming soon for more demanding clientele. Both essentially help create an optimized on-premises private cloud for big data users.
Shown below is a screenshot of the simplified cluster management, with a variety of clusters humming along happily.
A data scientist or analytics power user with a query to run simply identifies the type of job, application, distribution, version of MapReduce or YARN, priority, and class of servers to be used. To return to Arlo's wisdom, "you can get anything you want at Alice's Restaurant," and BlueData just made it a lot simpler both for the patrons to order off the menu and for the chefs in IT infrastructure and operations to have orders filled instantly.
Pretty clever, and this will have additional benefits: more efficient utilization of hardware resources for better cost control, more flexibility to use compute and storage as required for specific jobs, and more security, since pre-checked configurations are used every time.