A Distribution of Assets: Cloudera in the Big Data Era

Let’s start with this week’s puzzler:

What can store everything, but has no mass of its own?

What is part pig, but looks more like an elephant?

What has all the answers, but none of the questions?

What is free for all, but much better when bought?

Write your answer and #hadoop on the back of a postcard, find a stamp, and mail it to @nrouda on the Twitter. Correct answers will be awarded 10 points. Funny answers will get 20.

Recently, I had the distinct pleasure of catching up with my old friend and mentor Alan Saldich, now with Cloudera, to talk about big data, Hadoop distributions, and the answer to last week’s statistical analysis.

Well, that’s what I thought we’d talk about anyway. Here are three key points that I took from our conversation:

1. Any CIO who declares they have a grand “Big Data initiative” is in trouble. A project needs a much tighter definition to accomplish something meaningful and ultimately successful. Scope out the business goals before you even think of architecting and building a massive new data center or overhauling your entire environment. Better yet, start small, show the value, and expand incrementally from there.

2. The ultimate goal is to construct an “Enterprise Data Hub” for your organization, which will act as a central clearinghouse for all data. To do this you will need storage that scales and protects your data, security controls and governance, and access that is usable and familiar to users from diverse backgrounds. That leads us to integration.

3. The ecosystem is as important as the distribution. The world of big data, business intelligence, analytics, Hadoop, visualization, and related software can get quite complex. Add the many vendor choices for hardware and cloud offerings, and it all gets confusing very quickly. Make sure you can facilitate connections between the underlying infrastructure, the various data repositories, and the applications that sit on top.

What do you reckon? Are there other factors that you consider critical when starting out? Is building an Enterprise Data Hub the right goal?

Cloudera would like to be both your software provider and your brainy friend when you need help tackling these challenges, and they probably do belong on your shortlist of companies to evaluate. Just don’t call them a Hadoop distribution.

