Say IT Ten Times Fast

Big data has a real complexity/credibility problem. There are too many variables in the system, which makes even basic product evaluation tricky for the sharpest technical decision maker in the IT shack. Let's say you want to achieve a simple design goal for your environment, such as speed (or velocity if you insist). Certainly there is much talk about "fast data" these days. Look at the vendors all promising a billion rows a second*... But hark, what asterisk through yonder window breaks? Well, what exactly do you want to be fast? Fast ingest? Fast data integration or transformation? Fast modeling? Fast discovery? Perhaps you mean fast analytical calculations? Or fast querying? Fast, fast, fast. Good luck.

You've got to break the problem down to specifics, and then play a marathon game of find-the-bottleneck. Disk too slow? Not enough memory? Scheduler not distributing the workload evenly? Database organized in rows instead of columns? Another job hogging the resources? ETL taking hours? IT department showing a six-month waitlist for anything but Sev-1 tickets?
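
If you actually want to find the culprit, the only credible move is to time each stage of your own pipeline separately rather than trusting one aggregate number. Here's a minimal sketch of that kind of stage-by-stage timing; load_batch, transform, and run_query are hypothetical stand-ins for whatever your real ingest, ETL, and query steps happen to be.

```python
import time
from contextlib import contextmanager

@contextmanager
def stage_timer(name, timings):
    # Record wall-clock time for one pipeline stage.
    start = time.perf_counter()
    yield
    timings[name] = time.perf_counter() - start

def profile_pipeline(load_batch, transform, run_query):
    """Time each stage separately; the bottleneck is whichever dominates.

    load_batch, transform, and run_query are hypothetical stand-ins --
    swap in your real ingest, ETL, and query calls.
    """
    timings = {}
    with stage_timer("ingest", timings):
        raw = load_batch()
    with stage_timer("transform", timings):
        prepared = transform(raw)
    with stage_timer("query", timings):
        run_query(prepared)

    total = sum(timings.values())
    for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{name:>10}: {seconds:7.2f}s  ({100 * seconds / total:.0f}% of total)")
```

Whichever stage dominates that printout is where the next dollar (or the next argument with the vendor) should go.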

Think you can just procure the one silver bullet to slay the dreaded beast of slowness? Ha. You're probably looking at buying and tuning a dozen different technologies from as many different vendors, with no clue how they will actually perform in your real-world environment.

I might politely suggest an independent review of exactly how that billion-rows-per-second figure was determined. It isn't hard for a vendor to tweak the data set or optimize the query to look good, but a well-documented lab validation will at least show what's what. Don't even get me started on other desired attributes like "secure" either...
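
For the record, a little back-of-the-envelope arithmetic shows why the definition of "row" matters so much in those claims (the figures below are illustrative, not tied to any particular vendor):

```python
# Rough arithmetic: raw bandwidth implied by "a billion rows per second"
# under different assumptions about how wide a "row" actually is.
ROWS_PER_SECOND = 1_000_000_000

for label, bytes_per_row in [
    ("single 8-byte column (a best-case columnar scan)", 8),
    ("slim 100-byte record", 100),
    ("realistic 500-byte row", 500),
]:
    gb_per_second = ROWS_PER_SECOND * bytes_per_row / 1e9
    print(f"{label}: ~{gb_per_second:,.0f} GB/s sustained")
```

The first case is roughly within reach of a single well-tuned in-memory system; the last one implies a serious cluster, which is exactly the kind of detail a proper lab validation should spell out.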


Topics: Data Platforms, Analytics, & AI