Big Data's Big Problem

I like big data and analytics. This should not be a surprise to my dear readers. Yet I don't really like the complexity of most "solutions" on the market today. Too many moving parts to be bought from vendors or downloaded from open source, then deployed, configured, integrated, tested, and managed. You need teams of data scientists, data architects, data stewards, data analysts, database administrators, data warehouse managers, etc. Not to mention the infrastructure teams for servers, storage, networking, security. Nor the application developers. This blog isn't long enough to list everyone who should be involved, and by the way, we're still talking within the IT department itself. Try to engage every interested stakeholder in the lines of business, and you'll fill a stadium for the weekly status meeting on your big data initiative.

Collaboration is good. Needing a small army to build a big data solution is bad. How will we as an industry address this problem? I see three possibilities:

  1. Consolidation. The big IT vendors will buy enough of the little ones that at least you can source most everything from a single technology provider. IBM, Microsoft, Amazon, and others are going this way now. Partnering helps, but not too much, not if I need to pick the right 8-12 vendors out of 1,000+ logos on an alliance list to assemble the components required. I've had Hadoop vendors show me their compatibility matrices, and while quite impressive, they are also terrifying. Twenty or more software modules, updated monthly, all needing to be perfectly aligned to work safely, and that's just for Hadoop, never mind the rest of the technology stack.

    Someone has to bring it all together, and the sooner the better. Cloud feels like a big help here, not just for the quick on-ramp to hardware resources, but for making the software compatibility and updates somebody else's problem. Is it a coincidence that the three vendors named above are also aiming for dominance in cloud-based services and infrastructure for big data? I think not.
  2. Simplification. Even if you successfully assemble all the pieces of the jigsaw puzzle, you still need to organize the data into an understandable picture. I don't mean just visualization, though of course that helps tremendously, but the ability to ask a question and get an accurate answer is frustratingly elusive. I studied geophysics and math at an Ivy League university, but I'm embarrassed to say I would likely struggle to use most analytics tools without significant effort, specialized training, and a lot of help from my friends.

    It shouldn't be so hard. Everyone loves to say "democratization", but frankly we're barely approaching the point where top computer science and statistics PhDs can leverage the information in front of them with the vendor and open-source applications available. I'm not suggesting you dumb it down so "average" is the only function and a histogram the only output, but please, please make it easier.
  3. Embedification. Ok, I made up that word, but if we can't solve the first two issues, this is going to be the only feasible route to value. The people who actually make the business applications, products, and other applied solutions will have to absorb all of the complexity above and completely hide it from the user. This will greatly intermediate IT vendors (who do so love disintermediation) and give them way less influence with their customers.

    You'll have operational technology players like Siemens, Fujitsu, GE, and Hitachi running the show for IoT environments. You'll have business application vendors like Salesforce, SAP, and Oracle capturing the available budget. You'll have systems integrators like CapGemini and Accenture finally waking up, defining practices, and winning many more massive outsource and insource projects. I'm not sure this is the end game most big data technology vendors want to see, but it'll be only M&A and OEM sales in the future unless they make some big strides, and pronto.


Topics: Data Platforms, Analytics, & AI