Schrodinger's Cat and Analytics Accessibility

Everyone loves the concept of Schrodinger's cat, with the possible exception of a few serious PETA members. The metaphor of an entrapped feline that is both poisoned and not poisoned until someone directly observes it is a catchy way to understand uncertainty around various possible states and outcomes.

I see a similar problem with big data. Companies are going to great lengths to gather data and to build sophisticated workflows to analyze it all. The goal, of course, is nothing less than omniscience about the business: intimately knowing every project, every product, every customer, every dollar. The problem is that sometimes the box isn't open yet. Access to the information is limited. Business decisions are still guesses. The promised insight remains elusive. We are no closer to seeing whether that poor cat is dead or alive.

Accessibility should be another V of big data. Volume, velocity, variety, veracity, et al. are all worthy qualities to pursue in a solution, but they don't amount to a hill of beans unless the people who need the answers actually get them. Within an appropriately governed framework, your average business manager should be able to look into the box anytime they want.

There are a few ways to pursue this goal. 

One way to think about accessibility is direct access. Put every file in my file system tree instantly within reach on my desktop, and let me discover anything at any time, regardless of where it actually lives, when it was last touched, or who originally created it. Engineering, research, and manufacturing firms could use this kind of knowledge base, but only a few vendors play here. One notable entrant is Peaxy.

Simpler business intelligence (BI) tools are also a good start. I mean really simple: so simple that literally anyone working in an office can use them without explanation. A few companies are getting pretty close here, such as Microsoft's Power BI, Birst, Domo, Pentaho, and Logi Analytics, though their assumptions about the aptitudes of an average office worker may vary.

OK, I understand that sometimes accessibility matters more higher up the food chain, and your SQL-loving business analysts may be the important cat-hunters in the organization. Having an accessible and friendly solution for SQL on Hadoop might be the right play, and there are dozens from which to choose. Some of my favorites are Actian's Vortex, Cloudera's Impala, MapR's (Apache's) Drill, Hortonworks' (Apache's) Hive, the emerging Spark SQL from Databricks, and new player Arcadia for SQL "in" rather than "on" Hadoop. Startup AtScale kind of splits the difference between this approach and the previous one, letting you bring your own BI tool to Hadoop.

The point is made. All the data lakes in the world are only as good as the access you provide. Open the box; let's see the cat already.
