Big data by definition combines many diverse sources of information into a central repository for analytics, delivering insights that couldn’t otherwise be found. This is healthy and good: put all your medicines in one place for easy access when you need them.
There is a downside to this accumulation of assets, and it’s a privacy problem. Look in my backpack and you might find some Benadryl. Fine, I don’t care if you know about my spring hay fever. Spend ten minutes in my medicine cabinet, though, and you’ll really know all my dark secrets.
Big data essentially combines all of an organization’s medicine cabinets and says, “Hey, you, have a gander, and bring your friends, let’s throw a party.” Locking the front door with extensive perimeter security and mad Dobermans kind of misses the point if I’ve already invited you in to look around the place.
There is a solution to this privacy and governance issue, but few companies can deliver on it. The sensitive data must be appropriately anonymized or obscured, in a way that still allows the analysis but protects the innocent. This can be done through judicious application of masking and encryption.
My credit card number becomes XXXX-XXXX-XXXX-1234, and my Social Security number is now FA#$!%^GA$^5QDFG#@$DF. You can see which drugs I take, whether prescription, over-the-counter, or under-the-counter. You know what I am willing to pay and when. My alias is Thaddeus Eckblad III. All extremely useful data on me, and I’m cool with that, just as long as you don’t know quite who I really am.
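To make the idea concrete, here is a minimal Python sketch of the two transformations described above, assuming a made-up record and a hypothetical secret key. The card number is masked down to its last four digits. For the Social Security number and the name, the sketch swaps in keyed hashing (HMAC-SHA256) rather than true encryption: it is one-way, but deterministic, so the same person always maps to the same token and the data can still be joined and counted. None of this reflects how Dataguise or any other vendor actually implements masking.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration; a real deployment would fetch it from a key manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def mask_card(card_number: str) -> str:
    """Keep only the last four digits, in the XXXX-XXXX-XXXX-1234 layout."""
    digits = re.sub(r"\D", "", card_number)
    return "XXXX-XXXX-XXXX-" + digits[-4:]


def pseudonymize(value: str) -> str:
    """Replace a value with a keyed, deterministic token (HMAC-SHA256, truncated).

    The same input always yields the same token, so analysts can still group and
    join on it, but the original value cannot be read back from the token.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:20]


# A made-up record standing in for one row of the "medicine cabinet".
record = {
    "name": "Jane Doe",
    "card": "4111-1111-1111-1234",
    "ssn": "123-45-6789",
    "drug": "diphenhydramine",
    "price_paid": 7.99,
}

# Identifying fields are covered; the analytically useful ones pass through untouched.
safe = {
    "name": "patient-" + pseudonymize(record["name"])[:8],
    "card": mask_card(record["card"]),
    "ssn": pseudonymize(record["ssn"]),
    "drug": record["drug"],
    "price_paid": record["price_paid"],
}
print(safe)
```

If you need to recover the original values later (say, for a fraud investigation), you would use reversible encryption with tightly controlled keys instead of hashing; that trade-off is exactly the "masking and/or encryption" choice mentioned below.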
As a leader in data privacy, Dataguise has built some nifty tools to do just this in a few easy steps:
- Discover the sensitive data, both structured and unstructured
- Cover the specific fields with granular masking and/or encryption
- Then hand off the “safe” data for the advanced analytics
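The discover, cover, and hand-off flow can be sketched in a few lines of Python. This is a toy under stated assumptions: the two regular expressions stand in for real sensitive-data discovery, which uses far richer classifiers, and the function names `discover`, `cover`, and `hand_off` are hypothetical, not Dataguise’s API.

```python
import re

# Hypothetical detection rules for free text; real discovery tools go far beyond regexes.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}


def discover(text: str) -> dict[str, list[str]]:
    """Step 1: report which kinds of sensitive values appear in the text."""
    return {kind: pat.findall(text) for kind, pat in PATTERNS.items() if pat.search(text)}


def _mask_card_match(match: re.Match) -> str:
    """Mask a matched card number down to its last four digits."""
    digits = re.sub(r"\D", "", match.group())
    return "XXXX-XXXX-XXXX-" + digits[-4:]


def cover(text: str) -> str:
    """Step 2: mask every discovered value in place."""
    text = PATTERNS["ssn"].sub("XXX-XX-XXXX", text)
    return PATTERNS["card"].sub(_mask_card_match, text)


def hand_off(records: list[str]) -> list[str]:
    """Step 3: only covered records leave staging for the analytics cluster."""
    return [cover(r) for r in records]


# Made-up unstructured notes, for illustration only.
notes = [
    "Paid with 4111 1111 1111 1234 for Benadryl",
    "SSN 123-45-6789 on file; refill due in May",
]
found = [discover(n) for n in notes]
safe_notes = hand_off(notes)
print(found)
print(safe_notes)
```

The important design point is ordering: covering happens before anything is copied into the cluster, so the analytics side only ever sees the masked text.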
Address the concerns before moving the data into your cluster, lake, or hub. All of this is done in a clean visual dashboard that makes it easy to see what’s at risk and how it has been addressed, without inadvertently exposing any other private info. It comes pre-integrated with common databases and Hadoop tools, along with role-based access controls, auditing and logging, and optional controls over when to take action. You can even choose which law, regulation, or best practice to follow, and have the relevant rules applied automatically.
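Picking a regulation and having rules applied automatically amounts to a policy table that maps each regulation to fields and actions. The Python sketch below is purely illustrative: the policy names are real regulations, but which fields each one covers here, and the actions `mask_last4` and `redact`, are simplified assumptions, not a reading of what those regulations require or of how any product encodes them.

```python
# Illustrative only: the field/action choices below are assumptions, not legal guidance.
POLICIES = {
    "PCI DSS": {"card": "mask_last4"},
    "HIPAA": {"name": "redact", "ssn": "redact"},
}


def mask_last4(value: str) -> str:
    """Replace every digit except the last four with X, keeping separators."""
    head, tail = value[:-4], value[-4:]
    return "".join("X" if c.isdigit() else c for c in head) + tail


def apply_policy(record: dict, policy_name: str) -> dict:
    """Return a copy of the record with the chosen policy's rules applied."""
    out = dict(record)  # never modify the source record in place
    for field, action in POLICIES[policy_name].items():
        if field not in out:
            continue
        if action == "mask_last4":
            out[field] = mask_last4(out[field])
        elif action == "redact":
            out[field] = "[REDACTED]"
    return out


# Made-up record for illustration.
row = {"name": "Jane Doe", "card": "4111-1111-1111-1234", "ssn": "123-45-6789"}
print(apply_policy(row, "PCI DSS"))
print(apply_policy(row, "HIPAA"))
```

Keeping the rules in data rather than code means adding a new regulation or best practice is a table edit, not a rewrite of the masking logic.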
Dataguise isn’t the only vendor thinking about these issues. Others, such as IBM, Informatica, and Oracle, also offer solutions to various aspects of the problem, though there are interesting differences in their approaches.
For those few of us still concerned about our privacy, this is a great development for big data. Now everyone needs to get going and apply this kind of technology to their big data environments, pronto.