Matters of Integration in the Data Economy

At the Informatica analyst conference in Menlo Park, CA during the waning days of February, James Markarian, Informatica’s CTO, reignited a line of thinking for me around the concept of “The Data Economy” – my term, not his. You might legitimately ask, “How is the data economy any different from the information technology market?” My notion of the data economy overlaps with what most would consider the IT market, as I discuss below. But first, here are a few of the viewpoints I took from Mr. Markarian’s talk, supplemented by some of my own, which spawned my thinking about The Data Economy and Informatica’s role therein:

  • Few organizations actually know precisely the data they possess – where, what, or why; e.g., there are massive quantities of data generated by organizations that are never or seldom used.
  • Typically, organizations seriously over-provision storage and related infrastructure for data – better safe than sorry, particularly if you don’t have a really good handle on your data.
  • The term “data science” is overstated, in that scientific principles are not part of a data scientist’s practices – we might be better off using the term “data artist.” On one hand this is nitpicking over job titles, but it highlights how far we still have to go in terms of managing and applying our data.
  • The currently hot approach to big data analytics, Hadoop (and some are now opining that Google’s Dremel, or an open source take on it, may soon start out-Hadooping Hadoop), involves dumping all kinds of data into a solution designed to deal with over-provisioning and to compensate for lack of data understanding: Create three haystacks of data, and start looking for a needle. If you stripped the “Hadoop” off the label, and told somebody that this is how you were going to architect a solution for analytics, they might think you were crazy.

The summarized message goes something like this: Organizations would do well to invest in understanding their data to ensure that the right information arrives at the right time for the right people in the right context of consumption, and the practices that make this happen simultaneously decrease business risk and IT costs. The more far-reaching factor is that organizations that get their arms around their data can better participate in the data economy. Here are some bread crumbs about the data economy:

  • Recent hype has talked about the coming era of the CDO – which can mean either Chief Data Officer or Chief Digital Officer. The Chief Digital Officer is a line-of-business executive overseeing the “digital business” of a company – enabling the Web 2.0 business as an adjunct to the brick-and-mortar business is a simple but pretty accurate description. Media companies had better have Chief Digital Officers, in name or in function, given the increasingly digital nature of their business. The Chief Data Officer, however, operates more inside the firewall, helping the organization to understand and wield its data for benefit across the board. Suffice it to say, no matter which CDO you are, your job is to ensure your organization has a strong grip on data, both inside and outside the firewall.
  • There are companies that make their living entirely by providing data to others, usually through aggregation and refinement, and most will help with the tools for analyses should clients not want to bother doing it entirely on their own. Examples that come to mind are Experian, First Data, Reed Elsevier (e.g., LexisNexis), Interactive Data (where I worked once upon a time), FICO and Zyme Solutions. One could also include government data services, e.g., EDGAR for SEC data, in this category. Data transformations are required when moving data to/from these services.
  • Web 2.0 companies such as Google, Yahoo, eBay, and Amazon are primarily data-driven organizations. Ironically, in Web 2.0 the CEO or the COO is typically the de facto “Chief Digital Officer.”
  • Industry and quasi-industry data exchanges and related standards bodies are part of the data economy. Examples include SWIFT in financial services, the Energy Technology Data Exchange (ETDE), and the Petroleum Industry Data Exchange (PIDX).

Note that in the above I have not mentioned an Oracle database, an EMC disk drive, SAS analytics, Cisco networking, a Teradata data warehouse, Amazon Web Services, Hadoop, or even Informatica for that matter. Also note that I have barely started scratching the surface of the data economy. That said, the Web, SaaS, and public clouds seriously enable the data economy. Without that pervasive network the data economy would be limited to private networks (e.g., EDI VANs), which are not to be trifled with, but long-term are checkers to the global network’s chess.

I believe we are still at the onset of the data economy in terms of the global network – not even two decades’ worth yet. How the world, businesses, and individuals will operate in another 20 years due to the data economy is beyond my imagination. I agree with Mr. Markarian’s remark, however, that hopefully all that data will offer something of more value than more advertisements. If I have to deal with pop-up ads whilst wearing Google sunglasses, well, I just will not buy them no matter how cool they are – I just can’t wrap my head around it.

One of Mr. Markarian’s final remarks stated that Informatica was the “dark matter” that held it all together, and by “it” I believe he mainly meant data inside the firewall, but also increasingly in the larger data economy. I believe that Informatica’s early and ongoing investment in cloud integration, and resulting successes, point to the shift from an internal to a more external point of view about the value of data.

What the data economy means to Informatica is that this dark matter will not experience a reduction in demand for the foreseeable future. Data is a growth business, and the data economy will continue creating pull for dark matter, just as machine data, social media, and SaaS create push for dark matter. I believe “dark matter” includes: (1) the accurate transformation of data so that sender(s) and receiver(s) see it in their own particular context and on their own device(s) of consumption – for transactions and/or intelligence; (2) the security and protection of that data in flight; and (3) the tools to help organizations gain a better idea of what data they have, how to potentially use it, and where to find useful third-party data to supplement it. Use whatever three-letter acronyms you prefer to describe the above.

Informatica’s challenge is to continually evolve its own architecture to ensure the dark matter readily crosses into any network and deployment option, and deals with transformations for any use case, whether standards-based, customized, or proprietary. Informatica’s job is to help organizations bring data to light, and to help secure and protect organizations as they do this. It is the leading vendor in this endeavor, but in the data economy nobody should rest on their laurels. Finally, the world is not merely experiencing a “data explosion” but, more pertinently, a “data economy explosion.” If you believe in the data economy explosion as I do, Informatica remains one of the most strategically well-positioned technology suppliers on the planet for the long term.
