For several years, there has been a tremendous demand for data scientists. Businesses and governments got really, really excited about all the possibilities from applying big data, and the data scientist was seen as the most critical role to make it happen. But a funny thing happened along the way. Much of the data scientist’s time was spent on data wrangling, meaning finding relevant data sources, preparing, classifying, integrating, cleansing, augmenting, improving quality, and addressing security, privacy, and governance concerns. Guess what? That’s not data science. That’s data stewardship.
The very nature of big data, with its wonderful variety, requires that these problems be solved, and its popularity only magnifies them. You may not be able to manage what you can’t measure, but you definitely can’t measure what you can’t manage. Most everyone I speak with about big data initiatives now tells me that they have prioritized data steward functions in their project planning, and most have dedicated someone to be responsible for this area. It’s not just a “nice to have” thing: The data steward is on the critical path.
A few recent events highlight the importance of this effort. Informatica’s upcoming acquisition for $5.3 billion is a good example. Long considered successful, the company has benefited hugely from the growing understanding of how its offerings serve the data stewards of the world. Informatica’s expanding portfolio adds more value, too, as it extends beyond integration and quality into top-of-mind-for-everyone security and privacy.
Further, this huge price tag will help propel all the “next generation” competitors and startups, which aren’t adapting their offerings to big data realities but are designing for these different environments from the start. Young companies like Paxata, Trifacta, Talend, Dataguise, and Bedrock Data all play in this space. I think they’re going to be happy with their valuations in upcoming funding rounds. Larger incumbents like IBM and Oracle are surely happy to see the opportunity for growth, too. And the recent acquisition of Pentaho by HDS is another great example of a move to accelerate this class of solutions.
Don’t worry, dear data scientists, your jobs aren’t going away. But with a data steward by your side, perhaps you can actually do the really strategic work you were hired to do.