In this ESG Video Blog, ESG Founder Steve Duplessie makes a case for a Universal Distributed File System.
Announcer: The following is an ESG Video Blog.
Steve: Hi there. I'm getting old now, so forgive me if it seems like I've said these words 20 years ago, because I most likely did. To steal a term from ESG mega analyst Mark Peters, for 50 years or so, we've been in the IT business, but IT really has meant infrastructure technology. We're finally moving into the actual valuable part of the tech revolution or the real IT business where IT means information technology.
Information is data, and data is the lifeblood of the enterprise. It fuels the business processes that either drive competitive advantage and profitability or kill them.
Data management is the key to our digital life, not box management. It sounds obvious, but if you look at how much time and money we actually spend managing IT stuff, it's about 80% or more about managing boxes and silos of data and kit and less than 20% on actually managing and leveraging data itself.
So why? If we all intuitively know this, why are we still doing it? In the old days, like in 2010, our world was made up of distributed, disparate silos of applications and data and kit spread far and wide, with no real way to ever holistically even know what was where, let alone drive value out of all of it.
We've been forced to understand each silo individually to know exactly what's on that silo, and then to figure out what gold we want to try and mine out of that silo. If the keys to the kingdom are sitting one silo over and we don't know it, we miss out.
In the last five years, things have not gotten better. I'd argue they've gotten worse. Sure, the public cloud lets us plop tons more stuff into a single, logical place, but what has it done for us to unify the data assets that still sit outside of it? Nothing. Worse, the public cloud is clumsy and designed to be the Data Hotel California. You can come in, but you can never leave.
In order for me to find gold in them there clouds, I still need to know that the data that matters resides within that silo, albeit a very big silo.
Data is increasingly coming from multiple diverse sources across multiple sites, public and private clouds, and the piles and piles of on-premise and remote locations that we've accumulated over decades. And monetizing the business value of the data that sits in all these silos is increasingly an on-demand activity: how to pull multiple data sources, structured and unstructured, across multiple sources for things like big data analytics.
Effective and efficient data management requires eliminating the traditional constraints that hold back productivity, or at least the possibility of finding gold.
For example, first, data is currently bound to its underlying infrastructure. Whether it's an array, a specific site, or a public cloud, it's stuck in silos. In a perfect world, none of that would matter. We wouldn't be concerned about what box or silo something resided on, only that it was ours. Do you really think the business cares what super duper box their data sits on? They don't.
Second, there's a total lack of application and analytics self-service capabilities. It's essentially impossible, or at the very least painful, to get the data where and when you need it. Ideally, application and analytics teams could have full self-service control to maximize their business impact. Like, why are we hiding the crown jewels from those who turn water into wine for us?
Third, cost. All of this is inefficient and expensive at the infrastructure, OpEx, and the business process or user level. Clouds alleviate some of these concerns by providing massive, elastically scalable OpEx-only environments. Clouds represent a revolutionary opportunity in data management capability simply because of their sheer size alone. But cloud-based solutions are not the end game. The public cloud is still just another silo, a massive silo, but it's still a silo. And thus, it carries the same concerns regarding vendor lock-in or lack of data mobility that other things have. Also, the cloud is non-ideal for all data for various reasons -- security, governance, compliance. Not all apps are gonna run on the cloud for performance or whatever reason.
Eventually, a true hybrid cloud solution is the end game, where diverse data-centric workflows can leverage multi-site, cross-cloud infrastructures right along with on-premise and remote silos, where data access is unified and self-service becomes a reality. And that doesn't happen until we accept and adopt a single name space, truly global file system construct.
To deliver a true hybrid solution, data management must be unified across these environments, locations, and applications. Many approaches have been tried -- complex gateways or overlays, etc. -- but one of the simplest and most powerful would be the single, global name space of a true, distributed, dynamic file system. Why? It would eliminate data silos by unifying data across all those environments and locations and applications which would then enable self-service for application owners and analytics gurus.
A file system versus an object store? Sure, object stores are cool, but reality says every application in human history already knows how to deal with a file system. Making the file system object smarter is probably a better way to go.
Turning a file system into an object store makes sense, and I'm not sure we even bother with the block-level world because that's just way too low level.
Can it be done? Sure. We've had the technology for decades, but it's always been brutally hard to deal with what has been a science experiment. But there's hope on the horizon and someone's gonna figure it out, not only how to make this work, but how to make it work in human terms so it's easy to deploy, manage, and most importantly, actually use.
We'll have to ensure the things we like about our silos continue, like enterprise-type functions, snap and copy data management, and SLA adherence, fault tolerance, etc. But just imagine what the world could look like when we stop worrying about the million underlying pieces, and instead, just focus on what matters -- the data.
The cloud has proven that we can do it. No one worries how Google or Amazon is actually making the sausage. We just enjoy the taste. What if we could do that across all of our data repositories and silos? What if we truly stopped worrying about where things were and how they function, and instead, just concentrated on the outcomes we wanna attain?
Soon, my Tesla's gonna drive me home without my help, which is good for everyone. But I'm not gonna get home and then poke around in the guts of the car's technology to twiddle with bits. Instead, I'm gonna go to sleep in the passenger seat, so try not to wake me up.