Data management — not data protection — is the future.
We are already seeing the shift in many organizations where there are fewer dedicated ‘backup administrators’ and more diversified workload/platform owners (DBAs, vAdmins, File/Storage admins, IT operations, etc.) defining backup jobs or invoking restores, coupled with infrastructure that truly has data protection “built in.”
With so much revolution (not just evolution) happening in data protection, it is easy to imagine a future where folks care less about which repositories or data protection methods (backups, snapshots, replicas) are in use – and more about:
- How many copies do I need?
- How long do I need which versions?
- Where do they need to be to satisfy my SLAs?
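Those three questions map naturally onto a declarative policy. As a purely illustrative sketch (the class and field names below are hypothetical, not any vendor's API), a policy might capture copies, retention per version, and placement like this:

```python
from dataclasses import dataclass, field

# Hypothetical, illustrative policy model: every name below is an
# assumption for this sketch, not a real product's schema.
@dataclass
class RetentionRule:
    version: str        # e.g. "daily", "weekly", "monthly"
    keep_days: int      # how long do I need which versions?

@dataclass
class ProtectionPolicy:
    name: str
    copies_required: int                                 # how many copies?
    retention: list[RetentionRule] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)   # where, per SLA

crm_policy = ProtectionPolicy(
    name="crm-database",
    copies_required=3,
    retention=[RetentionRule("daily", 30), RetentionRule("monthly", 365)],
    locations=["on-prem-disk", "cloud-object-store", "offsite-tape"],
)
```

The point of the sketch is that nothing in the policy names a backup product, a snapshot engine, or a media type; it states only the outcomes the business needs.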
Along the way, the infrastructure will need to not only deliver on those mandates, but also unlock additional business value from the data within those repositories through what some today call "copy data management" or "copy virtualization."
To understand how this transformation will happen, check out this short video on the 5 Cs of data protection (and data management… and data availability… and data preservation).
The real keys to the future are a real catalog (not just an index from your backup software) and a control (policy engine) layer that spans the various backup tools, widgets, and repositories that many are stuck with today.
- If you are an ESG subscriber, you can read more on the 5 Cs here.
- If you are a vendor who’d like to discuss how your solution(s) fit within these 5 Cs, we’d love to have that conversation — let’s talk!
- And please feel free to leave your comments below.
As always, thanks for watching.
Hi, I'm Jason Buffington, Principal Analyst at ESG. There's a lot of evolution, transformation, and confusion around data protection these days. And frankly, what one vendor calls innovation, another vendor calls table stakes that they've been doing for years. And honestly, some of the innovations get less interesting as what used to be differentiators become commoditized.
So let's break it all down to what I call the five Cs of data protection. The first C is Container. Not the virtualization concept, but simply the container where the data will be stored, including tape, cloud services, simple disk, and deduplicated disk. Yes, there are economic and agility differences between tape, disk, and cloud, but at the end of the day most organizations really should be using all of these media types.
Our next C is Conduit, which is just a fancy way of saying data movers, including generic backups, workload-specific backups, replication, snapshots, and direct production-to-protection storage. How are the ones and zeros going to move from the production storage services and servers to the containers of choice? Most organizations will use multiple data movers, or conduits. You'll combine snapshots, and replication, and backup. You may have VM-specific or database-specific backup mechanisms. And because you're going to have multiple data movers (conduits) and multiple containers (disks, and tapes, and clouds), conduits and containers really are the tactical "how" of data protection. But that's going to get muddier and more heterogeneous over time. So let's look at what really matters: the strategic aspects of data protection.
Our third C is Control. It's the policy engine. Because you will likely have multiple data movers or methods, it sure would be beneficial to have a single set of policies governing all of it. If you have a unified backup engine that protects all your workloads, and also integrates with your storage for snaps and replication, then you might have a leg up on this one. But as folks bring in supplemental tools for VMs, for databases, for SaaS applications, the need for policy and oversight grows, whether it lives in the data protection UI, a systems management UI, or somewhere else. The lack of a robust policy engine (control) almost inevitably causes either over-protection, needlessly protecting the same data in multiple ways and consuming unnecessary space, or under-protection, missing production workloads entirely. The vAdmin thinks the DBA is protecting the data, the DBA thinks the vAdmin is protecting the whole VM, both are wrong, and neither one figures it out until one asks the other for a restore and they can't.
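To make that over/under-protection failure mode concrete, here is a minimal, hypothetical sketch of the kind of cross-check a control layer could run: compare the workloads each data mover actually covers against the workloads the policies say must be covered. All names here are illustrative assumptions, not any product's API.

```python
# Hypothetical control-layer audit; all names are illustrative.
def audit_coverage(required: set[str], movers: dict[str, set[str]]):
    """Flag under-protected workloads (covered by no data mover) and
    over-protected workloads (covered by more than one)."""
    covered = [w for workloads in movers.values() for w in workloads]
    under = required - set(covered)
    over = {w for w in covered if covered.count(w) > 1}
    return under, over

# The vAdmin's VM backup and the DBA's database backup each assume
# the other one protects "sales-db", so nobody does; meanwhile
# "app-vm" is being protected twice.
movers = {
    "vm-backup":   {"web-vm", "app-vm"},
    "db-backup":   {"hr-db"},
    "file-backup": {"fileshare", "app-vm"},
}
required = {"web-vm", "app-vm", "hr-db", "fileshare", "sales-db"}

under, over = audit_coverage(required, movers)
print(under)  # {'sales-db'}  -> under-protected
print(over)   # {'app-vm'}    -> over-protected
```

A real control layer would of course reconcile far richer state (retention, recency, location), but the essence is exactly this reconciliation across tools that no single backup UI performs today.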
Our fourth C is Console. What is the UI that you're going to use to monitor all of this? Maybe it grew out of a traditional data protection or backup UI. Maybe it's your hypervisor UI or a systems management UI. But what console or consoles will you use, and who will be the operators of those consoles? And that being said, can you do more than monitor and observe? Can you manage by taking action? Can you gain insights to actually understand what's going on and why, instead of just metrics or green/yellow/red? Ideally, can the infrastructure manage itself? Can it glean insight into what's actually happening, compare that with the policies in the control layer, and autonomously adapt?
Those last ideas are exciting because they get humans out of the grind and let you focus on the smarter parts of IT that require creativity, not repetitive minutiae. That said, neither you nor your autonomous data management solution can do much without the catalog. What data do you have? What are the regulatory, operational, or criticality requirements of that data? Where are all the places you have copies of that data, and why are you keeping each of those copies, and for how long? If you have a robust catalog, not just an index of what you backed up or archived, but a rich and comprehensive understanding of the data across your various containers, then, and perhaps only then, can you do things like copy data management: keeping what you need, discarding the rest, and enabling new scenarios. In fact, with the right catalog at the center of your policy controller and your management console, you really are in data management, not just data protection, data availability, or data preservation.
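The difference between an index and a catalog can be sketched in code: an index only knows that a backup exists, while a catalog knows what the data is, why each copy exists, and when each copy can go. This is a hypothetical data model assumed for illustration, not a real product's schema.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical catalog model; all field names are illustrative assumptions.
@dataclass
class CopyRecord:
    container: str    # where the copy lives (disk, cloud, tape)
    reason: str       # why we keep it (operational, regulatory, ...)
    expires: date     # how long we keep it

@dataclass
class CatalogEntry:
    dataset: str
    classification: str           # regulatory/operational criticality
    copies: list[CopyRecord]

    def expired_copies(self, today: date) -> list[CopyRecord]:
        """Copy data management: keep what you need, discard the rest."""
        return [c for c in self.copies if c.expires < today]

entry = CatalogEntry(
    dataset="customer-orders",
    classification="regulated-pii",
    copies=[
        CopyRecord("on-prem-disk", "operational restore", date(2026, 1, 1)),
        CopyRecord("cloud-archive", "7-year retention", date(2031, 1, 1)),
        CopyRecord("legacy-tape", "superseded", date(2024, 1, 1)),
    ],
)
print([c.container for c in entry.expired_copies(date(2025, 6, 1))])
# ['legacy-tape']
```

Note what the mere index could never tell you: that the legacy tape copy is safe to discard, because the catalog records the reason and retention for every copy, not just its existence.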
Five years from now, data protection may not even be a standalone thing, or at least not built out of separate products or widgets. Instead, you'll define policies, and the infrastructure will just do it: how often, how long, and where. You won't, or shouldn't, care about where the zeros and ones are stored, how they move between platforms, or on what media they reside, as long as you can reliably get to what you need when you need it. You're going to care less about the conduits and the containers. What you will care about are the policies that define the SLAs, the retention, the efficiency. Where is my stuff? Am I keeping what I need? Am I deleting what I don't? How else can I leverage what I've got? You're going to care more about control, through your console, powered by your catalog.
There are some other Cs worth considering (Cost, Complexity, Credibility, Collaboration, Corporate Culture), but those will have to wait for another day. Data management is the future, and it starts with at least five Cs.