Can you trust your backups?
A wise person once told me: “Show me the state of your backups, I’ll tell you the state of your information system.”
Is this still true today? Probably. But most certainly not in the most obvious sense. Let’s discuss database backups.
Backup Restore solutions
Backup solutions. Backup software. Call them what you want. Every time I see advertising for these I feel like I’m going to get scammed. What is the true purpose of taking a backup? Is to have the biggest, safest, most complete collection of backups of all your servers. Or is it to restore them if you ever need to? And if so, why aren’t they called restore solutions?
Out of all of you reading this, how many are confident that if they had to restore their whole system from backups that they would be successful? How many of you have even tested your restoration processes over the last 12 months? And if there are any of you who have, have those tests been successful?
Don’t get me wrong, I’m not saying backups are bad and we shouldn’t do them. I’m saying that if they aren’t tested regularly, they have a good chance of being worthless because something is probably wrong somewhere.
Out of my short (10 years) experience, I have not once successfully restored from a tape, all my Virtual Machine restorations from disk have failed because of some inconsistency in the VM state and in the end, only database restores from disk have been successful. Thank you SQL Server.
Backup location
What do you use backups for the most? Disaster recovery or because somebody screwed up?
If drinking coffee doesn’t wake you up in the morning, try deleting a table in the production database.
We have two main use cases:
– Disaster recovery. Low probability, huge impact.
– Somebody didn’t take his morning coffee: High probability, random impact
And then we have the “can you give me a copy of that data?” use case.
For the first one we need cheap but reliable solutions. And they need to be off-site. I’ll leave that to other people to talk about. It’s serious stuff. But remember, test those DR procedures. You don’t want a disaster on top of a disaster do you?
For people screwing up, the best solution is to have a backup handy. That means on-disk and locally. And probably with a process making it quickly available to the proper people. A 24h SLA on a production restore can be frustrating, if not more.
It’s so effective that in some places, it replaces snapshot replication to have daily automated copies of production databases on another environment.
But what about huge systems? Or critical ones that need a very low RTO?
Replication
When your system is a Data Warehouse or a huge production database (1TB+), are you really going to back it up every day? Do you use differentials? Doesn’t that make your backup chain longer and increase the risk? How long is your RTO?
And wouldn’t those systems benefit from having multiple query points to make the front-end more efficient?
Data replication is usually something to consider at this point. You get a live-backup on another machine, somewhere else ideally, and it can easily be accessible as read-only. The only downside is that you are paying twice for the infrastructure. 1 minute RPO and RTO. A backup tested by half your users every day. Isn’t it worth it?
Mistakes are replicated as well, but processes can help with mitigating those. I’ll try to share ours in the future.
Replication doesn’t prevent you from doing backups, but it does mean that they are the fail-safe of the fail-safe.
Now that we discussed backups: their necessity, their quick availability, how much can we really trust them? And by trust I’m not talking about reliably restoring them. How big of a security risk are they?
The security black-hole
Dissemination
Your backups are stored in one location. Your whole information system is stored aggregated in one single location. Let’s hope its safe.
Who can request restores? At which location?
Or worst: can anybody request a backup to work on it locally? The answer is “probably” in most places.
I’m always amazed at the number of .bak files I see on shared folders.
Readability
Take a backup. Open it with XVI32 (a light HEX editor). SQLServer Central has a nice demo to justify encryption:
Yes, it’s clear text. And the worst part is often that you are not doing your backups yourselves. You rely on another team. How did they configure it? TDE (Transparent Data Encryption) can save you there. Data is encrypted at rest, on disk, but also in backups.
The downside is that you need the certificate to restore on any server. It makes things a bit more complex and you have to make sure you don’t loose that certificate. But at least, you won’t have the surprise of have backups that can be restored on any system laying around on shared folders, user computers or even worst: the public Cloud (We all love Proofs of Concepts!).
Solutions?
Monitoring
Monitor your backup reliability. sp_BlitzBackups is awesome for this. Use it.
Backup security
Don’t leave data lying around. Remember that if you’re lucky, you could be in Europe and that database could contain personal data. Wouldn’t that be fun?!
Backup reliability
Keep your strategy in line with your needs. Keep it simple. Test it. It’s your DBA’s job, get him involved. Show him love.
If you want more on this topic, go read Brent Ozar’s page about RPO and RTO goals: https://www.brentozar.com/archive/2018/07/does-your-backup-strategy-achieve-rpo-and-rto-goals-of-the-business/
And as he says: Be prepared for an unplanned restore because that is usually when your backups really needs to shine… and often don’t.