# The most useless thing in the world is an untested backup
Every team that's lost data had backups. Most of them also had backups that didn't restore: corrupt files, missed databases, broken scripts, expired credentials, an off-site copy that hadn't been synced for six months. The patterns below are what we apply to keep the gap between "we have backups" and "we can restore" as small as possible.
## The 3-2-1 rule, with detail
Three copies of the data. Two different media types. One off-site.
The detail that matters:
- The "three copies" includes the original. So: original + two backups.
- "Different media" originally meant tape vs disk. In 2026, it means different storage technologies — local disk + S3-compatible object storage is acceptable. Two copies on the same SAN doesn't count.
- "Off-site" means a different physical location, with no shared failure mode. Same data centre, different rack — not off-site. Different region of the same cloud — passable. Different cloud provider — definitely off-site.
## The restore test, on a schedule
Backups that haven't been restore-tested don't exist. The discipline:
- Quarterly: full restore of the largest database to a test environment, end-to-end
- Monthly: spot-restore of a random backup file to verify integrity
- Per-backup: integrity check (checksum, snapshot validation) automated as part of the backup job
The first time we ran a quarterly restore test on a customer's "working" backup setup, we found that the backups had been silently failing for six months. Nobody had noticed because nothing had asked.
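The per-backup integrity check is the cheapest of the three to automate. A minimal sketch, assuming each backup job writes a SHA-256 digest alongside the backup file (the `.sha256` sidecar convention is an assumption, not a standard):

```python
import hashlib
import sys
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large backups don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(backup: Path) -> bool:
    # Assumed convention: "<backup>.sha256" holds the hex digest recorded
    # by the backup job when the file was written.
    expected = Path(f"{backup}.sha256").read_text().strip()
    return sha256_of(backup) == expected

if __name__ == "__main__":
    backup = Path(sys.argv[1])
    if not verify(backup):
        sys.exit(f"INTEGRITY FAILURE: {backup}")
    print(f"ok: {backup}")
```

Run it as the last step of every backup job, and make a non-zero exit page someone.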
## What to back up
- Application databases — the obvious one, often the only one teams think of
- Configuration — IaC source, secret stores (encrypted), system config that's not in source control
- Object storage — user-uploaded files, generated artefacts. Often forgotten because "S3 is durable" — durability isn't the same as backup.
- Source code — yes, even though it's on GitHub. GitHub is a vendor; what's your fallback?
- Issue tracker / wiki — institutional memory often lives outside the codebase
- Email / communications archives — if they're business-critical, they need backup
- Encryption keys — backed up separately, with separate access control. Lose your key, lose the backup.
The list is longer than what most teams actually back up.
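One way to keep the list from drifting back to "just the database" is to hold it as data, with owners and schedules. A hypothetical inventory (every field here is illustrative):

```python
# Hypothetical backup inventory: each item from the list above becomes an
# entry with an owner and a schedule, so "what do we back up?" has one
# answer that can be reviewed and diffed like any other config.
BACKUP_INVENTORY = [
    {"target": "app-databases",   "method": "wal-streaming", "schedule": "continuous", "owner": "platform"},
    {"target": "config-and-iac",  "method": "git-mirror",    "schedule": "daily",      "owner": "platform"},
    {"target": "object-storage",  "method": "bucket-sync",   "schedule": "daily",      "owner": "platform"},
    {"target": "source-code",     "method": "git-mirror",    "schedule": "daily",      "owner": "dev"},
    {"target": "issues-and-wiki", "method": "api-export",    "schedule": "weekly",     "owner": "dev"},
    {"target": "email-archive",   "method": "vendor-export", "schedule": "monthly",    "owner": "ops"},
    {"target": "encryption-keys", "method": "sealed-copy",   "schedule": "on-change",  "owner": "security"},
]
```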
## Backup methodology
- Full + incremental — periodic full, frequent incremental. Restore is full + replay incrementals. Common, well-understood.
- Snapshot-based — filesystem or volume snapshot. Fast to take, fast to restore, but the snapshot is colocated with the data unless replicated.
- Continuous (CDC, WAL streaming) — for databases. RPO measured in seconds, restore is "replay to a point in time".
- Object-storage versioning — for files in S3-style storage. Cheap, automatic, doesn't replace off-site.
The right choice depends on RPO (how much data loss is acceptable) and RTO (how fast you need to be back). Define both before picking the strategy.
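A back-of-envelope worked example for the full + incremental scheme, with every number an assumption you'd replace with measured values:

```python
# Worst-case RPO/RTO for a full + incremental scheme (all numbers assumed).
full_interval_h = 24    # one full backup per day
incr_interval_h = 1     # hourly incrementals
full_restore_h  = 2.0   # measured time to restore a full
incr_replay_h   = 0.1   # measured time to replay one incremental

# RPO: at worst, you lose everything since the last incremental.
rpo_h = incr_interval_h

# RTO: at worst, failure lands just before the next full, so the restore
# replays every incremental taken since the previous full.
rto_h = full_restore_h + (full_interval_h / incr_interval_h) * incr_replay_h

print(f"worst-case RPO: {rpo_h} h, worst-case RTO: {rto_h:.1f} h")  # 1 h, 4.4 h
```

If the business answer is "we can lose five minutes, not an hour", the arithmetic pushes you toward the continuous option above.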
## The off-site question
- Different cloud region — fine, fails on a region-wide event
- Different cloud provider — better, fails only on a multi-cloud event (rare but real)
- Cold off-line storage — paranoid but cheapest at scale; restore time is real
- Encrypted off-site at a colocation facility — old-school but still viable for regulated workloads
A pattern we ship: primary backup in the same cloud as the workload, secondary in a different cloud or object-storage provider, with a weekly automated copy between them.
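A sketch of that weekly copy job, assuming both providers expose an S3-compatible API (the endpoint URL, bucket names, and credentials are placeholders). Server-side copy only works within one provider, so the objects stream through the job:

```python
import boto3

# Primary: ambient credentials. Secondary: a different S3-compatible
# provider; the endpoint, credentials, and bucket names are placeholders.
primary = boto3.client("s3")
secondary = boto3.client(
    "s3",
    endpoint_url="https://s3.example-second-provider.com",
    aws_access_key_id="SECONDARY_KEY_ID",
    aws_secret_access_key="SECONDARY_SECRET",
)

SRC_BUCKET, DST_BUCKET = "backups-primary", "backups-secondary"

def replicate() -> None:
    paginator = primary.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SRC_BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Cross-provider, so the bytes pass through this process:
            # get_object returns a streaming body that upload_fileobj consumes.
            body = primary.get_object(Bucket=SRC_BUCKET, Key=key)["Body"]
            secondary.upload_fileobj(body, DST_BUCKET, key)
            print("copied", key)

if __name__ == "__main__":
    replicate()
```

In production you'd skip objects that already exist on the secondary and alert on failures; the skeleton stays this simple.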
## Encryption — non-negotiable
Every backup is encrypted at rest and in transit, with keys stored separately from the backup target. Otherwise the backup is a gift to anyone who breaches the storage.
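A minimal sketch of client-side encryption before upload, using the `cryptography` package's Fernet (symmetric, authenticated). The environment variable is a stand-in for a KMS or secret store; the point is only that the key never lives next to the backups:

```python
import os
from cryptography.fernet import Fernet

# The key comes from outside the backup path. Generated once with
# Fernet.generate_key() and held in a KMS or secret store; an env var
# stands in here. Lose this key and the backups are unreadable.
fernet = Fernet(os.environ["BACKUP_KEY"])

def encrypt_file(src: str, dst: str) -> None:
    # Fernet reads the whole file into memory: fine for a sketch, but
    # chunked or streaming encryption is the move for very large backups.
    with open(src, "rb") as f:
        ciphertext = fernet.encrypt(f.read())
    with open(dst, "wb") as f:
        f.write(ciphertext)

encrypt_file("db-2026-01-01.dump", "db-2026-01-01.dump.enc")
```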
## The runbook
A documented restore procedure that anyone on the team can follow. Includes:
- Where the backups live
- How to access them (credentials, paths, decryption keys)
- The restore command for each kind of backup
- Verification steps post-restore
- Communication template for "we're restoring"
The runbook is exercised at least quarterly during the restore test.
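The verification step is worth scripting too. A sketch assuming a PostgreSQL restore target and a handful of known-critical tables (the table names, thresholds, and DSN are placeholders the runbook would pin down):

```python
import psycopg2  # assumes the restore target is PostgreSQL

# Placeholder expectations; in practice, record these when the backup is taken.
EXPECTED_MINIMUMS = {"users": 10_000, "orders": 50_000, "audit_log": 100_000}

def verify_restore(dsn: str) -> list[str]:
    failures = []
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for table, minimum in EXPECTED_MINIMUMS.items():
            cur.execute(f"SELECT count(*) FROM {table}")  # trusted table names only
            (count,) = cur.fetchone()
            if count < minimum:
                failures.append(f"{table}: {count} rows, expected >= {minimum}")
    return failures

for failure in verify_restore("postgresql://restore-test.internal/app"):
    print("VERIFY FAIL:", failure)
```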
## One pattern we'd warn about
"The cloud handles backups." It doesn't. Cloud durability ≠ backup. Accidental delete, application bug that corrupts data, ransomware encryption — the cloud's high durability protects against drive failure, not against the things that actually destroy data.
## One pattern that always pays off
Game-day exercises. Once a year, simulate a full data-loss scenario. Restore to a production-equivalent environment. Time it. Identify the gaps. Fix them before the real event.
What's your backup strategy? And — for the regulated folks — has anyone successfully passed an audit on multi-cloud encrypted backups without a third-party "backup-as-a-service" vendor?