Backups

I will be explaining the difference between a Full backup, a Differential backup and an Incremental backup. It is important to understand these three concepts in order to plan the most suitable backup strategy and be able to predict the storage required by a backup.

  • A full backup performs a copy of all the data
  • A differential backup contains all of the data that has changed since the last full backup.
  • An incremental backup only includes the data that has changed since the previous backup.

Let’s use the scenario below as an example of the differences between these three:

Backups

We will compare how the different backup strategies perform if we would execute them once a day under these conditions.

Day 0

We have an empty disk where a file named file.txt occupying 1KB of disk space has been created:

  • A full backup of the disk will consume 1KB of disk space, same as the size of the file.
  • A differential backup will perform a full backup the first it is applied to a disk.
  • An incremental backup will perform a full backup the first it is applied to a disk as well.

Day 1

We have added a new file named hello.txt of 2KB size.

  • A full backup of the disk will consume a space equivalent to the sum of the current files in the disk, 3KB
  • A differential backup will store all the data that has changed since the last full backup, so it will store the new file, consuming 2KB.
  • An incremental backup will store the difference between the previous state and the current state, consuming 2 KB

Day 2

On this day the file file.txt has grown to 5KB. Most backup systems do not detect what has changed inside the file, they will mark the file as changed and perform a full copy of it. We will work on this assumption.

  • A full backup of the disk will consume now 7KB
  • A differential backup will store both files, as both did not exist as they are now in the initial full backup, consuming 7KB.
  • An incremental backup will store the new file, consuming 5KB.

Day 3

A file named bye.txt of 3KB size is added.

  • A full backup of the disk will consume the sum of all files, 10KB
  • A differential backup will store all three files, consuming 10KB too.
  • An incremental backup will store the new file, consuming 3KB.

Summary

As can be seen from the diagram the total disk space consumed by the different strategies is quite different. At the end of day 3:

  • The full backup strategy has consumed 1+3+7+10 = 21KB of disk space
  • The differential backup strategy has consumed 1+2+7+10 = 20KB of disk space
  • The incremental backup strategy has consumed 1+2+5+3= 11KB of disk space

All of them offer the ability to restore the data to its state on any of the days, so in terms of disk space the best strategy to follow would be the incremental backup strategy. However this comes at a price:

  • Incremental backups depend on the previous backup to work. If any of the backup copies gets corrupted, backups after that copy will no longer work.
  • Incremental backups are CPU intensive, both at the time of performing backup and when restoring them. If a prompt recovery is needed, incremental backups might not be the best choice.
  • Incremental backups take incrementally longer to take. It is best to keep them in short-cycles, performing a full backup at regular intervals, such as once every week, resetting the cycle.

If the risk of using incremental cannot be assumed, the following strategy that consumes less disk space would be the differential backup strategy. Which has the following caveats:

  • Differential backups do not depend on the previous backup to work, they only depend on the initial full backup.
  • Differential backups are not as CPU intensive as the incremental backups and can perform a recovery at the same speed as a full backup.
  • If files in a system are added, but not modified (such a logging system), differential backups will perform extremely well in terms of used disk space, specially if they are combined with full backups at regular intervals.

Finally, a full backup strategy will be the strategy that will consume more disk space but in exchange it will offer more security to the user:

  • Full backups are independent to each other, they are basically a picture of the filesystem. If a backup copy is lost or corrupted, the rest are not affected.
  • Full backups are best used in combination with other strategies, performing them at regular intervals when the filesystem is not under a heavy load, like during the night or weekends.

Updated: