We have been noticing an increase in the number of ‘prospective partners’ asking if StoreGrid supports the synthetic full backup feature. StoreGrid does not yet support it, as we have always given it a low priority in the past. But now that it is being asked for frequently, we have started implementing it and hope to have it available in the next few months. Though we always like to give our partners as much choice and flexibility as possible while using StoreGrid for their online backup services business, this particular feature has been bothering me for some time. I feel that synthetic full backup is a double-edged sword; it may come back to haunt you when things go wrong. In fact, some of our partners told us they would not use this feature at all because of the additional risks it introduces! Let me clarify some of these viewpoints and put all the pros and cons of the synthetic full backup feature on the table.

What is Synthetic Full Backup anyway?

Synthetic Full Backup is a way to create a new full backup without actually doing a full backup. It is done by combining a previous full backup with the subsequent differential/incremental backups to “synthesize” a new full backup. Note that all of this is done at the backup server, so it does not involve any transfer of data from the clients to the backup server. Here is a definition of synthetic full backup on the web.
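To make the idea concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a backup is a mapping from block number to block contents and that the data is readable on the server. The name `synthesize_full` and the block layout are hypothetical and are not StoreGrid's actual format.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # hypothetical layout: block number -> block contents


def synthesize_full(full: Blocks, incrementals: List[Blocks]) -> Blocks:
    """Build a new full backup on the server from an old full backup
    plus the incremental backups taken after it (oldest first)."""
    synthetic = dict(full)  # start from a copy of the previous full backup
    for changed_blocks in incrementals:
        # Later backups overwrite the blocks they changed.
        synthetic.update(changed_blocks)
    return synthetic


old_full = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
monday = {1: b"bbbb"}                # block 1 changed
tuesday = {2: b"cccc", 3: b"DDDD"}   # block 2 changed, block 3 appended

new_full = synthesize_full(old_full, [monday, tuesday])
print(new_full)
```

Nothing in this sketch touches the client: the new full backup is assembled entirely from data the server already holds.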

The advantage of using a synthetic full backup is that the client systems (the production servers and the user desktops/laptops) do not have to do a complete full backup periodically. This significantly reduces the load on client systems and the time taken for periodic full backups. It is especially attractive in the online backup world, because synthetic full backups eliminate the need to transfer the large amounts of data involved in a full backup over the internet every time a full backup needs to be done. So far so good! So why not implement this right away, considering that the advantages are so obvious? Hold your horses…

During a synthetic full backup, the process of “synthesizing” a full backup is done at the backup server end. In order to “combine” a previous full backup with subsequent incremental/differential backups, the backup server needs access to the encryption key used to encrypt the backup data. Note that in the online backup world the encryption is done at the client end (the production servers and the users’ desktops/laptops). One of the most debated topics in online backup is the security of the data that is backed up: will the service providers have access to the backed up data of their customers? Almost all online backup solutions, including StoreGrid, encrypt the data before it is sent over the internet to the service provider’s storage cloud. And during restores the encrypted data is first restored to the client and then decrypted at the client end. So unless the backup server is given access to the encryption password, at least temporarily, synthesizing a full backup from a previous full backup and subsequent incremental/differential backups would not be possible.

But there are workarounds that avoid the need to decrypt the encrypted data on the backup server when synthesizing a new full backup. Let me describe the workaround we are planning to implement and the additional risks it introduces…

Firstly, for every file, StoreGrid does a full backup and then subsequently does differential backups (that is, the block level differences between the current file and the content of the original file that was backed up during the full backup). We do this because if we always did incremental backups instead (that is, the block level differences between the current file and the content of the file the last time it was backed up, either incrementally or fully), it would be very difficult to implement versioning.

With incremental backups, restoring the latest file requires the full backup file and every incremental backup that was ever done. With block level differential backups, the latest file can be restored using just the full backup file and the latest differential backup.

So versioning is easier, as we can delete the differential backups that no longer need to be kept. This is illustrated in Figure 1 below.

Figure 1: Differential backup
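The difference between the two restore chains can be sketched in a few lines of Python. The block-map representation and the function names below are assumptions made for illustration only, not StoreGrid's implementation.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # hypothetical layout: block number -> block contents


def restore_from_incrementals(full: Blocks, incrementals: List[Blocks]) -> Blocks:
    """Each incremental holds the changes since the previous backup,
    so every one of them is needed, in order."""
    restored = dict(full)
    for inc in incrementals:
        restored.update(inc)
    return restored


def restore_from_differential(full: Blocks, differential: Blocks) -> Blocks:
    """A differential holds all changes since the full backup,
    so the full backup plus one differential is enough."""
    restored = dict(full)
    restored.update(differential)
    return restored


full = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}

# The same history of changes, stored two ways (day 1, then day 2).
incrementals = [{1: b"bbbb"}, {2: b"cccc"}]
differentials = [{1: b"bbbb"}, {1: b"bbbb", 2: b"cccc"}]

latest_a = restore_from_incrementals(full, incrementals)
latest_b = restore_from_differential(full, differentials[-1])
assert latest_a == latest_b

# Deleting the day 1 differential does not affect the day 2 restore.
# Deleting the day 1 incremental silently loses the change to block 1.
broken = restore_from_incrementals(full, incrementals[1:])
print(latest_b[1], broken[1])
```

This is why old differential versions can be pruned freely, while pruning an incremental breaks every later version.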

Considering the way we do full backups and differential backups, we plan to implement synthetic full backup without physically combining a previous full backup and a subsequent differential backup. Instead, as illustrated in Figure 2 below, we would simply create a reference in the database for a synthetic full backup, recording which previous full backup and which differential backup make up the synthetic full backup in question.

Figure 2

This information would have to be used only during restores. Thus by just keeping the references of full backups and differential backups required to make up a new synthetic full backup, we can eliminate the need to have the backup server decrypt the data for combining backups to synthesize a full backup.
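A reference-only catalog along these lines might look like the following Python sketch. The record layout, the in-memory `catalog`, and all function names are hypothetical. The sketch also assumes that differentials taken after a synthetic full are relative to that synthetic full.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BackupRecord:
    backup_id: str
    kind: str                      # "full", "differential" or "synthetic_full"
    base_id: Optional[str] = None  # backup this one is relative to (None for a real full)
    diff_id: Optional[str] = None  # differential referenced by a synthetic full
    blob: Optional[bytes] = None   # encrypted data; a synthetic full stores none


catalog: Dict[str, BackupRecord] = {}  # stands in for the server database


def register(record: BackupRecord) -> None:
    catalog[record.backup_id] = record


def create_synthetic_full(backup_id: str, base_id: str, diff_id: str) -> None:
    """Record a synthetic full as two references. No blob is read or decrypted."""
    if catalog[diff_id].base_id != base_id:
        raise ValueError("differential was not taken against this base backup")
    register(BackupRecord(backup_id, "synthetic_full", base_id=base_id, diff_id=diff_id))


def blobs_for_restore(backup_id: str) -> List[bytes]:
    """Encrypted blobs the client must fetch and decrypt, oldest first."""
    record = catalog[backup_id]
    if record.kind == "full":
        return [record.blob]
    if record.kind == "differential":
        return blobs_for_restore(record.base_id) + [record.blob]
    # synthetic_full: everything behind its base, then the referenced differential
    return blobs_for_restore(record.base_id) + [catalog[record.diff_id].blob]


register(BackupRecord("F0", "full", blob=b"<encrypted full>"))
register(BackupRecord("D1", "differential", base_id="F0", blob=b"<encrypted diff 1>"))
create_synthetic_full("S1", base_id="F0", diff_id="D1")
register(BackupRecord("D2", "differential", base_id="S1", blob=b"<encrypted diff 2>"))

print(blobs_for_restore("D2"))
```

The server only ever handles opaque encrypted blobs and their identifiers, so it never needs the encryption key. The cost is that the references have to be followed at restore time.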

What are the risks introduced by the above process?

If we were to follow the above approach forever (keeping only references in the database without physically combining the backups) and do only periodic synthetic full backups to avoid normal full backups, then, as illustrated in Figure 3 below, restores can become more complex and time-consuming.

Figure 3

This is because, to restore the latest version of a file, the first full backup file and the data behind every subsequent synthetic full backup have to be restored along with the latest differential backup for that file. If this involves tens or hundreds of synthetic full backups, the restore process will surely become quite inefficient. Besides, a simple restore of the latest file could mean restoring data that was stored months or years earlier. This introduces additional risk: if even one intermediate block of data from a synthetic full backup done months before is corrupted for some reason, all the backups done after it are invalidated and cannot be restored. This is a serious risk.

This risk can be eliminated either by physically synthesizing a full backup, decrypting the data when the synthetic backup is done, or by doing periodic normal full backups without relying on the synthetic full backup feature. The former option means that the backup server must have at least temporary access to the encryption key, which introduces a security risk. The latter option brings back the large data transfers over the internet that synthetic full backups were meant to avoid. Doing neither leaves the restore process inefficient and increases the risk of losing data because of a small corruption in a block of data stored months before.
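To get a feel for how the restore chain grows, here is a small Python sketch. It assumes every synthetic full backup is kept only as a reference, so each one contributes one more stored differential to the chain. The piece names and the weekly schedule are made up for illustration.

```python
from typing import List, Set


def pieces_needed(synthetic_fulls: int) -> List[str]:
    """Stored pieces the client must fetch to restore the latest version:
    the original full backup, one differential per reference-only
    synthetic full, and the latest differential."""
    chain = ["full-0"]
    chain += [f"diff-for-synthetic-{i}" for i in range(1, synthetic_fulls + 1)]
    chain.append("latest-diff")
    return chain


def restorable(synthetic_fulls: int, corrupted: Set[str]) -> bool:
    """A restore succeeds only if no piece in its chain is corrupted."""
    return not any(piece in corrupted for piece in pieces_needed(synthetic_fulls))


# Hypothetical schedule: one synthetic full per week for a year.
print(len(pieces_needed(52)))  # 54 pieces for a single restore

# One bad piece from week 3 breaks every restore from week 3 onwards.
corrupted = {"diff-for-synthetic-3"}
print(restorable(2, corrupted), restorable(3, corrupted), restorable(52, corrupted))
```

A periodic normal full backup resets this chain to a single piece, which is the redundancy argument made below.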

What is our take?

We strongly believe that the fundamental philosophy behind a robust and foolproof backup strategy is to have as much redundancy for the data as possible. Any backup strategy that sacrifices redundancy for storage efficiency or for shorter backup times should be avoided if feasible. Hence, though StoreGrid will support the synthetic full backup feature in a few months’ time, we would strongly advise our partners to analyze it thoroughly and understand the implications before using it. Our recommended approach will always be to do periodic full backups of all the data. Perhaps one can reduce the frequency of complete full backups by doing frequent synthetic full backups in combination with less frequent complete full backups. We would certainly not recommend doing away with normal full backups altogether.

This was exactly the sentiment expressed by some of our partners when we spoke to them about this feature. Like in many other spheres of life, ‘natural’ is better than ‘synthetic’, I guess!
