As we focus on delivering products suited to SMBs and IT professionals, I feel a post on client-side deduplication should be of great benefit to our audience. SMBs and IT admins rely on a range of storage devices that hold users' data, such as photos, files, and operating systems. This data usually contains a large number of duplicates: multiple copies of folders, files, or identical parts of a data set. Such duplicates can be eliminated by storing only one copy of the data and having the others reference it. This process of eliminating redundancy is known as data de-duplication.
Types of Deduplication:
Commonly used de-duplication methods are:
- File Level
- Block Level
File Level Data Deduplication:
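File Level Data De-duplication eliminates redundant files: when two files are identical, only one copy is stored. This approach is simple and fast, but the de-duplication ratio is usually low, because files that differ even slightly are treated as entirely distinct. Redundant data inside such files is something file level de-duplication cannot find.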
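To make the idea concrete, here is a minimal sketch of file level de-duplication. It assumes files are available in memory as bytes and uses SHA-256 as the fingerprint; the function name `dedupe_files` and the sample file names are made up for illustration.

```python
import hashlib

def dedupe_files(files):
    """files: dict of file name -> bytes. Returns (store, index)."""
    store = {}   # digest -> content, each unique content kept once
    index = {}   # file name -> digest (a reference to the stored copy)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # keep only the first copy
        index[name] = digest
    return store, index

files = {
    "report.doc": b"quarterly numbers",
    "report_copy.doc": b"quarterly numbers",
    "report_v2.doc": b"quarterly numbers, revised",
}
store, index = dedupe_files(files)
print(len(files), "files,", len(store), "unique copies stored")
```

Note that `report_v2.doc` is stored in full even though most of its content matches the other two files. That is the limitation described above.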
Block Level Data Deduplication:
In block level de-duplication, files are split into blocks of fixed or variable size, a process called chunking, and the blocks are then deduplicated. The de-duplication percentage is much higher than with file level de-duplication, because identical blocks can be found even inside files that differ as a whole. There are two main approaches. Fixed-size chunking uses blocks of a fixed length to find duplicates. Variable-size chunking divides the file into chunks of variable length. It is one of the most widely used approaches to deduplicating data because it achieves a higher de-duplication percentage.
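The difference between the two approaches shows up when data is inserted near the start of a file. The sketch below is a toy illustration, not a production chunker: `variable_chunks` cuts wherever a simple checksum of the last few bytes hits a target value, whereas real systems typically use a rolling hash such as a Rabin fingerprint along with minimum and maximum chunk sizes. The function names, window size, and divisor are all assumptions chosen for the example.

```python
import hashlib

def fixed_chunks(data, size=64):
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=4, divisor=32):
    """Cut wherever a checksum of the last `window` bytes is a multiple of `divisor`."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        # The cut decision depends only on nearby content, not on the offset.
        if sum(data[end - window:end]) % divisor == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last cut
    return chunks

def shared(a, b):
    """Number of distinct chunks that appear in both lists."""
    return len(set(a) & set(b))

# 1024 bytes of pseudo-random sample data, then the same data with one byte prepended.
original = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(32))
edited = b"X" + original

print("fixed-size chunks shared:   ", shared(fixed_chunks(original), fixed_chunks(edited)))
print("variable-size chunks shared:", shared(variable_chunks(original), variable_chunks(edited)))
```

With fixed-size chunking, the single inserted byte shifts every block boundary, so the edited file's blocks no longer line up with the original's. With variable-size chunking the boundaries follow the content, so every chunk after the first cut point is found again.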
Methods of deduplication:
- Client-side deduplication:
Divides the data stream into chunks and eliminates duplicate data at the client side before it is transferred to the server. This decreases the amount of data transmitted to the server.
- Server-side deduplication:
Eliminates duplicate data from the data stream once it reaches the server and before it is stored, which improves storage utilization by storing only the original data.
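As a rough sketch of the client-side method, the example below has a client split its data into fixed-size chunks, fingerprint each one, and send only the chunks it has not sent before. The `Client` class, the 4-byte chunk size, and the plain dictionary standing in for the server are hypothetical simplifications; a real product would use much larger chunks and a network protocol.

```python
import hashlib

class Client:
    """Hypothetical client that remembers which chunks it has already sent."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.sent_digests = set()  # local index: covers this client's data only

    def backup(self, data, server):
        """Send only new chunks to `server`; return (recipe, bytes_sent)."""
        recipe, bytes_sent = [], 0
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            recipe.append(digest)  # ordered digests needed to rebuild the data
            if digest not in self.sent_digests:
                server[digest] = chunk  # the only point where data is transferred
                self.sent_digests.add(digest)
                bytes_sent += len(chunk)
        return recipe, bytes_sent

server = {}  # digest -> chunk; stands in for the server's storage
client = Client()
recipe, first = client.backup(b"AAAABBBBAAAACCCC", server)
_, second = client.backup(b"AAAABBBBDDDD", server)
print(first, second)  # prints: 12 4

restored = b"".join(server[d] for d in recipe)
```

The first backup sends 12 of its 16 bytes because the repeated `AAAA` chunk is sent once, and the second sends only the new `DDDD` chunk. Because the index lives on the client, a second `Client` backing up the same data would send all of it again.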
Pros and Cons of client-side deduplication:
Pros:
- Decreases the amount of data transmitted to the server.
Cons:
- Additional computational overhead on the client.
- Difficult to achieve the same de-duplication percentage as server-side deduplication, because each client deduplicates only its own data.
If you’d like to share your thoughts or argue a point, our comment box is below!