Do you know how Datto SIRIS Handles Compression and Data Deduplication?
This blog post explains how the Datto SIRIS provides Data Compression & Data Deduplication of backup data stored locally on the Primary Storage Volume of the device.
Definitions
Data Compression: Process of encoding information by using fewer bits than the original source of information.
Data Deduplication: Process of comparing two or more previously compressed data sets and removing duplicate chunks of data. Efficient Data Deduplication depends on Data Compression also being performed.
In-Line Data Deduplication: Application of data deduplication during the copy on write process performed during a backup.
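To make the idea of removing duplicate chunks concrete, here is a minimal Python sketch. It is an illustration only, not how the Datto SIRIS or ZFS implements deduplication: the fixed 4096-byte chunk size, the use of SHA-256, and the function names `deduplicate` and `restore` are all assumptions chosen for the example, and it works on raw bytes rather than previously compressed data.

```python
import hashlib

def deduplicate(data, chunk_size=4096):
    """Split data into fixed-size chunks and store each distinct chunk once.

    Hypothetical helper for illustration; chunk size and hash are assumptions.
    """
    store = {}   # digest -> chunk bytes (each unique chunk kept once)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is stored
        recipe.append(digest)
    return store, recipe

def restore(store, recipe):
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)

# Three identical chunks of "A" followed by one chunk of "B".
data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = deduplicate(data)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

In this sample, four chunks are referenced but only two are stored, and `restore(store, recipe)` returns the original bytes.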
Example: Calculating Ratios for Data That Has Already Been Through Data Compression & Data Deduplication
Most Data Compression & Data Deduplication processes use some form of the LZ adaptive dictionary-based algorithm.
The way these dictionaries are created varies from one Data Compression process to another. In this example, we will put each word into a numbered list, so that every word, including a repeated one, can be written as its much smaller associated numeric value.
Example Quote
“He can compress the most words into the smallest idea of any man I know”
-Abraham Lincoln
Dictionary Created For This Example Quote
1. He
2. can
3. compress
4. the
5. most
6. words
7. into
(the: repeated word, already Dictionary value #4)
8. smallest
9. idea
10. of
11. any
12. man
13. I
14. know
New Sentence Created Using This Dictionary
“1 2 3 4 5 6 7 4 8 9 10 11 12 13 14”
The new compressed quote now uses 34 characters (including spaces).
The original quote contained 71 characters (including spaces).
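The word-numbering example can be reproduced with a short Python sketch. This is a toy illustration of the dictionary idea described in this post, not a real LZ implementation; the function names `build_dictionary` and `encode` are made up for the example, and, as in the post, the size of the dictionary itself is not counted.

```python
def build_dictionary(text):
    """Number each distinct word in order of first appearance (starting at 1)."""
    dictionary = {}
    for word in text.split():
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1
    return dictionary

def encode(text, dictionary):
    """Replace every word with its dictionary number."""
    return " ".join(str(dictionary[word]) for word in text.split())

quote = "He can compress the most words into the smallest idea of any man I know"
dictionary = build_dictionary(quote)
encoded = encode(quote, dictionary)

print(encoded)       # 1 2 3 4 5 6 7 4 8 9 10 11 12 13 14
print(len(quote))    # 71
print(len(encoded))  # 34
```

The second "the" reuses dictionary value 4, so the dictionary holds 14 entries for a 15-word quote.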
Calculating The Data Compression Ratio of Abraham Lincoln’s Quote
Formula:
Original Size ÷ Compressed = Compression Ratio
In Our Example:
71 ÷ 34 = 2.09x
Result Ratio of Our Example:
2.09x is The Data Compression Ratio
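The same arithmetic can be written as a small Python helper. The ratio follows the formula above; the space-savings percentage is an extra, commonly used way of expressing the same result and is an assumption of this sketch, not necessarily how the Datto web console calculates the percentage it displays.

```python
def compression_ratio(original_size, compressed_size):
    """Original Size / Compressed Size, as in the formula above."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size

def space_savings_percent(original_size, compressed_size):
    """Share of the original size that was saved (assumed extra metric)."""
    return (1 - compressed_size / original_size) * 100

ratio = compression_ratio(71, 34)
savings = space_savings_percent(71, 34)

print(f"{ratio:.2f}x")    # 2.09x
print(f"{savings:.1f}%")  # 52.1%
```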
Example: Location of Compression Percentage Displayed In The Local Web Console For Each Protected Machine
Datto SIRIS Device’s Use of Data Compression & Data Deduplication
When the ShadowSnap Agent transfers the initial base image to the appliance, the backup data is deduplicated. This is referred to as In-Line Data Deduplication, and it ensures maximum compression & efficiency.
Data Compression and Data Deduplication are performed for each Protected Machine individually at the ZFS Volume Level. They are not performed across all protected machines at the Disk Level. Performing them at the Disk Level would make for outstanding Data Compression & Data Deduplication ratios, but when this was enabled for testing during the initial development of the SIRIS, devices would lock up for days at a time while the ZFS filesystem attempted to Compress & Deduplicate the volumes against one another.
If a new Base Image is forced from the Advanced Options page of the Local Datto SIRIS Web Console, a Differential Merge is performed.
Forecasting Data Compression & Data Deduplication Prior to Taking First Base Image
Providing an accurate ratio for expected Data Compression & Data Deduplication is not possible for any system, including systems completely unrelated to Datto.
Only very rough estimates can be drawn, because Data Compression & Data Deduplication perform differently on different types of data sets. Audio & Text files can have Data Compression Ratios of up to 10x, while images can have Data Compression Ratios as low as 1.5x.
Because Data Compression Ratios differ so much between file types, there are too many variables to accurately calculate a Data Compression Ratio for Backup Data in advance.
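A quick way to see how much the data type matters is to compress two very different inputs with the same compressor. The Python sketch below uses the standard `zlib` module purely as a stand-in; it is not the compressor used by the Datto SIRIS, and the sample data (a repeated line of text, and random bytes standing in for content that is already compressed) is invented for the illustration.

```python
import os
import zlib

def zlib_ratio(data):
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))

# Highly repetitive text compresses very well.
text_like = b"the quick brown fox jumps over the lazy dog\n" * 1000

# Random bytes behave like already-compressed data and barely shrink at all.
random_like = os.urandom(len(text_like))

print(f"text-like data:   {zlib_ratio(text_like):.1f}x")
print(f"random-like data: {zlib_ratio(random_like):.2f}x")
```

The two inputs are the same size, yet their ratios differ enormously, which is why a single forecast for a mixed backup set can only be a rough estimate.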
Sizing a Datto SIRIS is based on a scale of 2x: the combined used space of the protected machines’ volumes should be roughly 50% of the Total Size of the Datto.
Data Compression & Data Deduplication are performed with all backups. However, the greatest Data Compression percentage will be seen during the initial base image, as the In-Line Backup is being performed.
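The 2x sizing rule can be expressed as a small Python calculation. This is only a sketch of the rule of thumb stated above: the function names, the sample machine sizes, and the treatment of a 3TB device as 3000 GB are assumptions made for the example.

```python
def recommended_device_size_gb(used_space_gb, scale=2):
    """Total used space of all protected volumes multiplied by the 2x scale."""
    return sum(used_space_gb) * scale

def fits_device(used_space_gb, device_size_gb, scale=2):
    """True if the device is at least `scale` times the total used space."""
    return recommended_device_size_gb(used_space_gb, scale) <= device_size_gb

# Hypothetical protected machines with 500 GB, 400 GB and 600 GB of used space.
machines_gb = [500, 400, 600]

print(recommended_device_size_gb(machines_gb))  # 3000
print(fits_device(machines_gb, 3000))           # True
print(fits_device(machines_gb, 2000))           # False
```

With 1500 GB of used space in total, a 3000 GB device sits exactly at the 50% mark described above.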
Example: Perfectly Sized S3000 (3TB) Device
Have any more questions about how Datto SIRIS handles compression and data deduplication? At Invenio IT, we have only Datto Certified Advanced Technicians on staff. Need more help? Contact us.