Venue | Category |
---|---|
HotStorage'12 | Delta Compression |
Delta Compressed and Deduplicated Storage Using Stream-Informed Locality

1. Summary
Motivation of this paper
Delta file system
Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Some Insights (Future work)

Motivation
Background
The paper builds the first storage system prototype to combine delta compression and deduplication
A key challenge
a large number of data regions to be indexed
previous work needs
Difference from DeltaWAN (FAST'12)
Previous work investigated stream-informed locality for replication across the WAN, but that work was limited to low-throughput environments
stream locality
grouping neighboring chunks together as a cache unit and loading the group's fingerprints whenever one of them is queried in the index
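
A minimal sketch of this caching idea, to make the "one lookup loads the whole group" behavior concrete. All names here (`StreamLocalityCache`, `on_disk_index`, `containers`, `max_groups`) are hypothetical, and the LRU eviction policy is an assumption for illustration, not something the paper's notes specify.

```python
from collections import OrderedDict


class StreamLocalityCache:
    """Toy fingerprint cache whose unit is a group of neighboring chunks."""

    def __init__(self, on_disk_index, containers, max_groups=2):
        self.on_disk_index = on_disk_index  # fingerprint -> group id (stands in for the disk index)
        self.containers = containers        # group id -> fingerprints written together in the stream
        self.max_groups = max_groups        # assumed LRU capacity, counted in groups
        self.groups = OrderedDict()         # cached groups: group id -> set of fingerprints
        self.index_lookups = 0              # how often the (expensive) index was consulted

    def contains(self, fp):
        # Fast path: the fingerprint belongs to a group that is already cached.
        hit = next((gid for gid, fps in self.groups.items() if fp in fps), None)
        if hit is not None:
            self.groups.move_to_end(hit)
            return True
        # Slow path: one index lookup, then load every fingerprint of that group.
        self.index_lookups += 1
        gid = self.on_disk_index.get(fp)
        if gid is None:
            return False
        self.groups[gid] = set(self.containers[gid])
        while len(self.groups) > self.max_groups:
            self.groups.popitem(last=False)  # evict the least recently used group
        return True


containers = {0: ["a", "b", "c"], 1: ["d", "e"]}
on_disk_index = {fp: gid for gid, fps in containers.items() for fp in fps}
cache = StreamLocalityCache(on_disk_index, containers)
for fp in ["a", "b", "c"]:
    cache.contains(fp)
print(cache.index_lookups)  # only the query for "a" touched the index
```

When a backup stream repeats earlier data in roughly the same order, the queries for "b" and "c" are served from the group loaded by the query for "a", which is the locality the notes refer to.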
File system
Each chunk's fingerprint is compared against the cache, and potentially an on-disk index, for a match
if no fingerprint match is found
If a sketch match is found
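
The notes above leave both branches unfinished, so the following is only one plausible reading of the write path: duplicate chunks are dropped, chunks with a matching sketch are stored as a delta against the matched base, and everything else is stored whole. The class `DeltaStore`, the toy `sketch` function (the lexicographically largest 8-byte window, not the paper's sketch), the use of `zlib` with a preset dictionary as a stand-in for a real delta encoder, and the rule that only whole chunks serve as bases are all assumptions made for illustration.

```python
import hashlib
import zlib


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()


def sketch(chunk: bytes, window: int = 8) -> bytes:
    # Toy resemblance sketch (assumption): the largest window of the chunk.
    # Similar chunks tend to share it; real systems use hashed features.
    return max(chunk[i:i + window] for i in range(max(1, len(chunk) - window + 1)))


class DeltaStore:
    def __init__(self):
        self.chunks = {}    # fingerprint -> ("raw", data) or ("delta", base_fp, patch)
        self.sketches = {}  # sketch -> fingerprint of a raw chunk usable as a base

    def write(self, chunk: bytes) -> str:
        fp = fingerprint(chunk)
        if fp in self.chunks:            # fingerprint match: deduplicate
            return "duplicate"
        sk = sketch(chunk)
        base_fp = self.sketches.get(sk)
        if base_fp is not None:          # sketch match: encode against the base
            base = self.chunks[base_fp][1]
            enc = zlib.compressobj(zdict=base)
            patch = enc.compress(chunk) + enc.flush()
            self.chunks[fp] = ("delta", base_fp, patch)
            return "delta"
        self.chunks[fp] = ("raw", chunk)  # no match at all: store the chunk whole
        self.sketches[sk] = fp
        return "unique"

    def read(self, fp: str) -> bytes:
        entry = self.chunks[fp]
        if entry[0] == "raw":
            return entry[1]
        _, base_fp, patch = entry
        dec = zlib.decompressobj(zdict=self.chunks[base_fp][1])  # needs the base chunk
        return dec.decompress(patch) + dec.flush()


store = DeltaStore()
base = b"a" * 100 + b"z" * 8 + b"a" * 100
similar = b"a" * 50 + b"b" + b"a" * 49 + b"z" * 8 + b"a" * 100
print(store.write(base), store.write(base), store.write(similar))
```

Note that `read` on a delta-encoded chunk has to fetch its base first, which is the extra I/O the notes on throughput point at.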
Practical considerations
Throughput
reading back a base chunk from disk is clearly the bottleneck in processing
GC
For a complete backup storage system
decrease the locality --> miss out on potential delta compression
Compression vs. Chunk size
The delta compression factor starts off slightly above 1 and grows steadily as the chunk size increases, because delta compression finds redundancy that deduplication now misses at the larger chunk sizes
Data integrity
the integrity of user data is the highest priority for backup storage
Delta compression is costly in the sense of requiring extra computation and I/O