Venue | Category |
---|---|
FAST'12 | Delta compression |
WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression
1. Summary
Motivation of this paper
Stream-Informed Delta Compression
Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Some Insights (Future work)
Motivation
Replicating data off-site is critical for disaster recovery
Compressing data before transfer improves effective throughput
Background
Goal
achieves identity and delta compression across petabyte backup datasets with no prior knowledge
Observation
Repeated patterns in backup datasets
Similarity Index Options
Full sketch index
Partial sketch index
Stream-Informed sketch cache
Delta Replication Architecture
Fingerprints and chunks are laid out in containers and can be loaded into a fingerprint cache
Delta replication
Network Protocol Considerations for Delta Compression
both the source and destination must agree on and have the same base chunk
the whole workflow
the backup server sends the sketches of unique chunks to the repository
the repository checks the cache for matching sketches
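if the backup server has the base chunks for the fingerprints the repository returns, it sends deltas against those bases; otherwise it sends the full chunks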
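The exchange above can be simulated in a few lines. This is a sketch under stated assumptions: `toy_sketch` is a simplified stand-in for the real sketch function, delta encoding is approximated with zlib's preset dictionary (`zdict`), and the fallback to sending a full chunk when no shared base exists is assumed for illustration. All class and function names are hypothetical.

```python
import hashlib
import zlib


def fingerprint(chunk):
    return hashlib.sha1(chunk).digest()


def toy_sketch(chunk, n=3, window=16):
    """Toy sketch: for each of n salted hashes, keep the maximum over all windows."""
    last = max(1, len(chunk) - window + 1)
    return tuple(
        max(hashlib.blake2b(chunk[j:j + window], digest_size=8,
                            salt=bytes([i])).digest() for j in range(last))
        for i in range(n)
    )


def delta_encode(base, target):
    # Stand-in for a real delta encoder: compress target with base as dictionary.
    c = zlib.compressobj(zdict=base)
    return c.compress(target) + c.flush()


def delta_decode(base, delta):
    d = zlib.decompressobj(zdict=base)
    return d.decompress(delta) + d.flush()


class Repository:
    def __init__(self):
        self.chunks = {}        # fingerprint -> chunk bytes
        self.sketch_cache = {}  # (position, feature) -> fingerprint

    def add(self, chunk):
        fp = fingerprint(chunk)
        self.chunks[fp] = chunk
        for pos, feat in enumerate(toy_sketch(chunk)):
            self.sketch_cache[(pos, feat)] = fp

    def missing(self, fps):
        return [fp for fp in fps if fp not in self.chunks]

    def find_bases(self, sketches):
        """Map each chunk fingerprint to a similar base fingerprint, or None."""
        result = {}
        for fp, sk in sketches.items():
            hits = (self.sketch_cache.get((pos, f)) for pos, f in enumerate(sk))
            result[fp] = next((h for h in hits if h is not None), None)
        return result

    def receive_delta(self, base_fp, delta):
        self.add(delta_decode(self.chunks[base_fp], delta))


def replicate(new_chunks, local_store, repo):
    """Run one replication round; local_store maps fingerprint -> chunk."""
    stats = {"duplicate": 0, "delta": 0, "full": 0}
    by_fp = {fingerprint(c): c for c in new_chunks}
    missing = repo.missing(list(by_fp))
    stats["duplicate"] = len(by_fp) - len(missing)
    bases = repo.find_bases({fp: toy_sketch(by_fp[fp]) for fp in missing})
    for fp in missing:
        base_fp = bases[fp]
        if base_fp is not None and base_fp in local_store:
            # Both sides hold the same base chunk, so a delta is safe to send.
            repo.receive_delta(base_fp, delta_encode(local_store[base_fp], by_fp[fp]))
            stats["delta"] += 1
        else:
            repo.add(by_fp[fp])
            stats["full"] += 1
    return stats
```

The check `base_fp in local_store` is the part that enforces the agreement requirement: a delta is only useful if the sender encodes against exactly the chunk the receiver will decode with.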
Similarity Detection with Sketches
Chunks that have one or more features (maximal values) in common are likely to be very similar, and small changes to the data are unlikely to perturb the maximal values
a super-feature is formed by taking a Rabin fingerprint over k consecutive features
similar chunks are found by looking up each super-feature in an index representing the corresponding super-features of previously processed chunks
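A rough illustration of features, super-features, and the lookup index. The assumptions are loud here: the window hash uses BLAKE2 instead of a rolling Rabin fingerprint, the per-feature transforms are arbitrary linear functions modulo 2^32, and `WINDOW` and `K` are made-up values. Only the count of 3 super-features per sketch comes from these notes.

```python
import hashlib
import struct

WINDOW = 32   # bytes per sliding window (assumed)
NUM_SF = 3    # super-features per sketch
K = 4         # features grouped into one super-feature (assumed)
MASK = (1 << 32) - 1


def _make_params(n):
    """Derive n hypothetical (a, b) pairs for the transforms a*x + b mod 2^32."""
    params = []
    for i in range(n):
        digest = hashlib.sha256(b"feature-%d" % i).digest()
        a, b = struct.unpack("<II", digest[:8])
        params.append((a | 1, b))  # odd multiplier keeps the map a permutation
    return params


PARAMS = _make_params(NUM_SF * K)


def window_hashes(chunk):
    """Hash every WINDOW-byte window (stand-in for a rolling Rabin fingerprint)."""
    for i in range(len(chunk) - WINDOW + 1):
        digest = hashlib.blake2b(chunk[i:i + WINDOW], digest_size=4).digest()
        yield int.from_bytes(digest, "little")


def sketch(chunk):
    """Return a tuple of NUM_SF super-features for the chunk."""
    maxima = [0] * len(PARAMS)
    for h in window_hashes(chunk):
        for j, (a, b) in enumerate(PARAMS):
            v = (a * h + b) & MASK
            if v > maxima[j]:
                maxima[j] = v  # feature j is the maximal transformed value
    super_features = []
    for s in range(NUM_SF):
        group = maxima[s * K:(s + 1) * K]
        digest = hashlib.blake2b(struct.pack("<%dI" % K, *group),
                                 digest_size=8).digest()
        super_features.append(int.from_bytes(digest, "little"))
    return tuple(super_features)


class SketchIndex:
    """Maps (position, super-feature) to the first chunk id that produced it."""

    def __init__(self):
        self._by_sf = {}

    def add(self, chunk_id, sk):
        for pos, sf in enumerate(sk):
            self._by_sf.setdefault((pos, sf), chunk_id)

    def lookup(self, sk):
        for pos, sf in enumerate(sk):
            if (pos, sf) in self._by_sf:
                return self._by_sf[(pos, sf)]
        return None
```

Grouping features into super-features trades recall for precision: a match on one super-feature means all `K` underlying features agree, so a hit is strong evidence of similarity, while keeping several super-features per sketch leaves more than one chance to match.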
Delta Compression
Trace
Configure
3 super-features per sketch, 12 MiB sketch cache, 4.5 MiB containers holding metadata and locally
deduplication -> delta compression -> GZ compression
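The three stages can be chained as below. This is a sketch only: the copy/insert delta format built on `difflib.SequenceMatcher` is an illustrative stand-in for the paper's delta encoder, `zlib` stands in for GZ, and `encode_for_wan` and its parameters are hypothetical names.

```python
import difflib
import hashlib
import struct
import zlib


def delta_encode(base, target):
    """Encode target as COPY-from-base and INSERT-literal instructions."""
    out = bytearray()
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out += b"C" + struct.pack("<II", i1, i2 - i1)
        elif j2 > j1:  # "replace" or "insert": ship the new bytes literally
            out += b"I" + struct.pack("<I", j2 - j1) + target[j1:j2]
        # "delete" needs no instruction: those base bytes are just not copied
    return bytes(out)


def delta_decode(base, delta):
    """Rebuild the target from the base chunk and the instruction stream."""
    out = bytearray()
    pos = 0
    while pos < len(delta):
        op = delta[pos:pos + 1]
        pos += 1
        if op == b"C":
            start, length = struct.unpack_from("<II", delta, pos)
            pos += 8
            out += base[start:start + length]
        else:
            (length,) = struct.unpack_from("<I", delta, pos)
            pos += 4
            out += delta[pos:pos + length]
            pos += length
    return bytes(out)


def encode_for_wan(chunk, known_fps, base=None):
    """Stage 1: deduplicate. Stage 2: delta-encode if a base exists. Stage 3: compress."""
    fp = hashlib.sha1(chunk).digest()
    if fp in known_fps:
        return "duplicate", fp  # only the fingerprint crosses the WAN
    known_fps.add(fp)
    if base is not None:
        return "delta", zlib.compress(delta_encode(base, chunk))
    return "full", zlib.compress(chunk)
```

Running local compression last matters because the delta stream still contains literal inserted bytes, which a general-purpose compressor can shrink further.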
Evaluation
metrics:
sketch cache size
delta encoding
multi-level vs 1-level delta
sketch index vs Stream-Informed sketch cache
interaction of delta and local compression
WAN replication improvement
performance characteristics
each chunk stored in a container also has a sketch added to the metadata section (less than 20 bytes)
overhead
a complete system for delta-compression backup
good experiments
only considers the backup task in a WAN setting
aims to save network traffic rather than storage space
effective only when network bandwidth is the bottleneck
delta compression can fit the case with limited network bandwidth (WAN backup)