Venue | Category
---|---
DCC'14 | Delta Compression
Combining Deduplication and Delta Compression to Achieve Low-Overhead Data Reduction on Backup Datasets

1. Summary
   - Motivation of this paper
   - DARE
   - Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Some Insights (Future work)
Motivation
main idea
workflow
DupAdj: Duplicate-Adjacency based resemblance detection
insight: in a backup system, modified chunks tend to be very similar to their previous versions, while unmodified chunks remain duplicates and are easily identified by the deduplication process
DARE records the backup-stream locality of the chunk sequence in a doubly linked list (see the sketch below)
avoids the computation and indexing overhead of the conventional super-feature approach to resemblance detection
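A minimal sketch of the DupAdj idea, assuming a simplified one-directional walk (the names and structure here are my own, not the paper's): chunks of a stored backup stream form a doubly linked list, and when a new chunk deduplicates against an old one, the old chunk's neighbors become candidate delta bases for the new chunk's non-duplicate neighbors.

```python
# Hypothetical sketch of DupAdj-style resemblance detection (not the authors' code).
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ChunkNode:
    fingerprint: bytes                     # secure fingerprint of the chunk
    prev: Optional["ChunkNode"] = None     # previous chunk in the backup stream
    next: Optional["ChunkNode"] = None     # next chunk in the backup stream

def build_stream_index(fingerprints: List[bytes]) -> Dict[bytes, ChunkNode]:
    """Link the chunks of one stored backup stream and index them by fingerprint."""
    index: Dict[bytes, ChunkNode] = {}
    prev = None
    for fp in fingerprints:
        node = ChunkNode(fp, prev=prev)
        if prev is not None:
            prev.next = node
        index[fp] = node
        prev = node
    return index

def dupadj_candidates(new_stream: List[bytes],
                      old_index: Dict[bytes, ChunkNode]) -> Dict[bytes, bytes]:
    """For each non-duplicate chunk whose predecessor is a duplicate, propose
    the old stream's adjacent chunk as its delta base."""
    candidates: Dict[bytes, bytes] = {}
    for i, fp in enumerate(new_stream):
        if fp in old_index:
            continue  # duplicate chunk: eliminated by deduplication
        if i > 0 and new_stream[i - 1] in old_index:
            neighbor = old_index[new_stream[i - 1]].next
            if neighbor is not None:
                # likely the previous version of this modified chunk
                candidates[fp] = neighbor.fingerprint
    return candidates
```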
Improved super-feature based resemblance detection
Previous studies: use four features to generate one super-feature, to minimize false positives in resemblance detection
Insight in this work
an improved super-feature approach with fewer features per super-feature, which detects more resembling chunks (a minimal sketch follows)
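A hedged sketch of the standard super-feature computation (the multipliers, window size, and hashing choices below are illustrative assumptions, not DARE's parameters): each feature is the maximum of an independently transformed rolling hash over the chunk, and k consecutive features are hashed into one super-feature. DARE's observation is that a smaller k, e.g., 2 instead of 4, exposes more resembling pairs from the same feature budget.

```python
# Illustrative super-feature computation (a sketch, not DARE's implementation).
import hashlib

# Fixed random multipliers/addends for the feature transforms (illustrative values).
M = [0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F]
A = [0x165667B1, 0xD3A2646C, 0xFD7046C5, 0xB55A4F09]
MASK = (1 << 32) - 1

def rolling_hashes(data: bytes, window: int = 48):
    """Yield a simple polynomial rolling hash for each window position."""
    h = 0
    base, mod = 257, 1 << 32
    pow_w = pow(base, window - 1, mod)
    for i, b in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * pow_w) % mod  # drop the oldest byte
        h = (h * base + b) % mod
        if i >= window - 1:
            yield h

def features(data: bytes, n: int = 4):
    """Sample n features: the max of each transformed rolling-hash sequence
    (n must not exceed len(M) in this sketch)."""
    maxima = [0] * n
    for h in rolling_hashes(data):
        for j in range(n):
            t = (M[j] * h + A[j]) & MASK
            if t > maxima[j]:
                maxima[j] = t
    return maxima

def super_features(feats, k: int = 2):
    """Group k consecutive features into one super-feature each; two chunks
    are resemblance candidates if any super-feature matches. Smaller k
    detects more resembling chunks from the same features."""
    return [hashlib.sha1(str(feats[i:i + k]).encode()).hexdigest()[:16]
            for i in range(0, len(feats), k)]
```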
storage management
non-similar chunks and delta-encoded chunks are stored on disk in containers
file recipe: records the ordered chunk references (including delta-base pointers) needed to reconstruct each backed-up file; a sketch of these structures follows
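A minimal sketch of this storage layout under my own assumptions (the field names and the 4 MB container size are illustrative, not from the paper):

```python
# Hypothetical storage-management structures (a sketch, not the paper's code).
from dataclasses import dataclass, field
from typing import List

CONTAINER_SIZE = 4 * 1024 * 1024  # assumed 4 MB container size

@dataclass
class Container:
    container_id: int
    data: bytearray = field(default_factory=bytearray)

    def has_room(self, n: int) -> bool:
        return len(self.data) + n <= CONTAINER_SIZE

    def append(self, blob: bytes) -> int:
        """Append a unique or delta-encoded chunk; return its offset."""
        offset = len(self.data)
        self.data.extend(blob)
        return offset

@dataclass
class RecipeEntry:
    """One chunk in the file recipe: enough to restore it later."""
    fingerprint: bytes
    container_id: int
    offset: int
    length: int
    is_delta: bool                  # True if stored as a delta
    base_fingerprint: bytes = b""   # base chunk to read first if is_delta

# A file recipe is the ordered list of entries for one backed-up file.
FileRecipe = List[RecipeEntry]
```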
Implementation
Evaluation
the impact of the number of features per SF and the number of SFs used in resemblance detection
scalability of DARE data reduction
restore performance
Intuitively, restoring a resembling chunk takes two reads: one for the delta and one for its base chunk
because stored deltas are small, DARE can cache more logical chunks than a deduplication-only system
uses the backup-stream information to locate similar chunks (a restore sketch follows)
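A sketch of that two-read restore path under assumed interfaces (the toy copy/insert delta format and all function names here are hypothetical; real systems use an Xdelta-style codec):

```python
# Illustrative two-read restore path (assumed interfaces, not the paper's code).
import struct
from typing import Dict, Tuple

def delta_decode(base: bytes, delta: bytes) -> bytes:
    """Toy copy/insert delta decoder. Opcodes: b'C' <off:u32> <len:u32>
    copies from base; b'I' <len:u32> <bytes> inserts literal data."""
    out, i = bytearray(), 0
    while i < len(delta):
        op = delta[i:i + 1]; i += 1
        if op == b"C":
            off, ln = struct.unpack_from(">II", delta, i); i += 8
            out += base[off:off + ln]
        else:  # b"I"
            (ln,) = struct.unpack_from(">I", delta, i); i += 4
            out += delta[i:i + ln]; i += ln
    return bytes(out)

def restore_chunk(fp: bytes,
                  stored: Dict[bytes, bytes],
                  meta: Dict[bytes, Tuple[bool, bytes]],
                  cache: Dict[bytes, bytes]) -> bytes:
    """Restore one logical chunk with at most two reads: the delta and its
    base. Caching restored logical chunks is what lets a delta-compressed
    store hold more of them than a deduplication-only cache."""
    if fp in cache:
        return cache[fp]                    # cache hit: no reads at all
    data = stored[fp]                       # read 1: the chunk or its delta
    is_delta, base_fp = meta[fp]
    if is_delta:
        base = restore_chunk(base_fp, stored, meta, cache)  # read 2: base
        data = delta_decode(base, data)
    cache[fp] = data
    return data
```

On a cache hit neither read is needed, which is why holding decoded logical chunks in the cache pays off during restore.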
the extra overhead of running both deduplication and delta compression remains very limited
Delta compression
detecting and compressing similar data missed by deduplication
removes redundancy among non-duplicate but very similar files and chunks
poor scalability: resemblance detection adds significant computation and indexing overhead, and fetching base chunks adds extra I/O
delta compression vs. deduplication