Design Tradeoffs for Data Deduplication Performance in Backup Workloads

Venue: FAST'15
Category: Data Deduplication


1. Summary

Motivation of this paper

DeFrame

Implementation and Evaluation

2. Strength (Contributions of the paper)

  1. Provides an overview of the entire deduplication design space (all major design parameters) and a general framework, DeFrame, for evaluating them.

3. Weakness (Limitations of the paper)

4. Some Insights (Future work)

  1. It mentions that uniform sampling achieves a significantly higher deduplication ratio than random sampling (both schemes are illustrated in the first sketch after this list).
  2. Exploiting logical locality incurs extremely high index update overhead: after every backup, all fingerprints are updated with their new segment IDs in the key-value store (the second sketch after this list shows why).
  3. Although near-exact deduplication reduces the DRAM cost, it cannot reduce the total storage cost: every duplicate it misses must be stored again (see the first sketch after this list).
  4. Design decisions (ED/ND = exact/near-exact deduplication, PL/LL = exploiting physical/logical locality):

For the lowest storage cost: EDLL is preferred (highest deduplication ratio, sustained high backup performance).
For a low memory footprint: ND is preferred (NDPL for its simplicity, NDLL for a better deduplication ratio).
For a sustained high restore performance: EDPL plus a rewriting algorithm.
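
To ground points 1 and 3, here is a minimal Python sketch (illustrative names and a toy workload, not code from the paper) of the two sampling schemes a near-exact deduplication index can use. Sampling keeps only about 1/R of the fingerprints in DRAM, but any duplicate whose fingerprint was never sampled can be missed and stored again, which is why the DRAM savings do not carry over to the total storage cost.

```python
# Toy sketch (illustrative only) of sampling in a near-exact
# deduplication index. Keeping 1 in R fingerprints shrinks the
# in-memory index ~R-fold; missed duplicates are written again.

import hashlib
from typing import List

R = 4  # sampling ratio: keep roughly 1 in R fingerprints


def uniform_sample(segment: List[bytes]) -> List[bytes]:
    """Uniform sampling: keep every R-th fingerprint by position."""
    return segment[::R]


def random_sample(segment: List[bytes]) -> List[bytes]:
    """Random sampling: keep fingerprints whose hash value is 0 mod R."""
    return [fp for fp in segment
            if int.from_bytes(hashlib.sha1(fp).digest()[:4], "big") % R == 0]


segment = [f"chunk-{i}".encode() for i in range(32)]
print(len(uniform_sample(segment)))   # 8  -> DRAM shrinks ~R-fold
print(len(random_sample(segment)))    # ~8 on average, varies per segment
# A duplicate whose fingerprint was not sampled (and whose segment is
# never prefetched) is treated as new and stored again, so the DRAM
# savings do not reduce the total storage cost.
```
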
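To make the update overhead of logical locality (point 2) concrete, the following sketch uses assumed toy structures rather than DeFrame's actual implementation. Because segments are formed over the new backup's recipe, even unmodified duplicate fingerprints receive new segment IDs, so the key-value store absorbs one update per fingerprint on every backup.

```python
# Toy model (not DeFrame's code) of a fingerprint index exploiting
# logical locality: segments are slices of the backup recipe, so every
# fingerprint, duplicate or not, is re-pointed to a new segment ID
# after each backup. With physical locality, by contrast, only the
# fingerprints of newly written chunks would need to be inserted.

from typing import Dict, List

SEGMENT_SIZE = 4  # fingerprints per segment (tiny for readability)


def index_backup(kv: Dict[bytes, int], recipe: List[bytes],
                 backup_no: int) -> int:
    """Update the key-value store for one backup; return the update count."""
    updates = 0
    for i, fp in enumerate(recipe):
        # Segment IDs are unique per backup, so duplicates get new IDs too.
        kv[fp] = backup_no * 1_000_000 + i // SEGMENT_SIZE
        updates += 1
    return updates


kv: Dict[bytes, int] = {}
backup_1 = [bytes([i]) for i in range(16)]
backup_2 = backup_1[:15] + [b"\xff"]  # 15 of 16 fingerprints are duplicates

print(index_backup(kv, backup_1, 1))  # 16 updates
print(index_backup(kv, backup_2, 2))  # 16 updates again, although only
# one fingerprint is new: the update cost scales with the full backup
# size rather than with the amount of new data.
```
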