The Dilemma between Deduplication and Locality: Can Both be Achieved?

Venue: FAST'21
Category: Deduplication System Design


1. Summary

Motivation of this paper

MFDedup

Implementation and Evaluation

2. Strength (Contributions of the paper)

  1. Neighbor-Duplicate-Focus (NDF) indexing: detects duplicates of a backup version only against its immediately preceding version.
  2. Across-Version-Aware Reorganization (AVAR): classifies and groups chunks according to the simplified reference relationship between chunks and versions.
  3. Good discussion of the limitations of the current prototype.
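To make the first two points concrete, here is a minimal Python sketch of how I read them. It is not the authors' implementation: the function names (`ndf_dedup`, `group_by_lifecycle`), the in-memory dictionaries, and the use of SHA-1 fingerprints are all my own assumptions. The idea is that if a version is deduplicated only against its previous version, every stored chunk is referenced by a contiguous range of versions, so chunks can be grouped by that (first version, last version) range.

```python
import hashlib
from collections import defaultdict


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-1 as the chunk fingerprint; any strong hash would do here.
    return hashlib.sha1(chunk).hexdigest()


def ndf_dedup(versions):
    """Deduplicate each version only against the previous one (NDF-style).

    `versions` is a list of backup versions, each a list of chunk bytes.
    Returns (store, lifecycle), where a stored chunk is keyed by
    (fingerprint, first_version) and lifecycle[key] = [first, last].
    """
    store = {}
    lifecycle = {}
    prev_index = {}  # fingerprint -> store key, for the previous version only
    for v, chunks in enumerate(versions, start=1):
        cur_index = {}
        for chunk in chunks:
            fp = fingerprint(chunk)
            if fp in cur_index:
                continue  # duplicate inside the same version
            key = prev_index.get(fp)
            if key is None:
                # Not in the previous version: store it, even if an older
                # version once held the same content.
                key = (fp, v)
                store[key] = chunk
                lifecycle[key] = [v, v]
            else:
                lifecycle[key][1] = v  # extend the contiguous lifecycle
            cur_index[fp] = key
        prev_index = cur_index  # only the neighbor index is kept
    return store, lifecycle


def group_by_lifecycle(lifecycle):
    """Group stored chunks by their (first, last) version range (AVAR-style)."""
    groups = defaultdict(list)
    for key, (first, last) in lifecycle.items():
        groups[(first, last)].append(key)
    return dict(groups)


versions = [[b"A", b"B", b"C"], [b"A", b"B", b"D"], [b"A", b"C", b"D"]]
store, lifecycle = ndf_dedup(versions)
groups = group_by_lifecycle(lifecycle)
print(sorted(groups))  # [(1, 1), (1, 2), (1, 3), (2, 3), (3, 3)]
```

In this toy run chunk `C` is stored twice, because it is missing from version 2 and so is not detected as a duplicate in version 3. That is the deduplication ratio this sketch gives up in exchange for the simple, contiguous reference relationship.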

3. Weakness (Limitations of the paper)

  1. This paper only considers two state-of-the-art rewriting techniques; however, many other works address the rewriting issue (e.g., FAST'18, FAST'19).

Capping (FAST'13), HAR (USENIX ATC'14)

4. Some Insights (Future work)

  1. Why focus on hard-drive-based deduplication?

Backup storage remains one of the most significant use cases for hard-drive-based deduplication.

  2. Key insight

Change the storage unit from fixed-sized containers to variable-sized containers.
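A rough Python sketch of why this might help, under my own assumptions: chunks are already grouped by the (first, last) range of versions that reference them, a variable-sized container holds exactly one such group, and a restore must read whole containers. The function names, the 8-byte capacity, and the chunk sizes are made up for illustration and are not taken from the paper.

```python
def pack_fixed(groups, capacity):
    """Fill fixed-capacity containers in order, ignoring group boundaries."""
    containers, current, used = [], [], 0
    for life, chunks in groups.items():
        for name, size in chunks:
            if current and used + size > capacity:
                containers.append(current)
                current, used = [], 0
            current.append((life, name, size))
            used += size
    if current:
        containers.append(current)
    return containers


def pack_variable(groups):
    """One container per group; its size is whatever the group needs."""
    return [[(life, name, size) for name, size in chunks]
            for life, chunks in groups.items()]


def bytes_read(containers, version):
    """Bytes read to restore `version` when whole containers must be read."""
    total = 0
    for container in containers:
        if any(first <= version <= last for (first, last), _, _ in container):
            total += sum(size for _, _, size in container)
    return total


# Hypothetical groups: (first, last) -> [(chunk name, size in bytes)]
groups = {
    (1, 1): [("c", 4)],
    (1, 2): [("a", 4), ("b", 4)],
    (2, 2): [("d", 4)],
}
print(bytes_read(pack_fixed(groups, capacity=8), version=2))  # 16
print(bytes_read(pack_variable(groups), version=2))           # 12
```

Version 2 needs only `a`, `b`, and `d` (12 bytes). With fixed-sized containers, chunk `c` shares a container with `a` and is read as well; with one variable-sized container per group, every byte read is needed.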

  3. Discussion of backup sizes:

VM backups: 100 GB

The majority of backups in Data Domain are 50-500 GB.