RapidCDC: Leveraging Duplicate Locality to Accelerate Chunking in CDC-based Deduplication Systems

Venue: SoCC'19
Category: Chunking


1. Summary

Motivation of this paper

RapidCDC

RapidCDC quantifies duplicate locality as the number of contiguous deduplicatable chunks that immediately follow the first deduplicatable chunk.

Observation: the majority of duplicate chunks fall within such an LQ sequence.
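As a rough sketch of how this locality can be exploited (a simplification, not the paper's actual implementation): record, for each chunk fingerprint, the size of the chunk that followed it, and on a later fingerprint hit jump straight to the predicted next boundary instead of rolling the hash byte by byte. The `size_hints` structure, the Gear-style rolling hash, and the blind acceptance of the predicted boundary (akin to RapidCDC's fast-forward mode) are all assumptions of this sketch.

```python
import hashlib
import random

random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]  # random byte -> value table
MASK = 0x1FFF                       # ~8 KB expected chunk size
MIN_SIZE, MAX_SIZE = 2048, 65536

def find_boundary(data, start):
    """Slow path: byte-by-byte rolling hash until the chunking
    condition (low mask bits all zero) is satisfied."""
    h = 0
    end = min(len(data), start + MAX_SIZE)
    for i in range(start, end):
        h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF
        if i + 1 - start >= MIN_SIZE and (h & MASK) == 0:
            return i + 1
    return end

def rapidcdc_chunk(data, size_hints):
    """size_hints maps a chunk fingerprint to the recorded size of the
    chunk that followed it in an earlier stream.  On a hit, the
    predicted boundary is accepted without rolling the hash (the real
    system can also verify the jump before accepting it)."""
    chunks, offset, prev_fp = [], 0, None
    while offset < len(data):
        nxt = None
        if prev_fp in size_hints:
            cand = offset + size_hints[prev_fp]   # predicted boundary
            if cand <= len(data):
                nxt = cand                        # fast path: skip rolling
        if nxt is None:
            nxt = find_boundary(data, offset)     # slow path
        fp = hashlib.sha1(data[offset:nxt]).hexdigest()
        chunks.append((offset, nxt - offset, fp))
        if prev_fp is not None:
            size_hints[prev_fp] = nxt - offset    # remember the size hint
        prev_fp, offset = fp, nxt
    return chunks
```

Chunking the same data a second time follows size hints for every chunk after the first, so the dominant per-byte rolling-hash cost is skipped whenever duplicate locality holds.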

Implementation and Evaluation

2. Strength (Contributions of the paper)

  1. Proposes a new chunking algorithm that leverages duplicate locality to accelerate chunking performance.
  2. Provides a quantitative scheme to measure duplicate locality.

3. Weakness (Limitations of the paper)

  1. The idea is not particularly novel.

4. Some Insights (Future work)

  1. Chunk sizes used in production deduplication systems (NetApp ONTAP, Dell EMC Data Domain): 4KB, 8KB, 12KB

    LBFS: 2KB, 16KB, 64KB

  2. The boundary-shift issue: with fixed-size chunking, an insertion or deletion at the beginning of a stored file shifts every subsequent chunk boundary, whereas CDC boundaries depend on content and realign after the edit.
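A tiny illustration of the contrast (toy parameters chosen for this sketch, not from the paper): insert one byte at the front of a buffer and compare how many chunks still deduplicate under fixed-size versus content-defined chunking.

```python
import random

random.seed(1)
GEAR = [random.getrandbits(32) for _ in range(256)]
MASK = 0xFF                     # ~256 B expected chunks, for a small demo

def cdc_chunks(data):
    """Cut wherever the rolling hash's low mask bits are all zero, so
    boundaries depend on content, not on absolute byte offsets."""
    chunks, start, h = [], 0, 0
    for i in range(len(data)):
        h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFF
        if (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def fixed_chunks(data, size=256):
    """Cut every `size` bytes, at fixed absolute offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]

data = random.randbytes(8192)
shifted = b"X" + data           # one-byte insertion at the front

cdc_shared = len(set(cdc_chunks(data)) & set(cdc_chunks(shifted)))
fixed_shared = len(set(fixed_chunks(data)) & set(fixed_chunks(shifted)))
# CDC boundaries resynchronize shortly after the edit;
# fixed-size boundaries stay shifted for the rest of the file.
```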

CDC chunking: a chunk boundary is placed at a byte offset whose rolling-hash value satisfies a predefined chunking condition.

  1. The hash function used in chunking differs from the one used for fingerprinting: the chunk-boundary-detection hash only decides where to cut, so it

does not need to be collision-resistant, whereas the fingerprinting hash identifies chunk contents for deduplication.
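To make the distinction concrete (a generic sketch, not tied to any particular system): boundary detection can use a cheap per-byte rolling hash whose collisions merely move a cut point, while fingerprinting uses a collision-resistant hash such as SHA-256, because a fingerprint collision would wrongly alias two different chunks. The Gear-style table below is an assumption of this sketch.

```python
import hashlib
import random

random.seed(2)
GEAR = [random.getrandbits(64) for _ in range(256)]

def boundary_hash_step(h, byte):
    """Boundary-detection hash: one shift and one table lookup per
    byte.  A collision only moves a cut point, so a weak, fast hash
    is acceptable here."""
    return ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF

def fingerprint(chunk):
    """Chunk identity: must be collision-resistant, because two
    different chunks with equal fingerprints would be deduplicated
    into one, silently corrupting data."""
    return hashlib.sha256(chunk).hexdigest()
```

The cost asymmetry matches how often each hash runs: the rolling step executes once per input byte, while the fingerprint is computed only once per chunk.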