Venue | Category |
---|---|
MSST'10 | Deduplication |
dedupv1: Improving Deduplication Throughput using Solid State Drives (SSD)1. SummaryMotivation of this paperdedupv1Implementation and Evaluation2. Strength (Contributions of the paper)3. Weakness (Limitations of the paper)4. Future Works
The limited random IO performance of hard disks limits the overall throughput of deduplication system
if the index does not fit into main memory
The size of the chunk index limits the usable capacity of the deduplication system.
8KB chunk size, 20 byte fingerprint: 1TB data 2.5GB chunk index It becomes necessary to store the fingerprint index on disk.
This paper wants to use solid-state drives (SSDs) to improve its throughput compared to disk-based systems.
However, most current SSDs suffer from slow random writes. Its design proposes to avoid random writes on the critical data path. The weak point of SSDs is the limited number of random writes.
after each filter step, the result of the filter determines which filter steps are executed afterwards.
EXISTING: the current chunk is an exact duplicate STRONG-MAYBE: there is a very high probability that the current chunk is a duplicate
a typical result after a fingerprint comparison.
WEAK-MAYBE: the filter cannot make ant statement about the duplication state of the chunk. NON-EXISTING: the filter rules out the possibility that the chunk is already known.
take index writes out of the critical path if the auxiliary index grows beyond a certain limit or if the system is idle, a background thread moves chunk metadata from the auxiliary index to the persistent index.
a block mapping contains multiple full chunks.
the amount of IO operations in the critical path is minimized because the log assures that the delayed operations can be recovered.
Generate first backup generation: 128GB subset of files Generate second backupo generation based on the first backup generation
take index writes out of the critical path. delay the IO such that the update operations can be done outside the critical path.