Venue | Category
---|---
FAST'19 | Deduplication Restore
# Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance

1. Summary
   - Motivation of this paper
   - Sliding Look-back Window
   - Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Future Works
## 1. Summary

### Motivation of this paper

- Data fragmentation: data chunks of the same backup stream are scattered across many containers.
- Read amplification: the size of data being read is larger than the size of data being restored.
This paper focuses on reducing the data fragmentation and read amplification of container-based deduplication systems.
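As a minimal sketch of the read-amplification problem (container size, chunk sizes, and the recipe below are illustrative assumptions, not values from the paper), fragmented chunks force many whole-container reads:

```python
# Minimal sketch (container size, layout, and values are illustrative, not from the paper).
CONTAINER_SIZE = 4 * 1024 * 1024  # assume 4 MiB containers

# Recipe of the restored byte stream: (chunk_size, container_id) per chunk, in order.
recipe = [(8192, 0), (8192, 0), (8192, 3), (8192, 1), (8192, 3), (8192, 7)]

restored_bytes = sum(size for size, _ in recipe)
# Fragmentation: the six chunks are scattered over four distinct containers.
containers_read = len({cid for _, cid in recipe})  # assume each container is read once

# Read amplification: bytes read from storage / bytes actually restored.
read_amplification = containers_read * CONTAINER_SIZE / restored_bytes
print(containers_read, read_amplification)
```

Even with each container read only once, restoring 48 KiB of data here pulls in four full containers, which is exactly the cost that rewriting aims to cut.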
Goals:
- improve data chunk locality
- make a better trade-off between the deduplication ratio and the number of required container reads (compared with Capping, FAST'13)

Reducing the number of container reads is the major task for restore performance improvement.
Storing individual data chunks directly cannot efficiently utilize the storage bandwidth of storage systems with low random read/write performance (e.g., HDDs), so such systems typically accumulate a number of data chunks in a container before writing them out together.
The limitations of Capping (FAST'13): Capping rewrites some duplicate chunks into new containers, avoiding the need to read these duplicate chunks from other old containers during restore.
Goals of the proposed scheme:
- further reduce container reads
- achieve even fewer data chunk rewrites
FCRC: rewrite duplicate data chunks from old containers whose CNRCs (container's number of referenced chunks) are lower than a threshold.
The actual capping level is decided by:
- the threshold
- the distribution of CNRCs of these segments
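The CNRC-based selection can be sketched as follows (the segment data and the fixed threshold are made-up assumptions; the paper's FCRC scheme derives the threshold from the CNRC distribution rather than fixing it):

```python
from collections import Counter

# Illustrative segment recipe: the old container each duplicate chunk references.
segment_refs = [0, 0, 0, 0, 1, 1, 2, 3, 3, 3]

# CNRC: each old container's number of referenced chunks within this segment.
cnrc = Counter(segment_refs)

threshold = 3  # assumed fixed value; FCRC actually derives it from the CNRC distribution

# Rewrite duplicate chunks whose old containers have CNRCs below the threshold.
rewrite_containers = {cid for cid, n in cnrc.items() if n < threshold}
num_rewrites = sum(1 for cid in segment_refs if cid in rewrite_containers)
print(sorted(rewrite_containers), num_rewrites)
```

Containers 1 and 2 contribute few referenced chunks, so their duplicates are rewritten; the well-referenced containers 0 and 3 are kept and read once each at restore time.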
Two bounds are introduced to bound the number of rewritten chunks; these two bounds are used to determine the threshold.
There still exist wasted rewritten chunks, since the restore cache would have served some of them anyway.
Rewrite decisions made with statistics only from the current segment are less accurate.
### Sliding Look-back Window

The LBW acts as a recipe cache that maintains the metadata entries of data chunks in the order they appear in the byte stream covered by the LBW.
Metadata entry:
- chunk metadata
- offset in the byte stream
- container ID/address
- offset in the container
With both past and future information in the LBW, a more accurate decision can be made.
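A sketch of the LBW as a bounded recipe cache holding such entries in byte-stream order (field names, the capacity, and the helper method are my assumptions, not the paper's data structures):

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class MetadataEntry:
    fingerprint: bytes     # chunk metadata (identity of the chunk)
    stream_offset: int     # offset in the byte stream
    container_id: int      # container ID/address
    container_offset: int  # offset in the container

class LookBackWindow:
    """Bounded recipe cache: entries kept in byte-stream order; the oldest slide out."""
    def __init__(self, capacity: int):
        self.entries = deque(maxlen=capacity)

    def add(self, entry: MetadataEntry) -> None:
        self.entries.append(entry)

    def containers_in_window(self) -> set:
        return {e.container_id for e in self.entries}

lbw = LookBackWindow(capacity=3)
for i, cid in enumerate([5, 6, 8, 9]):
    lbw.add(MetadataEntry(fingerprint=bytes([i]), stream_offset=i * 8192,
                          container_id=cid, container_offset=0))
print(sorted(lbw.containers_in_window()))  # the oldest entry (container 5) has slid out
```

Because the window holds entries both behind and ahead of the chunk being decided on, a rewrite decision can consult container references on both sides, which is the source of the accuracy gain over per-segment statistics.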
Rewrite selection policy for the LBW
The whole process of rewrite selection (one cycle):
- One container's worth of data chunks (the added container) is added to the front of the LBW, and one container's worth of data chunks (the evicted container) is removed from the end of the LBW.
- Data chunks are classified into:
  - unique chunks
  - non-rewrite chunks (duplicate data chunks that will not be rewritten)
  - candidate chunks (duplicate data chunks that may be rewritten)
- Identify candidate chunks and write them to the rewrite candidate cache.
- Reclassify these data chunks as the window slides forward.
- Make adjustments at the end of each cycle.
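The classification step above can be sketched like this (the rule, the threshold, and all names are simplified assumptions of mine, not the paper's exact policy):

```python
from collections import Counter

def classify(fp, old_cid, seen_fps, cnrc, threshold):
    """Classify one incoming chunk (simplified rule, not the paper's exact policy)."""
    if fp not in seen_fps:
        return "unique"        # new data: written to a new container anyway
    if cnrc[old_cid] >= threshold:
        return "non-rewrite"   # old container referenced enough; read it at restore time
    return "candidate"         # may be rewritten later

seen = {"a", "b", "c"}                 # fingerprints already stored
cnrc = Counter({10: 5, 11: 1})         # per-container reference counts inside the LBW
threshold = 3                          # assumed value

stream = [("a", 10), ("x", None), ("b", 11)]  # (fingerprint, old container) per chunk
candidate_cache, labels = [], []
for fp, cid in stream:
    label = classify(fp, cid, seen, cnrc, threshold)
    labels.append(label)
    if label == "candidate":
        candidate_cache.append(fp)     # held here until reclassified or rewritten
print(labels, candidate_cache)
```

Keeping candidates in a separate cache is what allows later reclassification: a chunk judged "candidate" now may become "non-rewrite" once more references to its old container slide into the window.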
### Implementation and Evaluation

The container I/O time dominates the whole restore time (on HDDs).
Traces: six types of traces selected from the FSL dataset; each trace contains 10 full backup snapshots.
Compared rewrite schemes:
- normal deduplication with no rewrite
- capping scheme (Capping)
- flexible container referenced count based scheme (FCRC)
- sliding look-back window scheme (LBW)
Restore cache schemes:
- forward assembly area (FAA)
- adaptive look-ahead window chunk based caching (ALACC)
Metrics:
- deduplication ratio vs. speed factor
- restore performance
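For reference, the two metrics can be computed as follows (the sizes and read counts are illustrative; the formulas follow the common definitions in dedup-restore papers, e.g., speed factor as MB restored per container read):

```python
MB = 1024 * 1024

# Illustrative numbers; formulas follow the common definitions in dedup-restore papers.
original_size = 100 * MB   # logical size of the backup stream
stored_size = 40 * MB      # physical size after deduplication (rewrites increase this)
dedup_ratio = original_size / stored_size  # higher is better

restored_size = 100 * MB
container_reads = 50
# Speed factor: mean MB of data restored per container read (higher is better).
speed_factor = (restored_size / MB) / container_reads
print(dedup_ratio, speed_factor)
```

The tension between the two is the core trade-off: every rewritten duplicate lowers the deduplication ratio but tends to raise the speed factor by removing a container read.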
## 2. Strength (Contributions of the paper)

- A new rewrite scheme based on the sliding look-back window.
- Good experiments.
## 3. Weakness (Limitations of the paper)

- Does not show a complete system with the proposed scheme (unlike complete systems such as HYDRAstor, iDedup, Dmdedup, and ZFS).
- Veritas and Data Domain benefit from high sequential I/O performance and good chunk locality.
## 4. Future Works

- Insight: the motivation to propose an adaptive method.
- More intelligent rewrite policies.
- How to combine garbage collection with the rewrite design.