Venue | Category |
---|---|
ATC'22 | Deduplication |
Building a High-performance Fine-grained Deduplication Framework for Backup Storage with High Deduplication Ratio

1. Summary
   - Motivation of this paper
   - MeGA
   - Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Some Insights (Future work)
motivation: fine-grained deduplication suffers from poor backup/restore performance
problem
base-reading issue: delta encoding has to read base chunks during the backup process
fragmentation issue: caused by a new kind of reference relationship between delta and base chunks (breaks spatial locality)
repeated-access issue: containers are accessed repeatedly to gather delta-base pairs (breaks temporal locality)
selective delta compression
insight: base chunks are not evenly distributed across containers, so some containers hold only a few base chunks ("base-sparse containers")
skips delta compression for chunks whose base chunks are located in base-sparse containers
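The selection rule above can be sketched as follows. This is a minimal illustration, not MeGA's implementation: the input format (`base_container_of`), the function name, and the threshold `min_bases_per_container` are all assumptions made for the example, and counting delta-base pairs per container is a simplification.

```python
from collections import Counter

def select_delta_candidates(base_container_of, min_bases_per_container=2):
    """Return the chunk IDs that should still be delta-compressed.

    base_container_of: dict mapping a new chunk ID to the ID of the container
    holding its candidate base chunk (hypothetical input format).
    """
    # Count how many needed base chunks each container would have to serve.
    bases_per_container = Counter(base_container_of.values())
    # Containers below the (assumed) threshold are treated as base-sparse.
    sparse = {c for c, n in bases_per_container.items()
              if n < min_bases_per_container}
    # Skip delta compression for chunks whose base lives in a sparse container.
    return [chunk for chunk, c in base_container_of.items() if c not in sparse]

candidates = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3, "f": 3}
print(select_delta_candidates(candidates, min_bases_per_container=2))
```

In this example, container 2 serves a single base chunk, so chunk `d` is stored without delta compression. That gives up a little compression but avoids reading a whole container for one base.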
delta-friendly data layout
replace the order-based data layout with a lifecycle-based data layout
two-level reference: directly referenced chunks and their indirectly referenced (base) chunks
to simplify the implementation, MeGA only removes redundancy between adjacent backups, which ensures that chunks' lifecycles are always consecutive (similar to MFDedup)
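A lifecycle-based layout can be illustrated with a small sketch. It assumes that each backup is a list of chunk IDs, that `base_of` maps a delta chunk to its base chunk, and that lifecycles are consecutive as noted above. The function names and data shapes are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def lifecycle_layout(backups, base_of=None):
    """Group chunks by lifecycle (first version, last version) referencing them."""
    base_of = base_of or {}
    first, last = {}, {}
    for version, chunks in enumerate(backups):
        # A backup references its own chunks directly and their bases indirectly.
        referenced = set(chunks) | {base_of[c] for c in chunks if c in base_of}
        for chunk in referenced:
            first.setdefault(chunk, version)
            last[chunk] = version
    groups = defaultdict(list)
    for chunk in sorted(first):
        groups[(first[chunk], last[chunk])].append(chunk)
    return dict(groups)

def groups_for_restore(layout, version):
    """Lifecycle groups that must be read to restore the given backup version."""
    return [key for key in sorted(layout) if key[0] <= version <= key[1]]

backups = [["A", "B"], ["A", "C"], ["C", "D"]]
layout = lifecycle_layout(backups, base_of={"C": "B"})  # C is a delta against B
print(layout)
print(groups_for_restore(layout, 2))
```

Because `C` is delta-encoded against `B`, the lifecycle of `B` extends to every backup that references `C`. Restoring a version then only touches groups whose lifecycle covers that version.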
forward reference and delta prewriting
when performing a restore, delta-encoded chunks are always accessed before their base chunks
user space and backup space are asymmetric
prewrite delta chunks into the to-be-restored backup (in user space)
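The interplay of forward references and delta prewriting can be sketched as below. This is a toy model under stated assumptions: the record format is hypothetical, a `bytearray` stands in for the file being restored in user space, and a same-length XOR stands in for a real delta codec.

```python
def xor_bytes(a, b):
    # Toy stand-in for delta encoding/decoding; requires equal-length inputs.
    return bytes(x ^ y for x, y in zip(a, b))

def restore_with_prewriting(records, total_size):
    """Restore a backup from records read in storage order (deltas before bases)."""
    out = bytearray(total_size)   # stands in for the restored file in user space
    pending = {}                  # base chunk ID -> locations of prewritten deltas
    for rec in records:
        if rec["kind"] == "delta":
            # Prewrite the delta at the place its decoded chunk will occupy.
            off = rec["offset"]
            out[off:off + len(rec["data"])] = rec["data"]
            pending.setdefault(rec["base"], []).append((off, len(rec["data"])))
        else:
            base = rec["data"]
            # The base arrives later: decode every prewritten delta in place.
            for off, n in pending.pop(rec["id"], []):
                out[off:off + n] = xor_bytes(bytes(out[off:off + n]), base)
            # Write the base itself if the restored backup also contains it.
            if rec.get("offset") is not None:
                out[rec["offset"]:rec["offset"] + len(base)] = base
    return bytes(out)

base = b"hello world!"
target = b"hello there!"
records = [
    {"kind": "delta", "offset": 0, "base": "b1", "data": xor_bytes(target, base)},
    {"kind": "base", "id": "b1", "offset": 12, "data": base},
]
print(restore_with_prewriting(records, 24))
```

The backup store is read once, sequentially. The random reads and writes needed to patch deltas land on the user-space side instead.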
baselines
traces: WEB, CHM, SYN, and VMS
backup speed, restore speed, and deduplication ratio
I/O overhead in maintaining data layout
analyzes several forms of poor locality caused by fine-grained deduplication
proposes several designs: delta selector, delta-friendly data layout, always-forward-reference traversing, and delta prewriting
the paper is hard to follow, especially the third design
needs a maintenance process to adjust the layout
terminology: the paper refers to "delta compression" as "fine-grained deduplication"
all deduplicated chunks are stored in containers in order, and each container is then compressed