Venue | Category |
---|---|
FAST'14 | Post Deduplication, Compression |

Migratory Compression: Coarse-grained Data Reordering to Improve Compressibility
1. Summary
Motivation of this paper
Migratory Compression (MC)
Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Some Insights (Future work)
motivation
compressors can only find redundancy among strings within a limited distance (the window size)
window sizes are small, so similarity across a large distance is not identified
main idea
coarse-grained reorganization to group similar blocks to improve compressibility
reorder chunks to store similar chunks sequentially, increasing compressors' opportunity to detect redundant strings and leading to better compression
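A minimal sketch of why reordering helps, using Python's `zlib` (DEFLATE, 32 KiB window) as the compressor. The block size, block count, and the way similar blocks are generated are illustrative assumptions, not values from the paper. The same blocks are compressed in two layouts: one where each similar block sits far from its source, and one where it directly follows it.

```python
import random
import zlib

BLOCK = 8 * 1024   # illustrative block size, well inside DEFLATE's 32 KiB window
COUNT = 16         # illustrative number of distinct blocks

rng = random.Random(0)


def random_block(n):
    """Return n pseudo-random (essentially incompressible) bytes."""
    return bytes(rng.getrandbits(8) for _ in range(n))


def near_copy(block):
    """Return a similar block: same content with one byte flipped per KiB."""
    out = bytearray(block)
    for pos in range(0, len(out), 1024):
        out[pos] ^= 0xFF
    return bytes(out)


originals = [random_block(BLOCK) for _ in range(COUNT)]
variants = [near_copy(b) for b in originals]

# Original layout: each variant sits COUNT * BLOCK = 128 KiB after its source,
# far outside the compressor's window.
scattered = b"".join(originals + variants)

# Reordered layout: each variant directly follows its source (8 KiB apart).
grouped = b"".join(a + b for a, b in zip(originals, variants))

scattered_size = len(zlib.compress(scattered, 9))
grouped_size = len(zlib.compress(grouped, 9))
print("scattered:", scattered_size, "grouped:", grouped_size)
```

Both layouts hold exactly the same bytes; only the order differs, so any size difference comes from the compressor being able to see the similar block within its window.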
two use cases
mzip: using MC to compress a single file by integrating MC with a traditional compressor (e.g., gzip)
archival: data migration from backup storage systems to archive tiers
design considerations
partition into blocks, calculate similarity features
group by content and identify duplicate and similar blocks
output the migrate and restore recipes
rearrange the input file
reorganizing the original data requires a large number of I/Os
block-level
multi-pass (HDD)
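The steps above (partition, compute features, group, emit recipes, rearrange) can be sketched in memory as follows. This is a simplified illustration under stated assumptions: fixed-size blocks, a single hypothetical similarity feature (the maximum CRC32 over all 16-byte windows of a block), and recipes represented as plain index lists. The names `split_blocks`, `feature`, and `build_recipes` are made up for this sketch and are not the paper's implementation.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096  # hypothetical fixed block size
SHINGLE = 16       # hypothetical window width for the similarity feature


def split_blocks(data, size=BLOCK_SIZE):
    """Partition the input into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def feature(block, width=SHINGLE):
    """Max CRC32 over all windows; blocks sharing most content tend to agree."""
    if len(block) <= width:
        return zlib.crc32(block)
    return max(zlib.crc32(block[i:i + width])
               for i in range(len(block) - width + 1))


def build_recipes(blocks):
    """Group blocks by feature and return (migrate, restore) recipes.

    migrate[new_pos] = original index of the block stored at new_pos
    restore[orig_idx] = position of that block in the reordered layout
    """
    groups = defaultdict(list)
    for idx, block in enumerate(blocks):
        groups[feature(block)].append(idx)
    # Groups keep first-appearance order; members keep file order.
    migrate = [idx for members in groups.values() for idx in members]
    restore = [0] * len(blocks)
    for new_pos, orig_idx in enumerate(migrate):
        restore[orig_idx] = new_pos
    return migrate, restore


def rearrange(blocks, migrate):
    """Apply the migrate recipe: similar blocks become neighbours."""
    return [blocks[i] for i in migrate]


def restore_order(reordered, restore):
    """Apply the restore recipe to recover the original block order."""
    return [reordered[pos] for pos in restore]
```

A usage sketch: `blocks = split_blocks(data)`, then `migrate, restore = build_recipes(blocks)`, compress `b"".join(rearrange(blocks, migrate))`, and after decompression apply `restore_order` to get the original bytes back. This in-memory version sidesteps the I/O cost noted above, which is what the block-level and multi-pass variants address.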
implementation
use xdelta for delta encoding; the chunk that appears earliest in the file is selected as the base for each group of similar chunks
based on DDFS
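The base-selection rule for delta encoding can be illustrated with a small sketch. It does not use xdelta: as a stand-in, it relies on `zlib`'s preset-dictionary support (`zdict`), which lets the compressor reference the base chunk when encoding a similar chunk. The helper names `delta_encode`, `delta_decode`, `encode_group`, and `decode_group` are hypothetical.

```python
import zlib


def delta_encode(base, target):
    """Encode target relative to base (zlib preset dictionary, not xdelta)."""
    comp = zlib.compressobj(level=9, zdict=base)
    return comp.compress(target) + comp.flush()


def delta_decode(base, delta):
    """Recover the target chunk from its delta and the same base."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()


def encode_group(chunks):
    """Encode one group of similar chunks, given in file order.

    The earliest chunk is kept as the base; every later chunk is stored
    as a delta against it.
    """
    base, rest = chunks[0], chunks[1:]
    return base, [delta_encode(base, chunk) for chunk in rest]


def decode_group(base, deltas):
    """Rebuild all chunks of a group in their original order."""
    return [base] + [delta_decode(base, d) for d in deltas]
```

One caveat of this stand-in: DEFLATE can only reference the last 32 KiB of the dictionary, so it only approximates a real delta encoder for chunks larger than that.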
evaluation
datasets
compression effectiveness and performance trade-off
data reorganization throughput
delta compression
sensitivity to different parameters
the idea is very simple and easy to follow
very extensive experiments
the ways to improve compressibility