Venue | Category |
---|---|
FAST'22 | Distributed Deduplication |
The what, The from, and The to: The Migration Games in Deduplicated Systems

1. Summary: Motivation of this paper; Migration Games; Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Some Insights (Future work)
motivation
high-level management aspects of large-scale systems (e.g., capacity planning, caching, and cost of service) still need to be adapted to deduplicated storage
data migration: files are remapped between separate deduplication domains, or volumes
volumes: a single server within a large-scale system, or an independent set of servers dedicated to a customer or dataset
migration must optimize several, possibly conflicting, objectives
the main goal
formulate the general migration problem for deduplicated systems as an optimization problem
problem statement
minimizing migration traffic
load balancing
trade-off between minimizing the total physical data size and maximizing load balancing
evenly distribute the capacity load between volumes
constraints: a traffic constraint and a load-balancing constraint
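The quantities in the problem statement can be made concrete with a small toy model. This is a minimal sketch, assuming each file is a set of block IDs, each volume stores the deduplicated union of its files' blocks, migration traffic is the size of blocks newly copied into a volume, and load balance is measured as the smallest volume size divided by the largest. The balance metric, the function names (`evaluate_plan`, `is_feasible`), and the thresholds are hypothetical illustrations, not the paper's exact definitions.

```python
from typing import Dict, List, Set, Tuple

Files = Dict[str, Set[str]]      # file name -> block IDs (hypothetical toy model)
Placement = Dict[str, str]       # file name -> volume name


def volume_blocks(files: Files, placement: Placement, volume: str) -> Set[str]:
    """Deduplicated block set stored on one volume: the union of its files' blocks."""
    blocks: Set[str] = set()
    for f, v in placement.items():
        if v == volume:
            blocks |= files[f]
    return blocks


def physical_size(blocks: Set[str], sizes: Dict[str, int]) -> int:
    return sum(sizes[b] for b in blocks)


def evaluate_plan(files: Files, sizes: Dict[str, int], before: Placement,
                  after: Placement, volumes: List[str]) -> Tuple[int, int, float]:
    """Return (total physical size, migration traffic, balance) of a migration plan."""
    total, traffic, loads = 0, 0, []
    for v in volumes:
        old = volume_blocks(files, before, v)
        new = volume_blocks(files, after, v)
        total += physical_size(new, sizes)
        traffic += physical_size(new - old, sizes)   # blocks that must be copied into v
        loads.append(physical_size(new, sizes))
    # Assumed balance metric: smallest volume divided by largest volume.
    balance = min(loads) / max(loads) if max(loads) > 0 else 1.0
    return total, traffic, balance


def is_feasible(files, sizes, before, after, volumes, max_traffic, min_balance) -> bool:
    """Check the traffic constraint and the (assumed) load-balancing constraint."""
    _, traffic, balance = evaluate_plan(files, sizes, before, after, volumes)
    return traffic <= max_traffic and balance >= min_balance


if __name__ == "__main__":
    files = {"f1": {"a", "b"}, "f2": {"b", "c"}, "f3": {"d"}}
    sizes = {"a": 4, "b": 4, "c": 2, "d": 6}
    before = {"f1": "V1", "f2": "V2", "f3": "V1"}
    after = {"f1": "V1", "f2": "V1", "f3": "V2"}
    print(evaluate_plan(files, sizes, before, after, ["V1", "V2"]))  # (16, 8, 0.6)
```

In this toy run, moving `f2` next to `f1` (they share block `b`) shrinks the total physical size from 20 to 16, but it costs 8 units of traffic and leaves the volumes less balanced. That is the trade-off the constraints are meant to bound.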
Greedy (extends SketchVolume)
iterates over all the files in each volume and calculates the space-saving ratio of remapping a single file to each of the other volumes
each of the two steps below is allocated an equal share of the traffic budget allowed for migration
load-balancing step
capacity-reduction step
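The two greedy steps can be sketched on the same toy model. This is a minimal illustration under stated assumptions, not the paper's SketchVolume-based implementation. It operates on exact block sets rather than sampled fingerprints. It scores a single-file remapping by the assumed ratio (bytes freed minus bytes copied) / bytes copied. The load-balancing rule (move files from the fullest to the emptiest volume while the gap shrinks) is also an assumption. Traffic is charged against the current state of the plan, each file moves at most once per step, and all function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Files = Dict[str, Set[str]]      # file -> block IDs (hypothetical toy model)
Placement = Dict[str, str]       # file -> volume


def blocks_on(files: Files, plan: Placement, volume: str, skip: str = "") -> Set[str]:
    """Union of the blocks of all files on `volume`, optionally ignoring file `skip`."""
    out: Set[str] = set()
    for f, v in plan.items():
        if v == volume and f != skip:
            out |= files[f]
    return out


def size(blocks: Set[str], sizes: Dict[str, int]) -> int:
    return sum(sizes[b] for b in blocks)


def move_effect(files, sizes, plan, f, dst) -> Tuple[int, int]:
    """(bytes freed on the source, bytes copied to dst) if file f is remapped to dst."""
    freed = files[f] - blocks_on(files, plan, plan[f], skip=f)
    copied = files[f] - blocks_on(files, plan, dst)
    return size(freed, sizes), size(copied, sizes)


def balance_step(files, sizes, plan, volumes, budget):
    """Load-balancing step: move files from the fullest to the emptiest volume while the gap shrinks."""
    plan, spent, moved = dict(plan), 0, set()
    while True:
        load = {v: size(blocks_on(files, plan, v), sizes) for v in volumes}
        big, small = max(volumes, key=load.get), min(volumes, key=load.get)
        best = None
        for f, v in plan.items():
            if v != big or f in moved:
                continue
            freed, copied = move_effect(files, sizes, plan, f, small)
            gap = abs((load[big] - freed) - (load[small] + copied))
            if spent + copied <= budget and gap < load[big] - load[small]:
                if best is None or gap < best[0]:
                    best = (gap, f, copied)
        if best is None:
            return plan, spent
        _, f, copied = best
        plan[f], spent = small, spent + copied
        moved.add(f)


def capacity_step(files, sizes, plan, volumes, budget):
    """Capacity-reduction step: repeatedly apply the remapping with the best saving-to-traffic ratio."""
    plan, spent, moved = dict(plan), 0, set()
    while True:
        best = None
        for f, src in plan.items():
            if f in moved:
                continue
            for dst in volumes:
                if dst == src:
                    continue
                freed, copied = move_effect(files, sizes, plan, f, dst)
                saving = freed - copied
                if saving <= 0 or spent + copied > budget:
                    continue
                ratio = saving / copied if copied else float("inf")
                if best is None or ratio > best[0]:
                    best = (ratio, f, dst, copied)
        if best is None:
            return plan, spent
        _, f, dst, copied = best
        plan[f], spent = dst, spent + copied
        moved.add(f)


def greedy_migrate(files, sizes, plan, volumes, budget):
    """Split the traffic budget evenly between the two steps."""
    plan, t1 = balance_step(files, sizes, plan, volumes, budget / 2)
    plan, t2 = capacity_step(files, sizes, plan, volumes, budget / 2)
    return plan, t1 + t2
```

Because every step only considers single-file moves, the sketch can miss gains that need two files to move together, such as two files that share blocks only with each other. The clustering approach targets exactly this limitation.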
ILP (extends GoSeed)
all variables are Boolean
objective: maximize the total size of all deleted blocks minus the total size of all copied blocks
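One way to write such a Boolean program is sketched below. The notation is hypothetical and may differ from the paper's formulation. $V$ is the set of volumes, $F$ the set of files, $B$ the set of blocks, and $s_k$ the size of block $k$. The constant $p_{k,v}\in\{0,1\}$ is 1 if block $k$ is stored on volume $v$ before migration. The decision variables are $x_{f,v}$ (file $f$ is mapped to $v$) and $y_{k,v}$ (block $k$ is stored on $v$ after migration). $T$ is the traffic budget, and the per-volume bounds $L_{\min}, L_{\max}$ are an assumed way of expressing the load-balancing constraint.

$$
\begin{aligned}
\max_{x,\,y}\quad & \sum_{v\in V}\sum_{k\in B} s_k\big[\,p_{k,v}\,(1-y_{k,v}) - (1-p_{k,v})\,y_{k,v}\,\big] \\
\text{s.t.}\quad & \sum_{v\in V} x_{f,v} = 1 && \forall f\in F \\
& y_{k,v} \ge x_{f,v} && \forall v\in V,\ f\in F,\ k\in f \\
& y_{k,v} \le \sum_{f\in F:\,k\in f} x_{f,v} && \forall v\in V,\ k\in B \\
& \sum_{v\in V}\sum_{k\in B} s_k\,(1-p_{k,v})\,y_{k,v} \le T \\
& L_{\min} \le \sum_{k\in B} s_k\,y_{k,v} \le L_{\max} && \forall v\in V \\
& x_{f,v},\ y_{k,v} \in \{0,1\}
\end{aligned}
$$

The first bracketed term counts a block as deleted (present before, absent after), and the second counts it as copied (absent before, present after). Because $p_{k,v}$ is a constant, the bracket expands to $p_{k,v} - y_{k,v}$. The objective therefore equals the initial physical size minus the final physical size, so maximizing "deleted minus copied" is the same as minimizing the total physical size after migration.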
acceleration methods
Clustering
main idea: files are similar if they share a large portion of their blocks
hierarchical clustering
file similarity
traffic and load-balancing considerations
sensitivity to the sample
constructing the final migration plan
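The clustering pipeline can be sketched as follows: sample fingerprints, cluster similar files, then map each cluster to a volume. This is a minimal sketch under several hypothetical choices. Similarity is the Jaccard index of block sets. Clusters are merged agglomeratively by the Jaccard index of their block unions until there is one cluster per volume. Fingerprints are sampled by keeping IDs whose SHA-1 digest is divisible by `rate`. Each cluster goes to the free volume that already holds most of its blocks. Unlike the approach in the notes, this sketch does not enforce the traffic or load-balancing constraints, and all function names are illustrative.

```python
import hashlib
from itertools import combinations
from typing import Dict, List, Set


def sample(blocks: Set[str], rate: int) -> Set[str]:
    """Keep a block ID if its SHA-1 digest is divisible by `rate` (hypothetical sampling rule)."""
    return {b for b in blocks if int(hashlib.sha1(b.encode()).hexdigest(), 16) % rate == 0}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_files(files: Dict[str, Set[str]], k: int) -> List[Set[str]]:
    """Agglomerative clustering: merge the two clusters with the most similar block sets until k remain."""
    clusters = [({f}, set(blocks)) for f, blocks in files.items()]
    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: jaccard(clusters[p[0]][1], clusters[p[1]][1]))
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return [names for names, _ in clusters]


def assign_clusters(clusters: List[Set[str]], files: Dict[str, Set[str]],
                    current: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each cluster to a distinct volume, preferring the volume that already stores most of its blocks."""
    plan: Dict[str, str] = {}
    free = set(current)
    for names in sorted(clusters, key=len, reverse=True):
        blocks = set().union(*(files[f] for f in names))
        dst = max(sorted(free), key=lambda v: len(blocks & current[v]))
        free.discard(dst)
        for f in names:
            plan[f] = dst
    return plan


def cluster_migrate(files: Dict[str, Set[str]], current: Dict[str, Set[str]],
                    rate: int = 1) -> Dict[str, str]:
    """Sample, cluster into one group per volume, then build the final file-to-volume plan."""
    sampled = {f: sample(b, rate) for f, b in files.items()}
    sampled_current = {v: sample(b, rate) for v, b in current.items()}
    clusters = cluster_files(sampled, k=len(current))
    return assign_clusters(clusters, sampled, sampled_current)
```

With `rate=1`, every fingerprint is kept. Larger rates make similarity cheaper to compute but noisier, which is why the sensitivity of the final plan to the sample is worth evaluating.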
trace:
evaluation
basic comparison between algorithms
sensitivity to problem parameters
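Strength: formulates the general migration problem for deduplicated systems and solves it with three approaches (greedy, ILP, and clustering)
Weakness: does not provide a system that applies its algorithms
hard to follow, since the data migration problem in deduplicated systems is not yet common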
related work
SketchVolume-FAST'19
GoSeed-FAST'20
Rangoli-SYSTOR'13: a greedy algorithm for space reclamation
data migration in distributed deduplication systems