Venue | Category |
---|---|
SYSTOR'13 | Space Management |
# Rangoli: Space management in deduplication environments

1. Summary
   - Motivation of this paper
   - Rangoli
   - Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Some Insights (Future work)
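In a deduplicated environment, it is hard to find an optimal set of files to free for space reclamation, because a file may share blocks with other files and removing it frees only the blocks that no remaining file references. Space reclamation in non-deduplicated environments is simpler: deleting or migrating a file is guaranteed to change the used space of the volume by an amount equal to the logical size of the file.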
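The difference can be made concrete with a minimal sketch. It assumes a hypothetical model in which each file is a list of `(fingerprint, block_length)` pairs, one per logical block, and each distinct fingerprint is stored once on the volume. Without deduplication, deleting files frees their logical size; with deduplication, only blocks referenced exclusively by the deleted files are freed.

```python
# Hypothetical model (not from the paper): file -> [(fingerprint, block_length), ...],
# one pair per logical block; each distinct fingerprint is stored once per volume.
Files = dict[str, list[tuple[str, int]]]


def logical_size(files: Files, victims: set[str]) -> int:
    """Space freed in a non-deduplicated volume: the files' logical size."""
    return sum(length for f in victims for _, length in files[f])


def dedup_reclaimed(files: Files, victims: set[str]) -> int:
    """Space freed in a deduplicated volume: blocks no surviving file references."""
    still_used = {fp for f, blocks in files.items() if f not in victims for fp, _ in blocks}
    freed: dict[str, int] = {}
    for f in victims:
        for fp, length in files[f]:
            if fp not in still_used:
                freed[fp] = length  # a dict counts each unique block only once
    return sum(freed.values())


files = {
    "a.img": [("x", 4096), ("y", 4096)],
    "b.img": [("y", 4096), ("z", 4096)],
}
print(logical_size(files, {"a.img"}))     # 8192
print(dedup_reclaimed(files, {"a.img"}))  # 4096: block "y" is still used by b.img
```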
This work proposes a fast and efficient tool that identifies the optimal set of files for space reclamation in a deduplicated environment.
Files that share data should be migrated together to the new destination so that storage efficiency (the deduplication savings) is preserved.
The paper considers only the source-centric dimension; the approach is agnostic to the destination.
Rangoli seeks to partition the dataset such that most data sharing occurs between files within the same partition, while files in different partitions share little or no data.
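The dataset is represented as a bipartite graph, with files on one side and block fingerprints on the other; an edge connects a file to each block it contains.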
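To illustrate the partitioning goal on the bipartite graph, this sketch groups files into connected components with union-find: files that share any block land in the same component, so there is zero sharing across components. This is not Rangoli's algorithm, which these notes do not detail; the input mapping and function name are hypothetical, and components in real datasets may be too large to use directly as migration bins.

```python
def partition_files(files: dict[str, set[str]]) -> list[set[str]]:
    """Group files into connected components of the file/fingerprint bipartite graph.

    `files` is a hypothetical mapping: file name -> set of block fingerprints.
    """
    parent: dict[str, str] = {f: f for f in files}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    owner: dict[str, str] = {}  # first file seen for each fingerprint
    for f, fps in files.items():
        for fp in fps:
            if fp in owner:
                union(owner[fp], f)  # shared block: both files join one partition
            else:
                owner[fp] = f

    groups: dict[str, set[str]] = {}
    for f in files:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())


files = {"a": {"x", "y"}, "b": {"y"}, "c": {"z"}, "d": {"z", "w"}, "e": {"v"}}
print(sorted(sorted(g) for g in partition_files(files)))
# [['a', 'b'], ['c', 'd'], ['e']]
```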
Its fingerprint database (FPDB) stores `<fp, block len, inode>` records, so a block shared by several files appears as multiple records with the same fp. Rangoli can therefore build the graph by traversing the FPDB.
The FPDB contains one fingerprint record for every logical block of each file.
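Assuming the FPDB records are available as `(fp, block_len, inode)` tuples, one per logical block, a single pass yields both sides of the bipartite graph. The record layout follows the notes; the helper name and return shape are hypothetical. If the FPDB is sorted by fp, the same grouping could be done with a sequential scan; dictionaries are used here for brevity.

```python
from collections import defaultdict
from collections.abc import Iterable

# (fp, block_len, inode): one FPDB record per logical block, as described in the notes.
Record = tuple[str, int, int]


def build_graph(records: Iterable[Record]):
    """Hypothetical helper: derive both sides of the file/block bipartite graph.

    Returns (inode -> fingerprints, fp -> inodes, fp -> block length).
    """
    file_blocks: dict[int, set[str]] = defaultdict(set)
    block_files: dict[str, set[int]] = defaultdict(set)
    block_len: dict[str, int] = {}
    for fp, length, inode in records:
        file_blocks[inode].add(fp)
        block_files[fp].add(inode)  # several records with one fp => shared block
        block_len[fp] = length
    return dict(file_blocks), dict(block_files), block_len


records = [("f1", 4096, 10), ("f2", 4096, 10), ("f2", 4096, 11), ("f3", 4096, 11)]
file_blocks, block_files, block_len = build_graph(records)
print(block_files["f2"])  # {10, 11}: block f2 is shared by inodes 10 and 11
```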
The target space reclamation is given as a portion of the volume space, and the dataset is divided into migration bins of approximately equal size.
- Logical size of a bin: the total logical size of the files in the bin.
- Internal sharing of a bin: the extent of data sharing among files within the bin.
- Sharing across of a bin: the extent of data sharing between the bin and the remainder of the dataset.
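The notes do not give formulas for these metrics, so this sketch assumes one plausible byte-based reading: logical size sums the bin's logical block lengths, internal sharing is the bytes saved by deduplication among the bin's own blocks, and sharing across is the bytes of unique blocks the bin shares with files outside it. The file model and function name are hypothetical.

```python
# Hypothetical model: file -> [(fingerprint, block_length), ...], one pair per logical block.
Files = dict[str, list[tuple[str, int]]]


def bin_metrics(files: Files, bin_files: set[str]) -> dict[str, int]:
    """Assumed byte-level definitions of the three bin metrics."""
    inside: dict[str, int] = {}  # distinct fp -> length, referenced by the bin
    outside: set[str] = set()    # fps referenced by files outside the bin
    logical = 0
    for f, blocks in files.items():
        for fp, length in blocks:
            if f in bin_files:
                logical += length
                inside[fp] = length
            else:
                outside.add(fp)
    unique_bytes = sum(inside.values())
    return {
        "logical_size": logical,
        # Assumption: bytes saved by deduplication among the bin's own blocks.
        "internal_sharing": logical - unique_bytes,
        # Assumption: bytes of unique blocks also referenced outside the bin.
        "sharing_across": sum(length for fp, length in inside.items() if fp in outside),
    }


files = {
    "a": [("x", 4096), ("y", 4096)],
    "b": [("x", 4096), ("z", 4096)],
    "c": [("z", 4096)],
}
print(bin_metrics(files, {"a", "b"}))
# {'logical_size': 16384, 'internal_sharing': 4096, 'sharing_across': 4096}
```

Under these definitions, a good migration bin has high internal sharing and low sharing across; here, migrating `{a, b}` leaves block `z` behind because file `c` still needs it.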
The evaluation uses four datasets: Debian, HomeDir, VMDK, and EngWeb.
Rangoli is fast and scalable, and it is evaluated on real-world datasets.
It determines the exact amount of space reclaimed and the associated penalties (e.g., network cost, physical space consumption).
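For a chosen bin, the reclaimed space and two penalties can be computed exactly from the same block-level data. This sketch rests on assumptions the notes do not state: the transfer is deduplication-aware (each unique block is sent once), and blocks shared with files left behind must stay at the source while also being stored at the destination. The function name and cost keys are hypothetical.

```python
# Hypothetical model: file -> [(fingerprint, block_length), ...], one pair per logical block.
Files = dict[str, list[tuple[str, int]]]


def migration_cost(files: Files, bin_files: set[str]) -> dict[str, int]:
    """Exact reclaimed space and penalties for migrating one bin (assumed cost model)."""
    inside: dict[str, int] = {}  # unique blocks referenced by the bin
    outside: set[str] = set()    # blocks still needed at the source
    for f, blocks in files.items():
        for fp, length in blocks:
            if f in bin_files:
                inside[fp] = length
            else:
                outside.add(fp)
    unique = sum(inside.values())
    shared = sum(length for fp, length in inside.items() if fp in outside)
    return {
        "reclaimed": unique - shared,  # blocks referenced only by the bin
        "network_bytes": unique,       # assumes each unique block is sent once
        "duplicated_bytes": shared,    # stored at both source and destination
    }


files = {
    "a": [("x", 4096), ("y", 4096)],
    "b": [("x", 4096), ("z", 4096)],
    "c": [("z", 4096)],
}
print(migration_cost(files, {"a", "b"}))
# {'reclaimed': 8192, 'network_bytes': 12288, 'duplicated_bytes': 4096}
```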
It outperforms alternative approaches based on MinHash.
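MinHash (also written min-hash) estimates the similarity of two sets by comparing the minimum hash values of their elements.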
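For contrast with the graph-based approach, here is a minimal MinHash sketch: each file's fingerprint set is summarized by its minimum hash value under k seeded hash functions, and the fraction of matching positions estimates the Jaccard similarity of two sets. The seeded `hashlib.blake2b` hashes and `k=128` are illustrative choices, not the MinHash-based alternatives evaluated in the paper.

```python
import hashlib


def minhash_signature(fps: set[str], k: int = 128) -> list[int]:
    """Minimum hash of a non-empty set under k seeded hash functions."""
    sig = []
    for seed in range(k):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        sig.append(min(
            int.from_bytes(hashlib.blake2b(fp.encode(), digest_size=8, salt=salt).digest(), "big")
            for fp in fps
        ))
    return sig


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


a = {f"blk{i}" for i in range(0, 100)}
b = {f"blk{i}" for i in range(50, 150)}
exact = len(a & b) / len(a | b)  # 50 / 150
print(exact, estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The estimate approaches the exact value as k grows, but it only measures pairwise similarity; it does not directly give the exact reclaimable space that a block-level graph yields.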