Venue | Category |
---|---|
FAST'13 | Deduplication Sanitization |
Memory Efficient Sanitization of a Deduplicated Storage System

1. Summary
   - Motivation of this paper
   - Sanitization
   - Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Some Insights (Future work)
In a deduplicated storage system, each piece of data on the physical media can be referenced by multiple namespace objects, so at scale the standard techniques for tracking data references do not fit in memory.
- all deleted data are erased
- all live data are available
- the whole sanitization process is efficient
- the storage system is usable while sanitization runs
access through regular file system interfaces.
reading blocks directly from disk, swap area, or unallocated blocks using non-regular interfaces.
access through exotic laboratory techniques, which require a specific disk format.
Sanitization can be treated as a static version of the membership problem: the key space (the set of fingerprints) is known beforehand, and there is no dynamic insertion or deletion of keys.
These two properties justify using a perfect hash vector.
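
To make the idea concrete, here is a minimal sketch of a perfect hash vector for a static key set. It is not the paper's construction: it assumes a simple hash-and-displace scheme with about 25% spare slots, and the names `seeded_hash` and `PerfectHashVector` are hypothetical. The point it illustrates is that once every fingerprint maps to its own slot, liveness costs one bit per slot and the fingerprints themselves need not be kept in memory.

```python
import hashlib


def seeded_hash(key: bytes, seed: int, modulus: int) -> int:
    """Hash `key` under `seed` and reduce it to range(modulus)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % modulus


class PerfectHashVector:
    """Collision-free slot per known fingerprint, plus one liveness bit per slot.

    Illustrative assumption: hash-and-displace with spare slots, not the
    construction used in the paper.
    """

    def __init__(self, fingerprints):
        keys = list(set(fingerprints))
        n = len(keys)
        self.num_slots = n + n // 4 + 1      # spare slots keep the seed search short
        self.num_buckets = max(1, n // 4)    # about four keys per bucket
        buckets = [[] for _ in range(self.num_buckets)]
        for key in keys:
            buckets[seeded_hash(key, 0, self.num_buckets)].append(key)

        self.seeds = [1] * self.num_buckets
        taken = [False] * self.num_slots
        # Place the largest buckets first, while most slots are still free.
        for b in sorted(range(self.num_buckets), key=lambda i: -len(buckets[i])):
            seed = 1
            while True:
                slots = [seeded_hash(k, seed, self.num_slots) for k in buckets[b]]
                if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                    break
                seed += 1
            for s in slots:
                taken[s] = True
            self.seeds[b] = seed

        # The keys are dropped here: only per-bucket seeds and the bit vector remain.
        self.bits = bytearray((self.num_slots + 7) // 8)

    def slot(self, fingerprint: bytes) -> int:
        bucket = seeded_hash(fingerprint, 0, self.num_buckets)
        return seeded_hash(fingerprint, self.seeds[bucket], self.num_slots)

    def mark_live(self, fingerprint: bytes) -> None:
        s = self.slot(fingerprint)
        self.bits[s >> 3] |= 1 << (s & 7)

    def is_live(self, fingerprint: bytes) -> bool:
        # Only meaningful for fingerprints that were in the build set.
        s = self.slot(fingerprint)
        return bool(self.bits[s >> 3] & (1 << (s & 7)))
```

Because the structure stores no keys, a lookup for a fingerprint outside the build set returns an arbitrary bit. That is acceptable only under the static assumption above, where every fingerprint that will ever be queried is known when the function is built.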
- Merge phase: set the consistency point, flush the in-memory fingerprint index buffer and merge it with the on-disk index.
- Traverse the on-disk index for all fingerprints and build the perfect hash function for all fingerprints found
- Traverse all files and mark all fingerprints found as live in perfect hash vector
- Select containers with at least one dead chunk, copy all live chunks from them into new containers (copy forward), then delete the selected containers.
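
The steps above can be sketched on a toy in-memory model. Everything here is an assumption made for illustration: the function name `sanitize`, the dictionary layouts (index as fingerprint to container id, containers as id to a list of `(fingerprint, data)` chunks, files as name to a list of fingerprints), and the plain position map standing in for the perfect hash function.

```python
def sanitize(index_buffer, on_disk_index, files, containers):
    """Toy walk-through of the sanitization steps (hypothetical data layout)."""
    # Merge: fold the in-memory index buffer into the on-disk index.
    on_disk_index.update(index_buffer)
    index_buffer.clear()

    # Traverse the index: give every fingerprint its own position.
    # A dict is a stand-in here; the real system would use a perfect hash function.
    position = {fp: i for i, fp in enumerate(sorted(on_disk_index))}
    live = [False] * len(position)

    # Traverse all files: mark every referenced fingerprint as live.
    for recipe in files.values():
        for fp in recipe:
            live[position[fp]] = True

    # Copy forward: rewrite containers that hold at least one dead chunk.
    next_id = max(containers, default=-1) + 1
    for cid in list(containers):
        chunks = containers[cid]
        if all(live[position[fp]] for fp, _ in chunks):
            continue  # fully live container, leave it in place
        survivors = [(fp, data) for fp, data in chunks if live[position[fp]]]
        del containers[cid]  # delete the selected container
        if survivors:
            containers[next_id] = survivors
            for fp, _ in survivors:
                on_disk_index[fp] = next_id
            next_id += 1

    # Drop index entries for chunks that no longer exist.
    for fp in [fp for fp in on_disk_index if not live[position[fp]]]:
        del on_disk_index[fp]
    return containers
```

In this sketch a container with a single dead chunk is rewritten in full, which mirrors the selection rule in the last step: the cost of the copy step grows with the number of containers touched by deletions, not with the number of dead chunks.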
Evaluation: