Venue | Category |
---|---|
FAST'16 | Deduplication |
Using Hints to Improve Inline Block-Layer Deduplication

1. Summary
   - Motivation of this paper
   - Block-layer deduplication hints
   - Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Future Works
Important information about data context (e.g. data vs. metadata writes) is lost at the block layer.
This paper argues that passing such context to the block layer can improve both deduplication performance and reliability. The root cause of the problem is the semantic divide between the block layer and file systems.
This paper proposes an interface for block-layer deduplication systems that allows upper storage layers to pass hints based on the context available to them.
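To make the idea of a hint-carrying interface concrete, here is a minimal sketch of a block request extended with a hint field. The names `Hint`, `BlockRequest`, and `tag_write` are hypothetical and chosen for illustration only; they are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Hint(Flag):
    """Hypothetical hint flags; the names are illustrative assumptions."""
    NONE = 0
    NODEDUP = auto()   # caller asks the block layer not to deduplicate
    PREFETCH = auto()  # caller expects duplicates of this data soon


@dataclass
class BlockRequest:
    """A block I/O request extended with a hint field."""
    lba: int           # logical block address
    data: bytes
    is_write: bool
    hints: Hint = Hint.NONE


def tag_write(lba: int, data: bytes, is_metadata: bool) -> BlockRequest:
    """Upper-layer side: attach context the block layer cannot infer."""
    hints = Hint.NODEDUP if is_metadata else Hint.NONE
    return BlockRequest(lba=lba, data=data, is_write=True, hints=hints)
```

In this sketch the file system decides which hint applies, and the block layer only has to test the flag, for example with `Hint.NODEDUP in request.hints`.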
Most existing deduplication solutions are built into file systems, because file systems have enough information to deduplicate efficiently without jeopardizing reliability. This information can be used to avoid deduplicating certain blocks (e.g., metadata).
Application requirement: some generated data should not be deduplicated (e.g., random or encrypted data). Overhead: hash computation, index size, more RAM, and more lookup bandwidth. Main issues: unique data and reliability.
For metadata: Most file system metadata is unique
Metadata writes matter more to overall system performance than data writes because they are often synchronous, so adding deduplication to metadata may increase the latency of these critical writes. Reliability: file systems keep duplicate copies of metadata to guard against corruption, and deduplication would collapse those copies into one.
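The effect of a "do not deduplicate" hint on the write path can be sketched with a toy in-memory store. This is an assumption-laden illustration, not the paper's code: the class `DedupStore`, the `nodedup` parameter, and the choice of SHA-256 are all hypothetical, and overwritten blocks are never reclaimed.

```python
import hashlib


class DedupStore:
    """Toy inline deduplicating block store (in-memory, illustrative only)."""

    def __init__(self):
        self.index = {}       # fingerprint -> physical block number
        self.physical = []    # physical block contents
        self.mapping = {}     # logical block address -> physical block number
        self.hashes_computed = 0

    def _allocate(self, data: bytes) -> int:
        self.physical.append(data)
        return len(self.physical) - 1

    def write(self, lba: int, data: bytes, nodedup: bool = False) -> None:
        if nodedup:
            # Hinted write: skip hashing and the index lookup, and always
            # store a fresh physical copy (keeps redundant metadata redundant).
            self.mapping[lba] = self._allocate(data)
            return
        fingerprint = hashlib.sha256(data).digest()
        self.hashes_computed += 1
        pbn = self.index.get(fingerprint)
        if pbn is None:
            # First time this content is seen: store it and index it.
            pbn = self._allocate(data)
            self.index[fingerprint] = pbn
        self.mapping[lba] = pbn
```

With this sketch, two identical data writes share one physical block, while two identical hinted writes occupy two blocks and cost no hash computation or index space.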
Accelerating future data writes by reducing lookup delays: upper layers inform the deduplication system of I/O operations that are likely to generate further duplicates (e.g., copying a file), so the relevant hashes can be prefetched and cached to minimize random accesses.
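The prefetching idea can be sketched as a two-tier hash index: a slow tier standing in for the on-disk index and a small in-memory cache that a hinted read fills ahead of the matching writes. The names `PrefetchingIndex`, `read_with_prefetch`, and `slow_lookups` are hypothetical, the cache is unbounded for simplicity, and the sketch assumes the hint arrives on the read of the source blocks.

```python
import hashlib


class PrefetchingIndex:
    """Toy hash index with a slow tier and a prefetch-filled cache."""

    def __init__(self):
        self.on_disk = {}     # fingerprint -> physical block number (slow tier)
        self.by_pbn = {}      # physical block number -> fingerprint
        self.cache = {}       # in-memory subset of on_disk
        self.slow_lookups = 0

    def insert(self, data: bytes, pbn: int) -> None:
        """Record an already stored block in the slow tier."""
        fp = hashlib.sha256(data).digest()
        self.on_disk[fp] = pbn
        self.by_pbn[pbn] = fp

    def read_with_prefetch(self, pbn: int) -> None:
        """Hinted read: pull this block's fingerprint into the cache."""
        fp = self.by_pbn.get(pbn)
        if fp is not None:
            self.cache[fp] = pbn

    def lookup(self, data: bytes):
        """Write-path lookup: return the duplicate's block number, or None."""
        fp = hashlib.sha256(data).digest()
        if fp in self.cache:
            return self.cache[fp]
        # Cache miss: fall back to the slow tier and count the access.
        self.slow_lookups += 1
        return self.on_disk.get(fp)
```

In a file copy, the reads of the source blocks would call `read_with_prefetch`, so the later writes of the same content are resolved from the cache without touching the slow tier.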
cluster hashes: files that reside in the same directory tend to be accessed together.
no-hint vs. hint-on
provide richer context to the block layer.