| Venue | Category |
|---|---|
| FAST'11 | FTL Deduplication |
GoSeed: Generating an Optimal Seeding Plan for Deduplicated Storage
1. Summary
Motivation of this paper
CAFTL (Content-Aware Flash Translation Layer)
Implementation and Evaluation
2. Strength (Contributions of the paper)
3. Weakness (Limitations of the paper)
4. Some Insights (Future work)
Motivation
the limited lifespan of SSDs, which are built on flash memories with limited erase/program cycles, is still one of the most critical concerns
The lifespan of SSDs
Data deduplication
data duplication
I/O duplication
intercept each I/O request and calculate a hash value for each requested block
Challenges
Main goal
The main workflow
intercepts incoming write requests at the SSD device level
use a hash function to generate fingerprints summarizing the content of updated data
design a set of acceleration methods to speed up fingerprinting
use only a small on-device buffer space (e.g., 2MiB) while keeping the performance overhead nearly negligible
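The workflow above can be sketched as a minimal in-line deduplicating write path. This is an illustration under stated assumptions, not the paper's implementation: the class `DedupWritePath`, the use of SHA-1 as the fingerprint function, and the in-memory list standing in for flash pages are all hypothetical choices made for the example.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the example


class DedupWritePath:
    """Toy device-level write path: fingerprint each page, store unique content once."""

    def __init__(self):
        self.fingerprints = {}  # fingerprint -> index of the physical page
        self.lba_to_page = {}   # LBA -> index of the physical page
        self.flash = []         # simulated flash: one entry per programmed page

    def write(self, lba, data):
        """Handle one page write. Returns True if a flash page was programmed."""
        fp = hashlib.sha1(data).digest()  # assumption: SHA-1 as the fingerprint
        page = self.fingerprints.get(fp)
        programmed = page is None
        if programmed:
            # New content: program a page and remember its fingerprint.
            page = len(self.flash)
            self.flash.append(data)
            self.fingerprints[fp] = page
        # Duplicate or not, the LBA now points at the page holding this content.
        self.lba_to_page[lba] = page
        return programmed

    def read(self, lba):
        return self.flash[self.lba_to_page[lba]]
```

Writing the same content to two different LBAs programs only one page in this sketch; the second write only updates the mapping.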
Design overview
a combination of both in-line and out-of-line deduplication
Hashing
Fingerprint store
manage an in-memory structure
optimization
range check, hotness-based reorganization, bucket-level binary search
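A minimal sketch of how the three optimizations might fit together, assuming the fingerprint store is organized as buckets of sorted fingerprints. The names `Bucket` and `FingerprintStore`, the integer fingerprints, and the "swap a hit bucket one position forward" policy are assumptions for illustration; the notes do not specify the actual layout or reorganization policy.

```python
import bisect


class Bucket:
    """A bucket of fingerprints kept in sorted order."""

    def __init__(self, fingerprints):
        self.fps = sorted(fingerprints)

    def contains(self, fp):
        # Range check: skip the search if fp lies outside [min, max].
        if not self.fps or fp < self.fps[0] or fp > self.fps[-1]:
            return False
        # Bucket-level binary search over the sorted fingerprints.
        i = bisect.bisect_left(self.fps, fp)
        return i < len(self.fps) and self.fps[i] == fp


class FingerprintStore:
    def __init__(self, buckets):
        self.buckets = list(buckets)

    def lookup(self, fp):
        for i, bucket in enumerate(self.buckets):
            if bucket.contains(fp):
                # Hotness-based reorganization (assumed policy): move the
                # bucket that produced a hit one position toward the front.
                if i > 0:
                    self.buckets[i - 1], self.buckets[i] = (
                        self.buckets[i],
                        self.buckets[i - 1],
                    )
                return True
        return False
```

The range check lets most buckets be rejected with two comparisons, and frequently hit buckets drift toward the front of the scan order.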
Indirect mapping
a mapping table to track the physical block address (PBA) to which each LBA is mapped
maintain a primary mapping and a secondary mapping table in memory
the mapping tables in flash
the metadata pages in flash
reserve a dedicated number of flash pages (metadata page: LBA and fingerprint)
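One plausible reading of the primary/secondary mapping is a two-level scheme: an LBA whose content is unique maps directly to a PBA, while LBAs that share a deduplicated page map to a virtual address that a secondary table resolves to the PBA. The sketch below assumes that reading; `TwoLevelMap`, the "virtual block address" (VBA) naming, and the tagged tuples are hypothetical.

```python
class TwoLevelMap:
    """Toy two-level mapping: LBA -> PBA directly, or LBA -> VBA -> PBA when shared."""

    def __init__(self):
        self.primary = {}    # LBA -> ("pba", n) or ("vba", n)
        self.secondary = {}  # VBA -> PBA
        self.next_vba = 0

    def map_unique(self, lba, pba):
        self.primary[lba] = ("pba", pba)

    def map_shared(self, lbas, pba):
        """Point several LBAs at one physical page through a single VBA."""
        vba = self.next_vba
        self.next_vba += 1
        self.secondary[vba] = pba
        for lba in lbas:
            self.primary[lba] = ("vba", vba)
        return vba

    def resolve(self, lba):
        kind, addr = self.primary[lba]
        return addr if kind == "pba" else self.secondary[addr]

    def relocate_shared(self, vba, new_pba):
        # Moving a shared page (e.g., during garbage collection) touches
        # one secondary entry instead of every LBA that references it.
        self.secondary[vba] = new_pba
```

The point of the extra level in this sketch is that relocating a shared page costs one update, regardless of how many LBAs reference it.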
Acceleration methods
Sampling for hashing
Light-weight pre-hashing
Dynamic switches
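To make the pre-hashing idea concrete, here is a hedged sketch: a cheap checksum is computed first, and the expensive fingerprint is only needed on the critical path when the checksum has been seen before. The use of CRC32 as the pre-hash and SHA-1 as the fingerprint, the class `PreHashDedup`, and the two counters are assumptions for illustration. On a pre-hash miss the full fingerprint is still computed here so later lookups can match it, but it is counted separately to model work that a device could defer off the critical path.

```python
import hashlib
import zlib


class PreHashDedup:
    def __init__(self):
        self.by_prehash = {}      # CRC32 value -> set of SHA-1 fingerprints
        self.inline_hashes = 0    # full hashes needed before answering
        self.deferred_hashes = 0  # full hashes that could be computed later

    def is_duplicate(self, page):
        pre = zlib.crc32(page)
        known = self.by_prehash.get(pre)
        if known is None:
            # Pre-hash miss: identical content would have the same CRC32,
            # so this page cannot be a duplicate of anything stored.
            self.deferred_hashes += 1
            self.by_prehash[pre] = {hashlib.sha1(page).digest()}
            return False
        # Pre-hash hit: confirm with the full fingerprint.
        self.inline_hashes += 1
        fp = hashlib.sha1(page).digest()
        if fp in known:
            return True
        known.add(fp)  # CRC32 collision between different contents
        return False
```

In a workload with few duplicates, most pages take the cheap path in this sketch and never wait on the full hash.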
Out-of-line deduplication
Implementation
SSD simulator
Evaluation
effectiveness of deduplication
performance impact
acceleration methods
SSD background
An erase block usually consists of 64-128 pages; each page has a data area (e.g., 4KiB)
reads and writes are performed in units of pages, while an erase clears all the pages in an erase block
Rule:
Flash Translation Layer (FTL)
emulate a hard disk drive by exposing an array of logical block addresses (LBAs) to the host
Indirect mapping: track the dynamic mapping between logical block addresses (LBAs) and physical block addresses (PBAs)
Log-like write mechanism: the new content data is appended sequentially in a clean erase block, like a log
Garbage collection: periodically consolidates the valid pages into a new erase block and cleans the old erase block
Wear-leveling: tracks and shuffles hot/cold data to even out writes in flash memory
Over-provisioning: reserves extra flash capacity, hidden from the host, to assist garbage collection and wear-leveling
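The FTL mechanisms listed above can be illustrated with a toy model that combines indirect mapping, log-like writes, and garbage collection. This is a simplified sketch, not a real FTL: `TinyFTL`, the 4-page erase block, the greedy "fewest valid pages" victim choice, and the explicit `gc()` call are assumptions, and wear-leveling is reduced to an erase counter.

```python
PAGES_PER_BLOCK = 4  # deliberately tiny; real erase blocks hold far more pages


class TinyFTL:
    def __init__(self, num_blocks):
        # Each block is a list of programmed pages: [lba, data, valid].
        self.blocks = [[] for _ in range(num_blocks)]
        self.erase_counts = [0] * num_blocks
        self.map = {}    # indirect mapping: LBA -> (block, page)
        self.active = 0  # block currently receiving appended writes

    def _append(self, lba, data):
        if len(self.blocks[self.active]) == PAGES_PER_BLOCK:
            clean = [i for i, b in enumerate(self.blocks) if not b]
            if not clean:
                raise RuntimeError("no clean block; run gc() first")
            self.active = clean[0]
        block = self.blocks[self.active]
        block.append([lba, data, True])
        self.map[lba] = (self.active, len(block) - 1)

    def write(self, lba, data):
        old = self.map.get(lba)
        self._append(lba, data)  # log-like: never overwrite in place
        if old is not None:
            b, p = old
            self.blocks[b][p][2] = False  # the old copy becomes invalid

    def read(self, lba):
        b, p = self.map[lba]
        return self.blocks[b][p][1]

    def gc(self):
        """Reclaim the full, non-active block with the fewest valid pages."""
        full = [i for i, b in enumerate(self.blocks)
                if len(b) == PAGES_PER_BLOCK and i != self.active]
        if not full:
            return None
        victim = min(full, key=lambda i: sum(pg[2] for pg in self.blocks[i]))
        for lba, data, valid in list(self.blocks[victim]):
            if valid:
                self._append(lba, data)  # consolidate valid pages elsewhere
        self.blocks[victim] = []         # erase clears the whole block
        self.erase_counts[victim] += 1
        return victim
```

In this model the spare blocks play the role of over-provisioned space: garbage collection can only relocate valid pages if a clean page is available somewhere outside the victim block.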