Venue | Category |
---|---|
SYSTOR'21 | Secure Deduplication |
S2Dedup: SGX-enabled Secure Deduplication
Motivation
current solutions are built with rigid designs that cannot be adapted to storage services and applications with different security and performance requirements
a TEE can be leveraged to aid secure deduplication,
as it allows sensitive data to be handled in its original form (i.e., plaintext) at an untrusted storage server
Limitations of state-of-the-art approaches
TED (EuroSys'20): its estimated frequency counter can lead the system to assume that a block has more duplicates than it really has
Epoch-Dedup (CLOUD'17): requires proxies that add extra computation and network overhead
open-source link
Design goals
support multiple schemes that can be adapted to different performance and security requirements
offer stronger security guarantees than deterministic schemes
avoid the network performance overhead imposed by auxiliary trusted remote servers
Overview
main idea: leverage trusted hardware technologies to enable cross-user, privacy-preserving deduplication at third-party storage providers
threat model:
a strictly stronger adversary that gains access to the server
architecture:
key points:
inline fixed-size block (4KB) deduplication
hashing and deduplication are performed exclusively on the server side
rely on the enclave to perform the hash computation step
re-encrypt the unique blocks with a single universal encryption key for storage
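The key points above can be sketched as a server-side write path. This is a minimal Python model, not S2Dedup's C/SGX/SPDK code: the class `ToyEnclaveIndex` and the helper `split_blocks` are hypothetical, the fingerprint is assumed to be an HMAC-SHA256 under a secret key held by the enclave, and the steps that decrypt client data and re-encrypt unique blocks under the universal key are omitted.

```python
import hashlib
import hmac
import os

BLOCK_SIZE = 4096  # fixed-size 4 KiB blocks


def split_blocks(data: bytes):
    """Hypothetical helper: split a write into fixed-size blocks, zero-padding the last one."""
    for off in range(0, len(data), BLOCK_SIZE):
        yield data[off:off + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")


class ToyEnclaveIndex:
    """Hypothetical stand-in for the server-side enclave (no real SGX).

    Decryption of client ciphertexts and re-encryption of unique blocks with
    the universal key are left out; only hashing and index lookups are shown.
    """

    def __init__(self):
        self._hash_key = os.urandom(32)  # secret hash key, assumed to stay inside the enclave
        self.index = {}                  # fingerprint -> physical address
        self.refs = {}                   # physical address -> reference count
        self.storage = []                # stand-in for the backing device

    def write_block(self, block: bytes) -> int:
        # Keyed hash (HMAC-SHA256): without the key, the untrusted server
        # cannot compute the fingerprint of a guessed block by itself.
        fp = hmac.new(self._hash_key, block, hashlib.sha256).digest()
        addr = self.index.get(fp)
        if addr is None:                 # unique block: keep one physical copy
            addr = len(self.storage)
            self.storage.append(block)   # real system: re-encrypt with the universal key first
            self.index[fp] = addr
        self.refs[addr] = self.refs.get(addr, 0) + 1
        return addr


if __name__ == "__main__":
    enclave = ToyEnclaveIndex()
    data = b"A" * 8192 + b"B" * 100      # two identical blocks plus one short block
    addrs = [enclave.write_block(b) for b in split_blocks(data)]
    print(addrs, len(enclave.storage))   # prints [0, 0, 1] 2
```

Because the fingerprint is keyed, duplicate detection only works through the enclave; this is the property the epoch-based schemes build on when they rotate the key.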
Secure deduplication schemes
plain security
epoch based
the enclave automatically changes the hash key when an epoch ends (e.g., once a threshold is reached)
so the server can no longer test for duplicates indefinitely
estimated frequency based
epoch and exact frequency based
combines both of the above, with exact instead of estimated frequencies
uses an in-memory hash table inside the secure enclave that maps blocks' hashes to their exact frequencies
uses the epoch-based approach to set a temporal boundary
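The combined scheme can be illustrated with a small sketch. The rule that turns exact frequencies into fingerprints is an assumption in the spirit of TED, not taken from the paper: with a hypothetical balance parameter `t`, every group of `t` writes of the same block shares one fingerprint, so no stored copy reveals how often the block really occurs. Epochs are also assumed to end after a fixed number of requests (`epoch_ops`), at which point the hash key and the frequency table are reset. The class `ToyEpochExactFreq` is hypothetical.

```python
import hashlib
import hmac
import os


class ToyEpochExactFreq:
    """Hypothetical model of the 'epoch + exact frequency' scheme.

    Assumptions (illustrative, not from S2Dedup's code):
      * an epoch ends after `epoch_ops` fingerprint requests;
      * each stored copy of a block absorbs at most `t` duplicates.
    """

    def __init__(self, t: int, epoch_ops: int):
        self.t = t
        self.epoch_ops = epoch_ops
        self._new_epoch()

    def _new_epoch(self) -> None:
        self._key = os.urandom(32)  # fresh hash key: fingerprints from older epochs stop matching
        self.freq = {}              # exact frequency per block hash, for this epoch only
        self.ops = 0

    def fingerprint(self, block: bytes) -> bytes:
        if self.ops == self.epoch_ops:  # temporal boundary reached
            self._new_epoch()
        self.ops += 1
        h = hashlib.sha256(block).digest()
        count = self.freq.get(h, 0)
        self.freq[h] = count + 1
        # Writes 0..t-1 of this block share slot 0, writes t..2t-1 share slot 1, and so on.
        slot = (count // self.t).to_bytes(8, "big")
        return hmac.new(self._key, h + slot, hashlib.sha256).digest()


if __name__ == "__main__":
    s = ToyEpochExactFreq(t=2, epoch_ops=100)
    block = b"x" * 4096
    fps = [s.fingerprint(block) for _ in range(5)]
    print(len(set(fps)))  # prints 3: three stored copies, none referenced more than twice
```

Keeping exact counts avoids the overestimation of a probabilistic counter, at the cost of enclave memory that grows with the number of distinct blocks seen in an epoch.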
Implementation
C, SGX SDK, SPDK
AES-XTS block cipher mode, HMAC-SHA256
index: consider two cases to evaluate I/O impact
Evaluation
workloads
DEDISbench: a disk I/O block-based benchmarking tool
real traces
Compared with:
I/O performance
the impact of resource consumption (i.e., RAM, CPU, network)
space savings
Strength (Contributions of the paper)
the first modular solution supporting multiple secure deduplication schemes
where does the encryption happen?
standard practices suggest that users should encrypt their data before outsourcing it to third-party storage services
since CE (convergent encryption) reveals matching ciphertexts, CE-based schemes provide considerably weaker security guarantees than standard encryption schemes for sensitive data
epoch in deduplication
the information leakage can be reduced by performing deduplication in epochs, which ensures that an adversary can only infer duplicates within the same epoch
the epoch needs to be synchronized across all clients
this requires a secure proxy, or a TEE on the storage server, to control epoch changes
such proxies add extra computation and network operations to the critical I/O path
why is TED bad?
may overestimate the number of duplicates actually found for a given block
it does not prevent a malicious adversary from knowing exactly the number of references for chunks with low duplication counts
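The overestimation comes from using an approximate counter. TED estimates block frequencies with a count-min sketch; the minimal version below (the class name, the tiny sizes and the per-row BLAKE2b salting are illustrative choices, not TED's parameters) shows that an estimate is never below the true count but can exceed it when a block collides with others in every row.

```python
import hashlib


class CountMinSketch:
    """Minimal count-min sketch; width and depth are kept tiny so collisions are likely."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cols(self, item: bytes):
        # One hash per row, made distinct by a per-row BLAKE2b salt (max 16 bytes).
        for r in range(self.depth):
            d = hashlib.blake2b(item, digest_size=8, salt=r.to_bytes(16, "big")).digest()
            yield r, int.from_bytes(d, "big") % self.width

    def add(self, item: bytes) -> None:
        for r, c in self._cols(item):
            self.rows[r][c] += 1

    def estimate(self, item: bytes) -> int:
        # Minimum over rows: never below the true count, but possibly above it.
        return min(self.rows[r][c] for r, c in self._cols(item))


if __name__ == "__main__":
    cms = CountMinSketch(width=4, depth=2)
    for i in range(50):
        cms.add(f"block-{i % 10}".encode())  # each of 10 blocks is written 5 times
    print([cms.estimate(f"block-{i}".encode()) for i in range(10)])
```

With 10 distinct blocks and only 4 counters per row, collisions are unavoidable, so some blocks are reported with more than their true 5 duplicates.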
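AES-XTS block cipher mode: a tweakable mode (IEEE 1619) designed for encrypting storage sectors, with the sector number typically used as the tweak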