Venue | Category |
---|---|
ACM CCSW'16 | Secure Deduplication |
On Information Leakage in Deduplication Storage Systems1. SummaryMotivation of this paperMethod NameImplementation and Evaluation2. Strength (Contributions of the paper)3. Weakness (Limitations of the paper)4. Future Works
This paper argues that all existing proposals share the same goal that enabling cloud providers to deduplicate encrypted data stored by their users. However, they do not consider the information leakage incurred by access traces.
here, this paper mainly considers the service providers
Thus, here it considers the information leaked to a curious storage provider by systems that perform client-side deduplication and encryption.
storage provider can still acquire considerable information about the stored files without knowledge of the encryption key.
by compromise, coercion access traces: entries containing a timestamp, the object ID, the object size.
The adversary wants to identify which objects forms a file.
each leaf node represents an object of f as output by the chunking algorithm. Two kinds of nodes: file nodes and object nodes
This bipartite graph can be built by examining at the access traces of the storage server, without the need to learn the contents of the stored files.
anonymity set candidate set: C(f, G)
File absence: C(f, G) = \varnothing File presence: C(f, G)\neq\varnothing
How to quantify the probability that f is actually stored on S?
take into account how likely it is for any file f^{'} \neq f to have T(f^{'}) = T(f))
Base on Bayesian Network.
the adversary knows the fixed part in template and tries to infer the variable parts of the stored contracts.
- File-based fingerprints
- Fixed-sized block-based fingerprints
- CDC-based fingerprints
CDC-based fingerprints a file of size n can have a large number of possible CDC-based deduplication fingerprints.
this paper provides the lower bound for the number of possible deduplication fingerprints for a given file size n. Based on an modified version of the stars-and-bars theorem. Show two random files can have deduplication fingerprints collision.
CDC-based deduplication techniques exhibit a significantly smaller anonymity sets and therefore higher leakage when compared to those of fixed-block and file-based algorithms. If a candidate for such a file is found
Conclusion Existing public chunking algorithms allow an adversary to construct deduplication fingerprints and compare the simulated deduplication fingerprints to those on the storage.