Commercial cloud storage services favor cross-user, client-side, fixed-size chunk-level data deduplication because it yields the highest deduplication gain.
The user uploads the file hash as the duplication check request (dc request).
The cloud checks whether the file exists and then sends a binary duplication check response (dc response) to the user, which suppresses the explicit upload of the file when deduplication occurs.
The threat from the side channel
The user needs a deterministic response from the cloud to know whether further uploading of the file is necessary.
Key point: Any deterministic response can be seen as an indicator of privacy leakage.
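The side channel described above can be illustrated with a minimal sketch. It is a toy model rather than any real service's protocol: the class `NaiveDedupCloud`, the helper `attacker_learns_existence`, and the use of SHA-256 as the file hash are all assumptions made for illustration.

```python
import hashlib


class NaiveDedupCloud:
    """Toy cloud (hypothetical) that answers every duplicate check deterministically."""

    def __init__(self):
        self.stored_hashes = set()

    def duplicate_check(self, file_hash: str) -> bool:
        # Binary dc response: True means "already stored, skip the upload".
        return file_hash in self.stored_hashes

    def upload(self, data: bytes) -> None:
        # SHA-256 is an assumed choice of hash function.
        self.stored_hashes.add(hashlib.sha256(data).hexdigest())


def attacker_learns_existence(cloud: NaiveDedupCloud, guess: bytes) -> bool:
    # The attacker sends only a hash and reads the response. The response
    # alone reveals whether some other user already stored `guess`.
    return cloud.duplicate_check(hashlib.sha256(guess).hexdigest())


cloud = NaiveDedupCloud()
cloud.upload(b"salary: 5000")  # a victim's file
leak_hit = attacker_learns_existence(cloud, b"salary: 5000")
leak_miss = attacker_learns_existence(cloud, b"salary: 6000")
```

Because the response is a deterministic function of the stored set, each query gives the attacker one certain bit about another user's data.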
RARE
Key point: The privacy leakage through the side channel is due to the deterministic relation between the duplication check request (dc request) and the duplication check response (dc response).
This work aims for a probabilistic relation instead, by allowing the cloud to randomize the dc response. The goal is to keep the deduplication gain to a certain degree while eliminating the leakage of chunk existence status.
Main idea
A duplicate check on a single chunk does not leave sufficient room for dc response randomization, so this paper performs the duplicate check on two chunks at once.
Dirty chunks: chunks that have been queried but never uploaded can be exploited to perform repeated duplicate checks.
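The two-chunk duplicate check can be sketched from the client side as follows. This only shows how fixed-size chunks might be grouped into pairs of hashes. The chunk size, SHA-256, the function names, and the way an odd trailing chunk is skipped are assumptions for illustration, not details taken from the paper.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size in bytes


def fixed_size_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    # Split the file into consecutive chunks of `size` bytes (the last may be shorter).
    return [data[i:i + size] for i in range(0, len(data), size)]


def paired_dc_requests(data: bytes, size: int = CHUNK_SIZE) -> list[tuple[str, str]]:
    # One dc request carries the hashes of two chunks.
    hashes = [hashlib.sha256(c).hexdigest() for c in fixed_size_chunks(data, size)]
    # An odd trailing chunk is left out of this sketch; its handling is not
    # covered in these notes.
    usable = len(hashes) - len(hashes) % 2
    return [(hashes[i], hashes[i + 1]) for i in range(0, usable, 2)]


requests = paired_dc_requests(b"a" * 8 + b"b" * 8, size=4)
```

With two chunks per request, the cloud has more possible existence states to answer about, which is what gives it room to randomize the response.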
Privacy notion
Existence privacy and inexistence privacy
The dc response does not give any extra information about the existence status of a given chunk.
Weak existence privacy: a weaker version of existence privacy.
Check double chunks: In order to hide the chunk existence status, RARE applies encodings to both the dc response and the chunks to be uploaded.
Dirty chunk list: RARE guards against an attacker who performs a duplicate check but does not upload the queried chunks.
It implements a dirty chunk list that keeps the hashes of all chunks that have been queried but never uploaded.
Security analysis: This work aims to achieve inexistence privacy and weak existence privacy.
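The bookkeeping behind a dirty chunk list might look like the sketch below. It covers only the list itself, not RARE's response encoding. The class `DirtyChunkTracker`, its method names, and the single-session model are hypothetical simplifications.

```python
class DirtyChunkTracker:
    """Hypothetical cloud-side tracker for queried-but-not-uploaded chunks."""

    def __init__(self):
        self.pending = set()  # hashes queried in the current session
        self.dirty = set()    # hashes queried but never uploaded

    def on_query(self, chunk_hashes):
        # Every hash that appears in a dc request becomes pending.
        self.pending.update(chunk_hashes)

    def on_upload(self, chunk_hash):
        # An uploaded chunk is no longer pending.
        self.pending.discard(chunk_hash)

    def on_session_end(self):
        # Whatever was queried but not uploaded is recorded as dirty.
        self.dirty.update(self.pending)
        self.pending.clear()

    def dedup_allowed(self, chunk_hashes):
        # A dc request that touches any dirty chunk does not trigger deduplication.
        return not any(h in self.dirty for h in chunk_hashes)


tracker = DirtyChunkTracker()
tracker.on_query(["h1", "h2"])
tracker.on_upload("h1")
tracker.on_session_end()  # "h2" was queried but never uploaded
```

After this run, a later request containing `"h2"` is refused deduplication, so repeating the query yields no further information about that chunk.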
Implementation and Evaluation
Evaluation
Dataset: Enron Email dataset
Communication cost: the number of bits required during the entire chunk uploading process.
It covers the duplicate check (dc response) and the explicit chunk uploading (chunk).
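The communication cost metric above can be written as a small helper. This is a sketch of the two components named in the notes. The function name, its parameters, and the example sizes are assumptions, not figures from the paper's evaluation.

```python
def communication_cost_bits(dc_response_bits, uploaded_chunk_bytes):
    """Total bits for one upload process (hypothetical helper).

    dc_response_bits: size in bits of each dc response received.
    uploaded_chunk_bytes: size in bytes of each chunk actually uploaded.
    """
    response_cost = sum(dc_response_bits)
    upload_cost = 8 * sum(uploaded_chunk_bytes)  # convert bytes to bits
    return response_cost + upload_cost


# Illustrative numbers only: three 1-bit responses and two 4096-byte chunks.
example_cost = communication_cost_bits([1, 1, 1], [4096, 4096])
```

The upload term dominates whenever any chunk is sent, so every dc response that suppresses an upload saves far more than the response itself costs.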
2. Strength (Contributions of the paper)
Parameterless configuration: RARE has no parameters that need to be set manually.
This relieves the burden on engineers.
No independent server: RARE only involves interactions between the user and the cloud.
3. Weakness (Limitations of the paper)
The use of dirty chunks compromises the deduplication benefit: no dc request that involves a dirty chunk will trigger deduplication.
4. Future Works
The random response scheme is a typical method for defending against side-channel attacks in client-side deduplication.
An open question is how to design a tunable random response scheme that balances the storage efficiency given up against the security gained.