@ATC'15 @Cloud Deduplication
CDStore: Toward Reliable, Secure, and Cost-Efficient Cloud Storage via Convergent Dispersal
- Fault tolerance on the cloud server side (cloud storage providers, CDStore servers): erasure coding.
- Fault tolerance on the client side: offload metadata management to the server side.
- Exploit multi-cloud diversity for security (as long as a tolerable number of clouds are uncompromised).
- Two-stage deduplication to avoid insider side-channel attacks launched by malicious users.
- Use deduplication to reduce both bandwidth and storage costs.
Goal: two secrets with identical content must generate identical shares, which is what makes deduplication possible.
Approach: replace the embedded random input with a deterministic cryptographic hash derived from the secret.
- Improve performance: replace Rivest's AONT with another AONT based on optimal asymmetric encryption padding (OAEP). Rivest's AONT encrypts many small-size words, whereas the OAEP-based AONT encrypts a single large-size, constant-value block.
- Support deduplication: replace the random key in the AONT with a deterministic cryptographic hash derived from the secret (this preserves content similarity).
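The convergent AONT idea above can be sketched in a few lines of Python. This is a minimal illustration, not the CDStore implementation: the function names `caont_encode` and `caont_decode` are hypothetical, SHA-256 is an assumed hash choice, and the mask generator (SHA-256 in counter mode) is a stand-in for encrypting a large constant-value block under the hash key. The Reed-Solomon step that would turn the package into shares is omitted.

```python
import hashlib

HASH_LEN = 32  # SHA-256 digest size in bytes (assumed hash choice)


def _mask(key: bytes, length: int) -> bytes:
    """Expand `key` into `length` pseudo-random bytes.

    Stand-in (assumption) for encrypting a constant-value block under `key`:
    here we simply run SHA-256 in counter mode.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def caont_encode(secret: bytes) -> bytes:
    """OAEP-style all-or-nothing transform keyed by a hash of the secret."""
    h = hashlib.sha256(secret).digest()           # deterministic key, no randomness
    head = _xor(secret, _mask(h, len(secret)))    # masked secret
    tail = _xor(h, hashlib.sha256(head).digest())  # key hidden behind hash of head
    return head + tail


def caont_decode(package: bytes) -> bytes:
    """Invert caont_encode and verify the recovered secret against its hash."""
    head, tail = package[:-HASH_LEN], package[-HASH_LEN:]
    h = _xor(tail, hashlib.sha256(head).digest())
    secret = _xor(head, _mask(h, len(head)))
    if hashlib.sha256(secret).digest() != h:
        raise ValueError("integrity check failed")
    return secret
```

Because the only input is the secret itself, encoding the same secret twice yields the same package, so any shares derived from it can be deduplicated; recovering the key requires the whole package, which is the all-or-nothing property.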
Client side (intra-user deduplication): before a user uploads data to a cloud, the client first generates fingerprints of the data and then queries the cloud by fingerprint for duplicates that the same user has already uploaded. Restricting the check to the user's own data keeps the query from revealing what other users have stored.
Server side: after a CDStore server receives shares from clients, it recomputes the fingerprint of each share itself instead of using the one generated by the client, and then checks its deduplication index again. Reason: to prevent side-channel attacks that would otherwise gain unauthorized access to a share.
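A minimal sketch of the two stages follows, under stated assumptions: the first stage answers only about the querying user's own earlier uploads, the second stage deduplicates across users on the server, and all names (`DedupServer`, `unknown_to_user`, `client_upload`) are hypothetical. SHA-256 is an assumed fingerprint function and the in-memory dictionaries stand in for a real deduplication index.

```python
import hashlib
from typing import Dict, List, Set


def fingerprint(share: bytes) -> str:
    return hashlib.sha256(share).hexdigest()


class DedupServer:
    """Toy server: one physical copy per share, plus per-share owner sets."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}      # fingerprint -> share, stored once
        self.owners: Dict[str, Set[str]] = {}  # fingerprint -> users who uploaded it

    def unknown_to_user(self, user: str, fps: List[str]) -> List[str]:
        """Stage 1 query: reveals only whether `user` uploaded a share before."""
        return [fp for fp in fps if user not in self.owners.get(fp, set())]

    def upload(self, user: str, share: bytes) -> bool:
        """Stage 2: recompute the fingerprint server-side, then deduplicate.

        Returns True if the share was newly stored, False if it already existed.
        """
        fp = fingerprint(share)  # never trust a client-supplied fingerprint
        is_new = fp not in self.store
        if is_new:
            self.store[fp] = share
        self.owners.setdefault(fp, set()).add(user)
        return is_new


def client_upload(server: DedupServer, user: str, shares: List[bytes]) -> int:
    """Send only shares this user has not uploaded before; return how many were sent."""
    by_fp = {fingerprint(s): s for s in shares}  # also removes duplicates within the batch
    missing = server.unknown_to_user(user, list(by_fp))
    for fp in missing:
        server.upload(user, by_fp[fp])
    return len(missing)
```

In this sketch a second user who uploads a share that already exists must still transfer it, so the query result tells them nothing about other users' data, while the server still keeps a single physical copy.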
- Reduce network I/Os: batch the shares to be uploaded to each cloud in a 4MB buffer.
- Variable-size chunking: Rabin fingerprinting.
- Reliability: offload metadata management to the server side and distribute metadata across all CDStore servers.
- Multi-threading: parallelize the intensive encoding/decoding operations at the secret level and make full use of the network transfer bandwidth.
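The chunking and batching items can be illustrated with a short Python sketch. It is an approximation under stated assumptions: a simple polynomial rolling hash stands in for true Rabin fingerprinting, the window and chunk-size parameters are illustrative values not taken from the paper, and the function names `chunk` and `batch` are hypothetical. Only the 4MB default batch limit comes from the notes.

```python
from typing import Iterable, Iterator, List

WINDOW = 48                                       # sliding-window size in bytes (assumed)
BASE = 257                                        # rolling-hash base (assumed)
MOD = (1 << 61) - 1                               # rolling-hash modulus (assumed)
MIN_SIZE, AVG_SIZE, MAX_SIZE = 2048, 8192, 16384  # chunk-size bounds (assumed)
MASK = AVG_SIZE - 1                               # AVG_SIZE must be a power of two
POW_OUT = pow(BASE, WINDOW - 1, MOD)              # weight of the byte leaving the window


def chunk(data: bytes) -> Iterator[bytes]:
    """Content-defined chunking: cut where the windowed hash matches a bit pattern."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        size = i - start + 1  # current chunk length, including this byte
        if size > WINDOW:
            # Drop the oldest byte so the hash covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * POW_OUT) % MOD
        h = (h * BASE + byte) % MOD
        if (size >= MIN_SIZE and (h & MASK) == MASK) or size >= MAX_SIZE:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]  # trailing chunk, possibly shorter than MIN_SIZE


def batch(shares: Iterable[bytes], limit: int = 4 * 1024 * 1024) -> Iterator[List[bytes]]:
    """Group shares into batches of at most `limit` bytes to cut network I/Os."""
    buf: List[bytes] = []
    size = 0
    for s in shares:
        if buf and size + len(s) > limit:
            yield buf
            buf, size = [], 0
        buf.append(s)
        size += len(s)
    if buf:
        yield buf
```

Because a cut point depends only on the bytes inside the sliding window, inserting data near the start of a file tends to shift chunk boundaries rather than change every later chunk, which is what lets variable-size chunking keep finding duplicates.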
- Encoding speed on the client side (compared with AONT-RS and CAONT-RS-Rivest)
- Deduplication efficiency (deduplication saving)
- Transfer speed: upload speed and download speed (two cases: duplicate data and trace-driven)
- Cost Analysis: estimate the monetary costs using the pricing models of Amazon EC2 and S3.