CDStore: Toward Reliable, Secure, and Cost-Efficient Cloud Storage via Convergent Dispersal

@ATC'15 @Cloud Deduplication


1. Summary

Motivation of this paper:

CDStore

  1. Reliability: fault tolerance on the cloud server side (cloud storage providers, CDStore servers) via erasure coding, and on the client side by offloading metadata management to the server side.
  2. Security: exploits multi-cloud diversity (data remains protected as long as a tolerable number of clouds are uncompromised), and uses two-stage deduplication to avoid insider side-channel attacks launched by malicious users.
  3. Cost efficiency: uses deduplication to reduce both bandwidth and storage costs.
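CDStore's actual fault tolerance comes from CAONT-RS, which uses Reed-Solomon coding to disperse shares across clouds. As a minimal sketch of the erasure-coding idea, the toy (n = 3, k = 2) scheme below uses a single XOR parity share, so the data survives the loss of any one cloud; the function names and parameters are illustrative, not from the paper.

```python
def encode_2_of_3(data: bytes) -> list:
    """Split data into two halves plus an XOR parity share.

    A toy (n=3, k=2) erasure code: any 2 of the 3 shares recover the data.
    """
    if len(data) % 2:
        data += b"\x00"  # pad to even length so the halves line up
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]

def decode_2_of_3(shares: list) -> bytes:
    """Recover the original data with at most one share missing (None)."""
    a, b, p = shares
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, p))
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, p))
    return a + b
```

Losing one "cloud" (setting one share to `None`) still allows full recovery; a real Reed-Solomon code generalizes this to arbitrary (n, k).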

Convergent Dispersal

Goal: two secrets with identical content must generate identical shares, which makes deduplication possible.

Key idea: replace the embedded random input with a deterministic cryptographic hash derived from the secret.

  1. Improve performance: replaces Rivest's AONT with another AONT based on optimal asymmetric encryption padding (OAEP), which operates on a large-size, constant-value block instead of small-size words.
  2. Support deduplication: replaces the random key in the AONT with a deterministic cryptographic hash derived from the secret (preserving content similarity).
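The two changes above can be sketched together. The code below is a simplified convergent AONT, not the paper's CAONT-RS: it uses a SHA-256 hash-counter keystream as a stand-in for OAEP's mask generator, and all names are hypothetical. The point it illustrates is that a deterministic key H(secret) makes the whole package deterministic, while masking the key with a hash of the ciphertext keeps the all-or-nothing property.

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    """Hash-counter keystream, a stand-in for OAEP's mask generator."""
    out = b""
    ctr = 0
    while len(out) < length:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:length]

def caont(secret: bytes) -> bytes:
    """Convergent AONT: the key is H(secret), so identical secrets
    produce identical packages -- which is what enables deduplication."""
    key = hashlib.sha256(secret).digest()
    c = bytes(s ^ g for s, g in zip(secret, _keystream(key, len(secret))))
    # All-or-nothing: the key is masked by a hash of the ciphertext, so
    # the complete package is needed before the key can be recovered.
    t = bytes(k ^ h for k, h in zip(key, hashlib.sha256(c).digest()))
    return c + t

def caont_inverse(package: bytes) -> bytes:
    """Recover the secret from a complete package (last 32 bytes = masked key)."""
    c, t = package[:-32], package[-32:]
    key = bytes(k ^ h for k, h in zip(t, hashlib.sha256(c).digest()))
    return bytes(x ^ g for x, g in zip(c, _keystream(key, len(c))))
```

With a truly random key (as in the original AONT-RS), two users uploading the same file would produce different packages and deduplication would never fire; the deterministic key is exactly what fixes that.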

Client side: before a user uploads data to a cloud, the client first generates fingerprints of the data, then queries the cloud by fingerprint for any duplicate data that has already been uploaded by any user.

Server side: after a CDStore server receives shares from clients, it re-computes a fingerprint from each share (rather than trusting the one generated by the client) and checks it against its own deduplication index. The reason: this prevents a side-channel attack in which a malicious client presents a fingerprint to gain unauthorized access to a share it never possessed.
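The two stages above can be sketched as follows. This is a minimal illustration, not CDStore's implementation: the class and function names are hypothetical, and SHA-256 stands in for whatever fingerprinting the system uses. The key point is that the server never trusts a client-supplied fingerprint when storing data.

```python
import hashlib

class CDStoreServerSketch:
    """Server-side (second-stage) deduplication index."""

    def __init__(self):
        self.index = {}  # fingerprint -> share

    def has(self, fingerprint: str) -> bool:
        # Stage 1: answer the client's existence query by fingerprint.
        return fingerprint in self.index

    def put(self, share: bytes) -> str:
        # Stage 2: re-compute the fingerprint from the actual share bytes
        # instead of trusting the client's claim. A malicious client could
        # otherwise register a fingerprint for data it never possessed and
        # later read another user's share -- the side-channel attack.
        fp = hashlib.sha256(share).hexdigest()
        if fp not in self.index:
            self.index[fp] = share
        return fp

def upload(server, share: bytes) -> bool:
    """Client side: transfer the share only if the server lacks it."""
    fp = hashlib.sha256(share).hexdigest()
    if server.has(fp):
        return False  # deduplicated: no bandwidth spent
    server.put(share)
    return True
```

Stage 1 saves upload bandwidth; stage 2 saves storage while closing the trust gap that a purely client-driven dedup check would leave open.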

Implementation and Evaluation

Implementation:

  1. To reduce network I/Os: batch the shares to be uploaded to each cloud in a 4 MB buffer.
  2. Variable-size chunking via Rabin fingerprinting.
  3. To provide reliability for metadata: offload metadata management to the server side and distribute the metadata across all CDStore servers.
  4. Multi-threading: parallelize the intensive encoding/decoding operations at the secret level and fully utilize the network transfer bandwidth.

Evaluation:

  1. Encoding speed on the client side (compared with AONT-RS and CAONT-RS-Rivest).
  2. Deduplication efficiency (deduplication saving).
  3. Transfer speed: upload and download speed (two cases: duplicate data and trace-driven workloads).
  4. Cost analysis: monetary costs estimated using the pricing models of Amazon EC2 and S3.
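The variable-size chunking step can be sketched as content-defined chunking: cut a chunk wherever a rolling hash of recent bytes hits a boundary condition, so an insertion early in a file shifts only nearby chunk boundaries. The sketch below uses a simple multiplicative rolling hash rather than Rabin's irreducible-polynomial fingerprint, and the size parameters are illustrative, not CDStore's.

```python
def chunk(data: bytes, min_size: int = 2048,
          avg_mask: int = (1 << 13) - 1, max_size: int = 65536) -> list:
    """Content-defined chunking with a toy rolling hash.

    A chunk boundary is declared when the hash's low bits are all zero
    (expected chunk size ~ avg_mask + 1 bytes), subject to min/max bounds.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = (h * 31 + byte) & 0xFFFFFFFF  # toy stand-in for Rabin fingerprint
        length = i - start + 1
        if (length >= min_size and (h & avg_mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0  # restart the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Fixed-size chunking would shift every boundary after an insertion and destroy most duplicate matches; content-defined boundaries are what make deduplication robust to edits.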

2. Strength (Contributions of the paper)

3. Weakness (Limitations of the paper)

4. Future Works