Venue | Category |
---|---|
HotStorage'16 | Deduplication |
Most deduplication-related publications focus on a narrow range of topics:
- maximizing deduplication ratios
- read/write performance
This paper argues that there are numerous novel, deduplication-specific problems that the academic community has largely ignored.
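In a non-deduplicating system, it is fairly straightforward to answer capacity questions such as how much space a file consumes, since such systems store every logical byte physically. In a deduplication system, the answers are less obvious:
- How much data does a write add? The logical size written is not the physical space actually consumed.
- How much space does a delete free? The logical size of the deleted data is not the amount of space actually returned.
- A deduplication system is dynamic, so it is hard to track each file's space usage in real time.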
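To make the write and delete questions concrete, here is a minimal sketch of a content-addressed chunk store with per-chunk reference counts. It assumes fixed-size chunking and SHA-256 fingerprints (both assumptions; real systems commonly use variable-size, content-defined chunking), and `DedupStore`, `write_file`, and `delete_file` are hypothetical names. Each operation reports its logical size next to the physical bytes actually written or freed.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, chosen only to keep the example readable


class DedupStore:
    """Hypothetical content-addressed store with per-chunk reference counts."""

    def __init__(self):
        self.chunks = {}    # fingerprint -> chunk bytes (physical storage)
        self.refcount = {}  # fingerprint -> number of references from file recipes
        self.recipes = {}   # file name -> ordered list of fingerprints

    def write_file(self, name, data):
        """Store a file; return (logical_bytes, physical_bytes_written)."""
        recipe, physical = [], 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:
                # Only previously unseen chunks consume new physical space.
                self.chunks[fp] = chunk
                physical += len(chunk)
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            recipe.append(fp)
        self.recipes[name] = recipe
        return len(data), physical

    def delete_file(self, name):
        """Remove a file; return (logical_bytes, physical_bytes_freed)."""
        logical, freed = 0, 0
        for fp in self.recipes.pop(name):
            logical += len(self.chunks[fp])
            self.refcount[fp] -= 1
            if self.refcount[fp] == 0:
                # A chunk is reclaimed only when no file references it anymore.
                freed += len(self.chunks.pop(fp))
                del self.refcount[fp]
        return logical, freed


store = DedupStore()
print(store.write_file("a", b"AAAABBBBCCCC"))  # (12, 12)
print(store.write_file("b", b"AAAABBBBDDDD"))  # (12, 4)
print(store.delete_file("a"))                  # (12, 4)
```

Writing a second file that shares two of its three chunks with the first consumes only 4 new bytes, and deleting the first file afterwards frees only 4 of its 12 logical bytes, which is exactly the mismatch between logical and physical accounting.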
An intuitive approach is to collect this information periodically instead.
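A periodic collection pass could look roughly like the sketch below. It assumes access to a snapshot of file recipes (each file's list of chunk fingerprints) plus each chunk's size; `scan_usage` is a hypothetical helper, not an API from the paper. For every file it separates bytes held only by that file (what deleting that file alone would reclaim) from bytes shared with other files.

```python
from collections import defaultdict


def scan_usage(recipes, chunk_sizes):
    """Hypothetical offline scan over a snapshot of file recipes.

    recipes: file name -> list of chunk fingerprints
    chunk_sizes: fingerprint -> chunk size in bytes
    Returns file name -> (unique_bytes, shared_bytes), where unique_bytes is
    the space that deleting only that file would reclaim.
    """
    owners = defaultdict(set)  # fingerprint -> set of files referencing it
    for name, recipe in recipes.items():
        for fp in recipe:
            owners[fp].add(name)

    usage = {}
    for name, recipe in recipes.items():
        unique = shared = 0
        for fp in set(recipe):  # count each distinct chunk once per file
            if len(owners[fp]) == 1:
                unique += chunk_sizes[fp]
            else:
                shared += chunk_sizes[fp]
        usage[name] = (unique, shared)
    return usage


recipes = {"a": ["x", "y", "z"], "b": ["x", "y", "w"]}
sizes = {"x": 4, "y": 4, "z": 4, "w": 4}
print(scan_usage(recipes, sizes))  # {'a': (4, 8), 'b': (4, 8)}
```

The trade-off is staleness: the numbers are exact only for the snapshot, and any writes or deletes between scans change which chunks are shared.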
Future Research Opportunities
Performance (latency, throughput):
- Root cause: deduplication adds extra levels of indirection to map from a file's representation to the locations of its data chunks.
- Performance drop-off: deduplication turns sequentially written content into references to chunks scattered across HDDs, so reads that were sequential become random I/O.
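The fragmentation effect can be illustrated with a toy layout model. It assumes unique chunks are appended to a single on-disk log in arrival order and counts one seek per non-contiguous jump when a backup is read back; `AppendLog` and `count_seeks` are hypothetical names, and the model ignores containers, caching, and prefetching.

```python
import hashlib


def fingerprint(chunk):
    return hashlib.sha256(chunk).hexdigest()


class AppendLog:
    """Hypothetical layout: unique chunks are appended to a log in arrival order."""

    def __init__(self):
        self.location = {}  # fingerprint -> log position (the extra indirection)
        self.next_pos = 0

    def write(self, chunks):
        """Deduplicate a stream of chunks; return its recipe as log positions."""
        recipe = []
        for chunk in chunks:
            fp = fingerprint(chunk)
            if fp not in self.location:
                self.location[fp] = self.next_pos
                self.next_pos += 1
            recipe.append(self.location[fp])
        return recipe


def count_seeks(recipe):
    """Count non-contiguous jumps when reading the recipe back in order."""
    return sum(1 for prev, cur in zip(recipe, recipe[1:]) if cur != prev + 1)


log = AppendLog()
backup1 = [bytes([i]) * 4 for i in range(8)]  # eight distinct chunks
backup2 = list(backup1)
backup2[2], backup2[5] = b"new1", b"new2"     # a later backup with two edits
r1, r2 = log.write(backup1), log.write(backup2)
print(r2, count_seeks(r1), count_seeks(r2))   # [0, 1, 8, 3, 4, 9, 6, 7] 0 4
```

The first backup is laid out sequentially, but the second, although logically just as sequential, points back into old positions and forward to the newly appended chunks, so reading it requires four jumps.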
Future Research Opportunities
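Shared content creates unpredictable performance: a file's read performance depends on where the chunks it shares with other files were first written.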
Security risks:
- Unauthorized access
- Knowledge of content
- Data tampering
By timing data transfers, it may also be possible to infer what content already exists on a deduplication server.
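A minimal sketch of this side channel, assuming client-side (source) deduplication across users, where a client sends a chunk's fingerprint first and uploads the data only if the server lacks it. The sketch uses bytes transferred as a stand-in for transfer time; `SourceDedupServer` and `probe_exists` are hypothetical names.

```python
import hashlib


class SourceDedupServer:
    """Hypothetical server that deduplicates across all clients before upload."""

    def __init__(self):
        self.stored = set()  # fingerprints of chunks already on the server

    def upload(self, chunk):
        """Fingerprint is checked first; data is sent only if the server lacks it.

        Returns the number of payload bytes that crossed the network.
        """
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in self.stored:
            return 0  # duplicate: the server only records a new reference
        self.stored.add(fp)
        return len(chunk)


def probe_exists(server, guess):
    """Attacker infers whether `guess` was already stored by someone else.

    Zero payload bytes (equivalently, a noticeably faster upload) reveals that
    the content was present. The probe itself stores the guess, so each
    candidate can be tested only once this way.
    """
    return server.upload(guess) == 0


server = SourceDedupServer()
server.upload(b"salary: 50000")                 # uploaded by a victim
print(probe_exists(server, b"salary: 60000"))   # False
print(probe_exists(server, b"salary: 50000"))   # True
```

Because the attacker can enumerate candidate contents with low entropy (here, a salary figure), even a one-bit "already present" signal can leak another user's data.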
Reliability: the relationship between deduplication and data reliability is complex.
Intuitively, combining RAID, versioning, and replication counterbalances the risk of data loss introduced by deduplication, since a single lost chunk may be referenced by many files. But how can this reliability be analyzed quantitatively?
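As one very simplified starting point for quantification, assume exactly one uniformly random stored chunk is lost and that there is no RAID, versioning, or replication (all assumptions). Without deduplication each stored chunk belongs to exactly one file, so one loss damages one file; with deduplication, the expected number of damaged files equals the average number of distinct files referencing a stored chunk. `expected_files_damaged` is a hypothetical helper.

```python
def expected_files_damaged(recipes):
    """Expected number of files damaged when one uniformly random stored chunk is lost.

    recipes: file name -> list of chunk fingerprints (a deduplicated store).
    Simplified model: no RAID, versioning, or replication; every stored
    chunk is equally likely to be the one lost.
    """
    owners = {}
    for name, recipe in recipes.items():
        for fp in recipe:
            owners.setdefault(fp, set()).add(name)
    if not owners:
        return 0.0
    # Losing a stored chunk damages every distinct file that references it.
    return sum(len(files) for files in owners.values()) / len(owners)


recipes = {"a": ["x", "y", "z"], "b": ["x", "y", "w"], "c": ["x"]}
print(expected_files_damaged(recipes))  # 1.75
```

Under this model a non-deduplicated store would score 1.0, so the excess (0.75 here) is the added exposure that redundancy layers are meant to offset. A real analysis would also need failure rates and a model of RAID, versioning, and replication themselves.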
Future Research Opportunities
Progress here needs vendors to release such information.
Future Research Opportunities
After reading this paper, I have a high-level picture of potential research topics in the deduplication area. Some points worth following up on:
- Security and reliability
- Capacity measurement