Improving Restore Speed for Backup Systems that Use Inline Chunk-Based Deduplication

Venue: FAST'13
Category: Deduplication Restore


1. Summary

Motivation of this paper

Restore speed in backup systems that use inline chunk-based deduplication often suffers due to chunk fragmentation.

Because modern disks have relatively poor random I/O performance compared to sequential I/O, fragmentation greatly hurts restore performance.

Because chunks are shared between backups, it is in general impossible to find a chunk layout that eliminates fragmentation for every backup.

Rearranging chunks to reduce fragmentation can also be expensive.

Chunk fragmentation, and hence restore performance, gets worse as time goes by, and it affects the most recent backups the most.

For the two long-term data sets studied, restore slows down substantially: over a span of three months for one data set and over two years for the other.

This paper investigates

  1. how fragmentation and restore speed behave over time and under different cache sizes;
  2. two approaches to improving restore speed in such deduplication systems: container capping and the forward assembly area.

Method Name

  1. A backup (stream): a sequence of chunks generated by a chunking algorithm.
  2. Chunk container: the unit in which chunks are stored; each incoming stream has its own container being filled.
  3. Open container (4 MB): the container currently used for storing new chunks from that stream.
  4. Because of RAID and recipe-reference indirection, it is more efficient to blindly read an entire container even when only one chunk from it is needed.
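To make these terms concrete, here is a minimal C++ sketch of the structures implied above; the names (Chunk, Container, ChunkRef, Recipe) are hypothetical, not the paper's code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

using ChunkHash = std::string;          // e.g., a SHA-1 digest of the chunk data

struct Chunk {
    ChunkHash hash;                     // identifies the chunk
    std::vector<uint8_t> data;          // variable-size chunk contents
};

struct Container {                      // fixed-size (e.g., 4 MB) unit of storage and I/O
    uint64_t id = 0;                    // containers are numbered in creation order
    std::vector<Chunk> chunks;          // chunks packed into this container
};

struct ChunkRef {                       // recipe entry: where to find one chunk
    ChunkHash hash;
    uint64_t container_id;              // container that holds the stored copy
};

struct Recipe {                         // one per backup: the ordered chunk list
    std::vector<ChunkRef> refs;         // restore = fetch chunks in this order
};
```

A backup's recipe is then just the ordered list of references; restoring it means fetching each referenced chunk, which is where container reads, and hence fragmentation, come in.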

An LRU cache holds n chunk containers at a time and takes n × (container size) of space. Measurement metric: the mean number of containers read per MB of backup data restored, over the backups of a long-term data set; its reciprocal is the paper's speed factor. Justification: reading containers is the dominant restore cost. The absolute restore speed additionally depends on:

  1. raw I/O performance
  2. the container size
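As a worked form of this metric (my notation, with c the number of container reads during the restore and s the number of MB restored):

```latex
\[
  \text{mean containers read per MB} = \frac{c}{s},
  \qquad
  \text{speed factor} = \frac{s}{c}.
\]
```

With illustrative (hypothetical) numbers: reading c = 2000 containers to restore s = 1000 MB gives 2 containers per MB, i.e. a speed factor of 0.5; with 4 MB containers the best achievable speed factor is roughly 4, reached when every byte read is used.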

Capping trades off deduplication for faster restore speed

  1. In order to use fewer old containers, capping has to give up some deduplication.
  2. Instead of referencing an existing copy of a chunk in an old container, it stores a duplicate copy of that chunk in the open container and points to that copy.

Steps:

  1. Divide the backup stream into fixed-size segments (default 20 MB).
  2. Capping requires an extra segment-sized buffer in RAM for each stream being ingested.
  3. For each segment, determine which old containers contain which of its chunks.
  4. Rank those containers by how many chunks of the segment they contain, breaking ties in favor of more recent containers, and keep the top T containers (T is the capping level).
  5. Deduplicate only against the kept containers; append every other chunk, including genuinely new chunks, to the open container.

This process guarantees that the recipe for each segment refers to at most T old containers plus a small number of new containers holding the appended chunks.
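A minimal C++ sketch of the capping step for one segment, under the assumptions above; cap_segment, chunk_index, and cap_level are hypothetical names, the structs repeat the earlier sketch, and the rollover of a full open container is ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using ChunkHash   = std::string;
using ContainerId = uint64_t;

struct Chunk     { ChunkHash hash; std::vector<uint8_t> data; };
struct Container { ContainerId id = 0; std::vector<Chunk> chunks; };
struct ChunkRef  { ChunkHash hash; ContainerId container_id; };
struct Recipe    { std::vector<ChunkRef> refs; };

// Cap one segment: deduplicate only against the T old containers holding the
// most chunks of this segment; store every other chunk again in the open container.
void cap_segment(const std::vector<Chunk>& segment,
                 const std::unordered_map<ChunkHash, ContainerId>& chunk_index,
                 std::size_t cap_level /* T */,
                 Container& open_container,
                 Recipe& recipe) {
    // 1. Count how many chunks of the segment each old container already holds.
    std::unordered_map<ContainerId, std::size_t> hits;
    for (const Chunk& c : segment) {
        auto it = chunk_index.find(c.hash);
        if (it != chunk_index.end()) ++hits[it->second];
    }

    // 2. Rank containers by hit count, breaking ties in favor of more recent
    //    (higher-numbered) containers, and keep only the top T.
    std::vector<std::pair<ContainerId, std::size_t>> ranked(hits.begin(), hits.end());
    std::sort(ranked.begin(), ranked.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) return a.second > b.second;  // more hits first
        return a.first > b.first;                              // then more recent
    });
    if (ranked.size() > cap_level) ranked.resize(cap_level);
    std::unordered_set<ContainerId> kept;
    for (const auto& kv : ranked) kept.insert(kv.first);

    // 3. Emit the segment's part of the recipe: reference chunks whose container
    //    was kept; duplicate everything else into the open container.
    for (const Chunk& c : segment) {
        auto it = chunk_index.find(c.hash);
        if (it != chunk_index.end() && kept.count(it->second)) {
            recipe.refs.push_back({c.hash, it->second});        // deduplicated
        } else {
            open_container.chunks.push_back(c);                 // stored (again)
            recipe.refs.push_back({c.hash, open_container.id});
        }
    }
}
```

Bounding each 20 MB segment to T old containers is what limits the number of distinct containers a restore must touch, which is the source of the speed/deduplication trade-off above.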

Restoring deduplicated backup streams differs from virtual-memory paging in two ways:

  1. The effective unit of I/O is an entire chunk container (4 MB), whereas the unit of use is a much smaller, variable-size chunk.
  2. At the time the restore starts, the system has perfect knowledge of the exact sequence of chunks that will be used, thanks to the backup's recipe.

Main idea: page chunk containers in through a single container-sized buffer, but cache chunks rather than whole containers, to avoid keeping around chunks that will never be used.

Consult the next part of the recipe to make better decisions about which chunks from the paged-in containers to retain.

Goal: load each needed container only once per M-byte slice of the restore (the forward assembly area) and avoid keeping chunks that will not be needed during that slice.

This method can also be combined with a ring buffer to further improve memory efficiency.
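Below is a minimal C++ sketch of one forward-assembly-area pass under these assumptions: container I/O is passed in as a callable read_container, chunk sizes are assumed to be recorded in the recipe, and only the simple fixed-slice form is shown, not the ring-buffer variant.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using ChunkHash   = std::string;
using ContainerId = uint64_t;

struct Chunk     { ChunkHash hash; std::vector<uint8_t> data; };
struct Container { ContainerId id; std::vector<Chunk> chunks; };
struct ChunkRef  { ChunkHash hash; ContainerId container_id; std::size_t size; };

// Restore one M-byte slice of a backup, given that slice's part of the recipe,
// a single forward assembly area of size M, and one container buffer at a time.
std::vector<uint8_t> restore_slice(
        const std::vector<ChunkRef>& refs, std::size_t M,
        const std::function<Container(ContainerId)>& read_container) {
    std::vector<uint8_t> area(M);  // the forward assembly area

    // 1. Plan: compute each chunk's output offset and group the needed
    //    (hash, offset) pairs by container, so every container is read at
    //    most once for this slice.
    std::map<ContainerId, std::vector<std::pair<ChunkHash, std::size_t>>> plan;
    std::size_t offset = 0;
    for (const ChunkRef& r : refs) {
        if (offset + r.size > M) break;            // slice is full
        plan[r.container_id].push_back({r.hash, offset});
        offset += r.size;
    }

    // 2. Fill: read each needed container once, copy every chunk this slice
    //    needs into its position(s), then drop the container buffer, so
    //    unused chunks are never kept around. (Assumes stored chunk data
    //    matches the size recorded in the recipe.)
    for (const auto& [cid, wanted] : plan) {
        Container c = read_container(cid);
        for (const Chunk& ch : c.chunks) {
            for (const auto& [hash, pos] : wanted) {
                if (hash == ch.hash)
                    std::copy(ch.data.begin(), ch.data.end(), area.begin() + pos);
            }
        }
    }

    area.resize(offset);  // keep only the bytes actually assembled
    return area;
}
```

Because the plan groups chunk references by container, each needed container is read at most once per slice and only the assembled bytes stay in memory; the ring-buffer variant mentioned above slides the area forward instead of restarting it for each slice.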

Implementation and Evaluation

The evaluation measures:

  1. deduplication performance
  2. fragmentation
  3. restore performance

Details: a roughly 9,000-line C++ program.

  1. Full chunk index: maps the hashes of stored chunks to the chunk containers that hold them.
  2. Chunk container size: 4 MB.
  3. The evaluation mainly focuses on deduplication ratio and speed factor.
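For illustration, a minimal in-memory stand-in for such a full chunk index; the ChunkIndex class and its methods are hypothetical, and a real index would be disk-resident and far larger.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

using ChunkHash   = std::string;
using ContainerId = uint64_t;

// Maps the hash of every stored chunk to the container that holds it.
class ChunkIndex {
public:
    // Returns the container already holding this chunk, if any.
    std::optional<ContainerId> lookup(const ChunkHash& h) const {
        auto it = map_.find(h);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

    // Records where a newly stored chunk copy lives.
    void insert(const ChunkHash& h, ContainerId id) { map_[h] = id; }

private:
    std::unordered_map<ChunkHash, ContainerId> map_;
};
```

During ingest, a hit in this index is what enables deduplication; capping (above) simply declines some of those hits when the owning container is not among a segment's top T.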

2. Strength (Contributions of the paper)

  1. This paper shows that it is possible to give up a relatively small percentage of deduplication in practice and get quite substantial speedups.
  2. Capping genuinely reduces fragmentation in practice.

3. Weakness (Limitations of the paper)

  1. Given a workload, it may be hard to estimate in advance the trade-off between speed factor and deduplication ratio.

4. Future Works

  1. The idea of container capping is to achieve substantial speedups while giving up only a small amount of deduplication.
  2. The paper also suggests that, during restore, a system should devote all available RAM to a single large forward assembly area and its associated buffers.

Unless deduplication is at a great premium, at least a small amount of capping should be employed.

  1. Adaptive capping.
  2. Given a workload, how can the fragmentation level be measured at different scales?