Powered by Bitpipe Information Security Research Library

sponsored by Data Domain
Posted: 10 Apr 2009
Published: 10 Apr 2009
Format: PDF
Length: 14 pages
Type: White Paper
Language: English


ABSTRACT:
Disk-based deduplication storage has emerged as a new generation of storage system for enterprise data protection, replacing tape libraries. Deduplication removes redundant data segments, compressing data into a highly compact form and making it economical to store backups on disk instead of tape. A crucial requirement for enterprise data protection is high throughput, typically over 100 MB/sec, so that backups complete quickly. The significant challenge is to identify and eliminate duplicate data segments at this rate on a low-cost system that cannot afford enough RAM to hold an index of all stored segments and may therefore be forced to access an on-disk index for every input segment.
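To see why one on-disk index access per input segment is a problem at this rate, a rough back-of-the-envelope calculation helps. The sketch below is illustrative only: the average segment size and the per-disk random-read rate are assumed values that do not appear in the abstract, and the function names are hypothetical.

```python
import math

# From the abstract: the target backup throughput.
TARGET_THROUGHPUT_BYTES = 100 * 10**6   # 100 MB/sec

# Assumptions for illustration only; neither figure comes from the abstract.
ASSUMED_SEGMENT_BYTES = 8 * 1024        # hypothetical average segment size
ASSUMED_DISK_RANDOM_IOPS = 150          # hypothetical random reads/sec per disk


def index_lookups_per_second(throughput_bytes, segment_bytes):
    """Index lookups needed per second if every segment triggers one lookup."""
    return throughput_bytes / segment_bytes


def disks_needed(lookups_per_sec, iops_per_disk):
    """Disks required to serve that many random index reads per second."""
    return math.ceil(lookups_per_sec / iops_per_disk)


lookups = index_lookups_per_second(TARGET_THROUGHPUT_BYTES, ASSUMED_SEGMENT_BYTES)
disks = disks_needed(lookups, ASSUMED_DISK_RANDOM_IOPS)
print(f"{lookups:.0f} lookups/sec would need about {disks} disks for index reads alone")
```

Under these assumed numbers, the index lookups alone would demand far more spindles than a low-cost system has, which is why avoiding most of those disk accesses matters.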

This paper describes three techniques employed in the production Data Domain deduplication file system to relieve the disk bottleneck. These techniques include:

  1. The Summary Vector, a compact in-memory data structure for identifying new segments
  2. Stream-Informed Segment Layout, a data layout method to improve on-disk locality for sequentially accessed segments
  3. Locality Preserved Caching, which maintains the locality of the fingerprints of duplicate segments to achieve high cache hit ratios

Together, these techniques can remove 99% of the disk accesses for deduplication of real-world workloads. They enable a modern two-socket dual-core system to run at 90% CPU utilization with only one shelf of 15 disks and achieve 100 MB/sec for single-stream throughput and 210 MB/sec for multi-stream throughput.
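The abstract does not say how the Summary Vector is built. One structure that fits the description (compact, in-memory, able to identify new segments without touching disk) is a Bloom filter, and the sketch below assumes that choice. The class name, the hash scheme, the `store_segment` helper, and the dictionary standing in for the on-disk index are all hypothetical and are not taken from the Data Domain implementation.

```python
import hashlib


class SummaryVectorSketch:
    """Bloom-filter-style bit array (an assumed design, for illustration)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, fingerprint):
        # Derive several bit positions from one digest via double hashing.
        digest = hashlib.sha256(fingerprint).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, fingerprint):
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint):
        # False means "definitely new"; True means "possibly stored".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))


def store_segment(data, vector, on_disk_index):
    """Return how the segment was handled; the dict stands in for the disk index."""
    fingerprint = hashlib.sha1(data).digest()
    if not vector.might_contain(fingerprint):
        # Definitely new: the on-disk index lookup is skipped entirely.
        vector.add(fingerprint)
        on_disk_index[fingerprint] = data
        return "new, no index lookup"
    if fingerprint in on_disk_index:
        return "duplicate"
    # False positive in the filter: the lookup was needed after all.
    vector.add(fingerprint)
    on_disk_index[fingerprint] = data
    return "new, after index lookup"


vector = SummaryVectorSketch(num_bits=1 << 16, num_hashes=4)
index = {}
print(store_segment(b"segment A", vector, index))
print(store_segment(b"segment A", vector, index))
```

The useful property is the one-sided error: a negative answer is always correct, so a new segment never costs an index read, while a positive answer only means the index (or a fingerprint cache in front of it) still has to be consulted.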


Authors

Hugo Patterson
Chief Architect, Data Domain

Benjamin Zhu
Data Domain, Inc.

Kai Li
Data Domain, Inc. and Princeton University


