|Title||Impact of Data Placement on Resilience in Large-Scale Object Storage Systems |
|Publication Type||Conference Paper |
|Year of Publication||2016 |
|Authors||Carns, PH, Harms, K, Jenkins, J, Mubarak, M, Ross, R, Carothers, CD |
|Conference Name||32nd International Conference on Massive Storage Systems and Technology (MSST 2016) |
|Date Published||05/2016 |
|Conference Location||Santa Clara, CA |
|Other Numbers||ANL/MCS-P5570-0316 |
|Abstract||Distributed object storage architectures have become the de facto standard for high-performance storage in big data, cloud, and HPC environments. Object storage deployments that use commodity hardware to reduce costs often employ object replication as a method to achieve data resilience. For systems with thousands of servers and billions of objects, however, repairing object replicas after failure is a daunting task, and it is increasingly difficult to evaluate such scenarios at scale on real-world systems. Resilience and availability are both compromised if objects are not repaired in a timely manner.
In this work we leverage a high-fidelity discrete-event simulation model to investigate replica reconstruction on large-scale object storage systems with thousands of servers, billions of objects, and petabytes of data. We evaluate the behavior of CRUSH, a well-known object placement algorithm, and identify configuration scenarios in which aggregate rebuild performance is constrained by object placement policies. After determining the root cause of this bottleneck, we then propose enhancements to CRUSH and the usage policies atop it to enable scalable replica reconstruction. We use these methods to demonstrate a simulated aggregate rebuild rate of 410 GiB/s (within 5% of projected ideal linear scaling) on a 1,024-node commodity storage system. We also uncover an unexpected phenomenon in rebuild performance based on the characteristics of the data stored on the system.
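The paper's placement enhancements are specific to CRUSH, but the core idea it builds on, deterministic hash-based replica placement that any client can recompute without a central directory, can be illustrated with a minimal rendezvous-hashing sketch. This is not the paper's algorithm or implementation; the function and server names below are hypothetical:

```python
import hashlib

def place_replicas(obj_id: str, servers: list[str], n: int = 3) -> list[str]:
    """Pick n replica locations by rendezvous (HRW) hashing: each
    (object, server) pair gets a pseudorandom score, and the top-n
    servers win. Deterministic, so placement needs no lookup table."""
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{obj_id}:{server}".encode()).hexdigest()
        return int(digest, 16)
    return sorted(servers, key=score, reverse=True)[:n]

servers = [f"osd{i}" for i in range(16)]  # hypothetical storage daemons
replicas = place_replicas("object-42", servers)
assert len(replicas) == 3

# A key property for rebuild: removing a failed server remaps only the
# objects that lived on it; surviving replicas keep their rank order.
survivors = [s for s in servers if s != replicas[0]]
assert place_replicas("object-42", survivors)[:2] == replicas[1:3]
```

As the abstract notes, the placement policy itself can bottleneck rebuild: it determines which servers hold the surviving copies of each lost object and therefore how widely repair traffic can be spread across the cluster.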