Argonne National Laboratory

Parallel I/O Optimization for Scalable Deep Learning

Title: Parallel I/O Optimization for Scalable Deep Learning
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Pumma, S., Si, M., Feng, W., Balaji, P.
Conference Name: ICPADS: IEEE International Conference on Parallel and Distributed Systems
Abstract: As deep learning systems continue to grow in importance, several researchers have been analyzing approaches to make such systems efficient and scalable on high-performance computing platforms. As computational parallelism increases, however, data I/O becomes the major bottleneck limiting overall system scalability. In this paper, we continue our efforts to improve LMDB, the I/O subsystem of the Caffe deep learning framework. In a previous paper, we presented LMDBIO—an optimized I/O plugin for Caffe that takes into account the data access pattern of Caffe in order to vastly improve I/O performance. Nevertheless, LMDBIO's optimizations are limited to intranode performance, and LMDBIO does little to minimize the I/O inefficiencies in distributed-memory environments. In this paper, we propose LMDBIO-2.0, an enhanced version of LMDBIO that optimizes the I/O access of Caffe in distributed-memory environments. We present several sophisticated data I/O techniques that allow for significant improvement in such environments. Our experimental results show that LMDBIO-2.0 can improve the overall execution time of Caffe by more than 30-fold compared with LMDB and by 2-fold compared with LMDBIO.