Argonne National Laboratory

Modeling a Million-Node Slim Fly Network Using Parallel Discrete-Event Simulation

Title: Modeling a Million-Node Slim Fly Network Using Parallel Discrete-Event Simulation
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Wolfe, N; Carothers, CD; Mubarak, M; Ross, RB; Carns, PH
Conference Name: SIGSIM PADS'16
Date Published: 03/2016
Conference Location: Banff, Canada
Other Numbers: ANL/MCS-P5571-0316
Abstract: As supercomputers close in on exascale performance, the increased number of processors and processing power translates to an increased demand on the underlying network interconnect. The Slim Fly network topology, a new low-diameter and low-latency interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this paper, we present a high-fidelity Slim Fly flit-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and Co-Design of Exascale Storage (CODES) frameworks. We validate our Slim Fly model against the Kathareios et al. Slim Fly model results provided at moderately sized network scales. We further scale the model size up to an unprecedented 1 million compute nodes, and through visualization of network simulation metrics such as link bandwidth, packet latency, and port occupancy, we gain insight into the network behavior at the million-node scale. We also show linear strong scaling of the Slim Fly model on an Intel cluster, achieving a peak event rate of 36 million events per second using 128 MPI tasks to process 7 billion events. Detailed analysis of the underlying discrete-event simulation performance shows how the million-node Slim Fly model simulation executes in 198 seconds on the Intel cluster.