International Journal On Advances in Systems and Measurements, volume 9, numbers 1 and 2, 2016
Impact of the Entering Time on the Performance of MPI Collective Operations
Authors:
Christoph Niethammer
Dmitry Khabi
Huan Zhou
Vladimir Marjanovic
José Gracia
Keywords: collectives; late-arrivals; benchmarking; MPI collective operations
Abstract:
Collective operations strongly affect the performance of many MPI applications because they involve a large number of processes, frequently all of them, communicating with each other. One critical issue for the performance of collective operations is load imbalance, which causes processes to enter a collective operation at different times. The influence of such late arrivals is currently not well understood. Earlier work showed that even small amounts of system noise can have a tremendous effect on collective performance. Thus, although collective algorithms are optimized for large process counts, they do not seem to tolerate noise or to account for delayed processes, and even a small perturbation of a single process can degrade the overall collective execution. In this work, we present a first detailed study of the effect of late arrivals on collective performance in MPI. For the evaluation, we designed a new, specialized benchmark and used a new metric, which we call delay overlap benefit. Our results show that the most common collective operations (barrier, broadcast, allreduce, and alltoall) already offer some potential tolerance to late arrivals, but that there is also considerable room for future optimization.
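To make the late-arrival effect concrete, the following minimal sketch emulates it with Python threads and a `threading.Barrier` instead of real MPI processes. It is not the benchmark described in the abstract and it does not compute the delay overlap benefit metric, whose definition is not given here. The function name `run_barrier_with_late_arrival` and its parameters are hypothetical, chosen only for illustration. One worker is delayed before entering the barrier, and each worker records how long it spent inside the barrier call.

```python
import threading
import time


def run_barrier_with_late_arrival(num_workers=4, late_rank=0, delay_s=0.2):
    """Return the time (in seconds) each worker spent inside the barrier.

    Threads stand in for MPI processes; this is an illustration only,
    not an MPI measurement.
    """
    barrier = threading.Barrier(num_workers)
    time_in_barrier = [0.0] * num_workers

    def worker(rank):
        if rank == late_rank:
            # Emulate load imbalance: this worker enters the barrier late.
            time.sleep(delay_s)
        start = time.perf_counter()
        barrier.wait()  # all workers are released once the last one arrives
        time_in_barrier[rank] = time.perf_counter() - start

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time_in_barrier


if __name__ == "__main__":
    for rank, t in enumerate(run_barrier_with_late_arrival()):
        print(f"worker {rank}: {t * 1e3:.1f} ms inside the barrier")
```

In this toy setting, the workers that arrive on time spend roughly the injected delay waiting inside the barrier, while the late worker passes through almost immediately. A real study would replace the threads with MPI ranks and the barrier with the collective under test.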
Pages: 48 to 57
Copyright: Copyright (c) to authors, 2016. Used with permission.
Publication date: June 30, 2016
Published in: journal
ISSN: 1942-261x