A trend that is often overlooked is the increasing network parallelism available on each node of a supercomputer. Modern network interface cards (NICs), such as Mellanox InfiniBand and Intel Omni-Path adapters, feature multiple network hardware contexts that serve as parallel interfaces to the network from a single node. The state of the art, however, is conservative in exploiting this parallelism and is stuck in a chicken-and-egg situation: applications typically do not expose MPI communication parallelism because today's MPI libraries do not utilize such parallelism, while MPI libraries still employ conservative approaches, such as a global critical section and the use of only one network hardware context, because today's applications do not expose any parallelism that the library could exploit.
This project investigates the above problem and develops a fast MPI+threads library for high-performance internode communication.
This project is partially funded by Argonne National Laboratory via a DOE subcontract.