A trend that is often overlooked is the increasing network parallelism available on each node of a supercomputer. Modern network interface cards (NICs), such as Mellanox InfiniBand and Intel Omni-Path adapters, feature multiple network hardware contexts that serve as parallel interfaces to the network from a single node. The state of the art, however, is conservative in exploiting this parallelism and is stuck in a chicken-and-egg situation: applications typically do not expose MPI communication parallelism because today's MPI libraries do not utilize such parallelism, while MPI libraries still employ conservative approaches, such as a global critical section and the use of only one network hardware context, because today's applications do not expose any parallelism that the library could exploit.
This project investigates the above problem and develops a fast MPI+threads library for high-performance internode communication.
This project is partially funded by Argonne National Laboratory via a DOE subcontract.