Presentation
Comparing Distributed-Memory Programming Frameworks with Radix Sort
Description
Distributed-memory parallel processing addresses computational
problems that require significantly more memory or
computational resources than a single node provides. Software written for distributed-memory
parallel processing typically uses a distributed-memory parallel
programming framework to enhance productivity, scalability, and
portability across supercomputers and cluster systems.
These frameworks vary in their capabilities and in the support they provide for managing
communication and synchronization overhead to achieve scalability.
This paper uses a communication-intensive distributed radix
sort algorithm to examine and compare the performance, scalability,
usability, and productivity of five
distributed-memory parallel programming frameworks: Chapel, MPI,
OpenSHMEM, Conveyors, and Lamellar.
The Chapel implementation has the fewest source lines of code (113) and is the
most performant on 128 nodes of an HPE Cray Supercomputing EX (achieving about
17 billion elements sorted per second). The source code is available at
https://github.com/mppf/distributed-lsb, and we welcome contributions,
including optimizations to the implementations and results from runs on
different systems.
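To illustrate why a distributed radix sort is communication-intensive, here is a minimal single-process simulation. It assumes a least-significant-digit (LSD) variant in which every digit pass performs three steps: a local histogram, a global scan of the counts, and an exchange that moves each key to the rank owning its new position. The function name `distributed_lsd_radix_sort`, the block distribution of output positions, and the default digit width are illustrative assumptions, not the paper's implementation in any of the five frameworks. In a real distributed run, the scan is a collective operation and the exchange is an all-to-all, which is where the frameworks' communication support matters.

```python
from bisect import bisect_right
from typing import List


def distributed_lsd_radix_sort(
    ranks: List[List[int]], key_bits: int = 16, digit_bits: int = 4
) -> List[List[int]]:
    """Single-process simulation of a distributed LSD radix sort (a sketch).

    ranks[p] holds the keys owned by hypothetical rank p. Keys must be
    non-negative and smaller than 2**key_bits. Returns new per-rank lists
    whose concatenation is sorted, block-distributed across the ranks.
    """
    num_ranks = len(ranks)
    radix = 1 << digit_bits
    mask = radix - 1
    total = sum(len(local) for local in ranks)
    # Assumed block distribution: rank p owns global positions
    # [starts[p], starts[p + 1]).
    starts = [p * total // num_ranks for p in range(num_ranks + 1)]

    for shift in range(0, key_bits, digit_bits):
        # Step 1 (local): each rank builds a histogram of the current digit.
        counts = [[0] * radix for _ in range(num_ranks)]
        for p, local in enumerate(ranks):
            for key in local:
                counts[p][(key >> shift) & mask] += 1

        # Step 2 (collective in a real run): an exclusive scan in
        # (digit, rank) order gives each rank the first global position
        # for each of its digits.
        offsets = [[0] * radix for _ in range(num_ranks)]
        running = 0
        for d in range(radix):
            for p in range(num_ranks):
                offsets[p][d] = running
                running += counts[p][d]

        # Step 3 (all-to-all in a real run): every key moves to the rank
        # that owns its new global position. Visiting ranks and keys in
        # order keeps each pass stable, which LSD radix sort requires.
        received = [[0] * (starts[p + 1] - starts[p]) for p in range(num_ranks)]
        for p, local in enumerate(ranks):
            for key in local:
                d = (key >> shift) & mask
                pos = offsets[p][d]
                offsets[p][d] += 1
                owner = bisect_right(starts, pos) - 1
                received[owner][pos - starts[owner]] = key
        ranks = received

    return ranks


if __name__ == "__main__":
    data = [[170, 45, 75], [90, 802, 24], [2, 66]]
    print(distributed_lsd_radix_sort(data))
    # [[2, 24], [45, 66, 75], [90, 170, 802]]
```

Because every pass redistributes nearly all keys, the volume of data exchanged grows with both the input size and the number of digit passes, so the cost of the exchange step tends to dominate at scale.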
Event Type
Workshop
Time
Sunday, 16 November 2025, 11:30am - 11:50am CST
Location
230

