Version: AMD Zen HPL 2022-11

Dependencies:

  • This binary executable was built with AVX512 support and will run only on systems that support AVX512 instructions.
    • Specifically: AMD “Zen4”-based processors such as the AMD 4th Generation EPYC™ CPUs.
    • The binary will NOT run on AMD “Zen3”-based or prior processors.
  • The binary was built on Red Hat® Enterprise Linux® 8.6 and runs without issue on Red Hat® Enterprise Linux® 9 and Ubuntu® 22.04.
  • OpenMPI 4: This binary was built against OpenMPI 4.1.4 and should run without issue as long as OpenMPI 4 is in the PATH.
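
  The AVX512 requirement above can be checked before running the binary. The following is a minimal sketch, not part of the supplied package: it assumes a Linux host where /proc/cpuinfo lists CPU feature flags, and it treats the presence of the "avx512f" (AVX512 Foundation) flag as the indicator. The function names are illustrative only.

```python
def has_avx512(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in the cpuinfo text lists avx512f."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only look at the 'flags' field, and match whole flag names.
        if key.strip() == "flags" and "avx512f" in value.split():
            return True
    return False


def host_has_avx512(path: str = "/proc/cpuinfo") -> bool:
    """Check the current host (assumes Linux; the path is Linux-specific)."""
    with open(path) as f:
        return has_avx512(f.read())
```

  Calling host_has_avx512() on the target system returns False on processors without AVX512, which is the case where this binary is not expected to run.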

Recommended Settings:

  • Boost : ON
  • Transparent Hugepages : always
  • SMT : OFF
  • NPS : 4
  • Determinism : Power
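
  Of the settings above, the Transparent Hugepages mode can be confirmed from the operating system. A minimal sketch, assuming a Linux kernel that exposes the mode in /sys/kernel/mm/transparent_hugepage/enabled, where the active mode is shown in square brackets (for example "[always] madvise never"). The function names are illustrative only; the other settings are typically configured in the BIOS and are not covered here.

```python
import re


def thp_mode(sysfs_text: str) -> str:
    """Return the bracketed (active) mode from the THP 'enabled' file contents."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    if match is None:
        raise ValueError("no active mode found in: %r" % sysfs_text)
    return match.group(1)


def host_thp_mode(path: str = "/sys/kernel/mm/transparent_hugepage/enabled") -> str:
    """Read the active THP mode on the current host (assumes Linux)."""
    with open(path) as f:
        return thp_mode(f.read())
```

  A return value of "always" from host_thp_mode() matches the recommended setting.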

How to Run:

  1. Modify the supplied HPL.dat file according to the community tuning guide.
    • By default, this will run a very small problem. For peak performance, choose a larger value for ‘N’ so that memory use is close to 90% of system memory. Ideally, the N value will be a multiple of the NB value.
    • Other than the selection of ‘N’, the supplied file is a reasonable starting place for most Zen4 systems.
    • Peak single-node performance is typically found with 1 MPI rank per socket, and as many threads per socket as there are physical cores. On a dual-socket system, this corresponds to P = 1, Q = 2.
    • AMD Zen HPL introduces a new hybrid panel broadcast mechanism, which can be enabled by setting BCAST = 7.
  2. Check the ‘run.sh’ script.
    • By default, it sets the number of threads per rank to be the number of cores per socket, and the number of MPI ranks to 2.
    • If the HPL.dat file is changed to a different P & Q, the thread and rank counts in the script will need to be adjusted accordingly.
  3. (Optional) Clean up the system, and set various system parameters.
    • As root, invoke ‘reset-system.sh’, which will clean up system memory, tune for Transparent Hugepages, disable NUMA balancing, set the CPU governor and enable CPU boost.
  4. Invoke ‘./run.sh’.
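
  The guidance in step 1 on choosing ‘N’ can be turned into a quick calculation. The sketch below is an illustration, not part of the supplied package. It assumes the HPL matrix is N x N double-precision values (8 bytes each), targets 90% of system memory, and rounds N down to a multiple of NB. The memory size and the NB value used in the example are hypothetical; use the NB from your HPL.dat file.

```python
import math


def suggest_n(mem_bytes: int, nb: int, fraction: float = 0.9) -> int:
    """Largest multiple of nb such that 8 * N * N <= fraction * mem_bytes."""
    # Number of 8-byte doubles that fit in the targeted share of memory.
    elements = int(mem_bytes * fraction) // 8
    # Side length of the largest square matrix with that many elements.
    n = math.isqrt(elements)
    # Round down so that N is a multiple of the block size NB.
    return (n // nb) * nb


# Hypothetical example: a node with 768 GiB of memory and NB = 384.
mem_bytes = 768 * 2**30
nb = 384
n = suggest_n(mem_bytes, nb)
print("Suggested N:", n)
print("Matrix share of memory: %.3f" % (8 * n * n / mem_bytes))
```

  The result is only a starting point. Memory used by the operating system and the MPI runtime also counts against the total, so a somewhat smaller N may be needed in practice.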