Introduction

The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. WRF features two dynamical cores, a data assimilation system, and a software architecture supporting parallel computation and system extensibility. The model serves a wide range of meteorological applications across scales from tens of meters to thousands of kilometers.

WRF official website: https://www.mmm.ucar.edu/weather-research-and-forecasting-model

Note: The “stdout” and “stderr” streams are lost when Spack exits because they are buffered in a Python™ string (GitHub Link). The build might fail if the default /tmp is smaller than the buffered “stdout”; to avoid this failure, always set TMPDIR.
Example: export TMPDIR=$HOME/temp

Build WRF using Spack

For adding external packages to Spack, refer to Build Customization (Adding external packages to Spack).
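
For example, if the system already provides an Open MPI installation, it can be registered as an external package so that Spack reuses it instead of rebuilding it. The snippet below is only a sketch: the version and prefix are placeholders, and the exact packages.yaml schema depends on the Spack release in use.

# Open the user-level packages.yaml (sketch; version and path are placeholders)
$ spack config edit packages
# Then add an entry along these lines:
#   packages:
#     openmpi:
#       externals:
#       - spec: openmpi@4.0.5
#         prefix: /opt/openmpi/4.0.5
#       buildable: false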

# Format For Building WRF
$ spack -d install -v wrf@<Version Number> %aocc@<Version Number> target=<zen2/zen3> build_type=dm+sm ^jemalloc ^hdf5@<Version Number>+fortran ^netcdf-c@<Version Number> ^netcdf-fortran@<Version Number> ^openmpi@<Version Number>+cxx fabrics=auto
# WRF 3.9.1.1
# Example: For Building WRF 3.9.1.1 with AOCC 3.1.0
$ spack -d install -v wrf@3.9.1.1 %aocc@3.1.0 target=zen3 build_type=dm+sm ^jemalloc ^hdf5@1.8.21+fortran ^netcdf-c@4.7.0 ^netcdf-fortran@4.4.4 ^openmpi@4.0.5+cxx fabrics=auto
# Example: For Building WRF 3.9.1.1 with AOCC 3.0
$ spack -d install -v wrf@3.9.1.1 %aocc@3.0.0 target=zen3 build_type=dm+sm ^jemalloc ^hdf5@1.8.21+fortran ^netcdf-c@4.7.0 ^netcdf-fortran@4.4.4 ^openmpi@4.0.3+cxx fabrics=auto
# Example: For Building WRF 3.9.1.1 with AOCC 2.3
$ spack -d install -v wrf@3.9.1.1 %aocc@2.3.0 target=zen2 build_type=dm+sm ^jemalloc ^hdf5@1.8.21+fortran ^netcdf-c@4.7.0 ^netcdf-fortran@4.4.4 ^openmpi@4.0.3+cxx fabrics=auto
# Example: For Building WRF 3.9.1.1 with AOCC 2.2
$ spack -d install -v wrf@3.9.1.1 %aocc@2.2.0 target=zen2 build_type=dm+sm ^jemalloc ^hdf5@1.8.21+fortran ^netcdf-c@4.7.0 ^netcdf-fortran@4.4.4 ^openmpi@4.0.3+cxx fabrics=auto
# WRF 4.2
# Example: For Building WRF 4.2 with AOCC 3.1.0
$ spack -d install -v wrf@4.2 %aocc@3.1.0 target=zen3 build_type=dm+sm ^jemalloc ^hdf5@1.8.21+fortran ^netcdf-c@4.7.0 ^netcdf-fortran@4.4.4 ^openmpi@4.0.5+cxx fabrics=auto
# Example: For Building WRF 4.2 with AOCC 3.0
$ spack -d install -v wrf@4.2 %aocc@3.0.0 target=zen3 build_type=dm+sm ^jemalloc ^hdf5@1.8.21+fortran ^netcdf-c@4.7.0 ^netcdf-fortran@4.4.4 ^openmpi@4.0.3+cxx fabrics=auto
# Example: For Building WRF 4.2 with AOCC 2.3
$ spack -d install -v wrf@4.2 %aocc@2.3.0 target=zen2 build_type=dm+sm ^jemalloc ^hdf5@1.8.21+fortran ^netcdf-c@4.7.0 ^netcdf-fortran@4.4.4 ^openmpi@4.0.3+cxx fabrics=auto
# Example: For Building WRF 4.2 with AOCC 2.2
$ spack -d install -v wrf@4.2 %aocc@2.2.0 target=zen2 build_type=dm+sm ^jemalloc ^hdf5@1.8.21+fortran ^netcdf-c@4.7.0 ^netcdf-fortran@4.4.4 ^openmpi@4.0.3+cxx fabrics=auto
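
After an install finishes, the resulting installation can be verified with standard Spack commands; this is an optional sanity check and not part of the build recipe itself.

# Optional: list installed WRF specs with hashes and variants,
# and inspect the fully resolved dependency tree for a given spec
$ spack find -lv wrf
$ spack spec wrf@3.9.1.1 %aocc@3.0.0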

Any combination of the components/applications and versions listed below may be used.

Component/Application    Versions Applicable
WRF                      4.2, 3.9.1.1
AOCC                     3.1.0, 3.0.0, 2.3.0, 2.2.0
AOCL                     3.0, 2.2
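
To check which of these versions are visible to your Spack instance, and which AOCC compilers are registered, the usual Spack queries can be used:

# List the WRF versions Spack knows about and the compilers it has registered
$ spack versions wrf
$ spack compilers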

Specifications and Dependencies

Symbol                   Meaning
-d                       Enable debug output
-v                       Enable verbose output
@                        Specify the version number
%                        Specify the compiler
build_type=dm+sm         Currently, AOCC supports only this build type (distributed + shared memory)
^jemalloc                Build with the jemalloc dependency
^hdf5+fortran            Build with the HDF5 dependency, with Fortran support enabled
^netcdf-c                Build with the netcdf-c dependency
^netcdf-fortran          Build with the netcdf-fortran dependency
^openmpi@4.0.3+cxx       Use Open MPI for the build, with C++ bindings enabled
fabrics=auto             Use the fabrics=auto variant for Open MPI (the default is fabrics=none)

Obtaining Benchmarks

WRF 3.9.1.1

There are two commonly used WRF data sets:

  • Conus 12km benchmark – Single domain, medium size. 12km CONUS, Oct. 2001. A 48-hour, 12km resolution case over the Continental U.S. (CONUS) domain for October 24, 2001, with a time step of 72 seconds. The benchmark period is hours 25-27 (3 hours), starting from a restart file from the end of hour 24 (provided).
  • Conus 2.5km benchmark – Single domain, large size. 2.5km CONUS, June 4, 2005. The latter 3 hours of a 9-hour, 2.5km resolution case covering the Continental U.S. (CONUS) domain for June 4, 2005, with a 15-second time step. The benchmark period is hours 6-9 (3 hours), starting from a restart file from the end of the initial 6-hour period.

The Conus 12km benchmark is a bit small for today’s machines. The Conus 2.5km benchmark uses a 17 GB restart file and is preformatted to use Parallel NetCDF. However, the namelist.input file can be altered to use sequential I/O instead if no parallel file system such as Lustre/BeeGFS/GPFS is available, as sketched below.
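
As a hedged sketch of that change: the io_form_* entries in the &time_control section of namelist.input control the I/O format, where 11 selects Parallel NetCDF and 2 selects sequential NetCDF. The exact entries present depend on the namelist shipped with the benchmark, so verify the result before running.

# Sketch: switch history/restart/input/boundary I/O from PnetCDF (11) to NetCDF (2)
$ sed -i 's/^\( *io_form_[a-z]* *= *\)11/\12/' namelist.input
# Confirm the change
$ grep io_form namelist.input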

Running WRF on AMD 2nd Gen EPYC Processors

WRF can be used for a variety of workloads but is commonly run as a benchmark using data sets Conus 2.5km and Conus 12km.

The following steps are recommended for running the Conus 12km benchmark on a dual-socket AMD EPYC 7742 node, which provides 128 cores per node (SMT off).

Setting Environment
# Format for loading the WRF module into the environment, built with AOCC
$ spack load wrf@<Version Number> %aocc@<Version Number>
# Example: load WRF 3.9.1.1 built with AOCC 3.0
$ spack load wrf@3.9.1.1 %aocc@3.0.0
# Locate and go to the WRF installation directory
$ spack cd -i wrf@3.9.1.1 %aocc@3.0.0

 

Runtime Environment settings
# Go to the WRF installation directory and execute the following steps
cd test/em_real
rm namelist.input
ln -s /<path_to_conus_data>/conus_12km/* .
ulimit -s unlimited
export WRF_HOME=/<WRF installation directory>
# Common settings for AMD 2nd and 3rd Gen EPYC
export PBV=CLOSE
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=TRUE
export OMP_STACKSIZE="16M"
# PE: cores per MPI rank, used in --map-by ...:pe=$PE below
export PE=4
rm -rf rsl.* 1node1tile wrfout*
# WRF tiling: try 4, 8, 16, 32, 64, 128, 196
export WRF_NUM_TILES=128
# Open MPI binding used for AMD EPYC 7002 Series Processors:
# ITE ranks per RESOURCE (here, one rank per L3 cache)
export RESOURCE=L3cache
export ITE=1
# Run command using Open MPI
$ mpirun -np 32 --bind-to core --map-by ppr:$ITE:$RESOURCE:pe=$PE numactl -l $WRF_HOME/main/wrf.exe
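
With RESOURCE=L3cache, ITE=1, and PE=4, this launch places one rank on each of the 32 L3 caches of a dual-socket EPYC 7742 node and gives every rank 4 cores for its OpenMP threads (32 ranks x 4 threads = 128 cores). The commands below are an optional sanity check of that assumed topology and require hwloc to be installed.

# Optional topology check (assumes hwloc); counts shown are for a 2 x EPYC 7742 node
$ lstopo-no-graphics | grep -c "L3"    # expect 32 L3 caches
$ nproc                                # expect 128 cores with SMT off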

Running WRF on AMD 3rd Gen EPYC Processors

Setting Environment
# Format for loading the WRF module into the environment, built with AOCC
$ spack load wrf@<Version Number> %aocc@<Version Number>
# Example: load WRF 3.9.1.1 built with AOCC 3.1.0
$ spack load wrf@3.9.1.1 %aocc@3.1.0
# Locate and go to the WRF installation directory
$ spack cd -i wrf@3.9.1.1 %aocc@3.1.0

 

Runtime Environment settings
# Go to the WRF installation directory and execute the following steps
cd test/em_real
rm namelist.input
ln -s /<path_to_conus_data>/conus_12km/* .
ulimit -s unlimited
export WRF_HOME=/<WRF installation directory>
# Common settings for AMD 2nd and 3rd Gen EPYC
export PBV=CLOSE
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=TRUE
export OMP_STACKSIZE="16M"
# PE: cores per MPI rank, used in --map-by ...:pe=$PE below
export PE=4
rm -rf rsl.* 1node1tile wrfout*
# WRF tiling: try 4, 8, 16, 32, 64, 128, 196
export WRF_NUM_TILES=128
# Open MPI binding used for AMD EPYC 7003 Series Processors:
# ITE ranks per RESOURCE (here, four ranks per NUMA domain)
export RESOURCE=numa
export ITE=4
# Run command using Open MPI
$ mpirun -np 32 --bind-to core --map-by ppr:$ITE:$RESOURCE:pe=$PE numactl -l $WRF_HOME/main/wrf.exe
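
Here ITE=4 ranks are placed in every NUMA domain with PE=4 cores each; 32 ranks therefore assume 8 NUMA domains, i.e. a dual-socket 3rd Gen EPYC node configured with NPS4. That BIOS setting is an assumption: check the domain count as shown below and adjust -np and ITE if your node reports a different value.

# Optional: report the NUMA domains exposed by the node (requires numactl)
$ numactl --hardware | grep available    # e.g. "available: 8 nodes (0-7)" with NPS4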

Calculating benchmark performance numbers

Once the benchmark has completed, run the bench.sh and getmean.sh scripts below (getmean.sh relies on stats.awk) to calculate the benchmark performance values. The contents of these files are listed below.

These commands are not target specific.

Running the scripts
# To get the statistics from the rsl.out.* files
# Create the bench.sh, getmean.sh, and stats.awk scripts from the code blocks below
$ export SCRIPTS=/<scripts path>
$ cat rsl.out.* > 1node1tile
$ $SCRIPTS/bench.sh 1node1tile
$ $SCRIPTS/getmean.sh 1node1tile

 

bench.sh
grep "Timing for main" $1 | awk 'BEGIN{t=0;at=0;i=0;}{t=t+$9;i=i+1;}END{at=t/i;print "\nAverage Time: " at " sec/step over " i " time steps\n"}'

 

getmean.sh
#!/bin/bash
# getmean.sh: print count/max/min/sum/mean statistics for the last 149 time steps
grep "Timing for main" $1 | tail -149 | awk '{print $9}' | awk -f $SCRIPTS/stats.awk

 

stats.awk
# stats.awk: run with "awk -f stats.awk"; prints item count, max, min, sum, mean, and mean/max

BEGIN{ a = 0.0 ; i = 0 ; max = -999999999  ; min = 9999999999 }
{
i ++
a += $1
if ( $1 > max ) max = $1
if ( $1 < min ) min = $1
}
END{ printf("---\n%10s  %8d\n%10s  %15f\n%10s  %15f\n%10s  %15f\n%10s  %15f\n%10s  %15f\n","items:",i,"max:",max,"min:",min,"sum:",a,"mean:",a/(i*1.0),"mean/max:",(a/(i*1.0))/max) }