Introduction

The STREAM benchmark is a simple, synthetic benchmark program that measures sustainable main memory bandwidth in MB/s and the corresponding computation rate for simple vector kernels.

The general rule for running STREAM is that each array must be at least 4x the combined size of all the last-level caches used in the run, or 1 million elements, whichever is larger.
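The sizing rule can be turned into a quick calculation. The sketch below is illustrative only: `min_stream_elements` is a hypothetical helper name, and the 256 MiB cache size in the example call is an assumed value, not a figure taken from this guide.

```shell
#!/bin/sh
# min_stream_elements LLC_BYTES ELEM_BYTES   (hypothetical helper)
# Prints the smallest per-array element count that satisfies the rule:
#   max(4 * LLC_BYTES / ELEM_BYTES, 1000000)
min_stream_elements() {
    llc_bytes=$1
    elem_bytes=$2
    # Round up so the array is never smaller than 4x the cache size.
    n=$(( (4 * llc_bytes + elem_bytes - 1) / elem_bytes ))
    if [ "$n" -lt 1000000 ]; then
        n=1000000
    fi
    echo "$n"
}

# Assumed example: 256 MiB of combined last-level cache, 8-byte doubles.
min_stream_elements $((256 * 1024 * 1024)) 8
```

Pass the combined last-level cache size of the sockets actually used in the run; the result is a lower bound for STREAM_ARRAY_SIZE.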

STREAM uses four kernels for analysis:

  1. “Copy” measures transfer rates in the absence of arithmetic.
  2. “Scale” adds a simple arithmetic operation.
  3. “Sum” adds a third operand to allow multiple load/store ports on vector machines to be tested.
  4. “Triad” allows chained/overlapped/fused multiply/add operations.
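To make the difference between the kernels concrete, the sketch below computes how many bytes each kernel is counted as moving per pass. It assumes the accounting used by the reference stream.c (Copy and Scale read one array and write one; Sum and Triad read two and write one); `stream_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# stream_bytes KERNEL N ELEM_BYTES   (hypothetical helper)
# Prints the bytes counted for one pass of KERNEL over arrays of N elements.
stream_bytes() {
    case "$1" in
        copy|scale) words=2 ;;   # one load and one store per iteration
        sum|triad)  words=3 ;;   # two loads and one store per iteration
        *) echo "unknown kernel: $1" >&2; return 1 ;;
    esac
    echo $(( words * $2 * $3 ))
}

# Array size and element type taken from the build flags used in this guide.
for k in copy scale sum triad; do
    printf '%-5s %s bytes\n' "$k" "$(stream_bytes "$k" 250000000 8)"
done
```

The reported MB/s figure for a kernel is this byte count divided by the time of its fastest pass.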

Official website for STREAM: https://www.cs.virginia.edu/stream/

Build STREAM using Spack

To add external packages to Spack, see Build Customization (Adding external packages to Spack).

# Format for building STREAM
$ spack -d install -v stream@<Version Number> %aocc@<Version Number> +openmp cflags="CFLAGS"
# Example: building STREAM with AOCC 3.1.0
$ spack -d install -v stream %aocc@3.1.0 +openmp cflags="-O3 -mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=250000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"
# Example: building STREAM with AOCC 3.0.0
$ spack -d install -v stream %aocc@3.0.0 +openmp cflags="-O3 -mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=250000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"
# Example: building STREAM with AOCC 2.3.0
$ spack -d install -v stream %aocc@2.3.0 +openmp cflags="-O3 -mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=250000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"
# Example: building STREAM with AOCC 2.2.0
$ spack -d install -v stream %aocc@2.2.0 +openmp cflags="-O3 -mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=250000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"

The STREAM and AOCC versions that are compatible with each other are listed below:

Component/Application    Versions Applicable
STREAM                   5.10
AOCC                     3.1.0, 3.0.0, 2.3.0, 2.2.0
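The build examples differ only in the AOCC version, so they can be generated from one template. The sketch below only prints the commands and does not run Spack; `stream_install_cmd` and `STREAM_CFLAGS` are hypothetical names introduced for illustration, and the flag string is copied from the examples in this guide.

```shell
#!/bin/sh
# Flags copied verbatim from the build examples in this guide.
STREAM_CFLAGS="-O3 -mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=250000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"

# stream_install_cmd AOCC_VERSION   (hypothetical helper)
# Prints the Spack install command for one AOCC version; nothing is executed.
stream_install_cmd() {
    printf 'spack -d install -v stream %%aocc@%s +openmp cflags="%s"\n' "$1" "$STREAM_CFLAGS"
}

# AOCC versions listed as compatible with STREAM 5.10.
for v in 3.1.0 3.0.0 2.3.0 2.2.0; do
    stream_install_cmd "$v"
done
```

Review the printed commands before running them, and adjust STREAM_ARRAY_SIZE to the cache size of the target system.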

Specifications and Dependencies

Symbol     Meaning
-d         Enable debug output
-v         Enable verbose output
@          Specify the version number
%          Specify the compiler
+openmp    Build with OpenMP enabled
cflags     Pass C compiler flags to the Spack build from the command line

Basic details of the flags used:

  • -mcmodel=large: Generates code for the large code model, which makes no assumptions about the addresses and sizes of sections.
  • -DSTREAM_ARRAY_SIZE=250000000: Sets the number of elements in each STREAM array. The general recommendation is that each array be at least 4x the combined size of all the last-level caches in the system.
  • -DNTIMES=10: STREAM runs each kernel NTIMES times.
  • -ffp-contract=fast: Enables floating-point expression contraction, such as forming fused multiply-add operations, if the target has native support for them.
  • -fnt-store: Generates non-temporal store instructions for array accesses in loops with a large trip count.
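As a rough check on why a larger code model is needed, the sketch below estimates the static memory footprint for the array size used in this guide. It assumes STREAM's usual layout of three statically allocated arrays and the 2 GiB addressing range of the default small code model on x86-64; `stream_static_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# stream_static_bytes N ELEM_BYTES   (hypothetical helper)
# Prints the total size of the three STREAM arrays (a, b and c).
stream_static_bytes() {
    echo $(( 3 * $1 * $2 ))
}

total=$(stream_static_bytes 250000000 8)
limit=$(( 2 * 1024 * 1024 * 1024 ))   # 2 GiB reach of the small code model

echo "static arrays: $total bytes"
if [ "$total" -gt "$limit" ]; then
    echo "exceeds 2 GiB: build with a larger code model such as -mcmodel=large"
fi
```

With 250000000 doubles per array the arrays occupy 6000000000 bytes, well above the 2 GiB limit.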

Running STREAM

The following settings are recommended when running STREAM on AMD processors:

  • STREAM generally performs best with one thread per CCD.
  • Example binding options for AMD EPYC 7742 and AMD EPYC 7763 processors that bind one thread per CCD: “export GOMP_CPU_AFFINITY=0-127:8” and “export OMP_NUM_THREADS=16”
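The affinity string uses the low-high:stride form, so 0-127:8 selects every eighth logical CPU. The sketch below expands such a range so the result can be compared with OMP_NUM_THREADS. It assumes that each CCD owns 8 consecutive logical CPU IDs, which should be confirmed on the target system (for example with lscpu); `expand_affinity` is a hypothetical helper name.

```shell
#!/bin/sh
# expand_affinity START END STRIDE   (hypothetical helper)
# Prints the CPU IDs selected by a GOMP_CPU_AFFINITY range START-END:STRIDE.
expand_affinity() {
    cpu=$1
    list=""
    while [ "$cpu" -le "$2" ]; do
        list="$list $cpu"
        cpu=$(( cpu + $3 ))
    done
    # Left unquoted on purpose so the leading space is dropped.
    echo $list
}

cpus=$(expand_affinity 0 127 8)
echo "CPUs:  $cpus"
# The arithmetic expansion strips any padding that wc adds.
echo "count: $(( $(echo "$cpus" | wc -w) ))"
```

The count should equal OMP_NUM_THREADS (16 here), giving one thread for each of the 16 selected CPUs.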

Setting Environment
# Format for loading a STREAM build compiled with AOCC
$ spack load stream@<Version Number> %aocc@<Version Number>
# Example: load the STREAM build compiled with AOCC 3.1.0 into the environment
$ spack load stream %aocc@3.1.0
# Example: load the STREAM build compiled with AOCC 3.0.0 into the environment
$ spack load stream %aocc@3.0.0

Running STREAM
# Running STREAM
# Set transparent huge pages to madvise mode (writing these files requires root privileges)
echo madvise | tee /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | tee /sys/kernel/mm/transparent_hugepage/defrag
# OpenMP runtime settings
export OMP_SCHEDULE=static
export OMP_DYNAMIC=false
export OMP_THREAD_LIMIT=256
export OMP_NESTED=FALSE
export OMP_STACKSIZE=256M
# Thread binding options for AMD EPYC 7742/7763 processors: one thread per CCD
export GOMP_CPU_AFFINITY=0-127:8
export OMP_NUM_THREADS=16
echo "running for 1 thread per CCD"
stream_c.exe
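
Before launching the benchmark it can help to confirm that the transparent huge page settings took effect. The kernel marks the active mode in these files with square brackets, for example "always [madvise] never". The sketch below extracts that value; `thp_active` is a hypothetical helper name.

```shell
#!/bin/sh
# thp_active STRING   (hypothetical helper)
# Prints the bracketed (active) mode from a transparent_hugepage sysfs line.
thp_active() {
    echo "$1" | sed -n 's/.*\[\([a-z_+]*\)\].*/\1/p'
}

# Report the active mode for both files written in the run script.
for f in enabled defrag; do
    path=/sys/kernel/mm/transparent_hugepage/$f
    if [ -r "$path" ]; then
        echo "$f: $(thp_active "$(cat "$path")")"
    fi
done
```

Both lines should report madvise after the two echo commands in the run script have succeeded.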