Introduction
The STREAM benchmark is a simple, synthetic benchmark program that measures sustainable main memory bandwidth in MB/s and the corresponding computation rate for simple vector kernels.
The general rule for running STREAM is that each array must be at least 4x the size of the sum of all the last-level caches used in the run, or 1 million elements, whichever is larger.
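The sizing rule above can be sketched as a small shell helper. This is a minimal illustration, not part of STREAM or Spack: the function name `min_stream_array_size` and the 256 MiB cache figure are hypothetical, and the total last-level cache size must be supplied by the user for the system under test.

```shell
# Minimum STREAM_ARRAY_SIZE per the sizing rule: each array must hold at least
# 4x the summed last-level cache, or 1,000,000 elements, whichever is larger.
# Arguments (hypothetical inputs supplied by the user):
#   $1 = total last-level cache in bytes across all sockets used in the run
#   $2 = bytes per array element (8 for -DSTREAM_TYPE=double)
min_stream_array_size() {
    llc_bytes=$1
    elem_bytes=${2:-8}
    by_cache=$(( 4 * llc_bytes / elem_bytes ))
    if [ "$by_cache" -gt 1000000 ]; then
        echo "$by_cache"
    else
        echo 1000000
    fi
}

# Example: an assumed 256 MiB of total L3 cache with 8-byte doubles.
min_stream_array_size $(( 256 * 1024 * 1024 )) 8   # prints 134217728
```

The result is a lower bound; any larger value that still fits in memory also satisfies the rule.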
STREAM uses four kernels for analysis:
- "Copy" measures transfer rates in the absence of arithmetic.
- "Scale" adds a simple arithmetic operation.
- "Sum" adds a third operand to allow multiple load/store ports on vector machines to be tested.
- "Triad" allows chained/overlapped/fused multiply/add operations.
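For reference, the four kernels can be written out as below, following the notation used on the STREAM website, where a, b, and c are the arrays and q is a scalar. The bytes-per-iteration figures assume 8-byte (double) elements, which is the setting used in the build examples on this page.

```latex
\begin{array}{llcc}
\text{Kernel} & \text{Operation} & \text{Bytes/iteration} & \text{FLOPs/iteration} \\
\text{Copy}   & a_i = b_i           & 16 & 0 \\
\text{Scale}  & a_i = q\, b_i       & 16 & 1 \\
\text{Sum}    & a_i = b_i + c_i     & 24 & 1 \\
\text{Triad}  & a_i = b_i + q\, c_i & 24 & 2
\end{array}
```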
Official website for STREAM: https://www.cs.virginia.edu/stream/
Build STREAM using Spack
To add external packages to Spack, see Build Customization (Adding external packages to Spack).
# Format for building STREAM
$ spack -d install -v stream@<Version> +openmp %aocc@<Version> cflags="CFLAGS"
# Example: building STREAM with AOCC 3.2.0
$ spack -d install -v stream@5.10 +openmp %aocc@3.2.0 cflags="-mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=260000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"
# Example: building STREAM with AOCC 3.1.0
$ spack -d install -v stream@5.10 +openmp %aocc@3.1.0 cflags="-mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=260000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"
# Example: building STREAM with AOCC 3.0.0
$ spack -d install -v stream@5.10 %aocc@3.0.0 +openmp cflags="-mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=260000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"
# Example: building STREAM with AOCC 2.3.0
$ spack -d install -v stream@5.10 %aocc@2.3.0 +openmp cflags="-mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=2600000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"
# Example: building STREAM with AOCC 2.2.0
$ spack -d install -v stream@5.10 %aocc@2.2.0 +openmp cflags="-mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=2600000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"
The STREAM and AOCC versions that are compatible with each other are listed below:
Component/Application | Versions Applicable |
STREAM | 5.10 |
AOCC | 3.2.0, 3.1.0, 3.0.0, 2.3.0, 2.2.0 |
Specifications and Dependencies
Symbol | Meaning |
-d | To enable debug output |
-v | To enable verbose output |
@ | To specify version number |
% | To specify compiler |
+openmp | To build with OpenMP enabled |
cflags | To pass C compiler flags to the Spack build from the command line |
Basic details of the flags used:
- -mcmodel=large: Generates code for the large code model. This model makes no assumptions about addresses and sizes of sections.
- -DSTREAM_ARRAY_SIZE=260000000: Sets the array size for the STREAM benchmark. The general recommendation is that STREAM_ARRAY_SIZE must be at least 4x the size of the sum of all the last-level caches in the system.
- -DNTIMES=10: Sets the repetition count. STREAM runs each kernel NTIMES times.
- -ffp-contract=fast: Enables floating-point expression contraction, such as forming fused multiply-add operations, if the target has native support for them.
- -fnt-store: Generates non-temporal store instructions for array accesses in a loop with a large trip count.
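To make it easier to vary the array size or repetition count, the flags above can be assembled in a small helper that prints the build command instead of running it. This is only a sketch: the function name `stream_build_cmd` and its argument order are hypothetical, and the printed command should be reviewed before it is executed.

```shell
# Print (do not run) a spack install command for STREAM with AOCC.
# Arguments (hypothetical helper interface):
#   $1 = STREAM version, $2 = AOCC version,
#   $3 = STREAM_ARRAY_SIZE, $4 = NTIMES
stream_build_cmd() {
    stream_version=$1
    aocc_version=$2
    array_size=$3
    ntimes=$4

    # Same flag set as the build examples on this page.
    cflags="-mcmodel=large -DSTREAM_TYPE=double -mavx2"
    cflags="$cflags -DSTREAM_ARRAY_SIZE=$array_size -DNTIMES=$ntimes"
    cflags="$cflags -ffp-contract=fast -fnt-store"

    # %% produces the literal % that introduces the compiler in a Spack spec.
    printf 'spack -d install -v stream@%s +openmp %%aocc@%s cflags="%s"\n' \
        "$stream_version" "$aocc_version" "$cflags"
}

# Example: reproduces the AOCC 3.2.0 build command.
stream_build_cmd 5.10 3.2.0 260000000 10
```

Printing the command first keeps the helper side-effect free; the output can be pasted into a shell once it looks right.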
Running STREAM
The following is recommended when running STREAM on AMD processors:
- STREAM generally gives the best performance with 1 thread per CCD.
- Example binding options for the AMD EPYC 7742 and AMD EPYC 7763 processors to bind 1 thread per CCD: "export GOMP_CPU_AFFINITY=0-127:8" and "export OMP_NUM_THREADS=16". The affinity value 0-127:8 selects every eighth core from 0 to 127, which gives 16 threads.
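The binding values above follow from the core count and the number of cores per CCD, so they can be derived for other configurations. The sketch below assumes cores are numbered consecutively, so that the first core of each CCD is a multiple of the CCD size; the function name `ccd_binding` is hypothetical, and the core numbering should be checked on the target system before relying on the result.

```shell
# Print thread-binding exports for one OpenMP thread per CCD.
# Arguments (hypothetical inputs supplied by the user):
#   $1 = total number of physical cores in the node
#   $2 = number of cores per CCD
ccd_binding() {
    total_cores=$1
    cores_per_ccd=$2
    # Range 0..N-1 with a stride equal to the CCD size picks one core per CCD.
    echo "export GOMP_CPU_AFFINITY=0-$(( total_cores - 1 )):$cores_per_ccd"
    echo "export OMP_NUM_THREADS=$(( total_cores / cores_per_ccd ))"
}

# Example: 128 cores with 8 cores per CCD.
ccd_binding 128 8
# prints:
#   export GOMP_CPU_AFFINITY=0-127:8
#   export OMP_NUM_THREADS=16
```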
Setting Environment
# Format for loading a STREAM build with AOCC
$ spack load stream@<Version> %aocc@<Version>
# Example: load the STREAM build with AOCC 3.2.0 into the environment
$ spack load stream %aocc@3.2.0
Note: Rebooting the node before the run is recommended for optimal STREAM results.
Run Command
# Load STREAM
$ spack load stream@5.10 %aocc@3.2.0
# System settings (writing to /sys and /proc requires root privileges)
$ echo madvise | tee /sys/kernel/mm/transparent_hugepage/enabled
$ echo madvise | tee /sys/kernel/mm/transparent_hugepage/defrag
$ echo 3 > /proc/sys/vm/drop_caches
$ echo 1 > /proc/sys/kernel/numa_balancing
# OpenMP runtime settings
$ export OMP_SCHEDULE=static
$ export OMP_DYNAMIC=false
$ export OMP_THREAD_LIMIT=256
$ export OMP_NESTED=FALSE
$ export OMP_STACKSIZE=256M
# Thread binding options for AMD EPYC 7742/7763 processors
$ export GOMP_CPU_AFFINITY=0-127:8
$ export OMP_NUM_THREADS=16
# Run STREAM
$ echo "running for 1 thread per CCD"
$ stream_c.exe
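When comparing runs, it can help to pull a single figure out of the report that STREAM prints. The sketch below extracts the "Best Rate MB/s" column for one kernel with awk. The function name `stream_best_rate` is hypothetical, and the numbers in the sample text are made up purely to show the layout; note that STREAM labels the Sum kernel as "Add" in its output.

```shell
# Print the "Best Rate MB/s" value for one kernel from STREAM output on stdin.
#   $1 = kernel label as printed by STREAM: Copy, Scale, Add, or Triad
stream_best_rate() {
    awk -v kernel="$1:" '$1 == kernel { print $2 }'
}

# Sample lines in the layout STREAM prints; the values are illustrative only.
sample_output='Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          300000.0     0.013900     0.013867     0.013950
Scale:         290000.0     0.014400     0.014345     0.014460
Add:           310000.0     0.020200     0.020129     0.020300
Triad:         320000.0     0.019600     0.019500     0.019700'

printf '%s\n' "$sample_output" | stream_best_rate Triad   # prints 320000.0
```

On a real run the same helper can be used as `stream_c.exe | stream_best_rate Triad`.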