This technical note demonstrates how AMD CodeAnalyst
can be used to analyze and improve the performance of a compute-bound program.
The program chosen for this demonstration is an old classic: matrix multiply.
We start with a "textbook" implementation of matrix multiply, then measure and
analyze its performance. Next, we improve the program's performance by changing
its memory access pattern.
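
As a rough illustration of what "textbook" and "changed memory access pattern" can mean here, the sketch below shows the usual triple loop next to a loop-interchanged variant. It is not taken from the note's own source: the names N, multiply_textbook, multiply_interchanged, and check_equivalence are hypothetical, and loop interchange is assumed as one common way to change the access pattern, not necessarily the exact change made later. The point it illustrates is that C stores arrays in row-major order, so the textbook inner loop steps through b one full row apart on each iteration, while the interchanged inner loop reads b and c sequentially.

```c
#include <assert.h>

#define N 4  /* hypothetical matrix size, kept tiny for illustration */

/* "Textbook" i-j-k order: the inner loop reads b[k][j] with k varying,
   so consecutive reads are a full row (N elements) apart in memory. */
static void multiply_textbook(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }
}

/* Interchanged i-k-j order (an assumed variant): the inner loop varies j,
   so b[k][j] and c[i][j] are both accessed sequentially along a row. */
static void multiply_interchanged(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = 0.0;

    for (int i = 0; i < N; i++) {
        for (int k = 0; k < N; k++) {
            double aik = a[i][k];  /* constant across the inner loop */
            for (int j = 0; j < N; j++)
                c[i][j] += aik * b[k][j];
        }
    }
}

/* Multiplies the same small integer-valued matrices both ways and asserts
   that the two loop orders produce identical results. */
static void check_equivalence(void)
{
    double a[N][N], b[N][N], c1[N][N], c2[N][N];

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (double)(i - j);
        }
    }

    multiply_textbook(a, b, c1);
    multiply_interchanged(a, b, c2);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            assert(c1[i][j] == c2[i][j]);
}
```

Calling check_equivalence() from a main function runs both variants on the same inputs and confirms that they compute the same product. Only the order in which memory is touched differs, and that difference is the kind of effect a profiler can expose.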