Matrix Multiplication Algorithm

A matrix is an array with n rows and m columns in general. For this exercise assume that you are working with square matrices only, so m = n. The pseudocode for an algorithm to multiply two square matrices A and B and store the result in C (where all matrices are non-sparse and stored in a two-dimensional array) is:

for row = 0 to n-1
for col = 0 to n-1
for l = 0 to n-1
C[row][col] += A[row][l] * B[l][col]

Measure the execution time of the matrix multiplication and compare it with the theoretical behaviour of the algorithm.

• Examine the pseudocode and consider which run-time behaviour over increasing problem sizes n you expect.
• Implement the matrix multiplication pseudocode.
• Time your code over increasing matrix sizes n and compare the results with your initial expectation.
Now investigate the influence of the order of memory access onto the timing results. If you carefully consider the matrix algorithm pseudocode above you should notice that the nesting of the loops can be changed arbitrarily. So instead of row/col/l it can be col/row/l, l/col/row, etc... Each different loop nesting creates a different memory access order.
•  Create additional matrix multiplication functions for all permutations of the row, column and l loop.
• For each of the permutations measure their run-time over a range of matrix sizes n and compare the results. Which nesting is best? What can you conclude from this about the way matrices are stored?

Submit a single graph that contains the timing results of the 6 possible loop orderings of the loops and briefly discuss the results.