Name: matrixMul
CUDA API calls: cudaEventCreate, cudaEventRecord, cudaEventQuery, cudaEventDestroy, cudaEventElapsedTime, cudaEventSynchronize, cudaMalloc, cudaFree, cudaMemcpy
Device compilation: whole
Include paths: ./, ../, ../../common/inc
Key concepts: CUDA Runtime API, Linear Algebra
Keywords: CUDA, matrix multiply
Nsight Eclipse: true
Primary file: matrixMul.cu
Scopes: 1:CUDA Basic Topics, 3:Linear Algebra
SM architectures: sm35, sm37, sm50, sm52, sm60, sm61, sm70, sm72, sm75, sm80, sm86
Supported environments: x86_64 linux; windows7; x86_64 macosx; arm; aarch64; ppc64le linux
Supported SM architectures: all
Title: Matrix Multiplication (CUDA Runtime API Version)