matrixMulCUBLAS
cudaEventCreate cudaEventRecord cudaEventQuery cudaEventDestroy cudaEventElapsedTime cudaMalloc cudaFree cudaMemcpy cublasCreate cublasSgemm
whole
./ ../ ../../Common
CUDA Runtime API
Performance Strategies
Linear Algebra
CUBLAS
CUDA CUBLAS matrix multiply
cublas
true
matrixMulCUBLAS.cpp
CUBLAS
1:CUDA Basic Topics
3:Linear Algebra
sm35 sm37 sm50 sm52 sm60 sm61 sm70 sm72 sm75 sm80 sm86
new/matrixMulCUBLAS.cpp
x86_64 linux windows7 x86_64 macosx arm ppc64le linux
all
Matrix Multiplication (CUBLAS)
exe