Name: transpose
CUDA API calls: cudaFree, cudaEventRecord, cudaEventCreate, cudaEventElapsedTime, cudaEventSynchronize, cudaMalloc, cudaEventDestroy, cudaGetLastError, cudaMemcpy, cudaGetDeviceProperties, cudaGetDevice
Device compilation: whole
Include paths: ./, ../, ../../../Common
Key concepts: Performance Strategies, Linear Algebra
Keywords: GPGPU, matrix transpose
Nsight Eclipse: true
Primary file: transpose.cu
Scopes: 1:CUDA Advanced Topics, 1:Performance Strategies, 3:Linear Algebra
SM architectures: sm35, sm37, sm50, sm52, sm53, sm60, sm61, sm70, sm72, sm75, sm80, sm86, sm87
Supported environments: x86_64 linux, windows7, x86_64 macosx, arm, sbsa, ppc64le linux
Supported SM architectures: all
Title: Matrix Transpose
Type: exe
Whitepaper: doc\MatrixTranspose.pdf