tf32TensorCoreGemm
--std=c++11
cudaMalloc
cudaDeviceSynchronize
cudaFuncSetAttribute
cudaEventCreate
cudaEventRecord
cudaEventSynchronize
cudaEventElapsedTime
cudaFree
whole
./
../
../../common/inc
Matrix Multiply
WMMA
Tensor Cores
matrix multiply
Async copy
CPP11
GCC 5.0.0
true
tf32TensorCoreGemm.cu
1:CUDA Basic Topics
sm80
sm86
x86_64
linux
aarch64
windows7
ppc64le
linux
8.0
tf32 Tensor Core GEMM
exe