globalToShmemAsyncCopy
--std=c++11
cudaStreamCreateWithFlags
cudaMalloc
cudaDeviceGetAttribute
cudaFree
cudaMallocHost
cudaEventSynchronize
cudaEventRecord
cudaFreeHost
cudaStreamSynchronize
cudaEventDestroy
cudaEventElapsedTime
cudaMemsetAsync
cudaMemcpyAsync
cudaEventCreate
whole
./
../
../../../Common
CUDA Runtime API
Linear Algebra
CPP11 CUDA
CUDA
matrix multiply
Async copy
CPP11
GCC 5.1.0
true
globalToShmemAsyncCopy.cu
CPP11
1:CUDA Basic Topics
3:Linear Algebra
sm70
sm72
sm75
sm80
sm86
sm87
sm89
sm90
x86_64
linux
x86_64
macosx
arm
sbsa
ppc64le
linux
aarch64
linux
aarch64
qnx
windows7
7.0
Global Memory to Shared Memory Async Copy