globalToShmemAsyncCopy
--std=c++11
cudaFree
cudaEventRecord
cudaMallocHost
cudaEventCreate
cudaMemsetAsync
cudaEventElapsedTime
cudaEventSynchronize
cudaDeviceGetAttribute
cudaFreeHost
cudaMalloc
cudaStreamCreateWithFlags
cudaEventDestroy
cudaStreamSynchronize
cudaMemcpyAsync
whole
./
../
../../../Common
CUDA Runtime API
Linear Algebra
CPP11 CUDA
CUDA
matrix multiply
Async copy
CPP11
GCC 5.1.0
true
globalToShmemAsyncCopy.cu
CPP11
1:CUDA Basic Topics
3:Linear Algebra
sm70
sm72
sm75
sm80
sm86
sm87
x86_64
linux
x86_64
macosx
arm
sbsa
ppc64le
linux
aarch64
linux
aarch64
qnx
windows7
7.0
Global Memory to Shared Memory Async Copy