shfl_scan
--std=c++11
-O3
cudaMemset
cudaFree
cudaEventRecord
cudaMallocHost
cudaEventCreate
cudaEventElapsedTime
cudaEventSynchronize
cudaFreeHost
cudaMalloc
cudaMemcpy
cudaGetDeviceProperties
cudaGetDevice
whole
./
../
../../../Common
Data-Parallel Algorithms
Performance Strategies
GPGPU
CPP11
CUDA
scan
parallel prefix sum
Data-Parallel Algorithms
true
shfl_scan.cu
CPP11
1:CUDA Advanced Topics
1:Data-Parallel Algorithms
1:Performance Strategies
sm35
sm37
sm50
sm52
sm53
sm60
sm61
sm70
sm72
sm75
sm80
sm86
sm87
x86_64
linux
windows7
x86_64
macosx
arm
aarch64
sbsa
ppc64le
linux
3.5
CUDA Parallel Prefix Sum with Shuffle Intrinsics (SHFL_Scan)
exe