shfl_scan
--std=c++11
-O3
cudaMemcpy
cudaFree
cudaMallocHost
cudaEventSynchronize
cudaEventRecord
cudaFreeHost
cudaGetDevice
cudaMemset
cudaMalloc
cudaEventElapsedTime
cudaGetDeviceProperties
cudaEventCreate
whole
./
../
../../../Common
Data-Parallel Algorithms
Performance Strategies
GPGPU
CPP11
CUDA
scan
parallel prefix sum
Data-Parallel Algorithms
true
shfl_scan.cu
CPP11
1:CUDA Advanced Topics
1:Data-Parallel Algorithms
1:Performance Strategies
x86_64
linux
windows7
x86_64
macosx
arm
aarch64
sbsa
ppc64le
linux
3.5
CUDA Parallel Prefix Sum with Shuffle Intrinsics (SHFL_Scan)
exe