Name: threadFenceReduction
CUDA API calls: cudaMemcpy, cudaFree, cudaDeviceSynchronize, cudaMalloc, cudaGetDeviceProperties
Device compilation: whole
Include paths: ./, ../, ../../../Common
Key concepts: Cooperative Groups, Data-Parallel Algorithms, Performance Strategies
Keywords: reduction
Nsight Eclipse: true
Primary file: threadFenceReduction.cu
Scopes: 1:CUDA Advanced Topics, 1:Data-Parallel Algorithms, 1:Performance Strategies
SM architectures: sm50, sm52, sm53, sm60, sm61, sm70, sm72, sm75, sm80, sm86, sm87, sm89, sm90
Supported environments: x86_64 linux, windows7, x86_64 macosx, arm, sbsa, ppc64le linux
Supported SM architectures: all
Title: threadFenceReduction
Type: exe