Name: reductionMultiBlockCG
Compiler flags: --std=c++11
CUDA API calls: cudaMemcpy, cudaFree, cudaSetDevice, cudaDeviceSynchronize, cudaLaunchCooperativeKernel, cudaMalloc, cudaOccupancyMaxActiveBlocksPerMultiprocessor, cudaGetDeviceProperties, cudaOccupancyMaxPotentialBlockSize
Include paths: ./, ../, ../../../Common
Key concepts: Cooperative Groups, MultiBlock Cooperative Groups
Keywords: GPGPU, CPP11
true
Primary file: reductionMultiBlockCG.cu
Required dependencies: MBCG, CPP11
Scope: 1:CUDA Advanced Topics
SM architectures: sm60, sm61, sm70, sm72, sm75, sm80, sm86, sm87, sm90
Supported environments: x86_64 linux, ppc64le linux, windows7, aarch64, sbsa
Minimum compute capability: 6.0
Title: Reduction using MultiBlock Cooperative Groups
Type: exe