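simpleAWBarrier: Simple Arrive Wait Barrier

Type: exe
Primary file: simpleAWBarrier.cu
Compiler flags: --std=c++11
CUDA APIs: cudaMalloc, cudaFree, cudaMemcpyAsync
Device compilation: whole
Include paths: ./, ../, ../../Common
Keywords: Arrive Wait Barrier, CUDA, GPGPU, CPP11, GCC 5.1.0
Nsight Eclipse: true
Dependencies: CPP11, MBCG
Scope: 1:CUDA Basic Topics
SM architectures: sm70, sm72, sm75, sm80, sm86 (minimum 7.0)
Supported environments: x86_64 linux, windows7, arm, ppc64le linux, aarch64 linux, aarch64 qnx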