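Sample: simpleAWBarrier
Compiler flags: --std=c++11
CUDA APIs: cudaMalloc, cudaFree, cudaMemcpyAsync
Device compilation: whole
Include paths: ./, ../, ../../common/inc
Key concepts: Arrive Wait Barrier
Keywords: CUDA, GPGPU, CPP11, GCC 5.0.0
true
Primary file: simpleAWBarrier.cu
Dependencies: CPP11, MBCG
Scope: 1:CUDA Basic Topics
SM architectures: sm70, sm72, sm75, sm80
Supported environments: x86_64 linux, windows7, arm, ppc64le linux, aarch64 linux, aarch64 qnx
7.0
Title: Simple Arrive Wait Barrier
Type: exe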