cuda-samples/cpp/3_CUDA_Features/warpAggregatedAtomicsCG
Dheemanth aeab82ff30
CUDA 13.2 samples update (#432)
- Added Python samples for CUDA Python 1.0 release
- Renamed top-level `Samples` directory to `cpp` to accommodate Python samples.
2026-05-13 17:13:18 -05:00

warpAggregatedAtomicsCG - Warp Aggregated Atomics using Cooperative Groups

Description

This sample demonstrates how to use Cooperative Groups (CG) to perform warp-aggregated atomics on single and multiple counters, a useful technique for improving performance when many threads atomically add to one or more counters.
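
The single-counter case can be sketched roughly as follows. This is a minimal illustration of the technique, not the sample's source: the names `atomicAggInc`, `filterPositive`, and `compactPositives` are chosen here for illustration, and error checking is omitted for brevity. The threads of a warp that are active at the call site form a coalesced group, one leader thread issues a single `atomicAdd` for the whole group, and the returned base offset is broadcast so that each thread can derive its own slot.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

namespace cg = cooperative_groups;

// Warp-aggregated increment (illustrative name): returns a unique index per
// calling thread while issuing only one atomicAdd per group of active threads.
__device__ int atomicAggInc(int *counter)
{
    // Threads of the warp that reached this point together.
    cg::coalesced_group active = cg::coalesced_threads();

    // The leader reserves one slot per active thread with a single atomic.
    int base = 0;
    if (active.thread_rank() == 0) {
        base = atomicAdd(counter, static_cast<int>(active.size()));
    }

    // Broadcast the leader's base offset to the rest of the group.
    base = active.shfl(base, 0);

    // Each thread takes its own slot inside the reserved range.
    return base + static_cast<int>(active.thread_rank());
}

// Hypothetical kernel: copies the positive elements of src into dst and
// counts them. The output order is not guaranteed to match the input order.
__global__ void filterPositive(int *dst, int *count, const int *src, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && src[i] > 0) {
        dst[atomicAggInc(count)] = src[i];
    }
}

// Hypothetical host helper: runs the filter and returns the number of
// positive elements. Error checking is omitted to keep the sketch short.
int compactPositives(const std::vector<int> &h_src, std::vector<int> &h_dst)
{
    const int n = static_cast<int>(h_src.size());
    h_dst.clear();
    if (n == 0) {
        return 0;
    }

    int *d_src = nullptr, *d_dst = nullptr, *d_count = nullptr;
    cudaMalloc(&d_src, n * sizeof(int));
    cudaMalloc(&d_dst, n * sizeof(int));
    cudaMalloc(&d_count, sizeof(int));
    cudaMemcpy(d_src, h_src.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(int));

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    filterPositive<<<blocks, threads>>>(d_dst, d_count, d_src, n);

    int h_count = 0;
    cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
    h_dst.resize(h_count);
    if (h_count > 0) {
        cudaMemcpy(h_dst.data(), d_dst, h_count * sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_src);
    cudaFree(d_dst);
    cudaFree(d_count);
    return h_count;
}

int main()
{
    // Alternate negative and positive values so that roughly half pass the filter.
    std::vector<int> src(1 << 20);
    for (size_t i = 0; i < src.size(); ++i) {
        src[i] = (i % 2 == 0) ? -1 : 1;
    }

    std::vector<int> dst;
    int kept = compactPositives(src, dst);
    printf("kept %d of %zu elements\n", kept, src.size());
    return 0;
}
```

Compared with having every thread call `atomicAdd` on the counter directly, this reduces the number of atomic operations on the counter from one per thread to one per group of active threads, which is where the performance benefit comes from when contention is high.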

Key Concepts

Cooperative Groups, Atomic Intrinsics

Supported SM Architectures

Supported OSes

Linux, Windows

Supported CPU Architecture

x86_64, armv7l, aarch64

CUDA APIs involved

CUDA Runtime API

cudaMemcpy, cudaFree, cudaDeviceGetAttribute, cudaMemset, cudaMalloc

Prerequisites

Download and install the CUDA Toolkit for your corresponding platform.

References (for more details)