This is the release of the CUDA 13.3 samples, which include additions for CUDA Tile C++, and updated CCCL and Python samples.
cubDeviceSegmentedScan - CUB DeviceSegmentedScan
Description
This sample demonstrates cub::DeviceSegmentedScan. A segmented scan computes an independent scan over each of many contiguous segments in a single device-wide call. Two operations are shown: ExclusiveSegmentedSum across three independent segments, and InclusiveSegmentedScan with a custom binary operator (running maximum via cuda::maximum<>).
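To make the semantics of the two operations concrete, here is a minimal host-side reference sketch in plain C++. It does not call CUB and is not the sample's code. The function names, the `int` element type, and the single offsets array with `num_segments + 1` entries are assumptions chosen for illustration; the actual `cub::DeviceSegmentedScan` signatures should be taken from the CCCL documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, for illustration only.
// Assumed layout: segment s covers indices [offsets[s], offsets[s + 1]).

// Exclusive sum that restarts from 0 at the start of every segment.
std::vector<int> exclusive_segmented_sum(const std::vector<int>& in,
                                         const std::vector<std::size_t>& offsets)
{
    assert(!offsets.empty() && offsets.back() <= in.size());
    std::vector<int> out(in.size());
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s) {
        int running = 0; // sum of the elements before position i in this segment
        for (std::size_t i = offsets[s]; i < offsets[s + 1]; ++i) {
            out[i] = running;
            running += in[i];
        }
    }
    return out;
}

// Inclusive scan with a caller-supplied binary operator, restarted per segment.
template <typename Op>
std::vector<int> inclusive_segmented_scan(const std::vector<int>& in,
                                          const std::vector<std::size_t>& offsets,
                                          Op op)
{
    assert(!offsets.empty() && offsets.back() <= in.size());
    std::vector<int> out(in.size());
    for (std::size_t s = 0; s + 1 < offsets.size(); ++s) {
        for (std::size_t i = offsets[s]; i < offsets[s + 1]; ++i) {
            // The first element of a segment is copied; later ones combine
            // the previous output with the current input.
            out[i] = (i == offsets[s]) ? in[i] : op(out[i - 1], in[i]);
        }
    }
    return out;
}
```

With three segments given by offsets `{0, 3, 5, 7}`, each output segment depends only on its own inputs. Passing a max operator such as `[](int a, int b) { return std::max(a, b); }` to `inclusive_segmented_scan` yields a per-segment running maximum, which is the role `cuda::maximum<>` plays in the sample.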
Key Concepts
CUB Device Algorithms, Segmented Scan, Prefix Sum
Supported SM Architectures
SM 7.0, SM 7.5, SM 8.0, SM 8.6, SM 8.9, SM 9.0, SM 10.0, SM 11.0, SM 12.0
Supported OSes
Linux, Windows
Supported CPU Architecture
x86_64, aarch64
CUDA APIs involved
CCCL CUB
cub::DeviceSegmentedScan::ExclusiveSegmentedSum, cub::DeviceSegmentedScan::InclusiveSegmentedScan
CCCL libcu++
cuda::maximum
CUDA Runtime API
cudaDeviceSynchronize, cudaGetDeviceProperties
Dependencies needed to build/run
CCCL 3.3 or later. It is fetched automatically via CPM at configure time (pinned to v3.3.3). To use a local checkout instead, pass -DCCCL_SOURCE_DIR=/path/to/cccl when configuring.
Prerequisites
Download and install the CUDA Toolkit for your platform. Make sure the dependencies listed in the Dependencies section above are installed.