cubDeviceTransform - CUB DeviceTransform N-to-M
Description
This sample demonstrates cub::DeviceTransform in its N-input / M-output form. A single device-wide call reads from N input sequences and writes to M output sequences. For each index, a user-provided op receives one element from every input and returns a cuda::std::tuple of M values, one per output. Two cases are shown: N=3 inputs producing 1 output, and N=2 inputs producing 2 outputs (sum and difference in one fused pass).
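As a rough illustration of the semantics only, the N=2, M=2 case can be sketched on the host in plain C++. This is not the CUB call itself: SumDiff, transform_2_to_2, and demo are hypothetical names, std::tuple stands in for cuda::std::tuple, and the sequential loop is a CPU reference for what the device-wide pass computes per element.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical op: takes one element from each of the two inputs and returns
// both results as a tuple (std::tuple here; the sample uses cuda::std::tuple).
struct SumDiff
{
  std::tuple<int, int> operator()(int a, int b) const
  {
    return std::make_tuple(a + b, a - b);
  }
};

// CPU reference of the fused pass: each input element is read once, and both
// outputs are written in the same loop iteration.
// Assumes all four sequences have the same length.
template <typename Op>
void transform_2_to_2(const std::vector<int>& a,
                      const std::vector<int>& b,
                      std::vector<int>& out0,
                      std::vector<int>& out1,
                      Op op)
{
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    const std::tuple<int, int> r = op(a[i], b[i]);
    out0[i] = std::get<0>(r); // first tuple element goes to the first output
    out1[i] = std::get<1>(r); // second tuple element goes to the second output
  }
}

// Small self-check with hand-computed expected values.
void demo()
{
  const std::vector<int> a{5, 7, 9};
  const std::vector<int> b{1, 2, 3};
  std::vector<int> sum(a.size()), diff(a.size());

  transform_2_to_2(a, b, sum, diff, SumDiff{});

  assert((sum == std::vector<int>{6, 9, 12}));
  assert((diff == std::vector<int>{4, 5, 6}));
}
```

A host reference of this kind is also a convenient way to validate the device results: run the same op on the CPU and compare element by element.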
Key Concepts
CCCL 3.3, CUB Device Algorithms, Fused Elementwise Transforms, Counting Iterators
Supported SM Architectures
SM 7.0, SM 7.5, SM 8.0, SM 8.6, SM 8.9, SM 9.0, SM 10.0, SM 11.0, SM 12.0
Supported OSes
Linux, Windows
Supported CPU Architecture
x86_64, aarch64
CUDA APIs involved
CCCL CUB
cub::DeviceTransform::Transform
CCCL libcu++
cuda::counting_iterator, cuda::std::tuple
CUDA Runtime API
cudaDeviceSynchronize, cudaGetDeviceProperties
Dependencies needed to build/run
CCCL 3.3 or newer. It is fetched automatically via CPM at configure time (pinned to v3.3.3). To use a local checkout instead, pass -DCCCL_SOURCE_DIR=/path/to/cccl when configuring.
Prerequisites
Download and install the CUDA Toolkit for your platform. Make sure the dependencies listed in the Dependencies section above are installed.