cubDeviceTransform - CUB DeviceTransform N-to-M

Description

This sample demonstrates cub::DeviceTransform in its N-input / M-output form. A single device-wide call reads from N input sequences and writes to M output sequences, driven by a user-provided op that returns a cuda::std::tuple of M values. Two cases are shown: N=3 inputs producing 1 output, and N=2 inputs producing 2 outputs (sum and difference in one fused pass).
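To make the N-to-M semantics concrete, here is a minimal host-side sketch of what one fused pass computes. It is a plain C++17 CPU reference and does not call cub::DeviceTransform. The names reference_transform, sum_and_difference, and multiply_add are hypothetical, std::tuple stands in for cuda::std::tuple, and the a*b+c operation used for the 3-input case is an assumption, since the description does not say which operation the sample applies there.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical CPU reference, NOT the CUB API: for each index i, read one
// element from every input sequence, call op once, and write each member of
// the returned tuple to the matching output sequence.
template <typename InTuple, typename OutTuple, typename Op>
void reference_transform(InTuple ins, OutTuple outs, std::size_t n, Op op)
{
    for (std::size_t i = 0; i < n; ++i) {
        // Gather the i-th element of each of the N inputs and apply op.
        auto result = std::apply(
            [&](const auto&... in) { return op(in[i]...); }, ins);
        // Scatter the M tuple members to the M outputs.
        std::apply(
            [&](auto&... out) {
                std::apply([&](const auto&... v) { ((out[i] = v), ...); },
                           result);
            },
            outs);
    }
}

// N=2 inputs, M=2 outputs: sum and difference in one pass.
inline void sum_and_difference(const std::vector<int>& a,
                               const std::vector<int>& b,
                               std::vector<int>& sum,
                               std::vector<int>& diff)
{
    assert(a.size() == b.size());
    sum.resize(a.size());
    diff.resize(a.size());
    reference_transform(
        std::tie(a, b), std::tie(sum, diff), a.size(),
        [](int x, int y) { return std::make_tuple(x + y, x - y); });
}

// N=3 inputs, M=1 output. The a*b+c operation is an assumed example.
inline void multiply_add(const std::vector<int>& a,
                         const std::vector<int>& b,
                         const std::vector<int>& c,
                         std::vector<int>& out)
{
    assert(a.size() == b.size() && b.size() == c.size());
    out.resize(a.size());
    reference_transform(
        std::tie(a, b, c), std::tie(out), a.size(),
        [](int x, int y, int z) { return std::make_tuple(x * y + z); });
}
```

The point of the fused form is that both outputs of sum_and_difference come from a single read of each input element; two separate transforms would read a and b twice. On the device, the sample expresses the same idea through cub::DeviceTransform rather than a host loop.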

Key Concepts

CCCL 3.3, CUB Device Algorithms, Fused Elementwise Transforms, Counting Iterators

Supported SM Architectures

SM 7.0 SM 7.5 SM 8.0 SM 8.6 SM 8.9 SM 9.0 SM 10.0 SM 11.0 SM 12.0

Supported OSes

Linux, Windows

Supported CPU Architecture

x86_64, aarch64

CUDA APIs involved

CCCL CUB

cub::DeviceTransform::Transform

CCCL libcu++

cuda::counting_iterator, cuda::std::tuple

CUDA Runtime API

cudaDeviceSynchronize, cudaGetDeviceProperties

Dependencies needed to build/run

CCCL 3.3 or newer, fetched automatically via CPM at configure time (pinned to v3.3.3). To use a local checkout instead, pass -DCCCL_SOURCE_DIR=/path/to/cccl.

Prerequisites

Download and install the CUDA Toolkit for your platform. Make sure the dependencies listed in the "Dependencies needed to build/run" section above are available.

References (for more details)

CCCL 3.3 release notes, cub::DeviceTransform header