libcuxxMdspan - libcu++ mdspan Interop (DLPack + shared_memory_mdspan)

Description

This sample demonstrates two mdspan-centric features of CCCL. The first is DLPack <-> cuda::std::mdspan bridging via cuda::to_device_mdspan and cuda::to_dlpack_tensor; DLPack is the tensor-interchange protocol used by PyTorch, JAX, CuPy, and others. The second is cuda::shared_memory_mdspan, which provides multi-dimensional views of shared-memory tiles with address-space-safe accessors. The sample builds a small matrix, wraps it in a DLTensor, converts it to a device_mdspan, scales it row-wise, and transposes it through a shared_memory_mdspan tile. The output mdspan is then converted back to DLPack and its metadata is printed.
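The data flow described above can be sketched on the host in plain C++. This is only an illustration of the steps (describe a buffer by shape and strides, view it in 2-D, scale rows, transpose through a small tile), not the sample's code and not the libcu++ API. TensorDesc, View2D, to_view, scale_rows, transpose_tiled, and run_sketch are hypothetical names introduced here. The local tile array stands in for a shared-memory tile, and the sketch assumes a null stride pointer means a compact row-major layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shape/stride metadata a DLTensor carries.
// The real DLTensor also records device, dtype, and byte_offset.
struct TensorDesc {
    float* data;
    std::int32_t ndim;
    const std::int64_t* shape;
    const std::int64_t* strides;  // in elements; nullptr = compact row-major (sketch assumption)
};

// Hypothetical 2-D strided view, playing the role mdspan plays in the sample.
struct View2D {
    float* data;
    std::int64_t rows, cols;
    std::int64_t row_stride, col_stride;
    float& operator()(std::int64_t r, std::int64_t c) const {
        return data[r * row_stride + c * col_stride];
    }
};

// Build a 2-D view from the descriptor (the "DLTensor -> mdspan" step).
View2D to_view(const TensorDesc& t) {
    assert(t.ndim == 2);
    const std::int64_t rs = t.strides ? t.strides[0] : t.shape[1];
    const std::int64_t cs = t.strides ? t.strides[1] : 1;
    return View2D{t.data, t.shape[0], t.shape[1], rs, cs};
}

// Multiply every element of row r by factors[r].
void scale_rows(View2D v, const std::vector<float>& factors) {
    for (std::int64_t r = 0; r < v.rows; ++r)
        for (std::int64_t c = 0; c < v.cols; ++c)
            v(r, c) *= factors[static_cast<std::size_t>(r)];
}

constexpr std::int64_t TILE = 2;  // tile edge; a real kernel would typically use 16 or 32

// Transpose `in` into `out` one TILE x TILE block at a time.
// `tile` stands in for the shared-memory staging buffer.
void transpose_tiled(View2D in, View2D out) {
    float tile[TILE][TILE] = {};
    for (std::int64_t r0 = 0; r0 < in.rows; r0 += TILE) {
        for (std::int64_t c0 = 0; c0 < in.cols; c0 += TILE) {
            // Load one block, guarding against partial tiles at the edges.
            for (std::int64_t r = 0; r < TILE; ++r)
                for (std::int64_t c = 0; c < TILE; ++c)
                    if (r0 + r < in.rows && c0 + c < in.cols)
                        tile[r][c] = in(r0 + r, c0 + c);
            // Store the block with row and column indices swapped.
            for (std::int64_t c = 0; c < TILE; ++c)
                for (std::int64_t r = 0; r < TILE; ++r)
                    if (r0 + r < in.rows && c0 + c < in.cols)
                        out(c0 + c, r0 + r) = tile[r][c];
        }
    }
}

// Run the whole flow on a 2 x 3 matrix and return the 3 x 2 result buffer.
std::vector<float> run_sketch() {
    std::vector<float> in_buf = {1, 2, 3, 4, 5, 6};
    const std::int64_t shape[2] = {2, 3};
    TensorDesc desc{in_buf.data(), 2, shape, nullptr};

    View2D in = to_view(desc);
    scale_rows(in, {10.0f, 100.0f});

    std::vector<float> out_buf(6, 0.0f);
    View2D out{out_buf.data(), 3, 2, 2, 1};  // compact row-major 3 x 2
    transpose_tiled(in, out);
    return out_buf;
}
```

Calling run_sketch() scales the rows of the 2 x 3 input by 10 and 100 and returns the transposed 3 x 2 buffer in row-major order. In the actual sample the same steps run on the device, with the libcu++ types taking the place of these hand-written helpers.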

Key Concepts

CCCL 3.3, libcu++ mdspan, DLPack Interoperability, Shared Memory Views

Supported SM Architectures

SM 7.0, SM 7.5, SM 8.0, SM 8.6, SM 8.9, SM 9.0, SM 10.0, SM 11.0, SM 12.0

Supported OSes

Linux, Windows

Supported CPU Architecture

x86_64, aarch64

CUDA APIs involved

CCCL libcu++

cuda::to_device_mdspan, cuda::to_dlpack_tensor, cuda::device_mdspan, cuda::shared_memory_mdspan, cuda::std::mdspan

CUDA Runtime API

cudaMalloc, cudaFree, cudaMemcpy, cudaMemset, cudaDeviceSynchronize, cudaGetDeviceProperties

Dependencies needed to build/run

CCCL 3.3+, DLPack 1.2+. Both are fetched automatically via CPM at configure time (pinned to v3.3.3 and v1.3 respectively). To use local checkouts instead, pass -DCCCL_SOURCE_DIR=/path/to/cccl and -DDLPACK_SOURCE_DIR=/path/to/dlpack when configuring.

Prerequisites

Download and install the CUDA Toolkit for your platform. Make sure the dependencies listed in the Dependencies section above are available.

References (for more details)

CCCL 3.3 release notes, cuda::to_device_mdspan header, cuda::shared_memory_mdspan docs, DLPack specification