libcuxxMdspan - libcu++ mdspan Interop (DLPack + shared_memory_mdspan)
Description
This sample demonstrates two mdspan-centric features in CCCL. The first is DLPack <-> cuda::std::mdspan bridging via cuda::to_device_mdspan / cuda::to_dlpack_tensor (DLPack is the tensor-interchange protocol used by PyTorch, JAX, CuPy, and others). The second is cuda::shared_memory_mdspan, which provides multi-dimensional views of shared-memory tiles with address-space-safe accessors. The sample builds a small matrix, wraps it in a DLTensor, converts it to a device_mdspan, scales it row-wise, and transposes it through a shared_memory_mdspan tile. It then converts the output mdspan back to DLPack and prints its metadata.
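To make the data flow concrete, here is a minimal host-only sketch of the two numeric steps (row-wise scaling, then a tiled transpose). It is not the sample's code and does not use the CCCL or DLPack APIs: `View2D`, `scale_rows`, `transpose_tiled`, and `TILE` are hypothetical names, `View2D` is only a stand-in for a row-major 2-D mdspan, and the small local `tile` array stands in for the shared-memory tile a kernel would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a row-major 2-D view (NOT cuda::std::mdspan).
struct View2D {
    float* data;
    std::size_t rows;
    std::size_t cols;
    float& operator()(std::size_t r, std::size_t c) const {
        return data[r * cols + c];
    }
};

// Multiply every element of row r by factors[r].
inline void scale_rows(View2D m, const std::vector<float>& factors) {
    assert(factors.size() == m.rows);
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            m(r, c) *= factors[r];
}

// Tile edge length; an assumption chosen small for illustration.
constexpr std::size_t TILE = 2;

// Transpose `in` into `out`, staging each TILE x TILE block through a
// local buffer, analogous to staging a tile in shared memory on the GPU.
inline void transpose_tiled(View2D in, View2D out) {
    assert(out.rows == in.cols && out.cols == in.rows);
    float tile[TILE][TILE];
    for (std::size_t r0 = 0; r0 < in.rows; r0 += TILE) {
        for (std::size_t c0 = 0; c0 < in.cols; c0 += TILE) {
            // Clamp the block so partial tiles at the edges are handled.
            const std::size_t rmax = std::min(TILE, in.rows - r0);
            const std::size_t cmax = std::min(TILE, in.cols - c0);
            // Load the block into the staging tile.
            for (std::size_t i = 0; i < rmax; ++i)
                for (std::size_t j = 0; j < cmax; ++j)
                    tile[i][j] = in(r0 + i, c0 + j);
            // Store the block with row and column indices swapped.
            for (std::size_t i = 0; i < rmax; ++i)
                for (std::size_t j = 0; j < cmax; ++j)
                    out(c0 + j, r0 + i) = tile[i][j];
        }
    }
}
```

For a 2x3 input, the output view must be 3x2 and backed by a separate buffer. A host reference like this can be handy for checking the device result element by element.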
Key Concepts
CCCL 3.3, libcu++ mdspan, DLPack Interoperability, Shared Memory Views
Supported SM Architectures
SM 7.0 SM 7.5 SM 8.0 SM 8.6 SM 8.9 SM 9.0 SM 10.0 SM 11.0 SM 12.0
Supported OSes
Linux, Windows
Supported CPU Architecture
x86_64, aarch64
CUDA APIs involved
CCCL libcu++
cuda::to_device_mdspan, cuda::to_dlpack_tensor, cuda::device_mdspan, cuda::shared_memory_mdspan, cuda::std::mdspan
CUDA Runtime API
cudaMalloc, cudaFree, cudaMemcpy, cudaMemset, cudaDeviceSynchronize, cudaGetDeviceProperties
Dependencies needed to build/run
CCCL 3.3+, DLPack 1.2+. Both are fetched automatically via CPM at configure time (pinned to v3.3.3 and v1.3, respectively). To use local checkouts instead, override with -DCCCL_SOURCE_DIR=/path/to/cccl and -DDLPACK_SOURCE_DIR=/path/to/dlpack.
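As a rough illustration, a configure-and-build run with both overrides might look like the following. The two -D options come from the text above; the source directory (`.`), the build directory name (`build`), and the use of `cmake --build` are assumptions about the project layout, and the paths are placeholders.

```shell
# Configure against local checkouts instead of the CPM-fetched copies.
# /path/to/cccl and /path/to/dlpack are placeholders for your own clones.
cmake -S . -B build \
  -DCCCL_SOURCE_DIR=/path/to/cccl \
  -DDLPACK_SOURCE_DIR=/path/to/dlpack

# Build the sample (build directory name is an assumption).
cmake --build build
```

Omitting both options should fall back to the pinned versions fetched at configure time.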
Prerequisites
Download and install the CUDA Toolkit for your platform. Make sure the dependencies listed in the Dependencies section above are available.
References (for more details)
CCCL 3.3 release notes, cuda::to_device_mdspan header, cuda::shared_memory_mdspan docs, DLPack specification