Mirror of https://github.com/NVIDIA/cuda-samples.git, synced 2026-06-04 00:06:52 +08:00.
This is the release of the CUDA 13.3 samples, which include additions for CUDA Tile C++, and updated CCCL and Python samples.
tileTranspose
Description
This sample demonstrates how to transpose a 2D matrix using CUDA Tile C++. Each block handles one n x m chunk of the source matrix: it loads the chunk, transposes it locally, and stores it at the transposed position in the result matrix, so the chunk at chunk-row i, chunk-column j of the source becomes the m x n chunk at chunk-row j, chunk-column i of the result. A cuda::tiles::partition_view is used to model the chunking of the source and result matrices.
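The load, local transpose, and store steps described above can be illustrated with a small host-only sketch. This is plain C++ rather than CUDA Tile C++, and it does not use cuda::tiles::partition_view; the function name `tiled_transpose` and the tile sizes `kTileRows` and `kTileCols` are hypothetical choices made for illustration, not names from the sample. Each iteration of the two outer loops stands in for the work one block would do, assuming row-major storage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile dimensions: the "n x m" chunk handled by one block.
constexpr std::size_t kTileRows = 4;
constexpr std::size_t kTileCols = 8;

// Transposes a row-major rows x cols matrix into a row-major cols x rows
// matrix, one chunk at a time. Illustrative CPU sketch only.
std::vector<float> tiled_transpose(const std::vector<float>& src,
                                   std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<float> dst(rows * cols);

    for (std::size_t r0 = 0; r0 < rows; r0 += kTileRows) {
        for (std::size_t c0 = 0; c0 < cols; c0 += kTileCols) {
            // Clamp the chunk where it overhangs the matrix edge.
            const std::size_t h = std::min(kTileRows, rows - r0);
            const std::size_t w = std::min(kTileCols, cols - c0);

            // 1. Load the chunk into a local buffer.
            float tile[kTileRows][kTileCols];
            for (std::size_t i = 0; i < h; ++i)
                for (std::size_t j = 0; j < w; ++j)
                    tile[i][j] = src[(r0 + i) * cols + (c0 + j)];

            // 2. Transpose the chunk locally.
            float tile_t[kTileCols][kTileRows];
            for (std::size_t i = 0; i < h; ++i)
                for (std::size_t j = 0; j < w; ++j)
                    tile_t[j][i] = tile[i][j];

            // 3. Store at the transposed position: the chunk that started at
            //    (r0, c0) in src starts at (c0, r0) in dst, whose row length
            //    is `rows`.
            for (std::size_t j = 0; j < w; ++j)
                for (std::size_t i = 0; i < h; ++i)
                    dst[(c0 + j) * rows + (r0 + i)] = tile_t[j][i];
        }
    }
    return dst;
}
```

Calling `tiled_transpose(a, 5, 10)` on a 5 x 10 matrix exercises both full and partial chunks, since neither dimension is a multiple of the assumed tile size. On a GPU the two outer loops would be replaced by the block grid, and the local buffers by whatever tile storage the Tile C++ API provides.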
Expected Output
Success! Matrix transpose matches expected results.
Prerequisites
- CUDA Toolkit version 13.3 or later.
- CUDA Driver version 580 or later.
- Host compiler with C++20 support.