This is the release of the CUDA 13.3 samples, which include additions for CUDA Tile C++, and updated CCCL and Python samples.
cudaComputeLambdas (Python)
Description
This sample demonstrates how cuda.compute (from the
cuda-cccl package) accepts plain Python callables, including
lambdas, as the operators that drive device-wide reductions,
transforms, and scans. Internally, cuda.compute JIT-compiles the
callable through Numba for the GPU, so you can iterate on the
operator in pure Python and still get a fused device-wide kernel.
The sample exercises three algorithm families:
- cuda.compute.reduce_into - sum via lambda a, b: a + b.
- cuda.compute.unary_transform - elementwise y = x*x + 1 via a lambda.
- cuda.compute.inclusive_scan - prefix sum over only the even values, driven by a regular Python function as the binary operator.
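The semantics of the three families can be sketched on the host with the standard library alone. This is not the sample's code and does not touch the GPU; add_evens_only is a hypothetical stand-in for the sample's scan operator, and the inputs are the ones shown under Expected Output.

```python
from functools import reduce
from itertools import accumulate

# Reduction: fold a binary operator over the input, starting from an initial value.
total = reduce(lambda a, b: a + b, range(1, 11), 0)

# Unary transform: apply the operator to each element independently.
square_plus_one = lambda x: x * x + 1
transformed = [square_plus_one(x) for x in range(8)]


# Hypothetical stand-in for the sample's "add evens only" operator:
# odd operands contribute 0, even operands contribute their value.
def add_evens_only(a, b):
    return (a if a % 2 == 0 else 0) + (b if b % 2 == 0 else 0)


# Inclusive scan with initial value 0: out[i] combines the initial value
# with in[0..i]. accumulate() also emits the initial value itself, so the
# first element is dropped.
scan = list(accumulate([1, 2, 3, 4, 5, 6], add_evens_only, initial=0))[1:]

print(total)        # 55
print(transformed)  # [1, 2, 5, 10, 17, 26, 37, 50]
print(scan)         # [0, 2, 2, 6, 6, 12]
```

A host-side reference like this is also a convenient way to produce the "expected" values that the device results are compared against.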
What You'll Learn
- Passing a Python lambda directly as the operator to a cuda.compute device algorithm
- Using a regular Python def function for the same purpose when the op is non-trivial
- The three core algorithm families in cuda.compute: reductions, transforms, and scans
- How cuda.compute auto-compiles the op to LTO-IR via Numba
Key Libraries
- cuda.compute (from the cuda-cccl package) - device algorithms and JIT-compiled Python ops
- cuda.core - device setup
- cupy - device buffers
- numpy - scalar init values and host-side verification
Key APIs
From cuda.compute
- cuda.compute.reduce_into(d_in, d_out, num_items, op, h_init) - device-wide reduction
- cuda.compute.unary_transform(d_in, d_out, num_items, op) - elementwise unary transform
- cuda.compute.inclusive_scan(d_in, d_out, op, init_value, num_items) - inclusive prefix scan
From cuda_samples_utils
- print_gpu_info() - print device name and compute capability
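As a rough illustration of how the cuda.compute calls listed above might be wired together, here is a minimal sketch. It assumes the parameter names are exactly those in the signatures above and passes them by keyword, so it does not depend on positional order. It also assumes the init values are one-element NumPy arrays of the same dtype as the data. The function name run_device_sketch and the add_evens_only operator are hypothetical; the sketch has not been checked against a particular cuda-cccl release, and cudaComputeLambdas.py remains the reference.

```python
def run_device_sketch():
    """Call the three algorithms by keyword (needs a CUDA GPU to actually run)."""
    # Imports are local so that defining this function requires no GPU stack.
    import cupy as cp
    import numpy as np
    import cuda.compute as cc

    dtype = np.int64

    # Reduction: sum of 1..10, seeded with a host-side initial value of 0.
    d_in = cp.arange(1, 11, dtype=dtype)
    d_sum = cp.empty(1, dtype=dtype)
    cc.reduce_into(d_in=d_in, d_out=d_sum, num_items=d_in.size,
                   op=lambda a, b: a + b,
                   h_init=np.array([0], dtype=dtype))

    # Unary transform: y = x*x + 1 for x in 0..7.
    d_x = cp.arange(8, dtype=dtype)
    d_y = cp.empty_like(d_x)
    cc.unary_transform(d_in=d_x, d_out=d_y, num_items=d_x.size,
                       op=lambda x: x * x + 1)

    # Inclusive scan driven by a regular function (hypothetical operator).
    def add_evens_only(a, b):
        return (a if a % 2 == 0 else 0) + (b if b % 2 == 0 else 0)

    d_v = cp.asarray([1, 2, 3, 4, 5, 6], dtype=dtype)
    d_scan = cp.empty_like(d_v)
    cc.inclusive_scan(d_in=d_v, d_out=d_scan, op=add_evens_only,
                      init_value=np.array([0], dtype=dtype),
                      num_items=d_v.size)

    # Copy results back to the host for inspection.
    return int(d_sum.get()[0]), d_y.get().tolist(), d_scan.get().tolist()
```

Calling run_device_sketch() on a machine with the requirements installed should return the reduction result, the transformed values, and the scan output as plain Python values.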
Requirements
Hardware
- NVIDIA GPU with Compute Capability 7.0 or higher
Software
- CUDA Toolkit 13.0 or newer (cuda.compute compiles ops to LTO-IR via Numba, which needs the toolkit's nvvm and libdevice).
- Python 3.10 or newer
- cuda-cccl (>=1.0.0)
- cuda-core (>=1.0.0)
- cupy-cuda13x (>=14.0.0)
- numba-cuda (pulled in transitively by cuda-cccl)
If the CUDA toolkit is not on your PATH, set CUDA_HOME so Numba
can locate libdevice:
export CUDA_HOME=/usr/local/cuda
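To confirm that CUDA_HOME points at a toolkit that actually contains libdevice, a small check along these lines may help. It assumes the usual toolkit layout, with the bitcode under nvvm/libdevice/ inside the toolkit root; find_libdevice is a hypothetical helper, not part of the sample or of Numba.

```python
import os
from pathlib import Path


def find_libdevice(cuda_home):
    """Return libdevice bitcode files found under cuda_home (may be empty)."""
    # Assumption: the toolkit stores libdevice as
    # <cuda_home>/nvvm/libdevice/libdevice*.bc
    return sorted(Path(cuda_home).glob("nvvm/libdevice/libdevice*.bc"))


if __name__ == "__main__":
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home is None:
        print("CUDA_HOME is not set")
    else:
        found = find_libdevice(cuda_home)
        print(found[0] if found else f"no libdevice found under {cuda_home}")
```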
Installation
Install the required packages from requirements.txt:
cd /path/to/cuda-samples/python/2_CoreConcepts/cudaComputeLambdas
pip install -r requirements.txt
The requirements.txt installs:
- cuda-cccl (>=1.0.0) - ships the cuda.compute module
- cuda-core (>=1.0.0)
- cupy-cuda13x (>=14.0.0)
- numpy (>=1.24.0)
How to Run
Basic usage
cd cuda-samples/python/2_CoreConcepts/cudaComputeLambdas
python cudaComputeLambdas.py
With custom parameters
python cudaComputeLambdas.py --device 1
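A flag like --device is typically handled with argparse. The sketch below is one plausible shape, assuming the flag takes an integer device ordinal and defaults to 0; the sample's actual parser, default, and help text may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="cuda.compute lambda operators sample")
    # Assumption: --device is an integer GPU ordinal that defaults to 0.
    parser.add_argument("--device", type=int, default=0,
                        help="CUDA device ordinal to run on")
    return parser.parse_args(argv)


args = parse_args(["--device", "1"])
print(args.device)  # 1
```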
Expected Output
Device: <Your GPU Name>
Compute Capability: <X.Y>
reduce_into(lambda a,b: a+b) over 1..10 -> 55 (expected 55) OK
unary_transform(lambda x: x*x + 1):
got = [1, 2, 5, 10, 17, 26, 37, 50]
expected = [1, 2, 5, 10, 17, 26, 37, 50] OK
inclusive_scan(add-evens-only) over [1,2,3,4,5,6]:
got = [0, 2, 2, 6, 6, 12]
expected = [0, 2, 2, 6, 6, 12] OK
Done
Note: Device name and compute capability will vary based on your GPU.
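The scan line of the output relies on the operator being associative, because a device-wide scan is free to combine partial results in any grouping. The host-only sketch below brute-force checks this for one hypothetical "add evens only" operator (the sample's actual function may differ). It contrasts that operator with a variant that filters only the right operand: the variant gives the same answer when evaluated left to right, but it is not associative.

```python
from itertools import accumulate, product


def add_evens_only(a, b):
    # Hypothetical operator: both operands are filtered, so partial sums stay even.
    return (a if a % 2 == 0 else 0) + (b if b % 2 == 0 else 0)


def add_if_right_even(a, b):
    # Variant that filters only the right operand.
    return a + (b if b % 2 == 0 else 0)


def is_associative(op, values):
    """Check op(op(a, b), c) == op(a, op(b, c)) for every triple in values."""
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(values, repeat=3))


def inclusive_scan(values, op, init):
    """Sequential reference scan; drops the leading init emitted by accumulate."""
    return list(accumulate(values, op, initial=init))[1:]


data = [1, 2, 3, 4, 5, 6]
print(inclusive_scan(data, add_evens_only, 0))        # [0, 2, 2, 6, 6, 12]
print(inclusive_scan(data, add_if_right_even, 0))     # [0, 2, 2, 6, 6, 12]
print(is_associative(add_evens_only, range(13)))      # True
print(is_associative(add_if_right_even, range(13)))   # False
```

For example, with a=0, b=1, c=2 the right-filtering variant yields 2 when grouped as (a, b) then c, but 0 when grouped as a then (b, c), so a parallel evaluation could legitimately disagree with the sequential one.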
Files
- cudaComputeLambdas.py - Python implementation
- README.md - This file
- requirements.txt - Sample dependencies
- ../../Utilities/cuda_samples_utils.py - Common utilities (imported by this sample)