Dheemanth aeab82ff30
CUDA 13.2 samples update (#432)
- Added Python samples for CUDA Python 1.0 release
- Renamed top-level `Samples` directory to `cpp` to accommodate Python samples.
2026-05-13 17:13:18 -05:00

Sample: Vector Addition (Python)

Description

Run your first GPU kernel: add two vectors element-wise on the GPU using the cuda.core API with runtime compilation.

What You'll Learn

  • Writing CUDA kernels in C++ with template support
  • Runtime compilation of CUDA kernels from Python
  • Using cuda.core for device management, programs, and launches
  • Configuring and launching kernels with grid and block dimensions
  • Using CuPy for GPU memory management
  • Verifying GPU results against CPU computation

Key Libraries

  • cuda.core — Pythonic access to CUDA runtime and compilation
  • cupy — GPU array library for Python

Key APIs

From cuda.core

  • Device — Initialize and manage CUDA device
  • Program — Create program from kernel source code
  • ProgramOptions — Set compilation options (C++ standard, architecture)
  • LaunchConfig — Configure kernel launch parameters
  • launch — Execute kernel on specified stream

Import these symbols from the top-level cuda.core package rather than from cuda.core.experimental. See the cuda.core documentation for details.
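The cuda.core APIs listed above usually combine in the order device, program, launch configuration, launch. The sketch below follows the pattern of the public cuda.core vector-add example; it is not copied from vectorAdd.py. The helper name `run_vector_add`, the kernel body, and the C++17 option are assumptions made for illustration. Running it needs an NVIDIA GPU with cuda-core and CuPy installed, so the GPU imports are deferred until the function is called.

```python
# Illustrative kernel source; the kernel shipped with the sample may differ.
KERNEL_SOURCE = r"""
template <typename T>
__global__ void vectorAdd(const T* a, const T* b, T* c, size_t n) {
    const size_t start = threadIdx.x + (size_t)blockIdx.x * blockDim.x;
    const size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = start; i < n; i += stride) {
        c[i] = a[i] + b[i];
    }
}
"""
KERNEL_NAME = "vectorAdd<float>"


def run_vector_add(size=50000, block=256):
    """Compile and launch the kernel. Requires a GPU, cuda-core, and CuPy."""
    import cupy as cp
    from cuda.core import Device, LaunchConfig, Program, ProgramOptions, launch

    dev = Device()
    dev.set_current()
    stream = dev.create_stream()

    # Compile for the architecture of the current device.
    major, minor = dev.compute_capability
    options = ProgramOptions(std="c++17", arch=f"sm_{major}{minor}")
    program = Program(KERNEL_SOURCE, code_type="c++", options=options)
    # name_expressions asks the compiler to instantiate the template.
    module = program.compile("cubin", name_expressions=(KERNEL_NAME,))
    kernel = module.get_kernel(KERNEL_NAME)

    a = cp.random.rand(size).astype(cp.float32)
    b = cp.random.rand(size).astype(cp.float32)
    c = cp.empty(size, dtype=cp.float32)
    # CuPy works on its own stream, so wait for the inputs to be ready.
    dev.sync()

    grid = (size + block - 1) // block
    config = LaunchConfig(grid=grid, block=block)
    launch(stream, config, kernel, a.data.ptr, b.data.ptr, c.data.ptr, cp.uint64(size))
    stream.sync()

    return bool(cp.allclose(c, a + b))
```

On a machine that meets the requirements, `run_vector_add()` should return `True`. If the import from `cuda.core` fails on an older cuda-core release, the same names may only be available under `cuda.core.experimental`.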

From CuPy

  • cp.random.rand() — Generate random arrays on GPU
  • cp.empty() — Allocate uninitialized GPU arrays
  • cp.allclose() — Verify results with tolerance

From cuda_samples_utils

  • verify_array_result() — Verify computation results
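Verifying "with tolerance" matters because float32 results from the GPU can differ from a CPU reference in the last few bits. The signature of `verify_array_result()` is not shown here, so the sketch below does not call it. Instead it reimplements, in plain Python, the element-wise rule that NumPy and CuPy document for `allclose`. The name `allclose_py` is hypothetical.

```python
def allclose_py(actual, expected, rtol=1e-5, atol=1e-8):
    """Return True if every pair satisfies |a - e| <= atol + rtol * |e|."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


host_a = [0.1, 0.2, 0.3]
host_b = [0.4, 0.5, 0.6]
expected = [x + y for x, y in zip(host_a, host_b)]

# Simulate a result that differs only by a tiny relative rounding error.
rounded = [e * (1 + 1e-7) for e in expected]
# Simulate a genuinely wrong first element.
wrong = [expected[0] + 1e-3] + expected[1:]

print(allclose_py(rounded, expected))
print(allclose_py(wrong, expected))
```

The first check passes because the error is well inside the relative tolerance; the second fails because an error of 1e-3 is far outside it.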

Kernel Techniques

  • 1D Grid-Stride Loop — Handle arbitrary array sizes with fixed grid
  • Template Programming — Generic kernel for different data types
  • Bounds Checking — Prevent out-of-bounds memory access
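To make the grid-stride loop and its built-in bounds check concrete, here is a small CPU-only emulation in Python. It is a teaching sketch, not the sample's kernel: the function names are hypothetical, and the nested loops stand in for the blocks and threads that the GPU would run in parallel.

```python
def grid_stride_indices(n, grid_dim, block_dim):
    """Yield (global_thread_id, element_index) pairs in grid-stride order."""
    stride = grid_dim * block_dim  # total number of threads in the grid
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            tid = block_idx * block_dim + thread_idx
            # The loop bound i < n is the bounds check: threads with
            # tid >= n do no work, and no thread steps past the end.
            for i in range(tid, n, stride):
                yield tid, i


def emulate_vector_add(a, b, grid_dim, block_dim):
    """Add two equal-length lists the way a grid-stride kernel would."""
    c = [None] * len(a)
    for _, i in grid_stride_indices(len(a), grid_dim, block_dim):
        c[i] = a[i] + b[i]
    return c


# Fewer threads (2 * 64 = 128) than elements (1000): each thread loops.
visited = sorted(i for _, i in grid_stride_indices(1000, 2, 64))
print(visited == list(range(1000)))

# More threads (4 * 2 = 8) than elements (3): the extra threads stay idle.
print(emulate_vector_add([1, 2, 3], [10, 20, 30], 4, 2))
```

Both cases visit each element exactly once, which is why a grid-stride kernel works for any array size regardless of the grid chosen at launch.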

Requirements

Hardware

  • NVIDIA GPU with Compute Capability 7.0 or higher
  • Minimum GPU memory: 512 MB

Software

  • CUDA Toolkit 13.0 or newer (matches cuda-python 13.x)
  • Python 3.10 or newer
  • cuda-python (>=13.0.0)
  • cuda-core (>=0.6.0)
  • cupy-cuda13x (>=13.0.0)

Installation

Install the required packages from requirements.txt:

cd /path/to/cuda-samples/python/1_GettingStarted/vectorAdd
pip install -r requirements.txt

The requirements.txt installs:

  • cuda-python (>=13.0.0)
  • cuda-core (>=0.6.0)
  • cupy-cuda13x (>=13.0.0)
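After installing, a quick check of the environment can save a confusing import error later. The following is a minimal sketch that uses only the standard library; the helper names are hypothetical, and it reports installed versions without comparing them against the minimums above.

```python
import sys
from importlib import metadata

REQUIRED = ("cuda-python", "cuda-core", "cupy-cuda13x")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def python_ok(minimum=(3, 10)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


print("Python >= 3.10:", python_ok())
for package, version in installed_versions().items():
    print(f"{package}: {version or 'not installed'}")
```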

How to Run

Basic usage

cd /path/to/cuda-samples/python/1_GettingStarted/vectorAdd
python vectorAdd.py

With custom parameters

# Custom vector size
python vectorAdd.py --elements 1000000

# Use specific GPU
python vectorAdd.py --device 1

# Skip verification for benchmarking
python vectorAdd.py --no-verify
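The three flags above could be declared with `argparse` roughly as follows. This is a guess at the shape of the parser, not the code in vectorAdd.py: the default of 50000 elements is taken from the expected output, while the default device 0 and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown above (hypothetical defaults)."""
    parser = argparse.ArgumentParser(description="Vector addition using cuda.core")
    parser.add_argument("--elements", type=int, default=50000,
                        help="number of elements in each vector")
    parser.add_argument("--device", type=int, default=0,
                        help="index of the GPU to use")
    parser.add_argument("--no-verify", action="store_true",
                        help="skip the CPU comparison, e.g. when benchmarking")
    return parser


args = build_parser().parse_args(["--elements", "1000000", "--device", "1"])
print(args.elements, args.device, args.no_verify)
```

Note that argparse exposes `--no-verify` as the attribute `args.no_verify`, which is `False` unless the flag is given.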

Expected Output

[Vector addition using CUDA Core API]
Device: <Your GPU Name>
Compute Capability: sm_<XX>
Compiling kernel 'vectorAdd<float>'...
Kernel compiled successfully
[Vector addition of 50000 elements]
CUDA kernel launch with 196 blocks of 256 threads
Verifying result...
Test PASSED

Done

Note: Device name and compute capability will vary based on your GPU.
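The block count in the output is consistent with a ceiling division of the element count by the block size: 195 blocks of 256 threads would cover only 49,920 elements, so one more block is needed. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def blocks_for(n_elements, threads_per_block=256):
    """Smallest number of blocks whose threads cover n_elements."""
    if n_elements <= 0 or threads_per_block <= 0:
        raise ValueError("both arguments must be positive")
    return (n_elements + threads_per_block - 1) // threads_per_block


print(blocks_for(50000))    # 196
print(blocks_for(1000000))  # 3907
```

The last block is only partly used (196 * 256 = 50,176 threads for 50,000 elements), which is why the kernel needs a bounds check.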

Files

  • vectorAdd.py — Python implementation using cuda.core API
  • README.md — This file
  • requirements.txt — Sample dependencies
  • ../../Utilities/cuda_samples_utils.py — Common utilities (imported by this sample)

See Also