
# Sample: Vector Addition (Python)
## Description
Run your first GPU kernel: add two vectors element-wise on the GPU using the [`cuda.core`](https://nvidia.github.io/cuda-python/cuda-core/latest/) API with runtime compilation.
## What You'll Learn
- Writing CUDA kernels in C++ with template support
- Runtime compilation of CUDA kernels from Python
- Using `cuda.core` for device management, programs, and launches
- Configuring and launching kernels with grid and block dimensions
- Using CuPy for GPU memory management
- Verifying GPU results against CPU computation
## Key Libraries
- [`cuda.core`](https://nvidia.github.io/cuda-python/cuda-core/latest/) — Pythonic access to CUDA runtime and compilation
- `cupy` — GPU array library for Python
## Key APIs
### From `cuda.core`
- `Device` — Initialize and manage CUDA device
- `Program` — Create program from kernel source code
- `ProgramOptions` — Set compilation options (C++ standard, architecture)
- `LaunchConfig` — Configure kernel launch parameters
- `launch` — Execute kernel on specified stream
Import stable symbols from the top-level package (not `cuda.core.experimental`). See the [cuda.core documentation](https://nvidia.github.io/cuda-python/cuda-core/latest/).
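These pieces fit together roughly as sketched below. This is an illustrative sketch, not the sample's exact code: the option values, stream handling, and the `compile`/`get_kernel` call shapes are assumptions about the `cuda.core` API surface, and `cupy` supplies the device buffers. Running it requires a CUDA-capable GPU.

```python
# Sketch of the cuda.core compile-and-launch flow (illustrative only).
KERNEL_SOURCE = r"""
template <typename T>
__global__ void vectorAdd(const T *A, const T *B, T *C, size_t n) {
    // 1D grid-stride loop; the i < n test doubles as the bounds check
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x) {
        C[i] = A[i] + B[i];
    }
}
"""

def vector_add(n=50000, threads_per_block=256):
    import cupy as cp
    from cuda.core import Device, LaunchConfig, Program, ProgramOptions, launch

    dev = Device()               # pass an index to select a specific GPU
    dev.set_current()

    cc = dev.compute_capability  # e.g. (8, 9)
    opts = ProgramOptions(std="c++17", arch=f"sm_{cc[0]}{cc[1]}")
    prog = Program(KERNEL_SOURCE, code_type="c++", options=opts)
    # Instantiate the template for float and pull out the compiled kernel
    module = prog.compile("cubin", name_expressions=("vectorAdd<float>",))
    kernel = module.get_kernel("vectorAdd<float>")

    a = cp.random.rand(n).astype(cp.float32)
    b = cp.random.rand(n).astype(cp.float32)
    c = cp.empty(n, dtype=cp.float32)

    blocks = (n + threads_per_block - 1) // threads_per_block
    config = LaunchConfig(grid=blocks, block=threads_per_block)
    launch(dev.default_stream, config, kernel,
           a.data.ptr, b.data.ptr, c.data.ptr, cp.uint64(n))
    dev.sync()                   # wait for the kernel before verifying
    return bool(cp.allclose(c, a + b))
```

Note that runtime compilation happens on first use, so the template can be instantiated for whichever element type the host code needs.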
### From CuPy
- `cp.random.rand()` — Generate random arrays on GPU
- `cp.empty()` — Allocate uninitialized GPU arrays
- `cp.allclose()` — Verify results with tolerance
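Together these three calls cover the data path on either side of the kernel: inputs in, output buffer out, tolerance check at the end. A minimal sketch of that pattern, with a CPU add standing in for the kernel launch and a NumPy fallback so it runs without a GPU:

```python
try:
    import cupy as xp   # GPU arrays when CuPy is available
except ImportError:
    import numpy as xp  # CPU fallback so the sketch runs anywhere

n = 50000
a = xp.random.rand(n).astype(xp.float32)  # random inputs on the device
b = xp.random.rand(n).astype(xp.float32)
c = xp.empty(n, dtype=xp.float32)         # uninitialized output buffer

c[:] = a + b  # stand-in for the vectorAdd kernel launch

# Verify within floating-point tolerance rather than exact equality
ok = bool(xp.allclose(c, a + b, rtol=1e-5))
assert ok
```

`cp.empty()` deliberately skips zero-initialization, since the kernel overwrites every element anyway.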
### From `cuda_samples_utils`
- `verify_array_result()` — Verify computation results
## Kernel Techniques
- **1D Grid-Stride Loop** — Handle arbitrary array sizes with fixed grid
- **Template Programming** — Generic kernel for different data types
- **Bounds Checking** — Prevent out-of-bounds memory access
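The grid-stride loop and bounds check can be modeled in plain Python: each thread starts at its global index and advances by the total thread count, and the `i < n` test keeps every access in bounds. This simulation is a hypothetical helper, not part of the sample, but it shows why a fixed grid covers any array size exactly once:

```python
def grid_stride_indices(n, grid_dim, block_dim):
    """Collect the element indices a 1D grid-stride loop visits.

    Mirrors the CUDA idiom:
        for (i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += gridDim.x * blockDim.x)
    """
    stride = grid_dim * block_dim  # total threads in the grid
    visited = []
    for block in range(grid_dim):
        for thread in range(block_dim):
            i = block * block_dim + thread  # this thread's global index
            while i < n:                    # bounds check
                visited.append(i)
                i += stride                 # stride to the next chunk
    return visited

# A fixed 4x8 grid (32 threads) still covers all 100 elements, once each.
assert sorted(grid_stride_indices(100, grid_dim=4, block_dim=8)) == list(range(100))
```

Because every element is visited exactly once regardless of `n`, the same launch configuration works for any vector length.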
## Requirements
### Hardware
- NVIDIA GPU with Compute Capability 7.0 or higher
- Minimum GPU memory: 512 MB
### Software
- CUDA Toolkit 13.0 or newer (matches `cuda-python` 13.x)
- Python 3.10 or newer
- `cuda-python` (>=13.0.0)
- `cuda-core` (>=0.6.0)
- `cupy-cuda13x` (>=13.0.0)
## Installation
Install the required packages from `requirements.txt`:
```bash
cd /path/to/cuda-samples/python/1_GettingStarted/vectorAdd
pip install -r requirements.txt
```
The `requirements.txt` file installs:
- `cuda-python` (>=13.0.0)
- `cuda-core` (>=0.6.0)
- `cupy-cuda13x` (>=13.0.0)
## How to Run
### Basic usage
```bash
cd /path/to/cuda-samples/python/1_GettingStarted/vectorAdd
python vectorAdd.py
```
### With custom parameters
```bash
# Custom vector size
python vectorAdd.py --elements 1000000
# Use specific GPU
python vectorAdd.py --device 1
# Skip verification for benchmarking
python vectorAdd.py --no-verify
```
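These flags map onto a standard `argparse` parser. The sketch below is hypothetical (the sample's actual defaults and help text may differ), but it accepts exactly the invocations shown above:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical parser matching the documented flags
    p = argparse.ArgumentParser(description="Vector addition using cuda.core")
    p.add_argument("--elements", type=int, default=50000,
                   help="number of vector elements")
    p.add_argument("--device", type=int, default=0,
                   help="index of the CUDA device to use")
    p.add_argument("--no-verify", action="store_true",
                   help="skip CPU verification (useful for benchmarking)")
    return p.parse_args(argv)

args = parse_args(["--elements", "1000000", "--device", "1"])
assert args.elements == 1_000_000 and args.device == 1 and not args.no_verify
```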
## Expected Output
```
[Vector addition using CUDA Core API]
Device: <Your GPU Name>
Compute Capability: sm_<XX>
Compiling kernel 'vectorAdd<float>'...
Kernel compiled successfully
[Vector addition of 50000 elements]
CUDA kernel launch with 196 blocks of 256 threads
Verifying result...
Test PASSED
Done
```
**Note:** Device name and compute capability will vary based on your GPU.
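The block count in the output is just the element count divided by the block size, rounded up so the final partial block is still launched:

```python
n, threads_per_block = 50000, 256
blocks = (n + threads_per_block - 1) // threads_per_block  # ceil(50000 / 256)
print(f"CUDA kernel launch with {blocks} blocks of {threads_per_block} threads")
# → CUDA kernel launch with 196 blocks of 256 threads
```

The last block has only 80 active elements (196 × 256 = 50176 > 50000), which is why the kernel's bounds check matters.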
## Files
- `vectorAdd.py` — Python implementation using cuda.core API
- `README.md` — This file
- `requirements.txt` — Sample dependencies
- `../../Utilities/cuda_samples_utils.py` — Common utilities (imported by this sample)
## See Also
- [CUDA Python Documentation](https://nvidia.github.io/cuda-python/)
- [cuda.core API](https://nvidia.github.io/cuda-python/cuda-core/latest/)
- [CuPy Documentation](https://docs.cupy.dev/)