mirror of
https://github.com/NVIDIA/cuda-samples.git
synced 2026-05-14 14:06:53 +08:00
- Added Python samples for CUDA Python 1.0 release - Renamed top-level `Samples` directory to `cpp` to accommodate Python samples.
3.4 KiB
3.4 KiB
Sample: Vector Addition (Python)
Description
Run your first GPU kernel: add two vectors element-wise on the GPU using the cuda.core API with runtime compilation.
What You'll Learn
- Writing CUDA kernels in C++ with template support
- Runtime compilation of CUDA kernels from Python
- Using
cuda.corefor device management, programs, and launches - Configuring and launching kernels with grid and block dimensions
- Using CuPy for GPU memory management
- Verifying GPU results against CPU computation
Key Libraries
cuda.core— Pythonic access to CUDA runtime and compilationcupy— GPU array library for Python
Key APIs
From cuda.core
Device— Initialize and manage CUDA deviceProgram— Create program from kernel source codeProgramOptions— Set compilation options (C++ standard, architecture)LaunchConfig— Configure kernel launch parameterslaunch— Execute kernel on specified stream
Import stable symbols from the top-level package (not cuda.core.experimental). See the cuda.core documentation.
From CuPy
cp.random.rand()— Generate random arrays on GPUcp.empty()— Allocate uninitialized GPU arrayscp.allclose()— Verify results with tolerance
From cuda_samples_utils
verify_array_result()— Verify computation results
Kernel Techniques
- 1D Grid-Stride Loop — Handle arbitrary array sizes with fixed grid
- Template Programming — Generic kernel for different data types
- Bounds Checking — Prevent out-of-bounds memory access
Requirements
Hardware
- NVIDIA GPU with Compute Capability 7.0 or higher
- Minimum GPU memory: 512 MB
Software
- CUDA Toolkit 13.0 or newer (matches
cuda-python13.x) - Python 3.10 or newer
cuda-python(>=13.0.0)cuda-core(>=0.6.0)cupy-cuda13x(>=13.0.0)
Installation
Install the required packages from requirements.txt:
cd /path/to/cuda-samples/python/1_GettingStarted/vectorAdd
pip install -r requirements.txt
The requirements.txt installs:
cuda-python(>=13.0.0)cuda-core(>=0.6.0)cupy-cuda13x(>=13.0.0)
How to Run
Basic usage
cd samples/python/1_GettingStarted/vectorAdd
python vectorAdd.py
With custom parameters
# Custom vector size
python vectorAdd.py --elements 1000000
# Use specific GPU
python vectorAdd.py --device 1
# Skip verification for benchmarking
python vectorAdd.py --no-verify
Expected Output
[Vector addition using CUDA Core API]
Device: <Your GPU Name>
Compute Capability: sm_<XX>
Compiling kernel 'vectorAdd<float>'...
Kernel compiled successfully
[Vector addition of 50000 elements]
CUDA kernel launch with 196 blocks of 256 threads
Verifying result...
Test PASSED
Done
Note: Device name and compute capability will vary based on your GPU.
Files
vectorAdd.py— Python implementation using cuda.core APIREADME.md— This filerequirements.txt— Sample dependencies../../Utilities/cuda_samples_utils.py— Common utilities (imported by this sample)