mirror of
https://github.com/NVIDIA/cuda-samples.git
synced 2026-05-14 14:06:53 +08:00
- Added Python samples for CUDA Python 1.0 release - Renamed top-level `Samples` directory to `cpp` to accommodate Python samples.
135 lines
3.7 KiB
Markdown
135 lines
3.7 KiB
Markdown
# CUDA Python Utilities
|
|
|
|
Common utilities for CUDA Python samples using the `cuda.core` API.
|
|
|
|
## Overview
|
|
|
|
This module provides reusable utility functions for CUDA samples to reduce code duplication. Samples import from `cuda_samples_utils.py` using simple path-based imports (no package structure needed).
|
|
|
|
## Installation Requirements
|
|
|
|
Install from the Python samples directory:
|
|
|
|
```bash
|
|
cd /path/to/cuda-samples/Python
|
|
pip install -r requirements.txt
|
|
```
|
|
|
|
This installs a common CUDA 13 stack (see `python/requirements.txt`):
|
|
|
|
- `cuda-python` (>=13.0.0)
|
|
- `cuda-core` (>=0.6.0)
|
|
- `cupy-cuda13x` (>=13.0.0)
|
|
- `numpy` (>=2.3.2)
|
|
|
|
## How to Use in Samples
|
|
|
|
Import utilities using path-based import:
|
|
|
|
```python
|
|
import sys
|
|
from pathlib import Path
|
|
|
|
# Add Utilities directory to path
|
|
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "Utilities"))
|
|
from cuda_samples_utils import verify_array_result
|
|
|
|
# Use the utility
|
|
if verify_array_result(result, expected):
|
|
print("Success!")
|
|
```
|
|
|
|
## Available Functions
|
|
|
|
### Result Verification
|
|
|
|
#### `verify_array_result(result, expected, rtol=1e-5, atol=1e-8, verbose=True)`
|
|
|
|
Verify computed results match expected values. The helper detects whether both
|
|
arguments are NumPy arrays or both are CuPy arrays and uses the matching
|
|
library's `allclose` (no unnecessary cross-device transfers).
|
|
|
|
**Parameters:**
|
|
- `result`: NumPy or CuPy array with computed results
|
|
- `expected`: NumPy or CuPy array with expected values (same kind as `result`)
|
|
- `rtol`: Relative tolerance (default: 1e-5)
|
|
- `atol`: Absolute tolerance (default: 1e-8)
|
|
- `verbose`: Print test result (default: True)
|
|
|
|
**Returns:**
|
|
- `True` if results match within tolerance, `False` otherwise
|
|
|
|
**Example:**
|
|
```python
|
|
expected = a + b
|
|
if verify_array_result(c, expected):
|
|
print("Computation correct!")
|
|
```
|
|
|
|
### Package Check
|
|
|
|
#### `check_cuda_requirements()`
|
|
|
|
Check if required CUDA packages are available.
|
|
|
|
**Returns:**
|
|
- `True` if requirements are met, `False` otherwise
|
|
|
|
**Example:**
|
|
```python
|
|
if not check_cuda_requirements():
|
|
sys.exit(1)
|
|
```
|
|
|
|
## Design Philosophy
|
|
|
|
These utilities focus on common operations that are **not** part of `cuda.core` API:
|
|
- Result verification for NumPy or CuPy arrays
|
|
- Package requirements checking
|
|
|
|
For CUDA operations like device initialization, kernel compilation, and grid size calculations, samples should use `cuda.core` API directly to demonstrate the proper usage patterns.
|
|
|
|
## Complete Example
|
|
|
|
See `../1_GettingStarted/vectorAdd/vectorAdd.py` for a complete example:
|
|
|
|
```python
|
|
import sys
|
|
from pathlib import Path
|
|
|
|
# Import utility
|
|
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "Utilities"))
|
|
from cuda_samples_utils import verify_array_result
|
|
|
|
import cupy as cp
|
|
from cuda.core import Device, Program, ProgramOptions, LaunchConfig, launch
|
|
|
|
# Use cuda.core directly for device and kernel operations
|
|
device = Device(0)
|
|
device.set_current()
|
|
|
|
program_options = ProgramOptions(std="c++17", arch=f"sm_{device.arch}")
|
|
program = Program(kernel_source, code_type="c++", options=program_options)
|
|
module = program.compile("cubin", name_expressions=("kernel_name",))
|
|
kernel = module.get_kernel("kernel_name")
|
|
|
|
# Calculate grid size inline
|
|
threads_per_block = 256
|
|
blocks_per_grid = (num_elements + threads_per_block - 1) // threads_per_block
|
|
|
|
# Launch kernel - pass cupy arrays directly
|
|
config = LaunchConfig(grid=blocks_per_grid, block=threads_per_block)
|
|
launch(stream, config, kernel, a, b, c, cp.int32(num_elements))
|
|
|
|
# Verify results using utility
|
|
verify_array_result(c, expected)
|
|
```
|
|
|
|
## Benefits
|
|
|
|
- **Code Reuse**: Write common functionality once
|
|
- **Consistency**: All samples use the same patterns
|
|
- **Maintainability**: Bug fixes benefit all samples
|
|
- **Transparency**: Samples show cuda.core API usage directly
|
|
- **Simplicity**: No complex package structure needed
|