Dheemanth aeab82ff30
CUDA 13.2 samples update (#432)
- Added Python samples for CUDA Python 1.0 release
- Renamed top-level `Samples` directory to `cpp` to accommodate Python samples.
2026-05-13 17:13:18 -05:00

# Sample: Kernel Nsys Profiling - CUDA C++ Kernel Profiling with cuda.core (Python)
## Description
This sample demonstrates how to profile custom CUDA C++ kernels compiled and launched with `cuda.core` using NVIDIA Nsight Systems. It implements three GPU operations (vector addition, SAXPY, vector transform) as custom kernels and shows how to instrument code with NVTX markers for profiling analysis.
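When profiling custom kernels, it helps to have a CPU reference to check GPU results against. The sketch below gives plain-Python references for two of the three operations named above. The function names (`vector_add_ref`, `saxpy_ref`, `allclose`) are hypothetical and not taken from the sample, and the vector transform is left out because this README does not state its formula.

```python
import math


def vector_add_ref(a, b):
    """CPU reference for vector addition: out[i] = a[i] + b[i]."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [x + y for x, y in zip(a, b)]


def saxpy_ref(alpha, x, y):
    """CPU reference for SAXPY: out[i] = alpha * x[i] + y[i]."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def allclose(got, expected, rel_tol=1e-5, abs_tol=1e-6):
    """Compare two sequences with a tolerance suited to float32 results."""
    return len(got) == len(expected) and all(
        math.isclose(g, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for g, e in zip(got, expected)
    )
```

Results copied back from the GPU could be passed to `allclose` together with the matching reference output. The tolerances are illustrative defaults, not values used by the sample.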
## What you will learn
- How to write and compile CUDA C++ kernels with `cuda.core.Program`
- How to launch kernels with `LaunchConfig` and manage CUDA streams
- How to use NVTX markers (`nvtx.annotate()`) to annotate code sections
- How to profile kernels with Nsight Systems and analyze performance
- How to follow a modern CUDA Python workflow with `cuda.core.Device` and proper resource cleanup
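As a rough illustration of the launch step in the list above, the following sketch computes a 1-D grid size for a given element count and pairs it with an illustrative SAXPY kernel source. Both `launch_geometry` and `SAXPY_SOURCE` are hypothetical names, the kernel is not copied from the sample, and 256 threads per block is an assumed choice. The resulting values are the kind one would pass as `LaunchConfig(grid=..., block=...)`.

```python
# Illustrative CUDA C++ source held in a Python string, as it might be
# handed to a runtime compiler. Not the sample's actual kernel.
SAXPY_SOURCE = r"""
extern "C" __global__
void saxpy(float a, const float* x, const float* y, float* out, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may extend past the array
        out[i] = a * x[i] + y[i];
    }
}
"""


def launch_geometry(n, threads_per_block=256):
    """Return (grid, block) for a 1-D launch that covers n elements."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 1 <= threads_per_block <= 1024:
        raise ValueError("threads_per_block must be in [1, 1024]")
    # Ceiling division so every element gets a thread.
    grid = (n + threads_per_block - 1) // threads_per_block
    return grid, threads_per_block


grid, block = launch_geometry(10_000_000)
print(grid, block)  # 39063 256
```

Because the grid is rounded up, the last block usually contains threads with no element to process, which is why the kernel checks `i < n` before writing.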
## Requirements
- NVIDIA GPU with Compute Capability 7.0+
- CUDA Toolkit 13.0+
- Python 3.10+
- Packages: `numpy`, `cuda-python`, `cuda-core`, `cupy-cuda13x`, `nvtx` (see `requirements.txt`; NumPy >=2.3.2)
**Install:**
```bash
pip install -r requirements.txt
```
## How to run
```bash
python kernelNsysProfile.py
python kernelNsysProfile.py --array-size 10000000 # Custom size
```
## Nsys Profiling
**Basic profile:**
```bash
nsys profile -o gpu_profile python kernelNsysProfile.py
nsys-ui gpu_profile.nsys-rep # View results
```
The program uses color-coded NVTX markers:
- **Purple**: Phase 2 (cuda.core Custom Kernels - main focus)
- **Yellow/Blue/Green**: Other phases
- **Cyan**: Nested operations
Focus on Phase 2 to analyze kernel execution times, launch overhead, and GPU utilization.
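The color scheme above could be produced with nested `nvtx.annotate()` ranges along the following lines. This is a minimal sketch, not the sample's code: the range names and the `annotate` wrapper are hypothetical, the work inside the ranges is a CPU stand-in for kernel launches, and the fallback to a no-op context manager is an assumption that keeps the snippet runnable when the `nvtx` package is not installed.

```python
import contextlib

try:
    import nvtx  # ranges show up on the Nsight Systems timeline

    def annotate(message, color):
        """Open a colored NVTX range usable as a context manager."""
        return nvtx.annotate(message, color=color)

except ImportError:

    def annotate(message, color):
        """No-op stand-in so the code still runs without nvtx."""
        return contextlib.nullcontext()


def run_phase(values):
    # Outer range marks the whole phase; inner ranges mark nested steps.
    with annotate("Phase 2: custom kernels", "purple"):
        with annotate("scale", "cyan"):
            scaled = [2.0 * v for v in values]
        with annotate("reduce", "cyan"):
            total = sum(scaled)
    return total


print(run_phase([1.0, 2.0, 3.0]))  # 12.0
```

When the script runs outside a profiler, the ranges have no visible effect. Under `nsys profile`, they appear as nested colored bars that make the phase of interest easy to locate.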
**For detailed Nsys usage and analysis techniques, see the [NVIDIA Nsight Systems documentation](https://docs.nvidia.com/nsight-systems/).**
## Troubleshooting
**Missing packages:**
```bash
pip install -r requirements.txt
```
**Out of memory:**
```bash
python kernelNsysProfile.py --array-size 10000000 # Reduce array size (use a value below the one that failed)
```
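To pick a smaller array size, a back-of-the-envelope estimate of device memory can help. The sketch below assumes `float32` elements and three device arrays (two inputs and one output). The sample may allocate more, so treat the numbers as a lower bound. Both function names are hypothetical.

```python
def estimate_device_bytes(n, num_arrays=3, bytes_per_element=4):
    """Approximate device memory for num_arrays arrays of n elements."""
    return n * num_arrays * bytes_per_element


def largest_array_size(free_bytes, num_arrays=3, bytes_per_element=4,
                       headroom=0.8):
    """Largest n that fits in a fraction (headroom) of free_bytes."""
    usable = int(free_bytes * headroom)
    return usable // (num_arrays * bytes_per_element)


needed = estimate_device_bytes(10_000_000)
print(needed)  # 120000000 bytes, about 114 MiB
```

The `headroom` factor leaves space for the CUDA context and temporary buffers. 0.8 is an arbitrary illustrative value.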
**Nsys not found:**
```bash
export PATH=/usr/local/cuda/bin:$PATH
```
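Whether `nsys` is reachable can also be checked from Python before launching a profile. The helper below is a hypothetical sketch that looks up `nsys` on `PATH` with `shutil.which` and builds the same command line as the basic profile shown earlier. It returns `None` instead of raising when the tool is missing.

```python
import shutil


def build_nsys_command(script, output="gpu_profile", nsys_path=None):
    """Return the argv list for `nsys profile`, or None if nsys is absent."""
    nsys = nsys_path or shutil.which("nsys")
    if nsys is None:
        return None
    return [nsys, "profile", "-o", output, "python", script]


cmd = build_nsys_command("kernelNsysProfile.py")
if cmd is None:
    print("nsys not found on PATH; add the CUDA bin directory to PATH")
else:
    print(" ".join(cmd))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` to start the profile.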
## See Also
- [CUDA Python Documentation](https://nvidia.github.io/cuda-python/)
- [NVIDIA Nsight Systems Documentation](https://docs.nvidia.com/nsight-systems/)
- [CuPy Documentation](https://docs.cupy.dev/)