mirror of
https://github.com/NVIDIA/cuda-samples.git
synced 2026-05-14 14:06:53 +08:00
- Added Python samples for CUDA Python 1.0 release - Renamed top-level `Samples` directory to `cpp` to accommodate Python samples.
Sample: Kernel Nsys Profiling - CUDA C++ Kernel Profiling with cuda.core (Python)
Description
This sample demonstrates how to profile custom CUDA C++ kernels compiled and launched with cuda.core using NVIDIA Nsight Systems. It implements three GPU operations (vector addition, SAXPY, vector transform) as custom kernels and shows how to instrument code with NVTX markers for profiling analysis.
What you will learn
- How to write and compile CUDA C++ kernels with
cuda.core.Program - How to launch kernels with
LaunchConfigand manage CUDA streams - How to use NVTX markers (
nvtx.annotate()) to annotate code sections - How to profile kernels with Nsight Systems and analyze performance
- Modern CUDA Python workflow with
cuda.core.Deviceand proper resource cleanup
Requirements
- NVIDIA GPU with Compute Capability 7.0+
- CUDA Toolkit 13.0+
- Python 3.10+
- Packages:
numpy,cuda-python,cuda-core,cupy-cuda13x,nvtx(seerequirements.txt; NumPy >=2.3.2)
Install:
pip install -r requirements.txt
How to run
python kernelNsysProfile.py
python kernelNsysProfile.py --array-size 10000000 # Custom size
Nsys Profiling
Basic profile:
nsys profile -o gpu_profile python kernelNsysProfile.py
nsys-ui gpu_profile.nsys-rep # View results
The program uses color-coded NVTX markers:
- Purple: Phase 2 (cuda.core Custom Kernels - main focus)
- Yellow/Blue/Green: Other phases
- Cyan: Nested operations
Focus on Phase 2 to analyze kernel execution times, launch overhead, and GPU utilization.
For detailed Nsys usage and analysis techniques, see the NVIDIA Nsight Systems documentation.
Troubleshooting
Missing packages:
pip install -r requirements.txt
Out of memory:
python kernelNsysProfile.py -n 10000000 # Reduce array size
Nsys not found:
export PATH=/usr/local/cuda/bin:$PATH