mirror of
https://github.com/NVIDIA/cuda-samples.git
synced 2026-05-14 14:06:53 +08:00
processCheckpoint (Python)
Description
This sample demonstrates how to use the CUDA process checkpoint API
via cuda.core.checkpoint.Process to suspend, capture, and restore the
CUDA state of a running Linux process.
CUDA process checkpointing is the driver-level primitive that powers
CRIU + cuda-checkpoint integration.
The sample:
- Allocates a GPU buffer and fills it with a deterministic pattern via a small kernel.
- Reads the buffer back to host and computes a SHA-256 hash.
- Runs the full checkpoint lifecycle on its own process: lock → checkpoint → restore → unlock.
- Reads the buffer back again and verifies that the hash is unchanged, proving that GPU memory contents survived the round trip.
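The verification in the last two steps reduces to hashing the readback buffer before and after the lifecycle. A minimal sketch using only `hashlib` (the fill pattern below is illustrative; the real sample writes its pattern with a GPU kernel, and the 16-character digest length is an assumption based on the hashes in the expected output):

```python
import hashlib

def buffer_digest(buf: bytes) -> str:
    """Shortened SHA-256 digest, in the style of the 'Buffer hash' output lines."""
    return hashlib.sha256(buf).hexdigest()[:16]

# Illustrative deterministic pattern (the real sample fills the buffer on the GPU).
before = bytes((i * 31) & 0xFF for i in range(1 << 16))

# ... lock / checkpoint / restore / unlock would happen here ...

after = bytes(before)  # readback after restore; identical if GPU memory survived
assert buffer_digest(after) == buffer_digest(before)
print("PASS: GPU buffer contents survived checkpoint/restore.")
```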
The sample prints the CUDA process state after each step so the full state machine is visible:
lock() checkpoint() restore() unlock()
running ---------> locked ------------> checkpointed -----------> locked ---------> running
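The diagram above can be captured as a small transition table, handy for reasoning about which call is legal in each state. This is a stand-alone model of the documented state machine, not the driver API:

```python
# Valid transitions of the CUDA process checkpoint state machine,
# as shown in the diagram above.
TRANSITIONS = {
    ("running", "lock"): "locked",
    ("locked", "checkpoint"): "checkpointed",
    ("checkpointed", "restore"): "locked",
    ("locked", "unlock"): "running",
}

def step(state: str, call: str) -> str:
    """Return the state after `call`, or raise if the call is invalid there."""
    try:
        return TRANSITIONS[(state, call)]
    except KeyError:
        raise ValueError(f"{call}() is not valid in state {state!r}")

state = "running"
for call in ("lock", "checkpoint", "restore", "unlock"):
    state = step(state, call)
assert state == "running"
```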
What You'll Learn
- Creating a `cuda.core.checkpoint.Process` for the current process by PID and observing its `.state` transitions.
- Running the full lock → checkpoint → restore → unlock cycle with a lock timeout.
- The fact that `restore()` leaves the process in the `locked` state; you must still call `unlock()` to return to `running`.
- Verifying that GPU memory is preserved across the checkpoint round trip by comparing SHA-256 hashes of the buffer before and after.
- The rough cost of each step (checkpoint and restore dominate and scale with the device-memory footprint being captured).
Key Libraries
- `cuda.core` - device management, memory allocation, kernel compilation and launch, and the `checkpoint.Process` wrapper.
- `cuda.bindings` - used directly for a pageable `cuMemcpyDtoH`.
Key APIs
From cuda.core.checkpoint
- `checkpoint.Process(pid)` - create a handle to a CUDA process by PID. Accepts `os.getpid()` for the self-checkpoint case shown here.
- `Process.state` - one of `"running"`, `"locked"`, `"checkpointed"`, or `"failed"`.
- `Process.lock(timeout_ms=…)` - block further CUDA API calls on the process; completes already-submitted work. Always pass a non-zero timeout to avoid deadlocks.
- `Process.checkpoint()` - copy device memory to host-side driver allocations and release GPU resources. Process state becomes `checkpointed`.
- `Process.restore(gpu_mapping=None)` - re-acquire GPU resources and copy memory back to device. Leaves the process in the `locked` state.
- `Process.unlock()` - return the process to `running`.
- `Process.restore_thread_id` - thread ID that `restore()` must be called from in the target process (not used in the self-checkpoint case here).
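Putting those calls together, the self-checkpoint cycle looks roughly like this. This is a sketch based on the signatures listed above; the installed `cuda.core` version may differ, and the GPU-dependent part is kept behind the `__main__` guard:

```python
import os

def run_cycle(proc, timeout_ms: int = 5000):
    """Drive a checkpoint.Process through lock -> checkpoint -> restore -> unlock,
    returning the state observed after each step."""
    states = []
    proc.lock(timeout_ms=timeout_ms)   # non-zero timeout avoids deadlocks
    states.append(proc.state)          # "locked"
    proc.checkpoint()                  # device memory moved to host allocations
    states.append(proc.state)          # "checkpointed"
    proc.restore()                     # memory copied back; process is STILL locked
    states.append(proc.state)          # "locked"
    proc.unlock()
    states.append(proc.state)          # "running"
    return states

if __name__ == "__main__":
    # Requires cuda-core >= 1.0.0 and a driver with checkpoint support.
    from cuda.core import checkpoint
    print(run_cycle(checkpoint.Process(os.getpid())))
```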
From cuda.core
- `Device.set_current()` / `Device.memory_resource.allocate(...)` / `Stream`, `LaunchConfig`, `Program`, `launch` - standard device, compile, and launch primitives used to produce the buffer contents.
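As an illustration of what those primitives compile and launch, here is a hypothetical pattern-writing kernel and its launch-geometry arithmetic. The kernel name `fill_pattern` and the pattern itself are invented for this sketch; the real sample's kernel may differ:

```python
# CUDA C++ source for a hypothetical kernel writing a deterministic byte
# pattern, kept as a plain string so it could be handed to a Program for
# compilation.
KERNEL_SOURCE = r"""
extern "C" __global__ void fill_pattern(unsigned char *buf, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = (unsigned char)((i * 31u) & 0xFFu);
}
"""

def grid_size(n: int, block: int = 256) -> int:
    """Number of blocks needed to cover n elements (ceiling division),
    i.e. the grid dimension for a LaunchConfig."""
    return (n + block - 1) // block
```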
From cuda.bindings.driver
- `cuMemcpyDtoH(host_ptr, device_handle, nbytes)` - synchronous D2H copy into a pageable host buffer.
Requirements
Hardware
- Any NVIDIA GPU supported by CUDA process checkpointing. CUDA checkpointing is currently limited to x86-64 Linux.
Software
- Linux (the CUDA checkpoint API is Linux-only).
- NVIDIA driver with CUDA process checkpoint support.
- CUDA Toolkit 13.0 or newer.
- Python 3.10 or newer.
- `cuda-core` >= 1.0.0.
Installation
Install the required packages from requirements.txt:
cd /path/to/cuda-samples/python/2_CoreConcepts/processCheckpoint
pip install -r requirements.txt
How to Run
Basic usage
python processCheckpoint.py
Larger GPU footprint to see checkpoint time scale
python processCheckpoint.py --buffer-mib 512
Use a specific GPU
python processCheckpoint.py --device 1
All options
--device CUDA device ID (default: 0)
--buffer-mib GPU buffer size in MiB (default: 16)
--lock-timeout-ms Timeout passed to Process.lock in ms (default: 5000)
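The options above map to a straightforward `argparse` setup; a sketch with the documented defaults (the real sample's parser may differ in wording):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented command-line options."""
    p = argparse.ArgumentParser(description="CUDA process checkpoint sample")
    p.add_argument("--device", type=int, default=0,
                   help="CUDA device ID (default: 0)")
    p.add_argument("--buffer-mib", type=int, default=16,
                   help="GPU buffer size in MiB (default: 16)")
    p.add_argument("--lock-timeout-ms", type=int, default=5000,
                   help="Timeout passed to Process.lock in ms (default: 5000)")
    return p

args = build_parser().parse_args(["--buffer-mib", "512"])
assert args.buffer_mib == 512 and args.device == 0
```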
Expected Output
On an RTX 4090 with a 16 MiB buffer:
[Process Checkpoint Sample using CUDA Core API]
PID: 748330
Device: NVIDIA GeForce RTX 4090
Compute Capability: sm_89
Buffer size: 16 MiB
Lock timeout: 5000 ms
Compiling kernel ...
Writing deterministic pattern to GPU buffer ...
Buffer hash (before): b045f7975dc23352
Running checkpoint lifecycle on self ...
step duration (ms) state after
--------------------------------------------------
initial - running
lock 0.578 locked
checkpoint 268.369 checkpointed
restore 235.024 locked
unlock 1.648 running
--------------------------------------------------
total 505.618
Buffer hash (before): b045f7975dc23352
Buffer hash (after): b045f7975dc23352
PASS: GPU buffer contents survived checkpoint/restore.
Done
What to look for:
- The four state transitions are all observable: running → locked → checkpointed → locked → running. Note that `restore()` leaves the process in `locked`, not `running`.
- The checkpoint and restore steps dominate the wall-clock time (hundreds of ms even for a small buffer) - they copy GPU memory to and from driver-managed host allocations. Increasing `--buffer-mib` visibly increases the checkpoint time.
- The `lock` and `unlock` steps are essentially free (sub-ms) - they just flip the process state.
- The SHA-256 hashes before and after match, proving the GPU memory contents survived the round trip.
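The per-step "duration (ms)" column can be produced with a trivial wall-clock wrapper, e.g.:

```python
import time

def timed_ms(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed milliseconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - t0) * 1e3

# Example: time an arbitrary callable the way each lifecycle step is timed.
result, ms = timed_ms(sum, range(1000))
assert result == 499500 and ms >= 0.0
```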
Exact timings vary with GPU model, driver version, system load, and the size of the device memory footprint being captured.
Files
- `processCheckpoint.py` - Python implementation using `cuda.core.checkpoint`.
- `README.md` - This file.
- `requirements.txt` - Sample dependencies.
See Also
- CUDA Python Documentation
- `NVIDIA/cuda-checkpoint` - the CUDA checkpoint/restore utility, the CRIU plugin, and C reference programs (`r570-features.c`, `r580-migration-api.c`).
- Checkpointing CUDA Applications with CRIU - NVIDIA technical blog post on the broader CRIU workflow.