# processCheckpoint (Python)

## Description

This sample demonstrates how to use the **CUDA process checkpoint API** via `cuda.core.checkpoint.Process` to suspend, capture, and restore the CUDA state of a running Linux process. CUDA process checkpointing is the driver-level primitive that powers CRIU + `cuda-checkpoint` integration.

The sample:

1. Allocates a GPU buffer and fills it with a deterministic pattern via a small kernel.
2. Reads the buffer back to host and computes a SHA-256 hash.
3. Runs the full checkpoint lifecycle on its own process: `lock → checkpoint → restore → unlock`.
4. Reads the buffer back again and verifies that the hash is unchanged, proving that GPU memory contents survived the round trip.

The sample prints the CUDA process state after each step so the full state machine is visible:

```
          lock()            checkpoint()               restore()           unlock()
running ---------> locked ------------> checkpointed -----------> locked ---------> running
```

## What You'll Learn

- Creating a `cuda.core.checkpoint.Process` for the current process by PID and observing its `.state` transitions.
- Running the full `lock → checkpoint → restore → unlock` cycle with a lock timeout.
- Observing that `restore()` leaves the process in the `locked` state; you must still call `unlock()` to return to `running`.
- Verifying that GPU memory is preserved across the checkpoint round trip by comparing SHA-256 hashes of the buffer before and after.
- The rough cost of each step (checkpoint and restore dominate and scale with the device-memory footprint being captured).

## Key Libraries

- [`cuda.core`](https://nvidia.github.io/cuda-python/cuda-core/latest/) - device management, memory allocation, kernel compilation and launch, and the `checkpoint.Process` wrapper.
- [`cuda.bindings`](https://nvidia.github.io/cuda-python/cuda-bindings/latest/) - used directly for a pageable `cuMemcpyDtoH`.
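The state machine above can be modeled in a few lines of plain Python. This is a toy model for reasoning about the lifecycle ordering only — it does not call the CUDA driver, and the names (`ToyCheckpointProcess`, `VALID_TRANSITIONS`) are illustrative, not part of `cuda.core`:

```python
# Toy model of the checkpoint state machine: encodes only the documented
# transitions, so invalid orderings (e.g. checkpoint before lock) fail fast.
VALID_TRANSITIONS = {
    ("running", "lock"): "locked",
    ("locked", "checkpoint"): "checkpointed",
    ("checkpointed", "restore"): "locked",  # restore() ends in locked, not running
    ("locked", "unlock"): "running",
}


class ToyCheckpointProcess:
    def __init__(self):
        self.state = "running"

    def apply(self, action):
        key = (self.state, action)
        if key not in VALID_TRANSITIONS:
            raise RuntimeError(f"cannot {action} while {self.state}")
        self.state = VALID_TRANSITIONS[key]
        return self.state


proc = ToyCheckpointProcess()
states = [proc.apply(a) for a in ("lock", "checkpoint", "restore", "unlock")]
print(states)  # ['locked', 'checkpointed', 'locked', 'running']
```

Note that `restore` maps back to `locked`, not `running` — the model makes it obvious why the sample must still call `unlock()` at the end.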
## Key APIs

### From `cuda.core.checkpoint`

- `checkpoint.Process(pid)` - create a handle to a CUDA process by PID. Accepts `os.getpid()` for the self-checkpoint case shown here.
- `Process.state` - one of `"running"`, `"locked"`, `"checkpointed"`, or `"failed"`.
- `Process.lock(timeout_ms=…)` - block further CUDA API calls on the process; completes already-submitted work. Always pass a non-zero timeout to avoid deadlocks.
- `Process.checkpoint()` - copy device memory to host-side driver allocations and release GPU resources. Process state becomes `checkpointed`.
- `Process.restore(gpu_mapping=None)` - re-acquire GPU resources and copy memory back to device. Leaves the process in the `locked` state.
- `Process.unlock()` - return the process to `running`.
- `Process.restore_thread_id` - thread ID that `restore()` must be called from in the target process (not used in the self-checkpoint case here).

### From `cuda.core`

- `Device.set_current()` / `Device.memory_resource.allocate(...)` / `Stream`, `LaunchConfig`, `Program`, `launch` - standard device, compile, and launch primitives used to produce the buffer contents.

### From `cuda.bindings.driver`

- `cuMemcpyDtoH(host_ptr, device_handle, nbytes)` - synchronous D2H copy into a pageable host buffer.

## Requirements

### Hardware

- Any NVIDIA GPU supported by CUDA process checkpointing. CUDA checkpointing is currently limited to x86-64 Linux.

### Software

- Linux (the CUDA checkpoint API is Linux-only).
- NVIDIA driver with CUDA process checkpoint support.
- CUDA Toolkit 13.0 or newer.
- Python 3.10 or newer.
- `cuda-core >= 1.0.0`.
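Put together, the self-checkpoint cycle built from the APIs listed above looks roughly like this. This is a hedged sketch, not the sample itself: it needs Linux, a checkpoint-capable driver, and `cuda-core` installed, so the import is deferred into the function body and nothing here runs on import:

```python
import os


def self_checkpoint_roundtrip(lock_timeout_ms=5000):
    """Sketch of the self-checkpoint cycle using cuda.core.checkpoint.

    Illustrative only: requires Linux, a driver with CUDA process
    checkpoint support, and cuda-core. The import is deferred so this
    function can be defined on systems without those prerequisites.
    """
    from cuda.core import checkpoint  # deferred: needs cuda-core on Linux

    proc = checkpoint.Process(os.getpid())  # handle to our own process
    proc.lock(timeout_ms=lock_timeout_ms)   # always pass a non-zero timeout
    proc.checkpoint()                       # device memory -> host; GPU released
    proc.restore()                          # memory back on device; state is "locked"
    proc.unlock()                           # back to "running"
    return proc.state
```

The ordering matters: `checkpoint()` is only valid from `locked`, and `unlock()` is still required after `restore()` because restore intentionally leaves the process locked.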
## Installation

Install the required packages from `requirements.txt`:

```bash
cd /path/to/cuda-samples/python/2_CoreConcepts/processCheckpoint
pip install -r requirements.txt
```

## How to Run

### Basic usage

```bash
python processCheckpoint.py
```

### Larger GPU footprint to see checkpoint time scale

```bash
python processCheckpoint.py --buffer-mib 512
```

### Use a specific GPU

```bash
python processCheckpoint.py --device 1
```

### All options

```
--device            CUDA device ID (default: 0)
--buffer-mib        GPU buffer size in MiB (default: 16)
--lock-timeout-ms   Timeout passed to Process.lock in ms (default: 5000)
```

## Expected Output

On an RTX 4090 with a 16 MiB buffer:

```
[Process Checkpoint Sample using CUDA Core API]
PID:                748330
Device:             NVIDIA GeForce RTX 4090
Compute Capability: sm_89
Buffer size:        16 MiB
Lock timeout:       5000 ms

Compiling kernel ...
Writing deterministic pattern to GPU buffer ...
Buffer hash (before): b045f7975dc23352

Running checkpoint lifecycle on self ...

step          duration (ms)   state after
--------------------------------------------------
initial       -               running
lock          0.578           locked
checkpoint    268.369         checkpointed
restore       235.024         locked
unlock        1.648           running
--------------------------------------------------
total         505.618

Buffer hash (before): b045f7975dc23352
Buffer hash (after):  b045f7975dc23352

PASS: GPU buffer contents survived checkpoint/restore.

Done
```

**What to look for:**

- The **four state transitions** are all observable: `running → locked → checkpointed → locked → running`. Note that `restore()` leaves the process in `locked`, not `running`.
- The **checkpoint and restore steps dominate** the wall-clock time (hundreds of ms even for a small buffer) - they copy GPU memory to and from driver-managed host allocations. Increasing `--buffer-mib` visibly increases the checkpoint time.
- The `lock` and `unlock` steps are essentially free (sub-ms) - they just flip the process state.
- The SHA-256 **hashes before and after match**, proving the GPU memory contents survived the round trip.

Exact timings vary with GPU model, driver version, system load, and the size of the device memory footprint being captured.

## Files

- `processCheckpoint.py` - Python implementation using `cuda.core.checkpoint`
- `README.md` - This file
- `requirements.txt` - Sample dependencies

## See Also

- [CUDA Python Documentation](https://nvidia.github.io/cuda-python/)
- [`NVIDIA/cuda-checkpoint`](https://github.com/NVIDIA/cuda-checkpoint) - the CUDA checkpoint/restore utility, the CRIU plugin, and C reference programs (`r570-features.c`, `r580-migration-api.c`).
- [Checkpointing CUDA Applications with CRIU](https://developer.nvidia.com/blog/checkpointing-cuda-applications-with-criu/) - NVIDIA technical blog post on the broader CRIU workflow.
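The hash comparison that produces the PASS line reduces to hashing the readback bytes before and after the cycle. A minimal GPU-free sketch of that verification step — the 16-hex-character truncation is an assumption inferred from the hashes shown in the expected output, and `buffer_digest` is an illustrative name, not the sample's:

```python
import hashlib


def buffer_digest(buf: bytes) -> str:
    # Truncated SHA-256 digest, matching the 16-char hashes printed
    # by the sample (truncation length is an assumption).
    return hashlib.sha256(buf).hexdigest()[:16]


# Stand-in for the device buffer read back to host before and after
# the checkpoint cycle: a deterministic 1 MiB byte pattern.
pattern = bytes((i * 31) & 0xFF for i in range(1 << 20))
before = buffer_digest(pattern)
after = buffer_digest(pattern)  # after restore, the readback should be identical

assert before == after
print(f"PASS: hashes match ({before})")
```

Because the pattern is deterministic, any corruption of the buffer during checkpoint/restore would change the digest, so a single string comparison is a sufficient integrity check.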