# ipcMemoryPool (Python)

## Description
This sample demonstrates how to share GPU memory between Python
processes using CUDA Inter-Process Communication (IPC) and
cuda.core's IPC-enabled memory pools.
By default, each process has its own CUDA virtual address space and
cannot see allocations made by another process. With an IPC-enabled
`DeviceMemoryResource`, the parent allocates once, and the child
process maps that same physical GPU memory into its own address space,
so both read and write the same bytes. The sample performs a
round-trip test:
- Parent creates an IPC-enabled `DeviceMemoryResource` and allocates a `Buffer`.
- Parent fills the buffer with a known pattern.
- Parent sends the `Buffer` to a child process through a `multiprocessing.Queue`; cuda.core's pickle reducers re-create the memory resource and map the buffer in the child.
- Child verifies the parent's pattern, writes a new pattern, and signals completion.
- Parent verifies the child's writes.
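The round-trip above can be sketched without a GPU. The sketch below keeps the parent/child choreography (spawn context, `Queue` hand-off, a pattern check in each direction) but replaces the shared GPU `Buffer` with a plain list, so the payload is copied rather than IPC-mapped; it illustrates only the control flow, not cuda.core's API.

```python
# CPU-only sketch of the sample's parent/child round-trip protocol.
# The GPU Buffer is stood in for by a picklable list of floats.
import multiprocessing as mp

def child(inbox, outbox):
    # Receive the parent's "buffer" (copied here; real CUDA IPC maps it).
    data = inbox.get()
    assert data == [100.0 + i for i in range(len(data))]  # parent's pattern
    # Write a new pattern back and signal completion.
    outbox.put([-float(i) for i in range(len(data))])

def main():
    ctx = mp.get_context("spawn")  # CUDA requires the spawn start method
    inbox, outbox = ctx.Queue(), ctx.Queue()
    worker = ctx.Process(target=child, args=(inbox, outbox))
    worker.start()
    inbox.put([100.0 + i for i in range(8)])  # parent's known pattern
    result = outbox.get()                     # child's replacement pattern
    worker.join()
    assert result[:5] == [-0.0, -1.0, -2.0, -3.0, -4.0]
    print("round-trip (CPU sketch): OK")

if __name__ == "__main__":
    main()
```

Because the payload travels through a `Queue`, it is pickled and copied between processes; the point of the real sample is that cuda.core's reducers instead re-map the same physical GPU memory in the child.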
## What You'll Learn
- Enabling IPC on a `DeviceMemoryResource` with `ipc_enabled=True`
- Sending `Buffer` objects across process boundaries via `mp.Queue`
- How cuda.core's pickle reducers rebuild the memory resource and map the buffer in the receiving process
- Why `multiprocessing` must use the `"spawn"` start method with CUDA
- Detecting IPC support at runtime (POSIX file-descriptor handle type, memory-pool support, Linux-only)
## Key Libraries

- `cuda.core` - IPC-enabled memory resources and buffer reducers
- `cupy` - zero-copy views over the shared device memory via DLPack
- `multiprocessing` - standard-library process management
## Key APIs

### From cuda.core

- `DeviceMemoryResource(device, options=DeviceMemoryResourceOptions(ipc_enabled=True))` - create an IPC-enabled memory pool
- `DeviceMemoryResourceOptions(max_size=..., ipc_enabled=True)` - configure the underlying pool
- `mr.allocate(nbytes)` - allocate a `Buffer` from the IPC pool
- `Buffer.is_mapped` - `True` when the buffer is usable in the current process
- `Device.properties.memory_pools_supported` - runtime feature check
- `Device.properties.handle_type_posix_file_descriptor_supported` - runtime feature check
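Taken together, the calls above might be assembled as in the following sketch. The `cuda.core.experimental` import path is an assumption (cuda.core currently ships its public API under that namespace), and the whole block is guarded so it degrades to a no-op on machines without a supported GPU.

```python
# Hedged sketch combining the API calls listed above: feature check,
# IPC-enabled pool creation, and a single allocation.
status = "skipped"
try:
    # Import path is an assumption based on current cuda.core packaging.
    from cuda.core.experimental import (
        Device, DeviceMemoryResource, DeviceMemoryResourceOptions)

    dev = Device()        # default device; pass an ordinal to pick another
    dev.set_current()
    props = dev.properties
    if (props.memory_pools_supported
            and props.handle_type_posix_file_descriptor_supported):
        opts = DeviceMemoryResourceOptions(max_size=1 << 20, ipc_enabled=True)
        mr = DeviceMemoryResource(dev, options=opts)
        buf = mr.allocate(4096)              # a Buffer from the IPC pool
        print("is_mapped:", buf.is_mapped)   # usable in this process
        buf.close()
        status = "ok"
except Exception as exc:  # no GPU, unsupported platform, or API drift
    print("sketch skipped:", exc)
```

The same feature checks are what let the real sample print a diagnostic and exit cleanly on unsupported platforms.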
### From cuda_samples_utils

- `print_gpu_info()` - print device name and compute capability
## Requirements

### Hardware
- NVIDIA GPU with Compute Capability 7.0 or higher
- Device that supports CUDA memory pools and POSIX file-descriptor IPC handles (the sample detects and reports this at startup)
- Minimum GPU memory: 512 MB
### Software
- Linux x86_64 (POSIX file-descriptor IPC handles are not available on Windows or macOS)
- CUDA Toolkit 13.0 or newer (matches `cuda-python` 13.x)
- Python 3.10 or newer
- `cuda-python` (>=13.0.0)
- `cuda-core` (>=0.6.0)
- `cupy-cuda13x` (>=13.0.0)
## Installation

Install the required packages from `requirements.txt`:

```bash
cd /path/to/cuda-samples/python/4_DistributedComputing/ipcMemoryPool
pip install -r requirements.txt
```
The `requirements.txt` installs:

- `cuda-python` (>=13.0.0)
- `cuda-core` (>=0.6.0)
- `cupy-cuda13x` (>=13.0.0)
## How to Run

### Basic usage

```bash
cd cuda-samples/python/4_DistributedComputing/ipcMemoryPool
python ipcMemoryPool.py
```
### With custom parameters

```bash
# Larger shared buffer
python ipcMemoryPool.py --elements 65536

# Use a specific GPU
python ipcMemoryPool.py --device 1
```
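The flags above are ordinary command-line options; a minimal sketch of how such a CLI can be parsed with `argparse` (the parser and its default values here are illustrative assumptions, not taken from the actual sample):

```python
import argparse

def parse_args(argv=None):
    # Hypothetical parser matching the flags shown above; defaults are
    # assumptions for illustration only.
    parser = argparse.ArgumentParser(description="CUDA IPC memory-pool demo")
    parser.add_argument("--elements", type=int, default=1024,
                        help="number of elements in the shared buffer")
    parser.add_argument("--device", type=int, default=0,
                        help="ordinal of the CUDA device to use")
    return parser.parse_args(argv)

args = parse_args(["--elements", "65536", "--device", "1"])
print(args.elements, args.device)  # → 65536 1
```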
On platforms or devices that do not support CUDA IPC, the sample prints a diagnostic and exits cleanly with status 0.
## Expected Output

```
Device: <Your GPU Name>
Compute Capability: <X.Y>
Created IPC-enabled DeviceMemoryResource (is_ipc_enabled=True)
Parent wrote pattern (first 5 values): [100. 101. 102. 103. 104.]
Parent sent buffer to child pid=<pid>; waiting...
[child pid=<pid>] received buffer: is_mapped=True, size=4096
Parent sees child's pattern (first 5 values): [-0. -1. -2. -3. -4.]
IPC round-trip: OK
```
**Note:** Device name, compute capability, and child PID will vary based on your system.
## Files

- `ipcMemoryPool.py` - Python implementation using `cuda.core` IPC memory pools
- `README.md` - This file
- `requirements.txt` - Sample dependencies
- `../../Utilities/cuda_samples_utils.py` - Common utilities (imported by this sample)
## See Also

- CUDA Python Documentation
- `cuda.core` memory API
- Upstream `cuda.core` IPC tests: `test_memory_ipc.py`
- CUDA IPC programming guide