# ipcMemoryPool (Python)

## Description

This sample demonstrates how to share GPU memory between Python processes using CUDA Inter-Process Communication (IPC) and `cuda.core`'s IPC-enabled memory pools.

By default, each process has its own CUDA virtual address space and cannot see allocations made by another process. With an IPC-enabled `DeviceMemoryResource`, the parent allocates once, and the child process maps that same physical GPU memory into its own address space, so both processes read and write the same bytes. The sample performs a round-trip test:

1. Parent creates an IPC-enabled `DeviceMemoryResource` and allocates a `Buffer`.
2. Parent fills the buffer with a known pattern.
3. Parent sends the `Buffer` to a child process through a `multiprocessing.Queue`. `cuda.core`'s pickle reducers re-create the memory resource and map the buffer in the child.
4. Child verifies the parent's pattern, writes a new pattern, and signals completion.
5. Parent verifies the child's writes.
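
The five steps can be sketched end to end. This is a hedged sketch, not the sample's actual code: the `cuda.core` names come from this README's Key APIs section, the `cuda.core.experimental` import path reflects current releases, and the assumption that a `Buffer` exports a byte-typed DLPack view that `cupy.from_dlpack` can wrap (then reinterpret as float32) is ours. A GPU is only touched under the `__main__` guard.

```python
import multiprocessing as mp
import numpy as np

NBYTES = 4096
N = NBYTES // 4  # float32 elements

def parent_pattern(n):
    # 100, 101, 102, ... (matches the expected output below)
    return np.arange(n, dtype=np.float32) + 100.0

def child_pattern(n):
    # -0, -1, -2, ... (matches the expected output below)
    return -np.arange(n, dtype=np.float32)

def child_main(queue, done):
    # Steps 3-4: unpickling the Buffer re-creates the memory resource
    # and maps the shared allocation into this process.
    import cupy as cp
    buf = queue.get()
    assert buf.is_mapped
    view = cp.from_dlpack(buf).view(cp.float32)  # assumed byte-typed DLPack export
    assert bool((view == cp.asarray(parent_pattern(N))).all())
    view[:] = cp.asarray(child_pattern(N))
    done.set()

if __name__ == "__main__":
    # CUDA state does not survive fork, so the spawn start method is required.
    ctx = mp.get_context("spawn")
    from cuda.core.experimental import (
        Device, DeviceMemoryResource, DeviceMemoryResourceOptions)
    import cupy as cp
    dev = Device(0)
    dev.set_current()
    mr = DeviceMemoryResource(
        dev, options=DeviceMemoryResourceOptions(ipc_enabled=True))
    buf = mr.allocate(NBYTES)                               # step 1
    view = cp.from_dlpack(buf).view(cp.float32)
    view[:] = cp.asarray(parent_pattern(N))                 # step 2
    queue, done = ctx.Queue(), ctx.Event()
    child = ctx.Process(target=child_main, args=(queue, done))
    child.start()
    queue.put(buf)                                          # step 3
    done.wait()                                             # step 4
    child.join()
    assert bool((view == cp.asarray(child_pattern(N))).all())  # step 5
    print("IPC round-trip: OK")
```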

## What You'll Learn

- Enabling IPC on a `DeviceMemoryResource` with `ipc_enabled=True`
- Sending `Buffer` objects across process boundaries via `mp.Queue`
- How `cuda.core`'s pickle reducers rebuild the memory resource and map the buffer in the receiving process
- Why `multiprocessing` must use the `"spawn"` start method with CUDA
- Detecting IPC support at runtime (POSIX file-descriptor handle type, memory-pool support, Linux-only)
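
The spawn requirement in the list above can be seen with plain `multiprocessing`, no GPU needed: `spawn` starts a fresh interpreter rather than fork-copying the parent, which is why a forked child would otherwise inherit unusable CUDA state. A minimal sketch:

```python
import multiprocessing as mp

def worker(q):
    # A spawned child re-imports this module in a brand-new interpreter,
    # so nothing from the parent (including any CUDA context) is
    # inherited the way it would be under fork.
    q.put("hello from spawned child")

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # the start method required for CUDA work
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
```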

## Key Libraries

- [`cuda.core`](https://nvidia.github.io/cuda-python/cuda-core/latest/) - IPC-enabled memory resources and buffer reducers
- `cupy` - zero-copy views over the shared device memory via DLPack
- `multiprocessing` - standard library process management
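
The `cupy` bullet above relies on the DLPack protocol for zero-copy views. The same protocol exists on the CPU side in recent NumPy, which makes the idea easy to demonstrate without a GPU; on the device, `cupy.from_dlpack` plays the role `np.from_dlpack` plays here:

```python
import numpy as np

# A producer array; any object implementing __dlpack__ would do.
a = np.arange(8, dtype=np.float32)

# Import it through DLPack: no bytes are copied, both names alias
# the same underlying buffer.
b = np.from_dlpack(a)

assert np.shares_memory(a, b)
assert b.dtype == np.float32
```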

## Key APIs

### From `cuda.core`

- `DeviceMemoryResource(device, options=DeviceMemoryResourceOptions(ipc_enabled=True))` - create an IPC-enabled memory pool
- `DeviceMemoryResourceOptions(max_size=..., ipc_enabled=True)` - configure the underlying pool
- `mr.allocate(nbytes)` - allocate a `Buffer` from the IPC pool
- `Buffer.is_mapped` - True when the buffer is usable in the current process
- `Device.properties.memory_pools_supported` - runtime feature check
- `Device.properties.handle_type_posix_file_descriptor_supported` - runtime feature check
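
The last two properties in the list gate the sample at startup. The decision itself is plain boolean logic, sketched here as a pure function; the real sample would read the flags from `Device.properties`, while this function just takes them as arguments:

```python
import sys

def ipc_supported(memory_pools_supported: bool,
                  posix_fd_handle_supported: bool,
                  platform: str = sys.platform) -> bool:
    # CUDA IPC as used by this sample needs memory pools, POSIX
    # file-descriptor handles, and Linux (per the Requirements section).
    return (memory_pools_supported
            and posix_fd_handle_supported
            and platform.startswith("linux"))
```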

### From `cuda_samples_utils`

- `print_gpu_info()` - print device name and compute capability

## Requirements

### Hardware

- NVIDIA GPU with Compute Capability 7.0 or higher
- Device that supports CUDA memory pools and POSIX file-descriptor IPC handles (the sample detects and reports this at startup)
- Minimum GPU memory: 512 MB

### Software

- Linux x86_64 (POSIX file-descriptor IPC handles are not available on Windows or macOS)
- CUDA Toolkit 13.0 or newer (matches `cuda-python` 13.x)
- Python 3.10 or newer
- `cuda-python` (>=13.0.0)
- `cuda-core` (>=0.6.0)
- `cupy-cuda13x` (>=13.0.0)

## Installation

Install the required packages from `requirements.txt`:

```bash
cd /path/to/cuda-samples/python/4_DistributedComputing/ipcMemoryPool
pip install -r requirements.txt
```

The `requirements.txt` installs:

- `cuda-python` (>=13.0.0)
- `cuda-core` (>=0.6.0)
- `cupy-cuda13x` (>=13.0.0)

## How to Run

### Basic usage

```bash
cd cuda-samples/python/4_DistributedComputing/ipcMemoryPool
python ipcMemoryPool.py
```

### With custom parameters

```bash
# Larger shared buffer
python ipcMemoryPool.py --elements 65536

# Use a specific GPU
python ipcMemoryPool.py --device 1
```
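
The two flags above map naturally onto a small `argparse` front-end. This is a hedged sketch rather than the sample's actual parser; the defaults (1024 float32 elements, device 0) are assumptions chosen to match the 4096-byte buffer in the expected output.

```python
import argparse

def build_parser():
    # --elements and --device are the two flags shown above;
    # the default values here are illustrative assumptions.
    p = argparse.ArgumentParser(
        description="CUDA IPC memory-pool round-trip sample")
    p.add_argument("--elements", type=int, default=1024,
                   help="number of float32 elements in the shared buffer")
    p.add_argument("--device", type=int, default=0,
                   help="CUDA device ordinal to use")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"elements={args.elements} device={args.device}")
```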

On platforms or devices that do not support CUDA IPC, the sample prints a diagnostic and exits cleanly with status 0.

## Expected Output

```
Device: <Your GPU Name>
Compute Capability: <X.Y>

Created IPC-enabled DeviceMemoryResource (is_ipc_enabled=True)
Parent wrote pattern (first 5 values): [100. 101. 102. 103. 104.]
Parent sent buffer to child pid=<pid>; waiting...
[child pid=<pid>] received buffer: is_mapped=True, size=4096
Parent sees child's pattern (first 5 values): [-0. -1. -2. -3. -4.]
IPC round-trip: OK
```

**Note:** Device name, compute capability, and child PID will vary based on your system.

## Files

- `ipcMemoryPool.py` - Python implementation using `cuda.core` IPC memory pools
- `README.md` - This file
- `requirements.txt` - Sample dependencies
- `../../Utilities/cuda_samples_utils.py` - Common utilities (imported by this sample)

## See Also

- [CUDA Python Documentation](https://nvidia.github.io/cuda-python/)
- [`cuda.core` memory API](https://nvidia.github.io/cuda-python/cuda-core/latest/api.html#memory)
- Upstream `cuda.core` IPC tests: [`test_memory_ipc.py`](https://github.com/NVIDIA/cuda-python/blob/main/cuda_core/tests/memory_ipc/test_memory_ipc.py)
- [CUDA IPC programming guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#interprocess-communication)