# Sample: Memory Resources and Buffers (Python)

## Description
This sample demonstrates the `cuda.core` memory management model: a
`MemoryResource` owns a pool of memory and hands out `Buffer` objects that
can be passed to kernels, copied between resources with
`Buffer.copy_to()`, and viewed as NumPy or CuPy arrays through DLPack. The
script exercises three common resources side by side:
- `DeviceMemoryResource` - device-local GPU memory. Every `Device` exposes a default pool via `Device.memory_resource`, and applications can create additional pools explicitly.
- `PinnedMemoryResource` - page-locked host memory, used here as the input and output staging buffers around a GPU kernel (the canonical pinned-H2D / compute / pinned-D2H pattern).
- `ManagedMemoryResource` - unified memory that the driver migrates between host and device on demand; host views see the GPU's writes without an explicit copy.
The same `scale_and_bias` kernel runs on each resource, and every result is
verified on the host.
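The kernel's exact signature lives in `memoryResources.py`; assuming `scale_and_bias` computes `y = scale * x + bias` elementwise (the natural reading of the name, with hypothetical parameter values below), the host-side verification step amounts to a NumPy comparison like this:

```python
import numpy as np

# Hypothetical parameters; the real values are set in memoryResources.py.
scale, bias = 2.0, 0.5
x = np.arange(8, dtype=np.float32)

# Reference result the host check compares the GPU output against.
expected = scale * x + bias

# Stand-in for data read back from the GPU (identical here, since this is
# a host-only illustration of the verification step).
gpu_result = scale * x + bias

np.testing.assert_allclose(gpu_result, expected, rtol=1e-6)
```

Each of the three demos ends with a check of this shape, so a silent data-movement bug in any resource would surface as a mismatch on the host.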
## What You'll Learn

- Creating and using `DeviceMemoryResource`, `PinnedMemoryResource`, and `ManagedMemoryResource`
- Allocating `Buffer` objects from a resource with a bound stream
- Copying between buffers across resources with `Buffer.copy_to()`
- Taking zero-copy NumPy or CuPy views of a `Buffer` via DLPack
- Releasing buffers with stream-ordered `close(stream)` semantics
## Key Libraries

- `cuda.core` - Pythonic access to CUDA runtime, programs, and memory resources
- `cupy` - GPU array views of device buffers
- `numpy` - host array views of pinned and managed buffers
## Key APIs

### From `cuda.core`

- `Device.memory_resource` - default memory pool attached to a device
- `DeviceMemoryResource`, `PinnedMemoryResource`, `ManagedMemoryResource` - allocate buffers of the corresponding memory kind
- `MemoryResource.allocate(nbytes, stream=...)` - returns a `Buffer`
- `Buffer.copy_to(dst_buffer, stream=...)` - async, stream-ordered copy
- `Buffer.close(stream)` - stream-ordered deallocation
- `Buffer` supports `__dlpack__` for zero-copy views
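As a minimal sketch of how these calls compose into the pinned-staging pattern: the import path (`cuda.core.experimental`) and the byte-typed DLPack view of a `Buffer` are assumptions, and the kernel launch is elided; the authoritative usage is in `memoryResources.py`. This requires a CUDA GPU with `cuda-core` installed.

```python
# Sketch only: assumes Device and PinnedMemoryResource import from
# cuda.core.experimental, as in recent cuda-core releases.
import numpy as np
from cuda.core.experimental import Device, PinnedMemoryResource

dev = Device()
dev.set_current()
stream = dev.create_stream()

n = 1024
nbytes = n * np.dtype(np.float32).itemsize

# Pinned host staging buffers, plus a device buffer from the default pool.
pinned = PinnedMemoryResource()
h_in = pinned.allocate(nbytes, stream=stream)
h_out = pinned.allocate(nbytes, stream=stream)
d_buf = dev.memory_resource.allocate(nbytes, stream=stream)

# Fill the pinned input through a zero-copy NumPy view
# (reinterpreted as float32; the raw view is assumed byte-typed).
np.from_dlpack(h_in).view(np.float32)[:] = np.arange(n, dtype=np.float32)

# Stream-ordered H2D copy; the scale_and_bias launch would go here.
h_in.copy_to(d_buf, stream=stream)
d_buf.copy_to(h_out, stream=stream)
stream.sync()

# Stream-ordered release of every buffer.
for buf in (h_in, h_out, d_buf):
    buf.close(stream)
```

Because every allocate, copy, and close is bound to the same stream, the whole sequence is ordered without any host-side synchronization until the final `stream.sync()`.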
### From CuPy and NumPy

- `cp.from_dlpack()` / `np.from_dlpack()` - zero-copy array view of a `Buffer`
### From `cuda_samples_utils`

- `print_gpu_info()` - print device name and compute capability
## Requirements

### Hardware

- NVIDIA GPU with Compute Capability 7.0 or higher
- Managed memory support (most discrete GPUs on Linux and Windows)

### Software

- CUDA Toolkit 13.0 or newer (matches `cuda-python` 13.x)
- Python 3.10 or newer
- `cuda-python` (>=13.0.0)
- `cuda-core` (>=0.6.0)
- `cupy-cuda13x` (>=13.0.0)
## Installation

Install the required packages from `requirements.txt`:

```bash
cd /path/to/cuda-samples/python/2_CoreConcepts/memoryResources
pip install -r requirements.txt
```

The `requirements.txt` installs:

- `cuda-python` (>=13.0.0)
- `cuda-core` (>=0.6.0)
- `cupy-cuda13x` (>=13.0.0)
## How to Run

### Basic usage

```bash
cd cuda-samples/python/2_CoreConcepts/memoryResources
python memoryResources.py
```

### With custom parameters

```bash
# Larger buffer size
python memoryResources.py --elements 1048576

# Use a specific GPU
python memoryResources.py --device 1
```
## Expected Output

```
Device: <Your GPU Name>
Compute Capability: <X.Y>

[1] DeviceMemoryResource + PinnedMemoryResource (staging)
Pinned staging, device kernel, and copy_to verified
[2] ManagedMemoryResource (unified memory)
GPU writes observed directly through the host-visible mapping
[3] Explicit DeviceMemoryResource
Explicit DeviceMemoryResource allocation verified

All memory resource demos passed.
```

Note: Device name and compute capability will vary based on your GPU.
## Files

- `memoryResources.py` - Python implementation using `cuda.core` memory resources
- `README.md` - This file
- `requirements.txt` - Sample dependencies
- `../../Utilities/cuda_samples_utils.py` - Common utilities (imported by this sample)
## See Also

- CUDA Python Documentation
- `cuda.core` memory API
- Upstream `cuda.core` example: `memory_ops.py`
- Upstream `cuda.core` example: `memory_pool_resources.py`