mirror of https://github.com/NVIDIA/cuda-samples.git synced 2026-06-04 00:06:52 +08:00

Release v13.3 of the CUDA samples with CUDA 13.3 Toolkit (#435 )

This is the release of the CUDA 13.3 samples, which include additions for CUDA Tile C++, and updated CCCL and Python samples.

2026-05-27 16:50:59 -05:00

5.0 KiB

Raw Blame History

Sample: Memory Resources and Buffers (Python)

Description

This sample demonstrates the cuda.core memory management model: a MemoryResource owns a pool of memory and hands out Buffer objects that can be passed to kernels, copied between resources with Buffer.copy_to(), and viewed as NumPy or CuPy arrays through DLPack. The script exercises three common resources side-by-side:

DeviceMemoryResource - device-local GPU memory. Every Device exposes a default pool via Device.memory_resource, and applications can create additional pools explicitly.
PinnedMemoryResource - page-locked host memory, used here as the input and output staging buffers around a GPU kernel (the canonical pinned-H2D / compute / pinned-D2H pattern).
ManagedMemoryResource - unified memory that the driver migrates between host and device on demand; host views see the GPU's writes without an explicit copy.

The same scale_and_bias kernel runs on each resource and every result is verified on the host.

What You'll Learn

Creating and using DeviceMemoryResource, PinnedMemoryResource, and ManagedMemoryResource
Allocating Buffer objects from a resource with a bound stream
Copying between buffers across resources with Buffer.copy_to()
Taking zero-copy NumPy or CuPy views of a Buffer via DLPack
Releasing buffers with stream-ordered close(stream) semantics

Key Libraries

cuda.core - Pythonic access to CUDA runtime, programs, and memory resources
cupy - GPU array views of device buffers
numpy - host array views of pinned and managed buffers

Key APIs

From `cuda.core`

Device.memory_resource - default memory pool attached to a device
DeviceMemoryResource, PinnedMemoryResource, ManagedMemoryResource - allocate buffers of the corresponding memory kind
MemoryResource.allocate(nbytes, stream=...) - returns a Buffer
Buffer.copy_to(dst_buffer, stream=...) - async, stream-ordered copy
Buffer.close(stream) - stream-ordered deallocation
Buffer supports __dlpack__ for zero-copy views

From CuPy and NumPy

cp.from_dlpack() / np.from_dlpack() - zero-copy array view of a Buffer

From `cuda_samples_utils`

print_gpu_info() - print device name and compute capability

Requirements

Hardware

NVIDIA GPU with Compute Capability 7.0 or higher
Managed memory support (most discrete GPUs)

Software

CUDA Toolkit 13.0 or newer (matches cuda-python 13.x)
Python 3.10 or newer
cuda-python (>=13.0.0)
cuda-core (>=1.0.0)
cupy-cuda13x (>=14.0.0)

Platform Support

The ManagedMemoryResource demo in this sample exercises concurrent host access to managed allocations while the GPU is active, which requires the device property concurrent_managed_access=True. This is only supported on Linux with HMM (Pascal and newer). On Windows (WDDM/MCDM/TCC) the property is False, so the sample exits early with a waive message and exit code 2. The DeviceMemoryResource + PinnedMemoryResource demos in this sample would still work on Windows on their own, but to keep the sample self-contained the entire script waives when concurrent managed access is unavailable.

Installation

Install the required packages from requirements.txt:

cd /path/to/cuda-samples/python/2_CoreConcepts/memoryResources
pip install -r requirements.txt

The requirements.txt installs:

cuda-python (>=13.0.0)
cuda-core (>=1.0.0)
cupy-cuda13x (>=14.0.0)

How to Run

Basic usage

cd cuda-samples/python/2_CoreConcepts/memoryResources
python memoryResources.py

With custom parameters

# Larger buffer size
python memoryResources.py --elements 1048576

# Use a specific GPU
python memoryResources.py --device 1

Expected Output

Device: <Your GPU Name>
Compute Capability: <X.Y>

[1] DeviceMemoryResource + PinnedMemoryResource (staging)
  Pinned staging, device kernel, and copy_to verified

[2] ManagedMemoryResource (unified memory)
  GPU writes observed directly through the host-visible mapping

[3] Explicit DeviceMemoryResource
  Explicit DeviceMemoryResource allocation verified

All memory resource demos passed.

Note: Device name and compute capability will vary based on your GPU.

Files

memoryResources.py - Python implementation using cuda.core memory resources
README.md - This file
requirements.txt - Sample dependencies
../../Utilities/cuda_samples_utils.py - Common utilities (imported by this sample)

5.0 KiB Raw Blame History