mirror of
https://github.com/NVIDIA/cuda-samples.git
synced 2024-11-28 18:09:18 +08:00
90 lines
3.4 KiB
Plaintext
GPU Device 0: "Hopper" with compute capability 9.0

Driver version is: 12.0
Running sample.
================================
Running virtual address reuse example.
Sequential allocations & frees within a single graph enable CUDA to reuse virtual addresses.

Check confirms that d_a and d_b share a virtual address.
FOOTPRINT: 67108864 bytes

Cleaning up example by trimming device memory.
FOOTPRINT: 0 bytes
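The FOOTPRINT values in this output are raw byte counts. As a reading aid, the Python sketch below pulls them out of output text and converts them to MiB. It is not part of the sample; `parse_footprints` and `to_mib` are hypothetical helper names, and the embedded text is copied from the lines above.

```python
import re

# Matches lines such as "FOOTPRINT: 67108864 bytes" and
# "03: FOOTPRINT: 201326592 bytes".
FOOTPRINT_RE = re.compile(r"FOOTPRINT:\s*(\d+)\s*bytes")

MIB = 1024 * 1024


def parse_footprints(text):
    """Return every FOOTPRINT value in `text`, in order, as integer byte counts."""
    return [int(match.group(1)) for match in FOOTPRINT_RE.finditer(text)]


def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / MIB


sample_output = """\
Check confirms that d_a and d_b share a virtual address.
FOOTPRINT: 67108864 bytes

Cleaning up example by trimming device memory.
FOOTPRINT: 0 bytes
"""

for value in parse_footprints(sample_output):
    print(f"{value} bytes = {to_mib(value):g} MiB")
```

Read this way, the single reservation in this example is 64 MiB, and the larger values later in the output are whole multiples of it.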

================================
Running physical memory reuse example.
CUDA reuses the same physical memory for allocations from separate graphs when the allocation lifetimes don't overlap.

Creating the graph execs does not reserve any physical memory.
FOOTPRINT: 0 bytes

The first graph launched reserves the memory it needs.
FOOTPRINT: 67108864 bytes
A subsequent launch of the same graph in the same stream reuses the same physical memory. Thus the memory footprint does not grow here.
FOOTPRINT: 67108864 bytes

Subsequent launches of other graphs in the same stream also reuse the physical memory. Thus the memory footprint does not grow here.
01: FOOTPRINT: 67108864 bytes
02: FOOTPRINT: 67108864 bytes
03: FOOTPRINT: 67108864 bytes
04: FOOTPRINT: 67108864 bytes
05: FOOTPRINT: 67108864 bytes
06: FOOTPRINT: 67108864 bytes
07: FOOTPRINT: 67108864 bytes

Check confirms all graphs use a different virtual address.

Cleaning up example by trimming device memory.
FOOTPRINT: 0 bytes
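The flat footprint above follows from stream ordering: graphs launched into one stream run back to back, so their allocation lifetimes never overlap and one reservation can serve them all. The Python sketch below is a toy model of that reasoning only, not CUDA's allocator. The function name is hypothetical, and the 64 MiB per-graph size is an assumption inferred from the FOOTPRINT values.

```python
MIB = 1024 * 1024
ALLOC_BYTES = 64 * MIB  # assumed per-graph allocation, inferred from the log


def footprint_after_sequential_launches(graph_sizes):
    """Toy model: graphs run one after another and free what they allocate.

    Returns the reserved footprint observed after each launch.
    """
    reserved = 0
    history = []
    for size in graph_sizes:
        # Memory freed by the previous graph is available again, so the
        # reservation grows only when a graph needs more than is reserved.
        reserved = max(reserved, size)
        history.append(reserved)
    return history


# One first launch, one relaunch of the same graph, then seven other graphs.
history = footprint_after_sequential_launches([ALLOC_BYTES] * 9)
for index, value in enumerate(history):
    print(f"launch {index}: FOOTPRINT: {value} bytes")
```

Under these assumptions every launch reports the same 67108864 bytes, matching the pattern in the output.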

================================
Running simultaneous streams example.
Graphs that can run concurrently need separate physical memory. In this example, each graph launched in a separate stream increases the total memory footprint.

When launching a new graph, CUDA may reuse physical memory from a graph whose execution has already finished -- even if the new graph is being launched in a different stream from the completed graph. Therefore, a kernel node is added to the graphs to increase runtime.

Initial footprint:
FOOTPRINT: 0 bytes

Each graph launch in a separate stream grows the memory footprint:
01: FOOTPRINT: 67108864 bytes
02: FOOTPRINT: 134217728 bytes
03: FOOTPRINT: 201326592 bytes
04: FOOTPRINT: 268435456 bytes
05: FOOTPRINT: 335544320 bytes
06: FOOTPRINT: 402653184 bytes
07: FOOTPRINT: 402653184 bytes

Cleaning up example by trimming device memory.
FOOTPRINT: 0 bytes
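The output does not say why launch 07 reports the same footprint as launch 06. One reading consistent with the note above is that an earlier graph had already finished, so its memory was reused. The Python sketch below is a toy model of that reading, not CUDA's allocator. The function name, the one-launch-per-tick timing, and the `run_time` of 6 ticks are assumptions chosen only to reproduce the pattern.

```python
MIB = 1024 * 1024
ALLOC_BYTES = 64 * MIB  # assumed per-graph allocation, inferred from the log


def footprint_with_concurrent_streams(num_launches, run_time):
    """Toy model: launch i starts at tick i and holds a block for `run_time` ticks.

    Returns the reserved footprint observed after each launch.
    """
    free_blocks = 0   # reserved blocks not held by a running graph
    busy_until = []   # finish ticks of graphs that still hold a block
    reserved = 0
    history = []
    for now in range(num_launches):
        # Blocks of graphs that have finished by now become reusable.
        still_running = [tick for tick in busy_until if tick > now]
        free_blocks += len(busy_until) - len(still_running)
        busy_until = still_running

        if free_blocks > 0:
            free_blocks -= 1         # reuse memory from a finished graph
        else:
            reserved += ALLOC_BYTES  # every block is busy, so reserve more

        busy_until.append(now + run_time)
        history.append(reserved)
    return history


for index, value in enumerate(footprint_with_concurrent_streams(7, 6), start=1):
    print(f"{index:02d}: FOOTPRINT: {value} bytes")
```

With a `run_time` of 1, every graph finishes before the next launch and the modeled footprint stays at one block, which is why the sample adds a kernel node to lengthen each graph.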

================================
Running unfreed streams example.
CUDA cannot reuse physical memory from graphs which do not free their allocations.

Despite being launched in the same stream, each graph launch grows the memory footprint. Since the allocation is not freed, CUDA keeps the memory valid for use.
00: FOOTPRINT: 67108864 bytes
01: FOOTPRINT: 134217728 bytes
02: FOOTPRINT: 201326592 bytes
03: FOOTPRINT: 268435456 bytes
04: FOOTPRINT: 335544320 bytes
05: FOOTPRINT: 402653184 bytes
06: FOOTPRINT: 469762048 bytes
07: FOOTPRINT: 536870912 bytes

Trimming does not impact the memory footprint since the un-freed allocations are still holding onto the memory.
FOOTPRINT: 536870912 bytes

Freeing the allocations does not shrink the footprint.
FOOTPRINT: 536870912 bytes

Since the allocations are now freed, trimming does reduce the footprint even when the graph execs are not yet destroyed.
FOOTPRINT: 0 bytes

Cleaning up example by trimming device memory.
FOOTPRINT: 0 bytes
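The sequence above separates two quantities: memory reserved for graph allocations (the FOOTPRINT) and memory held by allocations that have not been freed. The Python sketch below is a toy model of that bookkeeping, not CUDA's allocator. `ToyGraphPool` and its methods are hypothetical names, and the 64 MiB allocation size is inferred from the FOOTPRINT values.

```python
MIB = 1024 * 1024
ALLOC_BYTES = 64 * MIB  # assumed per-graph allocation, inferred from the log


class ToyGraphPool:
    """Toy bookkeeping for reserved memory versus live allocations."""

    def __init__(self):
        self.reserved = 0  # bytes backed by physical memory (the FOOTPRINT)
        self.in_use = 0    # bytes held by allocations not yet freed

    def alloc(self, size):
        self.in_use += size
        # Reserve more only if the live allocations no longer fit.
        self.reserved = max(self.reserved, self.in_use)

    def free(self, size):
        # Freeing hands memory back to the pool; the reservation is kept.
        self.in_use -= size

    def trim(self):
        # Trimming releases only memory that backs no live allocation.
        self.reserved = self.in_use


pool = ToyGraphPool()
for _ in range(8):
    pool.alloc(ALLOC_BYTES)      # eight launches, none frees its allocation
after_launches = pool.reserved

pool.trim()                      # nothing to release: all memory is in use
after_first_trim = pool.reserved

for _ in range(8):
    pool.free(ALLOC_BYTES)       # frees alone do not shrink the reservation
after_free = pool.reserved

pool.trim()                      # now the whole reservation can be released
after_second_trim = pool.reserved

print(after_launches, after_first_trim, after_free, after_second_trim)
```

Under these assumptions the model walks through the same four states as the output: 536870912 bytes after the launches, unchanged after the first trim and after the frees, and 0 bytes after the second trim.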

================================
Sample complete.