# 6. Performance
### [alignedTypes](./alignedTypes)
A simple test showing the large gap in access speed between aligned and misaligned structures. It measures per-element copy throughput for both kinds of structure on large chunks of data.
### [transpose](./transpose)
This sample demonstrates matrix transpose. Several approaches are shown, illustrating how to achieve high performance.
### [UnifiedMemoryPerf](./UnifiedMemoryPerf)
This sample compares the performance of a matrix multiplication kernel using Unified Memory (with and without hints) against other types of memory, such as zero-copy buffers, pageable memory, and page-locked memory, performing synchronous and asynchronous transfers on a single GPU.
### [cudaGraphsPerfScaling](./cudaGraphsPerfScaling)
This sample demonstrates the performance characteristics of CUDA Graphs. It focuses on how the APIs scale with graph size.