cuda-samples/bin/x86_64/linux/release/APM_globalToShmemAsyncCopy.txt
2023-03-01 01:41:29 +00:00

[globalToShmemAsyncCopy] - Starting...
GPU Device 0: "Hopper" with compute capability 9.0
MatrixA(1280,1280), MatrixB(1280,1280)
Running kernel = 0 - AsyncCopyMultiStageLargeChunk
Computing result using CUDA Kernel...
done
Performance= 5289.33 GFlop/s, Time= 0.793 msec, Size= 4194304000 Ops, WorkgroupSize= 256 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
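The `Performance=` line can be checked by hand from `Size=` and `Time=`. The sketch below is minimal. It assumes the usual GEMM convention that each multiply-add counts as two operations, so an M x K by K x N product costs 2·M·N·K operations. For the 1280 x 1280 matrices in this run, that convention gives exactly the logged `Size= 4194304000`. `gemm_flops` and `gflops` are hypothetical helper names used only for illustration. They are not part of the CUDA Samples.

```python
# Cross-check the throughput figures printed by the sample.
# Assumption: one multiply-add = 2 floating-point operations (2*M*N*K for a GEMM).

def gemm_flops(m: int, n: int, k: int) -> int:
    """Operation count for C[m x n] = A[m x k] * B[k x n]."""
    return 2 * m * n * k

def gflops(ops: int, time_msec: float) -> float:
    """Throughput in GFlop/s for `ops` operations finished in `time_msec` milliseconds."""
    return ops / (time_msec * 1e-3) / 1e9

ops = gemm_flops(1280, 1280, 1280)
print(ops)                           # 4194304000, matching "Size=" in the log
print(round(gflops(ops, 0.793), 2))  # ~5289.16; the log shows 5289.33 because 0.793 ms is rounded
```

Working backwards, 4194304000 operations at 5289.33 GFlop/s take about 0.79297 ms. That rounds to the logged `0.793 msec`, so the small gap comes from display rounding and does not point to a different formula.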