mirror of
https://github.com/NVIDIA/cuda-samples.git
synced 2024-11-24 19:59:17 +08:00
12 lines
495 B
Plaintext
[globalToShmemAsyncCopy] - Starting...
GPU Device 0: "Hopper" with compute capability 9.0
MatrixA(1280,1280), MatrixB(1280,1280)
Running kernel = 0 - AsyncCopyMultiStageLargeChunk
Computing result using CUDA Kernel...
done
Performance= 5289.33 GFlop/s, Time= 0.793 msec, Size= 4194304000 Ops, WorkgroupSize= 256 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.