cuda-samples/bin/x86_64/linux/release/APM_globalToShmemAsyncCopy.txt

[globalToShmemAsyncCopy] - Starting...
GPU Device 0: "Hopper" with compute capability 9.0

MatrixA(1280,1280), MatrixB(1280,1280)
Running kernel = 0 - AsyncCopyMultiStageLargeChunk
Computing result using CUDA Kernel...
done
Performance= 5289.33 GFlop/s, Time= 0.793 msec, Size= 4194304000 Ops, WorkgroupSize= 256 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
Updating Samples for 12.1 2023-03-01 09:41:29 +08:00