[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Hopper" with compute capability 9.0
GPU Device 0: "NVIDIA H100 PCIe" with compute capability 9.0
MatrixA(640,480), MatrixB(480,320), MatrixC(640,320)
Computing result using CUBLAS...done.
Performance= 10873.05 GFlop/s, Time= 0.018 msec, Size= 196608000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
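
The Size and Performance figures in the log can be cross-checked by hand. The sketch below is not part of the sample. It assumes the usual count of 2*M*N*K operations for a dense matrix multiply (one multiply and one add per inner-product term), and the names `M`, `K`, `N` are illustrative. The printed time is rounded to three decimals, so the rate recomputed from it only approximates the reported one.

```python
# Dimensions as printed in the log:
# MatrixA(640,480), MatrixB(480,320), MatrixC(640,320).
# M, K, N are illustrative names, not identifiers taken from the log.
M, K, N = 640, 480, 320

# Assumed operation count: one multiply plus one add per inner-product term.
ops = 2 * M * N * K  # 196608000, matching "Size= 196608000 Ops"

# Time as printed (rounded to three decimals), converted from msec to seconds.
time_s = 0.018e-3

# Rate recomputed from the rounded time, in GFlop/s.
gflops_from_rounded_time = ops / time_s / 1e9

# Work backwards from the reported rate to the unrounded time it implies.
reported_gflops = 10873.05
implied_time_ms = ops / (reported_gflops * 1e9) * 1e3

print(f"ops                       = {ops}")
print(f"GFlop/s from rounded time = {gflops_from_rounded_time:.2f}")
print(f"implied time (msec)       = {implied_time_ms:.4f}")
```

The recomputed rate lands within about half a percent of the reported 10873.05 GFlop/s, and the implied time rounds to the printed 0.018 msec, so the three figures agree up to rounding.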