GPU Device 0: "Hopper" with compute capability 9.0 Running ........................................................ Overall Time For matrixMultiplyPerf Printing Average of 20 measurements in (ms) Size_KB UMhint UMhntAs UMeasy 0Copy MemCopy CpAsync CpHpglk CpPglAs 4 0.210 0.264 0.332 0.014 0.033 0.026 0.037 0.024 16 0.201 0.307 0.489 0.025 0.043 0.035 0.046 0.045 64 0.311 0.381 0.758 0.067 0.084 0.075 0.074 0.063 256 0.545 0.604 1.429 0.323 0.228 0.212 0.197 0.187 1024 1.551 1.444 2.436 1.902 0.831 0.784 0.714 0.728 4096 4.960 4.375 7.863 11.966 3.239 3.179 2.908 2.919 16384 18.911 17.022 29.696 77.375 13.796 13.757 12.874 12.862 NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.