[ simpleStreams ]

Device synchronization method set to = 0 (Automatic Blocking)
Setting reps to 100 to demonstrate steady state

> GPU Device 0: "Hopper" with compute capability 9.0

Device: canMapHostMemory: Yes
> CUDA Capable: SM 9.0 hardware
> 114 Multiprocessor(s) x 128 (Cores/Multiprocessor) = 14592 (Cores)
> scale_factor = 1.0000
> array_size = 16777216

> Using CPU/GPU Device Synchronization method (cudaDeviceScheduleAuto)
> mmap() allocating 64.00 Mbytes (generic page-aligned system memory)
> cudaHostRegister() registering 64.00 Mbytes of generic allocated system memory

Starting Test
memcopy: 2.45
kernel: 0.12
non-streamed: 2.80
4 streams: 2.66
-------------------------------