Compare commits


6 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Rob Armstrong | 8a9e2c830c | Update 1_Utilities/README.md to redirect bandwidthTest to NVBandwidth (#371) | 2025-05-22 11:43:14 -07:00 |
| Rob Armstrong | adacf1cffd | Merge pull request #368 from XSShawnZeng/master: update the Vulkan headers include sequence and the transpose code format check | 2025-05-21 09:27:13 -07:00 |
| shawnz | da3b7a2b3c | Update the vulkanImageCUDA/vulkanImageCUDA.cu for Windows headers | 2025-05-19 17:43:08 +08:00 |
| shawnz | 5987a9e9fa | Update transpose for code format check | 2025-05-19 17:38:42 +08:00 |
| shawnz | 107f3f537f | Update the include files sequence for Vulkan samples on Windows | 2025-05-19 17:38:22 +08:00 |
| Francesco Rizzi | b530f1cf42 | Fix bug in 6_Performance/transpose: copy sharedmem kernel (#363). Update kernel loop bounds handling and main-loop data copy to avoid incorrect reuse of output results. (Authored-by: Francesco Rizzi <francesco.rizzi@ng-analytics.com>) | 2025-05-05 08:43:23 -07:00 |
5 changed files with 27 additions and 9 deletions

View File

@@ -1,9 +1,5 @@
 # 1. Utilities
-### [bandwidthTest](./bandwidthTest)
-This is a simple test program to measure the memcopy bandwidth of the GPU and memcpy bandwidth across PCI-e. This test application is capable of measuring device-to-device copy bandwidth, host-to-device copy bandwidth for pageable and page-locked memory, and device-to-host copy bandwidth for pageable and page-locked memory.
 ### [deviceQuery](./deviceQuery)
 This sample enumerates the properties of the CUDA devices present in the system.
@@ -12,3 +8,8 @@ This sample enumerates the properties of the CUDA devices present using CUDA Dri
 ### [topologyQuery](./topologyQuery)
 A simple example on how to query the topology of a system with multiple GPUs
+## Note
+### bandwidthTest
+The bandwidthTest sample was out-of-date and has been removed as of the CUDA Samples 12.9 release (see the [change log](../../CHANGELOG.md)). For up-to-date bandwidth measurements, refer instead to the [NVBandwidth](https://github.com/nvidia/nvbandwidth) utility.
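For readers who still want a quick sanity check of copy bandwidth, here is a minimal sketch of the kind of host-to-device measurement bandwidthTest used to perform. The buffer size, repetition count, and variable names are illustrative choices, not taken from the removed sample; NVBandwidth remains the supported tool for real measurements.

```cuda
// Minimal host-to-device bandwidth sketch using page-locked memory.
// Sizes and iteration counts are arbitrary; this is for illustration only.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;  // 64 MiB transfer buffer
    const int reps = 20;            // average over several copies

    float *h_pinned = nullptr, *d_buf = nullptr;
    cudaMallocHost(&h_pinned, bytes);  // page-locked host allocation
    cudaMalloc(&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i) {
        cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host->Device (pinned): %.1f GB/s\n",
           (bytes / 1e9) * reps / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
```

Repeating the timing with a plain `malloc` buffer in place of `cudaMallocHost` shows the pageable-versus-pinned gap the old sample reported.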

View File

@@ -34,8 +34,10 @@
 #include <vulkan/vulkan.h>
 #ifdef _WIN64
 #define NOMINMAX
-#include <vulkan/vulkan_win32.h>
+// Add windows.h to the include path
+#include <windows.h>
+// Add vulkan_win32.h to the include path
+#include <vulkan/vulkan_win32.h>
 #endif /* _WIN64 */
 /* remove _VK_TIMELINE_SEMAPHORE to use binary semaphores */

View File

@@ -34,8 +34,10 @@
 #include <vulkan/vulkan.h>
 #ifdef _WIN64
 #define NOMINMAX
-#include <vulkan/vulkan_win32.h>
+// Add windows.h to the include path first as a dependency for other Windows headers
+#include <windows.h>
+// Add other Windows headers
+#include <vulkan/vulkan_win32.h>
 #endif /* _WIN64 */
 struct GLFWwindow;

View File

@@ -27,10 +27,12 @@
 #define GLFW_INCLUDE_VULKAN
 #ifdef _WIN64
+// Add windows.h to the include path first as a dependency for other Windows headers
+#include <windows.h>
+// Add other Windows headers
 #include <VersionHelpers.h>
 #include <aclapi.h>
 #include <dxgi1_2.h>
-#include <windows.h>
 #define _USE_MATH_DEFINES
 #endif
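Taken together, the three header diffs apply one ordering rule. A minimal sketch of the resulting pattern is below; the dependency claim in the comments reflects that `<vulkan/vulkan_win32.h>` uses Win32 types such as `HANDLE` and `HWND` without including `<windows.h>` itself.

```cpp
// Windows include-order pattern these diffs converge on (sketch, not a
// verbatim excerpt from any one sample).
#ifdef _WIN64
#define NOMINMAX                  // keep windows.h from defining min/max macros
#include <windows.h>              // first: other Windows headers depend on it
#include <VersionHelpers.h>       // Win32 helper APIs that assume windows.h
#include <vulkan/vulkan_win32.h>  // Win32 surface/external-memory declarations
#endif /* _WIN64 */
```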

View File

@@ -103,7 +103,7 @@ __global__ void copySharedMem(float *odata, float *idata, int width, int height)
     for (int i = 0; i < TILE_DIM; i += BLOCK_ROWS) {
         if (xIndex < width && yIndex < height) {
-            tile[threadIdx.y][threadIdx.x] = idata[index];
+            tile[threadIdx.y + i][threadIdx.x] = idata[index + i * width];
         }
     }
@@ -111,7 +111,7 @@ __global__ void copySharedMem(float *odata, float *idata, int width, int height)
     for (int i = 0; i < TILE_DIM; i += BLOCK_ROWS) {
         if (xIndex < height && yIndex < width) {
-            odata[index] = tile[threadIdx.y][threadIdx.x];
+            odata[index + i * width] = tile[threadIdx.y + i][threadIdx.x];
         }
     }
 }
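The fix makes each loop iteration advance by one row group: without the `+ i` offsets, every iteration re-read and re-wrote the same element, so only the first BLOCK_ROWS rows of each tile were ever copied. For context, here is a sketch of the whole kernel with the fixed lines in place; the index setup, tile declaration, barrier, and the TILE_DIM/BLOCK_ROWS values are assumed from the sample's usual structure rather than shown in this diff.

```cuda
#define TILE_DIM 32   // placeholder tile geometry for illustration;
#define BLOCK_ROWS 8  // the sample defines its own values

__global__ void copySharedMem(float *odata, float *idata, int width, int height) {
    __shared__ float tile[TILE_DIM][TILE_DIM];

    int xIndex = blockIdx.x * TILE_DIM + threadIdx.x;
    int yIndex = blockIdx.y * TILE_DIM + threadIdx.y;
    int index  = xIndex + width * yIndex;

    // Each block covers a TILE_DIM x TILE_DIM tile with only BLOCK_ROWS rows
    // of threads, so each thread handles TILE_DIM / BLOCK_ROWS elements; the
    // "+ i" offsets make every iteration touch a fresh row.
    for (int i = 0; i < TILE_DIM; i += BLOCK_ROWS) {
        if (xIndex < width && yIndex < height) {
            tile[threadIdx.y + i][threadIdx.x] = idata[index + i * width];
        }
    }

    __syncthreads();

    for (int i = 0; i < TILE_DIM; i += BLOCK_ROWS) {
        if (xIndex < height && yIndex < width) {
            odata[index + i * width] = tile[threadIdx.y + i][threadIdx.x];
        }
    }
}
```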
@@ -596,6 +596,17 @@ int main(int argc, char **argv)
                        (size_x * size_y),
                        1,
                        TILE_DIM * BLOCK_ROWS);
+        // Reset d_odata to zero before starting the next loop iteration to
+        // avoid carrying over results from previous kernels. Without this
+        // reset, residual data from a prior kernel (e.g., 'copy') could make
+        // a subsequent kernel (e.g., 'copySharedMem') appear correct even if
+        // it performs no work, leading to false positives in compareData.
+        for (int i = 0; i < (size_x * size_y); ++i) {
+            h_odata[i] = 0;
+        }
+
+        // copy host data to device
+        checkCudaErrors(cudaMemcpy(d_odata, h_odata, mem_size, cudaMemcpyHostToDevice));
     }
     // cleanup
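If only the device buffer needed clearing, a one-call alternative to zeroing `h_odata` and copying it over would be `cudaMemset`. This is a sketch under the assumption that `d_odata` is a plain device allocation of `mem_size` bytes and that the host copy is not reused elsewhere; the committed fix also refreshes `h_odata`, which the comparison path reads.

```cuda
// Hedged alternative: clear the device buffer in place instead of
// round-tripping zeros through host memory.
checkCudaErrors(cudaMemset(d_odata, 0, mem_size));
```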