Bug 5659370: Update the README.md of sample subfolder according to the latest structure of the sample folder

2026-03-12 21:15:42 +08:00 · 2025-11-14 18:18:09 +08:00 · 2025-11-14 18:18:09 +08:00 · df6edf644e
commit df6edf644e
parent 85231cd1b6
5 changed files with 42 additions and 58 deletions
--- a/README.md
+++ b/README.md
@@ -449,6 +449,9 @@ Samples that demonstrate performance optimization.
 ### [7. libNVVM](./Samples/7_libNVVM/README.md)
 Samples that demonstrate the use of libNVVVM and NVVM IR.

+### [8. Platform Specific](./Samples/8_Platform_Specific/Tegra/README.md)
+Samples that are specific to certain platforms (Tegra, cuDLA, NvMedia, NvSci, OpenGL ES).
+
 ## Dependencies

 Some CUDA Samples rely on third-party applications and/or libraries, or features provided by the CUDA Toolkit and Driver, to either build or execute. These dependencies are listed below.
--- a/Samples/2_Concepts_and_Techniques/README.md
+++ b/Samples/2_Concepts_and_Techniques/README.md
@@ -10,9 +10,6 @@ This sample implements a separable convolution filter of a 2D signal with a gaus
 ### [convolutionTexture](./convolutionTexture)
 Texture-based implementation of a separable 2D convolution with a gaussian kernel. Used for performance comparison against convolutionSeparable.

-### [cuHook](./cuHook)
-This sample demonstrates how to build and use an intercept library with CUDA. The library has to be loaded via LD_PRELOAD, e.g. LD_PRELOAD=<full_path>/libcuhook.so.1 ./cuHook
-
 ### [dct8x8](./dct8x8)
 This sample demonstrates how Discrete Cosine Transform (DCT) for blocks of 8 by 8 pixels can be performed using CUDA: a naive implementation by definition and a more traditional approach used in many libraries. As opposed to implementing DCT in a fragment shader, CUDA allows for an easier and more efficient implementation.

@@ -22,9 +19,6 @@ Demonstrates CUDA and EGL Streams interop, where consumer's EGL Stream is on one
 ### [EGLStream_CUDA_Interop](./EGLStream_CUDA_Interop)
 Demonstrates data exchange between CUDA and EGL Streams.

-### [EGLSync_CUDAEvent_Interop](./EGLSync_CUDAEvent_Interop)
-Demonstrates interoperability between CUDA Event and EGL Sync/EGL Image using which one can achieve synchronization on GPU itself for GL-EGL-CUDA operations instead of blocking CPU for synchronization.
-
 ### [eigenvalues](./eigenvalues)
 The computation of all or a subset of all eigenvalues is an important problem in Linear Algebra, statistics, physics, and many other fields. This sample demonstrates a parallel implementation of a bisection algorithm for the computation of all eigenvalues of a tridiagonal symmetric matrix of arbitrary size with CUDA.

--- a/Samples/4_CUDA_Libraries/README.md
+++ b/Samples/4_CUDA_Libraries/README.md
@@ -4,9 +4,6 @@
 ### [batchCUBLAS](./batchCUBLAS)
 A CUDA Sample that demonstrates how using batched CUBLAS API calls to improve overall performance.

-### [batchedLabelMarkersAndLabelCompressionNPP](./batchedLabelMarkersAndLabelCompressionNPP)
-An NPP CUDA Sample that demonstrates how to use the NPP label markers generation and label compression functions based on a Union Find (UF) algorithm including both single image and batched image versions.
-
 ### [boxFilterNPP](./boxFilterNPP)
 A NPP CUDA Sample that demonstrates how to use NPP FilterBox function to perform a Box Filter.

@@ -34,18 +31,6 @@ This sample implements a conjugate gradient solver on GPU using CUBLAS and CUSPA
 ### [cudaNvSci](./cudaNvSci)
 This sample demonstrates CUDA-NvSciBuf/NvSciSync Interop. Two CPU threads import the NvSciBuf and NvSciSync into CUDA to perform two image processing algorithms on a ppm image - image rotation in 1st thread & rgba to grayscale conversion of rotated image in 2nd thread. Currently only supported on Ubuntu 18.04

-### [cudaNvSciNvMedia](./cudaNvSciNvMedia)
-This sample demonstrates CUDA-NvMedia interop via NvSciBuf/NvSciSync APIs. Note that this sample only supports cross build from x86_64 to aarch64, aarch64 native build is not supported. For detailed workflow of the sample please check cudaNvSciNvMedia_Readme.pdf in the sample directory.
-
-### [cuDLAErrorReporting](./cuDLAErrorReporting)
-This sample demonstrates how DLA errors can be detected via CUDA.
-
-### [cuDLAHybridMode](./cuDLAHybridMode)
-This sample demonstrates cuDLA hybrid mode wherein DLA can be programmed using CUDA.
-
-### [cuDLAStandaloneMode](./cuDLAStandaloneMode)
-This sample demonstrates cuDLA standalone mode wherein DLA can be programmed without using CUDA.
-
 ### [cuSolverDn_LinearSolver](./cuSolverDn_LinearSolver)
 A CUDA Sample that demonstrates cuSolverDN's LU, QR and Cholesky factorization.

--- a/Samples/5_Domain_Specific/README.md
+++ b/Samples/5_Domain_Specific/README.md
@@ -35,15 +35,9 @@ Naturally(Hadamard)-ordered Fast Walsh Transform for batching vectors of arbitra
 ### [FDTD3d](./FDTD3d)
 This sample applies a finite differences time domain progression stencil on a 3D surface.

-### [fluidsD3D9](./fluidsD3D9)
-An example of fluid simulation using CUDA and CUFFT, with Direct3D 9 rendering.  A Direct3D Capable device is required.
-
 ### [fluidsGL](./fluidsGL)
 An example of fluid simulation using CUDA and CUFFT, with OpenGL rendering.

-### [fluidsGLES](./fluidsGLES)
-An example of fluid simulation using CUDA and CUFFT, with OpenGLES rendering.
-
 ### [HSOpticalFlow](./HSOpticalFlow)
 Variational optical flow estimation example.  Uses textures for image operations. Shows how simple PDE solver can be accelerated with CUDA.

@@ -57,10 +51,7 @@ This sample extracts a geometric isosurface from a volume dataset using the marc
 This sample evaluates fair call price for a given set of European options using the Monte Carlo approach, taking advantage of all CUDA-capable GPUs installed in the system. This sample use double precision hardware if a GTX 200 class GPU is present.  The sample also takes advantage of CUDA 4.0 capability to supporting using a single CPU thread to control multiple GPUs

 ### [nbody](./nbody)
-This sample demonstrates efficient all-pairs simulation of a gravitational n-body simulation in CUDA.  This sample accompanies the GPU Gems 3 chapter "Fast N-Body Simulation with CUDA".  With CUDA 5.5, performance on Tesla K20c has increased to over 1.8TFLOP/s single precision.  Double Performance has also improved on all Kepler and Fermi GPU architectures as well.  Starting in CUDA 4.0, the nBody sample has been updated to take advantage of new features to easily scale the n-body simulation across multiple GPUs in a single PC.  Adding "-numbodies=<bodies>" to the command line will allow users to set # of bodies for simulation.  Adding “-numdevices=<N>” to the command line option will cause the sample to use N devices (if available) for simulation.  In this mode, the position and velocity data for all bodies are read from system memory using “zero copy” rather than from device memory.  For a small number of devices (4 or fewer) and a large enough number of bodies, bandwidth is not a bottleneck so we can achieve strong scaling across these devices.
-
-### [nbody_opengles](./nbody_opengles)
-This sample demonstrates efficient all-pairs simulation of a gravitational n-body simulation in CUDA. Unlike the OpenGL nbody sample, there is no user interaction.
+This sample demonstrates efficient all-pairs simulation of a gravitational n-body system in CUDA.  This sample accompanies the GPU Gems 3 chapter "Fast N-Body Simulation with CUDA".  With CUDA 5.5, performance on Tesla K20c has increased to over 1.8 TFLOP/s single precision.  Double-precision performance has also improved on all Kepler and Fermi GPU architectures.  Starting in CUDA 4.0, the nBody sample has been updated to take advantage of new features to easily scale the n-body simulation across multiple GPUs in a single PC.  Adding "-numbodies=<bodies>" to the command line allows users to set the number of bodies for simulation.  Adding "-numdevices=<N>" to the command line causes the sample to use N devices (if available) for simulation.  In this mode, the position and velocity data for all bodies are read from system memory using "zero copy" rather than from device memory.  For a small number of devices (4 or fewer) and a large enough number of bodies, bandwidth is not a bottleneck, so we can achieve strong scaling across these devices.

 ### [nbody_screen](./nbody_screen)
 This sample demonstrates efficient all-pairs simulation of a gravitational n-body simulation in CUDA. Unlike the OpenGL nbody sample, there is no user interaction.
@@ -83,15 +74,6 @@ This sample implements Niederreiter Quasirandom Sequence Generator and Inverse C
 ### [recursiveGaussian](./recursiveGaussian)
 This sample implements a Gaussian blur using Deriche's recursive method. The advantage of this method is that the execution time is independent of the filter width.

-### [simpleD3D10](./simpleD3D10)
-Simple program which demonstrates interoperability between CUDA and Direct3D10. The program generates a vertex array with CUDA and uses Direct3D10 to render the geometry.  A Direct3D Capable device is required.
-
-### [simpleD3D10RenderTarget](./simpleD3D10RenderTarget)
-Simple program which demonstrates interop of rendertargets between Direct3D10 and CUDA. The program uses RenderTarget positions with CUDA and generates a histogram with visualization.  A Direct3D10 Capable device is required.
-
-### [simpleD3D10Texture](./simpleD3D10Texture)
-Simple program which demonstrates how to interoperate CUDA with Direct3D10 Texture.  The program creates a number of D3D10 Textures (2D, 3D, and CubeMap) which are generated from CUDA kernels. Direct3D then renders the results on the screen.  A Direct3D10 Capable device is required.
-
 ### [simpleD3D11](./simpleD3D11)
 Simple program which demonstrates  how to use the CUDA D3D11 External Resource Interoperability APIs to update D3D11 buffers from CUDA and synchronize between D3D11 and CUDA with Keyed Mutexes.

@@ -102,21 +84,9 @@ Simple program which demonstrates Direct3D11 Texture interoperability with CUDA.
 ### [simpleD3D12](./simpleD3D12)
 A program which demonstrates Direct3D12 interoperability with CUDA.  The program creates a sinewave in DX12 vertex buffer which is created using CUDA kernels. DX12 and CUDA synchronizes using DirectX12 Fences. Direct3D then renders the results on the screen.  A DirectX12 Capable NVIDIA GPU is required on Windows10 or higher OS.

-### [simpleD3D9](./simpleD3D9)
-Simple program which demonstrates interoperability between CUDA and Direct3D9. The program generates a vertex array with CUDA and uses Direct3D9 to render the geometry.  A Direct3D capable device is required.
-
-### [simpleD3D9Texture](./simpleD3D9Texture)
-Simple program which demonstrates Direct3D9 Texture interoperability with CUDA.  The program creates a number of D3D9 Textures (2D, 3D, and CubeMap) which are written to from CUDA kernels. Direct3D then renders the results on the screen.  A Direct3D capable device is required.
-
 ### [simpleGL](./simpleGL)
 Simple program which demonstrates interoperability between CUDA and OpenGL. The program modifies vertex positions with CUDA and uses OpenGL to render the geometry.

-### [simpleGLES](./simpleGLES)
-Demonstrates data exchange between CUDA and OpenGL ES (aka Graphics interop). The program modifies vertex positions with CUDA and uses OpenGL ES to render the geometry.
-
-### [simpleGLES_EGLOutput](./simpleGLES_EGLOutput)
-Demonstrates data exchange between CUDA and OpenGL ES (aka Graphics interop). The program modifies vertex positions with CUDA and uses OpenGL ES to render the geometry, and shows how to render directly to the display using the EGLOutput mechanism and the DRM library.
-
 ### [simpleGLES_screen](./simpleGLES_screen)
 Demonstrates data exchange between CUDA and OpenGL ES (aka Graphics interop). The program modifies vertex positions with CUDA and uses OpenGL ES to render the geometry.

@@ -126,9 +96,6 @@ This sample demonstrates Vulkan CUDA Interop. CUDA imports the Vulkan vertex buf
 ### [simpleVulkanMMAP](./simpleVulkanMMAP)
 This sample demonstrates Vulkan CUDA Interop via cuMemMap APIs. CUDA exports buffers that Vulkan imports as vertex buffer. CUDA invokes kernels to operate on vertices and synchronizes with Vulkan through vulkan semaphores imported by CUDA. This sample depends on Vulkan SDK, GLFW3 libraries, for building this sample please refer to "Build_instructions.txt" provided in this sample's directory

-### [SLID3D10Texture](./SLID3D10Texture)
-Simple program which demonstrates SLI with Direct3D10 Texture interoperability with CUDA.  The program creates a D3D10 Texture which is written to from a CUDA kernel. Direct3D then renders the results on the screen.  A Direct3D Capable device is required.
-
 ### [smokeParticles](./smokeParticles)
 Smoke simulation with volumetric shadows using half-angle slicing technique. Uses CUDA for procedural simulation, Thrust Library for sorting algorithms, and OpenGL for graphics rendering.

@@ -141,9 +108,6 @@ This sample implements Sobol Quasirandom Sequence Generator.
 ### [stereoDisparity](./stereoDisparity)
 A CUDA program that demonstrates how to compute a stereo disparity map using SIMD SAD (Sum of Absolute Difference) intrinsics.  Requires Compute Capability 2.0 or higher.

-### [VFlockingD3D10](./VFlockingD3D10)
-The sample models formation of V-shaped flocks by big birds, such as geese and cranes. The algorithms of such flocking are borrowed from the paper "V-like formations in flocks of artificial birds" from Artificial Life, Vol. 14, No. 2, 2008. The sample has CPU- and GPU-based implementations. Press 'g' to toggle between them. The GPU-based simulation works many times faster than the CPU-based one. The printout in the console window reports the simulation time per step. Press 'r' to reset the initial distribution of birds.
-
 ### [volumeFiltering](./volumeFiltering)
 This sample demonstrates 3D Volumetric Filtering using 3D Textures and 3D Surface Writes.

--- a/Samples/8_Platform_Specific/Tegra/README.md
+++ b/Samples/8_Platform_Specific/Tegra/README.md
@@ -0,0 +1,38 @@
+# 8. Platform_Specific/Tegra
+
+
+### [cudaNvSciNvMedia](./cudaNvSciNvMedia)
+This sample demonstrates CUDA-NvMedia interop via NvSciBuf/NvSciSync APIs. Note that this sample only supports cross-building from x86_64 to aarch64; native aarch64 builds are not supported. For the detailed workflow of the sample, see cudaNvSciNvMedia_Readme.pdf in the sample directory.
+
+### [cudaNvSciBufMultiplanar](./cudaNvSciBufMultiplanar)
+This sample demonstrates CUDA-NvSciBuf Interop for Multiplanar images. A YUV 420 multiplanar image is flipped and allocated using NvSciBuf APIs and imported into CUDA with CUDA External Resource Interoperability. A CUDA surface is created from the corresponding mapped CUDA array and again bit flipping is performed on the surface. The result is copied back to a YUV image which is compared against the input.
+
+### [cuDLAErrorReporting](./cuDLAErrorReporting)
+This sample demonstrates how DLA errors can be detected via CUDA.
+
+### [cuDLAHybridMode](./cuDLAHybridMode)
+This sample demonstrates cuDLA hybrid mode wherein DLA can be programmed using CUDA.
+
+### [cuDLALayerwiseStatsHybrid](./cuDLALayerwiseStatsHybrid)
+This sample provides layerwise statistics to the application in cuDLA hybrid mode, wherein DLA is programmed using CUDA.
+
+### [cuDLALayerwiseStatsStandalone](./cuDLALayerwiseStatsStandalone)
+This sample provides layerwise statistics to the application in cuDLA standalone mode, wherein DLA is programmed without using CUDA.
+
+### [cuDLAStandaloneMode](./cuDLAStandaloneMode)
+This sample demonstrates cuDLA standalone mode wherein DLA can be programmed without using CUDA.
+
+### [EGLSync_CUDAEvent_Interop](./EGLSync_CUDAEvent_Interop)
+Demonstrates interoperability between CUDA Event and EGL Sync/EGL Image, which allows GL-EGL-CUDA operations to synchronize on the GPU itself instead of blocking the CPU.
+
+### [fluidsGLES](./fluidsGLES)
+An example of fluid simulation using CUDA and CUFFT, with OpenGLES rendering.
+
+### [nbody_opengles](./nbody_opengles)
+This sample demonstrates efficient all-pairs simulation of a gravitational n-body system in CUDA. Unlike the OpenGL nbody sample, there is no user interaction.
+
+### [simpleGLES](./simpleGLES)
+Demonstrates data exchange between CUDA and OpenGL ES (aka Graphics interop). The program modifies vertex positions with CUDA and uses OpenGL ES to render the geometry.
+
+### [simpleGLES_EGLOutput](./simpleGLES_EGLOutput)
+Demonstrates data exchange between CUDA and OpenGL ES (aka Graphics interop). The program modifies vertex positions with CUDA and uses OpenGL ES to render the geometry, and shows how to render directly to the display using the EGLOutput mechanism and the DRM library.