4. CUDA Libraries
batchCUBLAS
A CUDA Sample that demonstrates how to use batched CUBLAS API calls to improve overall performance.
batchedLabelMarkersAndLabelCompressionNPP
An NPP CUDA Sample that demonstrates how to use the NPP label markers generation and label compression functions based on a Union Find (UF) algorithm including both single image and batched image versions.
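To make the Union Find idea concrete, here is a minimal CPU sketch in plain C++. It is not the NPP implementation and calls no NPP functions; the names `findRoot`, `unite`, and `labelAndCompress` are hypothetical, and the 4-connectivity rule and raster-order label numbering are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Follow parent links to the set representative, halving the path on the way.
std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Merge the sets containing a and b; the smaller index becomes the root.
void unite(std::vector<std::size_t>& parent, std::size_t a, std::size_t b) {
    a = findRoot(parent, a);
    b = findRoot(parent, b);
    if (a != b) parent[std::max(a, b)] = std::min(a, b);
}

// Label 4-connected regions of equal pixel value, then compress the labels
// to the consecutive range 1..K in raster order.
std::vector<int> labelAndCompress(const std::vector<unsigned char>& img,
                                  std::size_t width, std::size_t height) {
    assert(img.size() == width * height);
    std::vector<std::size_t> parent(img.size());
    std::iota(parent.begin(), parent.end(), std::size_t{0});

    // Marker generation: join each pixel with equal-valued right/down neighbours.
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t i = y * width + x;
            if (x + 1 < width && img[i] == img[i + 1]) unite(parent, i, i + 1);
            if (y + 1 < height && img[i] == img[i + width]) unite(parent, i, i + width);
        }
    }

    // Label compression: map each root to the next unused consecutive label.
    std::vector<int> rootLabel(img.size(), 0), labels(img.size());
    int next = 0;
    for (std::size_t i = 0; i < img.size(); ++i) {
        const std::size_t r = findRoot(parent, i);
        if (rootLabel[r] == 0) rootLabel[r] = ++next;
        labels[i] = rootLabel[r];
    }
    return labels;
}
```

Calling `labelAndCompress` on a small row-major image returns one label per pixel; the batched variant mentioned above would repeat the same steps per image.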
boxFilterNPP
An NPP CUDA Sample that demonstrates how to use the NPP FilterBox function to perform a box filter.
cannyEdgeDetectorNPP
An NPP CUDA Sample that demonstrates the recommended parameters to use with the nppiFilterCannyBorder_8u_C1R Canny Edge Detection image filter function. This function expects a single channel 8-bit grayscale input image. You can generate a grayscale image from a color image by first calling nppiColorToGray() or nppiRGBToGray(). The Canny Edge Detection function combines and improves on the techniques required to produce an edge detection image using multiple steps.
conjugateGradient
This sample implements a conjugate gradient solver on the GPU using the CUBLAS and CUSPARSE libraries.
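For readers unfamiliar with the method, the following is a minimal CPU reference sketch of conjugate gradient in plain C++, not the sample's code. The `CsrMatrix` type and the `spmv`, `dot`, and `conjugateGradient` helpers are hypothetical names; in the GPU sample, the sparse matrix-vector product and the vector operations are the parts one would expect to hand to CUSPARSE and CUBLAS.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Square sparse matrix in CSR form (hypothetical helper type for this sketch).
struct CsrMatrix {
    std::size_t n;                    // number of rows and columns
    std::vector<std::size_t> rowPtr;  // size n + 1
    std::vector<std::size_t> colIdx;  // size nnz
    std::vector<double> val;          // size nnz
};

// y = A * x (sparse matrix-vector product).
std::vector<double> spmv(const CsrMatrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.n, 0.0);
    for (std::size_t i = 0; i < A.n; ++i)
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
            y[i] += A.val[k] * x[A.colIdx[k]];
    return y;
}

// Dot product of two equally sized vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Solve A x = b for a symmetric positive definite A, starting from x = 0.
std::vector<double> conjugateGradient(const CsrMatrix& A,
                                      const std::vector<double>& b,
                                      double tol = 1e-10, int maxIter = 1000) {
    assert(b.size() == A.n);
    std::vector<double> x(A.n, 0.0), r = b, p = b;  // r = b - A*0 = b
    double rr = dot(r, r);
    for (int it = 0; it < maxIter && std::sqrt(rr) > tol; ++it) {
        const std::vector<double> Ap = spmv(A, p);
        const double alpha = rr / dot(p, Ap);        // step length
        for (std::size_t i = 0; i < A.n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rrNew = dot(r, r);
        const double beta = rrNew / rr;              // direction update factor
        for (std::size_t i = 0; i < A.n; ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
    }
    return x;
}
```

The tolerance and iteration cap are illustrative defaults, not values taken from the sample.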
conjugateGradientCudaGraphs
This sample implements a conjugate gradient solver on the GPU using CUBLAS and CUSPARSE library calls that are captured and launched using the CUDA Graph APIs.
conjugateGradientMultiBlockCG
This sample implements a conjugate gradient solver on the GPU using Multi Block Cooperative Groups. It also uses Unified Memory.
conjugateGradientMultiDeviceCG
This sample implements a conjugate gradient solver on multiple GPUs using Multi Device Cooperative Groups. It also uses Unified Memory, optimized with prefetching and usage hints.
conjugateGradientPrecond
This sample implements a preconditioned conjugate gradient solver on the GPU using the CUBLAS and CUSPARSE libraries.
conjugateGradientUM
This sample implements a conjugate gradient solver on the GPU using the CUBLAS and CUSPARSE libraries, with Unified Memory.
cudaNvSci
This sample demonstrates CUDA-NvSciBuf/NvSciSync interop. Two CPU threads import the NvSciBuf and NvSciSync objects into CUDA to run two image processing algorithms on a PPM image: image rotation in the first thread, and RGBA-to-grayscale conversion of the rotated image in the second thread. Currently supported only on Ubuntu 18.04.
cudaNvSciNvMedia
This sample demonstrates CUDA-NvMedia interop via the NvSciBuf/NvSciSync APIs. Note that this sample supports only cross builds from x86_64 to aarch64; native aarch64 builds are not supported. For the detailed workflow of the sample, see cudaNvSciNvMedia_Readme.pdf in the sample directory.
cuDLAErrorReporting
This sample demonstrates how DLA errors can be detected via CUDA.
cuDLAHybridMode
This sample demonstrates cuDLA hybrid mode wherein DLA can be programmed using CUDA.
cuDLAStandaloneMode
This sample demonstrates cuDLA standalone mode wherein DLA can be programmed without using CUDA.
cuSolverDn_LinearSolver
A CUDA Sample that demonstrates cuSolverDN's LU, QR and Cholesky factorization.
cuSolverRf
A CUDA Sample that demonstrates cuSolver's refactorization library - CUSOLVERRF.
cuSolverSp_LinearSolver
A CUDA Sample that demonstrates cuSolverSP's LU, QR and Cholesky factorization.
cuSolverSp_LowlevelCholesky
A CUDA Sample that demonstrates Cholesky factorization using cuSolverSP's low level APIs.
cuSolverSp_LowlevelQR
A CUDA Sample that demonstrates QR factorization using cuSolverSP's low level APIs.
FilterBorderControlNPP
This sample demonstrates how any border version of an NPP filtering function can be used in the most common mode, with border control enabled. These functions can duplicate the results of the equivalent non-border versions of the NPP functions. They can also be used to enable or disable border control on individual source image edges, depending on which portion of the source image is used as input.
freeImageInteropNPP
A simple CUDA Sample that demonstrates how to use the FreeImage library with NPP.
histEqualizationNPP
This CUDA Sample demonstrates how to use NPP for histogram equalization of image data.
lineOfSight
This sample is an implementation of a simple line-of-sight algorithm: Given a height map and a ray originating at some observation point, it computes all the points along the ray that are visible from the observation point. The implementation is based on the Thrust library.
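The description above maps naturally onto a running-maximum scan. The following CPU sketch in plain C++ is an illustration under stated assumptions, not the Thrust code from the sample: the function name `lineOfSight`, the uniform sample spacing `step`, and the convention that a point is visible when its slope from the observer is at least as large as every earlier slope are all choices made here. On the GPU, the scan step is where a parallel inclusive scan would be the natural fit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// heights[i] is the terrain height at horizontal distance (i + 1) * step
// from the observer, measured along the ray.
std::vector<bool> lineOfSight(const std::vector<double>& heights,
                              double observerHeight, double step = 1.0) {
    assert(step > 0.0);
    const std::size_t n = heights.size();

    // 1. Slope of the line from the observer to each sample point.
    std::vector<double> slope(n);
    for (std::size_t i = 0; i < n; ++i)
        slope[i] = (heights[i] - observerHeight) / (static_cast<double>(i + 1) * step);

    // 2. Inclusive scan with max: the steepest slope seen up to and including i.
    std::vector<double> maxSlope(n);
    std::partial_sum(slope.begin(), slope.end(), maxSlope.begin(),
                     [](double a, double b) { return std::max(a, b); });

    // 3. A point is visible if nothing before it rises above its own slope.
    std::vector<bool> visible(n);
    for (std::size_t i = 0; i < n; ++i) visible[i] = slope[i] >= maxSlope[i];
    return visible;
}
```

For heights `{1, 3, 2, 10}` seen from height 0, the third point is hidden behind the second, and the other three are visible.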
matrixMulCUBLAS
This sample implements matrix multiplication from Chapter 3 of the programming guide. To illustrate GPU performance for matrix multiplication, it also shows how to use the CUBLAS interface introduced in CUDA 4.0 to achieve high-performance matrix multiplication.
MersenneTwisterGP11213
This sample demonstrates the Mersenne Twister random number generator GP11213 in cuRAND.
nvJPEG
A CUDA Sample that demonstrates single and batched decoding of JPEG images using the NVJPEG library.
nvJPEG_encoder
A CUDA Sample that demonstrates single encoding of JPEG images using the NVJPEG library.
oceanFFT
This sample simulates an ocean height field using the CUFFT library and renders the result using OpenGL.
randomFog
This sample illustrates pseudo- and quasi-random numbers produced by CURAND.
simpleCUBLAS
Example of using the CUBLAS API to perform GEMM operations.
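As a reminder of what a GEMM computes, the sketch below is a plain C++ CPU reference for C = alpha * A * B + beta * C with column-major storage, the layout cuBLAS uses. It is not taken from the sample and calls no cuBLAS functions; the name `gemmReference` is hypothetical, and the sketch assumes no transposition and tightly packed matrices (leading dimension equal to the row count).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = alpha * A * B + beta * C
// A is m x k, B is k x n, C is m x n, all stored column-major:
// element (row, col) of an m-row matrix lives at index row + col * m.
void gemmReference(std::size_t m, std::size_t n, std::size_t k,
                   double alpha, const std::vector<double>& A,
                   const std::vector<double>& B,
                   double beta, std::vector<double>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < m; ++row) {
            double acc = 0.0;
            for (std::size_t p = 0; p < k; ++p)
                acc += A[row + p * m] * B[p + col * k];
            C[row + col * m] = alpha * acc + beta * C[row + col * m];
        }
    }
}
```

A reference like this is typically useful for checking GPU results on small inputs before timing larger ones.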
simpleCUBLAS_LU
CUDA sample demonstrating cuBLAS API cublasDgetrfBatched() for lower-upper (LU) decomposition of a matrix.
simpleCUBLASXT
Example of using the CUBLAS-XT library, which performs GEMM operations over multiple GPUs.
simpleCUFFT
Example of using CUFFT. In this example, CUFFT is used to compute the 1D convolution of a signal with a filter by transforming both into the frequency domain, multiplying them together, and transforming the result back to the time domain. cuFFT plans are created using simple and advanced API functions.
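The transform, multiply, inverse-transform pipeline can be sketched on the CPU in plain C++. This is an illustration only: it uses a naive O(n^2) DFT instead of cuFFT, the names `dft` and `convolve` are hypothetical, and zero-padding both inputs to `signal.size() + filter.size() - 1` is an assumption made here so that the result is a linear rather than circular convolution. Like cuFFT, the inverse transform below is unnormalized, so the result is scaled by 1/n at the end.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Naive unnormalized DFT. sign = -1 is the forward transform, +1 the inverse.
std::vector<Complex> dft(const std::vector<Complex>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = sign * 2.0 * pi *
                                 static_cast<double>((j * k) % n) /
                                 static_cast<double>(n);
            out[k] += in[j] * Complex(std::cos(angle), std::sin(angle));
        }
    }
    return out;
}

// Linear 1D convolution computed in the frequency domain.
std::vector<double> convolve(const std::vector<double>& signal,
                             const std::vector<double>& filter) {
    assert(!signal.empty() && !filter.empty());
    const std::size_t n = signal.size() + filter.size() - 1;

    // Zero-pad both inputs to the output length.
    std::vector<Complex> a(n), b(n);
    for (std::size_t i = 0; i < signal.size(); ++i) a[i] = signal[i];
    for (std::size_t i = 0; i < filter.size(); ++i) b[i] = filter[i];

    // Forward transforms, pointwise product, inverse transform.
    std::vector<Complex> fa = dft(a, -1);
    const std::vector<Complex> fb = dft(b, -1);
    for (std::size_t i = 0; i < n; ++i) fa[i] *= fb[i];
    const std::vector<Complex> c = dft(fa, +1);

    // Scale by 1/n because the inverse transform is unnormalized.
    std::vector<double> result(n);
    for (std::size_t i = 0; i < n; ++i) result[i] = c[i].real() / static_cast<double>(n);
    return result;
}
```

Convolving `{1, 2, 3}` with `{0, 1, 0.5}` this way gives `{0, 1, 2.5, 4, 1.5}` up to rounding error, matching the direct sum.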
simpleCUFFT_2d_MGPU
Example of using CUFFT. In this example, CUFFT is used to compute the 2D convolution of a signal with a filter by transforming both into the frequency domain, multiplying them together, and transforming the result back to the time domain, on multiple GPUs.
simpleCUFFT_callback
Example of using CUFFT. In this example, CUFFT is used to compute the 1D convolution of a signal with a filter by transforming both into the frequency domain, multiplying them together, and transforming the result back to the time domain. Unlike the simpleCUFFT example, the multiplication step is done by the CUFFT kernel with a user-supplied CUFFT callback routine, rather than by a separate kernel call.
simpleCUFFT_MGPU
Example of using CUFFT. In this example, CUFFT is used to compute the 1D convolution of a signal with a filter by transforming both into the frequency domain, multiplying them together, and transforming the result back to the time domain, on multiple GPUs.
watershedSegmentationNPP
An NPP CUDA Sample that demonstrates how to use the NPP watershed segmentation function.