Mirror of https://github.com/NVIDIA/cuda-samples.git (synced 2025-08-23 23:35:31 +08:00)

Merge branch 'master' into cuda_a_dev

Commit 278f4adbd2
@@ -2,6 +2,12 @@
### CUDA 12.9

* Updated toolchain for cross-compilation for Tegra Linux platforms.
* Repository has been updated with consistent code formatting across all samples
* Many small code tweaks and bug fixes (see commit history for details)
* Removed the following outdated samples:
  * `1_Utilities`
    * `bandwidthTest` - this sample was out of date and did not produce accurate results. For bandwidth testing of NVIDIA GPU platforms, please refer to [NVBandwidth](https://github.com/NVIDIA/nvbandwidth)

### CUDA 12.8

* Updated build system across the repository to CMake. Removed Visual Studio project files and Makefiles.
CONTRIBUTING.md (new file, 103 lines)

@@ -0,0 +1,103 @@
# Contributing to the CUDA Samples

Thank you for your interest in contributing to the CUDA Samples!

## Getting Started

1. **Fork & Clone the Repository**:

   Fork the repository and clone the fork. For more information, check [GitHub's documentation on forking](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) and [cloning a repository](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository).

## Making Changes

1. **Create a New Branch**:

   ```bash
   git checkout -b your-feature-branch
   ```

2. **Make Changes**.

3. **Build and Test**:

   Ensure changes don't break existing functionality by building and running tests.

   For more details on building and testing, refer to the [Building and Testing](#building-and-testing) section below.

4. **Commit Changes**:

   ```bash
   git commit -m "Brief description of the change"
   ```

## Building and Testing

For information on building and running tests on the samples, please refer to the main [README](README.md).

## Creating a Pull Request

1. Push changes to your fork (see the sketch after this list).
2. Create a pull request targeting the `master` branch of the original CUDA Samples repository. Refer to [GitHub's documentation](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) for more information on creating a pull request.
3. Describe the purpose and context of the changes in the pull request description.
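
A minimal sketch of steps 1 and 2 from the command line, using the branch name from the earlier example; the second command assumes you have the optional [GitHub CLI](https://cli.github.com/) installed and is not required:

```bash
# Push the feature branch created earlier to your fork
git push -u origin your-feature-branch

# Optionally open the pull request from the terminal instead of the web UI
# (requires the GitHub CLI; it will prompt for title and description)
gh pr create --base master
```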

## Code Formatting (pre-commit hooks)

The CUDA Samples repository uses [pre-commit](https://pre-commit.com/) to execute all code linters and formatters. These tools ensure a consistent coding style throughout the project. Using pre-commit ensures that linter versions and options are aligned for all developers. Additionally, there is a CI check in place to enforce that committed code follows our standards.

The linters used by the CUDA Samples are listed in `.pre-commit-config.yaml`. For example, C++ and CUDA code is formatted with [`clang-format`](https://clang.llvm.org/docs/ClangFormat.html).

To use `pre-commit`, install it via `conda` or `pip`:

```bash
conda config --add channels conda-forge
conda install pre-commit
```

```bash
pip install pre-commit
```

Then run the pre-commit hooks before committing code:

```bash
pre-commit run
```

By default, pre-commit runs on staged files (only changes and additions that will be committed). To run pre-commit checks on all files, execute:

```bash
pre-commit run --all-files
```

Optionally, you may set up the pre-commit hooks to run automatically when you make a git commit. This can be done by running:

```bash
pre-commit install
```

Now code linters and formatters will be run each time you commit changes.

You can skip these checks with `git commit --no-verify` or with the short version `git commit -n`, although please note that this may result in pull requests being rejected if subsequent checks fail.

## Review Process

Once submitted, maintainers will be automatically assigned to review the pull request. They might suggest changes or improvements. Constructive feedback is a part of the collaborative process, aimed at ensuring the highest quality code.

For constructive feedback and effective communication during reviews, we recommend following [Conventional Comments](https://conventionalcomments.org/).

Further recommended reading for successful PR reviews:

- [How to Do Code Reviews Like a Human (Part One)](https://mtlynch.io/human-code-reviews-1/)
- [How to Do Code Reviews Like a Human (Part Two)](https://mtlynch.io/human-code-reviews-2/)

## Thank You

Your contributions enhance the CUDA Samples for the entire community. We appreciate your effort and collaboration!
README.md (16 changed lines)

@@ -149,11 +149,13 @@ This Python3 script finds all executables in a subdirectory you choose, matching
the following command line arguments:

| Switch     | Purpose                                                                                                          | Example                 |
| ---------- | ---------------------------------------------------------------------------------------------------------------- | ----------------------- |
| --dir      | Specify the root directory to search for executables (recursively)                                               | --dir ./build/Samples   |
| --config   | JSON configuration file for executable arguments                                                                  | --config test_args.json |
| --output   | Output directory for test results (stdout saved to .txt files - directory will be created if it doesn't exist)   | --output ./test         |
| --args     | Global arguments to pass to all executables (not currently used)                                                  | --args arg_1 arg_2 ...  |
| --parallel | Number of applications to execute in parallel.                                                                    | --parallel 8            |
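
For example, a typical invocation combining the switches above (using the values from the Example column) might look like:

```bash
python3 run_tests.py --dir ./build/Samples --config test_args.json --output ./test --parallel 8
```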
Application configurations are loaded from `test_args.json` and matched against executable names (discarding the `.exe` extension on Windows).
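
As a rough sketch of the configuration format, inferred from the keys the script reads (`args` for a single argument list, `runs` for multiple argument sets, and `skip` to exclude a test or an individual run); the sample names below appear elsewhere in this commit, but the arguments shown are hypothetical:

```json
{
    "deviceQuery": { "args": [] },
    "bicubicTexture": {
        "runs": [
            { "args": ["-mode=0"] },
            { "args": ["-mode=1"], "skip": true }
        ]
    },
    "postProcessGL": { "skip": true }
}
```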
@@ -281,18 +283,18 @@ and system configuration):

```
Test Summary:
-Ran 181 tests
-All tests passed!
+Ran 199 test runs for 180 executables.
+All test runs passed!
```

If some samples fail, you will see something like this:

```
Test Summary:
-Ran 181 tests
-Failed tests (2):
-  volumeFiltering: returned 1
-  postProcessGL: returned 1
+Ran 199 test runs for 180 executables.
+Failed runs (2):
+  bicubicTexture (run 1/5): Failed (code 1)
+  Mandelbrot (run 1/2): Failed (code 1)
```

You can inspect the stdout logs in the output directory (generally `APM_<application_name>.txt` or `APM_<application_name>.run<n>.txt`) to help diagnose failures.
@@ -99,8 +99,21 @@ static void childProcess(int id)
    std::vector<void *> ptrs;
    std::vector<cudaEvent_t> events;
    std::vector<char> verification_buffer(DATA_SIZE);
    char pidString[20] = {0};
    char lshmName[40] = {0};

-   if (sharedMemoryOpen(shmName, sizeof(shmStruct), &info) != 0) {
    // Use parent process ID to create a unique shared memory name for Linux multi-process
#ifdef __linux__
    pid_t pid;
    pid = getppid();
    snprintf(pidString, sizeof(pidString), "%d", pid);
#endif
    strcat(lshmName, shmName);
    strcat(lshmName, pidString);

    printf("CP: lshmName = %s\n", lshmName);

+   if (sharedMemoryOpen(lshmName, sizeof(shmStruct), &info) != 0) {
        printf("Failed to create shared memory slab\n");
        exit(EXIT_FAILURE);
    }
@@ -195,10 +208,23 @@ static void parentProcess(char *app)
    std::vector<void *> ptrs;
    std::vector<cudaEvent_t> events;
    std::vector<Process> processes;
    char pidString[20] = {0};
    char lshmName[40] = {0};

    // Use current process ID to create a unique shared memory name for Linux multi-process
#ifdef __linux__
    pid_t pid;
    pid = getpid();
    snprintf(pidString, sizeof(pidString), "%d", pid);
#endif
    strcat(lshmName, shmName);
    strcat(lshmName, pidString);

    printf("PP: lshmName = %s\n", lshmName);

    checkCudaErrors(cudaGetDeviceCount(&devCount));

-   if (sharedMemoryCreate(shmName, sizeof(*shm), &info) != 0) {
+   if (sharedMemoryCreate(lshmName, sizeof(*shm), &info) != 0) {
        printf("Failed to create shared memory slab\n");
        exit(EXIT_FAILURE);
    }
@@ -1,4 +1,3 @@
-add_subdirectory(bandwidthTest)
add_subdirectory(deviceQuery)
add_subdirectory(deviceQueryDrv)
add_subdirectory(topologyQuery)
@@ -1,18 +0,0 @@
{
    "configurations": [
        {
            "name": "Linux",
            "includePath": [
                "${workspaceFolder}/**",
                "${workspaceFolder}/../../../Common"
            ],
            "defines": [],
            "compilerPath": "/usr/local/cuda/bin/nvcc",
            "cStandard": "gnu17",
            "cppStandard": "gnu++14",
            "intelliSenseMode": "linux-gcc-x64",
            "configurationProvider": "ms-vscode.makefile-tools"
        }
    ],
    "version": 4
}
@@ -1,7 +0,0 @@
{
    "recommendations": [
        "nvidia.nsight-vscode-edition",
        "ms-vscode.cpptools",
        "ms-vscode.makefile-tools"
    ]
}
@@ -1,10 +0,0 @@
{
    "configurations": [
        {
            "name": "CUDA C++: Launch",
            "type": "cuda-gdb",
            "request": "launch",
            "program": "${workspaceFolder}/bandwidthTest"
        }
    ]
}
@@ -1,15 +0,0 @@
{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "sample",
            "type": "shell",
            "command": "make dbg=1",
            "problemMatcher": ["$nvcc"],
            "group": {
                "kind": "build",
                "isDefault": true
            }
        }
    ]
}
@@ -1,30 +0,0 @@
cmake_minimum_required(VERSION 3.20)

list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/../../../cmake/Modules")

project(bandwidthTest LANGUAGES C CXX CUDA)

find_package(CUDAToolkit REQUIRED)

set(CMAKE_POSITION_INDEPENDENT_CODE ON)

set(CMAKE_CUDA_ARCHITECTURES 75 80 86 87 89 90 100 101 120)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Wno-deprecated-gpu-targets")
if(ENABLE_CUDA_DEBUG)
    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -G")         # enable cuda-gdb (may significantly affect performance on some targets)
else()
    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -lineinfo")  # add line information to all builds for debug tools (exclusive to -G option)
endif()

# Include directories and libraries
include_directories(../../../Common)

# Source file
# Add target for bandwidthTest
add_executable(bandwidthTest bandwidthTest.cu)

target_compile_options(bandwidthTest PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:--extended-lambda>)

target_compile_features(bandwidthTest PRIVATE cxx_std_17 cuda_std_17)

set_target_properties(bandwidthTest PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
@@ -1,32 +0,0 @@
# bandwidthTest - Bandwidth Test

## Description

This is a simple test program to measure the memcopy bandwidth of the GPU and memcpy bandwidth across PCI-e. This test application is capable of measuring device to device copy bandwidth, host to device copy bandwidth for pageable and page-locked memory, and device to host copy bandwidth for pageable and page-locked memory.

## Key Concepts

CUDA Streams and Events, Performance Strategies

## Supported SM Architectures

[SM 5.0](https://developer.nvidia.com/cuda-gpus) [SM 5.2](https://developer.nvidia.com/cuda-gpus) [SM 5.3](https://developer.nvidia.com/cuda-gpus) [SM 6.0](https://developer.nvidia.com/cuda-gpus) [SM 6.1](https://developer.nvidia.com/cuda-gpus) [SM 7.0](https://developer.nvidia.com/cuda-gpus) [SM 7.2](https://developer.nvidia.com/cuda-gpus) [SM 7.5](https://developer.nvidia.com/cuda-gpus) [SM 8.0](https://developer.nvidia.com/cuda-gpus) [SM 8.6](https://developer.nvidia.com/cuda-gpus) [SM 8.7](https://developer.nvidia.com/cuda-gpus) [SM 8.9](https://developer.nvidia.com/cuda-gpus) [SM 9.0](https://developer.nvidia.com/cuda-gpus)

## Supported OSes

Linux, Windows

## Supported CPU Architecture

x86_64, armv7l

## CUDA APIs involved

### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaHostAlloc, cudaMemcpy, cudaMalloc, cudaMemcpyAsync, cudaFree, cudaGetErrorString, cudaMallocHost, cudaSetDevice, cudaGetDeviceProperties, cudaDeviceSynchronize, cudaEventRecord, cudaFreeHost, cudaEventDestroy, cudaEventElapsedTime, cudaGetDeviceCount, cudaEventCreate

## Prerequisites

Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.

## References (for more details)
(File diff suppressed because it is too large)
@@ -102,13 +102,23 @@ static void childProcess(int id)
    int threads = 128;
    cudaDeviceProp prop;
    std::vector<void *> ptrs;
    pid_t pid;
    char pidString[20] = {0};
    char lshmName[40] = {0};

    std::vector<char> verification_buffer(DATA_SIZE);

    pid = getppid();
    snprintf(pidString, sizeof(pidString), "%d", pid);
    strcat(lshmName, shmName);
    strcat(lshmName, pidString);

    printf("CP: lshmName = %s\n", lshmName);

    ipcHandle *ipcChildHandle = NULL;
    checkIpcErrors(ipcOpenSocket(ipcChildHandle));

-   if (sharedMemoryOpen(shmName, sizeof(shmStruct), &info) != 0) {
+   if (sharedMemoryOpen(lshmName, sizeof(shmStruct), &info) != 0) {
        printf("Failed to create shared memory slab\n");
        exit(EXIT_FAILURE);
    }
@@ -245,6 +255,16 @@ static void parentProcess(char *app)
    std::vector<void *> ptrs;
    std::vector<Process> processes;
    cudaMemAllocationHandleType handleType = cudaMemHandleTypeNone;
    pid_t pid;
    char pidString[20] = {0};
    char lshmName[40] = {0};

    pid = getpid();
    snprintf(pidString, sizeof(pidString), "%d", pid);
    strcat(lshmName, shmName);
    strcat(lshmName, pidString);

    printf("PP: lshmName = %s\n", lshmName);

    checkCudaErrors(cudaGetDeviceCount(&devCount));
    std::vector<CUdevice> devices(devCount);
@@ -252,7 +272,7 @@ static void parentProcess(char *app)
        cuDeviceGet(&devices[i], i);
    }

-   if (sharedMemoryCreate(shmName, sizeof(*shm), &info) != 0) {
+   if (sharedMemoryCreate(lshmName, sizeof(*shm), &info) != 0) {
        printf("Failed to create shared memory slab\n");
        exit(EXIT_FAILURE);
    }
@@ -310,10 +310,24 @@ static void childProcess(int devId, int id, char **argv)
    ipcHandle *ipcChildHandle = NULL;
    int blocks = 0;
    int threads = 128;
    char pidString[20] = {0};
    char lshmName[40] = {0};


    // Use parent process ID to create a unique shared memory name for Linux multi-process
#ifdef __linux__
    pid_t pid;
    pid = getppid();
    snprintf(pidString, sizeof(pidString), "%d", pid);
#endif
    strcat(lshmName, shmName);
    strcat(lshmName, pidString);

    printf("CP: lshmName = %s\n", lshmName);

    checkIpcErrors(ipcOpenSocket(ipcChildHandle));

-   if (sharedMemoryOpen(shmName, sizeof(shmStruct), &info) != 0) {
+   if (sharedMemoryOpen(lshmName, sizeof(shmStruct), &info) != 0) {
        printf("Failed to create shared memory slab\n");
        exit(EXIT_FAILURE);
    }
@@ -421,11 +435,24 @@ static void parentProcess(char *app)
    volatile shmStruct *shm = NULL;
    sharedMemoryInfo info;
    std::vector<Process> processes;
    char pidString[20] = {0};
    char lshmName[40] = {0};

    // Use current process ID to create a unique shared memory name for Linux multi-process
#ifdef __linux__
    pid_t pid;
    pid = getpid();
    snprintf(pidString, sizeof(pidString), "%d", pid);
#endif
    strcat(lshmName, shmName);
    strcat(lshmName, pidString);

    printf("PP: lshmName = %s\n", lshmName);

    checkCudaErrors(cuDeviceGetCount(&devCount));
    std::vector<CUdevice> devices(devCount);

-   if (sharedMemoryCreate(shmName, sizeof(*shm), &info) != 0) {
+   if (sharedMemoryCreate(lshmName, sizeof(*shm), &info) != 0) {
        printf("Failed to create shared memory slab\n");
        exit(EXIT_FAILURE);
    }
@@ -25,8 +25,8 @@
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

-#include <cuda_runtime.h>
#include <cuda.h>
+#include <cuda_runtime.h>
#include <helper_cuda.h>
#include <helper_image.h>
#include <vector>
@@ -45,13 +45,15 @@
#include <windows.h>
#endif

// includes for OpenGL
#include <helper_gl.h>

// includes
#include <cuda_gl_interop.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include <helper_cuda.h>
#include <helper_functions.h>
#include <helper_gl.h>
#include <math.h>
#include <math_constants.h>
#include <stdio.h>
@@ -86,12 +86,14 @@
#include <windows.h>
#endif

// includes for OpenGL
#include <helper_gl.h>

// includes
#include <cuda_gl_interop.h>
#include <cuda_runtime.h>
#include <helper_cuda.h> // includes cuda.h and cuda_runtime_api.h
#include <helper_functions.h>
#include <helper_gl.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
@@ -28,11 +28,15 @@
#include "render_particles.h"

#define HELPERGL_EXTERN_GL_FUNC_IMPLEMENTATION

// includes for OpenGL
#include <helper_gl.h>

// includes
#include <assert.h>
#include <cuda_gl_interop.h>
#include <cuda_runtime.h>
#include <helper_cuda.h>
#include <helper_gl.h>
#include <math.h>

#define GL_POINT_SPRITE_ARB 0x8861
@@ -31,9 +31,12 @@

#pragma warning(disable : 4312)

#include <mmsystem.h>
// includes for Windows
#include <windows.h>

// includes for multimedia
#include <mmsystem.h>

// This header includes all the necessary D3D11 and CUDA includes
#include <cuda_d3d11_interop.h>
#include <cuda_runtime_api.h>
@@ -31,9 +31,12 @@

#pragma warning(disable : 4312)

#include <mmsystem.h>
// includes for Windows
#include <windows.h>

// includes for multimedia
#include <mmsystem.h>

// This header includes all the necessary D3D11 and CUDA includes
#include <cuda_d3d11_interop.h>
#include <cuda_runtime_api.h>
@@ -33,11 +33,15 @@
#include <memory.h>

#define HELPERGL_EXTERN_GL_FUNC_IMPLEMENTATION

// includes for OpenGL
#include <helper_gl.h>

// includes
#include <cuda_gl_interop.h>
#include <cuda_runtime.h>
#include <helper_cuda.h>
#include <helper_functions.h>
#include <helper_gl.h>

#include "ParticleSystem.cuh"
#include "ParticleSystem.h"
@@ -29,11 +29,15 @@
   This file contains simple wrapper functions that call the CUDA kernels
*/
#define HELPERGL_EXTERN_GL_FUNC_IMPLEMENTATION

// includes for OpenGL
#include <helper_gl.h>

// includes
#include <cstdio>
#include <cstdlib>
#include <cuda_gl_interop.h>
#include <helper_cuda.h>
#include <helper_gl.h>
#include <string.h>

#include "ParticleSystem.cuh"
@@ -1,11 +1,33 @@
# This layer of CMakeLists.txt adds folders, for better organization in Visual Studio
# and other IDEs that support this feature.

set_property(GLOBAL PROPERTY USE_FOLDERS ON)

set(CMAKE_FOLDER "0_Introduction")
add_subdirectory(0_Introduction)

set(CMAKE_FOLDER "1_Utilities")
add_subdirectory(1_Utilities)

set(CMAKE_FOLDER "2_Concepts_and_Techniques")
add_subdirectory(2_Concepts_and_Techniques)

set(CMAKE_FOLDER "3_CUDA_Features")
add_subdirectory(3_CUDA_Features)

set(CMAKE_FOLDER "4_CUDA_Libraries")
add_subdirectory(4_CUDA_Libraries)

set(CMAKE_FOLDER "5_Domain_Specific")
add_subdirectory(5_Domain_Specific)

set(CMAKE_FOLDER "6_Performance")
add_subdirectory(6_Performance)

set(CMAKE_FOLDER "7_libNVVM")
add_subdirectory(7_libNVVM)

if(BUILD_TEGRA)
    set(CMAKE_FOLDER "8_Platform_Specific/Tegra")
    add_subdirectory(8_Platform_Specific/Tegra)
endif()
run_tests.py (200 changed lines)

@@ -33,6 +33,15 @@ import json
import subprocess
import argparse
from pathlib import Path
+import concurrent.futures
+import threading
+
+print_lock = threading.Lock()
+
+def safe_print(*args, **kwargs):
+    """Thread-safe print function"""
+    with print_lock:
+        print(*args, **kwargs)

def normalize_exe_name(name):
    """Normalize executable name across platforms by removing .exe if present"""
@@ -78,96 +87,49 @@ def find_executables(root_dir):

    return executables

-def run_test(executable, output_dir, args_config, global_args=None):
-    """Run a single test and capture output"""
+def run_single_test_instance(executable, args, output_file, global_args, run_description):
+    """Run a single instance of a test executable with specific arguments."""
    exe_path = str(executable)
    exe_name = executable.name
-    base_name = normalize_exe_name(exe_name)
-
-    # Check if this executable should be skipped
-    if base_name in args_config and args_config[base_name].get("skip", False):
-        print(f"Skipping {exe_name} (marked as skip in config)")
-        return 0
-
-    # Get argument sets for this executable
-    arg_sets = []
-    if base_name in args_config:
-        config = args_config[base_name]
-        if "args" in config:
-            # Single argument set (backwards compatibility)
-            if isinstance(config["args"], list):
-                arg_sets.append(config["args"])
-            else:
-                print(f"Warning: Arguments for {base_name} must be a list")
-        elif "runs" in config:
-            # Multiple argument sets
-            for run in config["runs"]:
-                if isinstance(run.get("args", []), list):
-                    arg_sets.append(run.get("args", []))
-                else:
-                    print(f"Warning: Arguments for {base_name} run must be a list")
-
-    # If no specific args defined, run once with no args
-    if not arg_sets:
-        arg_sets.append([])
-
-    # Run for each argument set
-    failed = False
-    run_number = 1
-    for args in arg_sets:
-        # Create output file name with run number if multiple runs
-        if len(arg_sets) > 1:
-            output_file = os.path.abspath(f"{output_dir}/APM_{exe_name}.run{run_number}.txt")
-            print(f"Running {exe_name} (run {run_number}/{len(arg_sets)})")
-        else:
-            output_file = os.path.abspath(f"{output_dir}/APM_{exe_name}.txt")
-            print(f"Running {exe_name}")
+    safe_print(f"Starting {exe_name} {run_description}")

    try:
        # Prepare command with arguments
        cmd = [f"./{exe_name}"]
        cmd.extend(args)

        # Add global arguments if provided
        if global_args:
            cmd.extend(global_args)

-        print(f"  Command: {' '.join(cmd)}")
+        safe_print(f"  Command ({exe_name} {run_description}): {' '.join(cmd)}")

-        # Store current directory
-        original_dir = os.getcwd()
-
-        try:
-            # Change to executable's directory
-            os.chdir(os.path.dirname(exe_path))
-
-            # Run the executable and capture output
+        # Run the executable in its own directory using cwd
        with open(output_file, 'w') as f:
            result = subprocess.run(
                cmd,
                stdout=f,
                stderr=subprocess.STDOUT,
-                timeout=300  # 5 minute timeout
+                timeout=300,  # 5 minute timeout
+                cwd=os.path.dirname(exe_path)  # Execute in the executable's directory
            )

-            if result.returncode != 0:
-                failed = True
-                print(f"  Test completed with return code {result.returncode}")
-
-        finally:
-            # Always restore original directory
-            os.chdir(original_dir)
+        status = "Passed" if result.returncode == 0 else "Failed"
+        safe_print(f"  Finished {exe_name} {run_description}: {status} (code {result.returncode})")
+        return {"name": exe_name, "description": run_description, "return_code": result.returncode, "status": status}

    except subprocess.TimeoutExpired:
-        print(f"Error: {exe_name} timed out after 5 minutes")
-        failed = True
+        safe_print(f"Error ({exe_name} {run_description}): Timed out after 5 minutes")
+        return {"name": exe_name, "description": run_description, "return_code": -1, "status": "Timeout"}
    except Exception as e:
-        print(f"Error running {exe_name}: {str(e)}")
-        failed = True
+        safe_print(f"Error running {exe_name} {run_description}: {str(e)}")
+        return {"name": exe_name, "description": run_description, "return_code": -1, "status": f"Error: {str(e)}"}

-        run_number += 1
-
-    return 1 if failed else 0
+def run_test(executable, output_dir, args_config, global_args=None):
+    """Deprecated: This function is replaced by the parallel execution logic in main."""
+    # This function is no longer called directly by the main logic.
+    # It remains here temporarily in case it's needed for reference or single-threaded debugging.
+    # The core logic is now in run_single_test_instance and managed by ThreadPoolExecutor.
+    print("Warning: run_test function called directly - this indicates an issue in the refactoring.")
+    return 1  # Indicate failure if called

def main():
    parser = argparse.ArgumentParser(description='Run all executables and capture output')
@@ -175,6 +137,7 @@ def main():
    parser.add_argument('--config', help='JSON configuration file for executable arguments')
    parser.add_argument('--output', default='.',  # Default to current directory
                        help='Output directory for test results')
+    parser.add_argument('--parallel', type=int, default=1, help='Number of parallel tests to run')
    parser.add_argument('--args', nargs=argparse.REMAINDER,
                        help='Global arguments to pass to all executables')
    args = parser.parse_args()
@@ -192,23 +155,104 @@ def main():
        return 1

    print(f"Found {len(executables)} executables")
+    print(f"Running tests with up to {args.parallel} parallel tasks.")

+    tasks = []
+    for exe in executables:
+        exe_name = exe.name
+        base_name = normalize_exe_name(exe_name)
+
+        # Check if this executable should be skipped globally
+        if base_name in args_config and args_config[base_name].get("skip", False):
+            safe_print(f"Skipping {exe_name} (marked as skip in config)")
+            continue
+
+        arg_sets_configs = []
+        if base_name in args_config:
+            config = args_config[base_name]
+            if "args" in config:
+                if isinstance(config["args"], list):
+                    arg_sets_configs.append({"args": config["args"]})  # Wrap in dict for consistency
+                else:
+                    safe_print(f"Warning: Arguments for {base_name} must be a list")
+            elif "runs" in config:
+                for i, run_config in enumerate(config["runs"]):
+                    if run_config.get("skip", False):
+                        safe_print(f"Skipping run {i+1} for {exe_name} (marked as skip in config)")
+                        continue
+                    if isinstance(run_config.get("args", []), list):
+                        arg_sets_configs.append(run_config)
+                    else:
+                        safe_print(f"Warning: Arguments for {base_name} run {i+1} must be a list")
+
+        # If no specific args defined, create one run with no args
+        if not arg_sets_configs:
+            arg_sets_configs.append({"args": []})
+
+        # Create tasks for each run configuration
+        num_runs = len(arg_sets_configs)
+        for i, run_config in enumerate(arg_sets_configs):
+            current_args = run_config.get("args", [])
+            run_desc = f"(run {i+1}/{num_runs})" if num_runs > 1 else ""
+
+            # Create output file name
+            if num_runs > 1:
+                output_file = os.path.abspath(f"{args.output}/APM_{exe_name}.run{i+1}.txt")
+            else:
+                output_file = os.path.abspath(f"{args.output}/APM_{exe_name}.txt")
+
+            tasks.append({
+                "executable": exe,
+                "args": current_args,
+                "output_file": output_file,
+                "global_args": args.args,
+                "description": run_desc
+            })

    failed = []
-    for exe in executables:
-        ret_code = run_test(exe, args.output, args_config, args.args)
-        if ret_code != 0:
-            failed.append((exe.name, ret_code))
+    total_runs = len(tasks)
+    completed_runs = 0
+
+    with concurrent.futures.ThreadPoolExecutor(max_workers=args.parallel) as executor:
+        future_to_task = {
+            executor.submit(run_single_test_instance,
+                            task["executable"],
+                            task["args"],
+                            task["output_file"],
+                            task["global_args"],
+                            task["description"]): task
+            for task in tasks
+        }
+
+        for future in concurrent.futures.as_completed(future_to_task):
+            task_info = future_to_task[future]
+            completed_runs += 1
+            safe_print(f"Progress: {completed_runs}/{total_runs} runs completed.")
+            try:
+                result = future.result()
+                if result["return_code"] != 0:
+                    failed.append(result)
+            except Exception as exc:
+                safe_print(f'Task {task_info["executable"].name} {task_info["description"]} generated an exception: {exc}')
+                failed.append({
+                    "name": task_info["executable"].name,
+                    "description": task_info["description"],
+                    "return_code": -1,
+                    "status": f"Execution Exception: {exc}"
+                })

    # Print summary
    print("\nTest Summary:")
-    print(f"Ran {len(executables)} tests")
+    print(f"Ran {total_runs} test runs for {len(executables)} executables.")
    if failed:
-        print(f"Failed tests ({len(failed)}):")
-        for name, code in failed:
-            print(f"  {name}: returned {code}")
-        return failed[0][1]  # Return first failure code
+        print(f"Failed runs ({len(failed)}):")
+        for fail in failed:
+            print(f"  {fail['name']} {fail['description']}: {fail['status']} (code {fail['return_code']})")
+        # Return the return code of the first failure, or 1 if only exceptions occurred
+        first_failure_code = next((f["return_code"] for f in failed if f["return_code"] != -1), 1)
+        return first_failure_code
    else:
-        print("All tests passed!")
+        print("All test runs passed!")
        return 0

if __name__ == '__main__':