Merge branch 'master' into cuda_a_dev

Rob Armstrong 2025-04-09 08:33:37 -07:00
commit 278f4adbd2
24 changed files with 389 additions and 1279 deletions


@@ -2,6 +2,12 @@
 ### CUDA 12.9
 * Updated toolchain for cross-compilation for Tegra Linux platforms.
+* Repository has been updated with consistent code formatting across all samples
+* Many small code tweaks and bug fixes (see commit history for details)
+* Removed the following outdated samples:
+  * `1_Utilities`
+    * `bandwidthTest` - this sample was out of date and did not produce accurate results. For bandwidth
+      testing of NVIDIA GPU platforms, please refer to [NVBandwidth](https://github.com/NVIDIA/nvbandwidth)
 ### CUDA 12.8
 * Updated build system across the repository to CMake. Removed Visual Studio project files and Makefiles.

CONTRIBUTING.md (new file, 103 lines)

@@ -0,0 +1,103 @@
# Contributing to the CUDA Samples
Thank you for your interest in contributing to the CUDA Samples!
## Getting Started
1. **Fork & Clone the Repository**:
Fork the repository and clone your fork. For more information, check [GitHub's documentation on forking](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) and [cloning a repository](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository).
## Making Changes
1. **Create a New Branch**:
```bash
git checkout -b your-feature-branch
```
2. **Make Changes**.
3. **Build and Test**:
Ensure changes don't break existing functionality by building and running tests.
For more details on building and testing, refer to the [Building and Testing](#building-and-testing) section below.
4. **Commit Changes**:
```bash
git commit -m "Brief description of the change"
```
## Building and Testing
For information on building and running tests on the samples, please refer to the main [README](README.md).
## Creating a Pull Request
1. Push changes to your fork, as shown in the example below.
2. Create a pull request targeting the `master` branch of the original CUDA Samples repository. Refer to [GitHub's documentation](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests) for more information on creating a pull request.
3. Describe the purpose and context of the changes in the pull request description.
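For example, pushing the feature branch created earlier to your fork:
```bash
git push -u origin your-feature-branch
```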
## Code Formatting (pre-commit hooks)
The CUDA Samples repository uses [pre-commit](https://pre-commit.com/) to execute all code linters and formatters. These
tools ensure a consistent coding style throughout the project. Using pre-commit ensures that linter
versions and options are aligned for all developers. Additionally, there is a CI check in place to
enforce that committed code follows our standards.
The linters used by the CUDA Samples are listed in `.pre-commit-config.yaml`.
For example, C++ and CUDA code is formatted with [`clang-format`](https://clang.llvm.org/docs/ClangFormat.html).
To use `pre-commit`, install it via `conda`:
```bash
conda config --add channels conda-forge
conda install pre-commit
```
or via `pip`:
```bash
pip install pre-commit
```
Then run pre-commit hooks before committing code:
```bash
pre-commit run
```
By default, pre-commit runs on staged files (only changes and additions that will be committed).
To run pre-commit checks on all files, execute:
```bash
pre-commit run --all-files
```
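You can also run a single hook by its id. For instance, assuming the hook id for C++/CUDA formatting matches the tool name `clang-format` in `.pre-commit-config.yaml`:
```bash
pre-commit run clang-format --all-files
```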
Optionally, you may set up the pre-commit hooks to run automatically when you make a git commit. This can be done by running:
```bash
pre-commit install
```
Now code linters and formatters will be run each time you commit changes.
You can skip these checks with `git commit --no-verify` or with the short version `git commit -n`, although please note
that this may result in pull requests being rejected if subsequent checks fail.
## Review Process
Once submitted, maintainers will be automatically assigned to review the pull request. They might suggest changes or improvements. Constructive feedback is a part of the collaborative process, aimed at ensuring the highest quality code.
For constructive feedback and effective communication during reviews, we recommend following [Conventional Comments](https://conventionalcomments.org/).
Further recommended reading for successful PR reviews:
- [How to Do Code Reviews Like a Human (Part One)](https://mtlynch.io/human-code-reviews-1/)
- [How to Do Code Reviews Like a Human (Part Two)](https://mtlynch.io/human-code-reviews-2/)
## Thank You
Your contributions enhance the CUDA Samples for the entire community. We appreciate your effort and collaboration!


@@ -149,11 +149,13 @@ This Python3 script finds all executables in a subdirectory you choose, matching
 the following command line arguments:

 | Switch     | Purpose                                                                                                         | Example                 |
-| -------- | -------------------------------------------------------------------------------------------------------------- | ----------------------- |
+| ---------- | --------------------------------------------------------------------------------------------------------------- | ----------------------- |
 | --dir      | Specify the root directory to search for executables (recursively)                                             | --dir ./build/Samples   |
 | --config   | JSON configuration file for executable arguments                                                                | --config test_args.json |
 | --output   | Output directory for test results (stdout saved to .txt files - directory will be created if it doesn't exist) | --output ./test         |
 | --args     | Global arguments to pass to all executables (not currently used)                                                | --args arg_1 arg_2 ...  |
+| --parallel | Number of applications to execute in parallel.                                                                  | --parallel 8            |

 Application configurations are loaded from `test_args.json` and matched against executable names (discarding the `.exe` extension on Windows).
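As an illustration of that configuration format, a minimal `test_args.json` might look like the following. The sample names and argument values here are hypothetical; the `skip`, `args`, and `runs` keys are the ones the script parses, as seen in the script changes later in this commit:
```json
{
    "deviceQuery": { "args": [] },
    "bicubicTexture": {
        "runs": [
            { "args": ["-mode=0"] },
            { "args": ["-mode=1"], "skip": true }
        ]
    },
    "postProcessGL": { "skip": true }
}
```
A typical invocation combining the switches above (assuming the script is saved as `run_tests.py`) would then be:
```bash
python3 run_tests.py --dir ./build/Samples --config test_args.json --output ./test --parallel 8
```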
@@ -281,18 +283,18 @@ and system configuration):
 ```
 Test Summary:
-Ran 181 tests
-All tests passed!
+Ran 199 test runs for 180 executables.
+All test runs passed!
 ```

 If some samples fail, you will see something like this:

 ```
 Test Summary:
-Ran 181 tests
-Failed tests (2):
-  volumeFiltering: returned 1
-  postProcessGL: returned 1
+Ran 199 test runs for 180 executables.
+Failed runs (2):
+  bicubicTexture (run 1/5): Failed (code 1)
+  Mandelbrot (run 1/2): Failed (code 1)
 ```

 You can inspect the stdout logs in the output directory (generally `APM_<application_name>.txt` or `APM_<application_name>.run<n>.txt`) to help


@@ -99,8 +99,21 @@ static void childProcess(int id)
     std::vector<void *> ptrs;
     std::vector<cudaEvent_t> events;
     std::vector<char> verification_buffer(DATA_SIZE);
+    char pidString[20] = {0};
+    char lshmName[40] = {0};
+
+    // Use parent process ID to create a unique shared memory name for Linux multi-process
+#ifdef __linux__
+    pid_t pid;
+    pid = getppid();
+    snprintf(pidString, sizeof(pidString), "%d", pid);
+#endif
+    strcat(lshmName, shmName);
+    strcat(lshmName, pidString);
+    printf("CP: lshmName = %s\n", lshmName);

-    if (sharedMemoryOpen(shmName, sizeof(shmStruct), &info) != 0) {
+    if (sharedMemoryOpen(lshmName, sizeof(shmStruct), &info) != 0) {
         printf("Failed to create shared memory slab\n");
         exit(EXIT_FAILURE);
     }

@@ -195,10 +208,23 @@ static void parentProcess(char *app)
     std::vector<void *> ptrs;
     std::vector<cudaEvent_t> events;
     std::vector<Process> processes;
+    char pidString[20] = {0};
+    char lshmName[40] = {0};
+
+    // Use current process ID to create a unique shared memory name for Linux multi-process
+#ifdef __linux__
+    pid_t pid;
+    pid = getpid();
+    snprintf(pidString, sizeof(pidString), "%d", pid);
+#endif
+    strcat(lshmName, shmName);
+    strcat(lshmName, pidString);
+    printf("PP: lshmName = %s\n", lshmName);

     checkCudaErrors(cudaGetDeviceCount(&devCount));

-    if (sharedMemoryCreate(shmName, sizeof(*shm), &info) != 0) {
+    if (sharedMemoryCreate(lshmName, sizeof(*shm), &info) != 0) {
         printf("Failed to create shared memory slab\n");
         exit(EXIT_FAILURE);
     }
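The same PID-suffix naming scheme recurs in the other IPC samples touched by this commit. As a minimal standalone sketch of the technique (POSIX-only; the base name below is hypothetical), note that the two `strcat` calls are equivalent to a single bounded `snprintf`:
```cpp
#include <cstdio>   // snprintf, printf
#include <unistd.h> // getpid

int main()
{
    const char *shmName      = "sampleShm"; // hypothetical base name
    char        lshmName[40] = {0};

    // Append the process ID to the base name; snprintf bounds the write,
    // so the 40-byte buffer cannot overflow.
    snprintf(lshmName, sizeof(lshmName), "%s%d", shmName, (int)getpid());
    printf("lshmName = %s\n", lshmName);
    return 0;
}
```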


@@ -1,4 +1,3 @@
-add_subdirectory(bandwidthTest)
 add_subdirectory(deviceQuery)
 add_subdirectory(deviceQueryDrv)
 add_subdirectory(topologyQuery)


@@ -1,18 +0,0 @@
{
"configurations": [
{
"name": "Linux",
"includePath": [
"${workspaceFolder}/**",
"${workspaceFolder}/../../../Common"
],
"defines": [],
"compilerPath": "/usr/local/cuda/bin/nvcc",
"cStandard": "gnu17",
"cppStandard": "gnu++14",
"intelliSenseMode": "linux-gcc-x64",
"configurationProvider": "ms-vscode.makefile-tools"
}
],
"version": 4
}


@@ -1,7 +0,0 @@
{
"recommendations": [
"nvidia.nsight-vscode-edition",
"ms-vscode.cpptools",
"ms-vscode.makefile-tools"
]
}


@@ -1,10 +0,0 @@
{
"configurations": [
{
"name": "CUDA C++: Launch",
"type": "cuda-gdb",
"request": "launch",
"program": "${workspaceFolder}/bandwidthTest"
}
]
}


@@ -1,15 +0,0 @@
{
"version": "2.0.0",
"tasks": [
{
"label": "sample",
"type": "shell",
"command": "make dbg=1",
"problemMatcher": ["$nvcc"],
"group": {
"kind": "build",
"isDefault": true
}
}
]
}


@@ -1,30 +0,0 @@
cmake_minimum_required(VERSION 3.20)
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/../../../cmake/Modules")
project(bandwidthTest LANGUAGES C CXX CUDA)
find_package(CUDAToolkit REQUIRED)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_CUDA_ARCHITECTURES 75 80 86 87 89 90 100 101 120)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Wno-deprecated-gpu-targets")
if(ENABLE_CUDA_DEBUG)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -G") # enable cuda-gdb (may significantly affect performance on some targets)
else()
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -lineinfo") # add line information to all builds for debug tools (exclusive to -G option)
endif()
# Include directories and libraries
include_directories(../../../Common)
# Source file
# Add target for bandwidthTest
add_executable(bandwidthTest bandwidthTest.cu)
target_compile_options(bandwidthTest PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:--extended-lambda>)
target_compile_features(bandwidthTest PRIVATE cxx_std_17 cuda_std_17)
set_target_properties(bandwidthTest PROPERTIES CUDA_SEPARABLE_COMPILATION ON)


@@ -1,32 +0,0 @@
# bandwidthTest - Bandwidth Test
## Description
This is a simple test program to measure the memcopy bandwidth of the GPU and memcpy bandwidth across PCI-e. This test application is capable of measuring device to device copy bandwidth, host to device copy bandwidth for pageable and page-locked memory, and device to host copy bandwidth for pageable and page-locked memory.
## Key Concepts
CUDA Streams and Events, Performance Strategies
## Supported SM Architectures
[SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes
Linux, Windows
## Supported CPU Architecture
x86_64, armv7l
## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaHostAlloc, cudaMemcpy, cudaMalloc, cudaMemcpyAsync, cudaFree, cudaGetErrorString, cudaMallocHost, cudaSetDevice, cudaGetDeviceProperties, cudaDeviceSynchronize, cudaEventRecord, cudaFreeHost, cudaEventDestroy, cudaEventElapsedTime, cudaGetDeviceCount, cudaEventCreate
## Prerequisites
Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## References (for more details)

(File diff suppressed because it is too large.)


@@ -102,13 +102,23 @@ static void childProcess(int id)
     int threads = 128;
     cudaDeviceProp prop;
     std::vector<void *> ptrs;
+    pid_t pid;
+    char pidString[20] = {0};
+    char lshmName[40] = {0};
     std::vector<char> verification_buffer(DATA_SIZE);
+
+    pid = getppid();
+    snprintf(pidString, sizeof(pidString), "%d", pid);
+    strcat(lshmName, shmName);
+    strcat(lshmName, pidString);
+    printf("CP: lshmName = %s\n", lshmName);

     ipcHandle *ipcChildHandle = NULL;
     checkIpcErrors(ipcOpenSocket(ipcChildHandle));

-    if (sharedMemoryOpen(shmName, sizeof(shmStruct), &info) != 0) {
+    if (sharedMemoryOpen(lshmName, sizeof(shmStruct), &info) != 0) {
         printf("Failed to create shared memory slab\n");
         exit(EXIT_FAILURE);
     }

@@ -245,6 +255,16 @@ static void parentProcess(char *app)
     std::vector<void *> ptrs;
     std::vector<Process> processes;
     cudaMemAllocationHandleType handleType = cudaMemHandleTypeNone;
+    pid_t pid;
+    char pidString[20] = {0};
+    char lshmName[40] = {0};
+
+    pid = getpid();
+    snprintf(pidString, sizeof(pidString), "%d", pid);
+    strcat(lshmName, shmName);
+    strcat(lshmName, pidString);
+    printf("PP: lshmName = %s\n", lshmName);

     checkCudaErrors(cudaGetDeviceCount(&devCount));
     std::vector<CUdevice> devices(devCount);

@@ -252,7 +272,7 @@ static void parentProcess(char *app)
         cuDeviceGet(&devices[i], i);
     }

-    if (sharedMemoryCreate(shmName, sizeof(*shm), &info) != 0) {
+    if (sharedMemoryCreate(lshmName, sizeof(*shm), &info) != 0) {
         printf("Failed to create shared memory slab\n");
         exit(EXIT_FAILURE);
     }


@@ -310,10 +310,24 @@ static void childProcess(int devId, int id, char **argv)
     ipcHandle *ipcChildHandle = NULL;
     int blocks = 0;
     int threads = 128;
+    char pidString[20] = {0};
+    char lshmName[40] = {0};
+
+    // Use parent process ID to create a unique shared memory name for Linux multi-process
+#ifdef __linux__
+    pid_t pid;
+    pid = getppid();
+    snprintf(pidString, sizeof(pidString), "%d", pid);
+#endif
+    strcat(lshmName, shmName);
+    strcat(lshmName, pidString);
+    printf("CP: lshmName = %s\n", lshmName);

     checkIpcErrors(ipcOpenSocket(ipcChildHandle));

-    if (sharedMemoryOpen(shmName, sizeof(shmStruct), &info) != 0) {
+    if (sharedMemoryOpen(lshmName, sizeof(shmStruct), &info) != 0) {
         printf("Failed to create shared memory slab\n");
         exit(EXIT_FAILURE);
     }

@@ -421,11 +435,24 @@ static void parentProcess(char *app)
     volatile shmStruct *shm = NULL;
     sharedMemoryInfo info;
     std::vector<Process> processes;
+    char pidString[20] = {0};
+    char lshmName[40] = {0};
+
+    // Use current process ID to create a unique shared memory name for Linux multi-process
+#ifdef __linux__
+    pid_t pid;
+    pid = getpid();
+    snprintf(pidString, sizeof(pidString), "%d", pid);
+#endif
+    strcat(lshmName, shmName);
+    strcat(lshmName, pidString);
+    printf("PP: lshmName = %s\n", lshmName);

     checkCudaErrors(cuDeviceGetCount(&devCount));
     std::vector<CUdevice> devices(devCount);

-    if (sharedMemoryCreate(shmName, sizeof(*shm), &info) != 0) {
+    if (sharedMemoryCreate(lshmName, sizeof(*shm), &info) != 0) {
         printf("Failed to create shared memory slab\n");
         exit(EXIT_FAILURE);
     }


@@ -25,8 +25,8 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */

-#include <cuda_runtime.h>
 #include <cuda.h>
+#include <cuda_runtime.h>
 #include <helper_cuda.h>
 #include <helper_image.h>
 #include <vector>


@@ -45,13 +45,15 @@
 #include <windows.h>
 #endif

+// includes for OpenGL
+#include <helper_gl.h>
+
 // includes
 #include <cuda_gl_interop.h>
 #include <cuda_runtime.h>
 #include <cufft.h>
 #include <helper_cuda.h>
 #include <helper_functions.h>
-#include <helper_gl.h>
 #include <math.h>
 #include <math_constants.h>
 #include <stdio.h>


@@ -86,12 +86,14 @@
 #include <windows.h>
 #endif

+// includes for OpenGL
+#include <helper_gl.h>
+
 // includes
 #include <cuda_gl_interop.h>
 #include <cuda_runtime.h>
 #include <helper_cuda.h> // includes cuda.h and cuda_runtime_api.h
 #include <helper_functions.h>
-#include <helper_gl.h>
 #include <math.h>
 #include <stdio.h>
 #include <stdlib.h>


@@ -28,11 +28,15 @@
 #include "render_particles.h"

 #define HELPERGL_EXTERN_GL_FUNC_IMPLEMENTATION
+// includes for OpenGL
+#include <helper_gl.h>
+
+// includes
 #include <assert.h>
 #include <cuda_gl_interop.h>
 #include <cuda_runtime.h>
 #include <helper_cuda.h>
-#include <helper_gl.h>
 #include <math.h>

 #define GL_POINT_SPRITE_ARB 0x8861


@@ -31,9 +31,12 @@
 #pragma warning(disable : 4312)

-#include <mmsystem.h>
+// includes for Windows
 #include <windows.h>
+
+// includes for multimedia
+#include <mmsystem.h>

 // This header includes all the necessary D3D11 and CUDA includes
 #include <cuda_d3d11_interop.h>
 #include <cuda_runtime_api.h>


@@ -31,9 +31,12 @@
 #pragma warning(disable : 4312)

-#include <mmsystem.h>
+// includes for Windows
 #include <windows.h>
+
+// includes for multimedia
+#include <mmsystem.h>

 // This header includes all the necessary D3D11 and CUDA includes
 #include <cuda_d3d11_interop.h>
 #include <cuda_runtime_api.h>


@@ -33,11 +33,15 @@
 #include <memory.h>

 #define HELPERGL_EXTERN_GL_FUNC_IMPLEMENTATION
+// includes for OpenGL
+#include <helper_gl.h>
+
+// includes
 #include <cuda_gl_interop.h>
 #include <cuda_runtime.h>
 #include <helper_cuda.h>
 #include <helper_functions.h>
-#include <helper_gl.h>

 #include "ParticleSystem.cuh"
 #include "ParticleSystem.h"


@@ -29,11 +29,15 @@
     This file contains simple wrapper functions that call the CUDA kernels
 */

 #define HELPERGL_EXTERN_GL_FUNC_IMPLEMENTATION
+// includes for OpenGL
+#include <helper_gl.h>
+
+// includes
 #include <cstdio>
 #include <cstdlib>
 #include <cuda_gl_interop.h>
 #include <helper_cuda.h>
-#include <helper_gl.h>
 #include <string.h>

 #include "ParticleSystem.cuh"


@@ -1,11 +1,33 @@
+# This layer of CMakeLists.txt adds folders, for better organization in Visual Studio
+# and other IDEs that support this feature.
+set_property(GLOBAL PROPERTY USE_FOLDERS ON)
+
+set(CMAKE_FOLDER "0_Introduction")
 add_subdirectory(0_Introduction)
+
+set(CMAKE_FOLDER "1_Utilities")
 add_subdirectory(1_Utilities)
+
+set(CMAKE_FOLDER "2_Concepts_and_Techniques")
 add_subdirectory(2_Concepts_and_Techniques)
+
+set(CMAKE_FOLDER "3_CUDA_Features")
 add_subdirectory(3_CUDA_Features)
+
+set(CMAKE_FOLDER "4_CUDA_Libraries")
 add_subdirectory(4_CUDA_Libraries)
+
+set(CMAKE_FOLDER "5_Domain_Specific")
 add_subdirectory(5_Domain_Specific)
+
+set(CMAKE_FOLDER "6_Performance")
 add_subdirectory(6_Performance)
+
+set(CMAKE_FOLDER "7_libNVVM")
 add_subdirectory(7_libNVVM)

 if(BUILD_TEGRA)
+    set(CMAKE_FOLDER "8_Platform_Specific/Tegra")
     add_subdirectory(8_Platform_Specific/Tegra)
 endif()


@@ -33,6 +33,15 @@ import json
 import subprocess
 import argparse
 from pathlib import Path
+import concurrent.futures
+import threading
+
+print_lock = threading.Lock()
+
+def safe_print(*args, **kwargs):
+    """Thread-safe print function"""
+    with print_lock:
+        print(*args, **kwargs)

 def normalize_exe_name(name):
     """Normalize executable name across platforms by removing .exe if present"""
@@ -78,96 +87,49 @@ def find_executables(root_dir):
     return executables

-def run_test(executable, output_dir, args_config, global_args=None):
-    """Run a single test and capture output"""
+def run_single_test_instance(executable, args, output_file, global_args, run_description):
+    """Run a single instance of a test executable with specific arguments."""
     exe_path = str(executable)
     exe_name = executable.name
-    base_name = normalize_exe_name(exe_name)
-
-    # Check if this executable should be skipped
-    if base_name in args_config and args_config[base_name].get("skip", False):
-        print(f"Skipping {exe_name} (marked as skip in config)")
-        return 0
-
-    # Get argument sets for this executable
-    arg_sets = []
-    if base_name in args_config:
-        config = args_config[base_name]
-        if "args" in config:
-            # Single argument set (backwards compatibility)
-            if isinstance(config["args"], list):
-                arg_sets.append(config["args"])
-            else:
-                print(f"Warning: Arguments for {base_name} must be a list")
-        elif "runs" in config:
-            # Multiple argument sets
-            for run in config["runs"]:
-                if isinstance(run.get("args", []), list):
-                    arg_sets.append(run.get("args", []))
-                else:
-                    print(f"Warning: Arguments for {base_name} run must be a list")
-
-    # If no specific args defined, run once with no args
-    if not arg_sets:
-        arg_sets.append([])
-
-    # Run for each argument set
-    failed = False
-    run_number = 1
-    for args in arg_sets:
-        # Create output file name with run number if multiple runs
-        if len(arg_sets) > 1:
-            output_file = os.path.abspath(f"{output_dir}/APM_{exe_name}.run{run_number}.txt")
-            print(f"Running {exe_name} (run {run_number}/{len(arg_sets)})")
-        else:
-            output_file = os.path.abspath(f"{output_dir}/APM_{exe_name}.txt")
-            print(f"Running {exe_name}")
-
-        try:
-            # Prepare command with arguments
-            cmd = [f"./{exe_name}"]
-            cmd.extend(args)
-
-            # Add global arguments if provided
-            if global_args:
-                cmd.extend(global_args)
-
-            print(f"  Command: {' '.join(cmd)}")
-
-            # Store current directory
-            original_dir = os.getcwd()
-            try:
-                # Change to executable's directory
-                os.chdir(os.path.dirname(exe_path))
-
-                # Run the executable and capture output
-                with open(output_file, 'w') as f:
-                    result = subprocess.run(
-                        cmd,
-                        stdout=f,
-                        stderr=subprocess.STDOUT,
-                        timeout=300  # 5 minute timeout
-                    )
-
-                if result.returncode != 0:
-                    failed = True
-                print(f"  Test completed with return code {result.returncode}")
-            finally:
-                # Always restore original directory
-                os.chdir(original_dir)
-        except subprocess.TimeoutExpired:
-            print(f"Error: {exe_name} timed out after 5 minutes")
-            failed = True
-        except Exception as e:
-            print(f"Error running {exe_name}: {str(e)}")
-            failed = True
-
-        run_number += 1
-
-    return 1 if failed else 0
+
+    safe_print(f"Starting {exe_name} {run_description}")
+
+    try:
+        cmd = [f"./{exe_name}"]
+        cmd.extend(args)
+
+        if global_args:
+            cmd.extend(global_args)
+
+        safe_print(f"  Command ({exe_name} {run_description}): {' '.join(cmd)}")
+
+        # Run the executable in its own directory using cwd
+        with open(output_file, 'w') as f:
+            result = subprocess.run(
+                cmd,
+                stdout=f,
+                stderr=subprocess.STDOUT,
+                timeout=300,  # 5 minute timeout
+                cwd=os.path.dirname(exe_path)  # Execute in the executable's directory
+            )
+
+        status = "Passed" if result.returncode == 0 else "Failed"
+        safe_print(f"  Finished {exe_name} {run_description}: {status} (code {result.returncode})")
+        return {"name": exe_name, "description": run_description, "return_code": result.returncode, "status": status}
+    except subprocess.TimeoutExpired:
+        safe_print(f"Error ({exe_name} {run_description}): Timed out after 5 minutes")
+        return {"name": exe_name, "description": run_description, "return_code": -1, "status": "Timeout"}
+    except Exception as e:
+        safe_print(f"Error running {exe_name} {run_description}: {str(e)}")
+        return {"name": exe_name, "description": run_description, "return_code": -1, "status": f"Error: {str(e)}"}
+
+def run_test(executable, output_dir, args_config, global_args=None):
+    """Deprecated: This function is replaced by the parallel execution logic in main."""
+    # This function is no longer called directly by the main logic.
+    # It remains here temporarily in case it's needed for reference or single-threaded debugging.
+    # The core logic is now in run_single_test_instance and managed by ThreadPoolExecutor.
+    print("Warning: run_test function called directly - this indicates an issue in the refactoring.")
+    return 1  # Indicate failure if called

 def main():
     parser = argparse.ArgumentParser(description='Run all executables and capture output')
@@ -175,6 +137,7 @@ def main():
     parser.add_argument('--config', help='JSON configuration file for executable arguments')
     parser.add_argument('--output', default='.',  # Default to current directory
                         help='Output directory for test results')
+    parser.add_argument('--parallel', type=int, default=1, help='Number of parallel tests to run')
     parser.add_argument('--args', nargs=argparse.REMAINDER,
                         help='Global arguments to pass to all executables')
     args = parser.parse_args()
@@ -192,23 +155,104 @@ def main():
         return 1

     print(f"Found {len(executables)} executables")
+    print(f"Running tests with up to {args.parallel} parallel tasks.")
+
+    tasks = []
+    for exe in executables:
+        exe_name = exe.name
+        base_name = normalize_exe_name(exe_name)
+
+        # Check if this executable should be skipped globally
+        if base_name in args_config and args_config[base_name].get("skip", False):
+            safe_print(f"Skipping {exe_name} (marked as skip in config)")
+            continue
+
+        arg_sets_configs = []
+        if base_name in args_config:
+            config = args_config[base_name]
+            if "args" in config:
+                if isinstance(config["args"], list):
+                    arg_sets_configs.append({"args": config["args"]})  # Wrap in dict for consistency
+                else:
+                    safe_print(f"Warning: Arguments for {base_name} must be a list")
+            elif "runs" in config:
+                for i, run_config in enumerate(config["runs"]):
+                    if run_config.get("skip", False):
+                        safe_print(f"Skipping run {i+1} for {exe_name} (marked as skip in config)")
+                        continue
+                    if isinstance(run_config.get("args", []), list):
+                        arg_sets_configs.append(run_config)
+                    else:
+                        safe_print(f"Warning: Arguments for {base_name} run {i+1} must be a list")
+
+        # If no specific args defined, create one run with no args
+        if not arg_sets_configs:
+            arg_sets_configs.append({"args": []})
+
+        # Create tasks for each run configuration
+        num_runs = len(arg_sets_configs)
+        for i, run_config in enumerate(arg_sets_configs):
+            current_args = run_config.get("args", [])
+            run_desc = f"(run {i+1}/{num_runs})" if num_runs > 1 else ""
+
+            # Create output file name
+            if num_runs > 1:
+                output_file = os.path.abspath(f"{args.output}/APM_{exe_name}.run{i+1}.txt")
+            else:
+                output_file = os.path.abspath(f"{args.output}/APM_{exe_name}.txt")
+
+            tasks.append({
+                "executable": exe,
+                "args": current_args,
+                "output_file": output_file,
+                "global_args": args.args,
+                "description": run_desc
+            })

     failed = []
-    for exe in executables:
-        ret_code = run_test(exe, args.output, args_config, args.args)
-        if ret_code != 0:
-            failed.append((exe.name, ret_code))
+    total_runs = len(tasks)
+    completed_runs = 0
+
+    with concurrent.futures.ThreadPoolExecutor(max_workers=args.parallel) as executor:
+        future_to_task = {
+            executor.submit(run_single_test_instance,
+                            task["executable"],
+                            task["args"],
+                            task["output_file"],
+                            task["global_args"],
+                            task["description"]): task
+            for task in tasks
+        }
+
+        for future in concurrent.futures.as_completed(future_to_task):
+            task_info = future_to_task[future]
+            completed_runs += 1
+            safe_print(f"Progress: {completed_runs}/{total_runs} runs completed.")
+            try:
+                result = future.result()
+                if result["return_code"] != 0:
+                    failed.append(result)
+            except Exception as exc:
+                safe_print(f'Task {task_info["executable"].name} {task_info["description"]} generated an exception: {exc}')
+                failed.append({
+                    "name": task_info["executable"].name,
+                    "description": task_info["description"],
+                    "return_code": -1,
+                    "status": f"Execution Exception: {exc}"
+                })

     # Print summary
     print("\nTest Summary:")
-    print(f"Ran {len(executables)} tests")
+    print(f"Ran {total_runs} test runs for {len(executables)} executables.")
     if failed:
-        print(f"Failed tests ({len(failed)}):")
-        for name, code in failed:
-            print(f"  {name}: returned {code}")
-        return failed[0][1]  # Return first failure code
+        print(f"Failed runs ({len(failed)}):")
+        for fail in failed:
+            print(f"  {fail['name']} {fail['description']}: {fail['status']} (code {fail['return_code']})")
+        # Return the return code of the first failure, or 1 if only exceptions occurred
+        first_failure_code = next((f["return_code"] for f in failed if f["return_code"] != -1), 1)
+        return first_failure_code
     else:
-        print("All tests passed!")
+        print("All test runs passed!")
         return 0

 if __name__ == '__main__':
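Distilled to its core, the refactor's concurrency pattern (one future per test run, results collected as they complete) is sketched below. The `run_one` function and the task list are simplified stand-ins for `run_single_test_instance` and the real task dictionaries:
```python
import concurrent.futures

def run_one(name, args):
    """Simplified stand-in for run_single_test_instance: returns a result dict."""
    return {"name": name, "return_code": 0}

# Hypothetical (executable, argument-list) pairs standing in for the task list.
tasks = [("deviceQuery", []), ("bicubicTexture", ["-mode=0"])]

failed = []
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    futures = {executor.submit(run_one, name, args): name for name, args in tasks}
    for future in concurrent.futures.as_completed(futures):
        result = future.result()  # re-raises any exception from the worker
        if result["return_code"] != 0:
            failed.append(result)

print("All test runs passed!" if not failed else f"Failed runs ({len(failed)})")
```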