Compare commits

26 Commits

Author SHA1 Message Date
Rob Nertney
9c688d7ff7 Updating samples for CUDA 12.5 2024-07-25 16:30:13 +00:00
Rob Nertney
5f97d7d0df Updating graphConditionalNodes orphan directory 2024-04-10 19:44:42 +00:00
Rob Nertney
3559ca4d08 Updating README with Confidential Computing notes 2024-03-05 21:01:35 +00:00
Rob Nertney
cd3bc1fa8e Updating samples for CUDA 12.4 2024-03-05 20:53:50 +00:00
Rob Nertney
e8568c4173 Fixing jitlto regression, including missing cuDLA source files for bug #235, and updating changelogs 2023-11-09 16:52:00 +00:00
Rob Nertney
b5c84e6996 Updating Samples for 12.3 and updating props files 2023-10-23 18:44:49 +00:00
Rob Nertney
c46754b877 Update samples for 12.3 2023-10-20 17:38:48 +00:00
Rob Nertney
03309a2d42 Changelog updates 2023-06-29 19:33:40 +00:00
Rob Nertney
5688ee0013 Removing stray cpp from master 2023-05-31 17:48:13 +00:00
Rob Nertney
8004ad59ab Fix #194 and add Large Kernel Parameters Sample 2023-05-31 04:43:22 +00:00
Rob Nertney
e612904184
Merge pull request #182 from Wenlong-Zhu/master
Fix cudaExtent.width set error.
2023-03-27 20:53:45 -07:00
Rob Nertney
81cf058e30 Updating Samples for 12.1 2023-03-01 01:41:29 +00:00
Rob Nertney
26665bf33b Fixing README 2023-02-27 22:35:39 +00:00
Rob Nertney
00bb9bc367 Updating files for Ada architecture 2023-02-27 22:33:19 +00:00
Rob Nertney
e4789153d5 Updating License Header 2023-02-09 19:02:33 +00:00
Rob Nertney
1c2efac7c8 Adding SM number for Ada Architecture 2023-02-07 19:06:53 +00:00
Rob Nertney
3d553b2ea1 Adding JIT LTO Sample 2023-02-07 19:06:38 +00:00
wenlong-zhu
9316529638 Fix cudaExtent.width set error.
unit: 4_CUDA_Libraries/cudaNvSciNvMedia/cuda_consumer.cu
Because of the change of padding size in NvSciBuf,
the cudaExtent.width and cudaExtent.height should be changed

Bug 3880762
2023-02-04 00:00:44 +08:00
Rob Nertney
2b689228b7 Updating samples for 12.0 2022-12-08 20:19:55 +00:00
Rob Nertney
81992093d2 Update samples for CUDA 11.8 with correct props 2022-10-14 17:43:37 -07:00
Rutwik Choughule
b312abaa07 add check for filename in nvrtc_helper.h 2022-02-03 18:12:24 +05:30
Rutwik Choughule
8f21b899b6 update dependency related links in README files 2022-01-27 17:58:13 +05:30
Rutwik Choughule
0cbe5f2d82 update makefiles to waive unsupported samples on QNX 2022-01-27 17:57:02 +05:30
Rutwik Choughule
805e60bdfc update lib path for conda 2022-01-27 17:55:38 +05:30
Rutwik Choughule
9d4c014f60 update sample cudaNvSci 2022-01-25 17:22:31 +05:30
Rutwik Choughule
bf8c6dd043 update lib path for conda 2022-01-14 02:31:40 +05:30
1608 changed files with 318914 additions and 5517 deletions

.gitignore vendored Normal file

@@ -0,0 +1 @@
+.vscode/*


@@ -1,5 +1,31 @@
 ## Changelog
+### CUDA 12.5
+### CUDA 12.4
+* Added graphConditionalNodes Sample
+### CUDA 12.3
+* Added cuDLA samples
+* Fixed jitLto regression
+### CUDA 12.2
+* libNVVM samples received updates
+* Fixed jitLto Case issues
+* Enabled HOST_COMPILER flag to the makefiles for GCC which is untested but may still work.
+### CUDA 12.1
+* Added new sample for Large Kernels
+### CUDA 12.0
+* Added new flags for JIT compiling
+* Removed deprecated APIs in Hopper Architecture
+### CUDA 11.6
+* Added new folder structure for samples
+* Added support of Visual Studio 2022 to all samples supported on [Windows](#windows-1).
+* All CUDA samples are now only available on [GitHub](https://github.com/nvidia/cuda-samples). They are no longer available via CUDA toolkit.
 ### CUDA 11.5
 * Added `cuDLAHybridMode`. Demonstrate usage of cuDLA in hybrid mode.
 * Added `cuDLAStandaloneMode`. Demonstrate usage of cuDLA in standalone mode.
@@ -114,4 +140,4 @@ This is the first release of CUDA Samples on GitHub:
 * Added `conjugateGradientMultiBlockCG`. Demonstrates a conjugate gradient solver on GPU using Multi Block Cooperative Groups.
 * Added `conjugateGradientMultiDeviceCG`. Demonstrates a conjugate gradient solver on multiple GPUs using Multi Device Cooperative Groups, also uses unified memory prefetching and usage hints APIs.
 * Added `simpleCUBLAS`. Demonstrates how to perform GEMM operations using CUBLAS library.
 * Added `simpleCUFFT`. Demonstrates how to perform FFT operations using CUFFT library.


@@ -666,6 +666,8 @@ inline int _ConvertSMVer2Cores(int major, int minor) {
       {0x80, 64},
       {0x86, 128},
       {0x87, 128},
+      {0x89, 128},
+      {0x90, 128},
       {-1, -1}};
   int index = 0;
@@ -712,6 +714,9 @@ inline const char* _ConvertSMVer2ArchName(int major, int minor) {
       {0x75, "Turing"},
       {0x80, "Ampere"},
       {0x86, "Ampere"},
+      {0x87, "Ampere"},
+      {0x89, "Ada"},
+      {0x90, "Hopper"},
       {-1, "Graphics Device"}};
   int index = 0;
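
The two tables patched above map a packed SM version to a core count and to an architecture name. A minimal sketch of such a lookup follows, assuming the key packs the major version in the high nibble and the minor version in the low nibble (consistent with `0x89` for SM 8.9). The names `SmCores`, `kSmCoresTable`, and `coresPerSm` are hypothetical, only the rows visible in this hunk are included, and returning -1 for an unknown version is a simplification chosen for this sketch; the helper's real fallback is not shown in the diff.

```cpp
#include <cassert>

// Hypothetical names for illustration; not the helper's actual identifiers.
struct SmCores {
  int sm;     // 0xMm: M = SM major version, m = SM minor version
  int cores;  // cores per multiprocessor
};

// Only the rows visible in the hunk above; the real table has more entries.
static const SmCores kSmCoresTable[] = {
    {0x80, 64}, {0x86, 128}, {0x87, 128}, {0x89, 128}, {0x90, 128}, {-1, -1}};

// Returns the cores-per-SM value, or -1 when the version is not listed
// (an assumption of this sketch, not the helper's documented behavior).
inline int coresPerSm(int major, int minor) {
  assert(major >= 0 && minor >= 0 && minor < 16);
  const int key = (major << 4) + minor;
  for (int i = 0; kSmCoresTable[i].sm != -1; ++i) {
    if (kSmCoresTable[i].sm == key) return kSmCoresTable[i].cores;
  }
  return -1;
}
```

With this sketch, `coresPerSm(8, 9)` finds the newly added Ada row, while a version that is absent from the table, such as `coresPerSm(3, 5)`, falls through to the sentinel.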


@@ -114,6 +114,8 @@ inline int _ConvertSMVer2CoresDRV(int major, int minor) {
       {0x80, 64},
       {0x86, 128},
       {0x87, 128},
+      {0x89, 128},
+      {0x90, 128},
       {-1, -1}};
   int index = 0;


@@ -168,7 +168,7 @@ int waitProcess(Process *process) {
 #endif
 }
-#if defined(__linux__)
+#if defined(__linux__) || defined(__QNX__)
 int ipcCreateSocket(ipcHandle *&handle, const char *name,
                     const std::vector<Process> &processes) {
   int server_fd;
@@ -262,41 +262,48 @@ int ipcRecvShareableHandle(ipcHandle *handle, ShareableHandle *shHandle) {
   // Union to guarantee alignment requirements for control array
   union {
     struct cmsghdr cm;
-    char control[CMSG_SPACE(sizeof(int))];
+    // This will not work on QNX as QNX CMSG_SPACE calls __cmsg_alignbytes
+    // And __cmsg_alignbytes is a runtime function instead of compile-time macros
+    // char control[CMSG_SPACE(sizeof(int))]
+    char* control;
   } control_un;
+  size_t sizeof_control = CMSG_SPACE(sizeof(int)) * sizeof(char);
+  control_un.control = (char*) malloc(sizeof_control);
   struct cmsghdr *cmptr;
   ssize_t n;
   int receivedfd;
   char dummy_buffer[1];
   ssize_t sendResult;
   msg.msg_control = control_un.control;
-  msg.msg_controllen = sizeof(control_un.control);
+  msg.msg_controllen = sizeof_control;
   iov[0].iov_base = (void *)dummy_buffer;
   iov[0].iov_len = sizeof(dummy_buffer);
   msg.msg_iov = iov;
   msg.msg_iovlen = 1;
   if ((n = recvmsg(handle->socket, &msg, 0)) <= 0) {
     perror("IPC failure: Receiving data over socket failed");
+    free(control_un.control);
     return -1;
   }
   if (((cmptr = CMSG_FIRSTHDR(&msg)) != NULL) &&
       (cmptr->cmsg_len == CMSG_LEN(sizeof(int)))) {
     if ((cmptr->cmsg_level != SOL_SOCKET) || (cmptr->cmsg_type != SCM_RIGHTS)) {
+      free(control_un.control);
       return -1;
     }
     memmove(&receivedfd, CMSG_DATA(cmptr), sizeof(receivedfd));
     *(int *)shHandle = receivedfd;
   } else {
+    free(control_un.control);
     return -1;
   }
+  free(control_un.control);
   return 0;
 }
@@ -340,9 +347,12 @@ int ipcSendShareableHandle(ipcHandle *handle,
   union {
     struct cmsghdr cm;
-    char control[CMSG_SPACE(sizeof(int))];
+    char* control;
   } control_un;
+  size_t sizeof_control = CMSG_SPACE(sizeof(int)) * sizeof(char);
+  control_un.control = (char*) malloc(sizeof_control);
   struct cmsghdr *cmptr;
   ssize_t readResult;
   struct sockaddr_un cliaddr;
@@ -360,7 +370,7 @@ int ipcSendShareableHandle(ipcHandle *handle,
   int sendfd = (int)shareableHandles[data];
   msg.msg_control = control_un.control;
-  msg.msg_controllen = sizeof(control_un.control);
+  msg.msg_controllen = sizeof_control;
   cmptr = CMSG_FIRSTHDR(&msg);
   cmptr->cmsg_len = CMSG_LEN(sizeof(int));
@@ -380,9 +390,11 @@ int ipcSendShareableHandle(ipcHandle *handle,
   ssize_t sendResult = sendmsg(handle->socket, &msg, 0);
   if (sendResult <= 0) {
     perror("IPC failure: Sending data over socket failed");
+    free(control_un.control);
     return -1;
   }
+  free(control_un.control);
   return 0;
 }
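
The change above replaces a fixed-size control array with a heap buffer whose size comes from `CMSG_SPACE` at run time, and frees it on every return path. The sketch below illustrates the same idea in isolation, assuming a connected Unix-domain socket on a POSIX system. `sendFd` and `recvFd` are hypothetical names, not functions from the samples, and the single-exit structure is a design choice of this sketch so that one `free` call covers all outcomes.

```cpp
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: sends one file descriptor as SCM_RIGHTS ancillary data.
// Returns 0 on success, -1 on failure.
inline int sendFd(int sock, int fd) {
  const size_t controlLen = CMSG_SPACE(sizeof(int));  // evaluated at run time
  char *control = static_cast<char *>(std::calloc(1, controlLen));
  if (control == nullptr) return -1;

  char dummy = 0;  // at least one byte of regular payload accompanies the fd
  struct iovec iov;
  iov.iov_base = &dummy;
  iov.iov_len = sizeof(dummy);

  struct msghdr msg;
  std::memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = controlLen;  // the buffer size, not sizeof(pointer)

  struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
  cm->cmsg_level = SOL_SOCKET;
  cm->cmsg_type = SCM_RIGHTS;
  cm->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cm), &fd, sizeof(fd));

  const ssize_t sent = sendmsg(sock, &msg, 0);
  std::free(control);  // single exit path, so the buffer cannot leak
  return sent > 0 ? 0 : -1;
}

// Hypothetical helper: receives one file descriptor into *outFd.
// Returns 0 on success, -1 on failure.
inline int recvFd(int sock, int *outFd) {
  assert(outFd != nullptr);
  const size_t controlLen = CMSG_SPACE(sizeof(int));
  char *control = static_cast<char *>(std::calloc(1, controlLen));
  if (control == nullptr) return -1;

  char dummy = 0;
  struct iovec iov;
  iov.iov_base = &dummy;
  iov.iov_len = sizeof(dummy);

  struct msghdr msg;
  std::memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = controlLen;

  int result = -1;
  if (recvmsg(sock, &msg, 0) > 0) {
    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm != nullptr && cm->cmsg_len == CMSG_LEN(sizeof(int)) &&
        cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS) {
      std::memcpy(outFd, CMSG_DATA(cm), sizeof(int));
      result = 0;
    }
  }
  std::free(control);
  return result;
}
```

Note the pitfall the diff also avoids: once `control` is a pointer, `sizeof(control)` yields the pointer size, so the length has to be carried in a separate variable.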


@@ -84,7 +84,7 @@ int waitProcess(Process *process);
 #define checkIpcErrors(ipcFuncResult) \
     if (ipcFuncResult == -1) { fprintf(stderr, "Failure at %u %s\n", __LINE__, __FILE__); exit(EXIT_FAILURE); }
-#if defined(__linux__)
+#if defined(__linux__) || defined(__QNX__)
 struct ipcHandle_st {
     int socket;
     char *socketName;


@@ -421,6 +421,7 @@ inline char *sdkFindFilePath(const char *filename,
   }
   // File not found
+  printf("\nerror: sdkFindFilePath: file <%s> not found!\n", filename);
   return 0;
 }


@@ -49,6 +49,11 @@
 void compileFileToCUBIN(char *filename, int argc, char **argv, char **cubinResult,
                         size_t *cubinResultSize, int requiresCGheaders) {
+  if (!filename) {
+    std::cerr << "\nerror: filename is empty for compileFileToCUBIN()!\n";
+    exit(1);
+  }
   std::ifstream inputFile(filename,
                           std::ios::in | std::ios::binary | std::ios::ate);
@@ -111,7 +116,12 @@ void compileFileToCUBIN(char *filename, int argc, char **argv, char **cubinResul
     compileOptions = "--include-path=";
-    std::string path = sdkFindFilePath(HeaderNames, argv[0]);
+    char *strPath = sdkFindFilePath(HeaderNames, argv[0]);
+    if (!strPath) {
+      std::cerr << "\nerror: header file " << HeaderNames << " not found!\n";
+      exit(1);
+    }
+    std::string path = strPath;
     if (!path.empty()) {
       std::size_t found = path.find(HeaderNames);
       path.erase(found);
@@ -120,6 +130,7 @@ void compileFileToCUBIN(char *filename, int argc, char **argv, char **cubinResul
           "\nCooperativeGroups headers not found, please install it in %s "
           "sample directory..\n Exiting..\n",
           argv[0]);
+      exit(1);
     }
     compileOptions += path.c_str();
     compileParams[numCompileOptions] = reinterpret_cast<char *>(
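
The second hunk above guards against constructing a `std::string` from a null `char *`, which is undefined behavior. The lookup function returns 0 when the file is missing, so the result is checked before conversion. A minimal sketch of that pattern follows; `toStringIfFound` and `stripTrailingName` are hypothetical helpers written for this illustration, and unlike the diff they report failure to the caller instead of calling `exit(1)`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copies a possibly-null C string into *out.
// Returns false (leaving *out untouched) when the pointer is null,
// because std::string(nullptr) is undefined behavior.
inline bool toStringIfFound(const char *maybePath, std::string *out) {
  assert(out != nullptr);
  if (maybePath == nullptr) return false;
  *out = maybePath;
  return true;
}

// Hypothetical helper: keeps the part of `path` that precedes `name`,
// mirroring the find-then-erase step in the diff. If `name` does not
// occur, the path is returned unchanged.
inline std::string stripTrailingName(const std::string &path,
                                     const std::string &name) {
  const std::size_t found = path.find(name);
  if (found == std::string::npos) return path;
  return path.substr(0, found);
}
```

For example, a found path such as `/opt/include/some_header.h` (a made-up value) reduces to `/opt/include/` once the header name is stripped, which is the form an include-path option needs.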


@@ -1,15 +1,12 @@
 # CUDA Samples
-Samples for CUDA Developers that demonstrate features in CUDA Toolkit. This version supports [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads).
+Samples for CUDA Developers that demonstrate features in CUDA Toolkit. This version supports [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads).
 ## Release Notes
 This section describes the release notes for the CUDA Samples on GitHub only.
-### CUDA 11.6
-* Added new folder structure for samples
-* Added support of Visual Studio 2022 to all samples supported on [Windows](#windows-1).
-* All CUDA samples are now only available on [GitHub](https://github.com/nvidia/cuda-samples). They are no longer available via CUDA toolkit.
+### CUDA 12.5
 ### [older versions...](./CHANGELOG.md)
@@ -17,7 +14,7 @@ This section describes the release notes for the CUDA Samples on GitHub only.
 ### Prerequisites
-Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
+Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
 For system requirements and installation instructions of cuda toolkit, please refer to the [Linux Installation Guide](http://docs.nvidia.com/cuda/cuda-installation-guide-linux/), and the [Windows Installation Guide](http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html).
 ### Getting the CUDA Samples
@@ -92,6 +89,9 @@ Samples that are specific to domain (Graphics, Finance, Image Processing).
 ### [6. Performance](./Samples/6_Performance/README.md)
 Samples that demonstrate performance optimization.
+### [7. libNVVM](./Samples/7_libNVVM/README.md)
+Samples that demonstrate the use of libNVVM and NVVM IR.
 ## Dependencies
 Some CUDA Samples rely on third-party applications and/or libraries, or features provided by the CUDA Toolkit and Driver, to either build or execute. These dependencies are listed below.
@@ -246,6 +246,10 @@ FP16 is a 16-bit floating-point format. One bit is used for the sign, five bits
 NVCC support of [C++11 features](https://en.wikipedia.org/wiki/C++11).
+#### CMake
+The libNVVM samples are built using [CMake](https://cmake.org/) 3.10 or later.
 ## Contributors Guide
 We welcome your input on issues and suggestions for samples. At this time we are not accepting contributions from the public, check back here as we evolve our contribution model.
@@ -263,4 +267,4 @@ Answers to frequently asked questions about CUDA can be found at http://develope
 ## Attributions
 * Teapot image is obtained from [Wikimedia](https://en.wikipedia.org/wiki/File:Original_Utah_Teapot.jpg) and is licensed under the Creative Commons [Attribution-Share Alike 2.0](https://creativecommons.org/licenses/by-sa/2.0/deed.en) Generic license. The image is modified for samples use cases.


@@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
 endif
 # host compiler
+ifdef HOST_COMPILER
+    CUSTOM_HOST_COMPILER = 1
+endif
 ifeq ($(TARGET_OS),darwin)
     ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
         HOST_COMPILER ?= clang++
@@ -165,6 +169,19 @@ CCFLAGS :=
 LDFLAGS :=
 # build flags
+# Link flag for customized HOST_COMPILER with gcc realpath
+GCC_PATH := $(shell which gcc)
+ifeq ($(CUSTOM_HOST_COMPILER),1)
+    ifneq ($(filter /%,$(HOST_COMPILER)),)
+        ifneq ($(findstring gcc,$(HOST_COMPILER)),)
+            ifneq ($(GCC_PATH),$(HOST_COMPILER))
+                LDFLAGS += -lstdc++
+            endif
+        endif
+    endif
+endif
 ifeq ($(TARGET_OS),darwin)
     LDFLAGS += -rpath $(CUDA_PATH)/lib
     CCFLAGS += -arch $(HOST_ARCH)
@@ -318,9 +335,9 @@ endif
 # Gencode arguments
 ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
-SMS ?= 53 61 70 72 75 80 86 87
+SMS ?= 53 61 70 72 75 80 86 87 90
 else
-SMS ?= 35 37 50 52 60 61 70 75 80 86
+SMS ?= 50 52 60 61 70 75 80 86 89 90
 endif
 ifeq ($(SMS),)
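
The `SMS` list above and the `CodeGeneration` entries in the Visual Studio project files elsewhere in this comparison describe the same set of targets in two notations: a space-separated list of SM numbers, and `compute_NN,sm_NN;` pairs. As a rough sketch of how one maps to the other, the hypothetical function `codeGeneration` below builds the paired string from a list of SM numbers. It is an illustration only, not code from the samples, and it does not model how the Makefile itself expands `SMS`, which is outside the visible hunks.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: turns SM numbers such as {89, 90} into the
// "compute_NN,sm_NN;" pairs used by the project files' CodeGeneration field.
inline std::string codeGeneration(const std::vector<int> &sms) {
  std::string out;
  for (int sm : sms) {
    assert(sm > 0);
    const std::string v = std::to_string(sm);
    out += "compute_" + v + ",sm_" + v + ";";
  }
  return out;
}
```

Passing the two architectures added in this comparison, `codeGeneration({89, 90})`, produces the suffix that now ends the `CodeGeneration` lines.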


@@ -6,11 +6,11 @@
       <toolkit>cudaStreamDestroy</toolkit>
       <toolkit>cudaFree</toolkit>
       <toolkit>cudaMallocManaged</toolkit>
-      <toolkit>cudaStreamCreate</toolkit>
-      <toolkit>cudaDeviceSynchronize</toolkit>
       <toolkit>cudaStreamAttachMemAsync</toolkit>
       <toolkit>cudaSetDevice</toolkit>
+      <toolkit>cudaDeviceSynchronize</toolkit>
       <toolkit>cudaStreamSynchronize</toolkit>
+      <toolkit>cudaStreamCreate</toolkit>
       <toolkit>cudaGetDeviceProperties</toolkit>
     </cuda_api_list>
     <description><![CDATA[This sample demonstrates the use of OpenMP and streams with Unified Memory on a single GPU.]]></description>
@@ -57,19 +57,6 @@
       <scope>1:CUDA Systems Integration</scope>
       <scope>1:Unified Memory</scope>
     </scopes>
-    <sm-arch>sm35</sm-arch>
-    <sm-arch>sm37</sm-arch>
-    <sm-arch>sm50</sm-arch>
-    <sm-arch>sm52</sm-arch>
-    <sm-arch>sm53</sm-arch>
-    <sm-arch>sm60</sm-arch>
-    <sm-arch>sm61</sm-arch>
-    <sm-arch>sm70</sm-arch>
-    <sm-arch>sm72</sm-arch>
-    <sm-arch>sm75</sm-arch>
-    <sm-arch>sm80</sm-arch>
-    <sm-arch>sm86</sm-arch>
-    <sm-arch>sm87</sm-arch>
     <supported_envs>
       <env>
         <arch>x86_64</arch>


@@ -10,8 +10,6 @@ CUDA Systems Integration, OpenMP, CUBLAS, Multithreading, Unified Memory, CUDA S
 ## Supported SM Architectures
-[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus)
 ## Supported OSes
 Linux, Windows
@@ -23,14 +21,14 @@ x86_64, ppc64le, armv7l
 ## CUDA APIs involved
 ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
-cudaStreamDestroy, cudaFree, cudaMallocManaged, cudaStreamCreate, cudaDeviceSynchronize, cudaStreamAttachMemAsync, cudaSetDevice, cudaStreamSynchronize, cudaGetDeviceProperties
+cudaStreamDestroy, cudaFree, cudaMallocManaged, cudaStreamAttachMemAsync, cudaSetDevice, cudaDeviceSynchronize, cudaStreamSynchronize, cudaStreamCreate, cudaGetDeviceProperties
 ## Dependencies needed to build/run
-[OpenMP](../../README.md#openmp), [UVM](../../README.md#uvm), [CUBLAS](../../README.md#cublas)
+[OpenMP](../../../README.md#openmp), [UVM](../../../README.md#uvm), [CUBLAS](../../../README.md#cublas)
 ## Prerequisites
-Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
+Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
 Make sure the dependencies mentioned in [Dependencies]() section above are installed.
 ## Build and Run


@@ -38,7 +38,7 @@
   </PropertyGroup>
   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
   <ImportGroup Label="ExtensionSettings">
-    <Import Project="$(CUDAPropsPath)\CUDA 11.6.props" />
+    <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
   </ImportGroup>
   <ImportGroup Label="PropertySheets">
     <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@@ -67,7 +67,7 @@
       <OutputFile>$(OutDir)/UnifiedMemoryStreams.exe</OutputFile>
     </Link>
     <CudaCompile>
-      <CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration>
+      <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
       <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
       <Include>./;../../../Common</Include>
       <Defines>WIN32</Defines>
@@ -108,6 +108,6 @@
   </ItemGroup>
   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
   <ImportGroup Label="ExtensionTargets">
-    <Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" />
+    <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
   </ImportGroup>
 </Project>


@@ -34,7 +34,7 @@
   </PropertyGroup>
   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
   <ImportGroup Label="ExtensionSettings">
-    <Import Project="$(CUDAPropsPath)\CUDA 11.6.props" />
+    <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
   </ImportGroup>
   <ImportGroup Label="PropertySheets">
     <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@@ -63,7 +63,7 @@
       <OutputFile>$(OutDir)/UnifiedMemoryStreams.exe</OutputFile>
     </Link>
     <CudaCompile>
-      <CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration>
+      <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
       <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
       <Include>./;../../../Common</Include>
       <Defines>WIN32</Defines>
@@ -104,6 +104,6 @@
   </ItemGroup>
   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
   <ImportGroup Label="ExtensionTargets">
-    <Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" />
+    <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
   </ImportGroup>
 </Project>


@@ -34,7 +34,7 @@
   </PropertyGroup>
   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
   <ImportGroup Label="ExtensionSettings">
-    <Import Project="$(CUDAPropsPath)\CUDA 11.6.props" />
+    <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
   </ImportGroup>
   <ImportGroup Label="PropertySheets">
     <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@@ -63,7 +63,7 @@
       <OutputFile>$(OutDir)/UnifiedMemoryStreams.exe</OutputFile>
     </Link>
     <CudaCompile>
-      <CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration>
+      <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
       <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
       <Include>./;../../../Common</Include>
       <Defines>WIN32</Defines>
@@ -104,6 +104,6 @@
   </ItemGroup>
   <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
   <ImportGroup Label="ExtensionTargets">
-    <Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" />
+    <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
   </ImportGroup>
 </Project>


@@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
 endif
 # host compiler
+ifdef HOST_COMPILER
+    CUSTOM_HOST_COMPILER = 1
+endif
 ifeq ($(TARGET_OS),darwin)
     ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
         HOST_COMPILER ?= clang++
@@ -165,6 +169,19 @@ CCFLAGS :=
 LDFLAGS :=
 # build flags
+# Link flag for customized HOST_COMPILER with gcc realpath
+GCC_PATH := $(shell which gcc)
+ifeq ($(CUSTOM_HOST_COMPILER),1)
+    ifneq ($(filter /%,$(HOST_COMPILER)),)
+        ifneq ($(findstring gcc,$(HOST_COMPILER)),)
+            ifneq ($(GCC_PATH),$(HOST_COMPILER))
+                LDFLAGS += -lstdc++
+            endif
+        endif
+    endif
+endif
 ifeq ($(TARGET_OS),darwin)
     LDFLAGS += -rpath $(CUDA_PATH)/lib
     CCFLAGS += -arch $(HOST_ARCH)
@@ -279,9 +296,9 @@ LIBRARIES :=
 # Gencode arguments
 ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
-SMS ?= 53 61 70 72 75 80 86 87
+SMS ?= 53 61 70 72 75 80 86 87 90
 else
-SMS ?= 35 37 50 52 60 61 70 75 80 86
+SMS ?= 50 52 60 61 70 75 80 86 89 90
 endif
 ifeq ($(SMS),)


@@ -3,21 +3,21 @@
 <entry>
   <name>asyncAPI</name>
   <cuda_api_list>
-    <toolkit>cudaMemset</toolkit>
+    <toolkit>cudaProfilerStop</toolkit>
+    <toolkit>cudaMalloc</toolkit>
+    <toolkit>cudaMemcpyAsync</toolkit>
     <toolkit>cudaFree</toolkit>
-    <toolkit>cudaEventRecord</toolkit>
     <toolkit>cudaMallocHost</toolkit>
     <toolkit>cudaProfilerStart</toolkit>
-    <toolkit>cudaEventCreate</toolkit>
-    <toolkit>cudaEventElapsedTime</toolkit>
     <toolkit>cudaDeviceSynchronize</toolkit>
+    <toolkit>cudaEventRecord</toolkit>
     <toolkit>cudaFreeHost</toolkit>
-    <toolkit>cudaMalloc</toolkit>
-    <toolkit>cudaEventQuery</toolkit>
-    <toolkit>cudaProfilerStop</toolkit>
+    <toolkit>cudaMemset</toolkit>
     <toolkit>cudaEventDestroy</toolkit>
-    <toolkit>cudaMemcpyAsync</toolkit>
+    <toolkit>cudaEventQuery</toolkit>
+    <toolkit>cudaEventElapsedTime</toolkit>
     <toolkit>cudaGetDeviceProperties</toolkit>
+    <toolkit>cudaEventCreate</toolkit>
   </cuda_api_list>
   <description><![CDATA[This sample illustrates the usage of CUDA events for both GPU timing and overlapping CPU and GPU execution. Events are inserted into a stream of CUDA calls. Since CUDA stream calls are asynchronous, the CPU can perform computations while GPU is executing (including DMA memcopies between the host and device). CPU can query CUDA events to determine whether GPU has completed tasks.]]></description>
   <devicecompilation>whole</devicecompilation>
@@ -46,8 +46,6 @@
     <scope>1:CUDA Basic Topics</scope>
     <scope>1:Performance Strategies</scope>
   </scopes>
-  <sm-arch>sm35</sm-arch>
-  <sm-arch>sm37</sm-arch>
   <sm-arch>sm50</sm-arch>
   <sm-arch>sm52</sm-arch>
   <sm-arch>sm53</sm-arch>
@@ -59,6 +57,8 @@
   <sm-arch>sm80</sm-arch>
   <sm-arch>sm86</sm-arch>
   <sm-arch>sm87</sm-arch>
+  <sm-arch>sm89</sm-arch>
+  <sm-arch>sm90</sm-arch>
   <supported_envs>
     <env>
       <arch>x86_64</arch>

@ -10,7 +10,7 @@ Asynchronous Data Transfers, CUDA Streams and Events
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,11 +23,11 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaMemset, cudaFree, cudaEventRecord, cudaMallocHost, cudaProfilerStart, cudaEventCreate, cudaEventElapsedTime, cudaDeviceSynchronize, cudaFreeHost, cudaMalloc, cudaEventQuery, cudaProfilerStop, cudaEventDestroy, cudaMemcpyAsync, cudaGetDeviceProperties cudaProfilerStop, cudaMalloc, cudaMemcpyAsync, cudaFree, cudaMallocHost, cudaProfilerStart, cudaDeviceSynchronize, cudaEventRecord, cudaFreeHost, cudaMemset, cudaEventDestroy, cudaEventQuery, cudaEventElapsedTime, cudaGetDeviceProperties, cudaEventCreate
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/asyncAPI.exe</OutputFile> <OutputFile>$(OutDir)/asyncAPI.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>
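
The `<CodeGeneration>` value changed above follows a regular pattern: one `compute_XX,sm_XX;` pair per supported SM. As a minimal sketch (the helper name `code_generation` is hypothetical and not part of the samples), the updated string can be rebuilt from the new SM list like this:

```python
def code_generation(sms):
    """Build an MSBuild <CodeGeneration> value from a list of SM numbers.

    Hypothetical helper for illustration only; each SM contributes one
    'compute_XX,sm_XX;' pair, including the trailing semicolon seen in the diff.
    """
    return "".join(f"compute_{sm},sm_{sm};" for sm in sms)


# SM list after this change: 35 and 37 dropped, 89 and 90 added.
NEW_SMS = [50, 52, 60, 61, 70, 75, 80, 86, 89, 90]

if __name__ == "__main__":
    print(code_generation(NEW_SMS))
```

Generating the string this way is only a convenience for checking that every project file carries the same architecture list; the project files themselves store the literal value.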

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/asyncAPI.exe</OutputFile> <OutputFile>$(OutDir)/asyncAPI.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/asyncAPI.exe</OutputFile> <OutputFile>$(OutDir)/asyncAPI.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -299,20 +316,23 @@ ifeq ($(TARGET_OS),linux)
#$(warning $(GCCVERSION)) #$(warning $(GCCVERSION))
IS_MIN_VERSION := $(shell expr `echo $(GCCVERSION)` \>= 47000) IS_MIN_VERSION := $(shell expr `echo $(GCCVERSION)` \>= 47000)
ifneq ($(CUSTOM_HOST_COMPILER), 1)
ifeq ($(IS_MIN_VERSION), 1) ifeq ($(IS_MIN_VERSION), 1)
$(info >>> GCC Version is greater or equal to 4.7.0 <<<) $(info >>> GCC Version is greater or equal to 4.7.0 <<<)
else else
$(info >>> Waiving build. Minimum GCC version required is 4.7.0<<<) $(info >>> Waiving build. Minimum GCC version required is 4.7.0<<<)
SAMPLE_ENABLED := 0 SAMPLE_ENABLED := 0
endif endif
else
$(warning >>> Custom HOST_COMPILER set; skipping GCC version check. This may lead to unintended behavior. Please note the minimum equivalent GCC version is 4.7.0 <<<)
endif
endif endif
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 35 37 50 52 60 61 70 75 80 86 SMS ?= 50 52 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)
@ -363,7 +383,6 @@ run: build
$(EXEC) ./c++11_cuda $(EXEC) ./c++11_cuda
testrun: build testrun: build
$(EXEC) ./c++11_cuda --dummy-test-param
clean: clean:
rm -f c++11_cuda c++11_cuda.o rm -f c++11_cuda c++11_cuda.o

@ -7,9 +7,9 @@
</cflags> </cflags>
<cuda_api_list> <cuda_api_list>
<toolkit>cudaMalloc</toolkit> <toolkit>cudaMalloc</toolkit>
<toolkit>cudaMemcpy</toolkit>
<toolkit>cudaMemset</toolkit> <toolkit>cudaMemset</toolkit>
<toolkit>cudaFree</toolkit> <toolkit>cudaFree</toolkit>
<toolkit>cudaMemcpy</toolkit>
</cuda_api_list> </cuda_api_list>
<description><![CDATA[This sample demonstrates C++11 feature support in CUDA. It scans a input text file and prints no. of occurrences of x, y, z, w characters. ]]></description> <description><![CDATA[This sample demonstrates C++11 feature support in CUDA. It scans a input text file and prints no. of occurrences of x, y, z, w characters. ]]></description>
<devicecompilation>whole</devicecompilation> <devicecompilation>whole</devicecompilation>
@ -31,9 +31,6 @@
</librarypaths> </librarypaths>
<nsight_eclipse>true</nsight_eclipse> <nsight_eclipse>true</nsight_eclipse>
<primary_file>c++11_cuda.cu</primary_file> <primary_file>c++11_cuda.cu</primary_file>
<qatests>
<qatest>--dummy-test-param</qatest>
</qatests>
<required_dependencies> <required_dependencies>
<dependency>CPP11</dependency> <dependency>CPP11</dependency>
</required_dependencies> </required_dependencies>
@ -41,8 +38,6 @@
<scope>1:CUDA Advanced Topics</scope> <scope>1:CUDA Advanced Topics</scope>
<scope>1:C++11 CUDA</scope> <scope>1:C++11 CUDA</scope>
</scopes> </scopes>
<sm-arch>sm35</sm-arch>
<sm-arch>sm37</sm-arch>
<sm-arch>sm50</sm-arch> <sm-arch>sm50</sm-arch>
<sm-arch>sm52</sm-arch> <sm-arch>sm52</sm-arch>
<sm-arch>sm53</sm-arch> <sm-arch>sm53</sm-arch>
@ -54,6 +49,8 @@
<sm-arch>sm80</sm-arch> <sm-arch>sm80</sm-arch>
<sm-arch>sm86</sm-arch> <sm-arch>sm86</sm-arch>
<sm-arch>sm87</sm-arch> <sm-arch>sm87</sm-arch>
<sm-arch>sm89</sm-arch>
<sm-arch>sm90</sm-arch>
<supported_envs> <supported_envs>
<env> <env>
<arch>x86_64</arch> <arch>x86_64</arch>

@ -10,7 +10,7 @@ CPP11 CUDA
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,14 +23,14 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaMalloc, cudaMemset, cudaFree, cudaMemcpy cudaMalloc, cudaMemcpy, cudaMemset, cudaFree
## Dependencies needed to build/run ## Dependencies needed to build/run
[CPP11](../../README.md#cpp11) [CPP11](../../../README.md#cpp11)
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
Make sure the dependencies mentioned in [Dependencies]() section above are installed. Make sure the dependencies mentioned in [Dependencies]() section above are installed.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/c++11_cuda.exe</OutputFile> <OutputFile>$(OutDir)/c++11_cuda.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/c++11_cuda.exe</OutputFile> <OutputFile>$(OutDir)/c++11_cuda.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/c++11_cuda.exe</OutputFile> <OutputFile>$(OutDir)/c++11_cuda.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -279,9 +296,9 @@ LIBRARIES :=
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 35 37 50 52 60 61 70 75 80 86 SMS ?= 50 52 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)
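
The hunk above changes only the default `SMS` lists; the part of the Makefile that turns `SMS` into compiler flags lies outside this diff. As a sketch under that caveat, the snippet below selects the default list by target architecture and expands it into `nvcc` `-gencode` arguments. Emitting PTX for the highest SM is an assumption about the rest of the Makefile, and the function names are hypothetical.

```python
def default_sms(target_arch):
    """Default SM list per the updated hunk, keyed on TARGET_ARCH."""
    if target_arch in ("armv7l", "aarch64", "sbsa"):
        return [53, 61, 70, 72, 75, 80, 86, 87, 90]
    return [50, 52, 60, 61, 70, 75, 80, 86, 89, 90]


def gencode_flags(sms):
    """Expand SM numbers into nvcc -gencode arguments.

    One SASS target per SM, plus PTX for the highest SM so newer GPUs can
    JIT-compile it (assumed behaviour; not shown in this diff).
    """
    flags = [f"-gencode arch=compute_{sm},code=sm_{sm}" for sm in sms]
    if sms:
        highest = max(sms)
        flags.append(f"-gencode arch=compute_{highest},code=compute_{highest}")
    return flags


if __name__ == "__main__":
    print(" ".join(gencode_flags(default_sms("x86_64"))))
```

Because `SMS` is assigned with `?=`, a value passed on the command line (for example `make SMS="80 90"`) overrides these defaults.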

@ -4,8 +4,8 @@
<name>clock</name> <name>clock</name>
<cuda_api_list> <cuda_api_list>
<toolkit>cudaMalloc</toolkit> <toolkit>cudaMalloc</toolkit>
<toolkit>cudaFree</toolkit>
<toolkit>cudaMemcpy</toolkit> <toolkit>cudaMemcpy</toolkit>
<toolkit>cudaFree</toolkit>
</cuda_api_list> </cuda_api_list>
<description><![CDATA[This example shows how to use the clock function to measure the performance of block of threads of a kernel accurately.]]></description> <description><![CDATA[This example shows how to use the clock function to measure the performance of block of threads of a kernel accurately.]]></description>
<devicecompilation>whole</devicecompilation> <devicecompilation>whole</devicecompilation>
@ -34,8 +34,6 @@
<scope>1:CUDA Basic Topics</scope> <scope>1:CUDA Basic Topics</scope>
<scope>1:Performance Strategies</scope> <scope>1:Performance Strategies</scope>
</scopes> </scopes>
<sm-arch>sm35</sm-arch>
<sm-arch>sm37</sm-arch>
<sm-arch>sm50</sm-arch> <sm-arch>sm50</sm-arch>
<sm-arch>sm52</sm-arch> <sm-arch>sm52</sm-arch>
<sm-arch>sm53</sm-arch> <sm-arch>sm53</sm-arch>
@ -47,6 +45,8 @@
<sm-arch>sm80</sm-arch> <sm-arch>sm80</sm-arch>
<sm-arch>sm86</sm-arch> <sm-arch>sm86</sm-arch>
<sm-arch>sm87</sm-arch> <sm-arch>sm87</sm-arch>
<sm-arch>sm89</sm-arch>
<sm-arch>sm90</sm-arch>
<supported_envs> <supported_envs>
<env> <env>
<arch>x86_64</arch> <arch>x86_64</arch>

@ -10,7 +10,7 @@ Performance Strategies
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,11 +23,11 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaMalloc, cudaFree, cudaMemcpy cudaMalloc, cudaMemcpy, cudaFree
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/clock.exe</OutputFile> <OutputFile>$(OutDir)/clock.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/clock.exe</OutputFile> <OutputFile>$(OutDir)/clock.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/clock.exe</OutputFile> <OutputFile>$(OutDir)/clock.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>
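
Reviewer note: the net effect of the `CodeGeneration` change in these project files can be checked mechanically. The sketch below is not part of the commit. It parses the old and new values copied from the hunk above and reports which SM targets were dropped and which were added. The helper name `parse_codegen` is made up for illustration.

```python
# Illustrative only: compare the old and new <CodeGeneration> values from the diff.

OLD = ("compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;"
       "compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;"
       "compute_80,sm_80;compute_86,sm_86;")
NEW = ("compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;"
       "compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;"
       "compute_89,sm_89;compute_90,sm_90;")


def parse_codegen(value):
    """Return the set of SM numbers named in a 'compute_XX,sm_XX;...' string."""
    sms = set()
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:  # the trailing ';' leaves an empty final entry
            continue
        _virtual, real = pair.split(",")
        sms.add(int(real.split("_")[1]))
    return sms


removed = sorted(parse_codegen(OLD) - parse_codegen(NEW))
added = sorted(parse_codegen(NEW) - parse_codegen(OLD))
print("removed:", removed)  # removed: [35, 37]
print("added:", added)      # added: [89, 90]
```

SM 3.5 and 3.7 go out and SM 8.9 and 9.0 come in, which is the same change made to the non-ARM `SMS` default and the `<sm-arch>` lists elsewhere in this commit.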

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
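
Reviewer note: the nested conditionals added in this hunk are easier to follow as a single predicate. The Python sketch below mirrors them under a simplifying assumption: `HOST_COMPILER` holds a single word (GNU Make's `filter` works word by word, which the sketch does not model). The function name `needs_explicit_libstdcxx` and its parameters are hypothetical and stand in for the Makefile variables.

```python
# Illustrative only: a Python restatement of the Makefile conditionals that
# decide whether "-lstdc++" is appended to LDFLAGS.

def needs_explicit_libstdcxx(host_compiler, gcc_on_path):
    """Return True when the Makefile block would add -lstdc++.

    host_compiler -- value of HOST_COMPILER supplied by the user ("" if unset)
    gcc_on_path   -- result of `which gcc` ("" or None if gcc is not found)
    """
    if not host_compiler:                  # ifdef HOST_COMPILER
        return False
    if not host_compiler.startswith("/"):  # $(filter /%,$(HOST_COMPILER))
        return False
    if "gcc" not in host_compiler:         # $(findstring gcc,$(HOST_COMPILER))
        return False
    # ifneq ($(GCC_PATH),$(HOST_COMPILER)); a missing gcc counts as empty
    return host_compiler != (gcc_on_path or "")


print(needs_explicit_libstdcxx("/opt/gcc-12/bin/gcc", "/usr/bin/gcc"))
print(needs_explicit_libstdcxx("/usr/bin/gcc", "/usr/bin/gcc"))
print(needs_explicit_libstdcxx("/usr/bin/g++", "/usr/bin/gcc"))
```

The third call is worth noting: an absolute path ending in `g++` does not contain the substring `gcc` unless one of its directories does, so the flag is only added for compilers addressed as `gcc`.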

@ -10,7 +10,7 @@ Performance Strategies, Runtime Compilation
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,17 +23,17 @@ x86_64, ppc64le, aarch64
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Driver API](http://docs.nvidia.com/cuda/cuda-driver-api/index.html) ### [CUDA Driver API](http://docs.nvidia.com/cuda/cuda-driver-api/index.html)
cuModuleGetFunction, cuMemAlloc, cuLaunchKernel, cuCtxSynchronize, cuMemFree, cuMemcpyDtoH, cuMemcpyHtoD cuMemcpyDtoH, cuLaunchKernel, cuMemcpyHtoD, cuCtxSynchronize, cuMemAlloc, cuMemFree, cuModuleGetFunction
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaBlockSize, cudaGridSize cudaBlockSize, cudaGridSize
## Dependencies needed to build/run ## Dependencies needed to build/run
[NVRTC](../../README.md#nvrtc) [NVRTC](../../../README.md#nvrtc)
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
Make sure the dependencies mentioned in [Dependencies]() section above are installed. Make sure the dependencies mentioned in [Dependencies]() section above are installed.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -279,9 +296,9 @@ LIBRARIES :=
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 35 37 50 52 60 61 70 75 80 86 SMS ?= 50 52 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)
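
Reviewer note: the hunk stops right where the Makefile begins to consume `SMS`, so the expansion itself is not visible here. As a rough sketch, assuming the list is turned into one nvcc `-gencode` entry per architecture plus a PTX entry for the highest one (an assumption about the unshown part of the Makefile, not something this diff confirms), the new non-ARM default would expand as below. `gencode_flags` is a hypothetical helper.

```python
# Illustrative only: expand an SMS list into nvcc -gencode flags.
# The expansion rule is an assumption; the diff does not show it.

def gencode_flags(sms):
    """Build -gencode flags from a whitespace-separated list of SM numbers."""
    archs = sms.split()
    # One SASS target per listed architecture.
    flags = [f"-gencode arch=compute_{sm},code=sm_{sm}" for sm in archs]
    if archs:
        # PTX for the highest architecture, for forward compatibility.
        highest = max(archs, key=int)
        flags.append(f"-gencode arch=compute_{highest},code=compute_{highest}")
    return flags


for flag in gencode_flags("50 52 60 61 70 75 80 86 89 90"):
    print(flag)
```

An empty list yields no flags, which corresponds to the `ifeq ($(SMS),)` check that the hunk ends on.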

@ -3,22 +3,22 @@
<entry> <entry>
<name>concurrentKernels</name> <name>concurrentKernels</name>
<cuda_api_list> <cuda_api_list>
<toolkit>cudaStreamWaitEvent</toolkit>
<toolkit>cudaStreamDestroy</toolkit> <toolkit>cudaStreamDestroy</toolkit>
<toolkit>cudaFree</toolkit>
<toolkit>cudaEventRecord</toolkit>
<toolkit>cudaMallocHost</toolkit>
<toolkit>cudaStreamCreate</toolkit>
<toolkit>cudaEventCreate</toolkit>
<toolkit>cudaEventElapsedTime</toolkit>
<toolkit>cudaEventSynchronize</toolkit>
<toolkit>cudaFreeHost</toolkit>
<toolkit>cudaMalloc</toolkit> <toolkit>cudaMalloc</toolkit>
<toolkit>cudaEventCreateWithFlags</toolkit>
<toolkit>cudaEventDestroy</toolkit>
<toolkit>cudaMemcpyAsync</toolkit> <toolkit>cudaMemcpyAsync</toolkit>
<toolkit>cudaGetDeviceProperties</toolkit> <toolkit>cudaFree</toolkit>
<toolkit>cudaMallocHost</toolkit>
<toolkit>cudaEventCreateWithFlags</toolkit>
<toolkit>cudaEventSynchronize</toolkit>
<toolkit>cudaEventRecord</toolkit>
<toolkit>cudaFreeHost</toolkit>
<toolkit>cudaGetDevice</toolkit> <toolkit>cudaGetDevice</toolkit>
<toolkit>cudaStreamWaitEvent</toolkit>
<toolkit>cudaEventDestroy</toolkit>
<toolkit>cudaEventElapsedTime</toolkit>
<toolkit>cudaStreamCreate</toolkit>
<toolkit>cudaGetDeviceProperties</toolkit>
<toolkit>cudaEventCreate</toolkit>
</cuda_api_list> </cuda_api_list>
<description><![CDATA[This sample demonstrates the use of CUDA streams for concurrent execution of several kernels on GPU device. It also illustrates how to introduce dependencies between CUDA streams with the new cudaStreamWaitEvent function.]]></description> <description><![CDATA[This sample demonstrates the use of CUDA streams for concurrent execution of several kernels on GPU device. It also illustrates how to introduce dependencies between CUDA streams with the new cudaStreamWaitEvent function.]]></description>
<devicecompilation>whole</devicecompilation> <devicecompilation>whole</devicecompilation>
@ -44,8 +44,6 @@
<scope>1:CUDA Advanced Topics</scope> <scope>1:CUDA Advanced Topics</scope>
<scope>1:Performance Strategies</scope> <scope>1:Performance Strategies</scope>
</scopes> </scopes>
<sm-arch>sm35</sm-arch>
<sm-arch>sm37</sm-arch>
<sm-arch>sm50</sm-arch> <sm-arch>sm50</sm-arch>
<sm-arch>sm52</sm-arch> <sm-arch>sm52</sm-arch>
<sm-arch>sm53</sm-arch> <sm-arch>sm53</sm-arch>
@ -57,6 +55,8 @@
<sm-arch>sm80</sm-arch> <sm-arch>sm80</sm-arch>
<sm-arch>sm86</sm-arch> <sm-arch>sm86</sm-arch>
<sm-arch>sm87</sm-arch> <sm-arch>sm87</sm-arch>
<sm-arch>sm89</sm-arch>
<sm-arch>sm90</sm-arch>
<supported_envs> <supported_envs>
<env> <env>
<arch>x86_64</arch> <arch>x86_64</arch>

@ -10,7 +10,7 @@ Performance Strategies
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,11 +23,11 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaStreamWaitEvent, cudaStreamDestroy, cudaFree, cudaEventRecord, cudaMallocHost, cudaStreamCreate, cudaEventCreate, cudaEventElapsedTime, cudaEventSynchronize, cudaFreeHost, cudaMalloc, cudaEventCreateWithFlags, cudaEventDestroy, cudaMemcpyAsync, cudaGetDeviceProperties, cudaGetDevice cudaStreamDestroy, cudaMalloc, cudaMemcpyAsync, cudaFree, cudaMallocHost, cudaEventCreateWithFlags, cudaEventSynchronize, cudaEventRecord, cudaFreeHost, cudaGetDevice, cudaStreamWaitEvent, cudaEventDestroy, cudaEventElapsedTime, cudaStreamCreate, cudaGetDeviceProperties, cudaEventCreate
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/concurrentKernels.exe</OutputFile> <OutputFile>$(OutDir)/concurrentKernels.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/concurrentKernels.exe</OutputFile> <OutputFile>$(OutDir)/concurrentKernels.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/concurrentKernels.exe</OutputFile> <OutputFile>$(OutDir)/concurrentKernels.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -279,9 +296,9 @@ LIBRARIES :=
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 35 37 50 52 60 61 70 75 80 86 SMS ?= 50 52 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)

@ -4,8 +4,8 @@
<name>cppIntegration</name> <name>cppIntegration</name>
<cuda_api_list> <cuda_api_list>
<toolkit>cudaMalloc</toolkit> <toolkit>cudaMalloc</toolkit>
<toolkit>cudaFree</toolkit>
<toolkit>cudaMemcpy</toolkit> <toolkit>cudaMemcpy</toolkit>
<toolkit>cudaFree</toolkit>
</cuda_api_list> </cuda_api_list>
<description><![CDATA[This example demonstrates how to integrate CUDA into an existing C++ application, i.e. the CUDA entry point on host side is only a function which is called from C++ code and only the file containing this function is compiled with nvcc. It also demonstrates that vector types can be used from cpp.]]></description> <description><![CDATA[This example demonstrates how to integrate CUDA into an existing C++ application, i.e. the CUDA entry point on host side is only a function which is called from C++ code and only the file containing this function is compiled with nvcc. It also demonstrates that vector types can be used from cpp.]]></description>
<devicecompilation>whole</devicecompilation> <devicecompilation>whole</devicecompilation>
@ -28,8 +28,6 @@
<scopes> <scopes>
<scope>1:CUDA Basic Topics</scope> <scope>1:CUDA Basic Topics</scope>
</scopes> </scopes>
<sm-arch>sm35</sm-arch>
<sm-arch>sm37</sm-arch>
<sm-arch>sm50</sm-arch> <sm-arch>sm50</sm-arch>
<sm-arch>sm52</sm-arch> <sm-arch>sm52</sm-arch>
<sm-arch>sm53</sm-arch> <sm-arch>sm53</sm-arch>
@ -41,6 +39,8 @@
<sm-arch>sm80</sm-arch> <sm-arch>sm80</sm-arch>
<sm-arch>sm86</sm-arch> <sm-arch>sm86</sm-arch>
<sm-arch>sm87</sm-arch> <sm-arch>sm87</sm-arch>
<sm-arch>sm89</sm-arch>
<sm-arch>sm90</sm-arch>
<supported_envs> <supported_envs>
<env> <env>
<arch>x86_64</arch> <arch>x86_64</arch>

@ -10,7 +10,7 @@ CPP-CUDA Integration
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,11 +23,11 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaMalloc, cudaFree, cudaMemcpy cudaMalloc, cudaMemcpy, cudaFree
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/cppIntegration.exe</OutputFile> <OutputFile>$(OutDir)/cppIntegration.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -109,6 +109,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/cppIntegration.exe</OutputFile> <OutputFile>$(OutDir)/cppIntegration.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -105,6 +105,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/cppIntegration.exe</OutputFile> <OutputFile>$(OutDir)/cppIntegration.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -105,6 +105,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -279,9 +296,9 @@ LIBRARIES :=
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 35 37 50 52 60 61 70 75 80 86 SMS ?= 50 52 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)

@ -3,17 +3,17 @@
<entry> <entry>
<name>cppOverload</name> <name>cppOverload</name>
<cuda_api_list> <cuda_api_list>
<toolkit>cudaFree</toolkit> <toolkit>cudaMemcpy</toolkit>
<toolkit>cudaFuncSetCacheConfig</toolkit> <toolkit>cudaFuncSetCacheConfig</toolkit>
<toolkit>cudaFree</toolkit>
<toolkit>cudaMallocHost</toolkit> <toolkit>cudaMallocHost</toolkit>
<toolkit>cudaFuncGetAttributes</toolkit> <toolkit>cudaSetDevice</toolkit>
<toolkit>cudaGetDeviceCount</toolkit> <toolkit>cudaGetDeviceProperties</toolkit>
<toolkit>cudaDeviceSynchronize</toolkit> <toolkit>cudaDeviceSynchronize</toolkit>
<toolkit>cudaFreeHost</toolkit> <toolkit>cudaFreeHost</toolkit>
<toolkit>cudaMalloc</toolkit> <toolkit>cudaMalloc</toolkit>
<toolkit>cudaSetDevice</toolkit> <toolkit>cudaFuncGetAttributes</toolkit>
<toolkit>cudaMemcpy</toolkit> <toolkit>cudaGetDeviceCount</toolkit>
<toolkit>cudaGetDeviceProperties</toolkit>
</cuda_api_list> </cuda_api_list>
<description><![CDATA[This sample demonstrates how to use C++ function overloading on the GPU.]]></description> <description><![CDATA[This sample demonstrates how to use C++ function overloading on the GPU.]]></description>
<devicecompilation>whole</devicecompilation> <devicecompilation>whole</devicecompilation>
@ -39,8 +39,6 @@
<scope>1:CUDA Basic Topics</scope> <scope>1:CUDA Basic Topics</scope>
<scope>1:Performance Strategies</scope> <scope>1:Performance Strategies</scope>
</scopes> </scopes>
<sm-arch>sm35</sm-arch>
<sm-arch>sm37</sm-arch>
<sm-arch>sm50</sm-arch> <sm-arch>sm50</sm-arch>
<sm-arch>sm52</sm-arch> <sm-arch>sm52</sm-arch>
<sm-arch>sm53</sm-arch> <sm-arch>sm53</sm-arch>
@ -52,6 +50,8 @@
<sm-arch>sm80</sm-arch> <sm-arch>sm80</sm-arch>
<sm-arch>sm86</sm-arch> <sm-arch>sm86</sm-arch>
<sm-arch>sm87</sm-arch> <sm-arch>sm87</sm-arch>
<sm-arch>sm89</sm-arch>
<sm-arch>sm90</sm-arch>
<supported_envs> <supported_envs>
<env> <env>
<arch>x86_64</arch> <arch>x86_64</arch>

@ -10,7 +10,7 @@ C++ Function Overloading, CUDA Streams and Events
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,11 +23,11 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaFree, cudaFuncSetCacheConfig, cudaMallocHost, cudaFuncGetAttributes, cudaGetDeviceCount, cudaDeviceSynchronize, cudaFreeHost, cudaMalloc, cudaSetDevice, cudaMemcpy, cudaGetDeviceProperties cudaMemcpy, cudaFuncSetCacheConfig, cudaFree, cudaMallocHost, cudaSetDevice, cudaGetDeviceProperties, cudaDeviceSynchronize, cudaFreeHost, cudaMalloc, cudaFuncGetAttributes, cudaGetDeviceCount
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/cppOverload.exe</OutputFile> <OutputFile>$(OutDir)/cppOverload.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/cppOverload.exe</OutputFile> <OutputFile>$(OutDir)/cppOverload.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/cppOverload.exe</OutputFile> <OutputFile>$(OutDir)/cppOverload.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -321,9 +338,9 @@ endif
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 35 37 50 52 60 61 70 75 80 86 SMS ?= 50 52 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)

@ -10,7 +10,7 @@ CUDA Systems Integration, OpenMP, Multithreading
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,14 +23,14 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaMemset, cudaFree, cudaGetDeviceCount, cudaSetDevice, cudaMalloc, cudaGetLastError, cudaMemcpy, cudaGetErrorString, cudaGetDeviceProperties, cudaGetDevice cudaMemcpy, cudaGetErrorString, cudaFree, cudaGetLastError, cudaSetDevice, cudaGetDeviceCount, cudaGetDevice, cudaMemset, cudaMalloc, cudaGetDeviceProperties
## Dependencies needed to build/run ## Dependencies needed to build/run
[OpenMP](../../README.md#openmp) [OpenMP](../../../README.md#openmp)
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
Make sure the dependencies mentioned in [Dependencies]() section above are installed. Make sure the dependencies mentioned in [Dependencies]() section above are installed.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/cudaOpenMP.exe</OutputFile> <OutputFile>$(OutDir)/cudaOpenMP.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -108,6 +108,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/cudaOpenMP.exe</OutputFile> <OutputFile>$(OutDir)/cudaOpenMP.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -104,6 +104,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/cudaOpenMP.exe</OutputFile> <OutputFile>$(OutDir)/cudaOpenMP.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -104,6 +104,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -285,9 +302,9 @@ LIBRARIES :=
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 60 61 70 75 80 86 SMS ?= 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)

@ -3,11 +3,11 @@
<entry> <entry>
<name>fp16ScalarProduct</name> <name>fp16ScalarProduct</name>
<cuda_api_list> <cuda_api_list>
<toolkit>cudaMemcpy</toolkit>
<toolkit>cudaFree</toolkit> <toolkit>cudaFree</toolkit>
<toolkit>cudaMallocHost</toolkit> <toolkit>cudaMallocHost</toolkit>
<toolkit>cudaFreeHost</toolkit> <toolkit>cudaFreeHost</toolkit>
<toolkit>cudaMalloc</toolkit> <toolkit>cudaMalloc</toolkit>
<toolkit>cudaMemcpy</toolkit>
<toolkit>cudaGetDeviceProperties</toolkit> <toolkit>cudaGetDeviceProperties</toolkit>
</cuda_api_list> </cuda_api_list>
<description><![CDATA[Calculates scalar product of two vectors of FP16 numbers.]]></description> <description><![CDATA[Calculates scalar product of two vectors of FP16 numbers.]]></description>
@ -44,6 +44,8 @@
<sm-arch>sm80</sm-arch> <sm-arch>sm80</sm-arch>
<sm-arch>sm86</sm-arch> <sm-arch>sm86</sm-arch>
<sm-arch>sm87</sm-arch> <sm-arch>sm87</sm-arch>
<sm-arch>sm89</sm-arch>
<sm-arch>sm90</sm-arch>
<supported_envs> <supported_envs>
<env> <env>
<arch>arm</arch> <arch>arm</arch>

@ -10,7 +10,7 @@ CUDA Runtime API
## Supported SM Architectures ## Supported SM Architectures
[SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,14 +23,14 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaFree, cudaMallocHost, cudaFreeHost, cudaMalloc, cudaMemcpy, cudaGetDeviceProperties cudaMemcpy, cudaFree, cudaMallocHost, cudaFreeHost, cudaMalloc, cudaGetDeviceProperties
## Dependencies needed to build/run ## Dependencies needed to build/run
[FP16](../../README.md#fp16) [FP16](../../../README.md#fp16)
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
Make sure the dependencies mentioned in [Dependencies]() section above are installed. Make sure the dependencies mentioned in [Dependencies]() section above are installed.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/fp16ScalarProduct.exe</OutputFile> <OutputFile>$(OutDir)/fp16ScalarProduct.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/fp16ScalarProduct.exe</OutputFile> <OutputFile>$(OutDir)/fp16ScalarProduct.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/fp16ScalarProduct.exe</OutputFile> <OutputFile>$(OutDir)/fp16ScalarProduct.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -279,9 +296,9 @@ LIBRARIES :=
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 35 37 50 52 60 61 70 75 80 86 SMS ?= 50 52 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)
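
Reviewer note: the `SMS` defaults changed in this hunk feed the `-gencode` arguments passed to nvcc. The expansion itself is outside the hunk, so the following is only a minimal sketch, assuming each SM number becomes one `arch=compute_N,code=sm_N` pair and that PTX is additionally emitted for the highest listed SM. `gencodeFlags` is a hypothetical name, not something from the samples.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: build nvcc -gencode arguments from a list of SM numbers.
// Assumption: one SASS target per SM, plus PTX for the highest SM in the list.
std::string gencodeFlags(const std::vector<int>& sms) {
  std::ostringstream out;
  int highest = -1;
  for (int sm : sms) {
    // Real (SASS) target for this SM.
    out << " -gencode arch=compute_" << sm << ",code=sm_" << sm;
    if (sm > highest) highest = sm;
  }
  if (highest >= 0) {
    // Virtual (PTX) target so newer GPUs can JIT-compile the kernel.
    out << " -gencode arch=compute_" << highest << ",code=compute_" << highest;
  }
  return out.str();
}
```

Under these assumptions, the new x86_64 default `50 52 60 61 70 75 80 86 89 90` would yield ten SASS targets and one `compute_90` PTX target, and an empty list yields no flags at all.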

@ -3,20 +3,20 @@
<entry> <entry>
<name>matrixMul</name> <name>matrixMul</name>
<cuda_api_list> <cuda_api_list>
<toolkit>cudaStreamCreateWithFlags</toolkit>
<toolkit>cudaProfilerStop</toolkit>
<toolkit>cudaMalloc</toolkit>
<toolkit>cudaFree</toolkit> <toolkit>cudaFree</toolkit>
<toolkit>cudaEventRecord</toolkit>
<toolkit>cudaMallocHost</toolkit> <toolkit>cudaMallocHost</toolkit>
<toolkit>cudaProfilerStart</toolkit> <toolkit>cudaProfilerStart</toolkit>
<toolkit>cudaEventCreate</toolkit>
<toolkit>cudaEventElapsedTime</toolkit>
<toolkit>cudaEventSynchronize</toolkit> <toolkit>cudaEventSynchronize</toolkit>
<toolkit>cudaEventRecord</toolkit>
<toolkit>cudaFreeHost</toolkit> <toolkit>cudaFreeHost</toolkit>
<toolkit>cudaMalloc</toolkit>
<toolkit>cudaProfilerStop</toolkit>
<toolkit>cudaStreamCreateWithFlags</toolkit>
<toolkit>cudaEventDestroy</toolkit>
<toolkit>cudaStreamSynchronize</toolkit> <toolkit>cudaStreamSynchronize</toolkit>
<toolkit>cudaEventDestroy</toolkit>
<toolkit>cudaEventElapsedTime</toolkit>
<toolkit>cudaMemcpyAsync</toolkit> <toolkit>cudaMemcpyAsync</toolkit>
<toolkit>cudaEventCreate</toolkit>
</cuda_api_list> </cuda_api_list>
<description><![CDATA[This sample implements matrix multiplication and is exactly the same as Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. To illustrate GPU performance for matrix multiply, this sample also shows how to use the new CUDA 4.0 interface for CUBLAS to demonstrate high-performance performance for matrix multiplication.]]></description> <description><![CDATA[This sample implements matrix multiplication and is exactly the same as Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various CUDA programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. To illustrate GPU performance for matrix multiply, this sample also shows how to use the new CUDA 4.0 interface for CUBLAS to demonstrate high-performance performance for matrix multiplication.]]></description>
<devicecompilation>whole</devicecompilation> <devicecompilation>whole</devicecompilation>
@ -43,8 +43,6 @@
<scope>1:CUDA Basic Topics</scope> <scope>1:CUDA Basic Topics</scope>
<scope>3:Linear Algebra</scope> <scope>3:Linear Algebra</scope>
</scopes> </scopes>
<sm-arch>sm35</sm-arch>
<sm-arch>sm37</sm-arch>
<sm-arch>sm50</sm-arch> <sm-arch>sm50</sm-arch>
<sm-arch>sm52</sm-arch> <sm-arch>sm52</sm-arch>
<sm-arch>sm53</sm-arch> <sm-arch>sm53</sm-arch>
@ -56,6 +54,8 @@
<sm-arch>sm80</sm-arch> <sm-arch>sm80</sm-arch>
<sm-arch>sm86</sm-arch> <sm-arch>sm86</sm-arch>
<sm-arch>sm87</sm-arch> <sm-arch>sm87</sm-arch>
<sm-arch>sm89</sm-arch>
<sm-arch>sm90</sm-arch>
<supported_envs> <supported_envs>
<env> <env>
<arch>x86_64</arch> <arch>x86_64</arch>

@ -10,7 +10,7 @@ CUDA Runtime API, Linear Algebra
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,11 +23,11 @@ x86_64, ppc64le, armv7l, aarch64
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaFree, cudaEventRecord, cudaMallocHost, cudaProfilerStart, cudaEventCreate, cudaEventElapsedTime, cudaEventSynchronize, cudaFreeHost, cudaMalloc, cudaProfilerStop, cudaStreamCreateWithFlags, cudaEventDestroy, cudaStreamSynchronize, cudaMemcpyAsync cudaStreamCreateWithFlags, cudaProfilerStop, cudaMalloc, cudaFree, cudaMallocHost, cudaProfilerStart, cudaEventSynchronize, cudaEventRecord, cudaFreeHost, cudaStreamSynchronize, cudaEventDestroy, cudaEventElapsedTime, cudaMemcpyAsync, cudaEventCreate
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/matrixMul.exe</OutputFile> <OutputFile>$(OutDir)/matrixMul.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/matrixMul.exe</OutputFile> <OutputFile>$(OutDir)/matrixMul.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/matrixMul.exe</OutputFile> <OutputFile>$(OutDir)/matrixMul.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -283,9 +300,9 @@ FATBIN_FILE := matrixMul_kernel${TARGET_SIZE}.fatbin
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 35 37 50 52 60 61 70 75 80 86 SMS ?= 50 52 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)
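
Reviewer note: the nested conditionals added to this Makefile under "Link flag for customized HOST_COMPILER with gcc realpath" amount to a single predicate. As a sketch of that decision only, with hypothetical names and the Make variables passed in as plain arguments, `-lstdc++` is appended when all four conditions hold:

```cpp
#include <string>

// Hypothetical function mirroring the Makefile conditionals in this diff.
// hostCompilerSetByUser corresponds to CUSTOM_HOST_COMPILER == 1,
// gccOnPath corresponds to GCC_PATH, i.e. the output of `which gcc`.
bool needsExplicitStdcxx(bool hostCompilerSetByUser,
                         const std::string& hostCompiler,
                         const std::string& gccOnPath) {
  // ifeq ($(CUSTOM_HOST_COMPILER),1)
  if (!hostCompilerSetByUser) return false;
  // ifneq ($(filter /%,$(HOST_COMPILER)),): must be an absolute path
  if (hostCompiler.empty() || hostCompiler[0] != '/') return false;
  // ifneq ($(findstring gcc,$(HOST_COMPILER)),): path must mention gcc
  if (hostCompiler.find("gcc") == std::string::npos) return false;
  // ifneq ($(GCC_PATH),$(HOST_COMPILER)): must differ from the gcc on PATH
  return hostCompiler != gccOnPath;
}
```

For example, a user-supplied `/opt/gcc-12/bin/gcc` (an illustrative path) with `/usr/bin/gcc` on `PATH` returns true, while a bare `g++` or the default compiler returns false.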

@ -10,7 +10,7 @@ CUDA Driver API, Matrix Multiply
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,11 +23,11 @@ x86_64, ppc64le, armv7l, aarch64
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Driver API](http://docs.nvidia.com/cuda/cuda-driver-api/index.html) ### [CUDA Driver API](http://docs.nvidia.com/cuda/cuda-driver-api/index.html)
cuModuleGetFunction, cuMemcpyHtoD, cuModuleLoadData, cuCtxCreate, cuLaunchKernel, cuDeviceGetName, cuMemAlloc, cuOccupancyMaxPotentialBlockSize, cuDeviceTotalMem, cuMemFree, cuMemcpyDtoH, cuCtxDestroy, cuDeviceGetAttribute cuMemcpyDtoH, cuLaunchKernel, cuMemcpyHtoD, cuDeviceGetName, cuDeviceTotalMem, cuDeviceGetAttribute, cuModuleLoadData, cuOccupancyMaxPotentialBlockSize, cuMemAlloc, cuMemFree, cuCtxDestroy, cuModuleGetFunction, cuCtxCreate
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/matrixMulDrv.exe</OutputFile> <OutputFile>$(OutDir)/matrixMulDrv.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -111,6 +111,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/matrixMulDrv.exe</OutputFile> <OutputFile>$(OutDir)/matrixMulDrv.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/matrixMulDrv.exe</OutputFile> <OutputFile>$(OutDir)/matrixMulDrv.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -287,8 +304,8 @@ ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
# Generate PTX code from SM 53 # Generate PTX code from SM 53
GENCODE_FLAGS += -gencode arch=compute_53,code=compute_53 GENCODE_FLAGS += -gencode arch=compute_53,code=compute_53
else else
# Generate PTX code from SM 35 # Generate PTX code from SM 50
GENCODE_FLAGS += -gencode arch=compute_35,code=compute_35 GENCODE_FLAGS += -gencode arch=compute_50,code=compute_50
endif endif
endif endif

@ -10,7 +10,7 @@ CUDA Driver API, CUDA Dynamically Linked Library
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,11 +23,11 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Driver API](http://docs.nvidia.com/cuda/cuda-driver-api/index.html) ### [CUDA Driver API](http://docs.nvidia.com/cuda/cuda-driver-api/index.html)
cuParamSetv, cuMemFree, cuInit, cuModuleGetFunction, cuCtxDestroy, cuCtxCreate, cuDeviceGetName, cuCtxSynchronize, cuParamSeti, cuModuleLoadDataEx, cuDeviceGet, cuFuncSetSharedSize, cuMemAlloc, cuDeviceComputeCapability, cuFuncSetBlockShape, cuMemcpyHtoD, cuParamSetSize, cuLaunchGrid, cuDeviceGetCount, cuLaunchKernel, cuMemcpyDtoH cuMemcpyDtoH, cuDeviceGetName, cuParamSeti, cuModuleLoadDataEx, cuModuleGetFunction, cuLaunchGrid, cuFuncSetSharedSize, cuMemFree, cuParamSetSize, cuParamSetv, cuInit, cuMemcpyHtoD, cuLaunchKernel, cuDeviceGet, cuFuncSetBlockShape, cuCtxDestroy, cuDeviceGetCount, cuDeviceComputeCapability, cuCtxSynchronize, cuMemAlloc, cuCtxCreate
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## Build and Run ## Build and Run

@ -95,6 +95,7 @@ inline int _ConvertSMVer2CoresDRV(int major, int minor) {
{0x80, 64}, {0x80, 64},
{0x86, 128}, {0x86, 128},
{0x87, 128}, {0x87, 128},
{0x90, 128},
{-1, -1}}; {-1, -1}};
int index = 0; int index = 0;
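
Reviewer note: the table extended in this hunk keys each row by SM version encoded as `0xMm` (major in the high nibble, minor in the low nibble) and stores cores per SM. The lookup body is cut off by the hunk, so this is a minimal sketch that reproduces only the four rows visible above. The names `SMToCores`, `kTable`, and `coresPerSM` are hypothetical, and returning -1 for an unknown version is an assumption of the sketch, not necessarily what the helper does.

```cpp
// Hypothetical row type; mirrors the {version, cores} pairs in the hunk.
struct SMToCores {
  int sm;     // 0xMm: M = SM major version, m = SM minor version
  int cores;  // cores per SM
};

// Only the rows visible in the hunk, terminated by the {-1, -1} sentinel.
static const SMToCores kTable[] = {
    {0x80, 64}, {0x86, 128}, {0x87, 128}, {0x90, 128}, {-1, -1}};

// Returns cores per SM, or -1 when the version is not in the table
// (assumed fallback for this sketch).
inline int coresPerSM(int major, int minor) {
  const int key = (major << 4) + minor;  // e.g. 9.0 -> 0x90
  for (int i = 0; kTable[i].sm != -1; ++i) {
    if (kTable[i].sm == key) return kTable[i].cores;
  }
  return -1;
}
```

With the `{0x90, 128}` row added by this commit, `coresPerSM(9, 0)` resolves to 128. Without it, the lookup would fall through to the fallback path.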

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/matrixMulDynlinkJIT.exe</OutputFile> <OutputFile>$(OutDir)/matrixMulDynlinkJIT.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,compute_35;</CodeGeneration> <CodeGeneration>compute_50,compute_50;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -116,6 +116,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/matrixMulDynlinkJIT.exe</OutputFile> <OutputFile>$(OutDir)/matrixMulDynlinkJIT.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,compute_35;</CodeGeneration> <CodeGeneration>compute_50,compute_50;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -112,6 +112,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/matrixMulDynlinkJIT.exe</OutputFile> <OutputFile>$(OutDir)/matrixMulDynlinkJIT.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,compute_35;</CodeGeneration> <CodeGeneration>compute_50,compute_50;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -112,6 +112,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)

@ -10,7 +10,7 @@ CUDA Runtime API, Linear Algebra, Runtime Compilation
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,14 +23,14 @@ x86_64, ppc64le, aarch64
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Driver API](http://docs.nvidia.com/cuda/cuda-driver-api/index.html) ### [CUDA Driver API](http://docs.nvidia.com/cuda/cuda-driver-api/index.html)
cuModuleGetFunction, cuMemAlloc, cuLaunchKernel, cuCtxSynchronize, cuMemFree, cuMemcpyDtoH, cuMemcpyHtoD cuMemcpyDtoH, cuLaunchKernel, cuMemcpyHtoD, cuCtxSynchronize, cuMemAlloc, cuMemFree, cuModuleGetFunction
## Dependencies needed to build/run ## Dependencies needed to build/run
[NVRTC](../../README.md#nvrtc) [NVRTC](../../../README.md#nvrtc)
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
Make sure the dependencies mentioned in [Dependencies]() section above are installed. Make sure the dependencies mentioned in [Dependencies]() section above are installed.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -113,6 +113,6 @@ xcopy /y /e /s "$(CudaToolkitDir)include\cooperative_groups" .\cooperative_group
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -109,6 +109,6 @@ xcopy /y /e /s "$(CudaToolkitDir)include\cooperative_groups" .\cooperative_group
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -109,6 +109,6 @@ xcopy /y /e /s "$(CudaToolkitDir)include\cooperative_groups" .\cooperative_group
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -279,9 +296,9 @@ LIBRARIES :=
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 53 61 70 72 75 80 86 87 SMS ?= 53 61 70 72 75 80 86 87 90
else else
SMS ?= 35 37 50 52 60 61 70 75 80 86 SMS ?= 50 52 60 61 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)
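
The `SMS ?=` defaults in this Makefile and the `<CodeGeneration>` values in the Visual Studio project files describe the same architecture list in two notations. The sketch below, with hypothetical helper names, shows how one space-separated SM list maps to both. The `-gencode arch=compute_XX,code=sm_XX` form is standard nvcc syntax, but the part of the Makefile that expands `SMS` is not shown in this diff, so treat `gencode_flags` as an assumption about it.

```python
def code_generation(sms):
    """Build a vcxproj-style CodeGeneration string from an SMS list."""
    return "".join(f"compute_{sm},sm_{sm};" for sm in sms.split())


def gencode_flags(sms):
    """Build nvcc -gencode flags for the same list (assumed Makefile behavior)."""
    return [f"-gencode arch=compute_{sm},code=sm_{sm}" for sm in sms.split()]


if __name__ == "__main__":
    new_default = "50 52 60 61 70 75 80 86 89 90"  # x86_64 default after this change
    print(code_generation(new_default))
    print(" ".join(gencode_flags(new_default)))
```

For the new x86_64 default, `code_generation` reproduces the updated `<CodeGeneration>` string used by the mergeSort projects: SM 3.5 and 3.7 dropped, SM 8.9 and 9.0 added.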

@ -4,9 +4,9 @@
<name>mergeSort</name> <name>mergeSort</name>
<cuda_api_list> <cuda_api_list>
<toolkit>cudaMalloc</toolkit> <toolkit>cudaMalloc</toolkit>
<toolkit>cudaFree</toolkit>
<toolkit>cudaDeviceSynchronize</toolkit> <toolkit>cudaDeviceSynchronize</toolkit>
<toolkit>cudaMemcpy</toolkit> <toolkit>cudaMemcpy</toolkit>
<toolkit>cudaFree</toolkit>
</cuda_api_list> </cuda_api_list>
<description><![CDATA[This sample implements a merge sort (also known as Batcher's sort), algorithms belonging to the class of sorting networks. While generally subefficient on large sequences compared to algorithms with better asymptotic algorithmic complexity (i.e. merge sort or radix sort), may be the algorithms of choice for sorting batches of short- to mid-sized (key, value) array pairs. Refer to the excellent tutorial by H. W. Lang http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/networks/indexen.htm]]></description> <description><![CDATA[This sample implements a merge sort (also known as Batcher's sort), algorithms belonging to the class of sorting networks. While generally subefficient on large sequences compared to algorithms with better asymptotic algorithmic complexity (i.e. merge sort or radix sort), may be the algorithms of choice for sorting batches of short- to mid-sized (key, value) array pairs. Refer to the excellent tutorial by H. W. Lang http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/networks/indexen.htm]]></description>
<devicecompilation>whole</devicecompilation> <devicecompilation>whole</devicecompilation>
@ -33,8 +33,6 @@
<scope>1:CUDA Advanced Topics</scope> <scope>1:CUDA Advanced Topics</scope>
<scope>1:Data-Parallel Algorithms</scope> <scope>1:Data-Parallel Algorithms</scope>
</scopes> </scopes>
<sm-arch>sm35</sm-arch>
<sm-arch>sm37</sm-arch>
<sm-arch>sm50</sm-arch> <sm-arch>sm50</sm-arch>
<sm-arch>sm52</sm-arch> <sm-arch>sm52</sm-arch>
<sm-arch>sm53</sm-arch> <sm-arch>sm53</sm-arch>
@ -46,6 +44,8 @@
<sm-arch>sm80</sm-arch> <sm-arch>sm80</sm-arch>
<sm-arch>sm86</sm-arch> <sm-arch>sm86</sm-arch>
<sm-arch>sm87</sm-arch> <sm-arch>sm87</sm-arch>
<sm-arch>sm89</sm-arch>
<sm-arch>sm90</sm-arch>
<supported_envs> <supported_envs>
<env> <env>
<arch>x86_64</arch> <arch>x86_64</arch>

@ -10,7 +10,7 @@ Data-Parallel Algorithms
## Supported SM Architectures ## Supported SM Architectures
[SM 3.5 ](https://developer.nvidia.com/cuda-gpus) [SM 3.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 5.0 ](https://developer.nvidia.com/cuda-gpus) [SM 5.2 ](https://developer.nvidia.com/cuda-gpus) [SM 5.3 ](https://developer.nvidia.com/cuda-gpus) [SM 6.0 ](https://developer.nvidia.com/cuda-gpus) [SM 6.1 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,11 +23,11 @@ x86_64, ppc64le, armv7l
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaMalloc, cudaFree, cudaDeviceSynchronize, cudaMemcpy cudaMalloc, cudaDeviceSynchronize, cudaMemcpy, cudaFree
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/mergeSort.exe</OutputFile> <OutputFile>$(OutDir)/mergeSort.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -111,6 +111,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/mergeSort.exe</OutputFile> <OutputFile>$(OutDir)/mergeSort.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/mergeSort.exe</OutputFile> <OutputFile>$(OutDir)/mergeSort.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_35,sm_35;compute_37,sm_37;compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_50,sm_50;compute_52,sm_52;compute_60,sm_60;compute_61,sm_61;compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -113,6 +113,10 @@ ifeq (,$(filter $(TARGET_OS),linux darwin qnx android))
endif endif
# host compiler # host compiler
ifdef HOST_COMPILER
CUSTOM_HOST_COMPILER = 1
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1) ifeq ($(shell expr `xcodebuild -version | grep -i xcode | awk '{print $$2}' | cut -d'.' -f1` \>= 5),1)
HOST_COMPILER ?= clang++ HOST_COMPILER ?= clang++
@ -165,6 +169,19 @@ CCFLAGS :=
LDFLAGS := LDFLAGS :=
# build flags # build flags
# Link flag for customized HOST_COMPILER with gcc realpath
GCC_PATH := $(shell which gcc)
ifeq ($(CUSTOM_HOST_COMPILER),1)
ifneq ($(filter /%,$(HOST_COMPILER)),)
ifneq ($(findstring gcc,$(HOST_COMPILER)),)
ifneq ($(GCC_PATH),$(HOST_COMPILER))
LDFLAGS += -lstdc++
endif
endif
endif
endif
ifeq ($(TARGET_OS),darwin) ifeq ($(TARGET_OS),darwin)
LDFLAGS += -rpath $(CUDA_PATH)/lib LDFLAGS += -rpath $(CUDA_PATH)/lib
CCFLAGS += -arch $(HOST_ARCH) CCFLAGS += -arch $(HOST_ARCH)
@ -305,20 +322,23 @@ ifeq ($(TARGET_OS),linux)
#$(warning $(GCCVERSION)) #$(warning $(GCCVERSION))
IS_MIN_VERSION := $(shell expr `echo $(GCCVERSION)` \>= 51000) IS_MIN_VERSION := $(shell expr `echo $(GCCVERSION)` \>= 51000)
ifneq ($(CUSTOM_HOST_COMPILER), 1)
ifeq ($(IS_MIN_VERSION), 1) ifeq ($(IS_MIN_VERSION), 1)
$(info >>> GCC Version is greater or equal to 5.1.0 <<<) $(info >>> GCC Version is greater or equal to 5.1.0 <<<)
else else
$(info >>> Waiving build. Minimum GCC version required is 5.1.0<<<) $(info >>> Waiving build. Minimum GCC version required is 5.1.0<<<)
SAMPLE_ENABLED := 0 SAMPLE_ENABLED := 0
endif endif
else
$(warning >>> Custom HOST_COMPILER set; skipping GCC version check. This may lead to unintended behavior. Please note the minimum equivalent GCC version is 5.1.0 <<<)
endif
endif endif
# Gencode arguments # Gencode arguments
ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa)) ifeq ($(TARGET_ARCH),$(filter $(TARGET_ARCH),armv7l aarch64 sbsa))
SMS ?= 70 72 75 80 86 87 SMS ?= 70 72 75 80 86 87 90
else else
SMS ?= 70 75 80 86 SMS ?= 70 75 80 86 89 90
endif endif
ifeq ($(SMS),) ifeq ($(SMS),)

@ -6,17 +6,17 @@
<flag>--std=c++11</flag> <flag>--std=c++11</flag>
</cflags> </cflags>
<cuda_api_list> <cuda_api_list>
<toolkit>cudaFree</toolkit>
<toolkit>cudaMallocHost</toolkit>
<toolkit>cudaOccupancyMaxActiveBlocksPerMultiprocessor</toolkit>
<toolkit>cudaOccupancyMaxPotentialBlockSize</toolkit>
<toolkit>cudaDeviceGetAttribute</toolkit>
<toolkit>cudaFreeHost</toolkit>
<toolkit>cudaMalloc</toolkit>
<toolkit>cudaStreamCreateWithFlags</toolkit> <toolkit>cudaStreamCreateWithFlags</toolkit>
<toolkit>cudaLaunchCooperativeKernel</toolkit> <toolkit>cudaFree</toolkit>
<toolkit>cudaDeviceGetAttribute</toolkit>
<toolkit>cudaMallocHost</toolkit>
<toolkit>cudaFreeHost</toolkit>
<toolkit>cudaStreamSynchronize</toolkit> <toolkit>cudaStreamSynchronize</toolkit>
<toolkit>cudaLaunchCooperativeKernel</toolkit>
<toolkit>cudaMalloc</toolkit>
<toolkit>cudaOccupancyMaxActiveBlocksPerMultiprocessor</toolkit>
<toolkit>cudaMemcpyAsync</toolkit> <toolkit>cudaMemcpyAsync</toolkit>
<toolkit>cudaOccupancyMaxPotentialBlockSize</toolkit>
</cuda_api_list> </cuda_api_list>
<description><![CDATA[A simple demonstration of arrive wait barriers.]]></description> <description><![CDATA[A simple demonstration of arrive wait barriers.]]></description>
<devicecompilation>whole</devicecompilation> <devicecompilation>whole</devicecompilation>
@ -53,6 +53,8 @@
<sm-arch>sm80</sm-arch> <sm-arch>sm80</sm-arch>
<sm-arch>sm86</sm-arch> <sm-arch>sm86</sm-arch>
<sm-arch>sm87</sm-arch> <sm-arch>sm87</sm-arch>
<sm-arch>sm89</sm-arch>
<sm-arch>sm90</sm-arch>
<supported_envs> <supported_envs>
<env> <env>
<arch>x86_64</arch> <arch>x86_64</arch>

@ -10,7 +10,7 @@ Arrive Wait Barrier
## Supported SM Architectures ## Supported SM Architectures
[SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 7.0 ](https://developer.nvidia.com/cuda-gpus) [SM 7.2 ](https://developer.nvidia.com/cuda-gpus) [SM 7.5 ](https://developer.nvidia.com/cuda-gpus) [SM 8.0 ](https://developer.nvidia.com/cuda-gpus) [SM 8.6 ](https://developer.nvidia.com/cuda-gpus) [SM 8.7 ](https://developer.nvidia.com/cuda-gpus) [SM 8.9 ](https://developer.nvidia.com/cuda-gpus) [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)
## Supported OSes ## Supported OSes
@ -23,14 +23,14 @@ x86_64, ppc64le, armv7l, aarch64
## CUDA APIs involved ## CUDA APIs involved
### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaFree, cudaMallocHost, cudaOccupancyMaxActiveBlocksPerMultiprocessor, cudaOccupancyMaxPotentialBlockSize, cudaDeviceGetAttribute, cudaFreeHost, cudaMalloc, cudaStreamCreateWithFlags, cudaLaunchCooperativeKernel, cudaStreamSynchronize, cudaMemcpyAsync cudaStreamCreateWithFlags, cudaFree, cudaDeviceGetAttribute, cudaMallocHost, cudaFreeHost, cudaStreamSynchronize, cudaLaunchCooperativeKernel, cudaMalloc, cudaOccupancyMaxActiveBlocksPerMultiprocessor, cudaMemcpyAsync, cudaOccupancyMaxPotentialBlockSize
## Dependencies needed to build/run ## Dependencies needed to build/run
[CPP11](../../README.md#cpp11), [MBCG](../../README.md#mbcg) [CPP11](../../../README.md#cpp11), [MBCG](../../../README.md#mbcg)
## Prerequisites ## Prerequisites
Download and install the [CUDA Toolkit 11.6](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
Make sure the dependencies mentioned in [Dependencies]() section above are installed. Make sure the dependencies mentioned in [Dependencies]() section above are installed.
## Build and Run ## Build and Run

@ -38,7 +38,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -67,7 +67,7 @@
<OutputFile>$(OutDir)/simpleAWBarrier.exe</OutputFile> <OutputFile>$(OutDir)/simpleAWBarrier.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -107,6 +107,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

@ -34,7 +34,7 @@
</PropertyGroup> </PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings"> <ImportGroup Label="ExtensionSettings">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.props" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.props" />
</ImportGroup> </ImportGroup>
<ImportGroup Label="PropertySheets"> <ImportGroup Label="PropertySheets">
<Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" /> <Import Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" />
@ -63,7 +63,7 @@
<OutputFile>$(OutDir)/simpleAWBarrier.exe</OutputFile> <OutputFile>$(OutDir)/simpleAWBarrier.exe</OutputFile>
</Link> </Link>
<CudaCompile> <CudaCompile>
<CodeGeneration>compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;</CodeGeneration> <CodeGeneration>compute_70,sm_70;compute_75,sm_75;compute_80,sm_80;compute_86,sm_86;compute_89,sm_89;compute_90,sm_90;</CodeGeneration>
<AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions> <AdditionalOptions>-Xcompiler "/wd 4819" --threads 0 </AdditionalOptions>
<Include>./;../../../Common</Include> <Include>./;../../../Common</Include>
<Defines>WIN32</Defines> <Defines>WIN32</Defines>
@ -103,6 +103,6 @@
</ItemGroup> </ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" /> <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets"> <ImportGroup Label="ExtensionTargets">
<Import Project="$(CUDAPropsPath)\CUDA 11.6.targets" /> <Import Project="$(CUDAPropsPath)\CUDA 12.5.targets" />
</ImportGroup> </ImportGroup>
</Project> </Project>

Some files were not shown because too many files have changed in this diff.