
# conjugateGradientMultiDeviceCG - conjugateGradient using MultiDevice Cooperative Groups

## Description

This sample implements a conjugate gradient solver on multiple GPUs using Multi-Device Cooperative Groups. It also uses Unified Memory, optimized with prefetching and usage hints.
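
For reference, the sketch below shows the iteration a conjugate gradient solver performs on a symmetric positive-definite system `A x = b`. It is a serial CPU sketch in C++, not the sample's multi-GPU kernel. It assumes CSR storage, uses the textbook unpreconditioned form, and works in double precision for simplicity. `CsrMatrix`, `makeTridiag`, and `conjugateGradient` are hypothetical names chosen for illustration. Each iteration performs one sparse matrix-vector product, two dot products, and three vector updates. These are the operations a multi-GPU version has to split across devices.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Compressed Sparse Row (CSR) storage: row i's nonzeros are val[rowPtr[i] .. rowPtr[i+1]).
struct CsrMatrix {
    int n;
    std::vector<int> rowPtr;   // n + 1 entries
    std::vector<int> colIdx;   // column index of each nonzero
    std::vector<double> val;   // value of each nonzero
};

// y = A * x, the sparse matrix-vector product performed once per CG iteration.
void spmv(const CsrMatrix& A, const std::vector<double>& x, std::vector<double>& y) {
    for (int row = 0; row < A.n; ++row) {
        double sum = 0.0;
        for (int k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k)
            sum += A.val[k] * x[A.colIdx[k]];
        y[row] = sum;
    }
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Hypothetical test matrix: symmetric tridiagonal with 4 on the diagonal and -1 off it.
CsrMatrix makeTridiag(int n) {
    CsrMatrix A{n, {0}, {}, {}};
    for (int i = 0; i < n; ++i) {
        if (i > 0)     { A.colIdx.push_back(i - 1); A.val.push_back(-1.0); }
        A.colIdx.push_back(i); A.val.push_back(4.0);
        if (i < n - 1) { A.colIdx.push_back(i + 1); A.val.push_back(-1.0); }
        A.rowPtr.push_back(static_cast<int>(A.colIdx.size()));
    }
    return A;
}

// Unpreconditioned CG starting from x = 0; returns the number of iterations run.
int conjugateGradient(const CsrMatrix& A, const std::vector<double>& b,
                      std::vector<double>& x, double tol, int maxIter) {
    const int n = A.n;
    x.assign(n, 0.0);
    std::vector<double> r = b, p = b, Ap(n);  // r = b - A*0 = b, first direction p = r
    double rr = dot(r, r);
    int k = 0;
    while (k < maxIter && std::sqrt(rr) > tol) {
        spmv(A, p, Ap);
        const double alpha = rr / dot(p, Ap);       // step length along p
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        const double rrNew = dot(r, r);
        const double beta = rrNew / rr;             // keeps directions A-conjugate
        for (int i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
        ++k;
    }
    return k;
}
```

The eigenvalues of this test matrix lie between 2 and 6, so it is well conditioned and CG converges in a small number of iterations. Calling `conjugateGradient(makeTridiag(64), b, x, 1e-8, 1000)` with `b` filled with ones therefore leaves a residual `b - A x` close to zero.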

## Key Concepts

Unified Memory, Linear Algebra, Cooperative Groups, MultiDevice Cooperative Groups, CUBLAS Library, CUSPARSE Library
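
With multiple devices, each GPU owns a slice of the rows. Reductions such as the dot products in every CG iteration need contributions from all of them. The C++ sketch below simulates that pattern on the CPU. A loop over hypothetical devices computes one partial sum per row slice, and the partials are combined only after every slice is finished. On GPUs, that combining step is where a cross-device synchronization is required. The even row split and the names `partitionRows` and `multiDeviceDot` are assumptions for illustration and do not come from the sample.

```cpp
#include <vector>

// Half-open row range [begin, end) owned by one device.
struct RowRange { int begin; int end; };

// Hypothetical split of numRows rows into numDevices contiguous slices
// whose sizes differ by at most one row.
std::vector<RowRange> partitionRows(int numRows, int numDevices) {
    std::vector<RowRange> ranges;
    const int base = numRows / numDevices, extra = numRows % numDevices;
    int start = 0;
    for (int d = 0; d < numDevices; ++d) {
        const int len = base + (d < extra ? 1 : 0);
        ranges.push_back({start, start + len});
        start += len;
    }
    return ranges;
}

// Dot product computed the way a multi-device solver must: per-device partial sums
// first, then a combine step that may only run once every device has finished.
double multiDeviceDot(const std::vector<double>& a, const std::vector<double>& b, int numDevices) {
    const std::vector<RowRange> ranges = partitionRows(static_cast<int>(a.size()), numDevices);
    // Phase 1: each simulated device reduces only its own rows
    // (on real GPUs these loops run concurrently, one per device).
    std::vector<double> partial(numDevices, 0.0);
    for (int d = 0; d < numDevices; ++d)
        for (int i = ranges[d].begin; i < ranges[d].end; ++i)
            partial[d] += a[i] * b[i];
    // Phase 2: after all devices have synchronized, combine the partial results.
    double total = 0.0;
    for (double s : partial) total += s;
    return total;
}
```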

## Supported SM Architectures

[SM 6.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 6.1 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.2 ](https://developer.nvidia.com/cuda-gpus)  [SM 7.5 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.0 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.6 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.7 ](https://developer.nvidia.com/cuda-gpus)  [SM 8.9 ](https://developer.nvidia.com/cuda-gpus)  [SM 9.0 ](https://developer.nvidia.com/cuda-gpus)

## Supported OSes

Linux, Windows

## Supported CPU Architecture

x86_64, ppc64le, aarch64

## CUDA APIs involved

### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html)
cudaHostAlloc, cudaMemPrefetchAsync, cudaFree, cudaLaunchCooperativeKernel, cudaMallocManaged, cudaSetDevice, cudaGetDeviceCount, cudaGetDeviceProperties, cudaFreeHost, cudaMemset, cudaStreamCreate, cudaStreamSynchronize, cudaDeviceEnablePeerAccess, cudaMemAdvise, cudaOccupancyMaxActiveBlocksPerMultiprocessor, cudaDeviceCanAccessPeer
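
Several of these calls are linked by simple arithmetic. `cudaLaunchCooperativeKernel` requires every block of the grid to be resident at the same time, so the grid is bounded by the per-SM occupancy from `cudaOccupancyMaxActiveBlocksPerMultiprocessor` times the SM count in `cudaDeviceProp::multiProcessorCount`. `cudaMemPrefetchAsync` and `cudaMemAdvise` act on byte ranges, so a device's row slice has to be converted into an offset and a length. The plain C++ sketch below makes no CUDA calls: the values the runtime would report are passed in as arguments. `maxCooperativeBlocks`, `gridStrideRows`, and `rowSliceBytes` are hypothetical helpers, and the grid-stride loop is an assumed work distribution, not necessarily the one the sample uses.

```cpp
#include <cstddef>
#include <vector>

// Upper bound on the grid for a cooperative launch on one device: all blocks must be
// co-resident, so the grid cannot exceed (max active blocks per SM) x (SM count).
int maxCooperativeBlocks(int maxActiveBlocksPerSm, int multiProcessorCount) {
    return maxActiveBlocksPerSm * multiProcessorCount;
}

// Rows a single thread touches in a grid-stride loop over this device's rows:
//   for (row = threadRank; row < numRows; row += totalThreads) { ... }
std::vector<int> gridStrideRows(int threadRank, int totalThreads, int numRows) {
    std::vector<int> rows;
    for (int row = threadRank; row < numRows; row += totalThreads) rows.push_back(row);
    return rows;
}

// Byte window covering rows [beginRow, endRow) of an array whose elements are
// elemSize bytes: the kind of (base + offset, bytes) range one might pass to
// cudaMemPrefetchAsync or cudaMemAdvise for the device that owns those rows.
struct ByteRange { std::size_t offset; std::size_t bytes; };

ByteRange rowSliceBytes(int beginRow, int endRow, std::size_t elemSize) {
    return { static_cast<std::size_t>(beginRow) * elemSize,
             static_cast<std::size_t>(endRow - beginRow) * elemSize };
}
```

For example, a device that fits 2 active blocks per SM across 80 SMs can launch at most 160 blocks cooperatively. Requesting more makes the cooperative launch fail instead of running with fewer blocks.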

## Dependencies needed to build/run
[UVM](../../../README.md#uvm), [MDCG](../../../README.md#mdcg), [CPP11](../../../README.md#cpp11)

## Prerequisites

Download and install the [CUDA Toolkit 12.3](https://developer.nvidia.com/cuda-downloads) for your corresponding platform.
Make sure the dependencies listed in the [Dependencies](#dependencies-needed-to-buildrun) section above are installed.

## Build and Run

### Windows
The Windows samples are built using the Visual Studio IDE. Solution files (.sln) are provided for each supported version of Visual Studio, using the format:
```
*_vs<version>.sln - for Visual Studio <version>
```
Each individual sample has its own set of solution files in its directory.

To build/examine all the samples at once, the complete solution files should be used. To build/examine a single sample, the individual sample solution files should be used.
> **Note:** Some samples require that the Microsoft DirectX SDK (June 2010 or newer) be installed and that the VC++ directory paths are properly set up (**Tools > Options...**). Check the DirectX Dependencies section for details.

### Linux
The Linux samples are built using makefiles. To use the makefiles, change the current directory to the sample directory you wish to build, and run make:
```
$ cd <sample_dir>
$ make
```
The sample makefiles support the following options:
*   **TARGET_ARCH=`<arch>`** - cross-compile targeting a specific architecture. Allowed architectures are x86_64, ppc64le, and aarch64.
    By default, TARGET_ARCH is set to HOST_ARCH. On an x86_64 machine, not setting TARGET_ARCH is equivalent to setting TARGET_ARCH=x86_64.<br/>
    `$ make TARGET_ARCH=x86_64` <br/> `$ make TARGET_ARCH=ppc64le` <br/> `$ make TARGET_ARCH=aarch64` <br/>
    See [here](http://docs.nvidia.com/cuda/cuda-samples/index.html#cross-samples) for more details.
*   **dbg=1** - build with debug symbols
    ```
    $ make dbg=1
    ```
*   **SMS="A B ..."** - override the SM architectures for which the sample will be built, where `"A B ..."` is a space-delimited list of SM architectures. For example, to generate SASS for SM 50 and SM 60, use `SMS="50 60"`.
    ```
    $ make SMS="50 60"
    ```

*   **HOST_COMPILER=`<host_compiler>`** - override the default g++ host compiler. See the [Linux Installation Guide](http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements) for a list of supported host compilers.
    ```
    $ make HOST_COMPILER=g++
    ```

## References (for more details)