# shfl_scan - CUDA Parallel Prefix Sum with Shuffle Intrinsics (SHFL_Scan) ## Description This example demonstrates how to use the shuffle intrinsic __shfl_up_sync to perform a scan operation across a thread block. ## Key Concepts Data-Parallel Algorithms, Performance Strategies ## Supported SM Architectures ## Supported OSes Linux, Windows ## Supported CPU Architecture x86_64, ppc64le, armv7l, aarch64 ## CUDA APIs involved ### [CUDA Runtime API](http://docs.nvidia.com/cuda/cuda-runtime-api/index.html) cudaMemcpy, cudaFree, cudaMallocHost, cudaEventSynchronize, cudaEventRecord, cudaFreeHost, cudaGetDevice, cudaMemset, cudaMalloc, cudaEventElapsedTime, cudaGetDeviceProperties, cudaEventCreate ## Dependencies needed to build/run [CPP11](../../../README.md#cpp11) ## Prerequisites Download and install the [CUDA Toolkit 12.5](https://developer.nvidia.com/cuda-downloads) for your corresponding platform. Make sure the dependencies mentioned in [Dependencies]() section above are installed. ## References (for more details)