
Prefix Sum (Scan)

Demonstrates parallel prefix sum (scan) algorithms using cuda.compute with cuda.core stream management.

Overview

  • Inclusive scan: output[i] = [init_value] + input[0] + input[1] + ... + input[i], where the bracketed init_value is optional
  • Exclusive scan: output[i] = init_value + input[0] + input[1] + ... + input[i-1], so output[0] = init_value
  • Uses cuda.compute APIs for optimized CUB-based implementations
  • Uses cuda.core APIs for device and stream management
  • Demonstrates CuPy integration via ExternalStream
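The two definitions above can be checked on the host with a small NumPy reference. This is a minimal CPU sketch for validating results, not the cuda.compute implementation; the function names `inclusive_scan_ref` and `exclusive_scan_ref` are hypothetical and do not appear in the sample.

```python
import numpy as np


def inclusive_scan_ref(values, init_value=None):
    """CPU reference: output[i] = [init_value] + input[0] + ... + input[i]."""
    values = np.asarray(values)
    out = np.cumsum(values, dtype=values.dtype)
    if init_value is not None:
        out = out + init_value
    return out


def exclusive_scan_ref(values, init_value=0):
    """CPU reference: output[i] = init_value + input[0] + ... + input[i-1]."""
    values = np.asarray(values)
    out = np.empty_like(values)
    if values.size == 0:
        return out
    out[0] = init_value
    # Element i sees the running sum of everything strictly before it.
    out[1:] = init_value + np.cumsum(values[:-1], dtype=values.dtype)
    return out


data = np.array([3, 1, 4, 1, 5], dtype=np.int64)
print(inclusive_scan_ref(data))       # [ 3  4  8  9 14]
print(exclusive_scan_ref(data))       # [0 3 4 8 9]
print(exclusive_scan_ref(data, 10))   # [10 13 14 18 19]
```

Copying a GPU result back to the host and comparing it against one of these functions with `np.array_equal` is one way to confirm a scan ran as expected.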

Requirements

Hardware

  • NVIDIA GPU with CUDA support

Software

  • CUDA Toolkit 13.0+
  • Python 3.10+
  • cuda-python (>=13.0.0)
  • cuda-core (>=0.6.0)
  • cuda-cccl (>=1.0.0)
  • cupy-cuda13x (>=13.0.0)
  • numpy (>=2.3.2)

Usage

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r requirements.txt

# Run sample
python prefixSum.py

Key Concepts

Scan Type   Formula                                    First Element
Inclusive   output[i] = [init_value] + Σ input[0..i]   [init_value] + input[0]
Exclusive   output[i] = init_value + Σ input[0..i-1]   init_value (typically 0, the identity for sum)
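For a sum with the same init_value, the two formulas differ only by the current element, so either result can be derived from the other. The following host-side NumPy sketch illustrates that relationship; the variable names and input values are illustrative and not taken from the sample.

```python
import numpy as np

data = np.array([2, 7, 1, 8], dtype=np.int64)
init_value = 5

# Inclusive: init_value + sum of input[0..i]
inclusive = init_value + np.cumsum(data)

# Exclusive: drop the current element from each inclusive total.
exclusive = inclusive - data

print(inclusive)  # [ 7 14 15 23]
print(exclusive)  # [ 5  7 14 15]

# Equivalent view: the exclusive result is the inclusive result shifted
# right by one position, with init_value in the first slot.
assert exclusive[0] == init_value
assert np.array_equal(exclusive[1:], inclusive[:-1])
```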

Stream Management

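This sample creates a single stream with cuda.core and shares it with CuPy and cuda.compute, so allocations, copies, and the scan are all ordered on the same stream: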

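# Assumes `import cupy as cp`, a cuda.core device object named `device`,
# host input in `data`, and inclusive_scan / OpKind from cuda.compute.

# Create stream with cuda.core
stream = device.create_stream()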

# Wrap for CuPy compatibility (requires int handle)
cp_stream = cp.cuda.ExternalStream(int(stream.handle))

# Use with CuPy operations
with cp_stream:
    d_input = cp.asarray(data)
    d_output = cp.empty_like(d_input)

# Pass to cuda.compute
inclusive_scan(
    d_in=d_input,
    d_out=d_output,
    op=OpKind.PLUS,
    init_value=None,
    num_items=len(d_input),
    stream=stream,
)

Applications

  • Stream compaction
  • Radix sort
  • Histogram computation
  • Polynomial evaluation
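
As an illustration of the first application, stream compaction can be built from an exclusive scan: scanning a 0/1 keep-flag array gives each surviving element its position in the compacted output. The sketch below runs on the host with NumPy to show the idea only; the function name `compact` is hypothetical, and a GPU version would replace the `np.cumsum` step with a device-side scan.

```python
import numpy as np


def compact(values, keep_mask):
    """Keep the elements where keep_mask is true, preserving their order."""
    values = np.asarray(values)
    flags = np.asarray(keep_mask, dtype=np.int64)

    # Exclusive scan of the flags: offsets[i] counts the kept elements
    # before position i, which is the output slot for element i.
    offsets = np.cumsum(flags) - flags

    out = np.empty(int(flags.sum()), dtype=values.dtype)
    keep = flags.astype(bool)
    # Scatter each kept element to its computed slot.
    out[offsets[keep]] = values[keep]
    return out


values = np.array([5, -2, 7, 0, -9, 4])
print(compact(values, values > 0))  # [5 7 4]
```

Because each output slot is computed independently from the scan result, the scatter step has no dependencies between elements, which is what makes this pattern suit parallel execution.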