tileRope

Description

This sample demonstrates a Rotary Position Embedding (RoPE) forward pass using CUDA Tile C++. RoPE injects positional information into the query and key projections of an attention layer by rotating pairs of features in the head dimension by position-dependent angles. This implementation uses the split-half convention: for a token at position s and head dimension D, each pair (q[i], q[i + D/2]) with 0 <= i < D/2 is rotated by theta = s * 10000^(-2i / D), so q[i]' = q[i]*cos(theta) - q[i + D/2]*sin(theta) and q[i + D/2]' = q[i]*sin(theta) + q[i + D/2]*cos(theta).

The cuda::tiles::partition_view type partitions each (batch, position) token's Q and K tensors into 2D tiles over (heads, half_rope_dim). A single block processes all heads of one token in parallel and writes the result back in place. A SIMT kernel initializes the inputs and the cos/sin tables.
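
The split-half rotation described above can be sketched as a plain CPU reference in C++, for example as a baseline to compare GPU output against. This is a minimal sketch, not the sample's actual code: the function name rope_split_half, its signature, and the choice to compute the angles on the fly (rather than read them from precomputed cos/sin tables) are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not taken from the sample): rotate one head
// vector of length D in place using the split-half RoPE convention.
// Assumes x.size() is even; `base` defaults to the 10000 used in the text.
void rope_split_half(std::vector<float>& x, std::size_t pos,
                     float base = 10000.0f) {
    const std::size_t D = x.size();
    const std::size_t half = D / 2;
    for (std::size_t i = 0; i < half; ++i) {
        // theta = s * base^(-2i / D), with s = pos
        const float theta =
            static_cast<float>(pos) *
            std::pow(base, -2.0f * static_cast<float>(i) / static_cast<float>(D));
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        // Read both halves of the pair before overwriting either one.
        const float a = x[i];
        const float b = x[i + half];
        x[i]        = a * c - b * s;
        x[i + half] = a * s + b * c;
    }
}
```

Calling rope_split_half(q, s) once per head vector for both Q and K reproduces the formula above. Because each pair undergoes a pure 2D rotation, the value q[i]^2 + q[i + D/2]^2 is preserved up to rounding, which makes a convenient sanity check alongside an element-wise comparison.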

Expected Output

Success! RoPE matches expected results.

Prerequisites