This is the release of the CUDA 13.3 samples, which include additions for CUDA Tile C++, and updated CCCL and Python samples.
binarySearch (Python)
Description
This sample demonstrates the parallel binary-search algorithms
exposed by cuda.compute (from the cuda-cccl package). Given
a sorted d_data array and a batch of d_values to locate, one
device-wide call returns the insertion index for every value:
cuda.compute.lower_boundwrites, for each value, the lowest index where it could be inserted intod_datawithout breaking the sort order. Equivalent tonumpy.searchsorted(..., side="left").cuda.compute.upper_boundis the analogous upper form, equivalent tonumpy.searchsorted(..., side="right").
The sample runs both algorithms on two curated inputs: one with
distinct elements (where lower_bound and upper_bound agree on
any value not in the data) and one with duplicates (where they
diverge on present values). Results are verified against
numpy.searchsorted.
What You'll Learn
- How to call
cuda.compute.lower_bound/upper_boundwith CuPy arrays - The semantic difference between
lower_boundandupper_bound, especially for inputs containing duplicates - How the output dtype (
np.uintp) is used for indices
Key Libraries
cuda.compute(from thecuda-ccclpackage) - device algorithmscuda.core- device setupcupy- device buffersnumpy- host-side reference vianumpy.searchsorted
Key APIs
From cuda.compute
cuda.compute.lower_bound(d_data, num_items, d_values, num_values, d_out)cuda.compute.upper_bound(d_data, num_items, d_values, num_values, d_out)
From cuda_samples_utils
print_gpu_info()- print device name and compute capability
Requirements
Hardware
- NVIDIA GPU with Compute Capability 7.0 or higher
- Minimum GPU memory: 512 MB
Software
- CUDA Toolkit 13.0 or newer
- Python 3.10 or newer
cuda-cccl(>=1.0.0)cuda-core(>=1.0.0)cupy-cuda13x(>=14.0.0)
If the CUDA toolkit is not on your PATH, set CUDA_HOME so that
cuda.compute's JIT path can locate its dependencies:
export CUDA_HOME=/usr/local/cuda
Installation
Install the required packages from requirements.txt:
cd /path/to/cuda-samples/python/2_CoreConcepts/binarySearch
pip install -r requirements.txt
The requirements.txt installs:
cuda-cccl(>=1.0.0) - ships thecuda.computemodulecuda-core(>=1.0.0)cupy-cuda13x(>=14.0.0)numpy(>=1.24.0)
How to Run
Basic usage
cd cuda-samples/python/2_CoreConcepts/binarySearch
python binarySearch.py
With custom parameters
python binarySearch.py --device 1
Expected Output
Device: <Your GPU Name>
Compute Capability: <X.Y>
Case 1: distinct data, mixed queries
data = [1, 3, 5, 7, 9]
values = [0, 3, 4, 10]
lower_bound: got [0, 1, 2, 5] expected [0, 1, 2, 5] OK
upper_bound: got [0, 2, 2, 5] expected [0, 2, 2, 5] OK
Case 2: duplicates in data
data = [1, 3, 3, 5, 7, 9]
values = [3, 3, 5, 8]
lower_bound: got [1, 1, 3, 5] expected [1, 1, 3, 5] OK
upper_bound: got [3, 3, 4, 5] expected [3, 3, 4, 5] OK
Done
Note: Device name and compute capability will vary based on your GPU.
Files
binarySearch.py- Python implementationREADME.md- This filerequirements.txt- Sample dependencies../../Utilities/cuda_samples_utils.py- Common utilities (imported by this sample)