4 Commits

Author SHA1 Message Date
Francesco Rizzi
b530f1cf42
Fix bug in 6_Performance/transpose: copy sharedmem kernel (#363)
Update kernel loop bounds handling, main loop data copy to avoid incorrect reuse of output results.

---------

Authored-by: Francesco Rizzi <francesco.rizzi@ng-analytics.com>
2025-05-05 08:43:23 -07:00
Rob Armstrong
ceab6e8bcc Apply consistent code formatting across the repo. Add clang-format and pre-commit hooks. 2025-03-27 10:30:07 -07:00
Jonathan Bentz
efb46383e0
Transpose: Change TILE_DIM to 32 to fix bank conflicts
Fixes #175
2025-02-20 15:46:44 -08:00
Rutwik Choughule
2e41896e1b add and update samples for CUDA 11.6 2022-01-13 11:35:24 +05:30