Rob Armstrong
320c7e6392
Resolve merge between 13.0 and 13.1 branches
2025-09-05 09:39:09 -07:00
shawnz
5987a9e9fa
Update transpose for code format check
2025-05-19 17:38:42 +08:00
Francesco Rizzi
b530f1cf42
Fix bug in 6_Performance/transpose: copy sharedmem kernel (#363)
Update kernel loop bounds handling and main loop data copy to avoid incorrect reuse of output results.
---------
Authored-by: Francesco Rizzi <francesco.rizzi@ng-analytics.com>
2025-05-05 08:43:23 -07:00
Rob Armstrong
ceab6e8bcc
Apply consistent code formatting across the repo. Add clang-format and pre-commit hooks.
2025-03-27 10:30:07 -07:00
Jonathan Bentz
efb46383e0
Transpose: Change TILE_DIM to 32 to fix bank conflicts
Fixes #175
2025-02-20 15:46:44 -08:00
Rutwik Choughule
2e41896e1b
add and update samples for CUDA 11.6
2022-01-13 11:35:24 +05:30