In our work “Performance of OpenMP Loop Transformations for the Acoustic Wave Stencil on GPUs”, we evaluate the performance of unrolling and tiling, two loop transformations introduced in OpenMP 5.1 and implemented early, in Clang 13, for GPUs. Experiments on a common seismic computational kernel demonstrate performance gains on three GPU architectures.
Authors: Jaime Freire de Souza, LSF Machado, Edson S. Gomi, Claude Tadonki, Simon McIntosh-Smith, and Hermes Senger
More details at: https://sc22.supercomputing.
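For readers unfamiliar with the two transformations named above, the following is a minimal host-side sketch of how the OpenMP 5.1 `tile` and `unroll` directives are written in C. It is not the kernel from our experiments: the grid size `N`, the 5-point Laplacian, the function names, the tile sizes, and the unroll factor are all assumptions chosen for illustration, and GPU offloading is left out. The directives are guarded so the code still compiles, untransformed, with compilers that do not report OpenMP 5.1.

```c
#define N 16  /* hypothetical grid edge, chosen only for illustration */

/* Enable the directives only when the compiler advertises OpenMP 5.1
   (_OPENMP >= 202011); otherwise the loops are compiled untransformed. */
#if defined(_OPENMP) && _OPENMP >= 202011
#define HAVE_OMP_51 1
#else
#define HAVE_OMP_51 0
#endif

/* Reference 5-point Laplacian over the grid interior, no transformation. */
void laplacian_ref(const float *u, float *lap) {
    for (int i = 1; i < N - 1; ++i)
        for (int j = 1; j < N - 1; ++j)
            lap[i * N + j] = u[(i - 1) * N + j] + u[(i + 1) * N + j]
                           + u[i * N + j - 1] + u[i * N + j + 1]
                           - 4.0f * u[i * N + j];
}

/* Same loop nest; `tile` asks the compiler to split it into 4x4 blocks. */
void laplacian_tiled(const float *u, float *lap) {
#if HAVE_OMP_51
    #pragma omp tile sizes(4, 4)
#endif
    for (int i = 1; i < N - 1; ++i)
        for (int j = 1; j < N - 1; ++j)
            lap[i * N + j] = u[(i - 1) * N + j] + u[(i + 1) * N + j]
                           + u[i * N + j - 1] + u[i * N + j + 1]
                           - 4.0f * u[i * N + j];
}

/* Second-order time update; `unroll partial(4)` asks for an unroll factor of 4. */
void time_step_unrolled(const float *u, const float *u_prev,
                        const float *lap, float c, float *u_next) {
#if HAVE_OMP_51
    #pragma omp unroll partial(4)
#endif
    for (int k = 0; k < N * N; ++k)
        u_next[k] = 2.0f * u[k] - u_prev[k] + c * lap[k];
}

/* Returns 1 if the transformed loops produce the same values as the
   untransformed reference, 0 otherwise. */
int transformed_matches_reference(void) {
    static float u[N * N], u_prev[N * N], u_next[N * N];
    static float lap_a[N * N], lap_b[N * N];

    /* Small integer-valued inputs keep all arithmetic exact in float. */
    for (int k = 0; k < N * N; ++k)
        u[k] = (float)((k * 7) % 13);

    laplacian_ref(u, lap_a);
    laplacian_tiled(u, lap_b);
    for (int k = 0; k < N * N; ++k)
        if (lap_a[k] != lap_b[k]) return 0;

    time_step_unrolled(u, u_prev, lap_b, 0.5f, u_next);
    for (int k = 0; k < N * N; ++k)
        if (u_next[k] != 2.0f * u[k] - u_prev[k] + 0.5f * lap_a[k]) return 0;

    return 1;
}
```

Both directives change only the iteration order or loop structure, not the values computed, which is why the sketch can compare the transformed loops against the plain reference. Which tile sizes and unroll factors pay off depends on the target architecture.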