Enable GPU execution of atm_bdy_adjust_scalars_work via OpenACC #1266

Open: wants to merge 4 commits into develop

Conversation

gdicker1
Collaborator

This PR makes small code modifications and adds OpenACC directives so the atm_bdy_adjust_scalars_work routine can execute on GPU(s).

Timing information for the OpenACC data transfers in this routine is captured in the log file by a new timer: atm_bdy_adjust_scalars [ACC_data_xfer].

Invariant fields used in this routine are also copied to the device within mpas_atm_dynamics_init and are deleted in mpas_atm_dynamics_finalize.
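
A minimal sketch of the init/finalize pattern, assuming unstructured `!$acc enter data` / `!$acc exit data` directives are used. The module, routine, and field names (`bdy_invariants_sketch`, `mesh_scaling`, `bdy_mask`) are hypothetical stand-ins, not the actual code in mpas_atm_dynamics_init or mpas_atm_dynamics_finalize. Without an OpenACC compiler flag the directives are ordinary comments, so the sketch also builds as plain Fortran.

```fortran
module bdy_invariants_sketch
   implicit none
   ! Hypothetical stand-ins for invariant mesh fields such as
   ! meshScalingRegionalCell and bdyMaskCell.
   real, allocatable :: mesh_scaling(:)
   integer, allocatable :: bdy_mask(:)
contains
   subroutine dynamics_init_sketch(nCells)
      integer, intent(in) :: nCells
      allocate(mesh_scaling(nCells), bdy_mask(nCells))
      mesh_scaling = 1.0
      bdy_mask = 0
      ! Copy the invariant fields to the device once at startup; the device
      ! copies stay resident until the matching exit data at shutdown.
      !$acc enter data copyin(mesh_scaling, bdy_mask)
   end subroutine dynamics_init_sketch

   subroutine dynamics_finalize_sketch()
      ! Release the device copies; nothing is copied back because the
      ! fields never change while the model runs.
      !$acc exit data delete(mesh_scaling, bdy_mask)
      deallocate(mesh_scaling, bdy_mask)
   end subroutine dynamics_finalize_sketch
end module bdy_invariants_sketch
```

Because the fields are resident for the whole run, per-call data regions in the dynamics routines only need to handle the fields that actually change.
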

Small whitespace changes. Also change an implicit (array-syntax) loop to an explicit loop so that it parallelizes better. Implicit loops can be ported with 'acc kernels', but we prefer the more prescriptive 'acc parallel ...' constructs.

This commit adds an initial port of this routine using OpenACC. More changes are needed for performance and data management.
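
A sketch of what replacing the array-syntax copy with explicit loops allows, assuming a gang loop over cells and a collapsed vector loop over levels and scalars. The subroutine and its argument list are hypothetical; the variable names follow atm_bdy_adjust_scalars_work, and the data clauses on the loop reflect an "initial port" before any separate data management.

```fortran
module relax_zone_copy_sketch
   implicit none
contains
   ! Hypothetical routine: copies scalars for relaxation-zone cells only,
   ! using explicit loops in place of the array-syntax assignment
   !   scalars_tmp(1:num_scalars,1:nVertLevels,iCell) = scalars_new(1:num_scalars,1:nVertLevels,iCell)
   subroutine copy_relax_zone(num_scalars, nVertLevels, nCells, nRelaxZone, &
                              bdyMaskCell, scalars_new, scalars_tmp)
      integer, intent(in) :: num_scalars, nVertLevels, nCells, nRelaxZone
      integer, intent(in) :: bdyMaskCell(nCells)
      real, intent(in) :: scalars_new(num_scalars, nVertLevels, nCells)
      real, intent(inout) :: scalars_tmp(num_scalars, nVertLevels, nCells)
      integer :: iCell, k, iScalar

      ! Cells are spread across gangs; scalars_tmp uses copy (not copyout)
      ! because cells outside the relaxation zone must keep their values.
      !$acc parallel loop gang copyin(bdyMaskCell, scalars_new) copy(scalars_tmp)
      do iCell = 1, nCells
         if ((bdyMaskCell(iCell) > 1) .and. (bdyMaskCell(iCell) <= nRelaxZone)) then
            ! The two explicit inner loops are collapsed into one vector loop;
            ! iScalar is innermost to match Fortran's column-major layout.
            !$acc loop vector collapse(2)
            do k = 1, nVertLevels
               do iScalar = 1, num_scalars
                  scalars_tmp(iScalar, k, iCell) = scalars_new(iScalar, k, iCell)
               end do
            end do
         end if
      end do
   end subroutine copy_relax_zone
end module relax_zone_copy_sketch
```

With explicit loops, each level of parallelism gets its own directive, which is the kind of control 'acc kernels' leaves to the compiler.
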
Ensure that the fields which don't change while the model is running are present on the device from model startup to model shutdown.

Ensure that the other, non-invariant fields are available for this routine. Variables that are overwritten during this routine are only created on the device, while the others are copied in. Any variables overwritten by this routine are copied out at the end.

Timing for these transfers is reported in the output log file under the new timer: 'atm_bdy_adjust_scalars [ACC_data_xfer]'.

Also add default(present) to the parallel directives so that the explicit data management is verified (any array missing from the device causes a runtime error) and the compiler cannot insert implicit data movements.
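
The per-call data management, transfer timing, and default(present) described in these commits might look roughly like the sketch below; it is written under stated assumptions, not taken from the PR. The routine name, the simple relaxation formula (standing in for the real Laplacian filter and Rayleigh damping), and the `xfer_ticks` accumulator are hypothetical. In MPAS the transfers would presumably be bracketed by `mpas_timer_start`/`mpas_timer_stop` calls for 'atm_bdy_adjust_scalars [ACC_data_xfer]'; `system_clock` stands in for that timer here.

```fortran
module bdy_adjust_data_sketch
   use, intrinsic :: iso_fortran_env, only: int64
   implicit none
   ! Hypothetical accumulator standing in for the MPAS log-file timer.
   integer(int64), save :: xfer_ticks = 0
contains
   ! Hypothetical routine: a simple relaxation toward the driving field
   ! stands in for the real Laplacian filter and Rayleigh damping.
   subroutine adjust_scalars_sketch(n, coef, scalars_driving, scalars_new)
      integer, intent(in) :: n
      real, intent(in) :: coef
      real, intent(in) :: scalars_driving(n)
      real, intent(inout) :: scalars_new(n)
      real :: scalars_tmp(n)   ! overwritten before it is read
      integer :: i
      integer(int64) :: t0, t1

      ! Host-to-device transfers: inputs are copied in, while the
      ! temporary is only created because its contents are overwritten.
      call system_clock(t0)
      !$acc enter data copyin(scalars_driving, scalars_new) create(scalars_tmp)
      call system_clock(t1)
      xfer_ticks = xfer_ticks + (t1 - t0)

      ! default(present): every array must already be on the device, so the
      ! compiler adds no implicit copies and a missing array is a runtime error.
      !$acc parallel loop default(present)
      do i = 1, n
         scalars_tmp(i) = scalars_new(i)
         scalars_new(i) = scalars_tmp(i) + coef * (scalars_driving(i) - scalars_tmp(i))
      end do

      ! Device-to-host transfer of the overwritten field; the other
      ! device copies are simply released.
      call system_clock(t0)
      !$acc exit data copyout(scalars_new) delete(scalars_driving, scalars_tmp)
      call system_clock(t1)
      xfer_ticks = xfer_ticks + (t1 - t0)
   end subroutine adjust_scalars_sketch
end module bdy_adjust_data_sketch
```

Splitting the transfers into separate enter/exit data directives, rather than one structured data region, is what makes it possible to time them apart from the compute loop.
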

@mgduda added the Atmosphere and OpenACC (Work related to OpenACC acceleration of code) labels on Jan 17, 2025

```fortran
do iCell = cellSolveStart, cellSolveEnd ! threaded over cells

   if ( (bdyMaskCell(iCell) > 1) .and. (bdyMaskCell(iCell) <= nRelaxZone) ) then ! relaxation zone

      laplacian_filter_coef = dt_rk*(real(bdyMaskCell(iCell)) - 1.)/real(nRelaxZone)/(10.*dt*meshScalingRegionalCell(iCell))
      rayleigh_damping_coef = laplacian_filter_coef/5.0
      scalars_tmp(1:num_scalars,1:nVertLevels,iCell) = scalars_new(1:num_scalars,1:nVertLevels,iCell)
```

Collaborator

Do we need a newline here by style conventions?

@@ -6799,6 +6821,7 @@ subroutine atm_bdy_adjust_scalars_work( scalars_new, scalars_driving, dt, dt_rk,
```fortran
cell1 = cellsOnEdge(1,iEdge)
cell2 = cellsOnEdge(2,iEdge)
!DIR$ IVDEP
!$acc loop collapse(2)
```

Collaborator

I guess the compiler will implicitly treat this as a vector loop, but I'm wondering whether we should also state `vector` explicitly here for the sake of convention.

@abishekg7
Collaborator

Other than my comments, this PR appears to be bit-identical to the previous version of develop.

4 participants