-
Notifications
You must be signed in to change notification settings - Fork 337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable GPU exection of atm_bdy_adjust_scalars_work via OpenACC #1266
base: develop
Are you sure you want to change the base?
Enable GPU exection of atm_bdy_adjust_scalars_work via OpenACC #1266
Conversation
Small whitespace changes. Also change implicit loop to an explicit loop to better parallelize. Implicit loops can be ported with 'acc kernels', but we prefer more proscribed 'acc parallel ...' constructs.
This commit adds an initial port of this routine using OpenACC. More changes are needed for performance and data management.
Ensures the fields which don't change while the model is running are present on the device from model startup to model shutdown.
Ensure that the other, non-invariant fields are available for this routine. Variables that are overwritten during this routine are only created while others are copied in. Any variables overwritten by this routine are copied out at the end. Timing for these transfers are reported in the output log file in the new timer: 'atm_bdy_adjust_scalars [ACC_data_xfer]'. Also add default(present) to parallel directives to ensure data movement is correct and prevent any implicit data movements from the compiler.
do iCell = cellSolveStart, cellSolveEnd ! threaded over cells | ||
|
||
if ( (bdyMaskCell(iCell) > 1) .and. (bdyMaskCell(iCell) <= nRelaxZone) ) then ! relaxation zone | ||
|
||
laplacian_filter_coef = dt_rk*(real(bdyMaskCell(iCell)) - 1.)/real(nRelaxZone)/(10.*dt*meshScalingRegionalCell(iCell)) | ||
rayleigh_damping_coef = laplacian_filter_coef/5.0 | ||
scalars_tmp(1:num_scalars,1:nVertLevels,iCell) = scalars_new(1:num_scalars,1:nVertLevels,iCell) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a newline here by style conventions?
@@ -6799,6 +6821,7 @@ subroutine atm_bdy_adjust_scalars_work( scalars_new, scalars_driving, dt, dt_rk, | |||
cell1 = cellsOnEdge(1,iEdge) | |||
cell2 = cellsOnEdge(2,iEdge) | |||
!DIR$ IVDEP | |||
!$acc loop collapse(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the compiler will implicitly assume a vector
here, but wondering if we might need to also explicitly state that here for sake of convention.
Other than my comments, this PR seems to be bit identical with the previous version of develop. |
This PR makes small code modifications and adds OpenACC directives so the atm_bdy_adjust_scalars_work routine can execute on GPU(s).
Timing information for the OpenACC data transfers in this routine is captured in the log file by a new timer: atm_bdy_adjust_scalars [ACC_data_xfer].
Invariant fields used in this routine are also copied to the device within mpas_atm_dynamics_init and are deleted in mpas_atm_dynamics_finalize.