
Implement offload instructions via macros #80

Open
wants to merge 10 commits into main from naan-offload-macros
Conversation

awnawab (Collaborator) commented Jan 28, 2025

A draft PR that represents a proof of concept of the development I proposed in #73.

The offload instructions in the code are replaced with macros, which are defined in Python modules. More of these Python modules can be added to support multiple backends, whilst leaving the code relatively clean. So far I have only implemented the macros for the code in FIELD_RANKSUFF_DATA_MODULE that would be run if OpenACC is enabled but not CUDA.

The preprocessor definitions still need to be generalised. I envisage the following two: WITH_GPU_OFFLOAD, corresponding to _OPENACC, and WITH_HIC, corresponding to _CUDA.

I am eager to start work (very) soon on an OpenMP offload backend, but before I write more code I would really appreciate your feedback on this.
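To make the idea concrete, here is a minimal sketch of what such a backend module could look like. This is illustrative only, not the actual code in this PR: the class and method names (`OpenACCBackend`, `copyin`, `delete`) are hypothetical. Each backend is a Python class whose methods return offload-directive strings, which fypp splices into the Fortran source at preprocessing time.

```python
# Hypothetical sketch of a backend module (names are illustrative, not
# the actual API of this PR): each backend is a Python class whose
# methods return offload-directive strings for fypp to splice into the
# Fortran source at preprocessing time.

__all__ = ['OpenACCBackend']

class OpenACCBackend:
    """Emits OpenACC data-movement directives."""

    @staticmethod
    def copyin(*variables):
        # e.g. copyin('A', 'B') -> "!$acc enter data copyin (A, B)"
        return f"!$acc enter data copyin ({', '.join(variables)})"

    @staticmethod
    def delete(*variables):
        # e.g. delete('A') -> "!$acc exit data delete (A)"
        return f"!$acc exit data delete ({', '.join(variables)})"
```

In an fypp-preprocessed source file, a call such as `$:backend.copyin('A', 'B')` would then expand to the corresponding directive for whichever backend module was selected, and a different backend module with the same interface could emit, say, OpenMP offload directives instead.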

awnawab (Collaborator, Author) commented Jan 28, 2025

NB: the hpc-ci tests are failing because of a permissions issue for PRs filed from forks. Once #78 is merged, I can rebase on top of it to fix the failing hpc-ci. I have tested offline that it works.

reuterbal (Collaborator) left a comment

This is very neat and I like how clean this looks, despite the powerful features it unlocks for us. While discussing this offline, we noticed two small things that I've marked as comments.

set( Python3_FIND_VIRTUALENV STANDARD )
find_package( Python3 COMPONENTS Interpreter )

execute_process( COMMAND ${Python3_EXECUTABLE} -m pip --disable-pip-version-check install fypp OUTPUT_QUIET )
reuterbal (Collaborator) commented:

A small comment on something we've hit very recently: bare-metal Python installations may come without pip, or with only an ancient version available. But they are guaranteed to have a bootstrap mechanism for it, which ensures that you have at least a pip version equivalent to the ensurepip version packaged with the installation. For this, add the following:

Suggested change
execute_process( COMMAND ${Python3_EXECUTABLE} -m pip --disable-pip-version-check install fypp OUTPUT_QUIET )
execute_process( COMMAND ${Python3_EXECUTABLE} -m ensurepip --upgrade OUTPUT_QUIET )
execute_process( COMMAND ${Python3_EXECUTABLE} -m pip --disable-pip-version-check install fypp OUTPUT_QUIET )

awnawab (Collaborator, Author) replied:

I didn't even know we could have pip-less Python, thanks!


__all__ = ['NvidiaOpenACC']

class NvidiaOpenACC():
reuterbal (Collaborator) commented:

The idea of categorising backends by compiler-specific programming-model implementations is a really good one; it makes it easy to encode the minor differences between vendors in their interpretation of the standards.

I would suggest naming them after the compiler toolchain, though, renaming this one to e.g. NVHPCOpenACC. This also avoids any issues associated with putting protected brand names like Nvidia into the code.
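The toolchain-based naming can be sketched as follows. This is purely illustrative, not code from this PR: the second class name and the specific vendor difference shown (an extra wait clause) are hypothetical, chosen only to show how a subclass can override just the directives where toolchains diverge.

```python
# Illustrative only: encode per-toolchain differences in the
# interpretation of the OpenACC standard by subclassing a common base
# and overriding only the directives that differ. The specific quirk
# shown here (an extra wait clause) is hypothetical, as are the names.

class NVHPCOpenACC:
    """OpenACC directives as accepted by the NVHPC toolchain."""

    @staticmethod
    def update_host(var):
        return f"!$acc update host ({var})"

class HypotheticalVendorOpenACC(NVHPCOpenACC):
    """A toolchain that (hypothetically) needs an explicit wait."""

    @staticmethod
    def update_host(var):
        # hypothetical vendor quirk: synchronise the update explicitly
        return f"!$acc update host ({var}) wait"
```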

awnawab (Collaborator, Author) replied:

Yes, I agree, thanks!

@awnawab awnawab force-pushed the naan-offload-macros branch from 8bfd956 to 9a18e00 Compare February 3, 2025 07:59
@awnawab awnawab force-pushed the naan-offload-macros branch from 9a18e00 to aac718d Compare February 3, 2025 08:01
@awnawab awnawab added the approved-for-ci Approved to run hpc-ci label Feb 3, 2025
@awnawab awnawab marked this pull request as ready for review February 3, 2025 08:02
@awnawab awnawab changed the title WIP: implement offload instructions via macros Implement offload instructions via macros Feb 3, 2025
awnawab (Collaborator, Author) commented Feb 3, 2025

This is now ready for review, please have a look at your earliest convenience 🙏

pmarguinaud (Collaborator) left a comment

I agree we need something like this, but the code, which was already difficult to understand, is becoming cryptic.

Would it be possible to translate OpenACC directives such as:

!$acc kernels present (PX)

into:

$:offload_macros.kernels (present=['PX'])

That way it keeps its meaning in the NVIDIA lingo. I know it looks unfair to other vendors, but we have to choose a convention anyway.
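A minimal sketch of how such a macro could be implemented (hypothetical; this is not code from the PR, only an illustration of the signature suggested in the comment above):

```python
# Hypothetical implementation of the suggested macro: keep the OpenACC
# vocabulary so that the fypp call
#     $:offload_macros.kernels(present=['PX'])
# reads like the directive it replaces.

def kernels(present=None):
    # Optional present clause, mirroring "!$acc kernels present (...)"
    clause = f" present ({', '.join(present)})" if present else ""
    return f"!$acc kernels{clause}"
```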

awnawab (Collaborator, Author) commented Feb 4, 2025

Hi @pmarguinaud,

Of course, if you think that makes the code more intuitive, I am very happy to adapt it accordingly. Thanks for the feedback, I'll implement your suggestion 👍

@github-actions github-actions bot removed the approved-for-ci Approved to run hpc-ci label Feb 4, 2025
@awnawab awnawab added the approved-for-ci Approved to run hpc-ci label Feb 4, 2025