-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement offload instructions via macros #80
base: main
Are you sure you want to change the base?
Conversation
NB: the hpc ci tests are failing because of a permissions issue for PRs filed from forks. Once #78 is merged I can rebase on top of that to fix the failing hpc-ci. I have tested offline that it works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very neat and I like how clean this looks, despite the powerful features it unlocks for us. While discussing this offline we've noticed two small things that I've marked as comments.
set( Python3_FIND_VIRTUALENV STANDARD ) | ||
find_package( Python3 COMPONENTS Interpreter ) | ||
|
||
execute_process( COMMAND ${Python3_EXECUTABLE} -m pip --disable-pip-version-check install fypp OUTPUT_QUIET ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comment from something we've hit very recently: Bare-metal Python installations may come without any pip or only ancient versions available. But they are guaranteed to have a bootstrap mechanism for it that ensures that you have at least a pip version equivalent to the ensurepip version packaged with the installation. For this, add the following:
execute_process( COMMAND ${Python3_EXECUTABLE} -m pip --disable-pip-version-check install fypp OUTPUT_QUIET ) | |
execute_process( COMMAND ${Python3_EXECUTABLE} -m ensurepip --upgrade OUTPUT_QUIET ) | |
execute_process( COMMAND ${Python3_EXECUTABLE} -m pip --disable-pip-version-check install fypp OUTPUT_QUIET ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't even know we could have pip-less python, thanks!
|
||
__all__ = ['NvidiaOpenACC'] | ||
|
||
class NvidiaOpenACC(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of categorising backends by compiler-specific programming model implementations is a really good one, it allows to easily encode the minor differences in the interpretation of standards between vendors.
I would suggest to name them according to compiler toolchain, though, renaming this one e.g. to NVHPCOpenACC
. This avoids also any issues associated with putting protected brand names like Nvidia into code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, thanks!
8bfd956
to
9a18e00
Compare
… FIELD_RANKSUFF_DATA_MODULE
9a18e00
to
aac718d
Compare
This is now ready for review, please have a look at your earliest convenience 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we need something like this, but the code which was already difficult to understand, is becoming cryptic.
Would it be possible to translate OpenACC directives such as :
!$acc kernels present (PX)
Into :
$:offload_macros.kernels (present=['PX'])
So that it keeps its meaning in the NVIDIA lingo. I know it looks unfair for other vendors, but we have to choose a convention anyway.
Hi @pmarguinaud, Of course, if you think that makes the code more intuitive I am very happy to adapt it accordingly. Thanks for the feedback, I'll implement your suggestion 👍 |
A draft PR that represents a proof of concept of the development I proposed in #73.
The offload instructions in the code are replaced with macros, which are defined in python modules. More of these python modules can be added to support multiple backends, whilst leaving the code relatively clean. So far I have only implemented the macros for the code in
FIELD_RANKSUFF_DATA_MODULE
that would be run if OpenACC is enabled but not CUDA.The preprocessor definitions need to still be generalised. I envisage the following two:
WITH_GPU_OFFLOAD
corresponding to_OPENACC
andWITH_HIC
corresponding to_CUDA
.I am eager to start work (very) soon on an OpenMP offload backend, but before I write more code I would really appreciate your feedback on this.