fluids: Add PyTorch external DD SGS evaluation #1581

Merged
6 commits merged into main from jrwrigh/pytorch_external_sgs
Jun 12, 2024

Conversation

jrwrigh
Collaborator

@jrwrigh jrwrigh commented May 9, 2024

Add PyTorch as an external DD SGS evaluation. This is a follow-up to #1361: here PyTorch runs the data-driven model instead of a native implementation.

ToDos:

  • Add in PetscLogEvents (particularly for inference and the data transfer steps)
  • Add documentation
  • Clean up the Makefile additions (remove the automatic USE_LIBTORCH testing)
  • Add weak symbols so that this can be compiled without PyTorch
  • Add a command-line switch to select PyTorch vs. internal sequential
  • Move createPyTorchModel into the testing directory (it's only useful for documenting how the testing model was created, instead of the model just being a binary blob)
  • Rename "sequential_internal" to "sequential_ceed"
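
The weak-symbol item in the list above can be illustrated with a minimal sketch. It assumes a GCC- or Clang-compatible compiler, and every name in it (`SketchTorchInference`, `SketchRunInference`, `SKETCH_ERR_NO_TORCH`) is hypothetical rather than taken from this PR. The idea is that a weak stub is always compiled; when the real PyTorch-backed object file is linked, its strong definition of the same symbol replaces the stub.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical error code returned when the build has no PyTorch support.
#define SKETCH_ERR_NO_TORCH 1

// Weak default definition (GCC/Clang attribute). The linker uses it only if
// no strong definition of SketchTorchInference is present, e.g. when the
// C++/libtorch object file was left out of the build.
__attribute__((weak)) int SketchTorchInference(const double *inputs, double *outputs, size_t n) {
  (void)inputs;
  (void)outputs;
  (void)n;
  fprintf(stderr, "This build was compiled without PyTorch support\n");
  return SKETCH_ERR_NO_TORCH;
}

// Callers never need an #ifdef: they call the symbol and check the result.
int SketchRunInference(const double *inputs, double *outputs, size_t n) {
  return SketchTorchInference(inputs, outputs, n);
}
```

With only this file linked, `SketchRunInference` reports the error code and leaves `outputs` untouched. Linking an object that defines a non-weak `SketchTorchInference` changes the behavior without touching the callers.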

@jrwrigh jrwrigh self-assigned this May 9, 2024
@jrwrigh jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch 2 times, most recently from 30f4204 to 50af404 Compare May 9, 2024 23:00
@jrwrigh jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch 4 times, most recently from ab6b800 to 984a964 Compare May 20, 2024 14:29
@jrwrigh jrwrigh marked this pull request as ready for review May 20, 2024 21:57
@jrwrigh jrwrigh added 1-In Review and removed 0-WIP labels May 20, 2024
@jrwrigh jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch 3 times, most recently from 26b57d3 to 31dc6d3 Compare May 26, 2024 19:42
@jrwrigh
Collaborator Author

jrwrigh commented May 27, 2024

This is ready for review now. Tests pass on Noether for CPU and CUDA. I'm having some difficulty getting PyTorch+ROCm to build with Spack, but that's not a high priority since we don't have hardware to run it on right now.

@jrwrigh jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch from 31dc6d3 to 8d82690 Compare May 28, 2024 19:27
@jeremylt jeremylt added the HONEE label Jun 10, 2024
Review comment on tests/junit.py (outdated, resolved)
Member

@jeremylt jeremylt left a comment

This all seems reasonable to me

@jrwrigh jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch from 8d82690 to a8d73e7 Compare June 12, 2024 20:40
jrwrigh added 5 commits June 12, 2024 14:41
- Rename sequential_internal -> *_ceed
To make the log events accessible to torch (in C++), I needed to
separate out the header file containing the extern PetscLogEvent
declarations. While I was at it, I figured it'd be clearer to have a
separate log_events.c file as well to hold the actual "storage" of the
PetscLogEvents and the RegisterLogEvents function itself.
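
The header/source split described in this commit message might look roughly like the sketch below. It is written so that it compiles without PETSc: `SketchLogEvent` and `SketchLogEventRegister` are stand-ins for `PetscLogEvent` and `PetscLogEventRegister`, and the event names are hypothetical, not the ones used in this PR. The two "files" are shown in one block for brevity.

```c
// ---- log_events.h (sketch): declarations only, includable from C and C++ ----
#ifdef __cplusplus
extern "C" {
#endif

typedef int SketchLogEvent;  // stand-in for PetscLogEvent (assumption for this sketch)

// extern: every file that includes the header shares the same two variables.
extern SketchLogEvent SKETCH_TorchInference;
extern SketchLogEvent SKETCH_TorchDataTransfer;

int SketchRegisterLogEvents(void);

#ifdef __cplusplus
}
#endif

// ---- log_events.c (sketch): the single place where the events are stored ----
SketchLogEvent SKETCH_TorchInference    = -1;
SketchLogEvent SKETCH_TorchDataTransfer = -1;

// Stand-in for PetscLogEventRegister(): hands out increasing ids starting at 0.
static int SketchLogEventRegister(const char *name, SketchLogEvent *event) {
  static SketchLogEvent next_id = 0;
  (void)name;
  *event = next_id++;
  return 0;
}

// Registers all events once; returns nonzero on the first failure.
int SketchRegisterLogEvents(void) {
  int ierr = SketchLogEventRegister("TorchInference", &SKETCH_TorchInference);
  if (ierr) return ierr;
  return SketchLogEventRegister("TorchDataTransfer", &SKETCH_TorchDataTransfer);
}
```

Keeping the definitions in one `.c` file avoids duplicate-symbol errors, and the `extern "C"` guard lets the C++ torch code include the same header and reference the same events.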
On Sunspot, on-device inference is not working reliably. I'm not sure
exactly why at the moment (whether it's a libCEED backend issue or
something else).
@jrwrigh jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch 3 times, most recently from cd4fd42 to 280bdd9 Compare June 12, 2024 20:52
@jrwrigh jrwrigh force-pushed the jrwrigh/pytorch_external_sgs branch from 280bdd9 to 637c7b1 Compare June 12, 2024 21:02
@jrwrigh jrwrigh merged commit 9702fad into main Jun 12, 2024
28 checks passed
@jrwrigh jrwrigh deleted the jrwrigh/pytorch_external_sgs branch June 12, 2024 23:28
2 participants