-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Nbody wall damping #33
base: nanobind
Are you sure you want to change the base?
Conversation
…ive definite test.
* Switch to nanobind * Update env * Use FetchContent for nanobind * Formatting * Allow for any CPU tensor to be consumed by the interface * Add test * Add tensor provider tests * Update CI * Update CI * Update CI * Add multidevice cached allocator. Avoids temporary memory allocations Modify C++ interface to consume device_span instead of raw pointers Add device_adapter, allowing solvers to consume data in their desired device Only ported SelfMobility * Add the rest of the solvers * Add spdlog and gtest tweak CMake debug flags * Finish allocator implementation Update and add some tests Fix lingering device_adaptor issues * Small changes * Small change * Fix bug in Nbody * Fix some tests. Make sure NBody wall tests use particles far away from the wall * Formatting * Remove spdlog * Update interface version * Documentation * Documentation * Update python bindings * Remove spdlog in CMake * Update test * Add some overloads to device_span * Update CPP example * Update CPP example * Add some constructors to device_span * Update cpp example * Add some allocator tests * Move some files around * fix for compiling in double * fix for CI to compile examples * Tweak example CMake * Revert "Tweak example CMake" This reverts commit 365bea7. --------- Co-authored-by: Ryker Fish <[email protected]>
@rykerfish, could you upload the benchmark script? Perhaps under examples/python? |
At first glance, I would say that you are seeing torques being only x2 slower because these kernels are bandwidth-limited (as its usual for GPUs). |
The Nbody wall fluctuation test is still failing with the dampening. Perhaps it allows for some leeway, but the matrix is still non-SPD too close to the wall? |
I made some changes, please check that I did not screwed up the performance. We could squeeze some cycles here and there, but AFAICT your implementation is solid, we should leave it as is. |
DEPENDS ON #32, MERGE AFTER. I thought doing this on top of the nanobind branch would be a good test- I ran into no issues. Closes #31. Also note: this is supposed to go into main, but I set it to the nanobind branch for the moment so all the commits from that branch don't show. Sorry for doing it a bit strange!
This branch contains the damping code for the NBody wall kernels that ensures they maintain positive definiteness. I tested it by making a check for all positive eigenvalues that failed before the damping and passed afterwards. It maintains symmetry as well.
Note that this doesn't change the reference files for the wall mobility tests- those were all generated with the minimum height exactly one hydrodynamic radius from the wall, which is right before the damping code kicks in.
I also tried to make it so that$\mathcal{M}_{UF}$ is the only block computed when torques are not given, and it makes the force computation about 2x as fast as the computation with torques. I'm assuming there's a substantial overhead that lowers it from being 4x faster than torques as you'd expect since it is only computing 1/4 of the mobility blocks. @RaulPPelaez could you take a look and see if you have recommendations for better ways to implement the speed up? My current method is probably a bit clunky.
Timings- all units are seconds
Algorithm: naive
Algorithm: fast
Algorithm: block