Nbody wall damping #33

rykerfish · 2025-01-24T20:45:22Z

DEPENDS ON #32, MERGE AFTER. I thought doing this on top of the nanobind branch would be a good test- I ran into no issues. Closes #31. Also note: this is supposed to go into main, but I set it to the nanobind branch for the moment so all the commits from that branch don't show. Sorry for doing it a bit strange!

This branch contains the damping code for the NBody wall kernels that ensures they maintain positive definiteness. I tested it by making a check for all positive eigenvalues that failed before the damping and passed afterwards. It maintains symmetry as well.

Note that this doesn't change the reference files for the wall mobility tests- those were all generated with the minimum height exactly one hydrodynamic radius from the wall, which is right before the damping code kicks in.

I also tried to make it so that $\mathcal{M}_{UF}$ is the only block computed when torques are not given, and it makes the force computation about 2x as fast as the computation with torques. I'm assuming there's a substantial overhead that lowers it from being 4x faster than torques as you'd expect since it is only computing 1/4 of the mobility blocks. @RaulPPelaez could you take a look and see if you have recommendations for better ways to implement the speed up? My current method is probably a bit clunky.

Timings- all units are seconds

Algorithm: naive

	NBody	NBody_wall
force	1.7602e-03	2.9738e-03
torque	3.2945e-03	5.7371e-03

Algorithm: fast

	NBody	NBody_wall
force	2.5244e-03	3.3512e-03
torque	4.1480e-03	7.0016e-03

Algorithm: block

	NBody	NBody_wall
force	1.2875e-03	1.2934e-03
torque	2.0957e-03	2.1091e-03

…ive definite test.

…vided

…hange timings

* Switch to nanobind * Update env * Use FetchContent for nanobind * Formatting * Allow for any CPU tensor to be consumed by the interface * Add test * Add tensor provider tests * Update CI * Update CI * Update CI * Add multidevice cached allocator. Avoids temporary memory allocations Modify C++ interface to consume device_span instead of raw pointers Add device_adapter, allowing solvers to consume data in their desired device Only ported SelfMobility * Add the rest of the solvers * Add spdlog and gtest tweak CMake debug flags * Finish allocator implementation Update and add some tests Fix lingering device_adaptor issues * Small changes * Small change * Fix bug in Nbody * Fix some tests. Make sure NBody wall tests use particles far away from the wall * Formatting * Remove spdlog * Update interface version * Documentation * Documentation * Update python bindings * Remove spdlog in CMake * Update test * Add some overloads to device_span * Update CPP example * Update CPP example * Add some constructors to device_span * Update cpp example * Add some allocator tests * Move some files around * fix for compiling in double * fix for CI to compile examples * Tweak example CMake * Revert "Tweak example CMake" This reverts commit 365bea7. --------- Co-authored-by: Ryker Fish <[email protected]>

RaulPPelaez · 2025-01-30T12:33:22Z

@rykerfish, could you upload the benchmark script? Perhaps under examples/python?

RaulPPelaez · 2025-01-30T12:41:09Z

At first glance, I would say that you are seeing torques being only x2 slower because these kernels are bandwidth-limited (as its usual for GPUs).
With torques you move twice the memory (loading torques and writing ang. disp), so yeah, you have x4 less computation, but computation is free compared with memory store/load.
BTW very funny that the algorithm that we call "fast" I actually the slowest. I reckon this is a low number of particles?
Also, surprising that block seems to be immune to the wall correction, you sure that is right?

RaulPPelaez · 2025-01-30T13:20:44Z

The Nbody wall fluctuation test is still failing with the dampening. Perhaps it allows for some leeway, but the matrix is still non-SPD too close to the wall?

This reverts commit 87bf67b.

RaulPPelaez · 2025-01-30T13:46:36Z

I made some changes, please check that I did not screwed up the performance.

We could squeeze some cycles here and there, but AFAICT your implementation is solid, we should leave it as is.

rykerfish added 5 commits January 23, 2025 13:23

add randomized test to check eigenvalues of M

9921614

rewrote dot product kernels

a581cb1

implemented damping for the wall corection kernels- now passing posit…

f8ff7ea

…ive definite test.

made it so UF block is the only one computed when torques are not pro…

3472421

…vided

fixed bug in damping code that made it not symmetric

428b1d0

rykerfish changed the base branch from main to nanobind January 24, 2025 23:15

rykerfish and others added 3 commits January 25, 2025 00:06

tried to get rid of branching with templating but it didn't seem to c…

b3667cf

…hange timings

Merge remote-tracking branch 'origin/main' into nbody_damping

116232d

RaulPPelaez added 3 commits January 30, 2025 14:43

Remove unused variable

87bf67b

Revert "Remove unused variable"

1b2cd12

This reverts commit 87bf67b.

Some changes to torque detection in NBody

6906643

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nbody wall damping #33

Nbody wall damping #33

rykerfish commented Jan 24, 2025 •

edited

Loading

RaulPPelaez commented Jan 30, 2025

RaulPPelaez commented Jan 30, 2025

RaulPPelaez commented Jan 30, 2025

RaulPPelaez commented Jan 30, 2025

Nbody wall damping #33

Are you sure you want to change the base?

Nbody wall damping #33

Conversation

rykerfish commented Jan 24, 2025 • edited Loading

Timings- all units are seconds

Algorithm: naive

Algorithm: fast

Algorithm: block

RaulPPelaez commented Jan 30, 2025

RaulPPelaez commented Jan 30, 2025

RaulPPelaez commented Jan 30, 2025

RaulPPelaez commented Jan 30, 2025

rykerfish commented Jan 24, 2025 •

edited

Loading