Improve performance of certain physics queries when using Jolt Physics #101071
+137
−12
(This addresses the to-do list item in #99895 titled "Get rid of heap allocations in shape queries when requested hits are less or equal to default" and brings the Jolt Physics module in line with the Godot Jolt extension, which also utilizes this optimization, albeit implemented differently.)
Since most physics queries in Godot have a default upper limit on how many hits they can return (e.g. 32 hits for intersect_point), there is a ripe opportunity to omit the memory allocations associated with storing these hits in the cases where we don't in fact exceed this default limit, which is almost certainly the vast majority of cases.
This pull request adds a new container to the Jolt Physics module called JoltInlineVector, which acts as a hybrid between a stack-allocated buffer and a heap-allocated one, and switches from the former to the latter once a certain templatized capacity has been exceeded. Such containers are sometimes referred to as a "small vector" in other codebases.
This new container is then used in the JoltQueryCollector*Multi classes, which themselves take a templatized default capacity appropriate for the callsite, allowing us to omit [1] the memory allocations associated with storing the hits for the following physics queries:
PhysicsDirectSpaceState3D.cast_motion
PhysicsDirectSpaceState3D.collide_shape
PhysicsDirectSpaceState3D.intersect_point
PhysicsDirectSpaceState3D.intersect_shape
PhysicsBody3D.move_and_collide
PhysicsBody3D.test_move
PhysicsServer3D.body_test_motion
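
To illustrate the idea, here is a minimal sketch of a "small vector" and of a collector templated on a default capacity. This is not the actual JoltInlineVector or JoltQueryCollector*Multi code from this pull request: the names InlineVectorSketch, HitSketch and MultiHitCollectorSketch, along with all of their members, are hypothetical, and the sketch assumes trivially copyable element types to keep the growth path short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical sketch of a "small vector"; not the PR's JoltInlineVector.
template <typename T, std::size_t N>
class InlineVectorSketch {
	static_assert(N > 0, "inline capacity must be non-zero");
	static_assert(std::is_trivially_copyable_v<T>, "sketch only handles trivially copyable types");
	static_assert(alignof(T) <= alignof(std::max_align_t), "sketch relies on malloc alignment");

	alignas(T) unsigned char inline_storage[N * sizeof(T)];
	T *ptr = reinterpret_cast<T *>(inline_storage);
	std::size_t count = 0;
	std::size_t cap = N;

	// Move the elements to a heap buffer with twice the current capacity.
	void grow() {
		const std::size_t new_cap = cap * 2;
		T *new_ptr = static_cast<T *>(std::malloc(new_cap * sizeof(T)));
		if (new_ptr == nullptr) {
			std::abort();
		}
		std::memcpy(new_ptr, ptr, count * sizeof(T));
		if (is_on_heap()) {
			std::free(ptr);
		}
		ptr = new_ptr;
		cap = new_cap;
	}

public:
	InlineVectorSketch() = default;
	InlineVectorSketch(const InlineVectorSketch &) = delete;
	InlineVectorSketch &operator=(const InlineVectorSketch &) = delete;

	~InlineVectorSketch() {
		if (is_on_heap()) {
			std::free(ptr);
		}
	}

	bool is_on_heap() const {
		return ptr != reinterpret_cast<const T *>(inline_storage);
	}

	std::size_t size() const { return count; }

	// Taken by value so the argument stays valid even if it aliases an element.
	void push_back(T value) {
		if (count == cap) {
			grow();
		}
		new (ptr + count) T(value);
		++count;
	}

	const T &operator[](std::size_t index) const {
		assert(index < count);
		return ptr[index];
	}
};

// Hypothetical hit record; the real collectors store Jolt-specific results.
struct HitSketch {
	int body_id;
	float fraction;
};

// Hypothetical multi-hit collector whose inline capacity is chosen per callsite.
template <std::size_t DefaultCapacity>
class MultiHitCollectorSketch {
	InlineVectorSketch<HitSketch, DefaultCapacity> hits;
	std::size_t max_hits;

public:
	explicit MultiHitCollectorSketch(std::size_t p_max_hits) :
			max_hits(p_max_hits) {}

	// Returns false once the caller-requested hit limit has been reached.
	bool add_hit(HitSketch hit) {
		if (hits.size() >= max_hits) {
			return false;
		}
		hits.push_back(hit);
		return true;
	}

	std::size_t hit_count() const { return hits.size(); }
	const HitSketch &get_hit(std::size_t index) const { return hits[index]; }
	bool spilled_to_heap() const { return hits.is_on_heap(); }
};
```

In this sketch, a callsite whose default limit is 32 hits would instantiate MultiHitCollectorSketch<32>, so a query that stays within the default never touches the heap, while a caller that requests more hits than that only pays for an allocation once the inline buffer actually fills up.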
The actual performance benefits of this will likely vary greatly depending on the platform, but in my measurements on Windows (both in Superluminal and with QueryPerformanceCounter) I'm seeing an average reduction of roughly 10-15% in CPU time, with certain larger spikes (presumably from occasional page faults) disappearing completely.
Here's a before-and-after profiling of PhysicsDirectSpaceState3D.cast_motion, where the motion vector spans most of the level in GDQuest's Robo Blast demo:
[Image: before-and-after profiling capture]
I tried to keep the implementation of this new container as minimal as possible, but I'll admit it turned out to be a few lines longer than I had hoped. I do still think this optimization is worthwhile though, since physics queries tend to be plentiful in a lot of games.
Footnotes
[1] Note that even with this optimization there are still memory allocations happening when performing physics queries from scripts, as the script interface relies on things like TypedArray<Dictionary> for its results, which still allocate plenty of memory, so this by no means removes all allocations from the queries listed here.