Too many mem alloc in EPA::reset slow down collision checking #649

laguerreche · 2025-01-20T14:16:59Z

I have two meshes not-closed and concave that collide: one is static and the other one is moved. The only special thing here is that I make a lot of small movements until I reach a sort of equilibrium. The static one is about 5000 vertices and 9700 faces. The moving mesh is about 4200 vertices and 8000 faces.

I am on the devel branch with the last commit being b8c5ab9a1acebc0504969d3c0a2cd95ed620bc89.

When I run my application in Intel VTune ("Bottom-up" tab), I see that a lot of time is spent only in memory allocations :

coal::overlap: 2,4 secs
malloc from EPA::reset: 1,3 sec
coal::details::getShapeSupport<0> (called by coal::details::GJK::getSupport): 0,6 sec
free from coal::ComputeCollision: 0,5 sec

When I step into coal with a debugger I see the same pattern that repeats very often:
1/ call to EPA::reset(0, tolerance)
2/ sv_store was empty so it's resized to 4 elements.

The callstack is:

coald.dll!coal::details::EPA::reset(unsigned __int64 max_iterations_, double tolerance_)
coald.dll!coal::details::EPA::initialize()
coald.dll!coal::details::EPA::EPA(unsigned __int64 max_iterations_, double tolerance_) 
coald.dll!coal::GJKSolver::GJKSolver(const coal::CollisionRequest & request)
coald.dll!coal::MeshCollisionTraversalNode<coal::OBBRSS,0>::leafCollides(unsigned int b1, unsigned int b2, double & sqrDistLowerBound)
coald.dll!coal::collisionRecurse(coal::CollisionTraversalNodeBase * node, unsigned int b1, unsigned int b2, std::list<coal::BVHFrontNode,std::allocator<coal::BVHFrontNode>> * front_list, double & sqrDistLowerBound)
coald.dll!coal::collisionRecurse(coal::CollisionTraversalNodeBase * node, unsigned int b1, unsigned int b2, std::list<coal::BVHFrontNode,std::allocator<coal::BVHFrontNode>> * front_list, double & sqrDistLowerBound)
...
coald.dll!coal::collisionRecurse(coal::CollisionTraversalNodeBase * node, unsigned int b1, unsigned int b2, std::list<coal::BVHFrontNode,std::allocator<coal::BVHFrontNode>> * front_list, double & sqrDistLowerBound)
coald.dll!coal::collide(coal::CollisionTraversalNodeBase * node, const coal::CollisionRequest & request, coal::CollisionResult & result, std::list<coal::BVHFrontNode,std::allocator<coal::BVHFrontNode>> * front_list, bool recursive)
coald.dll!coal::details::orientedMeshCollide<coal::MeshCollisionTraversalNode<coal::OBBRSS,0>,coal::OBBRSS>(const coal::CollisionGeometry * o1, const coal::Transform3s & tf1, const coal::CollisionGeometry * o2, const coal::Transform3s & tf2, const coal::CollisionRequest & request, coal::CollisionResult & result)
coald.dll!coal::BVHCollide<coal::OBBRSS>(const coal::CollisionGeometry * o1, const coal::Transform3s & tf1, const coal::CollisionGeometry * o2, const coal::Transform3s & tf2, const coal::CollisionRequest & request, coal::CollisionResult & result)
coald.dll!coal::BVHCollide<coal::OBBRSS>(const coal::CollisionGeometry * o1, const coal::Transform3s & tf1, const coal::CollisionGeometry * o2, const coal::Transform3s & tf2, const coal::GJKSolver * __formal, const coal::CollisionRequest & request, coal::CollisionResult & result)
coald.dll!coal::ComputeCollision::run(const coal::Transform3s & tf1, const coal::Transform3s & tf2, const coal::CollisionRequest & request, coal::CollisionResult & result)
coald.dll!coal::ComputeCollision::operator()(const coal::Transform3s & tf1, const coal::Transform3s & tf2, const coal::CollisionRequest & request, coal::CollisionResult & result)

To me, calls from leafCollides lead to many small allocations of memory from EPA::reset that kill performance. Since sv_store is always sized to 4 elements, a fix may be to set it to a fixed size at build time and create its content on the stack. Another solution would be to let the user provide a memory allocator that would make a better job here than std::allocator. Currently, I have better performance with FCL 0.7.

The text was updated successfully, but these errors were encountered:

lmontaut · 2025-01-20T17:12:38Z

Hi @laguerreche,
You're right.
However, the problem does not come from EPA but from MeshCollisionTraversalNode.

EPA can't have a fixed sized sv_store because its maximum size depends on the max number of iterations of EPA (which is a user parameter, not known at compile time). sv_store is not of size 4.
This is fine for other pairs of objects because the ComputeCollision functor reuses the same EPA object, so a malloc occurs only once (first time calling operator() on the shapes).
However, for MeshCollisionTraversalNode specifically, an EPA object is recreated every time it calls leafCollides... So indeed, the EPA::reset malloc occurs everytime leafCollides is called.

I can't remember exactly why this is done, but I had marked to fix that that on my todo list:

coal/include/coal/internal/traversal_node_bvhs.h

Lines 206 to 207 in 09ccd5f

    
           // TODO(louis): MeshCollisionTraversalNode should have its own GJKSolver. 
        
           GJKSolver solver(this->request);

What's for sure is that it's a mistake, which should not be hard to fix.

lmontaut self-assigned this Jan 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Too many mem alloc in EPA::reset slow down collision checking #649

Too many mem alloc in EPA::reset slow down collision checking #649

laguerreche commented Jan 20, 2025

lmontaut commented Jan 20, 2025

Too many mem alloc in EPA::reset slow down collision checking #649

Too many mem alloc in EPA::reset slow down collision checking #649

Comments

laguerreche commented Jan 20, 2025

lmontaut commented Jan 20, 2025