Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too many mem alloc in EPA::reset slow down collision checking #649

Open
laguerreche opened this issue Jan 20, 2025 · 1 comment
Open

Too many mem alloc in EPA::reset slow down collision checking #649

laguerreche opened this issue Jan 20, 2025 · 1 comment
Assignees

Comments

@laguerreche
Copy link

I have two meshes not-closed and concave that collide: one is static and the other one is moved. The only special thing here is that I make a lot of small movements until I reach a sort of equilibrium. The static one is about 5000 vertices and 9700 faces. The moving mesh is about 4200 vertices and 8000 faces.

I am on the devel branch with the last commit being b8c5ab9a1acebc0504969d3c0a2cd95ed620bc89.

When I run my application in Intel VTune ("Bottom-up" tab), I see that a lot of time is spent only in memory allocations :

  • coal::overlap: 2,4 secs
  • malloc from EPA::reset: 1,3 sec
  • coal::details::getShapeSupport<0> (called by coal::details::GJK::getSupport): 0,6 sec
  • free from coal::ComputeCollision: 0,5 sec

When I step into coal with a debugger I see the same pattern that repeats very often:
1/ call to EPA::reset(0, tolerance)
2/ sv_store was empty so it's resized to 4 elements.

The callstack is:

coald.dll!coal::details::EPA::reset(unsigned __int64 max_iterations_, double tolerance_)
coald.dll!coal::details::EPA::initialize()
coald.dll!coal::details::EPA::EPA(unsigned __int64 max_iterations_, double tolerance_) 
coald.dll!coal::GJKSolver::GJKSolver(const coal::CollisionRequest & request)
coald.dll!coal::MeshCollisionTraversalNode<coal::OBBRSS,0>::leafCollides(unsigned int b1, unsigned int b2, double & sqrDistLowerBound)
coald.dll!coal::collisionRecurse(coal::CollisionTraversalNodeBase * node, unsigned int b1, unsigned int b2, std::list<coal::BVHFrontNode,std::allocator<coal::BVHFrontNode>> * front_list, double & sqrDistLowerBound)
coald.dll!coal::collisionRecurse(coal::CollisionTraversalNodeBase * node, unsigned int b1, unsigned int b2, std::list<coal::BVHFrontNode,std::allocator<coal::BVHFrontNode>> * front_list, double & sqrDistLowerBound)
...
coald.dll!coal::collisionRecurse(coal::CollisionTraversalNodeBase * node, unsigned int b1, unsigned int b2, std::list<coal::BVHFrontNode,std::allocator<coal::BVHFrontNode>> * front_list, double & sqrDistLowerBound)
coald.dll!coal::collide(coal::CollisionTraversalNodeBase * node, const coal::CollisionRequest & request, coal::CollisionResult & result, std::list<coal::BVHFrontNode,std::allocator<coal::BVHFrontNode>> * front_list, bool recursive)
coald.dll!coal::details::orientedMeshCollide<coal::MeshCollisionTraversalNode<coal::OBBRSS,0>,coal::OBBRSS>(const coal::CollisionGeometry * o1, const coal::Transform3s & tf1, const coal::CollisionGeometry * o2, const coal::Transform3s & tf2, const coal::CollisionRequest & request, coal::CollisionResult & result)
coald.dll!coal::BVHCollide<coal::OBBRSS>(const coal::CollisionGeometry * o1, const coal::Transform3s & tf1, const coal::CollisionGeometry * o2, const coal::Transform3s & tf2, const coal::CollisionRequest & request, coal::CollisionResult & result)
coald.dll!coal::BVHCollide<coal::OBBRSS>(const coal::CollisionGeometry * o1, const coal::Transform3s & tf1, const coal::CollisionGeometry * o2, const coal::Transform3s & tf2, const coal::GJKSolver * __formal, const coal::CollisionRequest & request, coal::CollisionResult & result)
coald.dll!coal::ComputeCollision::run(const coal::Transform3s & tf1, const coal::Transform3s & tf2, const coal::CollisionRequest & request, coal::CollisionResult & result)
coald.dll!coal::ComputeCollision::operator()(const coal::Transform3s & tf1, const coal::Transform3s & tf2, const coal::CollisionRequest & request, coal::CollisionResult & result)

To me, calls from leafCollides lead to many small allocations of memory from EPA::reset that kill performance. Since sv_store is always sized to 4 elements, a fix may be to set it to a fixed size at build time and create its content on the stack. Another solution would be to let the user provide a memory allocator that would make a better job here than std::allocator. Currently, I have better performance with FCL 0.7.

@lmontaut
Copy link
Contributor

Hi @laguerreche,
You're right.
However, the problem does not come from EPA but from MeshCollisionTraversalNode.

EPA can't have a fixed sized sv_store because its maximum size depends on the max number of iterations of EPA (which is a user parameter, not known at compile time). sv_store is not of size 4.
This is fine for other pairs of objects because the ComputeCollision functor reuses the same EPA object, so a malloc occurs only once (first time calling operator() on the shapes).
However, for MeshCollisionTraversalNode specifically, an EPA object is recreated every time it calls leafCollides... So indeed, the EPA::reset malloc occurs everytime leafCollides is called.

I can't remember exactly why this is done, but I had marked to fix that that on my todo list:

// TODO(louis): MeshCollisionTraversalNode should have its own GJKSolver.
GJKSolver solver(this->request);

What's for sure is that it's a mistake, which should not be hard to fix.

@lmontaut lmontaut self-assigned this Jan 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants