Add timeout fundamentals for daemon client communication #886

Draft
cottsay wants to merge 1 commit into rolling from cottsay/daemon-client-timeout

Member

@cottsay cottsay commented Mar 5, 2024

This PR is more of a proof-of-concept than a concrete proposal.

If there is a broken ROS 2 daemon process or another completely unrelated TCP server listening on the corresponding XMLRPC port, it's possible for calls like is_daemon_running to hang for a VERY long time.

For example:

  • In one shell, start a simple TCP server on port 11511: nc -k -l 11511
  • In another shell, run ros2 daemon status

You can see the XMLRPC request arrive on the server, but because no response is ever sent, the call to ros2 daemon status just sits there. I'm not sure how long it takes for the "global default" timeout to kick in; I haven't waited long enough to see it.
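The reproduction above can also be sketched in pure Python, which is closer to what a test would need. This is only an illustration under stated assumptions: start_silent_server and probe are hypothetical helpers, not part of ros2cli, and the silent server stands in for nc -k -l 11511 on an ephemeral port. In CPython, socket.getdefaulttimeout() returns None unless something has called socket.setdefaulttimeout(), and None means blocking with no timeout at all, which may explain why the call appears to hang indefinitely.

```python
import socket
import threading
import xmlrpc.client


def start_silent_server():
    """Accept one TCP connection on an ephemeral localhost port and never reply.

    Hypothetical helper standing in for `nc -k -l <port>`.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)
    held = []  # keep accepted sockets referenced so they stay open

    def accept_and_hold():
        try:
            conn, _ = listener.accept()
            held.append(conn)
        except OSError:
            pass  # listener was closed before any client connected

    threading.Thread(target=accept_and_hold, daemon=True).start()
    return listener, held


def probe(port, timeout):
    """Issue one XMLRPC call under a temporary global default socket timeout."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout)
    try:
        proxy = xmlrpc.client.ServerProxy(f'http://127.0.0.1:{port}')
        proxy.system.listMethods()
        return 'responded'
    except socket.timeout:
        # The server accepted the request but never answered.
        return 'timed out'
    finally:
        # Restore whatever the process-wide default was before.
        socket.setdefaulttimeout(previous)
```

Calling probe(port, 0.5) against the silent server returns after roughly half a second instead of blocking. Changing the process-wide default is used here only to keep the sketch short; a per-connection timeout is less invasive.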

This can be a particularly bad problem in this package's tests, many of which connect to and sometimes create and destroy daemon processes. It would be nice if those tests didn't hang.

Possible mitigation for problems like #610 and #737

@cottsay cottsay force-pushed the cottsay/daemon-client-timeout branch from 22bd616 to 1a0f95e Compare March 5, 2024 22:22
Collaborator

@fujitatomoya fujitatomoya left a comment

having HTTPConnection.timeout set looks reasonable to me.
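For readers without the diff at hand, setting HTTPConnection.timeout for an XMLRPC client could look roughly like the sketch below. It is an assumption about the general approach, not the PR's actual change: TimeoutTransport is a hypothetical name, and it relies only on xmlrpc.client.Transport.make_connection returning an http.client.HTTPConnection whose timeout attribute is read when the socket is opened.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """XMLRPC transport that applies a per-connection socket timeout.

    Hypothetical class for illustration; not taken from ros2cli.
    """

    def __init__(self, *args, timeout=None, **kwargs):
        super().__init__(*args, **kwargs)
        # None keeps the stock behavior (global default timeout).
        self._timeout = timeout

    def make_connection(self, host):
        # The base class builds (and caches) an http.client.HTTPConnection.
        connection = super().make_connection(host)
        if self._timeout is not None:
            # HTTPConnection uses this attribute when it opens the socket.
            connection.timeout = self._timeout
        return connection
```

A client would then be built with something like xmlrpc.client.ServerProxy('http://127.0.0.1:11511', transport=TimeoutTransport(timeout=10.0)), where the 10 second value is only a placeholder.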

Comment on lines +76 to +77
for the daemon node to respond. If it is not given,
the global default timeout setting is used.
Collaborator

Using the global default timeout, the call would never time out in my environment... Accessing a server on localhost to list its methods is not expected to take more than 10 seconds, so maybe we can set a specific default timeout here?

Member Author

I can't think of any reasonable circumstances where it would take longer than that either. I'm in favor of a default timeout as well, but I think we need more feedback on what value might be appropriate.

I also think we need to improve the error message if it becomes the default behavior. Right now, the exception is unhandled.
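One way the unhandled exception might be turned into a readable message is sketched below. Everything here is hypothetical: daemon_status_message is an invented helper, call stands for whatever zero-argument callable performs the XMLRPC request, and the message strings are illustrative rather than the CLI's actual output.

```python
import socket


def daemon_status_message(call, timeout):
    """Run `call` and translate common socket failures into a message.

    Hypothetical helper: `call` is any zero-argument callable that performs
    the XMLRPC request; `timeout` is only used for the message text.
    """
    try:
        call()
    except socket.timeout:
        # Something accepted the connection but never answered.
        return (
            f'Timed out after {timeout} seconds waiting for a response; '
            'the port may be held by a broken daemon or an unrelated server')
    except ConnectionRefusedError:
        # Nothing is listening on the port at all.
        return 'The daemon is not running'
    return 'The daemon is running'
```

Separating the timeout case from the refused-connection case keeps "nothing is listening" distinct from "something is listening but not answering", which is the situation described in this PR.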
