Suppress most cli errors, but still warn if we get too many #416

Merged
merged 5 commits into from
Jan 23, 2025

Conversation

plars
Contributor

@plars plars commented Nov 20, 2024

Description

When running a job while polling with the cli, there may be an occasional network delay that can cause you to get errors like this:

ERROR: 2024-11-20 19:04:11 client.py:64 -- Timeout while trying to communicate with the server.
WARNING: 2024-11-20 19:04:11 __init__.py:874 -- Unable to retrieve job state.
unknown

These can usually be ignored since the cli will try again, but users often take them to mean that something is wrong when it isn't. However, there's also a possibility that the server is unreachable for a long time, and we don't want to hide that from the user if it's happening.

I think this takes a pretty balanced approach: it silences most of these warnings and errors (except when the error is going to be fatal), while keeping a counter of consecutive timeout/connection errors. It warns the user that something could be wrong each time the counter reaches a multiple of $TESTFLINGER_ERROR_THRESHOLD (default 3) consecutive errors, and also indicates that it will keep retrying.
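
For readers who want a picture of the counting behaviour described above, here is a minimal sketch. It is not the code from this PR: the class `ConsecutiveErrorTracker`, the function `poll_once`, and the warning text are hypothetical names chosen for illustration, and only the default of 3 comes from the description.

```python
import logging

logger = logging.getLogger(__name__)

# Default taken from the PR description; everything else here is illustrative.
DEFAULT_ERROR_THRESHOLD = 3


class ConsecutiveErrorTracker:
    """Hypothetical helper that counts consecutive polling failures."""

    def __init__(self, threshold=DEFAULT_ERROR_THRESHOLD):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_errors = 0

    def record_success(self):
        """Any successful request resets the streak."""
        self.consecutive_errors = 0

    def record_failure(self):
        """Count one failure; return True when a warning is due."""
        self.consecutive_errors += 1
        return self.consecutive_errors % self.threshold == 0


def poll_once(get_state, tracker):
    """Call get_state(); stay quiet on transient errors below the threshold.

    Returns the state on success, or None if the request failed.
    """
    try:
        state = get_state()
    except (TimeoutError, ConnectionError):
        if tracker.record_failure():
            logger.warning(
                "%d consecutive errors talking to the server; "
                "something may be wrong, but retrying will continue.",
                tracker.consecutive_errors,
            )
        return None
    tracker.record_success()
    return state
```

With the default threshold, the third, sixth, ninth (and so on) consecutive failures produce a warning, and a single successful poll resets the count.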

Resolved issues

CERTTF-283

Documentation

Added a reference section to the documentation about the testflinger config command, and the supported configuration settings.

Web service API changes

N/A

Tests

Additional unit tests added

@plars plars requested a review from a team December 3, 2024 14:08
Contributor

@boukeas boukeas left a comment

The changes are very useful as it is indeed the case that users may not know how to interpret these messages.

I do believe that a requirement for 10 consecutive failures will effectively suppress all messages, even when there are actual networking issues, thus making these messages ineffective as a diagnostic tool. So my suggestions are to:

  • retrieve the number of messages required to display a warning from the config file, so that we are able to control it more easily
  • reduce the number to 5 or even 3
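
One way the first suggestion could look is sketched below. This is an assumption-laden illustration rather than the project's implementation: the function `resolve_error_threshold`, the config key `error_threshold`, and the choice to let the environment variable override the config file are all hypothetical. Only the variable name `TESTFLINGER_ERROR_THRESHOLD` and the default of 3 appear in the PR description.

```python
import os

DEFAULT_ERROR_THRESHOLD = 3  # default stated in the PR description


def resolve_error_threshold(config=None, environ=None):
    """Pick the warning threshold from the environment, then config, then default.

    `config` is assumed to be a plain dict of settings; the key name
    "error_threshold" is hypothetical.
    """
    environ = os.environ if environ is None else environ
    config = {} if config is None else config

    raw = environ.get("TESTFLINGER_ERROR_THRESHOLD", config.get("error_threshold"))
    if raw is None:
        return DEFAULT_ERROR_THRESHOLD
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Unparseable values fall back to the default instead of crashing the cli.
        return DEFAULT_ERROR_THRESHOLD
    return value if value >= 1 else DEFAULT_ERROR_THRESHOLD
```

Falling back to the default on invalid input keeps a typo in a setting from turning into a fatal error while polling.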

cli/testflinger_cli/client.py (outdated, resolved)
@plars plars force-pushed the suppress-cli-server-errors branch from 6132a08 to 7cd7fe0 Compare December 9, 2024 17:10
@plars plars requested review from boukeas and tang-mm December 9, 2024 17:15
Contributor

@tang-mm tang-mm left a comment

Thanks for adding the reference! I just need a few clarifications on the command usage.
Also, in the previous how-to guide on authentication, we asked users to reload an .env file to update their environment variables. Should we also change that doc to recommend the cli method?

docs/reference/cli-config.rst (resolved)
docs/reference/cli-config.rst (resolved)
@plars plars force-pushed the suppress-cli-server-errors branch from 3cabd62 to 13effc5 Compare January 6, 2025 17:57
@plars plars force-pushed the suppress-cli-server-errors branch from 13effc5 to 30f2ab3 Compare January 6, 2025 18:33
@plars plars requested a review from tang-mm January 6, 2025 18:35
@plars plars merged commit 4ba3eee into main Jan 23, 2025
5 checks passed
@plars plars deleted the suppress-cli-server-errors branch January 23, 2025 15:03
3 participants