[CI] XPackRestIT class failing #120816
…xpack.test.rest.XPackRestIT #120816
This has been muted on branch main Mute Reasons:
Build Scans:
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Pinging @elastic/ml-core (Team:ML)
Looking at the logs, it looks like ML is crashing the YAML test nodes. The final lines of the log include:
Higher up in the node logs I see this as well, but this might not be a big deal?
It seems at least one ML test is to blame for some of the valid failures:
Many test failures in a build are from this:
https://gradle-enterprise.elastic.co/s/4rprshounwmig

@davidkyle ^ Maybe related to #120405? This is also the build where the node crashed. It may be unrelated, but the node crashed because more bytes were subtracted from the circuit breaker than the model had added.
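To make the reported failure mode concrete, here is a minimal Java sketch of byte accounting against a limit. `ByteAccountingBreaker` and its `reserve`/`release` methods are hypothetical names and are not the Elasticsearch `CircuitBreaker` API; the sketch only shows why releasing more bytes than a model reserved leaves the accounting negative, the kind of inconsistency described above.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical byte-accounting breaker, NOT the Elasticsearch CircuitBreaker API.
 * It illustrates why releasing more bytes than were reserved is a bug.
 */
public class BreakerAccountingSketch {
    static final class ByteAccountingBreaker {
        private final long limitBytes;
        private final AtomicLong usedBytes = new AtomicLong();

        ByteAccountingBreaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        /** Reserve bytes, failing (and rolling back) if the limit would be exceeded. */
        void reserve(long bytes) {
            long newUsed = usedBytes.addAndGet(bytes);
            if (newUsed > limitBytes) {
                usedBytes.addAndGet(-bytes); // undo this reservation
                throw new IllegalStateException("breaker tripped: " + newUsed + " > " + limitBytes);
            }
        }

        /** Release bytes; accounted usage must never drop below zero. */
        void release(long bytes) {
            long newUsed = usedBytes.addAndGet(-bytes);
            if (newUsed < 0) {
                // Over-release: the accounting is now inconsistent.
                throw new IllegalStateException("released more bytes than reserved: used=" + newUsed);
            }
        }

        long used() {
            return usedBytes.get();
        }
    }

    public static void main(String[] args) {
        ByteAccountingBreaker breaker = new ByteAccountingBreaker(1_000);
        breaker.reserve(400); // hypothetical model load accounts 400 bytes
        breaker.release(400); // matching unload brings usage back to 0
        System.out.println("used=" + breaker.used());
        try {
            breaker.release(400); // a second, mismatched release drives usage negative
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Pairing every `reserve` with exactly one `release` of the same size is what keeps the counter consistent; a double release on an unload or cleanup path is one way the count can go negative.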
The XPackRestIT suite has been unmuted in #120859. Most of the PR build failures relate to an error in monitoring:
All of those stem from draft PR #120718, which changes the wording of the deprecation message, so we can safely ignore them.
This error is from the test run where the node stopped due to the circuit breaker (CB) exception. It comes from the startup of the next test, which waits for a condition that will never be satisfied because the node has gone. Later in the console output we see the node is talking to the client.
That leaves a few ML issues to investigate:

1. Circuit Breaker Error
I have absolutely no idea how this can occur, but it did, and it took down the node, which caused many test failures.

2. Exception logged by MlInitializationService
This did not cause any test failures, but it is suspicious. Here we have a race condition between the post-test feature reset and the … It might be possible for …

3. Bug in ML feature reset
In this case and others, the failure occurred because the ML feature reset API call failed (timed out).
Cause: probably related to #120405.

I've removed the blocker label as the suite has now been unmuted, and set this to medium risk while we investigate the above.
This has been muted on branch main Mute Reasons:
Build Scans:
elasticsearch-intake #16416 / part3 is the circuit breaker failure. The PR build failures are unrelated to ML. I've muted the inference tests that cause the circuit breaker to trip and unmuted the test suite in #120897.
Mute failing inference_crud yml tests and unmute the rest of XPackRestIT For #120816
This has been muted on branch main Mute Reasons:
Build Scans:
ugh:
This seems to be causing timeouts. We are definitely fighting the bot here to keep it from muting the suite.
The bot is muting the suite because the failures occur in the test setup: an assertion takes down the node, and then the next test fails. In elasticsearch-intake #16817 / part3 the assertion failure comes from the ML auditor code:

The cause is a connection exception because the node has gone, and once the node has gone every subsequent test fails, racking up the failure count.

The PR failures are unrelated and are genuine test failures stemming from changes in open PRs. Most are from these two PRs: #121099 and #121078. The signal from these failures is being misinterpreted: they are due to uncommitted code in the PRs, not code in the main branch. However, these failures are part of the reason the bot is muting the suite.

elasticsearch-intake #16817 / part3 is a transform test, so I've muted all ML and transform tests while we figure this one out; see #121377.

The failing assertion is protected by a null check (https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/common/notifications/AbstractAuditor.java#L175-L179) and will not trigger an error in production code. The assertion indicates that something unexpected has happened, probably due to a race condition. For this reason I don't consider this a blocker and will remove the label once the suite is unmuted by merging #121377.
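The general pattern described above, an assertion backed by a null check, can be sketched as follows. `SketchAuditor`, `setSink`, `audit` and `dropped` are hypothetical names and are not the code in `AbstractAuditor`. Java `assert` statements only execute when the JVM runs with `-ea`; assuming, as this thread implies, that the test nodes run with assertions enabled, the assertion surfaces the suspected race in tests, while without `-ea` the null check simply drops the message.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical auditor illustrating an assert backed by a null check.
 * Not the actual AbstractAuditor code; all names here are made up.
 */
public class GuardedAssertionSketch {
    static final class SketchAuditor {
        private volatile List<String> sink; // may still be null early in startup or during shutdown
        private int dropped;

        void setSink(List<String> sink) {
            this.sink = sink;
        }

        /** Returns true if the message was written, false if it was dropped. */
        boolean audit(String message) {
            List<String> local = sink; // read the volatile field once
            // With -ea this fails loudly and exposes the unexpected ordering.
            assert local != null : "audit called before sink was initialised";
            if (local == null) {
                // Without -ea this guard is what actually runs: drop the message, no error.
                dropped++;
                return false;
            }
            local.add(message);
            return true;
        }

        int dropped() {
            return dropped;
        }
    }

    public static void main(String[] args) {
        SketchAuditor auditor = new SketchAuditor();
        try {
            boolean written = auditor.audit("too early");
            System.out.println("written=" + written + " dropped=" + auditor.dropped());
        } catch (AssertionError e) {
            // Only reached when the JVM runs with -ea.
            System.out.println("assertion tripped: " + e.getMessage());
        }
        List<String> messages = new ArrayList<>();
        auditor.setSink(messages);
        System.out.println("written=" + auditor.audit("ok") + " messages=" + messages);
    }
}
```

Reading the field into a local once avoids a check-then-use race where the field changes between the null check and the write.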
This has been muted on branch 8.18 Mute Reasons:
Build Scans:
The latest failure is the auditor assertion again.
XPackRestIT is unmuted and the ML tests are muted for 8.18 in #121765.
Build Scans:
Reproduction Line:
Applicable branches: 8.18
Reproduces locally?: N/A
Failure History: See dashboard
Failure Message:
Issue Reasons:
Note: This issue was created using new test triage automation. Please report issues or feedback to es-delivery.