[CI] RestoreTemplateWithMatchOnlyTextMapperIT test failing #107515

Closed
gmarouli opened this issue Apr 16, 2024 · 16 comments · Fixed by #120392
Assignees
Labels
medium-risk An open issue or test failure that is a medium risk to future releases :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch >test-failure Triaged test failures from CI

Comments

@gmarouli
Contributor

While working on #107281, we noticed that running testPartialRestoreSnapshotThatIncludesDataStream with the global state included fails with an assertion error saying that the cluster state size does not match.

In #107514 we introduce a test that demonstrates this issue without any other changes. Based on other similar failures (#59140), we believe that something is being mutated on the local node after the restore, but this is only an assumption.

We are not sure whether this is an issue in restore or in mapping, which is why we added both labels. Please feel free to change them accordingly.

Build scan:
https://gradle-enterprise.elastic.co/s/ceaa7hizih32e/tests/:modules:mapper-extras:internalClusterTest/org.elasticsearch.index.mapper.RestoreTemplateWithMatchOnlyTextMapperIT/test

Reproduction line:

./gradlew ':modules:mapper-extras:internalClusterTest' --tests "org.elasticsearch.index.mapper.RestoreTemplateWithMatchOnlyTextMapperIT.test" -Dtests.seed=A1FA43D50CA25035 -Dtests.locale=uk -Dtests.timezone=America/Yellowknife -Druntime.java=21

Applicable branches:
main

Reproduces locally?:
Yes

Failure history:
Failure dashboard for org.elasticsearch.index.mapper.RestoreTemplateWithMatchOnlyTextMapperIT#test

Failure excerpt:

java.lang.AssertionError: cluster state size does not match expected:<4659> but was:<4669>

  at __randomizedtesting.SeedInfo.seed([A1FA43D50CA25035:29AE7C0FA25E3DCD]:0)
  at org.junit.Assert.fail(Assert.java:89)
  at org.junit.Assert.failNotEquals(Assert.java:835)
  at org.junit.Assert.assertEquals(Assert.java:647)
  at org.elasticsearch.test.ESIntegTestCase.ensureClusterStateConsistency(ESIntegTestCase.java:1235)
  at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:586)
  at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2331)
  at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.lang.Thread.run(Thread.java:1583)

@gmarouli gmarouli added :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/Mapping The storage related side of mappings >test-failure Triaged test failures from CI labels Apr 16, 2024
@elasticsearchmachine elasticsearchmachine added blocker Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. Team:Search Meta label for search team Team:StorageEngine labels Apr 16, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@volodk85 volodk85 added medium-risk An open issue or test failure that is a medium risk to future releases and removed blocker labels Apr 22, 2024
@arteam arteam self-assigned this Apr 26, 2024
@arteam
Contributor

arteam commented May 3, 2024

@gmarouli it seems that the test passes if you wrap the mappings inside _doc.

{
  "_doc": {
    "properties": {
      "@timestamp": {
        "format": "date_optional_time||epoch_millis",
        "type": "date"
      },
      "flag": {
        "type": "boolean"
      },
      "message": {
        "type": "match_only_text"
      }
    }
  }
}

@arteam
Contributor

arteam commented May 3, 2024

If you store the compressed mapping in Template without _doc, for some reason it reads back the compressed mapping wrapped in _doc.
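
To illustrate why a `_doc` wrapper alone is enough to change a serialized size, here is a minimal sketch. It works on plain JSON strings and does not use any Elasticsearch classes; the class and method names (`WrappedMappingSize`, `wrapInDoc`, `utf8Size`) are hypothetical, and it is not meant to reproduce the exact 10-byte difference from the failure above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: not the Elasticsearch implementation.
class WrappedMappingSize {

    // Number of bytes the JSON text occupies when encoded as UTF-8.
    static int utf8Size(String json) {
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    // Puts the given mapping under a single top-level "_doc" key.
    static String wrapInDoc(String mappingJson) {
        return "{\"_doc\":" + mappingJson + "}";
    }

    public static void main(String[] args) {
        String stored = "{\"properties\":{\"message\":{\"type\":\"match_only_text\"}}}";
        String readBack = wrapInDoc(stored);
        // Same logical mapping, but the two texts have different sizes.
        System.out.println(utf8Size(stored) + " bytes stored, " + utf8Size(readBack) + " bytes read back");
    }
}
```

If one side of a comparison holds the unwrapped form and the other holds the wrapped form, a byte-size check would fail even though the fields described are the same.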

@arteam arteam assigned gmarouli and unassigned arteam May 3, 2024
@arteam arteam removed the :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs label May 3, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 3, 2024
@arteam
Contributor

arteam commented May 6, 2024

#78746 seems to be related to this issue

@arteam
Contributor

arteam commented May 6, 2024

Looks like the core issue is that MetadataIndexTemplateService#wrapMappingsIfNecessary and Template#reduceMappings alter the mappings, which is why we end up with templates of different sizes.
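
As a rough sketch of the kind of alteration described here, the following models wrapping and unwrapping on plain `Map`s. The helpers `wrapIfNecessary`, `unwrap` and `logicallyEqual` are simplified, hypothetical stand-ins written for this example; they are not the actual bodies of `wrapMappingsIfNecessary` or `reduceMappings`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of wrapping/unwrapping a mapping under "_doc".
class MappingWrapSketch {
    static final String TYPE = "_doc";

    // Wraps the mapping under "_doc" unless "_doc" is already its only top-level key.
    static Map<String, Object> wrapIfNecessary(Map<String, Object> mapping) {
        if (mapping.size() == 1 && mapping.containsKey(TYPE)) {
            return mapping;
        }
        Map<String, Object> wrapped = new LinkedHashMap<>();
        wrapped.put(TYPE, mapping);
        return wrapped;
    }

    // Strips a single top-level "_doc" key if present; otherwise returns the input.
    @SuppressWarnings("unchecked")
    static Map<String, Object> unwrap(Map<String, Object> mapping) {
        if (mapping.size() == 1 && mapping.get(TYPE) instanceof Map) {
            return (Map<String, Object>) mapping.get(TYPE);
        }
        return mapping;
    }

    // Compares two mappings after removing the optional wrapper from both.
    static boolean logicallyEqual(Map<String, Object> a, Map<String, Object> b) {
        return unwrap(a).equals(unwrap(b));
    }

    public static void main(String[] args) {
        Map<String, Object> stored = Map.of("properties", Map.of("message", Map.of("type", "match_only_text")));
        Map<String, Object> readBack = wrapIfNecessary(stored);
        System.out.println("structurally equal: " + stored.equals(readBack));
        System.out.println("logically equal:    " + logicallyEqual(stored, readBack));
    }
}
```

In this model the stored and read-back forms differ structurally but compare equal once both are normalized, which is one way to picture how two nodes could hold the same template with different serialized sizes.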

@arteam
Contributor

arteam commented May 6, 2024

Another way to fix the test is to use out.writeString(this.mappings.string()); and CompressedXContent.fromJSON(in.readString()) to store/read the mappings, but that defeats the purpose of storing them compressed.
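
The trade-off mentioned here can be pictured with a small sketch that compares the size of a mapping sent as a plain string against the same mapping sent as compressed bytes. It uses only `java.util.zip` from the JDK as a stand-in; `CompressedXContent` is not used, and the class `MappingWireSizeSketch` with its helpers is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical stand-in for "write compressed bytes" vs "write the JSON string".
class MappingWireSizeSketch {

    // Builds a repetitive mapping with the given number of match_only_text fields.
    static String sampleMapping(int fields) {
        StringBuilder sb = new StringBuilder("{\"properties\":{");
        for (int i = 0; i < fields; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("\"field_").append(i).append("\":{\"type\":\"match_only_text\"}");
        }
        return sb.append("}}").toString();
    }

    // Compresses the UTF-8 bytes of the JSON text with DEFLATE.
    static byte[] compress(String json) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restores the JSON text from bytes produced by compress().
    static String decompress(byte[] compressed) {
        try (InflaterInputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String mapping = sampleMapping(50);
        int plain = mapping.getBytes(StandardCharsets.UTF_8).length;
        int compressed = compress(mapping).length;
        System.out.println("plain string: " + plain + " bytes, compressed: " + compressed + " bytes");
    }
}
```

Sending the string form makes both sides re-derive the compressed bytes locally, at the cost of putting the larger uncompressed text on the wire.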

@gmarouli
Contributor Author

gmarouli commented May 8, 2024

@arteam thanks for looking into this! I am still a bit baffled as to why this also gets triggered by using "type": "match_only_text" in a non-template mapping. I will see if I can create a different example.

@lkts
Contributor

lkts commented May 29, 2024

Hey, I was looking at this because it is tagged with storage engine, but I don't think RestoreTemplateWithMatchOnlyTextMapperIT exists. Am I missing something?

@arteam
Contributor

arteam commented May 30, 2024

@lkts I believe that test was added as a reproducer of the issue in #107514

@lkts
Contributor

lkts commented May 31, 2024

Test from #107514 fails for me even with

{
  "_doc": {
    "properties": {
      "@timestamp": {
        "format": "date_optional_time||epoch_millis",
        "type": "date"
      },
      "flag": {
        "type": "boolean"
      },
      "message": {
        "type": "match_only_text"
      }
    }
  }
}

It passes ~10% of the time so there must be some randomness somewhere?

@lkts
Copy link
Contributor

lkts commented Jul 2, 2024

I'll remove the storage engine team from here, since the issue seems to be in MetadataIndexTemplateService, as @arteam mentioned above, and logically the mappings are identical.

@lkts lkts removed Team:StorageEngine :StorageEngine/Mapping The storage related side of mappings labels Jul 2, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@elasticsearchmachine elasticsearchmachine added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 2, 2024
@elasticsearchmachine elasticsearchmachine closed this as not planned Won't fix, can't repro, duplicate, stale Nov 5, 2024
@elasticsearchmachine
Collaborator

This issue has been closed because it has been open for too long with no activity.

Any muted tests that were associated with this issue have been unmuted.

If the tests begin failing again, a new issue will be opened, and they may be muted again.

@elasticsearchmachine
Collaborator

This issue is getting re-opened because there are still AwaitsFix mutes for the given test. It will likely be closed again in the future.
