
refactor bwc test suite to re-use existing resources between tests #1171

Draft · wants to merge 11 commits into base: main

Conversation

will-hwang
Contributor

@will-hwang will-hwang commented Feb 5, 2025

Description

Currently, models and pipelines are re-created and re-deployed for every test, leading to redundant model loads and pipeline creations. This change removes that redundancy by reusing the created resources across test cases.
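The reuse pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the names `TestResourceCache`, `getOrUploadModel`, and the simulated upload are hypothetical stand-ins for the suite's real upload-and-deploy helpers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TestResourceCache {
    // Model ids already created during this test run, keyed by resource name.
    private static final Map<String, String> MODEL_IDS = new ConcurrentHashMap<>();
    static final AtomicInteger uploadCount = new AtomicInteger();

    // Stand-in for the real upload + deploy round trip against the cluster.
    private static String uploadTextEmbeddingModel(String key) {
        uploadCount.incrementAndGet();
        return "model-" + key;
    }

    // Reuses an existing model id when present; uploads only on first use.
    static String getOrUploadModel(String key) {
        return MODEL_IDS.computeIfAbsent(key, TestResourceCache::uploadTextEmbeddingModel);
    }
}
```

With this shape, the first test that needs a text-embedding model pays the deployment cost and every later test receives the cached id, which is the redundancy the description refers to.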

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

@will-hwang will-hwang force-pushed the refactor_bwc_test_suite branch from dbb1174 to 69edfc7 Compare February 5, 2025 06:18
Signed-off-by: will-hwang <[email protected]>

codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.72%. Comparing base (e8ed3a4) to head (89bbcb5).

Additional details and impacted files
@@             Coverage Diff             @@
##               main    #1171     +/-   ##
===========================================
  Coverage     81.72%   81.72%             
+ Complexity     2494     1247   -1247     
===========================================
  Files           186       93     -93     
  Lines          8426     4213   -4213     
  Branches       1428      714    -714     
===========================================
- Hits           6886     3443   -3443     
+ Misses         1000      500    -500     
+ Partials        540      270    -270     


@vibrantvarun
Member

Why do we need this change? Each test should run independently in a dedicated environment with fresh, uncontaminated data, preventing interference from other tests and ensuring the test accurately evaluates the component under test in isolation.

@vibrantvarun
Member

We have faced issues with this before: during a test run, shared names caused the pipeline to be deleted or the wrong information to be shared between tests.

@@ -28,9 +28,9 @@ public void testSemanticSearch_E2EFlow() throws Exception {
        waitForClusterHealthGreen(NODES_BWC_CLUSTER, 90);
        switch (getClusterType()) {
            case OLD:
-               modelId = uploadTextEmbeddingModel();
+               modelId = getOrUploadTextEmbeddingModel(getIngestionPipeline(PIPELINE_NAME), TEXT_EMBEDDING_PROCESSOR);
Member

In the OLD case it will always upload, so why was it changed to getOrUpload?

@@ -373,11 +373,11 @@ private static int getHitCount(final Map<String, Object> searchResponseAsMap) {
}

public static String getModelId(Map<String, Object> pipeline, String processor) {
assertNotNull(pipeline);
Member

This condition validates that the pipeline created earlier has the model ID. At line 377 we fetch information from the pipeline, so we need to ensure it is not null.

Member

Also, it is a validation condition

@@ -373,11 +373,11 @@ private static int getHitCount(final Map<String, Object> searchResponseAsMap) {
    }

    public static String getModelId(Map<String, Object> pipeline, String processor) {
        assertNotNull(pipeline);
+       if (pipeline == null) return null;
Member

In any case, it will not be null.

@heemin32
Collaborator

heemin32 commented Feb 5, 2025

I suggest reusing only the model; a pipeline is cheap to create. We likely need just one model per type: text, text_embedding, and sparse. There should be a common method that retrieves these models based solely on the type, without requiring the caller to specify anything else. That will ensure the models are shared correctly without errors.

@vibrantvarun The reason for reusing the model is that during BWC tests, multiple tests deploy the same model simultaneously, leading to model deployment failures during node upgrades. This, in turn, causes flaky test failures.
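The per-type retrieval suggested above could look roughly like the sketch below. All names here (`SharedModels`, `ModelType`, `deployModel`, `modelIdFor`) are hypothetical illustrations, not the suite's or ml-commons' API; the point is only that callers pass a type and nothing else.

```java
import java.util.EnumMap;
import java.util.Map;

public class SharedModels {
    enum ModelType { TEXT, TEXT_EMBEDDING, SPARSE }

    // One shared model id per type for the whole test run.
    private static final Map<ModelType, String> MODEL_IDS = new EnumMap<>(ModelType.class);
    static int deployments = 0;

    // Stand-in for the real upload/deploy call against ml-commons.
    private static String deployModel(ModelType type) {
        deployments++;
        return "shared-" + type.name().toLowerCase();
    }

    // Callers pass only the type; no model config or other parameters needed.
    static synchronized String modelIdFor(ModelType type) {
        return MODEL_IDS.computeIfAbsent(type, SharedModels::deployModel);
    }
}
```

Keeping the lookup keyed on a closed enum means no two tests can accidentally deploy the same model under different names, which is the failure mode described above.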

@vibrantvarun
Member

@heemin32 I think the test resources created for any test should be meant only to run that test. The root cause of the model deployment failure is not the redeployment of the model; it must be something else.

@heemin32
Collaborator

heemin32 commented Feb 5, 2025

I think test resources created for any test should be just meant to run it.

Our goal isn't to test model deployment; that should be covered by ml-commons. For us, model deployment is merely a prerequisite for running our feature tests.

Model deployment is resource-intensive, and if each test deploys its own model, the test cluster might not be large enough to handle them all at once. Sharing models across tests shouldn’t affect test coverage or validity in any way.

The primary objective here is to avoid flaky test failures caused by model deployment issues. We also want to ensure that problems in ml-commons (issues with model deployment) don't impact our test velocity. To further minimize test flakiness related to model deployment, we could even consider using lightweight or mock models.

@vibrantvarun
Member

@heemin32 in BWC we do test that a model deployed on an older-version node still exists on the newer-version node. Because of that test, we are always able to find critical issues with ml-commons just before a release.

@vibrantvarun
Member

FYI: to date, ml-commons does not have a BWC framework in place. We, in neural-search, catch these issues in BWC tests.

@vibrantvarun
Member

vibrantvarun commented Feb 5, 2025

@heemin32 I am aligned with you on reducing test flakiness, but we need to find the actual root cause. Here we are reducing our own overhead by doing fewer model deployments, at the cost of removing the model deployment test from BWC. Model deployment is part of every feature we release.


3 participants