
refactor bwc test suite to re-use existing resources between tests #1171

Draft · wants to merge 11 commits into base: main

Conversation

will-hwang
Contributor

@will-hwang will-hwang commented Feb 5, 2025

Description

Currently, models and pipelines are re-created and re-deployed for every test, leading to redundant model loads and pipeline creations. This change removes that redundancy by reusing the created resources across test cases.
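The reuse pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the names `TestResourceCache`, `getOrUploadModel`, and the simulated upload are hypothetical stand-ins for the suite's real upload-and-deploy helpers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class TestResourceCache {
    // Model ids already created during this test run, keyed by resource name.
    private static final Map<String, String> MODEL_IDS = new ConcurrentHashMap<>();
    static final AtomicInteger uploadCount = new AtomicInteger();

    // Stand-in for the real upload + deploy round trip against the cluster.
    private static String uploadTextEmbeddingModel(String key) {
        uploadCount.incrementAndGet();
        return "model-" + key;
    }

    // Reuses an existing model id when present; uploads only on first use.
    static String getOrUploadModel(String key) {
        return MODEL_IDS.computeIfAbsent(key, TestResourceCache::uploadTextEmbeddingModel);
    }
}
```

With this shape, the first test that needs a text-embedding model pays the deployment cost and every later test receives the cached id, which is the redundancy the description refers to.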

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

@will-hwang will-hwang force-pushed the refactor_bwc_test_suite branch from dbb1174 to 69edfc7 Compare February 5, 2025 06:18
Signed-off-by: will-hwang <[email protected]>

codecov bot commented Feb 5, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.72%. Comparing base (e8ed3a4) to head (89bbcb5).

Additional details and impacted files
@@             Coverage Diff             @@
##               main    #1171     +/-   ##
===========================================
  Coverage     81.72%   81.72%             
+ Complexity     2494     1247   -1247     
===========================================
  Files           186       93     -93     
  Lines          8426     4213   -4213     
  Branches       1428      714    -714     
===========================================
- Hits           6886     3443   -3443     
+ Misses         1000      500    -500     
+ Partials        540      270    -270     


@vibrantvarun
Member

Why do we need this change? Each test should run independently in a dedicated environment with fresh, uncontaminated data, preventing interference from other tests and ensuring the test accurately evaluates the component under test in isolation.

@vibrantvarun
Member

We have faced issues with this before: during a test run, shared names caused the pipeline to be deleted or the wrong information to be shared between tests.

@@ -28,9 +28,9 @@ public void testSemanticSearch_E2EFlow() throws Exception {
        waitForClusterHealthGreen(NODES_BWC_CLUSTER, 90);
        switch (getClusterType()) {
            case OLD:
-               modelId = uploadTextEmbeddingModel();
+               modelId = getOrUploadTextEmbeddingModel(getIngestionPipeline(PIPELINE_NAME), TEXT_EMBEDDING_PROCESSOR);
Member

In the OLD case it will always upload, so why was it changed to getOrUpload?

@@ -373,11 +373,11 @@ private static int getHitCount(final Map<String, Object> searchResponseAsMap) {
}

public static String getModelId(Map<String, Object> pipeline, String processor) {
assertNotNull(pipeline);
Member

This condition validates that the pipeline created earlier has the model ID. At line 377 we fetch information from the pipeline, so we need to ensure it is not null.

Member

Also, it is a validation condition

@@ -373,11 +373,11 @@ private static int getHitCount(final Map<String, Object> searchResponseAsMap) {
    }

    public static String getModelId(Map<String, Object> pipeline, String processor) {
        assertNotNull(pipeline);
+       if (pipeline == null) return null;
Member

In any case, it will not be null.

@heemin32
Collaborator

heemin32 commented Feb 5, 2025

I suggest reusing only the model; a pipeline is cheap to create. We likely need just one model per type: text, text_embedding, and sparse. There should be a common method that retrieves these models based solely on the type, without requiring the caller to specify anything else. That will ensure the models are shared correctly without errors.

@vibrantvarun The reason for reusing the model is that during BWC tests, multiple tests deploy the same model simultaneously, leading to model deployment failures during node upgrades. This, in turn, causes flaky test failures.
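The per-type retrieval suggested above could look roughly like the sketch below. All names here (`SharedModels`, `ModelType`, `deployModel`, `modelIdFor`) are hypothetical illustrations, not the suite's or ml-commons' API; the point is only that callers pass a type and nothing else.

```java
import java.util.EnumMap;
import java.util.Map;

public class SharedModels {
    enum ModelType { TEXT, TEXT_EMBEDDING, SPARSE }

    // One shared model id per type for the whole test run.
    private static final Map<ModelType, String> MODEL_IDS = new EnumMap<>(ModelType.class);
    static int deployments = 0;

    // Stand-in for the real upload/deploy call against ml-commons.
    private static String deployModel(ModelType type) {
        deployments++;
        return "shared-" + type.name().toLowerCase();
    }

    // Callers pass only the type; no model config or other parameters needed.
    static synchronized String modelIdFor(ModelType type) {
        return MODEL_IDS.computeIfAbsent(type, SharedModels::deployModel);
    }
}
```

Keeping the lookup keyed on a closed enum means no two tests can accidentally deploy the same model under different names, which is the failure mode described above.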

@vibrantvarun
Member

@heemin32 I think the test resources created for any test should be meant only to run that test. The root cause of the model deployment failure is not the redeployment of the model; it must be something else.

@heemin32
Collaborator

heemin32 commented Feb 5, 2025

I think test resources created for any test should be just meant to run it.

Our goal isn't to test model deployment; that should be covered by ml-commons. For us, model deployment is merely a prerequisite for running our feature tests.

Model deployment is resource-intensive, and if each test deploys its own model, the test cluster might not be large enough to handle them all at once. Sharing models across tests shouldn’t affect test coverage or validity in any way.

The primary objective here is to avoid flaky test failures caused by model deployment issues. We also want to ensure that problems in ml-commons (issues with model deployment) don't impact our test velocity. To further minimize test flakiness related to model deployment, we could even consider using lightweight or mock models.

@vibrantvarun
Member

@heemin32 in BWC we do test that a model deployed on an older-version node still exists on the newer-version node. Because of that test, we are always able to find critical issues with ml-commons just before a release.

@vibrantvarun
Member

FYI: to date, ml-commons does not have a BWC framework in place. We, in neural-search, catch these issues in BWC tests.

@vibrantvarun
Member

vibrantvarun commented Feb 5, 2025

@heemin32 I am aligned with you on reducing test flakiness, but we need to find the actual root cause. Here we are reducing our own overhead by doing fewer model deployments, at the cost of removing the model deployment test from BWC. Model deployment is part of every feature we release.


3 participants