Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort studies by name and add basic study intersection check. #1023

Merged
merged 3 commits into from
Apr 26, 2024
Merged

Conversation

goodov
Copy link
Member

@goodov goodov commented Apr 25, 2024

Sort studies by name before generating seed.

This is a required change to migrate to per-file study structure, because per-file structure implies study enumeration in alphabetical order (as files lay on the file system).

This change should be a no-op as long as we don't have studies intersection. This PR adds the required validation to ensure we do not have this kind of intersection, which means if the seed is validated, then it doesn't depend on the initial order of studies in seed.json and it's safe to reorder studies.

Related brave/brave-browser#33654

@goodov goodov marked this pull request as ready for review April 25, 2024 14:49
@goodov goodov requested review from atuchin-m and iefremov April 25, 2024 14:54
@goodov goodov changed the title Sort studies by name and add basic study intersection validation. Sort studies by name and add basic study intersection check. Apr 25, 2024
@@ -213,6 +281,7 @@ def main():

print("Load", args.seed_path.name)
seed_data = json.load(args.seed_path)
seed_data['studies'].sort(key=lambda study: study['name'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose python sort is stable?
Let's add a comment we can't reorder the study with the same name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we reorder the studies with the same name if they do not overlap by filters?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they don't overlap there is no problem.
If I'm not mistaken overlapped studies with the same name is used in Yandex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they don't overlap there is no problem. If I'm not mistaken overlapped studies with the same name is used in Yandex.

We should not allow overlapped studies, added checks ensures we don't.

seed/serialize.py Outdated Show resolved Hide resolved
version_list = []
for part in parts:
if part == '*':
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm not 100% sure about this part.
* means different for min/max version.
as for Python is alway [1,2] > [1]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm not 100% sure about this part. * means different for min/max version. as for Python is alway [1,2] > [1]

yes, you're right. This comparison is tricky. Fixed and add tests from the C++ version.

version1 = version_to_int_array(test_case[0])
version2 = version_to_int_array(test_case[1])
assert compare_versions(version1, version2) == test_case[2]

Copy link
Member Author

@goodov goodov Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty dirty (having tests and logic in the same file), but we should remove this file entirely in the near future.

test_version_comparison()
for studies in feature_names_to_studies.values():
for i, study1 in enumerate(studies):
study1_platform = get_study_platforms(study1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could study_platform or study_channel be empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could study_platform or study_channel be empty?

yep. improved the intersection check to cover this.

@goodov goodov enabled auto-merge (squash) April 26, 2024 10:43
@goodov goodov merged commit 86ceff9 into main Apr 26, 2024
6 checks passed
@goodov goodov deleted the sort-seed branch April 26, 2024 10:43
goodov added a commit that referenced this pull request Apr 26, 2024
* Sort studies by name and add basic study intersection validation.

* Fix wildcard version comparison, add tests.

* Make sure to properly cover empty channel/platform lists (although this is not allowed).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants