Sort studies by name and add basic study intersection check. #1023

goodov · 2024-04-25T14:47:40Z

Sort studies by name before generating seed.

This change is required to migrate to a per-file study structure, because the per-file structure implies that studies are enumerated in alphabetical order (the order in which the files lie on the file system).

This change should be a no-op as long as no studies intersect. This PR adds the validation required to ensure there is no such intersection: if the seed passes validation, the result doesn't depend on the initial order of studies in seed.json, so it's safe to reorder them.

Related brave/brave-browser#33654

atuchin-m · 2024-04-25T15:05:27Z

seed/serialize.py

@@ -213,6 +281,7 @@ def main():

    print("Load", args.seed_path.name)
    seed_data = json.load(args.seed_path)
+    seed_data['studies'].sort(key=lambda study: study['name'])


I suppose Python's sort is stable?
Let's add a comment that we can't reorder studies with the same name.

why can't we reorder studies with the same name if their filters do not overlap?

If they don't overlap, there is no problem.
If I'm not mistaken, overlapping studies with the same name are used in Yandex.

We should not allow overlapping studies; the added checks ensure we don't.
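On the stability question above: Python's `list.sort` is documented as stable, so studies that share a name keep their relative order from seed.json after the sort. A minimal sketch with made-up study data (the names and filters are illustrative assumptions, not taken from the real seed):

```python
# Hypothetical seed data: two studies share the name "StudyB".
seed_data = {
    "studies": [
        {"name": "StudyB", "filter": {"channel": ["RELEASE"]}},
        {"name": "StudyA", "filter": {"channel": ["BETA"]}},
        {"name": "StudyB", "filter": {"channel": ["NIGHTLY"]}},
    ]
}

# Same call as in the diff. list.sort is stable, so the two "StudyB"
# entries stay in their original relative order (RELEASE before NIGHTLY).
seed_data["studies"].sort(key=lambda study: study["name"])

order = [(s["name"], s["filter"]["channel"][0]) for s in seed_data["studies"]]
print(order)
```

Stability only preserves the original order of same-name studies; it does not make overlapping same-name studies safe, which is why the intersection check is still needed.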

atuchin-m · 2024-04-25T15:16:28Z

seed/serialize.py

+            version_list = []
+            for part in parts:
+                if part == '*':
+                    break


Hm, I'm not 100% sure about this part.
`*` means different things for the min and max version,
whereas in Python [1,2] > [1] always holds.

yes, you're right. This comparison is tricky. Fixed, and added tests from the C++ version.
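To illustrate why plain list comparison is not enough here, below is a rough sketch of a wildcard-aware comparison. It is not the code from this PR: the function names are hypothetical, and the semantics (a trailing `.*` matches any version that starts with the given prefix, and missing components count as zero) are an assumption about how the C++ side behaves.

```python
from itertools import zip_longest


def parse_version(text):
    """Split "1.2.*" into ([1, 2], True) and "1.2.3" into ([1, 2, 3], False)."""
    is_wildcard = text.endswith(".*")
    if is_wildcard:
        text = text[:-2]
    return [int(part) for part in text.split(".")], is_wildcard


def compare_components(a, b):
    """Return -1, 0 or 1; missing trailing components are treated as 0."""
    for x, y in zip_longest(a, b, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


def compare_to_wildcard(version, pattern):
    """Compare a concrete version against a pattern that may end in ".*"."""
    components, _ = parse_version(version)
    prefix, is_wildcard = parse_version(pattern)
    result = compare_components(components, prefix)
    if not is_wildcard or result <= 0:
        return result
    # The version is numerically above the prefix. It still matches the
    # wildcard if every component they share is equal.
    shared = min(len(components), len(prefix))
    return 0 if components[:shared] == prefix[:shared] else 1


# Plain list comparison treats a longer list with the same prefix as greater,
# so [1, 2] > [1] is True even where a wildcard should make them match.
naive = [1, 2] > [1]
```

Under these assumptions `compare_to_wildcard("1.2.3", "1.2.*")` yields 0, while `compare_to_wildcard("1.2.3", "1.3.*")` yields -1, so a wildcard max version admits every version under its prefix.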

goodov · 2024-04-26T09:13:04Z

seed/serialize.py

+        version1 = version_to_int_array(test_case[0])
+        version2 = version_to_int_array(test_case[1])
+        assert compare_versions(version1, version2) == test_case[2]
+


this is pretty dirty (having tests and logic in the same file), but we should remove this file entirely in the near future.

atuchin-m · 2024-04-26T10:18:56Z

seed/serialize.py

+    test_version_comparison()
+    for studies in feature_names_to_studies.values():
+        for i, study1 in enumerate(studies):
+            study1_platform = get_study_platforms(study1)


Could study_platform or study_channel be empty?

yep. improved the intersection check to cover this.
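A rough sketch of how an overlap check can account for empty lists. This is not the PR's implementation: the helper names are hypothetical, version ranges are ignored, and treating an empty platform or channel list as "no restriction" is a conservative assumption chosen so that the check errs toward reporting an overlap.

```python
from itertools import combinations


def lists_may_overlap(list1, list2):
    # Assumption: an empty list means "unrestricted", so it may overlap
    # with anything. Otherwise the lists overlap if they share a value.
    if not list1 or not list2:
        return True
    return not set(list1).isdisjoint(list2)


def studies_may_overlap(study1, study2):
    filter1 = study1.get("filter", {})
    filter2 = study2.get("filter", {})
    # Two studies can hit the same client only if every dimension overlaps.
    return all(
        lists_may_overlap(filter1.get(key, []), filter2.get(key, []))
        for key in ("platform", "channel")
    )


def find_overlapping_pairs(studies):
    """Return index pairs (i, j), i < j, of studies that may overlap."""
    return [
        (i, j)
        for (i, s1), (j, s2) in combinations(enumerate(studies), 2)
        if studies_may_overlap(s1, s2)
    ]


# Illustrative data: the third study has an empty platform list.
studies = [
    {"name": "S", "filter": {"platform": ["WINDOWS"], "channel": ["RELEASE"]}},
    {"name": "S", "filter": {"platform": ["MAC"], "channel": ["RELEASE"]}},
    {"name": "S", "filter": {"platform": [], "channel": ["RELEASE"]}},
]
pairs = find_overlapping_pairs(studies)
```

Here the first two studies are disjoint by platform, while the study with the empty platform list is flagged against both of them.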

* Sort studies by name and add basic study intersection validation.
* Fix wildcard version comparison, add tests.
* Make sure to properly cover empty channel/platform lists (although this is not allowed).

goodov marked this pull request as ready for review April 25, 2024 14:49

Sort studies by name and add basic study intersection validation.

8c43c20

goodov force-pushed the sort-seed branch from 2b49923 to 8c43c20 Compare April 25, 2024 14:50

goodov requested review from atuchin-m and iefremov April 25, 2024 14:54

goodov changed the title ~~Sort studies by name and add basic study intersection validation.~~ Sort studies by name and add basic study intersection check. Apr 25, 2024

atuchin-m reviewed Apr 25, 2024

atuchin-m reviewed Apr 25, 2024

Fix wildcard version comparison, add tests.

4ed280c

goodov commented Apr 26, 2024

atuchin-m reviewed Apr 26, 2024

Make sure to properly cover empty channel/platform lists (although this is not allowed).

e55d28b

goodov enabled auto-merge (squash) April 26, 2024 10:43

atuchin-m approved these changes Apr 26, 2024

goodov merged commit 86ceff9 into main Apr 26, 2024
6 checks passed

goodov deleted the sort-seed branch April 26, 2024 10:43

goodov mentioned this pull request Apr 26, 2024

Sort studies by name and add basic study intersection check [prod]. (#1023) #1025

Closed
