Optimize DB requests in project list, job list, and other endpoints #8275
base: develop
Walkthrough (CodeRabbit)
Recent updates enhance the reliability and performance of the CVAT application by improving data serialization and query optimization. Key changes include filtering empty task subsets in serializers, refining dimension retrieval logic, and expanding related fields in database queries.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- cvat/apps/engine/serializers.py (1 hunks)
- cvat/apps/engine/views.py (1 hunks)
- cvat/apps/iam/permissions.py (1 hunks)
Additional comments not posted (3)
cvat/apps/iam/permissions.py (1)
51-57: LGTM! But verify the function usage in the codebase.
The code changes are approved. However, ensure that all function calls to get_organization handle the new logic correctly.
Verification successful
The function usage is verified and no issues are found. The get_organization function is used correctly in the codebase, handling the new logic appropriately.
- get_iam_context(request, obj) uses get_organization to retrieve the organization and then gets membership based on it.
- Middleware setup uses get_organization to set request.iam_context.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all function calls to `get_organization` handle the new logic correctly.
# Test: Search for the function usage. Expect: Only occurrences of the new logic.
rg --type python -A 5 $'get_organization'
Length of output: 71
Script:
#!/bin/bash
# Description: Verify all function calls to `get_organization` handle the new logic correctly.
# Find all Python files and search for the `get_organization` function usage within them.
fd -e py -x rg 'get_organization' {}
Length of output: 498
Script:
#!/bin/bash
# Description: Verify all function calls to `get_organization` handle the new logic correctly.
# Find all Python files and search for the `get_organization` function usage within them, including more context lines.
fd -e py -x rg 'get_organization' -A 10 -B 5 {}
Length of output: 3618
cvat/apps/engine/serializers.py (1)
1363-1370: LGTM!
The changes improve the robustness of the to_representation method by ensuring only valid, non-empty task subsets are included and a valid dimension is returned.
cvat/apps/engine/views.py (1)
1708-1719: Optimize database query performance by expanding select_related.
The added fields in the select_related method (segment__task__source_storage, segment__task__target_storage, segment__task__organization, segment__task__project__organization, segment__task__owner, segment__task__project__owner) will help optimize database queries by pre-fetching related data. This change should improve performance when accessing these fields in the viewset.
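To illustrate what expanding select_related buys in general, here is a minimal sketch using only the standard library's sqlite3 module instead of the Django ORM. The schema (owner, task, job), the CountingDB helper, and the sample data are assumptions made up for this illustration and are not CVAT's actual models. It contrasts lazy per-row lookups, which is roughly what happens when a related field is accessed without select_related, with a single joined query:

```python
import sqlite3


class CountingDB:
    """In-memory database that counts the SELECT statements sent to it."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0
        # Hypothetical, simplified schema: job -> task -> owner.
        self.conn.executescript(
            """
            CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE task (id INTEGER PRIMARY KEY, owner_id INTEGER);
            CREATE TABLE job (id INTEGER PRIMARY KEY, task_id INTEGER);
            """
        )
        self.conn.executemany(
            "INSERT INTO owner VALUES (?, ?)", [(1, "alice"), (2, "bob")]
        )
        self.conn.executemany(
            "INSERT INTO task VALUES (?, ?)", [(t, 1 + t % 2) for t in range(1, 7)]
        )
        self.conn.executemany(
            "INSERT INTO job VALUES (?, ?)",
            [(j, 1 + (j - 1) % 6) for j in range(1, 13)],
        )

    def select(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def owners_lazy(db):
    """One query for the jobs, then two more per job (task, then owner)."""
    names = []
    for job_id, task_id in db.select("SELECT id, task_id FROM job ORDER BY id"):
        (owner_id,) = db.select("SELECT owner_id FROM task WHERE id = ?", (task_id,))[0]
        (name,) = db.select("SELECT name FROM owner WHERE id = ?", (owner_id,))[0]
        names.append((job_id, name))
    return names


def owners_joined(db):
    """The same data in one query, with the joins done by the database."""
    return db.select(
        "SELECT job.id, owner.name FROM job "
        "JOIN task ON task.id = job.task_id "
        "JOIN owner ON owner.id = task.owner_id "
        "ORDER BY job.id"
    )


lazy_db, joined_db = CountingDB(), CountingDB()
assert owners_lazy(lazy_db) == owners_joined(joined_db)
print("lazy queries:", lazy_db.queries, "joined queries:", joined_db.queries)
```

With 12 jobs the lazy variant issues 1 + 2 × 12 = 25 queries, while the joined variant issues one. Whether the extra joins pay off on a real endpoint depends on the data and has to be measured.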
/api/jobs now takes longer (on the queries metric). Each /api/jobs/id/preview request also now takes significantly longer (on the queries metric).
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8275      +/-   ##
===========================================
- Coverage    83.38%   83.35%   -0.03%
===========================================
  Files          389      389
  Lines        41526    41530       +4
  Branches      3856     3856
===========================================
- Hits         34626    34619       -7
- Misses        6900     6911      +11
I've increased the page_size to get more representative output:
- Baseline:
- select_related (source_storage, target_storage, organization, owner), 12 joins:
- select_related (source_storage, target_storage) + prefetch_related (organization, owner), 10 joins:
And the same, but for 12 jobs (as the UI presents):
I think it makes sense to choose one of the 2 optimized variants. The version with 10 joins looks like a good tradeoff for now: it sits between the other two.
).prefetch_related(
    'segment__task__organization',
    'segment__task__owner',
Could you explain why using prefetch_related is better here? (I see that it is according to the experiments, but I am curious why prefetch_related is quicker.) And why exactly these two fields?
I would expect it to just make extra requests to the database (which in general takes more time) without any benefit over select_related.
IMHO we just need to define the queryset "per view", leaving in the defaults only those fields necessary for the IAM logic (e.g. organization, owner).
> May you explain why using prefetch_related is better here?
> From my opinion it just makes extra requests (what in general requires more time) to the database and does not have benefits in comparison with select_related
Yes, it makes extra requests and does the joining on the server side instead of on the DB side, as select_related does. There are 2 "opposite" approaches to prefetching: fetch and join everything on the DB side, or fetch everything separately and join on the server side. In general we'd prefer joining on the DB side, but separation can help in the following case: there is a small table, and its rows are expected to be reused across many results. In this case it may be beneficial to fetch this small table separately and join it on the server side. Doing it on the DB side would significantly increase the size of the DB response by filling it with repeated entries. That is probably the case in my setup, as I have just several users and orgs. I think the best strategy for optimizing this is to actually measure the changes on the production server, or at least to take statistics from it, so that the split decisions are informed.
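The small-table case described above can be sketched with the standard library's sqlite3 module. This is an illustration under assumptions, not CVAT code: the org and job tables, the column names, and the data are hypothetical. The joined variant returns the organization columns once per job, while the prefetch-style variant runs a second query with an IN clause and attaches the organizations in Python, which mirrors how prefetch_related works:

```python
import sqlite3


def make_db(n_jobs=12, n_orgs=2):
    """Hypothetical schema: many jobs referencing a small org table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE org (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
        CREATE TABLE job (id INTEGER PRIMARY KEY, org_id INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO org VALUES (?, ?, ?)",
        [(o, f"org-{o}", "long description " * 20) for o in range(1, n_orgs + 1)],
    )
    conn.executemany(
        "INSERT INTO job VALUES (?, ?)",
        [(j, 1 + j % n_orgs) for j in range(1, n_jobs + 1)],
    )
    return conn


def fetch_joined(conn):
    """One query; every job row carries a full copy of its org's columns."""
    rows = conn.execute(
        "SELECT job.id, org.id, org.name, org.description "
        "FROM job JOIN org ON org.id = job.org_id ORDER BY job.id"
    ).fetchall()
    jobs = [
        {"id": r[0], "org": {"id": r[1], "name": r[2], "description": r[3]}}
        for r in rows
    ]
    return jobs, len(rows)  # org rows transferred == number of jobs


def fetch_prefetched(conn):
    """Two queries; each org is transferred once and joined in Python."""
    job_rows = conn.execute("SELECT id, org_id FROM job ORDER BY id").fetchall()
    org_ids = sorted({org_id for _, org_id in job_rows})
    placeholders = ",".join("?" * len(org_ids))
    org_rows = conn.execute(
        f"SELECT id, name, description FROM org WHERE id IN ({placeholders})",
        org_ids,
    ).fetchall()
    orgs = {r[0]: {"id": r[0], "name": r[1], "description": r[2]} for r in org_rows}
    jobs = [{"id": job_id, "org": orgs[org_id]} for job_id, org_id in job_rows]
    return jobs, len(org_rows)  # org rows transferred == number of distinct orgs


conn = make_db()
joined_jobs, joined_org_rows = fetch_joined(conn)
prefetched_jobs, prefetched_org_rows = fetch_prefetched(conn)
assert joined_jobs == prefetched_jobs
print("org rows sent by the DB:", joined_org_rows, "vs", prefetched_org_rows)
```

Both variants produce the same result; the difference is one round trip with repeated organization data versus two round trips with each organization sent once. Which side of that trade is cheaper depends on table sizes and network latency, which is why measuring on production-like data matters.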
> IMHO we just need to define queryset "per view", leaving in defaults only those fields, necessary to IAM logic (e.g. organization, owner)
You probably meant "per endpoint use case"? OK, it makes sense, I agree.
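One possible shape for the "per endpoint use case" idea, as a rough sketch only: keep the relations needed by the IAM logic in a default tuple and add endpoint-specific relations on top. The constant names, the related_fields_for helper, and the assignment of fields to actions are all hypothetical; only the relation paths themselves come from this PR.

```python
# Relations the IAM checks are assumed to need on every endpoint.
IAM_RELATED = (
    "segment__task__organization",
    "segment__task__owner",
)

# Hypothetical per-action extras; the actual split would have to be
# decided by measuring each endpoint.
EXTRA_RELATED_BY_ACTION = {
    "list": (
        "segment__task__source_storage",
        "segment__task__target_storage",
    ),
    "preview": (),
}


def related_fields_for(action):
    """Return the relations to load for a view action (defaults + extras)."""
    return IAM_RELATED + EXTRA_RELATED_BY_ACTION.get(action, ())


print(related_fields_for("list"))
print(related_fields_for("preview"))
```

In a viewset, the returned tuple could then be unpacked into select_related or prefetch_related when the queryset is built for the current action.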
Quality Gate passed
Motivation and context
All this primarily affected GET /api/projects and GET /api/jobs, but other endpoints also benefited from this a little bit.