Commit
Fix export and export cleanup job hangs in scheduled state (#8198)
### Motivation and context

django_rq and Python RQ support two schedulers: `rq_scheduler` and a newer scheduler built into RQ. `rq_scheduler` appears to be slowly dying out in favor of the built-in scheduler. The two schedulers have compatible APIs, but their implementations differ.

The existing job retry implementation relies on `retry()` calls, which in turn rely on the built-in RQ scheduler. CVAT uses `rq_scheduler` for some tasks, so that is the scheduler it runs. The built-in RQ scheduler requires the `--with-scheduler` startup parameter on the worker processes. As a result, the jobs were hanging in the scheduled state, because the built-in RQ scheduler was not running on the queues.

Since CVAT currently uses `rq_scheduler`, it was decided to keep using it, both to avoid disruption and to avoid running two schedulers together. The implementation in this PR makes a best effort to be correct, but it has potential problems when several identical jobs run in parallel. In the future we need to migrate to the built-in RQ scheduler, as it is the only one maintained as of February 2023.

### Checklist

- [ ] I submit my changes into the `develop` branch
- [ ] I have created a changelog fragment
- [ ] I have updated the documentation accordingly
- [ ] I have added tests to cover my changes
- [ ] I have linked related issues (see [GitHub docs](https://help.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword))
- [ ] I have increased versions of npm packages if it is necessary ([cvat-canvas](https://github.com/cvat-ai/cvat/tree/develop/cvat-canvas#versioning), [cvat-core](https://github.com/cvat-ai/cvat/tree/develop/cvat-core#versioning), [cvat-data](https://github.com/cvat-ai/cvat/tree/develop/cvat-data#versioning) and [cvat-ui](https://github.com/cvat-ai/cvat/tree/develop/cvat-ui#versioning))

### License

- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/cvat-ai/cvat/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.

## Summary by CodeRabbit

- **Bug Fixes**
  - Improved export and export cache clean operations by adding a retry mechanism to handle job retries, preventing hangs.
- **Chores**
  - Updated internal process for handling job retries using RQ scheduler for better reliability.
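The rescheduling idea described under "Motivation and context" could look roughly like the sketch below. It is not the code from this PR: the helper name `schedule_retry`, the `attempt` keyword, the default limits, and the `RecordingScheduler` stand-in are all hypothetical. The only library call it assumes is `enqueue_in(time_delta, func, *args, **kwargs)`, which `rq_scheduler.Scheduler` provides for delayed jobs.

```python
from datetime import timedelta
from typing import Any, Callable, Dict, List, Tuple


class RecordingScheduler:
    """Hypothetical stand-in for rq_scheduler.Scheduler, used only for this demo.

    It records enqueue_in() calls instead of talking to Redis.
    """

    def __init__(self) -> None:
        self.calls: List[Tuple[timedelta, Callable[..., Any], tuple, Dict[str, Any]]] = []

    def enqueue_in(self, time_delta: timedelta, func: Callable[..., Any],
                   *args: Any, **kwargs: Any) -> None:
        self.calls.append((time_delta, func, args, kwargs))


def schedule_retry(
    scheduler: Any,
    func: Callable[..., Any],
    args: tuple = (),
    *,
    attempt: int,
    max_attempts: int = 3,                    # assumed limit, not from the PR
    delay: timedelta = timedelta(seconds=60)  # assumed delay, not from the PR
) -> bool:
    """Ask the external scheduler to run `func` again after `delay`.

    Returns True if a retry was scheduled, False if attempts are exhausted.
    The next attempt number is passed to `func` as the `attempt` keyword,
    so `func` must accept it (an assumption of this sketch).
    """
    if attempt >= max_attempts:
        return False
    # Delegate to the scheduler instead of relying on RQ's built-in one,
    # which only runs when workers are started with --with-scheduler.
    scheduler.enqueue_in(delay, func, *args, attempt=attempt + 1)
    return True


def export_dataset(task_id: int, attempt: int = 1) -> None:
    """Hypothetical job body; a real job would do the export work here."""


if __name__ == "__main__":
    scheduler = RecordingScheduler()
    print(schedule_retry(scheduler, export_dataset, (42,), attempt=1))
    print(schedule_retry(scheduler, export_dataset, (42,), attempt=3))
```

A job would call such a helper from its failure path. Note that this sketch does nothing to prevent several identical jobs from being scheduled in parallel, which is the limitation the description above already points out.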