Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce cron jobs to clean up the export cache (data/cache/export/) and temporary files/directories (data/tmp/) #8804

Merged
merged 70 commits into from
Jan 14, 2025

Conversation

Marishka17
Copy link
Contributor

@Marishka17 Marishka17 commented Dec 9, 2024

Motivation and context

Depends on #8721
PR introduces the following changes:

  • [Export datasets|backups] Cache cleaning moved into a separate cron job. A separate RQ job is no longer enqueued after each export process
  • [Export datasets|backups] Temporary directory is created inside the common CVAT tmp directory (data/tmp/) instead of project|task|job/id/tmp/export_cache/
  • [Export datasets] One tmp directory (instead of 2) is created during the export process
  • Added the exportcachecleanup management command to remove outdated project|task|job/id/tmp/export_cache/ directories
  • Added a cron job to clean up the data/tmp/ directory. The new setting TMP_FILE_OR_DIR_RETENTION_DAYS is used to determine whether a file or directory should be removed
  • Locks are now used when working with backups as they are used for export

Notes:

  • data/project|task|job/id/tmp/ is still used during uploading annotations/datasets, but this should be fixed in a separate PR.

Breaking changes:

  • Enqueuing RQ jobs to backup a project/task: A filename argument was removed from the background function signature. Previously enqueued jobs will fail.
  • Cleaning cache after exporting backups or datasets: Scheduled RQ jobs will fail due to cvat.apps.dataset_manager.views.clear_export_cache was moved and cvat.apps.engine.backup._clear_export_cache was deleted.

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Summary by CodeRabbit

  • New Features

    • Introduced periodic tasks for automatic cleanup of export caches for projects, tasks, and jobs.
    • Added new functionality for managing and cleaning up export cache files.
  • Bug Fixes

    • Enhanced error handling for expired files in export processes.
  • Refactor

    • Restructured export and cleanup functionalities for improved clarity and maintainability.
    • Updated test methods for better focus on export and cleanup operations.
  • Documentation

    • Added deprecation warnings for outdated environment variables related to caching settings.
  • Chores

    • Expanded formatting script to include new files for consistent code style.

Copy link
Contributor

coderabbitai bot commented Dec 9, 2024

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The pull request introduces a comprehensive refactoring of the export and cache management system in the CVAT application. The changes span multiple files and focus on improving the modularity, error handling, and organization of export-related functionality. Key modifications include introducing new classes for file type management, restructuring export cache handling, adding periodic cleanup jobs for export caches, and enhancing the overall file management process across projects, tasks, and jobs.

Changes

File Change Summary
cvat/apps/dataset_manager/util.py Added ExportFileType enum, refactored filename parsing classes, introduced ExportCacheManager with new methods for file path generation
cvat/apps/dataset_manager/views.py Removed get_export_cache_dir, updated export function, renamed _retry_current_rq_job to public method
cvat/apps/engine/models.py Added abstract base classes _FileSystemRelatedModel and _Exportable, updated Project, Task, and Job models
cvat/apps/engine/cron.py New file for managing export cache cleanup, added clear_export_cache and cron_export_cache_cleanup functions
cvat/settings/base.py Added periodic jobs for export cache cleanup for projects, tasks, and jobs

Poem

🐰 Hop, hop, through export's maze,
Cleaning caches in clever ways,
Enum and manager, hand in paw,
Refactoring code without a flaw,
CVAT's rabbit leaps with glee! 🚀


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Marishka17 Marishka17 force-pushed the mk/clear_cache_cron_job branch 2 times, most recently from b885dac to f20eadf Compare December 10, 2024 16:00
@Marishka17 Marishka17 force-pushed the mk/clear_cache_cron_job branch from f20eadf to 92f9d0b Compare December 10, 2024 16:09
@Marishka17
Copy link
Contributor Author

/check

@cvat-ai cvat-ai deleted a comment from github-actions bot Dec 10, 2024
Copy link
Contributor

github-actions bot commented Dec 10, 2024

❌ Some checks failed
📄 See logs here

@Marishka17 Marishka17 force-pushed the mk/clear_cache_cron_job branch from 14084b8 to e4c24ea Compare December 11, 2024 11:02
@Marishka17 Marishka17 changed the title Move export cache cleaning into cron jobs [WIP] Move export cache cleaning into cron jobs Dec 11, 2024
@Marishka17 Marishka17 changed the title [WIP] Move export cache cleaning into cron jobs [Dependent] Move export cache cleaning into cron jobs Dec 17, 2024
@Marishka17 Marishka17 changed the title [Dependent] Move export cache cleaning into cron jobs Move export cache cleaning into cron jobs Dec 20, 2024
@Marishka17
Copy link
Contributor Author

/check

Copy link
Contributor

github-actions bot commented Dec 20, 2024

✔️ All checks completed successfully
📄 See logs here

cvat/settings/base.py Outdated Show resolved Hide resolved
@@ -542,7 +559,7 @@ def _export_task(self, zip_obj, target_dir=None):
self._write_annotations(zip_obj, target_dir)
self._write_annotation_guide(zip_obj, target_dir)

def export_to(self, file, target_dir=None):
def export_to(self, file: str | ZipFile, target_dir: str | None = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an error, but such annotations in inherited functions tend to become outdated quickly. Typically it's better to just use annotations from the original function instead of duplicating them everywhere.

cvat/apps/dataset_manager/cron.py Outdated Show resolved Hide resolved
cvat/apps/dataset_manager/cron.py Outdated Show resolved Hide resolved
cvat/apps/dataset_manager/cron.py Outdated Show resolved Hide resolved
cvat/apps/dataset_manager/cron.py Outdated Show resolved Hide resolved
cvat/apps/dataset_manager/cron.py Outdated Show resolved Hide resolved
cvat/apps/dataset_manager/util.py Outdated Show resolved Hide resolved
cvat/settings/base.py Outdated Show resolved Hide resolved
cvat/apps/engine/management/commands/runperiodicjob.py Outdated Show resolved Hide resolved
cvat/apps/dataset_manager/cron.py Outdated Show resolved Hide resolved
@Marishka17 Marishka17 force-pushed the mk/clear_cache_cron_job branch from e5b1236 to 6baf90b Compare January 14, 2025 12:04
@Marishka17 Marishka17 merged commit e276295 into develop Jan 14, 2025
35 checks passed
@Marishka17 Marishka17 deleted the mk/clear_cache_cron_job branch January 14, 2025 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants