Implement Time Limit for Archiving Jobs #22973

Open
sgiehl opened this issue Jan 23, 2025 · 0 comments · May be fixed by #22979
Labels
Enhancement: For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.
Stability: For issues that make Matomo more stable and reliable to run for sys admins.

sgiehl commented Jan 23, 2025

Background

Matomo's archiving process handles invalidations through archiving jobs, which may run across multiple servers. These jobs process invalidations sequentially until either all available invalidations are handled or further processing is blocked due to collisions with other running jobs.

Currently, administrators can set limits on archiving jobs to control:

  • The number of jobs running in parallel.
  • The maximum number of invalidations processed per job (to prevent indefinite job execution).

However, when there are many complex invalidations, such as those involving complex segments, archiving jobs can still run for extended periods, potentially lasting several days.

Problem Statement

Long-running archiving jobs can create significant issues, as Matomo relies on freshly started jobs to detect and process new invalidations, particularly for recent time periods such as yesterday or today. If all available archiving jobs are occupied for extended durations, no new invalidation requests may be generated. This can result in:

  1. Delayed Data Aggregation: Invalidation requests for smaller periods (e.g., daily data) may be indirectly processed through larger period aggregations.
  2. Missing Archives: In some cases, periods might be completely omitted from processing or might be left behind in an outdated state.

Proposed Solution

To mitigate these risks, we propose introducing a configurable time limit for archiving jobs. This feature would allow administrators to specify a maximum runtime for each job. Once the specified time limit is reached, the job should gracefully terminate and allow a new job to start.

Expected Behavior:

  • After processing each invalidation, the job should check whether the time limit has been reached.
  • If the limit is exceeded, the job should stop further processing and exit cleanly.
  • The next scheduled job will pick up remaining or new invalidations.
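
The expected behavior above can be sketched as a simple loop. This is only an illustration in Python, not Matomo's actual implementation; the function name `run_archiving_job`, its parameters, and the log message are all hypothetical and chosen for this example.

```python
import time
from typing import Callable, Iterable, Optional


def run_archiving_job(
    invalidations: Iterable[str],
    process: Callable[[str], None],
    max_runtime_seconds: Optional[float],
    clock: Callable[[], float] = time.monotonic,
    log: Callable[[str], None] = print,
) -> int:
    """Process invalidations one by one until none are left or the limit is hit.

    Hypothetical sketch: all names here are assumptions made for illustration.
    Returns the number of invalidations that were processed.
    """
    started = clock()
    processed = 0
    for invalidation in invalidations:
        process(invalidation)
        processed += 1
        # The limit is checked only after an invalidation has finished,
        # so work that is already in progress is never interrupted.
        if max_runtime_seconds is not None and clock() - started >= max_runtime_seconds:
            log(
                f"Time limit of {max_runtime_seconds}s reached after "
                f"{processed} invalidation(s); exiting."
            )
            break
    return processed
```

Because the check runs between invalidations, a job can overrun the limit by up to the duration of the invalidation it was processing when the limit passed. Anything left unprocessed stays in the queue for the next job.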

Benefits

  • Improved Data Freshness: Ensures timely processing of daily and recent period data.
  • Better Resource Allocation: Prevents system resources from being tied up by long-running jobs.
  • Increased Flexibility: Administrators gain greater control over job execution times to balance workload and performance.

Implementation Considerations

  • The time limit should be configurable via a parameter of the console command.
  • A log entry should be generated when a job exits due to reaching the time limit.
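
As a rough sketch of these two considerations, the snippet below parses a time limit from the command line and emits a log entry when a job stops because of it. It is written in Python for illustration only. The option name `--max-runtime`, the program name, and the log wording are assumptions and do not reflect the actual Matomo console command.

```python
import argparse
import logging
from typing import List, Optional


def parse_max_runtime(argv: List[str]) -> Optional[int]:
    """Return the time limit in seconds, or None when no limit is set.

    The option name --max-runtime is hypothetical, used only for this sketch.
    """
    parser = argparse.ArgumentParser(prog="archive-sketch")
    parser.add_argument(
        "--max-runtime",
        type=int,
        default=0,
        metavar="SECONDS",
        help="Stop the job after this many seconds (0 means no limit).",
    )
    args = parser.parse_args(argv)
    if args.max_runtime < 0:
        raise ValueError("--max-runtime must not be negative")
    # Treat 0 as "no limit" so the default keeps the current behavior.
    return args.max_runtime or None


def log_time_limit_exit(logger: logging.Logger, limit: int, processed: int) -> None:
    """Write the log entry for a job that stopped due to the time limit."""
    logger.info(
        "Archiving job stopped: time limit of %d seconds reached after "
        "%d invalidation(s).",
        limit,
        processed,
    )
```

Defaulting to "no limit" means existing setups would behave as before unless an administrator opts in.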

sgiehl added the Enhancement and Stability labels on Jan 23, 2025
sgiehl linked a pull request on Jan 24, 2025 that will close this issue