[scheduler] Add core logic to accommodate monthly tasks limit #1756

Merged
merged 8 commits into master from TE-thirdeye-quota-logic on Jan 20, 2025


anshul98ks123
Collaborator

@anshul98ks123 anshul98ks123 commented Jan 16, 2025

Issue(s)

thirdeye tasks limit per month

Description

ThirdEye Free Tier Usage Quota

#1751 introduced TaskQuotasConfiguration as part of the workspace configuration.
This PR adds the core logic to incorporate task quotas in the job scheduler for both DETECTION and NOTIFICATION tasks.

For each workspace, it computes whether the DETECTION/NOTIFICATION monthly task quota has been exceeded.
If so, it stops the workspace's currently running jobs and does not let any new job get scheduled.
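
The behaviour described above can be sketched as a pure function over ids and namespaces. This is only an illustration under stated assumptions: `QuotaReconcileSketch`, `jobsToRemove` and `jobsToSchedule` are hypothetical names, not code from this PR, and the real scheduler works with job keys and DTOs rather than plain ids.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the PR's implementation.
final class QuotaReconcileSketch {

  // Scheduled jobs to remove: the entity no longer exists,
  // or its namespace is over its monthly quota.
  static Set<Long> jobsToRemove(Map<Long, String> idToNamespace,
      Set<Long> scheduledIds, Set<String> exceededNamespaces) {
    Set<Long> result = new TreeSet<>();
    for (Long id : scheduledIds) {
      if (!idToNamespace.containsKey(id)
          || exceededNamespaces.contains(idToNamespace.get(id))) {
        result.add(id);
      }
    }
    return result;
  }

  // Entities to schedule: not scheduled yet and namespace still within quota.
  static Set<Long> jobsToSchedule(Map<Long, String> idToNamespace,
      Set<Long> scheduledIds, Set<String> exceededNamespaces) {
    Set<Long> result = new TreeSet<>();
    for (Map.Entry<Long, String> e : idToNamespace.entrySet()) {
      if (!scheduledIds.contains(e.getKey())
          && !exceededNamespaces.contains(e.getValue())) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```

Keeping the decision separate from the scheduler calls makes it easy to unit test without a running scheduler.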

Testing

e2e tests are out of scope, will be done in another PR.

Manually tested locally with the following data:

Related server.yaml config:

defaultWorkspaceConfiguration:
  ...
  ...
  namespaceQuotasConfiguration:
    taskQuotasConfiguration:
      maximumDetectionTasksPerMonth: 3100
      maximumNotificationTasksPerMonth: 2446
  ...

Tasks Count for Current Month:

| Namespace | Detection Tasks | Notification Tasks |
|---|---|---|
| null | 207 | 2444 |
| namespace1 | 7598 | 2446 |
| namespace3 | 2452 | 2446 |
| ws_2exrxop2u3ad | 0 | 0 |
| ws_2ehw4puhqucn | 0 | 0 |

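As per the above data, namespace1 has exceeded the DETECTION tasks quota (7598 against a limit of 3100),
and namespace1 and namespace3 have both reached the NOTIFICATION tasks quota (2446 against a limit of 2446), which the scheduler treats as exceeded.
The scheduler should not schedule new jobs, and should clean up any existing ones, in these cases.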
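
As a worked example of the comparison, here is a minimal sketch that applies the configured limits to the counts listed above. `MonthlyQuotaCheckSketch` and `isQuotaExceeded` are hypothetical names. The `>=` comparison is an assumption inferred from the test data (2446 notification tasks against a limit of 2446 is reported as exceeded), not taken from the PR's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the PR's implementation.
final class MonthlyQuotaCheckSketch {

  // A null limit means no quota is configured, so nothing is blocked.
  // ASSUMPTION: reaching the limit already counts as exceeded (>=).
  static boolean isQuotaExceeded(long tasksThisMonth, Long monthlyLimit) {
    return monthlyLimit != null && tasksThisMonth >= monthlyLimit;
  }

  public static void main(String[] args) {
    final Long maxDetection = 3100L;     // maximumDetectionTasksPerMonth
    final Long maxNotification = 2446L;  // maximumNotificationTasksPerMonth

    // namespace -> {detection tasks, notification tasks} for the current month
    Map<String, long[]> counts = new LinkedHashMap<>();
    counts.put("null", new long[] {207, 2444});
    counts.put("namespace1", new long[] {7598, 2446});
    counts.put("namespace3", new long[] {2452, 2446});

    for (Map.Entry<String, long[]> e : counts.entrySet()) {
      System.out.printf("%s detectionBlocked=%b notificationBlocked=%b%n",
          e.getKey(),
          isQuotaExceeded(e.getValue()[0], maxDetection),
          isQuotaExceeded(e.getValue()[1], maxNotification));
    }
  }
}
```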

Scheduler activity in Server log:
server-logs-with-quota.txt

INFO  [2025-01-17 11:03:35,239] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 3664 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:44,358] ai.startree.thirdeye.scheduler.SubscriptionCronScheduler: workspace namespace1 corresponding to SubscriptionGroupDTO with id 7344 has exceeded monthly quota. Skipping scheduling NOTIFICATION job.
INFO  [2025-01-17 11:03:44,359] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 5282 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:49,425] ai.startree.thirdeye.scheduler.SubscriptionCronScheduler: workspace namespace3 corresponding to SubscriptionGroupDTO with id 10308 has exceeded monthly quota. Skipping scheduling NOTIFICATION job.
INFO  [2025-01-17 11:03:49,425] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 7269 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:49,425] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 7345 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:49,425] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 7346 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:49,434] ai.startree.thirdeye.scheduler.SubscriptionCronScheduler: Scheduled NOTIFICATION job NOTIFICATION_14319
INFO  [2025-01-17 11:03:49,434] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_10309
INFO  [2025-01-17 11:03:49,434] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 10378 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:49,435] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 11304 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:49,436] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_11655
INFO  [2025-01-17 11:03:49,436] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 11910 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:49,437] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_12161
INFO  [2025-01-17 11:03:49,437] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 12297 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:49,437] ai.startree.thirdeye.scheduler.DetectionCronScheduler: workspace namespace1 corresponding to AlertDTO with id 12552 has exceeded monthly quota. Skipping scheduling DETECTION job.
INFO  [2025-01-17 11:03:49,438] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_12621

Reset the task quotas to null by removing namespaceQuotasConfiguration from server.yaml and calling /api/workspace-configuration/reset.
All jobs should then be scheduled for all namespaces,
without any detection/notification task quota check.

Scheduler activity in Server log:
server-logs-without-quota.txt

INFO  [2025-01-17 11:29:29,060] ai.startree.thirdeye.scheduler.SubscriptionCronScheduler: Scheduled NOTIFICATION job NOTIFICATION_7344
INFO  [2025-01-17 11:29:29,060] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_3664
INFO  [2025-01-17 11:29:32,668] ai.startree.thirdeye.worker.task.TaskDriverRunnable: Task 82762 NOTIFICATION_7344: executing {"detectionAlertConfigId":7344}
INFO  [2025-01-17 11:29:32,669] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_5282
INFO  [2025-01-17 11:29:32,669] ai.startree.thirdeye.scheduler.SubscriptionCronScheduler: Scheduled NOTIFICATION job NOTIFICATION_10308
INFO  [2025-01-17 11:29:32,670] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_7269
INFO  [2025-01-17 11:29:32,671] ai.startree.thirdeye.scheduler.SubscriptionCronScheduler: Scheduled NOTIFICATION job NOTIFICATION_14319
INFO  [2025-01-17 11:29:32,671] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_7345
INFO  [2025-01-17 11:29:32,672] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_7346
INFO  [2025-01-17 11:29:32,672] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_10309
INFO  [2025-01-17 11:29:32,673] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_10378
INFO  [2025-01-17 11:29:32,674] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_11304
INFO  [2025-01-17 11:29:32,674] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_11655
INFO  [2025-01-17 11:29:32,675] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_11910
INFO  [2025-01-17 11:29:32,675] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_12161
INFO  [2025-01-17 11:29:32,676] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_12297
INFO  [2025-01-17 11:29:32,677] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_12552
INFO  [2025-01-17 11:29:32,678] ai.startree.thirdeye.scheduler.DetectionCronScheduler: Scheduled DETECTION job DETECTION_12621


Predicate.EQ("namespace", namespace),
Predicate.EQ("type", taskType),
Predicate.GE("startTime", startOfMonthTimestamp),
Predicate.LE("startTime", endOfMonthTimestamp)
Collaborator

Predicate.LE("startTime", endOfMonthTimestamp)
looks incorrect. Was it meant to be endTime?
btw, do we really need this?

my understanding is that
Predicate.GE("startTime", startOfMonthTimestamp)
is enough:
we don't create tasks in the future

Collaborator

also, I think we should use the creation time (created), not startTime

Collaborator Author

thanks. changed to just createdTime > first day of month
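
For reference, a minimal sketch of how the lower bound for such a filter could be computed with `java.time`. `StartOfMonthSketch` and its methods are hypothetical names. Evaluating the month boundary in UTC and using an inclusive comparison are assumptions, since the thread does not state either.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;

// Hypothetical sketch, not the PR's implementation.
final class StartOfMonthSketch {

  // Epoch millis of 00:00 on the first day of the month containing `now`.
  // ASSUMPTION: month boundaries are evaluated in UTC.
  static long startOfMonthMillis(Instant now) {
    return YearMonth.from(now.atZone(ZoneOffset.UTC))
        .atDay(1)
        .atStartOfDay(ZoneOffset.UTC)
        .toInstant()
        .toEpochMilli();
  }

  // A task counts toward the quota when it was created on or after that boundary.
  // No upper bound is needed because tasks are not created in the future.
  static boolean countsTowardCurrentMonth(long createdMillis, Instant now) {
    return createdMillis >= startOfMonthMillis(now);
  }
}
```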

@@ -97,43 +119,99 @@ public void shutdown() throws SchedulerException {
  }

  private void updateSchedules() throws SchedulerException {

    final Supplier<HashMap<String, Boolean>> namespaceToQuotaExceededMap = scheduledRefreshSupplier(
Collaborator

this needs to be created once, as a field; here it is created on every updateSchedules call

Collaborator

let's call it cachedNamespaceToQuotaExceeded

not a fan of suffixing the type of the object ("Map") in Java: Java is statically typed and the IDE can easily tell what the type is.
also, xToY is a pretty standard name for a Map.

Collaborator

also, when returning a cached map, it may be good practice to make sure the map is immutable;
otherwise a cache consumer could edit it.
we could do

scheduledRefreshSupplier(() -> Map.copyOf(getNamespaceToQuotaExceededMap()))

but see my comment below:
I think we could just have getNamespaceToQuotaExceededMap return an ImmutableMap

Collaborator Author

done
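
To illustrate the two suggestions in this thread (build the cached supplier once as a field, and hand out an immutable map), here is a JDK-only sketch. `memoizeFor` is a hypothetical stand-in written for this example; it is not ThirdEye's `scheduledRefreshSupplier`, whose signature is not shown in this conversation. The 60 second refresh interval and the hard-coded map contents are also assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch, not the PR's implementation.
final class CachedQuotaSketch {

  // Caches the delegate's value and recomputes it once it is older than ttlMillis.
  static <T> Supplier<T> memoizeFor(Supplier<T> delegate, long ttlMillis) {
    return new Supplier<T>() {
      private T value;
      private long loadedAtNanos;
      private boolean loaded;

      @Override
      public synchronized T get() {
        long now = System.nanoTime();
        if (!loaded || now - loadedAtNanos >= ttlMillis * 1_000_000L) {
          value = delegate.get();
          loadedAtNanos = now;
          loaded = true;
        }
        return value;
      }
    };
  }

  static final class Scheduler {
    int loadCount = 0; // only here to make the caching observable

    // Created once per scheduler instance, not on every updateSchedules() call.
    private final Supplier<Map<String, Boolean>> cachedNamespaceToQuotaExceeded =
        memoizeFor(() -> Map.copyOf(computeNamespaceToQuotaExceeded()), 60_000L);

    // Stand-in for the database-backed computation.
    private Map<String, Boolean> computeNamespaceToQuotaExceeded() {
      loadCount++;
      Map<String, Boolean> m = new HashMap<>();
      m.put("namespace1", true);
      m.put("namespace3", false);
      return m;
    }

    boolean isExceeded(String namespace) {
      return cachedNamespaceToQuotaExceeded.get().getOrDefault(namespace, false);
    }
  }
}
```

Because the supplier is a field, repeated calls within the refresh interval reuse the same immutable snapshot instead of recomputing it.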

        }
      } catch (final Exception e) {
        log.error("Error removing job key {}", jobKey, e);
      }
    }
  }

  private void schedule(final E entity) {
  private HashMap<String, Boolean> getNamespaceToQuotaExceededMap() {
Collaborator

let's use Map for the interface

Collaborator Author

done

  private HashMap<String, Boolean> getNamespaceToQuotaExceededMap() {
    HashMap<String, Boolean> m = new HashMap<>();
Collaborator

maybe we want to return an Immutable Map.

Can be done with

  • guava ImmutableMap
  • using a stream
  • using Map.copyOf()

see my comment above

Collaborator Author

done using Map.copyOf()

@@ -103,37 +128,92 @@ private void updateSchedules() throws SchedulerException {
    // also only fetch only active entities directly and remove is active from Schedulable interface
    final List<E> allEntities = entityDao.findAll();

    final HashMap<String, Boolean>
Collaborator Author

As soon as I change every type from HashMap<String, Boolean> to Map<String, Boolean>
and make getNamespaceToQuotaExceededMap() return an immutable map,

namespaceToQuotaExceededSupplier.get() keeps returning a null map

not sure if I'm missing something totally basic?
@cyrilou242

Collaborator Author

nvm, fixed it

it was because the null key was causing an issue in the immutable map, resulting in an NPE when calling Map.copyOf()
whereas HashMap handles a null String key fine
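
The pitfall is easy to reproduce in isolation. The sketch below shows that `Map.copyOf` rejects a map containing a null key, and one possible JDK alternative that keeps null keys while still being read-only. `NullKeyCopySketch` and its methods are hypothetical names used only for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the PR's implementation.
final class NullKeyCopySketch {

  // Returns true when Map.copyOf refuses the map because of a null key or value.
  static boolean copyOfThrowsNpe(Map<String, Boolean> source) {
    try {
      Map.copyOf(source);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  // Read-only view over a private copy; unlike Map.copyOf it keeps null keys.
  static Map<String, Boolean> readOnlyCopyAllowingNulls(Map<String, Boolean> source) {
    return Collections.unmodifiableMap(new HashMap<>(source));
  }

  public static void main(String[] args) {
    Map<String, Boolean> quotaExceeded = new HashMap<>();
    quotaExceeded.put(null, false);        // entities without a namespace
    quotaExceeded.put("namespace1", true);

    System.out.println(copyOfThrowsNpe(quotaExceeded));
    System.out.println(readOnlyCopyAllowingNulls(quotaExceeded).containsKey(null));
  }
}
```

The other option is to replace the null key with a non-null placeholder before copying, which keeps `Map.copyOf` usable.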

    final Map<Long, E> idToEntity = allEntities.stream()
        .collect(Collectors.toMap(AbstractDTO::getId, e -> e));
    final Set<JobKey> scheduledJobKeys = scheduler.getJobKeys(groupMatcher);
    for (final JobKey jobKey : scheduledJobKeys) {
      try {
        final Long id = getIdFromJobKey(jobKey);
        final E entity = idToEntity.get(id);
        final String entityNamespace = nonNullNamespace(entity.namespace());
Collaborator

will throw an NPE if entity is null

Collaborator

see null check below

@@ -42,7 +57,9 @@
import org.slf4j.LoggerFactory;

public class TaskCronSchedulerRunnable<E extends AbstractDTO> implements Runnable {


  private static final String NULL_NAMESPACE_KEY = "__null__";
Collaborator

nit:
See how it's implemented here:

private static final String NULL_NAMESPACE_KEY = "__NULL_NAMESPACE_" + RandomStringUtils.randomAlphanumeric(10).toUpperCase();

in case someone decides to use __null__ as a namespace name,
I prefer to use a key with a random value

Collaborator Author

ahh, i see. yes, this should be handled. on it!
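
A small sketch of the randomized sentinel idea, assuming a JDK-only setup: `UUID` is swapped in here for the `RandomStringUtils` call quoted above so the example has no external dependency. The body of `nonNullNamespace` is a guess at what such a helper could look like; only its name appears in the diff.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical sketch, not the PR's implementation.
final class NullNamespaceKeySketch {

  // Random suffix, generated once per JVM, so a user-chosen namespace name
  // such as "__null__" cannot collide with the sentinel.
  private static final String NULL_NAMESPACE_KEY = "__NULL_NAMESPACE_"
      + UUID.randomUUID().toString().replace("-", "").toUpperCase(Locale.ROOT);

  // Maps a missing namespace to the sentinel so it can be used as a key
  // in maps that reject null keys.
  static String nonNullNamespace(String namespace) {
    return namespace == null ? NULL_NAMESPACE_KEY : namespace;
  }
}
```

Since the suffix changes on every restart, the sentinel is only suitable for in-memory maps and should not be persisted.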

@anshul98ks123 anshul98ks123 merged commit 713d3b4 into master Jan 20, 2025
12 checks passed
@anshul98ks123 anshul98ks123 deleted the TE-thirdeye-quota-logic branch January 20, 2025 10:47