dask workers can be scheduled on hub pods with default config #59

Open
scottyhq opened this issue Jul 15, 2019 · 9 comments

@scottyhq (Member) commented Jul 15, 2019

Our current setup allows dask worker pods to be scheduled on hub nodes:
https://github.com/pangeo-data/pangeo-stacks/blob/master/base-notebook/binder/dask_config.yaml

This seems to be due to 'prefer' rather than 'require' when scheduling:
https://github.com/dask/dask-kubernetes/blob/ec4666a4af5acad03c24b84aca4fcf8ccd791b4f/dask_kubernetes/objects.py#L177

which results in the following for pods:

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: k8s.dask.org/node-purpose
            operator: In
            values:
            - worker
        weight: 100

I'm not sure how we modify the config file to get the stricter 'require' condition that we have for notebook pods:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: k8s.dask.org/node-purpose
            operator: In
            values:
            - worker

@jhamman, @TomAugspurger

@jhamman (Member) commented Jul 16, 2019

If you want to keep non-core pods off your core (hub) pool, you need to add a taint that only core pods can tolerate. I tend to just size the core pool to the smallest size that fits the hub pods; if you don't leave space, things won't try to schedule there. You can also tighten the node-purpose scheduling requirements for dask pods, but in my experience this is unnecessary.
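A minimal sketch of that taint/toleration setup (the taint key below is illustrative, not something prescribed in this thread):

# Taint the core nodes so only pods with a matching toleration can land there, e.g.:
#   kubectl taint nodes <core-node> hub.jupyter.org/dedicated=core:NoSchedule
# ...then give the hub (and other core) pods the matching toleration:
spec:
  tolerations:
  - key: hub.jupyter.org/dedicated
    operator: Equal
    value: core
    effect: NoSchedule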

For posterity, I should also link to this blog post that describes all of this in more detail: https://medium.com/pangeo/pangeo-cloud-cluster-design-9d58a1bf1ad3

@scottyhq (Member Author)

@jhamman - I'm thinking we might want the core pool to autoscale eventually if we try to consolidate multiple hubs on a single EKS cluster. If we add a taint to the core pool, it seems like pods in the kube-system namespace (for example aws-node, tiller-deploy, cluster-autoscaler) might have trouble scheduling.

Another approach is to expose match_node_purpose="require" in https://github.com/dask/dask-kubernetes/blob/ec4666a4af5acad03c24b84aca4fcf8ccd791b4f/dask_kubernetes/objects.py#L177

@TomAugspurger (Member)

@jhamman is there a downside to the hard affinity (at least optionally)? It couldn't be the default, but it seems useful as an option.

@TomAugspurger (Member)

FYI, rather than exposing it as a config option / parameter in KubeCluster, we could document how to achieve it with a custom worker pod template:

kind: Pod
metadata:
  labels:
    foo: bar
spec:
  restartPolicy: Never
  containers:
  - image: daskdev/dask:latest
    imagePullPolicy: IfNotPresent
    args: [dask-worker, --nthreads, '2', --no-bokeh, --memory-limit, 6GB, --death-timeout, '60']
    name: dask
    resources:
      limits:
        cpu: "2"
        memory: 6G
      requests:
        cpu: "2"
        memory: 6G
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: k8s.dask.org/node-purpose
            operator: In
            values:
            - worker

On master, that'll result in both the preferred and required affinity types being applied.

>>> a.pod_template.spec.affinity.node_affinity
{'preferred_during_scheduling_ignored_during_execution': [{'preference': {'match_expressions': [{'key': 'k8s.dask.org/node-purpose',
                                                                                                 'operator': 'In',
                                                                                                 'values': ['worker']}],
                                                                          'match_fields': None},
                                                           'weight': 100}],
 'required_during_scheduling_ignored_during_execution': {'node_selector_terms': [{'match_expressions': None,
                                                                                  'match_fields': None}]}}

I'm not sure how Kubernetes will handle that (presumably it's fine, just not the cleanest). Right now my preference would be to add a config option / argument to KubeCluster that's passed through to clean_pod_template, but I may be missing some context.
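
If the documented route is taken, a sketch of how a template like the one above could be supplied through the dask config (assuming the kubernetes.worker-template key that dask-kubernetes reads; only the affinity portion is shown, the rest of the pod template is elided):

kubernetes:
  worker-template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: k8s.dask.org/node-purpose
                operator: In
                values:
                - worker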

@jhamman (Member) commented Jul 16, 2019

@jhamman is there a downside to the hard affinity (at least optionally)?

Not really. I think this is a fine approach. Of course, there is no way to enforce that users follow this pattern, so dask workers may still end up in your core pool with this approach.

@jhamman (Member) commented Aug 2, 2019

In thinking about this a little more, it may be easier for some to simply add a taint to the core pool that the hub and ingress pods can tolerate.

@scottyhq (Member Author)

In thinking about this a little more, it may be easier for some to simply add a taint to the core pool that the hub and ingress pods can tolerate.

@jhamman, are you doing this now on the Google clusters?

@jhamman (Member) commented Sep 17, 2019

No. Not yet, but we could.

@bgroenks96 commented Dec 8, 2019

If you don't feel like modifying all of the JupyterHub services' configurations to include the toleration, this can also be accomplished by 1) adding a taint to the worker pools to keep core services off them, with corresponding tolerations added to the worker pods, and 2) adding a node selector to the worker pods with corresponding labels on the worker nodes. This pretty much guarantees that everything ends up on the right nodes without having to taint/tolerate the core services.
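
A rough sketch of that combination, with illustrative taint/label names (not taken from this thread):

# 1) Taint and label the worker nodes, e.g.:
#   kubectl taint nodes <worker-node> k8s.dask.org/dedicated=worker:NoSchedule
#   kubectl label nodes <worker-node> k8s.dask.org/node-purpose=worker
# 2) In the dask worker pod template, tolerate the taint and select the labeled nodes:
spec:
  nodeSelector:
    k8s.dask.org/node-purpose: worker
  tolerations:
  - key: k8s.dask.org/dedicated
    operator: Equal
    value: worker
    effect: NoSchedule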
