
Amazon Elastic Kubernetes Service setup


Actions to be performed as the AWS administrator on a private/trusted computer

Setting up the CLI

In our experience the most straightforward way to manage EKS clusters is through the command line rather than the web interface. Download the aws and eksctl command line tools from their official pages (a useful tutorial can be found at https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html). To create the cluster you need a user with sufficient privileges in AWS. Run the following command from your private/trusted computer; it will ask you for the account credentials:

aws configure
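The command prompts for the access key pair, default region and output format of your administrator user, roughly as follows (the values shown are placeholders):

AWS Access Key ID [None]: AKIA...
AWS Secret Access Key [None]: ****
Default region name [None]: us-west-2
Default output format [None]: json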

Create the cluster and other useful operations

  • To create a new EKS cluster you need to install the eksctl command line tool. A sample command to generate the cluster is:
eksctl create cluster --name test1 --version 1.16 --region us-west-2 --nodegroup-name standard-workers --node-type t3.2xlarge --nodes 1 --nodes-min 1 --nodes-max 2 --managed --vpc-nat-mode Disable

We have not tried it, but it seems you can tell eksctl where to write the kubeconfig file directly once the cluster has been created with

--kubeconfig <PATH>

Some commands to verify the resulting cluster and nodegroups are shown after this list.
  • Creating a new nodegroup in the cluster to change e.g. disk space
eksctl create nodegroup --cluster test1 --region us-west-2 --name workers-130GB --node-type t3.2xlarge --nodes 1 --nodes-min 1 --nodes-max 2 --managed --node-volume-size 130
The same can also be done through a config file, for example to request spot instances:

[root@aipanda169 ~]# cat spot_nodegroups.yml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
    name: test1
    region: us-west-2
nodeGroups:
  - name: ng-capacity-optimized
    minSize: 1
    maxSize: 3
    volumeSize: 170
    instancesDistribution:
      maxPrice: 0.1
      instanceTypes: ["t3.2xlarge"] # At least one instance type should be specified
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotAllocationStrategy: "capacity-optimized"
    labels:
      lifecycle: Ec2Spot
      aws.amazon.com/spot: "true"
    tags:
      k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot

[root@aipanda169 ~]# eksctl create nodegroup -f spot_nodegroups.yml
...

Since 1 December 2020 it is possible to create MANAGED spot node groups. This mode is currently under evaluation:

eksctl create nodegroup --cluster atlas-harvester --instance-types m5.2xlarge --node-volume-size 170 --managed --spot --name spot-managed --asg-access --nodes 1 --nodes-min 1 --nodes-max 2 --node-labels role=worker
  • Changing the size of a nodegroup or deleting a nodegroup
eksctl scale nodegroup --cluster=test1 --name=ng-capacity-optimized --nodes-min=1 --nodes=1
eksctl delete nodegroup --cluster test1 workers-130GB
  • Delete a cluster
eksctl delete cluster --region=us-west-2 --name=test1
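To verify the outcome of the operations above you can list the clusters and nodegroups and check that the nodes have joined the cluster. These are standard eksctl/kubectl calls, using the example names from this page:

eksctl get cluster --region us-west-2
eksctl get nodegroup --cluster test1 --region us-west-2
kubectl get nodes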

Install CVMFS and Frontier on the cluster

Follow the CVMFS installation section here

Creating a service account for Harvester and adding permissions

At this point we need to create a service account for Harvester with fewer privileges than your "master" account. Create a new user with "Programmatic access" in the AWS web UI; no roles or policies need to be added.
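If you prefer to stay on the command line, the equivalent user can in principle also be created with the IAM CLI (a sketch we have not used for this setup; the user name is just the one from the examples below):

aws iam create-user --user-name harvester
aws iam create-access-key --user-name harvester   # returns the AccessKeyId/SecretAccessKey needed later for "aws configure" on the Harvester node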

Still with your own account, you will add the service account to the aws-auth ConfigMap of the Kubernetes cluster. The userarn/username information is available under IAM > Users in the AWS web UI.

We are going to create the RBAC permissions for the service account:

>>> vi harvester-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: harvester-role
  namespace: default
rules:
- apiGroups: [""]
  resources: ["secrets","jobs","pods"]
  verbs: ["*"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: harvester-rolebinding
roleRef:
  kind: Role
  name: harvester-role
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: Group
    name: harvester-group
    apiGroup: rbac.authorization.k8s.io
>>> kubectl create -f harvester-role.yaml

These permissions are a minimum, but you might want to add some more to allow the Harvester admin to do some debugging. For example, there are no permissions on "persistentVolumeClaims", "persistentVolumes" or "storageClasses", which are needed for CVMFS driver setup/debugging.
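A read-only extension for those resources could look like the sketch below (untested here; persistentVolumes and storageClasses are cluster-scoped, so they need a ClusterRole/ClusterRoleBinding rather than a namespaced Role):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: harvester-debug-clusterrole
rules:
- apiGroups: [""]
  resources: ["persistentvolumes", "persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: harvester-debug-clusterrolebinding
roleRef:
  kind: ClusterRole
  name: harvester-debug-clusterrole
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: Group
    name: harvester-group
    apiGroup: rbac.authorization.k8s.io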

Now we are going to map the service account to the harvester-group used in the RoleBinding, by editing the aws-auth ConfigMap:

>>> kubectl edit -n kube-system configmap/aws-auth
...
apiVersion: v1
data:
  mapRoles: |
...EDITED OUT...
  mapUsers: |
    - userarn: arn:aws:iam::304642596360:user/harvester
      username: harvester
      groups:
      - harvester-group
kind: ConfigMap
metadata:
...EDITED OUT...  
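As an alternative to editing the ConfigMap by hand, eksctl can add the same identity mapping (we have not tried this in the setup above, so treat it as a sketch):

eksctl create iamidentitymapping --cluster test1 --region us-west-2 \
  --arn arn:aws:iam::304642596360:user/harvester \
  --username harvester --group harvester-group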

Now you should be able to perform the typical kubectl commands on jobs/pods/secrets in the default namespace, but not see anything outside it.
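From your admin account you can check the mapping with impersonation (the user and group names are the ones defined above):

kubectl auth can-i list pods --as harvester --as-group harvester-group -n default        # expected: yes
kubectl auth can-i list pods --as harvester --as-group harvester-group -n kube-system    # expected: no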

More documentation can be found here:

Actions to be performed as the AWS service account on the Harvester node

Get the kubeconfig file for harvester

Now, on the Harvester node, install the aws CLI and configure it with the service account credentials. This account has permissions only in the cluster and cannot manage any of your other AWS resources.

aws configure

Now you can download the kubeconfig file:

export KUBECONFIG=<WHERE YOU WANT THE KUBECONFIG FILE>
aws eks --region us-west-2 update-kubeconfig --name test1
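
A quick check that the kubeconfig and the restricted permissions work as expected:

kubectl get pods                    # allowed in the default namespace
kubectl get pods -n kube-system     # should be rejected with a Forbidden error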

The cluster is now ready to be configured in the Harvester queue config file.

Known issues

Old systemd

Amazon images use an old version of systemd that does not correctly clean up mounts after pods have finished. After some days of running jobs, pods will get stuck in ContainerCreating status with a description similar to the one below.
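You can list the affected pods with a simple filter before describing one of them (a grep-based sketch):

kubectl get pods | grep ContainerCreating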

kubectl describe pods <POD ID YOU SAW STUCK IN CONTAINERCREATING>
...
Events:
  Type     Reason             Age   From                                                  Message
  ----     ------             ----  ----                                                  -------
  Warning  FailedScheduling   35m   default-scheduler                                     0/19 nodes are available: 19 Insufficient cpu.
  Normal   NotTriggerScaleUp  35m   cluster-autoscaler                                    pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 in backoff after failed scale-up
  Normal   Scheduled          35m   default-scheduler                                     Successfully assigned default/grid-job-6452376-sx8wq to ip-192-168-27-39.us-west-2.compute.internal
  Warning  FailedMount        35m   kubelet, ip-192-168-27-39.us-west-2.compute.internal  MountVolume.SetUp failed for volume "default-token-h62rn" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~secret/default-token-h62rn --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~secret/default-token-h62rn
Output: Failed to start transient scope unit: Argument list too long
  Warning  FailedMount  35m (x2 over 35m)  kubelet, ip-192-168-27-39.us-west-2.compute.internal  MountVolume.SetUp failed for volume "cvmfs-config-sft" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-sft --scope -- mount  -o bind /cvmfs-k8s/sft.cern.ch /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-sft
Output: Failed to start transient scope unit: Argument list too long
  Warning  FailedMount  33m                 kubelet, ip-192-168-27-39.us-west-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[sft-nightlies unpacked], unattached volumes=[sft-nightlies default-token-h62rn atlas-condb sft proxy-secret atlas-nightlies unpacked pilots-starter atlas grid]: timed out waiting for the condition
  Warning  FailedMount  30m                 kubelet, ip-192-168-27-39.us-west-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[unpacked sft-nightlies], unattached volumes=[sft unpacked pilots-starter default-token-h62rn sft-nightlies grid proxy-secret atlas-nightlies atlas atlas-condb]: timed out waiting for the condition
  Warning  FailedMount  28m                 kubelet, ip-192-168-27-39.us-west-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[unpacked sft-nightlies], unattached volumes=[unpacked sft pilots-starter default-token-h62rn sft-nightlies grid atlas atlas-condb atlas-nightlies proxy-secret]: timed out waiting for the condition
  Warning  FailedMount  26m                 kubelet, ip-192-168-27-39.us-west-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[sft-nightlies unpacked], unattached volumes=[grid proxy-secret atlas pilots-starter atlas-nightlies sft sft-nightlies unpacked atlas-condb default-token-h62rn]: timed out waiting for the condition
  Warning  FailedMount  24m                 kubelet, ip-192-168-27-39.us-west-2.compute.internal  Unable to attach or mount volumes: unmounted volumes=[unpacked sft-nightlies], unattached volumes=[unpacked grid atlas-nightlies pilots-starter atlas proxy-secret atlas-condb sft sft-nightlies default-token-h62rn]: timed out waiting for the condition
  Warning  FailedMount  2m8s (x5 over 32m)  kubelet, ip-192-168-27-39.us-west-2.compute.internal  MountVolume.SetUp failed for volume "cvmfs-config-sft-nightlies" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-sft-nightlies --scope -- mount  -o bind /cvmfs-k8s/sft-nightlies.cern.ch /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-sft-nightlies
Output: Failed to start transient scope unit: Argument list too long
  Warning  FailedMount  78s (x13 over 21m)  kubelet, ip-192-168-27-39.us-west-2.compute.internal  (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[unpacked sft-nightlies], unattached volumes=[atlas-condb unpacked atlas-nightlies pilots-starter atlas grid default-token-h62rn sft sft-nightlies proxy-secret]: timed out waiting for the condition
  Warning  FailedMount  6s (x4 over 30m)    kubelet, ip-192-168-27-39.us-west-2.compute.internal  MountVolume.SetUp failed for volume "cvmfs-config-unpacked" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-unpacked --scope -- mount  -o bind /cvmfs-k8s/unpacked.cern.ch /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-unpacked
Output: Failed to start transient scope unit: Argument list too long

One possible solution is to schedule a cronjob that marks these nodes as unhealthy, so that AWS removes them: https://github.com/HSF/harvester/blob/master/pandaharvester/harvestercloud/aws_unhealthy_nodes.py

*/30 * * * * atlpan  /opt/harvester/bin/python /data/atlpan/k8_configs/maintenance/aws_unhealthy_nodes.py  > /tmp/aws 2>&1
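
The linked script automates the procedure; the sketch below only illustrates the idea for a single node that you have already identified (it is not the actual script). The node name is an example and the providerID has the form aws:///<availability-zone>/<instance-id>:

NODE=ip-192-168-27-39.us-west-2.compute.internal
INSTANCE_ID=$(kubectl get node "$NODE" -o jsonpath='{.spec.providerID}' | awk -F/ '{print $NF}')
# Mark the backing EC2 instance as unhealthy so the Auto Scaling group replaces it
aws autoscaling set-instance-health --instance-id "$INSTANCE_ID" --health-status Unhealthy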