Amazon Elastic Kubernetes Service setup
In our experience the most straightforward way to manage EKS clusters is through the command line rather than through the web interface. Download the aws and eksctl command line tools from the official pages (a useful tutorial can be found at https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html). The following command will ask you for your account credentials. To create the cluster you need a user with sufficient privileges in AWS. You should run this step from your private/trusted computer.
aws configure
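For reference, this is the interactive dialogue you will get; the key values and region below are placeholders only:
aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: ****************************************
Default region name [None]: us-west-2
Default output format [None]: json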
- To create a new EKS cluster you need to install the eksctl command line tool. A sample command to generate the cluster is:
eksctl create cluster --name test1 --version 1.16 --region us-west-2 --nodegroup-name standard-workers --node-type t3.2xlarge --nodes 1 --nodes-min 1 --nodes-max 2 --managed --vpc-nat-mode Disable
We have not tried it, but it seems you can tell eksctl where to write the kubeconfig file directly once the cluster has been created with
--kubeconfig <PATH>
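For example (untested, as noted above, and with an illustrative path), the flag would simply be appended to the create command:
eksctl create cluster --name test1 --version 1.16 --region us-west-2 --nodegroup-name standard-workers --node-type t3.2xlarge --nodes 1 --nodes-min 1 --nodes-max 2 --managed --vpc-nat-mode Disable --kubeconfig $HOME/.kube/config-test1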
- Creating a new nodegroup in the cluster, e.g. to change the disk space
eksctl create nodegroup --cluster test1 --region us-west-2 --name workers-130GB --node-type t3.2xlarge --nodes 1 --nodes-min 1 --nodes-max 2 --managed --node-volume-size 130
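To check which nodegroups exist in the cluster (useful before the scaling and deletion commands further down), you can list them:
eksctl get nodegroup --cluster test1 --region us-west-2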
- Adding a spot nodegroup. If you want it to use autoscaling, first read this: https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html
[root@aipanda169 ~]# cat spot_nodegroups.yml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test1
  region: us-west-2
nodeGroups:
  - name: ng-capacity-optimized
    minSize: 1
    maxSize: 3
    volumeSize: 170
    instancesDistribution:
      maxPrice: 0.1
      instanceTypes: ["t3.2xlarge"] # At least one instance type should be specified
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotAllocationStrategy: "capacity-optimized"
    labels:
      lifecycle: Ec2Spot
      aws.amazon.com/spot: "true"
    tags:
      k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot
[root@aipanda169 ~]# eksctl create nodegroup -f spot_nodegroups.yml
...
Since 1 Dec 2020 it is possible to create MANAGED spot node groups. This mode is currently under evaluation:
eksctl create nodegroup --cluster atlas-harvester --instance-types m5.2xlarge --node-volume-size 170 --managed --spot --name spot-managed --asg-access --nodes 1 --nodes-min 1 --nodes-max 2 --node-labels role=worker
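To verify that the spot nodes joined with the expected labels, you can list the nodes with their label columns; the eks.amazonaws.com/capacityType label is what recent EKS versions attach to managed spot nodes, so adjust the column if your version differs:
kubectl get nodes -L role,eks.amazonaws.com/capacityType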
- Changing the size of a nodegroup or deleting a nodegroup
eksctl scale nodegroup --cluster=test1 --name=ng-capacity-optimized --nodes-min=1 --nodes=1
eksctl delete nodegroup --cluster test1 --name workers-130GB
- Delete a cluster
eksctl delete cluster --region=us-west-1 --name=prod
Follow the CVMFS installation section here
At this point we need to create a service account for Harvester with fewer privileges than your "master" account. Create a new user with "Programmatic access" from the AWS web UI; no roles/policies need to be added.
Still with your own account, you will add the service account to the aws-auth ConfigMap of the Kubernetes cluster. The userarn/username information is available in the IAM > Users section of the AWS web UI.
We are going to create the RBAC permissions for the service account:
>>> vi harvester-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: harvester-role
  namespace: default
rules:
  - apiGroups: ["", "batch"] # "" covers secrets/pods, "batch" is needed for jobs
    resources: ["secrets", "jobs", "pods"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: harvester-rolebinding
  namespace: default
roleRef:
  kind: Role
  name: harvester-role
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: Group
    name: harvester-group
    apiGroup: rbac.authorization.k8s.io
>>> kubectl create -f harvester-role.yaml
These permissions are a minimum, but you might want to consider some more permissions to allow the Harvester admin to do some debugging. For example there are no permissions on "persistentVolumeClaims", "persistentVolumes" or "storageClasses", which are needed for CVMFS driver setup/debugging.
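As a sketch of what such an extension could look like (our own example, not part of the original setup): persistentVolumes and storageClasses are cluster-scoped, so read access to them has to be granted through a ClusterRole/ClusterRoleBinding rather than the namespaced Role above; persistentVolumeClaims could alternatively be added to the Role directly.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: harvester-debug-role # hypothetical name, adapt as needed
rules:
  - apiGroups: [""]
    resources: ["persistentvolumeclaims", "persistentvolumes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: harvester-debug-rolebinding
roleRef:
  kind: ClusterRole
  name: harvester-debug-role
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: Group
    name: harvester-group
    apiGroup: rbac.authorization.k8s.io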
Now we are going to add the role to our service account:
>>> kubectl edit -n kube-system configmap/aws-auth
...
apiVersion: v1
data:
  mapRoles: |
    ...EDITED OUT...
  mapUsers: |
    - userarn: arn:aws:iam::304642596360:user/harvester
      username: harvester
      groups:
        - harvester-group
kind: ConfigMap
metadata:
  ...EDITED OUT...
Now you should be able to perform the typical kubectl commands on jobs/pods/secrets in the default namespace, but not be able to see anything outside of it.
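A quick way to verify the RBAC setup from your admin account is user impersonation (this assumes your own account has impersonation rights, which the cluster creator normally has via system:masters):
kubectl auth can-i list pods --namespace default --as harvester --as-group harvester-group # expect: yes
kubectl auth can-i list pods --namespace kube-system --as harvester --as-group harvester-group # expect: no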
More documentation can be found here:
- https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
- https://kubernetes.io/docs/reference/access-authn-authz/rbac/#default-roles-and-role-bindings
Now on the Harvester node, you need to install the aws CLI and log in with the service account information. We gave this account permissions only inside the cluster, so it can't manage any of your other AWS resources.
aws configure
Now you can download the kubeconfig file:
export KUBECONFIG=<WHERE YOU WANT THE KUBECONFIG FILE>
aws eks --region us-west-2 update-kubeconfig --name test1
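As a sanity check with the freshly downloaded kubeconfig, the service account should see the default namespace but nothing cluster-wide:
kubectl get pods # should work (default namespace)
kubectl get nodes # should be denied (cluster-scoped resource)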
The cluster is now ready to be configured in the Harvester queue config file.
Amazon images use an old version of systemd that does not correctly clean up mounts after pods have finished. After some days of running jobs, pods will get stuck in ContainerCreating status with a description similar to the one below.
kubectl describe pods <POD ID YOU SAW STUCK IN CONTAINERCREATING>
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 35m default-scheduler 0/19 nodes are available: 19 Insufficient cpu.
Normal NotTriggerScaleUp 35m cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 in backoff after failed scale-up
Normal Scheduled 35m default-scheduler Successfully assigned default/grid-job-6452376-sx8wq to ip-192-168-27-39.us-west-2.compute.internal
Warning FailedMount 35m kubelet, ip-192-168-27-39.us-west-2.compute.internal MountVolume.SetUp failed for volume "default-token-h62rn" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~secret/default-token-h62rn --scope -- mount -t tmpfs tmpfs /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~secret/default-token-h62rn
Output: Failed to start transient scope unit: Argument list too long
Warning FailedMount 35m (x2 over 35m) kubelet, ip-192-168-27-39.us-west-2.compute.internal MountVolume.SetUp failed for volume "cvmfs-config-sft" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-sft --scope -- mount -o bind /cvmfs-k8s/sft.cern.ch /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-sft
Output: Failed to start transient scope unit: Argument list too long
Warning FailedMount 33m kubelet, ip-192-168-27-39.us-west-2.compute.internal Unable to attach or mount volumes: unmounted volumes=[sft-nightlies unpacked], unattached volumes=[sft-nightlies default-token-h62rn atlas-condb sft proxy-secret atlas-nightlies unpacked pilots-starter atlas grid]: timed out waiting for the condition
Warning FailedMount 30m kubelet, ip-192-168-27-39.us-west-2.compute.internal Unable to attach or mount volumes: unmounted volumes=[unpacked sft-nightlies], unattached volumes=[sft unpacked pilots-starter default-token-h62rn sft-nightlies grid proxy-secret atlas-nightlies atlas atlas-condb]: timed out waiting for the condition
Warning FailedMount 28m kubelet, ip-192-168-27-39.us-west-2.compute.internal Unable to attach or mount volumes: unmounted volumes=[unpacked sft-nightlies], unattached volumes=[unpacked sft pilots-starter default-token-h62rn sft-nightlies grid atlas atlas-condb atlas-nightlies proxy-secret]: timed out waiting for the condition
Warning FailedMount 26m kubelet, ip-192-168-27-39.us-west-2.compute.internal Unable to attach or mount volumes: unmounted volumes=[sft-nightlies unpacked], unattached volumes=[grid proxy-secret atlas pilots-starter atlas-nightlies sft sft-nightlies unpacked atlas-condb default-token-h62rn]: timed out waiting for the condition
Warning FailedMount 24m kubelet, ip-192-168-27-39.us-west-2.compute.internal Unable to attach or mount volumes: unmounted volumes=[unpacked sft-nightlies], unattached volumes=[unpacked grid atlas-nightlies pilots-starter atlas proxy-secret atlas-condb sft sft-nightlies default-token-h62rn]: timed out waiting for the condition
Warning FailedMount 2m8s (x5 over 32m) kubelet, ip-192-168-27-39.us-west-2.compute.internal MountVolume.SetUp failed for volume "cvmfs-config-sft-nightlies" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-sft-nightlies --scope -- mount -o bind /cvmfs-k8s/sft-nightlies.cern.ch /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-sft-nightlies
Output: Failed to start transient scope unit: Argument list too long
Warning FailedMount 78s (x13 over 21m) kubelet, ip-192-168-27-39.us-west-2.compute.internal (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[unpacked sft-nightlies], unattached volumes=[atlas-condb unpacked atlas-nightlies pilots-starter atlas grid default-token-h62rn sft sft-nightlies proxy-secret]: timed out waiting for the condition
Warning FailedMount 6s (x4 over 30m) kubelet, ip-192-168-27-39.us-west-2.compute.internal MountVolume.SetUp failed for volume "cvmfs-config-unpacked" : mount failed: exit status 1
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-unpacked --scope -- mount -o bind /cvmfs-k8s/unpacked.cern.ch /var/lib/kubelet/pods/b29c3945-a603-412d-88e6-2c1fe95eb6bb/volumes/kubernetes.io~local-volume/cvmfs-config-unpacked
Output: Failed to start transient scope unit: Argument list too long
One possible solution is to schedule a cronjob that marks these nodes as unhealthy, so that AWS removes them: https://github.com/HSF/harvester/blob/master/pandaharvester/harvestercloud/aws_unhealthy_nodes.py
*/30 * * * * atlpan /opt/harvester/bin/python /data/atlpan/k8_configs/maintenance/aws_unhealthy_nodes.py > /tmp/aws 2>&1
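If you prefer to handle an affected node by hand instead of the cron job (the exact logic inside the script above may differ from this sketch, and the instance ID is a placeholder), the manual equivalent is roughly:
# Find the node the stuck pod was scheduled on
kubectl describe pod <POD ID YOU SAW STUCK IN CONTAINERCREATING> | grep Node:
# Look up the EC2 instance ID of that node
kubectl get node <NODE NAME> -o jsonpath='{.spec.providerID}'
# Mark the instance as unhealthy so the Auto Scaling group replaces it
aws autoscaling set-instance-health --instance-id i-0123456789abcdef0 --health-status Unhealthy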