
ognif/simphera-reference-architecture-aws-main


SIMPHERA Reference Architecture for AWS

This repository contains the reference architecture of the infrastructure needed to deploy dSPACE SIMPHERA to AWS. It does not contain the Helm chart needed to deploy SIMPHERA itself, only the base infrastructure such as Kubernetes, PostgreSQL, S3 storage, etc.

You can use the reference architecture as a starting point for your SIMPHERA installation if you plan to deploy SIMPHERA to AWS. You can use it as it is and only have to configure a few individual values. If you have special requirements, feel free to adapt the architecture to your needs. For example, the reference architecture does not contain any kind of VPN connection to a private, on-premises network because this is highly user-specific. Instead, the reference architecture is configured so that the ingress points are available on the public internet.

Using the reference architecture, you can deploy a single instance or multiple instances of SIMPHERA, e.g. one for production and one for testing.

Architecture

The following figure shows the main resources of the architecture:

[Figure: SIMPHERA Reference Architecture for AWS]

The main building block of the SIMPHERA reference architecture for AWS is the Amazon EKS cluster. The cluster contains two auto scaling groups: the first group is reserved for SIMPHERA services and auxiliary third-party services such as Keycloak and nginx; the second group is for the executors that perform the testing of the system under test.

The data for SIMPHERA projects is stored in an Amazon RDS for PostgreSQL instance. Keycloak stores SIMPHERA users in a separate Amazon RDS for PostgreSQL instance. Project files and test results are stored in a non-public Amazon S3 bucket.

Executors need licenses to execute tests and simulations. They obtain the licenses from a license server, which is deployed on an EC2 instance. For the initial setup of the license server, several files need to be exchanged between an administration PC and the license server. These files are exchanged via a non-public S3 bucket that both the administration PC and the license server can read from and write to.

A detailed list of the AWS resources that are mandatory or optional for operating SIMPHERA can be found in the AWSCloudSpec.

Billable Resources and Services

Charges may apply for the following AWS resources and services:

| Service | Description | Mandatory? |
|---|---|---|
| Amazon Elastic Kubernetes Service | A Kubernetes cluster is required to run SIMPHERA. | Yes |
| Amazon Virtual Private Cloud | Virtual network for SIMPHERA. | Yes |
| Elastic Load Balancing | SIMPHERA uses a network load balancer. | Yes |
| Amazon EC2 Auto Scaling | SIMPHERA automatically scales compute nodes if the capacity is exhausted. | Yes |
| Amazon Relational Database Service | Project and authorization data is stored in Amazon RDS for PostgreSQL instances. | Yes |
| Amazon Simple Storage Service | Binary artifacts are stored in an S3 bucket. | Yes |
| Amazon Elastic File System | Binary artifacts are stored temporarily in EFS. | Yes |
| AWS Key Management Service (AWS KMS) | Encryption for Kubernetes secrets is enabled by default. | |
| Amazon Elastic Compute Cloud | Optionally, you can deploy a dSPACE license server on an EC2 instance. Alternatively, you can deploy the server on external infrastructure. | No |
| Amazon CloudWatch | Metrics and container logs can be sent to CloudWatch. It is recommended to deploy the dSPACE monitoring stack in Kubernetes. | |

Usage Instructions

To create the AWS resources that are required for operating SIMPHERA, you need to accomplish the following tasks:

  1. install Terraform on your local administration PC
  2. register an AWS account where the resources needed for SIMPHERA are created
  3. create an IAM user with the least privileges required to create the resources for SIMPHERA
  4. create security credentials for that IAM user
  5. request a service quota increase for GPU instances, if needed
  6. create a non-public S3 bucket for the Terraform state
  7. create an IAM policy that gives the IAM user access to the S3 bucket
  8. clone this repository onto your local administration PC
  9. create Secrets Manager secrets
  10. adjust Terraform variables
  11. apply Terraform configuration
  12. connect to the Kubernetes cluster

Install Terraform

This reference architecture is provided as a Terraform configuration. Terraform is an open-source command-line tool to automatically create and manage cloud resources. A Terraform configuration consists of various .tf text files. These files contain the specifications of the resources to be created in the cloud infrastructure, which is why this approach is called infrastructure as code. The main advantage of this approach is reproducibility, because the configuration can be maintained in a source control system such as Git.

Terraform uses variables to make the specification configurable. The concrete values for these variables are specified in .tfvars files. It is the administrator's task to fill the .tfvars files with the correct values, as explained in more detail later in this document.

Terraform has the concept of a state. On the one hand, there are the resource specifications in the .tf files. On the other hand, there are the resources in the cloud infrastructure that are created based on these files. Terraform needs to store mapping information about which element of the specification belongs to which resource in the cloud infrastructure. This mapping is called the state. In principle, you could store the state on your local hard drive. However, this is not a good idea, because then nobody else could change settings and apply these changes. Therefore, the state itself should be stored in the cloud.

This reference architecture has been tested with Terraform version v1.1.7.

Request Service Quota for GPU Computing Instances

If you want to run AURELION with your SIMPHERA solution, you need to add GPU instances to your cluster.

If you want to add a GPU node pool to your AWS infrastructure, you might have to increase the quota for the GPU instance type you have selected. By default, the SIMPHERA Reference Architecture for AWS uses p3.2xlarge instances. The quota Running On-Demand P instances sets the maximum number of vCPUs assigned to running On-Demand P instances in a specific AWS region. Every p3.2xlarge instance has 8 vCPUs, which is why the quota has to be at least 8 in the AWS region where you want to deploy the instances.
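As a worked example of the quota arithmetic: the quota counts vCPUs, not instances, so the value has to cover the maximum number of GPU nodes that may run at the same time. The following sketch assumes that all GPU nodes are p3.2xlarge instances (8 vCPUs each) and that no other P instances in the region consume the same quota; the function name is hypothetical.

```python
# Hypothetical helper: minimum "Running On-Demand P instances" quota (in vCPUs)
# needed to run a given number of GPU nodes concurrently.
# Assumption: all GPU nodes use the same instance type and no other
# P instances in the region share this quota.

P3_2XLARGE_VCPUS = 8  # vCPUs per p3.2xlarge instance


def required_p_instance_quota(max_gpu_nodes: int, vcpus_per_node: int = P3_2XLARGE_VCPUS) -> int:
    """Return the vCPU quota needed for max_gpu_nodes concurrently running nodes."""
    if max_gpu_nodes < 0:
        raise ValueError("max_gpu_nodes must not be negative")
    return max_gpu_nodes * vcpus_per_node


if __name__ == "__main__":
    # One node needs 8 vCPUs; the default gpuNodeCountMax of 12 needs 96.
    for nodes in (1, 12):
        print(nodes, "node(s) ->", required_p_instance_quota(nodes), "vCPUs")
```

With the default gpuNodeCountMax of 12, a quota of only 8 vCPUs would allow a single GPU node to run.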

Create Security Credentials

You can create security credentials for the IAM user in the AWS console. Terraform uses these security credentials to create AWS resources on your behalf.

On your administration PC, you need to install Terraform and the AWS CLI. To configure your AWS account, run the following command:

aws configure --profile <profile-name>

AWS Access Key ID [None]: *********
AWS Secret Access Key [None]: *******
Default region name [None]: eu-central-1
Default output format [None]: json

If you have been provided with a session token, you can add it with the following command:

aws configure set aws_session_token "<your_session_token>" --profile <profile-name>

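Access credentials are typically stored in ~/.aws/credentials and configuration settings in ~/.aws/config. There are various ways to authenticate Terraform against AWS; which one to use depends on your specific setup. If you use a named profile, make sure that the Terraform variable profile refers to it.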

Verify connectivity and your access credentials by executing the following command:

aws sts get-caller-identity --profile <profile-name>

{
    "UserId": "REWAYDCFMNYCPKCWRZEHT:<your_email>",
    "Account": "592245445799",
    "Arn": "arn:aws:sts::592245445799:assumed-role/AWSReservedSSO_AdministratorAccess_vmcbaym7ueknr9on/<your_email>"
}

Create State Bucket

As mentioned before, Terraform stores the state of the resources it creates within an S3 bucket. The bucket name needs to be globally unique.
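Besides being globally unique, the bucket name has to follow the S3 naming rules. The following sketch checks a subset of the rules for general purpose buckets: 3 to 63 characters; only lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no two adjacent dots; not formatted like an IP address. It is not exhaustive and cannot check global uniqueness, which only S3 decides when you create the bucket. The function name is hypothetical.

```python
import ipaddress
import re

# Subset of the S3 naming rules for general purpose buckets.
# Assumption: not exhaustive; S3 also rejects certain reserved prefixes and suffixes.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if name passes the subset of S3 naming rules checked here."""
    if not _BUCKET_NAME_RE.fullmatch(name):
        return False  # wrong length, uppercase, underscores, or bad first/last char
    if ".." in name:
        return False  # two adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)  # names like "192.168.5.4" are not allowed
        return False
    except ValueError:
        return True


if __name__ == "__main__":
    for candidate in ("terraform-state", "Terraform_State", "ab", "192.168.5.4"):
        print(candidate, is_valid_bucket_name(candidate))
```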

After you have created the bucket, you need to link it with Terraform. To do so, make a copy of the file state-backend-template, name it state-backend.tf, and open the file in a text editor. The values have to point to an existing S3 bucket to be used to store the Terraform state:

terraform {
  backend "s3" {
    # The name of the bucket to be used to store the Terraform state. You need to create this bucket manually.
    bucket = "terraform-state"

    # The name of the file (object key) inside the bucket to be used for this Terraform state.
    key    = "simphera.tfstate"

    # The region of the bucket (same region as your deployment, i.e. var.region).
    region = "eu-central-1"
  }
}

Important: It is highly recommended to enable server-side encryption of the state file, for example by adding encrypt = true to the backend configuration. The S3 backend does not enable this setting by default.

Create IAM Policy for State Bucket

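Create the following policy for accessing the Terraform state bucket. Note that the policy contains Principal elements, which are only allowed in resource-based policies: either attach it as a bucket policy to the state bucket, or remove the Principal elements and attach it to the IAM user as an identity-based IAM policy.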

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<your_account_arn>"
            },            
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::terraform-state"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "<your_account_arn>"
            },            
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::terraform-state/<storage_key_state_backend>"
        }
    ]
}

The ARN (Amazon Resource Name) to use for <your_account_arn> is shown in the Arn field of the output of the aws sts get-caller-identity command.
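If you prefer to generate the policy instead of filling in placeholders by hand, the following sketch builds it in both forms: as an identity-based policy for the IAM user (without Principal) and as a bucket policy that names the caller from the aws sts get-caller-identity output as principal. The function names are hypothetical; bucket and key correspond to the values in state-backend.tf.

```python
import json


def _statements(bucket: str, key: str) -> list:
    """List access on the bucket plus read/write access on the state object."""
    return [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/{key}",
        },
    ]


def identity_policy(bucket: str, key: str) -> dict:
    """Identity-based IAM policy for the IAM user (contains no Principal)."""
    return {"Version": "2012-10-17", "Statement": _statements(bucket, key)}


def bucket_policy(bucket: str, key: str, caller_identity_json: str) -> dict:
    """Bucket policy granting access to the caller reported by `aws sts get-caller-identity`."""
    principal_arn = json.loads(caller_identity_json)["Arn"]
    statements = _statements(bucket, key)
    for statement in statements:
        statement["Principal"] = {"AWS": principal_arn}
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(identity_policy("terraform-state", "simphera.tfstate"), indent=4))
```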

Create Secrets Manager Secrets

The password for the PostgreSQL databases is stored in AWS Secrets Manager. Before you let Terraform create AWS resources, you need to manually create a Secrets Manager secret that stores the password. It is recommended to create individual secrets per SIMPHERA instance (e.g. for a production and a staging instance). To create the secret, open the Secrets Manager console and click the button Store a new secret. As secret type, choose Other type of secret. The password must contain from 8 to 128 characters and must not contain any of the following: / (slash), ' (single quote), " (double quote), and @ (at sign). Open the Plaintext tab, paste the following JSON object, and enter your password:

{
  "postgresql_password": "<your password>"
}
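The password rules above can be checked locally before you store the secret. The following sketch validates a password against exactly those rules (8 to 128 characters, none of / ' " @) and serializes the JSON object shown above; the function names are hypothetical.

```python
import json

# Characters that the password must not contain: slash, single quote, double quote, at sign.
FORBIDDEN_CHARACTERS = set("/'\"@")


def validate_postgresql_password(password: str) -> None:
    """Raise ValueError if the password violates the documented rules."""
    if not 8 <= len(password) <= 128:
        raise ValueError("password must contain from 8 to 128 characters")
    found = sorted(FORBIDDEN_CHARACTERS.intersection(password))
    if found:
        raise ValueError(f"password contains forbidden characters: {' '.join(found)}")


def secret_string(password: str) -> str:
    """Return the secret's JSON object after validating the password."""
    validate_postgresql_password(password)
    return json.dumps({"postgresql_password": password})
```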

Alternatively, you can create the secret with the following PowerShell script:

$region = "<your region>"
$postgresqlCredentials = @"
{
    "postgresql_password" : "<your password>"
}
"@ | ConvertFrom-Json | ConvertTo-Json -Compress
$postgresqlCredentials = $postgresqlCredentials -replace '([\\]*)"', '$1$1\"'
aws secretsmanager create-secret --name <secret name> --secret-string $postgresqlCredentials --region $region
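The -replace step in the PowerShell script escapes the double quotes in the JSON string, presumably because Windows PowerShell does not pass embedded double quotes to native programs such as the AWS CLI unchanged. When the AWS CLI is called with an argument list instead of a shell command line, no such escaping is needed. The following sketch builds that argument list in Python; the secret name, password, and region are placeholders, and the command only runs if you call create_secret yourself.

```python
import json
import subprocess
from typing import List


def create_secret_command(secret_name: str, password: str, region: str) -> List[str]:
    """Build the argv for `aws secretsmanager create-secret` without manual quote escaping."""
    # Compact JSON, comparable to ConvertTo-Json -Compress in the PowerShell script.
    secret = json.dumps({"postgresql_password": password}, separators=(",", ":"))
    return [
        "aws", "secretsmanager", "create-secret",
        "--name", secret_name,
        "--secret-string", secret,
        "--region", region,
    ]


def create_secret(secret_name: str, password: str, region: str) -> None:
    """Run the command; requires the AWS CLI and valid credentials."""
    subprocess.run(create_secret_command(secret_name, password, region), check=True)
```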

If you use the console, you can define a name for the secret on the next page. Automatic credential rotation is currently not supported by SIMPHERA, but you can rotate secrets manually. You have to provide the name of the secret in your Terraform variables, as described in the next section.

Adjust Terraform Variables

For your configuration, make a copy of the file terraform.tfvars.example, name it terraform.tfvars, and open the file in a text editor. This file contains all configurable variables, including their documentation. Adapt the values before you deploy the resources. For example, set the name of the Secrets Manager secret for each SIMPHERA instance:

simpheraInstances = {
  "production" = {
    secretname = "<secret name>"
    # ... further instance attributes, see terraform.tfvars.example
  }
}

Apply Terraform Configuration

Before you can deploy the resources to AWS you have to initialize Terraform:

terraform init

Afterwards you can deploy the resources:

terraform apply

Terraform automatically loads the variables from your terraform.tfvars variable definition file. Installation times may vary, but the deployment is expected to take up to 30 minutes. It is recommended to use an AWS administrator account, or to ask your AWS administrator to assign the necessary IAM roles and permissions to your user.

Destroy Infrastructure

Resources that contain data, i.e. the databases, the S3 storage, and the recovery points in the backup vault, are protected against unintentional deletion. ⚠️ If you continue with the procedure described in this section, your data will be irretrievably deleted.

Before the backup vault can be deleted, all continuous recovery points for the S3 storage and the databases need to be deleted, for example with the following PowerShell snippet:

$vaults = terraform output -json backup_vaults | ConvertFrom-Json
$profile = "<profile_name>"
foreach ($vault in $vaults){
  Write-Host "Deleting $vault"
  $recoverypoints = aws backup list-recovery-points-by-backup-vault --profile $profile --backup-vault-name $vault | ConvertFrom-Json
  foreach ($rp in $recoverypoints.RecoveryPoints){
    aws backup delete-recovery-point --profile $profile --backup-vault-name $vault --recovery-point-arn $rp.RecoveryPointArn
  }
  foreach ($rp in $recoverypoints.RecoveryPoints){
    Do  
    {  
      Start-Sleep -Seconds 10
      aws backup describe-recovery-point --profile $profile --backup-vault-name $vault --recovery-point-arn $rp.RecoveryPointArn | ConvertFrom-Json
    } while( $LASTEXITCODE -eq 0)
  }  
  aws backup delete-backup-vault --profile $profile --backup-vault-name $vault
}
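For scripting the cleanup in Python, the same procedure can be written against the AWS Backup API. The following sketch takes a backup client as a parameter, for example boto3.client("backup"); using boto3 is an assumption, since the snippets in this document use the AWS CLI. Unlike the AWS CLI, boto3 client calls do not paginate automatically, so the sketch follows NextToken explicitly. The function name and polling interval are hypothetical.

```python
import time


def empty_and_delete_vault(backup, vault_name: str, poll_seconds: float = 10.0) -> None:
    """Delete all recovery points in a backup vault, wait until they are gone,
    then delete the vault. `backup` is an AWS Backup client, e.g. boto3.client("backup")."""
    # Collect all recovery point ARNs, following NextToken pagination.
    arns = []
    kwargs = {"BackupVaultName": vault_name}
    while True:
        page = backup.list_recovery_points_by_backup_vault(**kwargs)
        arns.extend(rp["RecoveryPointArn"] for rp in page.get("RecoveryPoints", []))
        if not page.get("NextToken"):
            break
        kwargs["NextToken"] = page["NextToken"]

    for arn in arns:
        backup.delete_recovery_point(BackupVaultName=vault_name, RecoveryPointArn=arn)

    # Deletion is asynchronous: poll until the recovery point can no longer be
    # described, mirroring the do/while loop in the PowerShell snippet.
    for arn in arns:
        while True:
            try:
                backup.describe_recovery_point(BackupVaultName=vault_name, RecoveryPointArn=arn)
            except backup.exceptions.ResourceNotFoundException:
                break
            time.sleep(poll_seconds)

    backup.delete_backup_vault(BackupVaultName=vault_name)
```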

Before the databases can be deleted, you need to remove their deletion protection:

$databases = terraform output -json database_identifiers | ConvertFrom-Json
foreach ($db in $databases){
  Write-Host "Deleting database $db"
  aws rds modify-db-instance --profile $profile --db-instance-identifier $db --no-deletion-protection
  aws rds delete-db-instance --profile $profile --db-instance-identifier $db --skip-final-snapshot
}

You can remove the S3 buckets like this:

$buckets = terraform output -json s3_buckets | ConvertFrom-Json
foreach ($bucket in $buckets){
  aws s3 rb s3://$bucket --force --profile $profile
}

The remaining infrastructure resources can be deleted with Terraform. Due to a bug, Terraform is not able to plan the removal of the resources in the right order, which leads to a deadlock. To work around the bug, you need to remove the eks-addons module first:

terraform destroy -target="module.eks-addons"

⚠️ It is important that you have completed the preceding steps. Otherwise, the following command will not finish completely, leaving you in a deadlock state.

To delete the remaining resources, run the following command:

terraform destroy

Connect to Kubernetes Cluster

This deployment contains a managed Kubernetes cluster (EKS). To use command-line tools such as kubectl or helm, you need a kubeconfig configuration file. You can update your kubeconfig with the AWS CLI command aws eks update-kubeconfig:

aws eks --region <region> update-kubeconfig --name <cluster_name> --kubeconfig <filename>

Backup and Restore

SIMPHERA stores data in the PostgreSQL database and in S3 buckets (MinIO) that need to be backed up. AWS supports continuous backups for Amazon RDS for PostgreSQL and Amazon S3, which allow point-in-time recovery. Point-in-time recovery lets you restore your data to any point in time within a defined retention period.

This Terraform module creates an AWS Backup plan that makes continuous backups of the PostgreSQL database and the S3 buckets. The backups are stored in one AWS Backup vault per SIMPHERA instance. An IAM role with the permissions required to create backups is also created automatically. To enable backups for your SIMPHERA instance, make sure that the flag enable_backup_service is set in your .tfvars file:

simpheraInstances = {
  "production" = {
    enable_backup_service = true
  }
}
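As a rough illustration of the retention window: a restore time is only usable if it lies between the start of the retention period and the latest restorable time. The following sketch checks this under simplifying assumptions: the retention period is counted in days back from now (the unit of backup_retention is assumed to be days here), and the latest restorable time lags a few minutes behind now (5 minutes is an assumed default). AWS reports the exact earliest and latest restorable times for each resource. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_within_retention(restore_time: datetime, now: datetime, retention_days: int,
                        lag: timedelta = timedelta(minutes=5)) -> bool:
    """Rough check whether restore_time falls into the point-in-time window.

    Assumptions: the window starts retention_days before now and ends `lag`
    before now. Both datetimes must be timezone-aware.
    """
    earliest = now - timedelta(days=retention_days)
    latest = now - lag
    return earliest <= restore_time <= latest


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_within_retention(now - timedelta(days=3), now, retention_days=7))
```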

Amazon RDS for PostgreSQL

Create a target RDS instance (backup server) that is a copy of a source RDS instance (production server) at a specific point in time. The command restore-db-instance-to-point-in-time creates the target database. Most of the configuration settings are copied from the source database. To be able to connect to the target instance, the easiest way is to explicitly set the same security group and subnet group as used for the source instance.

You can restore an RDS instance with PowerShell as follows:

aws rds restore-db-instance-to-point-in-time --source-db-instance-identifier simphera-reference-production-simphera --target-db-instance-identifier simphera-reference-production-simphera-backup --vpc-security-group-ids sg-0b954a0e25cd11b6d --db-subnet-group-name simphera-reference-vpc --restore-time 2022-06-16T23:45:00.000Z --tags Key=timestamp,Value=2022-06-16T23:45:00.000Z
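The restore time in the command above is an ISO 8601 timestamp in UTC, and the same value is reused as a tag so that you can later tell which point in time the backup instance represents. The following sketch produces that format from a timezone-aware datetime and assembles the command as an argument list. Appending "-backup" to the source identifier mirrors the example above and is an assumption; the security group and subnet group must be replaced with your own values. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import List


def restore_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime like 2022-06-16T23:45:00.000Z (UTC, milliseconds)."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def restore_command(source_id: str, security_group: str, subnet_group: str,
                    moment: datetime) -> List[str]:
    """Build the argv for `aws rds restore-db-instance-to-point-in-time`."""
    timestamp = restore_timestamp(moment)
    return [
        "aws", "rds", "restore-db-instance-to-point-in-time",
        "--source-db-instance-identifier", source_id,
        "--target-db-instance-identifier", f"{source_id}-backup",  # naming is an assumption
        "--vpc-security-group-ids", security_group,
        "--db-subnet-group-name", subnet_group,
        "--restore-time", timestamp,
        "--tags", f"Key=timestamp,Value={timestamp}",
    ]
```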

Execute the following command to create the pgdump pod from the standard postgres image and open a Bash shell in it:

kubectl run pgdump -ti -n simphera --image postgres --kubeconfig .\kube.config -- bash

In the pod's Bash, use the pg_dump and pg_restore commands to stream the data from the backup server to the production server:

pg_dump -h simphera-reference-production-simphera-backup.cexy8brfkmxk.eu-central-1.rds.amazonaws.com -p 5432 -U dbuser -Fc simpherareferenceproductionsimphera | pg_restore --clean --if-exists -h simphera-reference-production-simphera.cexy8brfkmxk.eu-central-1.rds.amazonaws.com -p 5432 -U dbuser -d simpherareferenceproductionsimphera

Alternatively, you can restore the RDS instance via the AWS console.

S3

This Terraform configuration creates an S3 bucket for project data and results and enables versioning of the bucket, which is a requirement for point-in-time recovery.

To restore the S3 buckets to an earlier version, you need to create an IAM role with the required permissions:

$rolename = "restore-role"
$trustrelation = @"
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["sts:AssumeRole"],
      "Effect": "Allow",
      "Principal": {
        "Service": ["backup.amazonaws.com"]
      }
    }
  ]
}
"@

# Write the file as ASCII: in Windows PowerShell, redirection with > writes UTF-16, which the AWS CLI may fail to parse.
Set-Content -Path trust.json -Value $trustrelation -Encoding ascii

aws iam create-role --role-name $rolename --assume-role-policy-document file://trust.json --description "Role to restore"

aws iam attach-role-policy --role-name $rolename --policy-arn="arn:aws:iam::aws:policy/AWSBackupServiceRolePolicyForS3Restore"

aws iam attach-role-policy --role-name $rolename --policy-arn="arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForRestores"

# --output text returns the plain ARN instead of a JSON-quoted string.
$rolearn = aws iam get-role --role-name $rolename --query 'Role.Arn' --output text

You can restore the S3 data in place, into another existing bucket, or into a new bucket. To restore an S3 bucket with PowerShell, proceed as follows:

$uuid = New-Guid
$metadata = @"
{
  "DestinationBucketName": "man-validation-platform-int-results",
  "NewBucket": "true",
  "RestoreTime": "2022-06-20T23:45:00.000Z",
  "Encrypted": "false",
  "CreationToken": "$uuid"
}
"@
$metadata = $metadata -replace '([\\]*)"', '$1$1\"'
aws backup start-restore-job `
--recovery-point-arn "arn:aws:backup:eu-central-1:012345678901:recovery-point:continuous:simphera-reference-production-0f51c39b" `
--iam-role-arn $rolearn `
--metadata $metadata
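In PowerShell, the metadata has to be passed to the AWS CLI as an escaped JSON string. If you use the AWS SDK for Python (boto3) instead, which is an assumption and not part of this repository, start_restore_job takes the metadata as a dictionary of strings, so no escaping is required. The following sketch only assembles the request; the recovery point ARN, role ARN, and destination bucket are placeholders, and the metadata keys are the ones used in the PowerShell example. The function name is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def s3_restore_request(recovery_point_arn: str, role_arn: str, destination_bucket: str,
                       restore_time: datetime, new_bucket: bool = True) -> dict:
    """Keyword arguments for backup.start_restore_job(**request) with a boto3 AWS Backup client."""
    return {
        "RecoveryPointArn": recovery_point_arn,
        "IamRoleArn": role_arn,
        "Metadata": {
            # All metadata values are strings, as in the PowerShell example.
            "DestinationBucketName": destination_bucket,
            "NewBucket": "true" if new_bucket else "false",
            # UTC timestamp; sub-second precision is truncated to .000.
            "RestoreTime": restore_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z"),
            "Encrypted": "false",
            "CreationToken": str(uuid.uuid4()),
        },
    }
```

Usage, assuming boto3 is installed and configured: boto3.client("backup").start_restore_job(**s3_restore_request(...)).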

Alternatively, you can restore the S3 data via the AWS console.

Encryption

Encryption is enabled for all AWS resources that are created by Terraform:

  • PostgreSQL databases
  • S3 buckets
  • EFS (Elastic file system)
  • CloudWatch logs
  • Backup Vault

Rotating Credentials

Credentials can be rotated manually: Open the secret in the Secrets Manager console and change the password. Then fill in the placeholders <namespace> and <path_to_kubeconfig> and run the following command to remove SIMPHERA from your Kubernetes cluster:

helm delete simphera -n <namespace> --kubeconfig <path_to_kubeconfig>

Reinstall the SIMPHERA Quickstart Helm chart so that all Kubernetes pods and jobs retrieve the new credentials. Important: During credential rotation, SIMPHERA will be unavailable for a short period.

Requirements

| Name | Version |
|---|---|
| terraform | >= 1.1.7 |
| aws | ~> 4.47 |
| helm | 2.9.0 |
| kubernetes | 2.18.1 |

Providers

| Name | Version |
|---|---|
| aws | ~> 4.47 |

Modules

| Name | Source | Version |
|---|---|---|
| eks | git::https://github.com/aws-ia/terraform-aws-eks-blueprints.git | v4.32.1 |
| eks-addons | git::https://github.com/aws-ia/terraform-aws-eks-blueprints.git//modules/kubernetes-addons | v4.32.1 |
| security_group | terraform-aws-modules/security-group/aws | ~> 4 |
| simphera_instance | ./modules/simphera_aws_instance | n/a |
| vpc | terraform-aws-modules/vpc/aws | v3.11.0 |

Resources

| Name | Type |
|---|---|
| aws_cloudwatch_log_group.flowlogs | resource |
| aws_cloudwatch_log_group.ssm_install_log_group | resource |
| aws_cloudwatch_log_group.ssm_scan_log_group | resource |
| aws_flow_log.flowlog | resource |
| aws_iam_instance_profile.license_server_profile | resource |
| aws_iam_policy.flowlogs_policy | resource |
| aws_iam_policy.license_server_policy | resource |
| aws_iam_role.flowlogs_role | resource |
| aws_iam_role.license_server_role | resource |
| aws_iam_role_policy_attachment.flowlogs_attachment | resource |
| aws_iam_role_policy_attachment.license_server_ssm | resource |
| aws_iam_role_policy_attachment.minio_policy_attachment | resource |
| aws_instance.license_server | resource |
| aws_kms_key.kms_key_cloudwatch_log_group | resource |
| aws_s3_bucket.bucket_logs | resource |
| aws_s3_bucket.license_server_bucket | resource |
| aws_s3_bucket_acl.license_server_bucket_acl | resource |
| aws_s3_bucket_logging.logging | resource |
| aws_s3_bucket_policy.buckets_logs_ssl | resource |
| aws_s3_bucket_policy.license_server_bucket_ssl | resource |
| aws_s3_bucket_public_access_block.buckets_logs_access | resource |
| aws_s3_bucket_server_side_encryption_configuration.bucket_logs_encryption | resource |
| aws_ssm_maintenance_window.install | resource |
| aws_ssm_maintenance_window.scan | resource |
| aws_ssm_maintenance_window_target.install | resource |
| aws_ssm_maintenance_window_target.scan | resource |
| aws_ssm_maintenance_window_target.scan_eks_nodes | resource |
| aws_ssm_maintenance_window_task.install | resource |
| aws_ssm_maintenance_window_task.scan | resource |
| aws_ssm_patch_baseline.production | resource |
| aws_ssm_patch_group.patch_group | resource |
| aws_ami.amazon_linux_kernel5 | data source |
| aws_availability_zones.available | data source |
| aws_eks_cluster.cluster | data source |
| aws_eks_cluster_auth.cluster | data source |
| aws_region.current | data source |

Inputs

| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| account_id | The AWS account id to be used to create resources. | string | n/a | yes |
| cloudwatch_retention | Global CloudWatch retention period for the EKS, VPC, SSM, and PostgreSQL logs. | number | 7 | no |
| enable_aws_for_fluentbit | Install FluentBit to send container logs to CloudWatch. | bool | false | no |
| enable_ingress_nginx | Enable the Ingress Nginx add-on. | bool | false | no |
| enable_patching | Scans the license server EC2 instance and the EKS nodes for updates. Installs patches on the license server automatically. EKS nodes need to be updated manually. | bool | false | no |
| gpuNodeCountMax | The maximum number of nodes for GPU job execution. | number | 12 | no |
| gpuNodeCountMin | The minimum number of nodes for GPU job execution. | number | 0 | no |
| gpuNodePool | Specifies whether an additional node pool for GPU job execution is added to the Kubernetes cluster. | bool | false | no |
| gpuNodeSize | The machine size of the nodes for GPU job execution. | list(string) | ["p3.2xlarge"] | no |
| infrastructurename | The name of the infrastructure, e.g. simphera-infra. | string | n/a | yes |
| install_schedule | 6-field cron expression describing the install maintenance schedule. Must not overlap with variable scan_schedule. | string | "cron(0 3 * * ? *)" | no |
| kubernetesVersion | The version of the EKS cluster. | string | "1.22" | no |
| licenseServer | Specifies whether a license server VM will be created. | bool | false | no |
| linuxExecutionNodeCountMax | The maximum number of Linux nodes for the job execution. | number | 10 | no |
| linuxExecutionNodeCountMin | The minimum number of Linux nodes for the job execution. | number | 0 | no |
| linuxExecutionNodeSize | The machine size of the Linux nodes for the job execution. | list(string) | ["m5a.4xlarge", "m5a.8xlarge"] | no |
| linuxNodeCountMax | The maximum number of Linux nodes for the regular services. | number | 12 | no |
| linuxNodeCountMin | The minimum number of Linux nodes for the regular services. | number | 1 | no |
| linuxNodeSize | The machine size of the Linux nodes for the regular services. | list(string) | ["m5a.4xlarge", "m5a.8xlarge"] | no |
| maintainance_duration | Duration of the maintenance window in hours. | number | 3 | no |
| map_accounts | Additional AWS account numbers to add to the aws-auth ConfigMap. | list(string) | [] | no |
| map_roles | Additional IAM roles to add to the aws-auth ConfigMap. | list(object({ rolearn = string, username = string, groups = list(string) })) | [] | no |
| map_users | Additional IAM users to add to the aws-auth ConfigMap. | list(object({ userarn = string, username = string, groups = list(string) })) | [] | no |
| profile | The AWS profile used. | string | "default" | no |
| region | The AWS region to be used. | string | "eu-central-1" | no |
| scan_schedule | 6-field cron expression describing the scan maintenance schedule. Must not overlap with variable install_schedule. | string | "cron(0 0 * * ? *)" | no |
| simpheraInstances | A map containing the individual SIMPHERA instances, such as 'staging' and 'production'. | map(object({ name = string, postgresqlVersion = string, postgresqlStorage = number, postgresqlMaxStorage = number, db_instance_type_simphera = string, postgresqlStorageKeycloak = number, postgresqlMaxStorageKeycloak = number, db_instance_type_keycloak = string, k8s_namespace = string, secretname = string, enable_backup_service = bool, backup_retention = number, enable_deletion_protection = bool })) | n/a | yes |
| tags | The tags to be added to all resources. | map(any) | {} | no |
| vpcCidr | The CIDR for the virtual private cloud. | string | "10.1.0.0/18" | no |
| vpcDatabaseSubnets | List of CIDRs for the database subnets. | list(any) | ["10.1.24.0/22", "10.1.28.0/22", "10.1.32.0/22"] | no |
| vpcPrivateSubnets | List of CIDRs for the private subnets. | list(any) | ["10.1.0.0/22", "10.1.4.0/22", "10.1.8.0/22"] | no |
| vpcPublicSubnets | List of CIDRs for the public subnets. | list(any) | ["10.1.12.0/22", "10.1.16.0/22", "10.1.20.0/22"] | no |
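Two constraints in the inputs are easy to violate when you change the defaults: all subnets must lie inside vpcCidr without overlapping each other, and the install and scan maintenance windows must not overlap. The following sketch checks both. It only understands daily schedules of the form cron(M H * * ? *) and assumes that each window lasts maintainance_duration hours from its start; the function names are hypothetical.

```python
import ipaddress
import re
from itertools import combinations


def check_subnets(vpc_cidr: str, subnets: list) -> None:
    """Raise ValueError if a subnet lies outside the VPC CIDR or two subnets overlap."""
    vpc = ipaddress.ip_network(vpc_cidr)
    networks = [ipaddress.ip_network(s) for s in subnets]
    for net in networks:
        if not net.subnet_of(vpc):
            raise ValueError(f"{net} is not inside {vpc}")
    for a, b in combinations(networks, 2):
        if a.overlaps(b):
            raise ValueError(f"{a} overlaps {b}")


# Assumption: only daily schedules "cron(minute hour * * ? *)" are supported.
_DAILY_CRON = re.compile(r"cron\((\d{1,2}) (\d{1,2}) \* \* \? \*\)")


def daily_start_minute(expression: str) -> int:
    """Minute of the day at which a daily cron(M H * * ? *) schedule starts."""
    match = _DAILY_CRON.fullmatch(expression)
    if not match:
        raise ValueError(f"unsupported schedule: {expression}")
    minute, hour = int(match.group(1)), int(match.group(2))
    if minute > 59 or hour > 23:
        raise ValueError(f"invalid time in schedule: {expression}")
    return hour * 60 + minute


def windows_overlap(schedule_a: str, schedule_b: str, duration_hours: int) -> bool:
    """True if two daily windows of duration_hours overlap (wrapping at midnight)."""
    day = 24 * 60
    length = duration_hours * 60
    start_a, start_b = daily_start_minute(schedule_a), daily_start_minute(schedule_b)
    gap = (start_b - start_a) % day  # minutes from start_a forward to start_b
    return gap < length or (day - gap) % day < length
```

With the defaults, the scan window runs from 00:00 to 03:00 and the install window starts at 03:00, so the two windows touch but do not overlap.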

Outputs

| Name | Description |
|---|---|
| backup_vaults | Backup vaults from all SIMPHERA instances. |
| database_endpoints | Endpoints of the SIMPHERA and Keycloak databases from all SIMPHERA instances. |
| database_identifiers | Identifiers of the SIMPHERA and Keycloak databases from all SIMPHERA instances. |
| s3_buckets | S3 buckets from all SIMPHERA instances. |
