diff --git a/rfcs/rfc-111.md b/rfcs/rfc-111.md new file mode 100644 index 000000000..e03a5be24 --- /dev/null +++ b/rfcs/rfc-111.md @@ -0,0 +1,354 @@ +# RFC-111 - CloudFormation Guard Block Evaluations +This RFC details rule, type, and conditional block evaluations that allow for easier and more succinct rule authoring. + +# Motivation + +Related Issues: [#105](https://github.com/aws-cloudformation/cloudformation-guard/issues/105), [#75](https://github.com/aws-cloudformation/cloudformation-guard/issues/75), [#49](https://github.com/aws-cloudformation/cloudformation-guard/issues/49) + +Since releasing guard in October of 2020, the CloudFormation team has gathered feedback from a variety of sources on the current rule language authoring experience. From this feedback, we are proposing some enhancements to the existing ruleset language in the form of grouped type and rule blocks to address the below issues. + +## Tenets (unless you know better ones) +**Simple**: The language must be simple for customers to author rules, simple for IDE integrations, readable for human comprehension while being machine enforceable. + +**Unambiguous**: The language must not allow for ambiguous interpretations that makes it hard to comprehend what is being evaluated. The tool is targeted for security and compliance related attestations that need the auditor to consistent and unambiguously understand rules and their evaluations. + +**Deterministic**: The language design must allow language implementers to have deterministic, consistent and isolated evalutions. Results for repeated evaluations for the same context and rule set must evaluate to the same result everytime. Time to evaluate results inside near identical environments must be within acceptable tolerance limits. + +**Composable**: The language makes composition of higher order rule sets from multiple different rules sets simple with consistent interpretation and syntax. Composition should not add complexity to interpretation and customers can easily navigate across them. + +## Issues with Current Language + +Below are the issues we are attempting to address with the introduction of block evaluations, along with samples of the proposed enhancements to the language. + +### **Increased verbosity** + +Current rulesets are extremely verbose, with information required to be repeated for every rule, as seen in the example below: + +``` +AWS::ApiGateway::Method AuthorizationType == NONE +AWS::ApiGateway::Method HttpMethod == ANY +AWS::ApiGateway::Method Integration == {"IntegrationHttpMethod":"POST","Type":"AWS_PROXY","Uri":"arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${LambdaWAFBadBotParserFunction.Arn}/invocations"} +AWS::ApiGateway::Method RequestParameters == {"method.request.header.X-Forwarded-For":false} +AWS::ApiGateway::Method ResourceId == ApiGatewayBadBot.RootResourceId |OR| AWS::ApiGateway::Method ResourceId == ApiGatewayBadBotResource +AWS::ApiGateway::Method RestApiId == ApiGatewayBadBot +``` + +In the above ruleset, every statement clause needs to repeat the type information like AWS::ApiGateway::Method, type has to be repeated across OR clauses, and OR clauses have to present on a single line which makes it hard to read end to end. This violates our tenet of simplicity. The current rule authoring experience is quite tedious because of this, with a large amount of information replicated line to line that is the same for a given resource type. + +With the proposed rule evaluations, rule authors can convert the above rule into a type block, which allows for grouping multiple checks on a certain resource type together in a simple way: + +``` +AWS::ApiGateway::Method { + Properties.AuthorizationType == "NONE" + Properties.HttpMethod == "ANY" + Properties.Integration == {"IntegrationHttpMethod":"POST","Type":"AWS_PROXY","Uri":"arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${LambdaWAFBadBotParserFunction.Arn}/invocations"} + Properties.RequestParameters == {"method.request.header.X-Forwarded-For":false} + Properties.ResourceId == "ApiGatewayBadBot.RootResourceId" or + Properties.ResourceId == "ApiGatewayBadBotResource" + Properties.RestApiId == "ApiGatewayBadBot" +} +``` + +More information on type blocks is available in the [proposal](#type-blocks) section. + + +### **Insufficient modelling capability** + +The current syntax does not allow customers to model clauses with disjunction (OR). Today it is difficult and sometimes impossible for the customer to express rules of the form (a and b and c) or (d and f) that is native to an [CNF](https://en.wikipedia.org/wiki/Conjunctive_normal_form) notation. + +For example, if one wanted to enforce that all S3 Buckets defined in a template that used KMS encryption used a specific key for encryption or if this is not the case perform some other checks, one could try to construct this rule on a single line, but evaluation is ambiguous: + +``` +AWS::S3::Bucket WHEN BucketEncryption.ServerSideEncryptionConfiguration.*.ServerSideEncryptionByDefault.SSEAlgorithm == aws:kms CHECK AWS::S3::Bucket BucketEncryption.ServerSideEncryptionConfiguration.*.ServerSideEncryptionByDefault.SSEAlgorithm.KMSMasterKeyId == /some kms regex/ |OR| AWS::S3::Bucket BucketName == /someNonEncryptedBucketRegex/ +``` + +In the above, the rule is extremely verbose, and it is unclear to end rule authors if the bucket name check on the right of the "|OR|" clause is evaluated as part of the "WHEN" check or if the "WHEN" check fails evaluation. This violates the tenet of the language being unambiguous. Ambiguity should not be present in a tool used for enforcement of security and organizational best practices. Complex rules like this are both hard to author and to interpret. + +Using proposed named and conditional block evaluations, rule authors can name a rules for reuse and composability. Named blocks allow rule authors to explicitly name a certain check in a ruleset. Conditional blocks operate similarly to the existing "WHEN CHECK" functionality while taking advantage of the new block evaluation syntax. With the new additions, the aforementioned ruleset can be authored in a much less ambiguous way: + +``` +rule bucket_kms_enabled { + AWS::S3::Bucket { + Properties.BucketEncryption.ServerSideEncryptionConfiguration.*.ServerSideEncryptionByDefault.SSEAlgorithm == "aws:kms" + } +} +rule bucket_encryption_naming_rule { + AWS::S3::Bucket when bucket_kms_enabled { + Properties.BucketEncryption.ServerSideEncryptionConfiguration.*.ServerSideEncryptionByDefault.SSEAlgorithm.KMSMasterKeyId == /some kms regex/ + } or + AWS::S3::Bucket Properties.BucketName == /someNonEncryptedBucketRegex/ +} +``` + +More information on the named and conditional blocks can be found in the [proposal](#named-rules) section. + +### **No re-use capability** + +The current syntax does not allow for a set of clauses to be defined and consistently re-used across multiple sets. Often when writing clauses inside a rule set, there are common set of clauses that is applicable to multiple sets. E.g. customers want to define that all S3 buckets be named with the same prefix and has encryption on. They would then combine that with different ways in which encryption is handled. The current syntax does not support this. This violates the composability tenet that we are introducing. + +The above ruleset example shows how rules can be reused across a ruleset to compose rules dependent on other rules without writing out checks in their entirety each place they need to be done. The rule `bucket_kms_enabled` can be added to subsequent checks relating to KMS encrypted buckets. + +### **Inability to identify rules during evaluation** + +Current rule evaluation results in a binary PASS/FAIL output with a somewhat verbose output that makes it hard to see which rules failed exactly. Customers can add rule evaluation failure messages to each unique rule line, but parsing this output is not user friendly since it is not formatted in a structured way e.g. JSON that would allow users to quickly identify failures or for additional tooling to easily interpret results. The rule output here violates the unambiguous tenet. + +For example, in the below output from the tool's README, the verbose output is great for fixing templates but does not provide a ton of information on which rules in particular passed/failed evaluation. + +``` +cfn-guard check -t ebs_volume_template.json -r ebs_volume_rule_set +"[NewVolume2] failed because [AvailabilityZone] is [us-east-1b] and the pattern [us-east-.*] is not permitted" +"[NewVolume2] failed because [Encrypted] is [true] and that value is not permitted" +"[NewVolume2] failed because [us-east-1b] is in [us-east-1a,us-east-1b,us-east-1c] which is not permitted for [AvailabilityZone]" +"[NewVolume] failed because [AvailabilityZone] is [us-east-1b] and the pattern [us-east-.*] is not permitted" +"[NewVolume] failed because [Size] is [100] and the permitted value is [101]" +"[NewVolume] failed because [Size] is [100] and the permitted value is [99]" +"[NewVolume] failed because [Size] is [100] and the permitted value is [>= 101]" +"[NewVolume] failed because [us-east-1b] is in [us-east-1a,us-east-1b,us-east-1c] which is not permitted for [AvailabilityZone]" +Number of failures: 7 +``` + +With the proposed changes, evaluation output will contain an evaluation result for each named rule block in a ruleset, allowing one to easily detect issues with templates when evaluating. + +# Proposal: Rule Block Evaluation + +Named rule, type, and conditional blocks are enhancements to the current ruleset language that give end rule authors and interpreters the ability to define rules once as reuse specific checks across rules. This allows users to compose complex rules easily in a way that they cannot be misinterpreted like in the previous Guard language. + +Doing so also allows for specific rule identifiers to be present in the output, allowing evaluators to map passes/failures of specific rules to named rules in the ruleset. + +## Proposed Language Additions + +The language enhancements are made to more fit a [Conjunctive Normal Form](https://en.wikipedia.org/wiki/Conjunctive_normal_form), a fancy way to say that the language is a set of logical ANDs across a set of logical ORs clauses. E.g. (A and B and C), where C = (D or F). Here is example of the language enhancements that demonstrates the proposed features. + +``` +rule example_rule when stage == 'prod' { + let ec2_instance_types := [/^t*/, /^m*/] # scoped variable assignments + + # clause can reference another rule for composition + dependent_rule # named rule reference + + # IN (disjunction, one of them) + AWS::EC2::Instance InstanceType IN %ec2_instance_types + + # Block groups for evaluating groups of clauses together. + # The "type" "AWS::EC2::Instance" is static + # type information that help validate if access query inside the block is + # valid or invalid + AWS::EC2::Instance { # Either an EBS volume + Properties.BlockDeviceMappings[*].Ebs EXISTS + Properties.BlockDeviceMappings[*].DeviceName == /^\/dev\/ebs-/ # must have ebs in the name + Properties.BlockDeviceMappings[*].Ebs.Encrypted == true # Ebs volume must be encrypted + Properties.BlockDeviceMappings[*].Ebs.DeleteOnTermination == true # Ebs volume must have deletion protection + } or + AWS::EC2::Instance { # OR a regular volume (disjunction) + Properties.BlockDeviceMappings[*].device_name == /^\/dev\/sdc-\d/ # all other local must have sdc + } +} + +rule dependent_rule { ... } +``` + +## Rule Blocks and Type Blocks in detail + +Below are definitions of the proposed features and examples of how rulesets can be converted to use the new syntax. + +### **Type Blocks** + +Type Blocks reduce the verbosity in specifying individual clauses for the same type in a succinct manner. They use the below format: + +
+[type-name] [when conditions] { + clauses + assignments +} ++ +Any clauses in the type block without a resource type specified check objects rooted at the same level where the resource's type is defined, for example an AWS::S3::Bucket rule block and evaluating the below template: + +```yaml +Resources: + MyBucket: + Type: AWS::S3::Bucket + Properties: + BucketName: MyTestBucket + DeletionPolicy: Retain + Metadata: + IsProd: true +``` +The following properties would be available to the type block without further specification of resource type: + +```json +{ + "Type": "AWS::S3::Bucket", + "Properties": { + "BucketName": "MyTestBucket" + }, + "DeletionPolicy": "Retain", + "Metadata": { + "IsProd": true + } +} +``` + +This is a divergence from the initial implementation, as users would have to specify "Properties" in a path before checking properties, but this allows for a more standardized way to check resource properties; authors no longer need to use the dot prefix to access keys on this level like DeletionPolicy and Metadata. + +E.g. taking a modified example from verbosity issue we described above +``` +AWS::ApiGateway::Method AuthorizationType == NONE +AWS::ApiGateway::Method Integration == {"IntegrationHttpMethod":"POST","Type":"AWS_PROXY","Uri":"arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${LambdaWAFBadBotParserFunction.Arn}/invocations"} +AWS::ApiGateway::Method ResourceId == ApiGatewayBadBot.RootResourceId |OR| AWS::ApiGateway::Method ResourceId == ApiGatewayBadBotResource +AWS::ApiGateway::Method .DeletionPolicy == Retain +AWS::ApiGateway::Method .Metadata.CreatedBy == SecureCFNGenerationTool +``` + +The ruleset above converted to use a type block: +``` +AWS::ApiGateway::Method { + Properties.AuthorizationType == "NONE" + Properties.Integration == {"IntegrationHttpMethod":"POST","Type":"AWS_PROXY","Uri":"arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${LambdaWAFBadBotParserFunction.Arn}/invocations"} + Properties.ResourceId == "ApiGatewayBadBot.RootResourceId" or + Properties.ResourceId == "ApiGatewayBadBotResource" + DeletionPolicy == "Retain" + Metadata.CreatedBy == "SecureCFNGenerationTool" +} +``` +Customers can continue to use the simplicity of expression from the previous language without the repetition of the type name. + +### **Named-Rules** + +A named rule is a collection of clauses in the CNF form. Named rules allow for expressing powerful combinations to provide more complex evaluation than primitive rules. Named rules allow for re-use and improved composition and remove verbosity and repetition. This also allows rule evaluators to concisely identify passing/failing rules in the evaluation output. Named rules take the following form: + +
+[rule rule_name] [when conditions] { + clauses + type-blocks + assignments +} ++ +Here, _rule_ keyword designates the start of a named rule block. The keyword is followed by the *rule_name* that is a human readable name. Rule names are optional but provide context in output as to what exactly failed. When evaluating the rules file, the *rule_name* is displayed along with with status for the evaluation PASS. FAIL or SKIP. The rule name can be followed by optional conditions (_When_ guards) that act as a guard to determine if the rule is application for evaluation or must be skipped, a.k.a conditionally evaluated (akin to WHEN CHECK in the previous language). + + +The block contains a set of clauses in Conjunctive Normal Form. To simplify authoring clauses and provide a consistent interpretation model, the following rules apply: +1. each clause present on its own newline provides an implicit AND in CNF notation. +2. Any clause joined with an "or" keyword represents a disjunction or OR clause with the next one. + +As an example + +
+rule example { + + clause1 + clause2 + + clause3 OR + clause4 + + clause5 OR clause6 +} ++ +represents ```clause1 AND clause2 AND (clause3 or clause4) AND (clause5 OR clause6)``` + + + +Here is an example of a named rule: + +``` +rule s3_encrypted_buckets { + AWS::S3::Bucket { + Properties.BucketName == /Encrypted/ + Properties.BucketEncryption != null + } +} + +rule s3_with_kms { + s3_encrypted_buckets # reusing above rule since encryption applies here as well + AWS::S3::Bucket { + let algo := Properties.BucketEncryption.ServerSideEncryptionConfiguration.*.ServerSideEncryptionByDefault + %algo.SSEAlgorithm == "aws:kms" + %algo.KMSMasterKeyID in [/kms-xxx/, /kms-yyy/] + } +} + +``` + +This allows for common rules to be written once and re-used in other rules. In the above example `s3_encrypted_buckets` is defined once and can be re-used in `s3_with_kms` and other rules. + +Complex rules expressions like the following `((r1 or r2) and (r3 or r4)) or (r6 and (r7 or (r8 and r9)) and (r10 and (r11 or r12)))` form is hard for the reader/auditor to even understand and breakdown the evaluation expressed. The intention with the named rules is to eliminate the parenthesis and allow reuse to simplify. + +### Advantages of Named Rules and Type Blocks + +Named-rules and type-blocks also address the problem of “*Insufficient modelling capability”.* Here is an example with type blocks use for the `AWS::EC2::Instance` that requires a comparison of the form `(a and b and c) or (d and f)`. Let us take an example where customers need to test either **all** volumes are EBS and have certain features turned on, or they are all local and have a common mount name. + +``` +AWS::EC2::Instance { # Either an EBS volume + Properties.BlockDeviceMappings.*.Ebs != null # Ebs is setup + Properties.BlockDeviceMappings.*.device_name == /^\/dev\/ebs-/ # must have ebs in the name + Properties.BlockDeviceMappings.*.Ebs.encrypted == true # Ebs volume must be encrypted + Properties.BlockDeviceMappings.*.Ebs.delete_on_termination == true # Ebs volume must have delete protection +} OR +AWS::EC2::Instance { # OR a regular volume (disjunction) + Properties.BlockDeviceMappings.*.device_name == /^\/dev\/sdc-\d/ # all other local must have sdc +} +``` + +### Enhanced outputs with named rules + +Because rules can be explicitly named in rulesets, we can also provide this to customers in the rule evaluation output, with an explicit PASS/FAIL/SKIP (When a "when" block is not satisfied) next to each rule's name. Evaluation will output an overall status of a ruleset against a given template, along with evaluation status of all named rules and their dependent clauses as well. This gives customers a large amount of visibility into which exact rule/clause failed for easier debugging. Console output would be of the form: + +``` +Ruleset RULESET_FILE_NAME: Overall Status PASSED +/named_rule_one: PASSED +/named_rule_one/AWS::S3::Bucket/Clause#linenumber: PASSED +/AWS::S3::Bucket/Clause#linenumber: FAILED (additional info on failures) +``` + +With such a structured format for output, this will allow the tool to handle different output formats like JSON in addition to the console output for easier parsing of rule evaluation results. + +## Drawbacks/Open Questions + +As mentioned above, this approach would change the root at which type clauses are evaluated. Instead of the properties section of a given resource type, the properties available for evaluation are from the object containing properties, e.g. + +```json +{ + "Type": "AWS::S3::Bucket", + "Properties": { + "BucketName": "MyBucket" + }, + "DeletionPolicy": "Retain" +} +``` + +instead of + +```json +{ + "BucketName": "MyBucket" +} +``` + +the change would require checks on resource properties to be prefixed with "Properties." to maintain parity. Requesting more feedback on what users prefer. The change makes certain operations like accessing deletion policy less ambiguous and does away with the [relative operator](https://github.com/aws-cloudformation/cloudformation-guard/tree/master/cfn-guard#checking-resource-properties-and-attributes) used to check deletion policy and resource attributes. We feel this makes the language less ambiguous in an exchange for more explicitly specifying property paths. + +An alternative considered is to allow for nested blocks inside of type blocks to allow for easier checking, so a block like: +``` +AWS::EC2::Instance { + Properties.BlockDeviceMappings.*.Ebs != null + Properties.BlockDeviceMappings.*.device_name == /^\/dev\/ebs-/ + Properties.BlockDeviceMappings.*.Ebs.encrypted == true + Properties.BlockDeviceMappings.*.Ebs.delete_on_termination == true +} +``` + +is equivalent to: +``` +AWS::EC2::Instance { + Properties { + BlockDeviceMappings.*.Ebs != null + BlockDeviceMappings.*.device_name == /^\/dev\/ebs-/ + BlockDeviceMappings.*.Ebs.encrypted == true + BlockDeviceMappings.*.Ebs.delete_on_termination == true + } +} +``` + +Would like to see the preference of users on this specific point as to what is easiest to use.