Intermittent errors creating aws-native:ec2:SubnetRouteTableAssociation #1186
Comments
As per pulumi/pulumi-aws-native#1186, we're seeing intermittent errors in the creation of `aws-native:ec2:SubnetRouteTableAssociation`. This change enables retries for the affected tests to reduce the noise until we have an upstream fix. Fixes #96
I have been looking at this a little bit; it appears notoriously difficult to reproduce locally. I have boiled down the relevant resource set from the CDK examples to something minimal, as follows:

import * as aws from "@pulumi/aws-native";
const vpc = new aws.ec2.Vpc("my-vpc", {
    cidrBlock: "10.0.0.0/16",
    enableDnsHostnames: true,
    enableDnsSupport: true,
    instanceTenancy: "default",
    tags: [
        {
            key: "Name",
            value: "fargatestack/MyVpc"
        }
    ]
});
const mySubnet = new aws.ec2.Subnet("my-subnet", {
    availabilityZone: "us-west-2a",
    cidrBlock: "10.0.0.0/18",
    mapPublicIpOnLaunch: true,
    tags: [
        {
            key: "aws-cdk:subnet-name",
            value: "Public"
        },
        {
            key: "aws-cdk:subnet-type",
            value: "Public"
        },
        {
            key: "Name",
            value: "fargatestack/MyVpc/PublicSubnet1"
        }
    ],
    vpcId: vpc.id,
});
const myRT = new aws.ec2.RouteTable("my-rt", {
    vpcId: vpc.id,
    tags: [
        {
            key: "Name",
            value: "fargatestack/MyVpc/MyRouteTable"
        }
    ]
});
const myRTA = new aws.ec2.SubnetRouteTableAssociation("my-rta", {
    routeTableId: myRT.id,
    subnetId: mySubnet.id,
});
export const routeTableID = myRTA.id;

Unfortunately, standing this up and tearing it down does not quite reproduce the issue.
Linking some more context in. Control reaches this code:
We have been staring at this with @flostadler. The code seems to be written well. What is happening is that the awaiter
The message is
Something that feels suspect is that the CDK workflow is scheduled with
https://github.com/pulumi/pulumi-cdk/blob/main/.github/workflows/main.yml#L89 It seems not entirely impossible that this would race with the aws-account-cleanup lambda, which is scheduled to run every 12 hours: https://github.com/pulumi/aws-account-cleanup/blob/master/pkg/cleanvpc/cleanvpc.go#L208 Specifically, there is no code to clean up RTAs, but there is code to clean up RTs.
We could try tagging resources with |
Based on standup: Anton to search in CloudTrail by resource name and see what that turns up.
We looked at CloudControl events, and they seem consistent with an eventual consistency issue. Adding aws-cloudformation/cloudformation-coverage-roadmap#2178
Introduces a retry for NotFound errors from GetResource executed right after Create. Based on the logs from our CI runs we suspect eventual consistency in AWS may cause some resources to fail with NotFound in Get even after WaitForResourceOpCompletion succeeded. Relates: #1186
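To illustrate the idea (not the provider's actual implementation, which is written in Go), here is a minimal TypeScript sketch of retrying a read only when it fails with a not-found error. The names `backoffDelays`, `getWithRetry`, and `isNotFound`, the backoff values, and the check on the error name `ResourceNotFoundException` are all assumptions made for this example.

```typescript
// Hypothetical sketch: retry a "get" that may fail with NotFound right after
// a successful create, as can happen under eventual consistency.

// Exponential backoff schedule: base, 2*base, 4*base, ... capped at maxMs.
function backoffDelays(retries: number, baseMs: number, maxMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(baseMs * 2 ** i, maxMs));
  }
  return delays;
}

// Assumed classification of a not-found error; the real provider may differ.
const defaultIsNotFound = (err: unknown): boolean =>
  err instanceof Error && err.name === "ResourceNotFoundException";

// Real sleep used by default; tests can inject a fake to avoid waiting.
const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `get` once, then retries only on not-found errors, one retry per delay.
async function getWithRetry<T>(
  get: () => Promise<T>,
  delays: number[],
  isNotFound: (err: unknown) => boolean = defaultIsNotFound,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await get();
    } catch (err) {
      // Other errors, or an exhausted retry budget, are surfaced unchanged.
      if (!isNotFound(err) || attempt >= delays.length) {
        throw err;
      }
      await sleep(delays[attempt]);
    }
  }
}
```

Restricting the retry to not-found errors keeps genuine failures (permissions, validation) failing fast, while the capped backoff bounds how long a create can stall if the resource really is missing.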
I expect this to be fixed by #1809, which introduces the retry, but I have not been able to reproduce it outside of pulumi-cdk CI. I will close this for now, and we can reopen as needed.
It looks like what finally solved it was moving the tests to another region. @corymhall suspected interference with the pulumi-eks tests.
What happened?
Pulumi-cdk has been experiencing flaky test runs (pulumi/pulumi-cdk#96) due to the following error:
This error is encountered here: https://github.com/pulumi/pulumi-aws-native/blob/master/provider/pkg/provider/provider.go#L803 when Cloud Control fails to find the resource it has just created.
We'll need to work this down to a repro case to send to the CC API maintainers; we can also look into a workaround in aws-native (e.g. adding our own retries for this particular case).
Example
https://github.com/pulumi/pulumi-cdk/tree/main/examples/alb is our best repro case at the moment.
Output of `pulumi about`
N/A
Additional context
No response