HCP Cluster Resource Deletion Cascading Subscription Delete #920

Open
wants to merge 7 commits into
base: main


mbarnes
Collaborator

@mbarnes mbarnes commented Dec 3, 2024

What this PR does

When a subscription state changes to "Deleted", the RP now triggers a deletion of all HCP clusters under the subscription as per the Resource Provider Contract.

Behind the scenes, this introduces the concept of "implicit" and "explicit" async operations:

  • An "implicit" async operation has an "Operation" item in Cosmos DB, but no status endpoint for ARM to poll.
  • An "explicit" async operation starts as an "implicit" operation. The Frontend.ExposeOperation method enriches the "Operation" item with information necessary to make the status endpoint accessible to ARM, and adds appropriate async headers to an http.ResponseWriter.

Importantly, the backend pod does not distinguish between implicit and explicit async operations. At the moment, "implicit" async operations are used only for deletions, and their sole purpose is to let the backend delete the "Resource" item in Cosmos DB after the actual resource is deleted.
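
To make the distinction concrete, here is a minimal sketch in Go. It is not the RP's actual schema: `ExternalID`, `InternalID` and `OperationID` are named in this PR, but the `Request` and `Status` fields, the `IsExplicit` helper, and `completeOperation` are assumptions made for illustration, with a plain map standing in for the "Resource" container in Cosmos DB.

```go
package main

// OperationDocument is a simplified stand-in for the "Operation" item in
// Cosmos DB. Request and Status are assumed fields for this sketch.
type OperationDocument struct {
	ExternalID  string // ARM resource ID the operation acts on
	InternalID  string // backend-side identifier (assumed meaning)
	OperationID string // status endpoint ID; empty while the operation is implicit
	Request     string // assumed values: "Create", "Update", "Delete"
	Status      string
}

// IsExplicit reports whether ARM has a status endpoint to poll.
func (d OperationDocument) IsExplicit() bool {
	return d.OperationID != ""
}

// completeOperation is a hypothetical backend step that runs once the
// underlying work has finished. It never consults IsExplicit: implicit and
// explicit operations take exactly the same path.
func completeOperation(op *OperationDocument, resources map[string]bool) {
	op.Status = "Succeeded"
	if op.Request == "Delete" {
		// The actual resource is gone, so drop its "Resource" item too.
		delete(resources, op.ExternalID)
	}
}
```

The point of the sketch is that "explicit" is only a property of the document (a populated `OperationID`), not a separate code path in the backend.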

Jira: ARO-13321 - Implement Cascading Subscription Deletion

Special notes for your reviewer

  • This duplicates a few database iterator commits from #883, which is still outstanding.
  • I'll add unit tests for this in a follow-up PR after I convert our existing unit tests to use gomock for Cosmos DB operations. To add unit tests now would just create extra work for myself in the conversion effort.
  • I still need to document asynchronous operation mechanics in general and this "implicit" vs "explicit" concept will be part of it. I've been holding off on writing documentation until I can make some (imo) necessary changes to our database design. This is just to say I haven't forgotten about it.


github-actions bot commented Jan 8, 2025

Please rebase pull request.

Matthew Barnes added 7 commits January 8, 2025 10:49
Add "externalID" and "internalID" parameters so the returned
document is a minimum valid OperationDocument for writing.

The operation item must now be created in the database prior to
calling ExposeOperation. ExposeOperation does all its processing
in a database update callback.

This is because there is an increasing number of cases where we
create an implicit async operation with no visible status endpoint.
Calling ExposeOperation makes an implicit async operation explicit,
with a status endpoint for ARM to poll. Hence the rename.

The tradeoff is explicit asynchronous operations now require two
database operations (create and update) but it helps make the RP
logic cleaner. This could possibly be mitigated in the future by
using Cosmos DB's transactional batch operations, but it's gonna
take some serious refactoring to get there.
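
The create-then-update flow described in this commit could look roughly like the sketch below. It uses an in-memory map rather than Cosmos DB, and the `OperationStore` type, its `Create`/`Get`/`Update` methods, and this `ExposeOperation` signature are all assumptions for illustration, not the code in this PR. The `Azure-AsyncOperation` header is the standard ARM async header; whether the RP sets additional headers is not shown here.

```go
package main

import (
	"errors"
	"net/http"
	"sync"
)

// OperationDocument is a simplified stand-in for the "Operation" item.
type OperationDocument struct {
	ExternalID  string
	InternalID  string
	OperationID string // empty while the operation is implicit
}

var ErrNotFound = errors.New("operation not found")

// OperationStore is a hypothetical in-memory replacement for the database.
type OperationStore struct {
	mu    sync.Mutex
	items map[string]OperationDocument
}

func NewOperationStore() *OperationStore {
	return &OperationStore{items: make(map[string]OperationDocument)}
}

// Create is the first database operation: it writes the implicit operation.
func (s *OperationStore) Create(key string, doc OperationDocument) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[key] = doc
}

func (s *OperationStore) Get(key string) (OperationDocument, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	doc, ok := s.items[key]
	return doc, ok
}

// Update passes a copy of the item to callback and writes it back only if
// callback returns true.
func (s *OperationStore) Update(key string, callback func(*OperationDocument) bool) (bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	doc, ok := s.items[key]
	if !ok {
		return false, ErrNotFound
	}
	if !callback(&doc) {
		return false, nil
	}
	s.items[key] = doc
	return true, nil
}

// ExposeOperation is the second database operation: it makes an existing
// implicit operation explicit and sets the async header for ARM.
// In a handler, pass w.Header() as h.
func ExposeOperation(h http.Header, store *OperationStore, key, operationID string) error {
	_, err := store.Update(key, func(doc *OperationDocument) bool {
		if doc.OperationID == operationID {
			return false // already exposed; nothing to write
		}
		doc.OperationID = operationID
		return true
	})
	if err != nil {
		return err
	}
	h.Set("Azure-AsyncOperation", operationID)
	return nil
}
```

Because the item must already exist, calling `ExposeOperation` before `Create` fails with `ErrNotFound` in this sketch, which mirrors the ordering requirement stated above.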

CancelActiveOperation marks the status of any active operation on
the resource as canceled.
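
As a rough illustration of that behavior, the sketch below treats any operation whose status is not one of the ARM terminal states (Succeeded, Failed, Canceled) as active. The `Operation` type and the `cancelActiveOperations` function are hypothetical and operate on a slice rather than on database items.

```go
package main

import "strings"

// Operation is a pared-down, hypothetical operation record.
type Operation struct {
	ExternalID string // ARM resource ID the operation acts on
	Status     string
}

// isTerminal reports whether status is an ARM terminal provisioning state.
func isTerminal(status string) bool {
	switch status {
	case "Succeeded", "Failed", "Canceled":
		return true
	}
	return false
}

// cancelActiveOperations marks every non-terminal operation on resourceID as
// canceled and returns how many operations it changed. Resource IDs are
// compared case-insensitively, as ARM resource IDs are.
func cancelActiveOperations(ops []Operation, resourceID string) int {
	canceled := 0
	for i := range ops {
		if strings.EqualFold(ops[i].ExternalID, resourceID) && !isTerminal(ops[i].Status) {
			ops[i].Status = "Canceled"
			canceled++
		}
	}
	return canceled
}
```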

Will be reusing DeleteResource for subscription deletion.

Add database bookkeeping for the resource and any child resources.
This includes creating implicit operations for each resource being
deleted. The caller may then expose the returned operation ID.
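
The bookkeeping described here might be sketched as follows. Everything in the block is an assumption for illustration: the `Store` type, the "Deleting" state string, the `op-N` ID format, and the rule that a child resource is any resource whose ID begins with the parent's ID plus a slash.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Store is a hypothetical in-memory stand-in for the database.
type Store struct {
	Resources  map[string]string // resource ID -> provisioning state
	Operations map[string]string // operation ID -> resource ID being deleted
	nextID     int
}

// newImplicitDelete records an implicit delete operation for one resource.
func (s *Store) newImplicitDelete(resourceID string) string {
	s.nextID++
	opID := fmt.Sprintf("op-%d", s.nextID)
	s.Operations[opID] = resourceID
	s.Resources[resourceID] = "Deleting"
	return opID
}

// DeleteResource creates implicit delete operations for resourceID and each
// of its children, and returns the operation ID for resourceID itself so the
// caller can choose to expose it.
func (s *Store) DeleteResource(resourceID string) (string, bool) {
	if _, ok := s.Resources[resourceID]; !ok {
		return "", false
	}
	prefix := strings.ToLower(resourceID) + "/"
	var children []string
	for id := range s.Resources {
		if strings.HasPrefix(strings.ToLower(id), prefix) {
			children = append(children, id)
		}
	}
	sort.Strings(children) // deterministic order for the sketch
	for _, id := range children {
		s.newImplicitDelete(id)
	}
	return s.newImplicitDelete(resourceID), true
}
```

Only the parent's operation ID is returned. The child operations stay implicit, which is enough for the backend to clean up their database items.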

By my read of the Subscription Lifecycle API Reference [1], we
should favor 200 OK over 201 Created when creating or updating
a subscription.

[1]
https://github.com/cloud-and-ai-microsoft/resource-provider-contract/blob/master/v1.0/subscription-lifecycle-api-reference.md#response

Called when a subscription is deleted. The method is idempotent in
case of multiple subscription PUT requests.
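
One way to picture the idempotency, together with the 200 OK response from the previous commit, is the sketch below. The `RP` and `Cluster` types and both method names are hypothetical, and the "Deleted" and "Deleting" strings are assumed state values. A repeated PUT with state "Deleted" finds every cluster already marked and starts nothing new.

```go
package main

import "net/http"

// Cluster is a pared-down, hypothetical cluster record.
type Cluster struct {
	SubscriptionID string
	State          string
}

// RP is a hypothetical in-memory stand-in for the frontend's view of the database.
type RP struct {
	Subscriptions map[string]string   // subscription ID -> state
	Clusters      map[string]*Cluster // resource ID -> cluster
}

// deleteAllClusters starts deletion of every cluster in the subscription that
// is not already being deleted and returns how many deletions it started.
func (rp *RP) deleteAllClusters(subscriptionID string) int {
	started := 0
	for _, c := range rp.Clusters {
		if c.SubscriptionID == subscriptionID && c.State != "Deleting" {
			c.State = "Deleting"
			started++
		}
	}
	return started
}

// putSubscription records the subscription state and, for "Deleted", cascades
// to the subscription's clusters. It returns 200 OK for both create and update.
func (rp *RP) putSubscription(subscriptionID, state string) (status int, started int) {
	rp.Subscriptions[subscriptionID] = state
	if state == "Deleted" {
		started = rp.deleteAllClusters(subscriptionID)
	}
	return http.StatusOK, started
}
```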

Don't count on OperationID being set in OperationDocuments.
Implicit async operations will not have this field set. Get
the subscription ID from ExternalID instead.
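
A minimal sketch of deriving the subscription ID from an ARM resource ID string follows, assuming `ExternalID` has the usual `/subscriptions/{id}/...` shape. The function name is hypothetical and the parsing is plain string handling from the standard library; the RP itself may use a proper ARM resource ID parser instead.

```go
package main

import "strings"

// subscriptionIDFromExternalID extracts the subscription ID from an ARM
// resource ID of the form /subscriptions/{id}/... and reports whether one
// was found. The "subscriptions" segment is matched case-insensitively.
func subscriptionIDFromExternalID(externalID string) (string, bool) {
	parts := strings.Split(strings.Trim(externalID, "/"), "/")
	if len(parts) < 2 || !strings.EqualFold(parts[0], "subscriptions") || parts[1] == "" {
		return "", false
	}
	return parts[1], true
}
```

This works for implicit and explicit operations alike, since both carry an `ExternalID`.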