Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Relation.active, exclude inactive relations from Model.relations #1091

Merged
merged 21 commits into from
Jan 12, 2024

Conversation

tonyandrewmeyer
Copy link
Contributor

@tonyandrewmeyer tonyandrewmeyer commented Dec 14, 2023

This PR allows Charmers to easily check whether a relation is broken during a relation-broken event (and also framework events that share the same Juju context) without having to store state or pass the event object around. In addition, the broken relation is excluded from the Model.relations list, to make it easy for charms to work on only the relations that will continue to exist after the relation-broken events complete (without having to do (rel for rel in self.model.relations if rel.active) themselves.

To do this:

  • In relation-broken events, set a new attribute active on the event's Relation object to False (and have this be True in all other cases).
  • In relation-broken events, RelationMapping __getitem__ excludes the event's relation.

Builds on work from @ca-scribner in #958.

Additional background notes (limited to Canonical).

Fixes #940

@tonyandrewmeyer
Copy link
Contributor Author

tonyandrewmeyer commented Dec 14, 2023

I found 25 Charms (in the 150+ Canonical charms list we have collated) that have relation-broken events. 4 of those 25 use Model.relations in a relation-broken handler, and all should continue to work with (and be better for) this change. 3 of the 25 charms have an existing form of "relation is active" so could have simplified code after this change.

Details are in the doc linked in the description.

@tonyandrewmeyer
Copy link
Contributor Author

Among the use cases this should address:

  • Collect status - to put the unit/app into a Blocked state when a required relation is broken (example) - this can just do if not self.model.relations[name] and it will cover both the relation being broken and the relation not existing.
  • Avoiding setting relation data during relation broken (example) - this can just do and event.active
  • Update configuration to exclude the broken relation without passing around the event (example) - this should also be covered by relations excluding the broken relation. The only open question is the idea mentioned in that issue that an event after relation-broken could still list the broken relation.
  • Know which properties can be accessed (example) - covered by .active.

@tonyandrewmeyer tonyandrewmeyer marked this pull request as ready for review December 14, 2023 10:09
Copy link
Collaborator

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions and some minor comments.

CHANGES.md Outdated Show resolved Hide resolved
ops/model.py Outdated Show resolved Hide resolved
ops/model.py Outdated Show resolved Hide resolved
ops/testing.py Show resolved Hide resolved
test/test_model.py Outdated Show resolved Hide resolved
ops/model.py Show resolved Hide resolved
@benhoyt
Copy link
Collaborator

benhoyt commented Jan 2, 2024

3 of the 25 charms have an existing form of "relation is active" so could have simplified code after this change.

@tonyandrewmeyer Do you happen to have links to the code where these 3 charms do this? I'd be curious to see examples of that and how people are doing it now.

@benhoyt
Copy link
Collaborator

benhoyt commented Jan 2, 2024

Oh wait, I see you've linked to a couple of them in your follow-up comment above -- that's fine, thanks.

ops/model.py Show resolved Hide resolved
Copy link
Collaborator

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! Couple of very minor optional comments.

ops/model.py Outdated Show resolved Hide resolved
test/charms/test_main/src/charm.py Outdated Show resolved Hide resolved
@benhoyt benhoyt merged commit cc29ea3 into canonical:main Jan 12, 2024
17 checks passed
@carlcsaposs-canonical
Copy link
Contributor

@tonyandrewmeyer @benhoyt I think I might be missing something—how do charms access the relations that are breaking if they are excluded from Model.relations?

mysql-router relies on being able to access relations that are breaking so that it can properly clean up database users: https://github.com/canonical/mysql-router-k8s-operator/blob/d89b0f820cfc574b04e0b86b5412db9da0d7ff2e/src/relations/database_provides.py#L200-L208

More context: #940 (comment)

@tonyandrewmeyer tonyandrewmeyer deleted the relation-broken-940 branch January 12, 2024 08:34
@tonyandrewmeyer
Copy link
Contributor Author

tonyandrewmeyer commented Jan 12, 2024

@tonyandrewmeyer @benhoyt I think I might be missing something—how do charms access the relations that are breaking if they are excluded from Model.relations?

mysql-router relies on being able to access relations that are breaking so that it can properly clean up database users: https://github.com/canonical/mysql-router-k8s-operator/blob/d89b0f820cfc574b04e0b86b5412db9da0d7ff2e/src/relations/database_provides.py#L200-L208

More context: #940 (comment)

If you're in the relation-broken event, then event.relation has the relation.

For the case above, you're basically doing the same thing when you raise the _RelationBreaking exception (checking if it's relation-broken and if the ID matches the one in the event). So it looks to me like the behaviour would not change - at the moment you raise that exception and the relation doesn't get appended to the list, and with the newer version of ops you'd not actually have that relation in the loop (so it would still not get appended to the list) - and you'd be able to remove the custom "is this relation breaking" code that you have, since ops will be doing that for you.

Am I missing something?

(For what it's worth, I think that the Juju folk would say that relation-departed is probably where the clean-up ought to happen, rather than relation-broken).

@carlcsaposs-canonical
Copy link
Contributor

If a relation is breaking, we expect it to be in self._shared_users but not requested_users (which will cause it to get deleted here https://github.com/canonical/mysql-router-k8s-operator/blob/d89b0f820cfc574b04e0b86b5412db9da0d7ff2e/src/relations/database_provides.py#L223)

This change in ops means that the relation won't be in self._shared_users

@carlcsaposs-canonical
Copy link
Contributor

(For what it's worth, I think that the Juju folk would say that relation-departed is probably where the clean-up ought to happen, rather than relation-broken).

During relation-departed, we don't know if the relation is breaking or the unit is shutting down

In the first case, we need to remove the user. In the second case, we need to not remove the user. More details: #940 (comment)

@tonyandrewmeyer
Copy link
Contributor Author

tonyandrewmeyer commented Jan 12, 2024

This change in ops means that the relation won't be in self._shared_users

Ah, that's the part I had missed, thanks!

We (mostly I) reviewed a lot of .relations use across 150+ charms in working on this, focusing on relation-broken but cases like this where it's difficult to know that .relations is being used from relation-broken could be (and in this case were) missed. When reviewing I would have got to the reconcile_users method but missed that _shared_users was a property that uses .relations as well.

I did run the tests (whichever ones a plain tox runs) in several charms (even some that don't handle running "tox" out of the box, like here), including this one, but I guess the tests don't cover this specific scenario, unfortunately (maybe there are integration tests or some others that a plain tox doesn't handle that do?).

We did know that this is backwards incompatible, which is why we tried to find cases where it might break charms, not just assume it would be ok. Obviously we (again, mostly I) didn't manage to do that well enough (and particularly given that you had commented in the original ticket: sorry - I did read that ticket many many times but it's tough to follow).

It would be easy enough to change the charm to work with the ops change - e.g. have event.relation.delete_user(shell=shell) in a relation-broken handler, or have something like:

if not event.relation.active:
    relation.delete_user(shell=shell)

In the reconcile_users method. (And you'd need to pass the event to delete_all_databags and handle it there too).

If you want all the relations, even the broken one, then as long as you have the event object, you can get it and append it to the list the model provides.

Note that the intention is that Juju also makes this change, and that ops was just 'leading the way'.

@benhoyt is finished for the week, but I'll talk this over with him on Monday to decide whether this is too much of a change to make after all, and we should limit this to just the .active attribute.

Thanks for bringing this up (again), especially before the 2.10 release!

@carlcsaposs-canonical
Copy link
Contributor

It would be easy enough to change the charm to work with the ops change [...] in a relation-departed handler

I don't think this would work

or have something like [...] In the reconcile_users method. (And you'd need to pass the event to delete_all_databags and handle it there too).

then as long as you have the event object, you can get it and append it to the list

These would work, although it somewhat defeats the purpose of not needing to pass the event around

@carlcsaposs-canonical
Copy link
Contributor

Note that the intention is that Juju also makes this change, and that ops was just 'leading the way'.

What exactly does this mean? Will we no longer be able to access local (not remote) databags during relation broken events?

@tonyandrewmeyer
Copy link
Contributor Author

Note that the intention is that Juju also makes this change, and that ops was just 'leading the way'.

What exactly does this mean? Will we no longer be able to access local (not remote) databags during relation broken events?

It means that Juju would behave like current main ops does: relation-ids would not return the event that's broken in the relation-broken hook. Whether or not relation data is accessible is a separate concern.

@tonyandrewmeyer
Copy link
Contributor Author

It would be easy enough to change the charm to work with the ops change [...] in a relation-departed handler

I don't think this would work

Sorry, that was a typo (fixed now) - I meant in a relation-broken handler.

@carlcsaposs-canonical
Copy link
Contributor

@tonyandrewmeyer does this potentially affect the keys in self.model.relations (e.g. if there was only one relation on an endpoint, would that endpoint still show up in the keys if we were in the relation broken event for that relation)?

wondering if we need to update this code: https://github.com/canonical/mysql-router-k8s-operator/blob/ecb17ddf94af9f8b4bba498bae19e9dab9cdfd8f/src/lifecycle.py#L57

@carlcsaposs-canonical
Copy link
Contributor

@tonyandrewmeyer does this potentially affect the keys in self.model.relations (e.g. if there was only one relation on an endpoint, would that endpoint still show up in the keys if we were in the relation broken event for that relation)?

wondering if we need to update this code: https://github.com/canonical/mysql-router-k8s-operator/blob/ecb17ddf94af9f8b4bba498bae19e9dab9cdfd8f/src/lifecycle.py#L57

from testing, looks like keys are populated even if no relations exist on endpoint

@carlcsaposs-canonical
Copy link
Contributor

carlcsaposs-canonical commented Jun 26, 2024

then as long as you have the event object, you can get it and append it to the list the model provides.

@tonyandrewmeyer does this work for collect status ops events or deferred ops events in the relation broken juju event? from a glance at the PR, it looks like we won't be able to access the relation data for the breaking relation during other ops events in the same juju event

carlcsaposs-canonical added a commit to canonical/mysql-router-k8s-operator that referenced this pull request Jun 26, 2024
Workaround for breaking change in ops 2.10: canonical/operator#1091 (comment)

Workaround may break during deferred ops events or collect status ops events (canonical/operator#1091 (comment))
@tonyandrewmeyer
Copy link
Contributor Author

Sorry, ran out of time for this today, but will check tomorrow (NZ time) morning. I also need to double check we're consistent now that the Juju fix has landed, so the same behaviour happens regardless of Juju version.

@tonyandrewmeyer
Copy link
Contributor Author

Sorry, turns out that should have been "not NZ time" 😞.

does this potentially affect the keys in self.model.relations (e.g. if there was only one relation on an endpoint, would that endpoint still show up in the keys if we were in the relation broken event for that relation)?

It will still be there, the self.model.relations keys are built based on the metadata, not on what Juju says. Even if you haven't ever done integrate, the relation will be in the keys (with an empty value).

wondering if we need to update this code: https://github.com/canonical/mysql-router-k8s-operator/blob/ecb17ddf94af9f8b4bba498bae19e9dab9cdfd8f/src/lifecycle.py#L57

If I'm understanding that correctly, I think it would still be fine.

@tonyandrewmeyer does this work for collect status ops events or deferred ops events in the relation broken juju event? from a glance at the PR, it looks like we won't be able to access the relation data for the breaking relation during other ops events in the same juju event

If the broken event is deferred then the data will be gone, since Juju doesn't know anything about the deferring.

If it's some previously deferred event, or the lifecycle/ops events like collect-status, then they should all see the same behaviour, because we set the broken relation ID based on the environment and that doesn't actually change for the other events.

This does indeed mean that you can't get the data in e.g. a previously deferred event, because it's missing from relations[rel_name]. It seems somewhat correct in the case of the lifecycle/ops events, but indeed not for the deferred one. I missed this when originally working on this 😞.

It's not immediately obvious to me how to fix this - the model gets reused for the deferral, and I don't think we would want to change that, and the model knows nothing about what event(s) are being handled.

Could you open an issue for this? If you can describe where it's causing issues that will help prioritise fixing it and provide context (it might be that someone else picks it up rather than me, although I can try to get to it soon).

I still have to try out the new Juju behaviour to make sure we're consistent there: will try to get to that today still, but it might end up being Monday (Friday is a holiday here).

@carlcsaposs-canonical
Copy link
Contributor

Could you open an issue for this? If you can describe where it's causing issues that will help prioritise fixing it and provide context (it might be that someone else picks it up rather than me, although I can try to get to it soon).

Opened issue here: #1279

mysql-router-k8s is not currently affected, since it uses no deferred events, no custom events during relation-broken, and no ops collect status events—but it's certainly possible in the future that the charm or a lib that we depend on could add a deferred event or custom event during relation-broken

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Relation.broken to allow querying if a relation is broken (breaking)
4 participants