Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to access breaking relation data during deferred/custom ops event processed in relation broken juju event #1279

Open
carlcsaposs-canonical opened this issue Jul 2, 2024 · 2 comments
Assignees
Labels
25.04 bug Something isn't working

Comments

@carlcsaposs-canonical
Copy link
Contributor

ops 2.10 introduced a breaking change that excludes breaking relations from Model.relations: #1091

In order to access relation data for the breaking relation, we need to use event.relation—but this doesn't work in deferred ops events, custom ops events, or collect status ops events.

With ops, it is no longer possible to access the breaking relation data in those ops events during the relation-broken Juju event

Example of impact:
mysql-router-k8s is currently written with no deferred events, no custom events during the relation-broken event, and no ops collect status events (because of limitations with status prioritization)—so it is not currently affected
However, if mysql-router-k8s were to use deferred events in the future, or one of the charm libs it depends on were to use a deferred event or a custom event during relation-broken, it would be affected as following:

  1. Juju agent executes charm with relation-broken event
  2. ops emits deferred or custom event
  3. This block of code is executed: https://github.com/canonical/mysql-router-k8s-operator/blob/156da2da8c68c65ae262b2721cc01fb1b974104f/src/abstract_charm.py#L315-L345
            if self._unit_lifecycle.authorized_leader:
                if self._database_requires.is_relation_breaking(event):
                    if self._upgrade.in_progress:
                        logger.warning(
                            "Modifying relations during an upgrade is not supported. The charm may be in a broken, unrecoverable state. Re-deploy the charm"
                        )
                    self._database_provides.delete_all_databags()
                elif (
                    not self._upgrade.in_progress
                    and isinstance(workload_, workload.AuthenticatedWorkload)
                    and workload_.container_ready
                ):
                    self._reconcile_node_port(event=event)
                    self._database_provides.reconcile_users(
                        event=event,
                        router_read_write_endpoint=self._read_write_endpoint,
                        router_read_only_endpoint=self._read_only_endpoint,
                        exposed_read_write_endpoint=self._exposed_read_write_endpoint,
                        exposed_read_only_endpoint=self._exposed_read_only_endpoint,
                        shell=workload_.shell,
                    )
            if workload_.container_ready:
                workload_.reconcile(
                    event=event,
                    tls=self._tls_certificate_saved,
                    unit_name=self.unit.name,
                    exporter_config=self._cos_exporter_config(event),
                    key=self._tls_key,
                    certificate=self._tls_certificate,
                    certificate_authority=self._tls_certificate_authority,
                )

if self._database_requires.is_relation_breaking(event): evaluates to if False and is skipped

                elif (
                    not self._upgrade.in_progress
                    and isinstance(workload_, workload.AuthenticatedWorkload)
                    and workload_.container_ready
                ):

evaluates to elif False because relation data for database_requires is missing (trace: workload_ is not AuthenticatedWorkload because relation is missing)

workload_.reconcile() disables MySQL Router because workload_ is not AuthenticatedWorkload because relation is missing

Current state: MySQL Router is disabled, but relation data in database_provides relations is still populated—mysql-router-k8s charm is giving consuming charms an endpoint and telling them that it should be accessible when Router is disabled & unreachable—this causes a poor UX (consuming applications will probably display incorrect status message) and could cause issues with exponential backoffs (i.e. a delay in service restoration) when MySQL Router is re-enabled
4. ops emits relation broken event
5. If charm encounters issue (uncaught exception in charm code, transient network failure with MySQL Server, etc.) it will fail to execute this code

                if self._database_requires.is_relation_breaking(event):
                    if self._upgrade.in_progress:
                        logger.warning(
                            "Modifying relations during an upgrade is not supported. The charm may be in a broken, unrecoverable state. Re-deploy the charm"
                        )
                    self._database_provides.delete_all_databags()

potentially for an extended period of time—which causes the aforementioned issues with UX & exponential backoff

@carlcsaposs-canonical
Copy link
Contributor Author

for comparison, before this change, the order would've been self._database_provides.delete_all_databags(), then disable the workload

(instead of disable the workload, then self._database_provides.delete_all_databags())

@benhoyt
Copy link
Collaborator

benhoyt commented Sep 27, 2024

Tony is unlikely to get to this this cycle, but we still want to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
25.04 bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants