Make DB condition informational + Log sniffing #76

Merged
merged 1 commit into from
Jun 14, 2024

@rafidka rafidka commented Jun 14, 2024

Issue #, if available: N/A

Description of changes:

Previously, `AirflowDbReachableCondition` triggered a container restart whenever the database could not be reached. While that is the desired behaviour eventually, I am changing the condition to be informational only for now, i.e. it just reports the connection problem, for two reasons:

  1. This is the current behaviour in our internal images, and we want to limit how far we deviate from it.
  2. More importantly, the restart logic needs to be more intelligent to avoid unnecessary restarts, each of which incurs a considerable wait while the container is replaced.

The re-introduction of restarts will be tracked in Issue #75.
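
To illustrate the distinction between an informational condition and one that forces a restart, here is a minimal sketch. It is not the code in this PR: `DbReachableCondition`, `ConditionResult`, the `informational` flag, and `should_restart` are hypothetical names chosen for the example, and the real `AirflowDbReachableCondition` may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConditionResult:
    """Outcome of a single health check (hypothetical type)."""
    healthy: bool
    message: str


class DbReachableCondition:
    """Hypothetical stand-in for AirflowDbReachableCondition."""

    def __init__(self, probe: Callable[[], None], informational: bool = True):
        # `probe` is any callable that raises when the database is unreachable.
        self.probe = probe
        # When True, failures are only reported and never cause a restart.
        self.informational = informational

    def check(self) -> ConditionResult:
        try:
            self.probe()
        except Exception as exc:
            return ConditionResult(False, f"Database unreachable: {exc}")
        return ConditionResult(True, "Database reachable")


def should_restart(condition: DbReachableCondition, result: ConditionResult) -> bool:
    """Restart only for a failed check on a non-informational condition."""
    return not result.healthy and not condition.informational
```

With `informational=True`, a failing probe still produces an unhealthy result that can be reported, but `should_restart` returns `False`.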

This PR also re-introduces the generation of processable logs, so that DB health metrics can be captured in the MWAA service.
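
As a rough idea of what a "processable" log line could look like, the sketch below emits one machine-parseable record per health check. The `[HEALTH]` prefix, the field names, and `format_health_log` are all assumptions made for illustration; the actual format consumed by the MWAA service is not described in this PR.

```python
import json
import time
from typing import Optional


def format_health_log(metric: str, healthy: bool,
                      timestamp: Optional[float] = None) -> str:
    """Build a single-line, machine-parseable health record (assumed format)."""
    record = {
        "metric": metric,
        "healthy": healthy,
        # Default to the current time when no timestamp is supplied.
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    # A fixed prefix lets a log processor pick these lines out of normal output.
    return "[HEALTH] " + json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(format_health_log("db_reachable", False))
```

Keeping each record on one line with a fixed prefix means a downstream processor can filter and parse it without understanding the rest of the log stream.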

Finally, the PR also ports the log sniffing logic that detects common Airflow problems.
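
Log sniffing of this kind can be sketched as matching each log line against a table of known problem signatures. The pattern names and regular expressions below are hypothetical examples, not the ones ported in this PR.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical problem signatures; the real list in the PR may differ.
PATTERNS = {
    "db_connection_error": re.compile(
        r"OperationalError|could not connect to server", re.IGNORECASE
    ),
    "out_of_memory": re.compile(r"MemoryError|Killed process", re.IGNORECASE),
}


def sniff(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (problem_name, line) for each line matching a known signature."""
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                yield name, line.rstrip("\n")
                break  # Report at most one problem per line.


if __name__ == "__main__":
    sample = [
        "INFO - Scheduler heartbeat\n",
        "ERROR - sqlalchemy.exc.OperationalError: connection refused\n",
    ]
    for problem, text in sniff(sample):
        print(problem, "->", text)
```

Because `sniff` is a generator over any iterable of strings, the same function can run over a file, a subprocess output stream, or a list in a test.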


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@rafidka rafidka merged commit 13bc3e3 into aws:main Jun 14, 2024
1 check passed
2 participants