Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX (Extension) @W-17311922@ Fix pager duty alert subject #151

Merged
merged 1 commit into from
Dec 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 2 additions & 8 deletions .github/workflows/daily-smoke-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -38,16 +38,10 @@ jobs:
- name: Report problems
shell: bash
env:
IS_CRITICAL: ${{ contains(join(steps.*.outcome), 'failure') || contains(join(steps.*.outcome), 'skipped') }}
RUN_LINK: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
if [[ ${{ env.IS_CRITICAL }} == true ]]; then
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue seems to be the way we are assigning IS_CRITICAL. There are two ways to fix this: either assign it correctly or altogether remove this. I prefer removing this because I think the reason this was added was for testing. But this isn't greatly helpful since the if condition in line 36 will anyways needs to be altered to test the script. Let me know if you think otherwise.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed... but should we remove the || cancelled() from the if statement then on line 36? What does that do?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, there are technically ways that a job can end up in cancelled() due to something that would more accurately be described as a failure. But I can't remember what they are off the top of my head.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, let's err in the side of caution and leave cancelled() as is.

ALERT_SEV="critical"
ALERT_SUMMARY="Daily smoke test failed on ${{ runner.os }}"
else
ALERT_SEV="info"
ALERT_SUMMARY="Daily smoke test succeeded with retries on ${{ runner.os }}"
fi
ALERT_SEV="critical"
ALERT_SUMMARY="Daily smoke test failed on ${{ runner.os }}"

generate_post_data() {
cat <<EOF
Expand Down
14 changes: 2 additions & 12 deletions .github/workflows/production-heartbeat.yml
Original file line number Diff line number Diff line change
Expand Up @@ -118,23 +118,13 @@ jobs:
if: ${{ failure() || cancelled() }}
shell: bash
env:
# If we're here because steps failed or were skipped, then that's a critical problem. Otherwise it's a normal one.
# We can't use the `failure()` or `cancelled()` convenience methods outside of the `if` condition, hence the
# `contains()` calls.
IS_CRITICAL: ${{ contains(join(steps.*.outcome), 'failure') || contains(join(steps.*.outcome), 'skipped') }}
# A link to this run, so the PagerDuty assignee can quickly get here.
RUN_LINK: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}

run: |

if [[ ${{ env.IS_CRITICAL }} == true ]]; then
ALERT_SEV="critical"
ALERT_SUMMARY="Production heartbeat script failed on ${{ runner.os }}"
else
# Leaving the else part here to help with running end-to-end sanity test with real alerts being created.
ALERT_SEV="info"
ALERT_SUMMARY="Production heartbeat script succeeded with retries on ${{ runner.os }}"
fi
ALERT_SEV="critical"
ALERT_SUMMARY="Production heartbeat script failed on ${{ runner.os }}"
# Define a helper function to create our POST request's data, to sidestep issues with nested quotations.
generate_post_data() {
# This is known as a HereDoc, and it lets us declare multi-line input ending when the specified limit string,
Expand Down
Loading